ISO/IEC SC22/WG15 N657

Title: Data specification format for transliteration and transcription
Source: Keld Simonsen
Distribution: WG15, WG15 RIN, (possibly IEEE .2b group)
Status: expert contribution

The following gives a format for describing transliteration and transcription. The format is intended to be included in POSIX locales. It allows transliteration and transcription to depend both on the culture and language being transformed from and on the culture being transformed into.

A more elaborate transcription specification was considered. It was recognized, however, that transcription beyond the facilities described here should be based on a database.

LC_TRANS section

Transformation of characters, suitable for fallback in coded character set conversion, transliteration and simple transcription, can be specified with the following syntax in the LC_TRANS section of the locale.

The following keywords shall be recognized in the transformation definition. They are described in detail in the following subclauses.

transform_start
    The name of the culture to transform from. If no culture is specified, the transformation is the default transformation. The "transform_start" keyword is followed by one or more transformation statements, which assign transformation values to the elements being transformed, and by "include" statements, which copy transformation specifications from other locales.

transform_end
    The end of the transformation statements.

include
    The name of the locale (in text form), the repertoiremap for that locale, and the culture to transform from, to be used for the definition of this category. Other specifications may follow to replace specifications of the copied locale. This keyword is optional.

Transform_start keyword

The "transform_start" keyword shall precede transformation statements and "include" statements. It defines the culture to be transformed from.
The syntax of the "transform_start" keyword shall be:

    "transform_start %s\n"

where the operand is the name of the culture to transform from. If no operand is given, this is the default transformation.

Transform_end keyword

The transformation entries shall be terminated by the "transform_end" keyword.

Transformation statements

The "transform_start" keyword may be followed by transformation statements. The syntax for a transformation statement is:

    "%s %s;%s;...;%s\n"

The first operand is the element to be transformed; the following semicolon-separated operands are its transformation strings. Each operand shall consist of one or more characters (in any of the forms defined in POSIX-2 2.5.5).

The order in which the transformation strings are given defines their precedence: the first transformation string that satisfies the transformation is chosen, for example one whose characters are all in the coded character set being transformed into and which has the desired string length.

If more than one transformation statement is given for the same element, this is an error, unless the -c option is given; in that case a warning is issued and the last transformation statement is used.

A transformation statement may be terminated by a trailing followed by a number of characters and a character.

Example:

;;;"" ;

The first line defines a number of transformations for LATIN LETTER AE: into LATIN LETTER A WITH DIAERESIS, GREEK LETTER EPSILON, the two Latin letters A and E, and finally LATIN LETTER E. The second line defines the transformation of LATIN LETTER S into GREEK LETTER SIGMA and CYRILLIC LETTER ES. The third line transforms the two Latin letters K and O into the Japanese Hiragana character KO.

Include keyword

The "include" keyword specifies a set of transformation statements, in text form, to be included in the current transformation. The syntax of the "include" statement is:

    "include %s;%s;%s\n"

The first operand is a string identifying the locale to be included from. The second operand is a string identifying the repertoiremap used in the locale being included; it is used to map character specifications from that locale into the current locale.
The third operand selects a transformation specification in the transformation section of the included locale: the transformation specification for the culture it names is included. This operand is optional; if it is omitted, the default transformation is included.

Keld
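As an illustration only, the grouping of statements between "transform_start" and "transform_end", and the precedence rule for transformation strings, can be sketched in Python. The names decode, parse_lc_trans and transform are hypothetical. The sketch assumes that characters are written literally or as <Uxxxx> symbolic names, that "#" is the comment character, that the target coded character set is given as a Python codec name, and that string length is counted in characters; "include" statements are skipped. The sample renders the example from the Transformation statements subclause with assumed UCS values (capital letters throughout).

```python
import re

# Assumption: characters are written literally or as <Uxxxx> symbolic
# names (one of the POSIX-2 2.5.5 forms); other forms are not handled.
_UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(token):
    """Turn one operand such as "<U0041><U0045>" into the string "AE"."""
    token = token.strip().strip('"')
    return _UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), token)


def parse_lc_trans(text):
    """Group transformation statements by culture ("" = default).

    Returns {culture: {source: [candidate, ...]}}, keeping candidates
    in the order given, since that order is their precedence.
    """
    tables, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment character
            continue
        parts = line.split(None, 1)
        keyword, rest = parts[0], (parts[1] if len(parts) > 1 else "")
        if keyword == "transform_start":
            current = tables.setdefault(rest.strip(), {})
        elif keyword == "transform_end":
            current = None
        elif keyword == "include":
            continue  # copying from other locales is not modelled here
        elif current is not None:
            source = decode(keyword)
            if source in current:
                # A duplicate statement is an error (absent -c).
                raise ValueError(f"duplicate transformation for {keyword}")
            current[source] = [decode(t) for t in rest.split(";") if t.strip()]
    return tables


def transform(table, source, encoding, max_len=None):
    """Return the first candidate encodable in `encoding` and no longer
    than `max_len` characters, or None if no candidate fits."""
    for candidate in table.get(source, []):
        if max_len is not None and len(candidate) > max_len:
            continue
        try:
            candidate.encode(encoding)
        except UnicodeEncodeError:
            continue  # contains a character outside the target set
        return candidate
    return None


# Hypothetical rendering of the example using assumed <Uxxxx> names.
SAMPLE = """
transform_start
<U00C6> <U00C4>;<U0395>;"<U0041><U0045>";<U0045>
<U0053> <U03A3>;<U0421>
"<U004B><U004F>" <U3053>
transform_end
"""

default = parse_lc_trans(SAMPLE)[""]
print(transform(default, "\u00c6", "latin-1"))           # Ä
print(transform(default, "\u00c6", "ascii"))             # AE
print(transform(default, "\u00c6", "ascii", max_len=1))  # E
```

Trying the candidates in order and returning the first one that fits is what makes the order of the transformation strings significant: a target set containing A WITH DIAERESIS gets that letter, while a 7-bit target falls back to "AE", or to "E" when only one character is allowed.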