October 16 1992
Milos Lalovic
Short symbolic character names are an important element of the locale definition syntax. Many national standards groups have already defined their national locales using short symbolic character names of their own choosing. It is therefore impossible to define a single set of standard short symbolic character names that would satisfy everyone's preferences and also preserve the investment in locales already defined. The solution is to allow national standards groups to define their national standard locales using the short symbolic character names of their choice, provided that each locale definition is accompanied by a reference table that uniquely and unambiguously describes its short symbolic character names in terms of ISO 10646 hexadecimal identifiers. An ISO 10646 hexadecimal identifier has the form <Uxxxxxxxx>, where "xxxxxxxx" is eight hexadecimal digits giving the code point value of the corresponding ISO 10646 character in canonical form.
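As an illustration only, producing an identifier from a code point amounts to zero-padding the value to eight hexadecimal digits. The sketch below assumes upper-case hex digits (the proposal does not specify case) and checks against the 31-bit range of the ISO 10646 canonical form; the function name iso10646_identifier is hypothetical.

```python
def iso10646_identifier(code_point: int) -> str:
    """Format a code point as an ISO 10646 hexadecimal identifier <Uxxxxxxxx>."""
    # The canonical form is 31 bits, so valid values are 0 .. 0x7FFFFFFF.
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError(f"code point out of range: {code_point:#x}")
    # Eight hex digits, zero-padded; upper case is an assumption, not mandated.
    return f"<U{code_point:08X}>"


print(iso10646_identifier(0xE9))  # <U000000E9>
```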
Most standard character sets are already included in ISO 10646, so most short symbolic character names will have a corresponding ISO 10646 hexadecimal identifier. Where a standard character set has not yet been included in ISO 10646, the ISO 10646 hexadecimal identifier will be replaced by the name of the standard character set, followed by a string of hexadecimal digits giving the code point value in that character set (e.g. <ISO6429_xx>).
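A table processor would then need to distinguish the two kinds of identifier. The following sketch assumes the fallback form is always a character set name, an underscore, and hexadecimal digits, generalized from the single example <ISO6429_xx>; the patterns and the parse_identifier name are hypothetical, not part of the proposal.

```python
import re

# <Uxxxxxxxx>: exactly eight hex digits after the letter U.
_UCS = re.compile(r"<U([0-9A-Fa-f]{8})>")
# Assumed fallback form: <CHARSETNAME_hexdigits>, e.g. <ISO6429_1B>.
_OTHER = re.compile(r"<([A-Za-z][A-Za-z0-9]*)_([0-9A-Fa-f]+)>")


def parse_identifier(ident: str) -> tuple[str, int]:
    """Return (character set, code point); ISO 10646 ids report 'ISO10646'."""
    m = _UCS.fullmatch(ident)
    if m:
        return ("ISO10646", int(m.group(1), 16))
    m = _OTHER.fullmatch(ident)
    if m:
        return (m.group(1), int(m.group(2), 16))
    raise ValueError(f"not a recognised identifier: {ident!r}")


print(parse_identifier("<U000000E9>"))   # ('ISO10646', 233)
print(parse_identifier("<ISO6429_1B>"))  # ('ISO6429', 27)
```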
This method does not require the use of the ISO 10646 character encoding scheme; only the ISO 10646 hexadecimal identifiers are required.
The following is the syntax for the reference table:
<Uxxxxxxxx> <short-name>
A blank (or possibly a horizontal tab) would separate the two fields, and each entry would be terminated by a newline character.
If there is more than one short name for a given ISO 10646 hexadecimal identifier, there would be one entry for each short name, e.g.:
<Uxxxxxxxx> <short-name1>
<Uxxxxxxxx> <short-name2>
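A reader for such a table can be sketched in a few lines. This assumes short names are written in angle brackets like POSIX symbolic names, that one or more blanks or tabs separate the fields, and that empty lines are ignored; none of these details is fixed by the proposal, and parse_reference_table is a hypothetical name. Several short names may share one identifier, but a short name mapped to two different identifiers is rejected, since the table must describe each name unambiguously.

```python
import re

# Assumed entry shape: <identifier>, blanks or tabs, <short-name>.
_ENTRY = re.compile(r"(<[^<>\s]+>)[ \t]+(<[^<>\s]+>)")


def parse_reference_table(text: str) -> dict[str, str]:
    """Map each short name to its identifier, rejecting conflicting entries."""
    by_name: dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: empty lines carry no entry
        m = _ENTRY.fullmatch(line.strip())
        if not m:
            raise ValueError(f"line {lineno}: malformed entry {line!r}")
        ident, name = m.groups()
        # Repeating an identifier is allowed; repeating a name with a new id is not.
        if by_name.get(name, ident) != ident:
            raise ValueError(
                f"line {lineno}: {name} maps to both {by_name[name]} and {ident}"
            )
        by_name[name] = ident
    return by_name


table = "<U000000E9> <e'>\n<U000000E9> <e-acute>\n<U00000041>\t<A>\n"
print(parse_reference_table(table))
```

Keying the result by short name, rather than by identifier, makes the unambiguity check a single dictionary lookup per entry.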
Currently only the UCS-2 form of ISO 10646 has been assigned, so all ISO 10646 hexadecimal identifiers will have four leading zeros. This may be avoided if the syntax is extended to allow an alternate form of ISO 10646 hexadecimal identifier with only four hexadecimal digits following the letter U (e.g. <Uxxxx>). There is no danger of ambiguity, since the four-digit identifiers are synonyms for the eight-digit identifiers with four leading zeros.
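Because the four-digit form is only a synonym, an implementation could normalize every identifier to the eight-digit form before comparing entries, and shorten it again for display when the leading four digits are zero. A minimal sketch, assuming upper-case hex in the normalized output; the function names are hypothetical.

```python
import re

_SHORT = re.compile(r"<U([0-9A-Fa-f]{4})>")
_LONG = re.compile(r"<U([0-9A-Fa-f]{8})>")


def canonical_identifier(ident: str) -> str:
    """Expand <Uxxxx> to <U0000xxxx>; pass eight-digit identifiers through."""
    m = _SHORT.fullmatch(ident)
    if m:
        return f"<U0000{m.group(1).upper()}>"
    m = _LONG.fullmatch(ident)
    if m:
        return f"<U{m.group(1).upper()}>"
    raise ValueError(f"not an ISO 10646 identifier: {ident!r}")


def short_identifier(ident: str) -> str:
    """Use the four-digit synonym when the leading four digits are zero."""
    canon = canonical_identifier(ident)
    if canon.startswith("<U0000"):
        return "<U" + canon[6:]  # drop the four leading zeros
    return canon


print(canonical_identifier("<U00e9>"))     # <U000000E9>
print(short_identifier("<U000000E9>"))     # <U00E9>
```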