From greger@iuk Wed Nov 28 21:39:31 1990
Received: from ism.isc.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
        id AA06189; Wed, 28 Nov 90 21:39:31 +0100
Received: by ism.isc.com (Sendmail5.61/1.35)
        id AA24843; Wed, 28 Nov 90 12:42:37 -0800
Received: from friherr by iuk.isc.com (5.61/smail2.2/11-14-88)
        id AA07560; Wed, 28 Nov 90 19:56:03 GMT
Received: by (5.61/1.35/jcb-s)
        id AA01540; Wed, 28 Nov 90 20:11:14 GMT
Date: Wed, 28 Nov 90 20:11:14 GMT
Message-Id: <9011282011.AA01540@>
To: erik%sra.co.jp@ism
Cc: seki%sysrap.cs.fujitsu.co.jp@ism, wg15rin%dkuug.dk@ism, XoTGinter@xopen
From: greger@ism.isc.com ("greger@ism.isc.com (Greger Leijonhufvud, ISC, High Wycombe, U.K.)")
Subject: Re: (wg15rin 59) Re: Japanese Profile
X-Charset: ASCII
X-Char-Esc: 29

In reply to your message of Wed Nov 28 08:27:18 1990
-------
>Sekiguchi-san,

>Thank you very much for forwarding the Japanese locale definition to
>WG15 RIN. We were trying to write a profile for Japan. Your
>contribution is very welcome at this stage.

>> # Based on POSIX.2 D10 syntax with X/Open extension.

>I hope you have proposed these "X/Open extensions" to the Posix
>people. It would be better for X/Open and Posix to be compatible with
>each other.

I have not seen the current X/Open specification for this, so we do
not know how "general" it is (see Donn Terry's latest comment). We are
thinking of some possible extensions, such as the symbols for ordinal
numbers, which will help in many locales (e.g. using Indian digits
instead of Arabic ones for dates in an Arabic locale). See also the
comment below.

>> # This definition implicitly assume that underlying encoding
>> # is UJIS (EUC-JIS) or similar one. (Although characters in
>> # G2 and G3 are completely ignored.) The definition may not
>> # work if the systems uses other encoding.

>Which parts of your definition depend on the encoding?

>I believe that, in general, locale definitions should be independent
>of encodings. That's why we have charmaps.
>The charmap provides the
>mapping between the symbolic names of the characters and the
>codepoints. The locale definitions should only contain references to
>the symbolic names, and are therefore independent of the encoding. At
>least, this is my understanding of the current Posix draft.

That is clearly the intent of the current draft. (We do support the
actual characters, but it is not recommended.)

>On the other hand, I have heard rumors that some people have commented
>that most implementations will probably only support one or a few
>encodings, and the full generality of the charmap system will probably
>be compromised. Perhaps the first implementations will not support the
>charmap system completely. This is understandable, since it takes some
>time to implement this new system. However, if people do not think
>that the charmap system will ever be fully implemented, then I find
>the very existence of this concept in the Posix draft highly
>questionable. I urge the WG15 RIN members responsible for the
>above-mentioned rumors to respond.

This will most probably vary between implementors. Note that there is
very little extra effort involved in supporting (at the system level)
codesets which are based on ASCII (such as the whole 8859 family, or
the IBM PC codesets). If you support one of them, you can support
many; charmaps are a very useful tool for that, and will probably be
used to provide that kind of support.

However, support of a codeset also implies terminal support/mapping,
which may be more restricted (Arabic terminals require more than a
translation table!). Multibyte codeset support also requires other
changes in the system, such as string parsing routines.

As far as POSIX is concerned, it does not matter whether
implementations support many or few codesets; charmaps are useful as
tools for locale portability. They allow for national locales which
will work regardless of whether the codeset is 8859.x or PC 850 or
others.
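A minimal sketch of the portability point, in Python. The symbolic
names (<a>, <e-acute>, <u-diaeresis>), the CHARMAPS table and the
resolve() helper are illustrative assumptions, not taken from the
draft; only the code points are those of ISO 8859-1 and PC code page
850. The same list of symbolic names yields different bytes depending
on which charmap is applied:

```python
# Two hypothetical charmaps: symbolic name -> code point.
# The names are invented for illustration; the code points are the
# real ISO 8859-1 and PC 850 values for these characters.
CHARMAPS = {
    "ISO8859-1": {"<a>": 0x61, "<e-acute>": 0xE9, "<u-diaeresis>": 0xFC},
    "PC850":     {"<a>": 0x61, "<e-acute>": 0x82, "<u-diaeresis>": 0x81},
}

# A locale fragment that refers to characters by symbolic name only,
# so it carries no knowledge of the underlying encoding.
LOWER = ["<a>", "<e-acute>", "<u-diaeresis>"]


def resolve(names, charmap):
    """Translate a list of symbolic names to encoded bytes via a charmap."""
    return bytes(charmap[name] for name in names)


if __name__ == "__main__":
    for codeset, charmap in CHARMAPS.items():
        print(codeset, resolve(LOWER, charmap).hex())
```

The locale fragment itself never changes; only the charmap handed to
resolve() does.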

>> upper ;;;;;;;;;;;;;\
>> ;;;;;;;;;;;;;\
>> <2341>;...;<235A>;\

>Japanese is not the only language that uses two bytes for the
>representation of its characters. For example, China also uses two
>bytes. So the names of the Japanese characters should contain
>something that distinguishes them from the names of other characters.
>Keld has suggested that we use names like "j1625" for the Japanese
>characters. The numbers are in decimal, so that it is easy to compare
>the names with the numbers that appear in the JIS table.

I agree; my current thinking in this area is as follows:

"Each non-comment line of the character set mapping definition (i.e.,
between the CHARMAP and END CHARMAP lines of the file) shall be in
either of two forms:

        "%s %s %s\n",,,
or
        "%s...%s %s %s\n",,,,

In the first format, the line in the character set mapping definition
defines a single symbolic name and a corresponding encoding. or two
symbolic names separated by an ellipsis (...). A symbolic name is one
or more characters from the set shown with visible glyphs in Table 2,
enclosed between angle brackets. A character following an escape
character is interpreted as itself; for example, the sequence
"<\\\>>" represents the symbolic name "\>" enclosed between angle
brackets.

In the second format, the line in the character set mapping definition
defines a range of two or more symbolic names. In this form, the
symbolic names shall consist of zero or more non-numeric characters
from the set shown with visible glyphs in Table 2, followed by an
integer formed by one or more decimal digits. The characters preceding
the integer shall be identical in the two symbolic names, and the
integer formed by the digits in the second symbolic name shall be
equal to or greater than the integer formed by the digits in the first
name. This shall be interpreted as a series of symbolic names formed
from the common part and each of the integers between the first and
the second integer, inclusive. As an example, ...
shall be interpreted as the symbolic names , , and in that order.

.......

In lines defining ranges of symbolic names, the encoded value is the
value for the first symbolic name in the range (the symbolic name
preceding the ellipsis). Subsequent symbolic names defined by the
range shall have encoding values in increasing order. For example, the
line

        ... \d129\d254

shall be interpreted as

        \d129\d254
        \d129\d255
        \d130\d0
        \d130\d1"

The Japanese proposal included a hexadecimal form for ranges; that may
cause problems in distinguishing between the "alpha" part and the
numeric part. By using strict decimal notation this problem can be
handled more easily.

>> # Era year definition: THIS IS AN X/OPEN EXTENSION
>> # This definition handles these 4 era only, i.e., HEISEI,
>> # SHOWA, TAISHO and MEIJI. Years befor MEIJI are printed
>> # as SEIREKI (which is ``A.D.'') or KIGENZEN (which is ``B.C.'')
>> era "+:2:1990/01/01:+*:<4A3F><402E>:%N%o<472F>";\
>> "+:1:1989/01/08:1989/12/31:<4A3F><402E>:%N<3835><472F>";\

>This is all very well for Japan, but what if some African tribe wants
>to define their locale and decide that they also need some kind of
>"era year" system, but find that their requirements are slightly
>different and are not met by this proposal? I don't mean to offend
>anyone by comparing the Japanese with the Africans; I just want to
>make my point absolutely clear by giving an extreme example. (Also, I
>don't mean to offend the Africans by saying that this example is
>extreme. :-)

>If it is possible that a country other than Japan may want to have a
>slightly different way of defining their era year, then I think that
>this keyword should not be called "era". It is unfair for any one
>country to reserve a general word like "era".

>Perhaps it would be better to take "era" out of the general LC_COLLATE
>rules, and add a hook to the rules for defining locale-specific rules.
>I can hear all of you saying "But how can you internationalize
>programs then?"
>Well, I think that it is likely that only Japanese
>programs will use %E and %o (for the era year), so in some sense,
>these programs would be localized rather than internationalized.

If we can make it sufficiently general, and it would also be used
outside Japan, then it may be worthwhile to put in. I do not know of
any other country that uses the Gregorian calendar with a different
era; that is the *only* thing we are talking about here (no solar
time..., no Muslim era).

Maybe what we should do is to reserve the identifiers %E and %o for
era-based processing, and let the national profiles handle it.

Comments?

-greger-

>Erik
-------
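
The range form of charmap lines proposed earlier in this message (a
common prefix followed by a decimal integer, with the encoding
increasing for each subsequent name) can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function name expand_range and the sample names <j1601>...<j1604> are
invented here, the \dNNN bytes are treated as one big-endian counter
(which matches the \d129\d254 example), and the digit width of the
first name is preserved, which the proposed text does not specify.

```python
import re

# "<prefix + decimal integer>": zero or more non-digits, then digits.
_NAME = re.compile(r"^<(\D*)(\d+)>$")
# One decimal byte constant in the "\dNNN" notation.
_BYTE = re.compile(r"\\d(\d+)")


def expand_range(first, last, encoding):
    """Expand '<first>...<last> encoding' into (name, encoding) pairs."""
    m1, m2 = _NAME.match(first), _NAME.match(last)
    if not m1 or not m2:
        raise ValueError("range names must end in a decimal integer")
    if m1.group(1) != m2.group(1):
        raise ValueError("the parts preceding the integer must be identical")
    low, high = int(m1.group(2)), int(m2.group(2))
    if high < low:
        raise ValueError("second integer must not be less than the first")

    data = [int(b) for b in _BYTE.findall(encoding)]
    if not data:
        raise ValueError("encoding must contain at least one \\dNNN byte")
    start = int.from_bytes(bytes(data), "big")

    # Assumption: keep the zero padding of the first name.
    width = len(m1.group(2))
    result = []
    for offset, number in enumerate(range(low, high + 1)):
        # Increment the whole byte string, carrying into the lead byte.
        encoded = (start + offset).to_bytes(len(data), "big")
        name = f"<{m1.group(1)}{number:0{width}d}>"
        result.append((name, "".join(f"\\d{b}" for b in encoded)))
    return result


if __name__ == "__main__":
    for name, enc in expand_range("<j1601>", "<j1604>", r"\d129\d254"):
        print(name, enc)
```

With strictly decimal integers the split between the prefix and the
number is unambiguous, which is the difficulty noted above for a
hexadecimal range form.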