From keld@dkuug.dk Thu Sep 19 20:42:50 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA03246; Thu, 19 Sep 91 20:42:50 +0200
Date: Thu, 19 Sep 91 20:42:50 +0200
From: Keld J|rn Simonsen
Message-Id: <9109191842.AA03246@dkuug.dk>
To: donn@hpfcrn.fc.hp.com, greger@ism.isc.com, hlj@posix, wg15rin@dkuug.dk
Subject: Re: (wg15rin 142) Re: Ballot resolution
X-Charset: ASCII
X-Char-Esc: 29

Donn wrote:

> Keld:
>
> In the case of the $ (dollar and "twinkle") and # (pound and pound sterling)
> cells, there are only two possible characters authorized by ISO.  No
> other substitutions are legal.

That is true. However, in the EBCDIC world the codes for these
characters are moved to many different code positions in the various
national EBCDIC variants, and that may cause problems. The EBCDIC world
is big and of increasing importance to POSIX after the recent IBM
announcement of POSIX support for MVS.

> It seems unnecessary to me to introduce a mechanism that costs extra,
> that in an ideal world (no 646 character set problems) would not be
> needed, and is subject to abuse.  (How about setting the comment
> character to "a", just for grins; let's confuse everyone!  That's a
> job for Obfuscated C.)
>
> Since for the comment character there are exactly two possibilities, my
> original objection, and the one I'd like to see accepted, is that
> EITHER character, or both if they appear in the current character set,
> is the comment character.

Yes, this is also a line of thought that DS has been following.
It would require that these characters always be valid as the comment
character, including in e.g. ISO 8859-1, where all of these symbols
have a code. There is thus a code both for "number sign" and for
"pounds sterling" (or escudos, as the Portuguese say :-).
The localedef program should then accept both.

Another issue is portability: where automatic conversion happens
between the ASCII/8859 and EBCDIC worlds, the "number sign" and the
other characters cause problems.
You may very well risk that a program sent by email in this world is
garbled when it is received (this would happen in Denmark, for
instance), and it would then be more portable to be able to specify
a comment character that is (EBCDIC) invariant.

> This has the advantages that:
>
> Users have a constant comment character (or at worst two).
>
> Translation between character sets is simplified (at least
> in that case) because it often goes across as a bit pattern.

Yes, but often it does not just go over as a bit pattern. And in those
cases we create a portability problem.

> In addition, translation is simplified because you don't have
> the problem of dealing with the situation where you are translating
> from a character set that has only one (e.g. any 646 set) to one
> that has both.  You can translate # to either # or sterling,
> and it will still work.

True. However, DS has used the comment-char specification to specify
a character that is both 646 invariant and EBCDIC invariant, to improve
portability by actually eliminating character set conversion problems.

> I wish there were as simple a solution to the problem for backslash, but
> since it is unrestricted national usage, there doesn't seem to be one
> that doesn't step on someone's toes.  (I'd rather that 646 just went away
> completely in favor of 8859 or better.)

One way of doing away with 646 and all its national variants is to
provide good support for 8859 and better, and we should work further
in RIN and other places to facilitate this.

Anyway, the current wording does not remove an "expensive" construct
like "comment_char"; it is still needed to be POSIX compliant.
Its use is just discouraged, in some vaguely defined instances.
I do not find "comment-char" very expensive to implement, and the
specification is not a lot of lines either. I would be happier, though,
and everything would be simpler, if we chose an invariant character
with no EBCDIC problems, like percent-sign, as the comment character.
The localedef/charmap syntax does not need many metacharacters, and
the ones used could be chosen with care for good engineering results
in portability. I do not think we have a long historical tradition
(for localedef/charmap) to take into account, like we have for the
shell.

Best regards,
Keld
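
The two approaches discussed in this message could be sketched roughly
as follows. This is only an illustration, not code from any localedef
implementation: the function name strip_comments and its comment_chars
parameter are hypothetical, and the grammar is simplified by assumption
(a line is a comment when its first character is a comment character,
and a "comment_char <c>" line switches the active character).

```python
# Hypothetical sketch: comment handling for a localedef-style source file.
# Assumption: a comment is a line whose FIRST character is a comment
# character; real localedef sources have more syntax than this.

DEFAULT_COMMENT_CHARS = ("#",)


def strip_comments(lines, comment_chars=DEFAULT_COMMENT_CHARS):
    """Yield the lines that are neither blank, comments, nor directives.

    comment_chars is the initial set of accepted comment characters.
    Passing two characters models the "accept EITHER character" proposal;
    a "comment_char <c>" line in the input models the directive approach
    and replaces the active set with the single character <c>.
    """
    active = tuple(comment_chars)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank line
        if line[0] in active:
            continue  # comment line
        parts = line.split()
        if parts[0] == "comment_char" and len(parts) == 2 and len(parts[1]) == 1:
            active = (parts[1],)  # switch the comment character
            continue
        yield line


if __name__ == "__main__":
    # Directive approach: "%" is chosen as the comment character, so a
    # line starting with "#" is no longer treated as a comment.
    source = ["comment_char %", "% a comment", "LC_CTYPE", "# kept", "END LC_CTYPE"]
    print(list(strip_comments(source)))

    # Either-character approach: both "#" and pound sterling are accepted.
    both = ("#", "\u00a3")
    print(list(strip_comments(["# one", "\u00a3 two", "LC_CTYPE"], comment_chars=both)))
```

With the directive, the file itself names a character that survives
conversion; with the two-character set, the reader tolerates whichever
of the two characters the conversion produced.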