From donn@hpfcrn.fc.hp.com Fri Sep 20 16:07:33 1991
Received: from relay.hp.com by dkuug.dk via EUnet with SMTP
    (5.64+/8+bit/IDA-1.2.8) id AA06276; Fri, 20 Sep 91 16:07:33 +0200
Received: from hpfcrn.fc.hp.com by relay.hp.com with SMTP
    (16.6/15.5+IOS 3.13) id AA26922; Fri, 20 Sep 91 07:07:28 -0700
Received: from hpfcdonn.fc.hp.com by hpfcrn.fc.hp.com with SMTP
    (16.7/15.5+IOS 3.22) id AA04427; Fri, 20 Sep 91 08:07:07 -0600
Message-Id: <9109201407.AA04427@hpfcrn.fc.hp.com>
To: Keld J|rn Simonsen
Cc: greger@ism.isc.com, hlj@posix, wg15rin@dkuug.dk
Subject: Re: (wg15rin 145) Re: Ballot resolution
In-Reply-To: Your message of "Thu, 19 Sep 91 20:42:50 N." <9109191842.AA03246@dkuug.dk>
Date: Fri, 20 Sep 91 08:07:06 MDT
From: Donn Terry
X-Charset: ASCII
X-Char-Esc: 29

In response to Keld in response to me....

>> In the case of the $ (dollar and "twinkle") and # (pound and pound sterling)
>> cells, there are only two possible characters authorized by ISO. No
>> other substitutions are legal.

>That is true. However in the EBCDIC world the codes for these characters
>are shifted around for many places in the code in the various national
>EBCDICS, and that may cause problems. The EBCDIC world is big and
>of increasing importance to POSIX after the recent IBM announcement of
>POSIX support for MVS.

However, if the EBCDICs are shuffled, a translation of most files will
be needed anyway, and I don't see why charmap and localedef files would
be an exception. (There is simply no reason to *expect* portability of
files to machines running different character sets, no matter how
similar they are.)

I presume that all EBCDICs contain #. There is a fundamentally
different problem for EBCDIC than there is for 646: EBCDIC doesn't (to
my knowledge) overload code positions the way 646 does, where a
(relatively common) character such as # is replaced with another in
some contexts. Translation may be necessary, but it's 1:1, rather than
1:2 or 1:0 depending on how you look at it.

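The 1:1 versus overloaded distinction can be made concrete with a
minimal sketch in Python. Everything named here is an assumption made
for illustration and does not come from the ballot text: code page 037
stands in for "an EBCDIC", latin-1 stands in for 8859, and the
two-entry table for cell 0x23 is a hypothetical stand-in for the ISO
646 national variants.

```python
# Illustrative sketch only. The codec names (cp037, latin-1) and the
# two-entry ISO 646 table below are assumptions chosen for the example.

def to_ebcdic(text: str) -> bytes:
    """Text -> EBCDIC bytes (code page 037 assumed)."""
    return text.encode("cp037")

def from_ebcdic(data: bytes) -> str:
    """EBCDIC bytes (code page 037 assumed) -> text."""
    return data.decode("cp037")

# Hypothetical table: what byte 0x23 means in two ISO 646 variants.
ISO646_CELL_0X23 = {
    "US": "#",        # number sign
    "GB": "\u00a3",   # pound sterling
}

def decode_646_byte(byte: int, variant: str) -> str:
    """Decode one ISO 646 byte; only cell 0x23 varies in this sketch."""
    if byte == 0x23:
        return ISO646_CELL_0X23[variant]
    return bytes([byte]).decode("ascii")

if __name__ == "__main__":
    line = "# a comment line"
    ebcdic = to_ebcdic(line)
    # The bit patterns differ, but the mapping is 1:1 and round-trips.
    print(ebcdic != line.encode("latin-1"), from_ebcdic(ebcdic) == line)
    # The same 646 byte yields a different character per variant.
    print(decode_646_byte(0x23, "US"), decode_646_byte(0x23, "GB"))
```

The EBCDIC case needs a table lookup but loses nothing on the round
trip. The 646 case cannot be decoded at all without knowing which
national variant produced the byte, which is the overloading described
above.
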
>> It seems unnecessary to me to introduce a mechanism that costs extra,
>> that in an ideal (no 646 character set problems) would not be needed,
>> and is subject to abuse. (How about setting the comment character to
>> "a", just for grins; let's confuse everyone! That's a job for Obfuscated
>> C.)
>> Since for the comment character there are exactly two possibilities, my
>> original objection, and the one I'd like to see accepted, is that
>> EITHER character, or both if they appear in the current character set,
>> is the comment character.

>Yes, this is also a line of thoughts that DS have been following.
>it will need that these characters always be valid as the comment
>character, also in e.g. ISO 8859-1 where all of these symbols have
>a code. There is thus both a code for "number sign" and "pounds
>sterling" (or escudos as the Portuguese say:-). The localedef program
>should then accept both.
>Another thing is for portability, and where automatic conversion
>happens between the ASCII/8859 and EBCDIC worlds - the "number sign"
>and the other cause problems. You may very well risk a program sent
>by email in this world to be screwed up when it is received (this would
>happen in Denmark for instance) and it would then be more portable
>to be able to specify a comment character that was (EBCDIC) invariant.

How about being more concrete... are you saying that the (net)
translation from 646/8859 to/from EBCDIC is wrong? (Or is it that a
translation from 646 (ASCII) to (US) EBCDIC is actually occurring when
a translation from Danish 646 to Danish EBCDIC is what should be
occurring?)

>> This has the advantages that:
>> Users have a constant comment character (or at worst two).
>> Translation between character sets is simplified (at least
>> in that case) because it often goes across as a bit pattern.

>Yes, but often it does not just go over as a bit pattern.
>And in those cases we create a portability problem.

ASCII won't go over to EBCDIC as bit patterns, period.
I don't see why translation shouldn't be expected in all cases where
the character set changes. (Sure, it's nice to be able to get away with
being sloppy about translations, but supporting such sloppiness is not
a goal of standardization that I can identify, particularly when that
support has a future cost while addressing a current (and expected to
be temporary) problem.)

>> In addition, translation is simplified because you don't have
>> the problem of dealing with the situation where you are translating
>> from a character set that has only one (e.g. any 646 set) to one
>> that has both. You can translate # to either # or sterling,
>> and it will still work.

>True. The use DS have done however of the comment-char specification
>is to specify an 646 invariant and EBCDIC invariant character, to
>improve portability by actually eliminating character set conversion
>problems.

>> I wish there were as simple a solution to the problem for backslash, but
>> since it is unrestricted national usage, there doesn't seem to be one
>> that doesn't step on someone's toes. (I'd rather that 646 just went away
>> completely in favor of 8859 or better.)

>One way of getting away with 646 and all its national variants
>is to provide good support for 8859 and better, and we should work
>further in RIN and other places to faciliate this.

I agree; however, as long as we keep doing things to accommodate 646,
there won't be much reason to go to 8859. Many vendors already support
8859 (or something like it, e.g. EBCDIC or Roman 8), and it won't be
long until (nearly) all new systems do. The support for 646 seems to be
addressed more to users of existing hardware than it is to any vendor
changes. As such, until users get rid of their existing hardware, the
issue won't be resolved, and one way to encourage that is to stop
encouraging 646!

>Anyway the current wordings does not remove an "expensive" construct
>like the "comment_char" - it is still needed to be POSIX compliant.
>The use is just discouraged - in some vaguely defined instances.

I certainly agree with this, and that's why I said I might object to
the current (11.2) wording more than I do to the old (11.1) wording.
(I still haven't made up my mind.)

>I do not find "comment-char" very expensive to implement, and the
>specification is not a lot of lines either.

It's technically not that expensive, I agree, but it has secondary
costs (in terms of support and abuse) that seem to be expensive. A
standard shouldn't go overboard in doing things that aren't really
necessary.

>I would be happier, tho, and everything would be simpler, if we chose
>an invariant non-EBCDIC-problem character like percent-sign as
>the comment character. The localedef/charmap syntax does not need
>many metacharacters, and the ones used could be chosen with care
>for good engineering results in portability. I do not think we have
>a long historic tradition (for localedef/charmap) to take into
>account, like we have for the shell.

I agree that there isn't much history for these files. However, there
are lots and lots of well-trained fingers that think that comments are
spelled # or /* .. */. Introducing a new convention seems to be very
poor ergonomics.

Donn