From donn@hpfcrn.fc.hp.com Fri Sep 20 16:31:49 1991
Received: from relay.hp.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
        id AA07168; Fri, 20 Sep 91 16:31:49 +0200
Received: from hpfcrn.fc.hp.com by relay.hp.com with SMTP (16.6/15.5+IOS 3.13)
        id AA27476; Fri, 20 Sep 91 07:31:46 -0700
Received: from hpfcdonn.fc.hp.com by hpfcrn.fc.hp.com with SMTP (16.7/15.5+IOS 3.22)
        id AA04623; Fri, 20 Sep 91 08:31:26 -0600
Message-Id: <9109201431.AA04623@hpfcrn.fc.hp.com>
To: Keld Jørn Simonsen
Cc: wg15rin@dkuug.dk, hlj@posix.com
Subject: Re: (wg15rin 146) request for "ident" specification part in LC_CTYPE
In-Reply-To: Your message of "Thu, 19 Sep 91 21:23:31 N." <9109191923.AA04492@dkuug.dk>
Date: Fri, 20 Sep 91 08:31:24 MDT
From: Donn Terry
X-Charset: ASCII
X-Char-Esc: 29

Concerning Keld's note below...

1) I agree with the sentiment completely... programming languages (big or little) should support native character sets.

2) However, I don't think that this is the right time to introduce a new (and potentially very controversial) feature. (I don't think the *need* is controversial in Europe (although it might be here, I'm sorry to say)). However, the requirement for implementation and working out the secondary effects WILL be controversial. CD 9945-2.2 is nearly on its way to ISO, and this could cause a 2.3, with lots of further delay as the details are worked out. I don't think anyone wants that. As such, I recommend that it be deferred into 1003.2b, which will reflect itself in ISO as corrections applied during the DIS balloting. (Technically, some of this may have to come thru the PDAD route, and be folded into the final IS, but that's a technicality, not intent.)

3) I have a technical pick at it as well... Identifier characters almost universally follow these rules: a leading alphabetic (or certain special characters, sometimes), followed by alphabetics, digits, and sometimes additional special characters.
However, the special characters vary from language to language (some allow $, some don't; some don't even allow _ ). A few allow "."; many more use "." as an operator. (C in one way, perl in a very different way, etc.) Some use _ as an operator. (I forget where I've seen it used as the concatenation operator.)

I claim that the set of special characters in an identifier is a property of the programming language, not of the natural language. I also claim that the set of alphabetics is a property of the natural language (646 overloading ignored for the moment). Thus, rather than "ident", the already existing "alpha" should be the means by which the characters are added to the legal characters in an identifier. (It might be reasonable to be able to somehow inform the compiler that although your data is in a non-POSIX locale, the program is not; I don't see that ident addresses that too well because it would require changing locales to compile, and gets REALLY sticky in interpreters.)

I haven't fully thought out the issue of Hindi digits in identifiers yet; is that even a concern? Is a leading Hindi digit legal in an identifier? Do programs deal with numbers represented as constants in Hindi digits? What about Japanese numbers?

And, of course, appropriate wording about the portability of the program to different locales would need to be developed.

All these issues would need to be worked out before it was approved; I think that the proper timeframe for that is "not now".

Donn

----------------------

Hello WG15RIN! (Hello Hal!)

At recent ISO/IEC JTC1/SC22/WG14 (programming language C) and ISO/IEC JTC1/SC22/WG21 (Programming language C++) meetings it has been decided to allow identifiers in these languages to include characters dependent on the translation-time locale, e.g. for which a function is_ident() has a return value of "true".
Danish Standards would like to have support for this in POSIX.2 locales, and this should also be reflected in the "small" programming languages defined in POSIX, so that characters belonging to the "ident" class would be allowed in identifiers in these "small" programming languages (sh, awk etc.).

So DS proposes a new standard class in the LC_CTYPE section of the locale, for inclusion in POSIX.2 section 2.5.2.1:

    ident:  Define characters to be classified as identifier characters
            (allowed in identifiers in a programming language or the like).
            Characters specified for the keyword alpha shall automatically
            belong to this character class.

Keld Simonsen
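
For illustration only, the split argued for in point 3 of the reply (alphabetics come from the natural language, special characters from the programming language) could be sketched as below. This is a minimal sketch, not anything specified in POSIX.2 or in either message. The names `is_identifier`, `specials`, and `is_alpha` are hypothetical, and Python's `str.isalpha` is only a stand-in for a locale's "alpha" class: it tests a Unicode property and does not consult the POSIX locale. How non-ASCII digits should be treated is left open in the thread; the sketch simply accepts whatever `str.isdigit` accepts, and only after the first character.

```python
def is_identifier(name, specials="_", is_alpha=str.isalpha):
    """Check a name against the rule described in the reply.

    Rule: a leading alphabetic or special character, followed by
    alphabetics, digits, or special characters.

    specials -- characters the *programming language* allows (assumption:
                supplied by the language, e.g. "_" or "_$")
    is_alpha -- predicate standing in for the locale's "alpha" class
                (assumption: str.isalpha is a Unicode test, not a locale one)
    """
    if not name:
        return False

    def is_start(ch):
        # Alphabetics come from the natural language, specials from the
        # programming language.
        return is_alpha(ch) or ch in specials

    def is_continue(ch):
        # Digits are allowed anywhere except in the leading position.
        return is_start(ch) or ch.isdigit()

    return is_start(name[0]) and all(is_continue(ch) for ch in name[1:])


def ascii_alpha(ch):
    """Hypothetical stand-in for the "alpha" class of the POSIX locale."""
    return ch.isascii() and ch.isalpha()
```

With the default predicate, a name such as "blåbær" is accepted; passing `is_alpha=ascii_alpha` rejects it, which mirrors a program compiled under the POSIX locale. Changing `specials` from "_" to "_$" changes what the language accepts without touching the alphabetic set.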