From keld@dkuug.dk Sat Sep 21 20:55:21 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA17945; Sat, 21 Sep 91 20:55:21 +0200
Date: Sat, 21 Sep 91 20:55:21 +0200
From: Keld J|rn Simonsen
Message-Id: <9109211855.AA17945@dkuug.dk>
To: donn@hpfcrn.fc.hp.com
Subject: Re: (wg15rin 146) request for "ident" specification part in LC_CTYPE
Cc: hlj@posix.com, wg15rin@dkuug.dk
X-Charset: ASCII
X-Char-Esc: 29

Donn writes:

> 2) However, I don't think that this is the right time to introduce a new
>    (and potentially very controversial) feature.  (I don't think the
>    *need* is controversial in Europe (although it might be here, I'm
>    sorry to say)).  However, the requirement for implementation and
>    working out the secondary effects WILL be controversial.
>
>    CD 9945-2.2 is nearly on its way to ISO, and this could cause a 2.3,
>    with lots of further delay as the details are worked out.  I don't
>    think anyone wants that.

I do not believe it could cause a 2.3. A CD is meant to be something
that can be changed, and this feature is not a change to the existing
specification but an additional, orthogonal issue. DS will be happy to
accept the inclusion of such an enhanced feature in a forthcoming DIS,
and I think there would not be much quarreling from other national
member bodies. This extension is clean and simple.

If DS is not allowed to make such suggestions for this CD, I would
find the whole ISO balloting needless. DS has been quite involved in
POSIX I18N, but has not had timely access to the latest 2 IEEE drafts.
And most other national member bodies are in a much worse situation
than DS, having only had the chance to look at the first DP of 9945-2,
which was something like draft 8. A lot has happened since then, and
IEEE should be willing to accept international input from ISO member
bodies, who actually have the responsibility to make this into an
international standard.

>    As such, I recommend that it be deferred into 1003.2b, which will
>    reflect itself in ISO as corrections applied during the DIS balloting.
>    (Technically, some of this may have to come thru the PDAD route, and
>    be folded into the final IS, but that's a technicality, not intent.)

The problem is also that the "ident" specification is required in other
ISO standardisation work, like C and C++, and postponing a definition
of "ident" to POSIX.2b will severely delay the emergence of these
standards.

> 3) I have a technical pick at it as well...  Identifier characters almost
>    universally follow the following rules:
>
>       Leading alphabetic (or certain special characters, sometimes),
>       followed by
>       alphabetics, digits, and sometimes additional special characters.
>
>    However, the special characters vary from language to language (some
>    allow $, some don't; some don't even allow _ ).  A few allow ".",
>    however many more use "." as an operator.  (C in one way, perl in a
>    very different way, etc.)  Some use _ as an operator.  (I forget where
>    I've seen it used as the concatenation operator.)
>
>    I claim that the set of special characters in an identifier is a
>    property of the programming language, not of the natural language.

So there will be a C locale, a POSIX locale, a perl locale, an Ada
locale, ..., with possible national variations. This is certainly
within the scope of locale usage.

>    I also claim that the set of alphabetics is a property of the natural
>    language (646 overloading ignored for the moment).

Is it? I will claim the opposite. A Greek alpha is an alphabetic in
Danish, and so are Cyrillic characters and the rest.

>    Thus, rather than "ident", the already existing "alpha" should be the
>    means by which the characters are added to the legal characters in an
>    identifier.

Some would also allow special characters to be part of the name.
These are certainly not alphas.
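
For readers following the argument, the identifier rule Donn describes
can be sketched as below. This is only an illustration under stated
assumptions: Python's Unicode-based str.isalpha() stands in for a
locale's "alpha" class, the function name is_identifier and the
extra_chars parameter are hypothetical, and none of this reflects the
wording of any POSIX draft. The extra_chars argument models the
per-programming-language set of special characters that the two sides
disagree about where to define.

```python
def is_identifier(text, extra_chars="_", is_alpha=str.isalpha):
    """Return True if text follows the rule quoted above.

    Rule: a leading alphabetic (or an allowed special character),
    followed by alphabetics, digits, and allowed special characters.

    extra_chars -- hypothetical per-programming-language specials
    is_alpha    -- stand-in for the locale's "alpha" class (assumption)
    """
    if not text:
        return False

    first, rest = text[0], text[1:]

    # The leading character may not be a digit.
    if not (is_alpha(first) or first in extra_chars):
        return False

    # Later characters may additionally be decimal digits.
    return all(
        is_alpha(ch) or ch.isdecimal() or ch in extra_chars
        for ch in rest
    )
```

With extra_chars="_" the name "pay$" is rejected, while with
extra_chars="_$" it is accepted; that difference belongs to the
programming language. Whether a name written in Greek letters is
accepted depends only on what is_alpha treats as alphabetic, which is
the part of the question that "alpha" or "ident" in LC_CTYPE would
have to settle.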

>    (It might be reasonable to be able to somehow inform the compiler that
>    although your data is in a non-POSIX locale, the program is not; I don't
>    see that ident addresses that too well because it would require changing
>    locales to compile, and gets REALLY sticky in interpreters.)

The two languages in question have a notion of compile-time and
runtime locales. That should suffice. Yes, there is a need to do
further work on locales, so that you may have more than one locale
(non-global locales) in a process. X/Open & UniForum JIG is working
hard on these issues, as X Windows badly needs this.

>    I haven't fully thought out the issue of Hindi digits in identifiers yet;
>    is that even a concern?  Is a leading Hindi digit legal in an identifier?
>    Do programs deal with numbers represented as constants in Hindi digits?
>    What about Japanese numbers?

I do not know about that; maybe the Japanese can tell us more.

>    And, of course, appropriate wording about the portability of the program
>    to different locales would need to be developed.

Yes, but that is in the hands of WG14 and WG21.

>    All these issues would need to be worked out before it was approved;
>    I think that the proper timeframe for that is "not now".

WG14 and WG21 would like something to build upon, so that there is
support for their specifications in POSIX. I find it quite important
to do the definition of "ident" in POSIX now.

Keld