From dominic@british-national-corpus.oxford.ac.uk Mon Sep 23 11:30:41 1991
Received: from kestrel.ukc.ac.uk by dkuug.dk via EUnet with SMTP
	(5.64+/8+bit/IDA-1.2.8) id AA24055; Mon, 23 Sep 91 11:30:41 +0200
Received: from convex.oxford.ac.uk by kestrel.Ukc.AC.UK via Janet
	(UKC CAMEL FTP) id aa19815; 23 Sep 91 10:25 BST
Received: from onions.natcorp by convex.oxford.ac.uk;
	Mon, 23 Sep 91 10:24:29 +0100
Received: by onions.natcorp.ox.ac.uk (4.1/SMI-4.1)
	id AA07472; Mon, 23 Sep 91 10:23:01 BST
Message-Id: <9109230923.AA07472@onions.natcorp.ox.ac.uk>
From: dominic@british-national-corpus.oxford.ac.uk
Date: Mon, 23 Sep 1991 10:23:01 +0100
In-Reply-To: Keld J|rn Simonsen
	"(wg15rin 146) request for "ident" specification part in LC_CTYPE"
	(Sep 19, 21:23)
X-Fax: +44 865 273275
X-Phone: +44 865 273280
X-Project: British National Corpus
X-Organization: Oxford University Computing Service
X-Address: 13 Banbury Road, Oxford OX2 6NN, U.K.
X-Mailer: Mail User's Shell (7.2.2 4/12/91)
To: Keld J|rn Simonsen , wg15rin@dkuug.dk, wg15-uk@xopen.co.uk
Subject: Re: (wg15rin 146) request for "ident" specification part in LC_CTYPE
Cc: hlj@posix.com
X-Charset: ASCII
X-Char-Esc: 29

Keld Simonsen writes:
> Danish Standards would like to have support for [extended character
> sets in identifiers] in POSIX.2 locales, and this should also be
> reflected in the "small" programming languages defined in POSIX, so
> that characters belonging to the "ident" class would be allowed in
> identifiers in these "small" programming languages (sh, awk etc.).

My view is that it is far too late in the ballot cycle of 1003.2 to
suggest such a change, and that to insist on it now would delay 1003.2,
and hence further delay 9945-2.  On balance, I consider it considerably
more important to publish 9945-2 in a timely manner than to delay it in
order to incorporate further changes.

That said, the change that DS suggests is highly desirable in the
medium term, provided that it can be demonstrated:

  a. That the change would not break existing conforming applications
     (scripts).  (And it might: I think I'm correct in saying that the
     ISO 646 IRV string ${file}name would suddenly become a legal
     single identifier in some national variants of IS 646 if new
     rules were badly drafted); and

  b. That applications using extended character sets in identifiers
     can either

     i.  Reliably and in a standardized manner identify the locale
         with respect to which characters in a particular script may
         be identified as being part of an identifier or not.  (This
         would presumably require some means of announcing the
         script's source locale and passing this information to the
         ``little language'' running it); or

     ii. Be shown to produce the same results, independent of the
         locale used for the identification of valid member characters
         for identifiers.  (This seems highly unlikely.)

To prepare for a. above, it would be a good idea to review the existing
1003.2 draft in order to identify and fix potential problem areas which
might result in conforming scripts breaking due to the later
incorporation of extended character sets in identifiers.  In
formulating its proposal, has DS noticed any such problem areas?  If
so, it should inform the development agency.

The UK national position that I would suggest, and which I am
canvassing by copy of this mail, is that the lack of the features
proposed by DS in a future DIS 9945-2 which is otherwise acceptable
should not be a reason to oppose its acceptance as an IS.  However, the
affirmative vote should be accompanied by a comment to the effect that
provision for extended character sets in identifiers should be
incorporated in the next revision of the IS, and that the UK would
oppose acceptance of a future revision unless it incorporated such
provisions for all relevant subsystems, or unless it can be
conclusively demonstrated that such a provision is impractical in
general, or in the case of particular ``little languages''.

-- 
Dominic Dunlop
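
The concern in point a. about ${file}name can be illustrated with a
small Python sketch. It is a toy model, not a shell parser and not
anything proposed in 1003.2: the function `expand` and both character
sets are hypothetical names introduced here. It assumes a Danish-style
national variant of ISO 646 in which the code points that IRV shows as
[ \ ] { | } are letters, and a badly drafted rule that admits every
such letter into identifiers.

```python
import string

# Identifier characters in the ISO 646 IRV (POSIX locale) sense.
BASE_IDENT = set(string.ascii_letters + string.digits + "_")

# Assumption: a national variant classifies these six code points as
# letters, so a naive "ident" class would include them.
NATIONAL_LETTERS = set("[\\]{|}")


def expand(script, env, ident_chars):
    """Toy expansion of $name and ${name} under a given ident class."""
    out = []
    i = 0
    while i < len(script):
        ch = script[i]
        if ch != "$":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the dollar sign
        if i < len(script) and script[i] == "{" and "{" not in ident_chars:
            # Brace form: the name runs up to the closing brace.
            end = script.index("}", i)
            name = script[i + 1:end]
            i = end + 1
        else:
            # Bare form: the name is the longest run of ident characters.
            j = i
            while j < len(script) and script[j] in ident_chars:
                j += 1
            name = script[i:j]
            i = j
        out.append(env.get(name, ""))
    return "".join(out)


env = {"file": "report"}
print(expand("${file}name", env, BASE_IDENT))
print(expand("${file}name", env, BASE_IDENT | NATIONAL_LETTERS))
```

Under the IRV class the first call prints "reportname". Under the
widened class the whole of {file}name is read as one unset variable
name, so the second call prints an empty line: the same bytes, a
different result, which is the breakage a carefully drafted rule would
have to exclude.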