From keld@dkuug.dk Mon Apr 12 20:12:33 1993
Received: by dkuug.dk id AA14937
  (5.65c8/IDA-1.4.4j for sc22wg15); Mon, 12 Apr 1993 18:12:33 +0200
Message-Id: <199304121612.AA14937@dkuug.dk>
From: keld@dkuug.dk (Keld Jørn Simonsen)
Date: Mon, 12 Apr 1993 18:12:33 +0200
In-Reply-To: John C Klensin
  "Re: (SC22.316) characters in C - forwarded" (Apr 12, 13:54)
X-Charset: ASCII
X-Char-Esc: 29
Mime-Version: 1.0
Content-Type: Text/Plain; Charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Mnemonic-Intro: 29
X-Mailer: Mail User's Shell (7.2.2 4/12/91)
To: John C Klensin
Subject: Re: (SC22.316) characters in C - forwarded
Cc: sc22@dkuug.dk, sc22wg15@dkuug.dk, sc22wg14@dkuug.dk

John C Klensin writes:

> Keld,
>
> I'm going to try to keep this short, rather than review several of your
> comments in detail.

Oh? I have not written any comments previous to this one.
I have just tried to forward messages regarding Johan's message
between the email lists of SC22 and WG14 and WG15. Please do not
read any of the previous messages as representing my opinion
or even that of Danish Standards.

> I read Johan's piece --which I thought was very helpful-- not as
> demanding an "obvious" change in existing standards, nor as demanding
> that one accept his model and terminology. Instead, I saw it as saying
> "here is this terminology problem that may be confusing a lot of us, we
> should at least be clear about it".

Johan's comments accompanied two formal "no" votes on very
important documents, so we have to deal with them seriously.

> I think it is "obvious" that there is confusion about terminology and
> relationships in this area--among character set experts, among
> technically sophisticated experts in other areas who have to work with
> character set issues, and among educated but non-expert users-- and I
> think it is obvious to anyone who has watched (and listened to)
> character set discussions in multiple forums for more than a few months.
> The only people I've encountered who think that the issues, or even the
> definitions, are completely clear, seem to have looked at the issues
> from a single, narrow, point of view only.

You know from other conversations that I also think that the
"character" concept needs further refinement. For the time being,
I think we in SC22 should stick to the SC2 terminology, and refine
it if necessary. That is, in principle I agree with Johan's comments,
and Johan and I have also discussed it earlier.

WG15 has a resolution and an action item on this subject
(from the Reading meeting, October 1992):

WG15> RESOLUTION 92-223 9945 multibyte/wide character handling
WG15>
WG15> Whereas the current ISO/IEC 9945-1 (POSIX.1) does not support any APIs for
WG15> multibyte/wide character handling that are defined by ISO/IEC 9899 (C Language),
WG15> and
WG15>
WG15> Whereas the DIS 9945-2 (POSIX.2) does specify generic character handling features
WG15> based upon a character definition that "a character means a sequence of one or more
WG15> bytes representing a single symbol", and
WG15>
WG15> Whereas an amendment to ISO/IEC 9899 is scheduled in 1993, in which Multibyte
WG15> character Support Extensions (MSE) are proposed to provide a set of functions for
WG15> multibyte/wide character handling, aiming at improvement of worldwide portability
WG15> of C programs that need generic character handling capabilities, and
WG15>
WG15> Whereas the CD 9945-2 Ballot Dispositions and the POSIX.2b Draft has indicated
WG15> that certain extensions will be needed in conjunction with the proposed ISO C MSE
WG15> and its derivatives in the POSIX environment, and an API part of which should be
WG15> included in a future amendment to 9945-1,
WG15> Therefore, SC22/WG15 requests that the US:
WG15>
WG15>   1. Consider the LIS and language-binding interface changes necessary to
WG15>      handle character-oriented features as symbol and not storage patterns
WG15>      for a future revision of 9945-1.
WG15>
WG15>   2. Inform SC22/WG15 of any plans for supporting such features in future
WG15>      revisions of all parts of the 9945 Standard.
WG15>
WG15> Action 9210-32: RIN Lead Rapporteur: Investigate the production of guidelines
WG15> for standards developers for the usage of the terms character
WG15> and byte in the definition of interfaces, with especial attention
WG15> to the internationalisation issues arising from character-based
WG15> interfaces.

So I think the issue is well in hand with WG15 (POSIX); they are
working on solving it.

In WG14 the problem is not recognized. My personal belief is that it
was wrong to introduce the multibyte and wide character concepts in
the ISO C standard, and there were other proposals on character set
handling which were closer to the SC2 terms when the standard was
written. The Japanese have addressed this problem in WG14 many times.
Maybe the problem can be solved by the revision of 9899. It is not in
the scope of the addendum to solve it.

In WG21 (C++) they have inherited the C character concepts, but they
are (as I see it) also trying, like POSIX, to get away from the
C concepts, with a new string class.

> Finally --I can't resist-- ANSI X3/TR-1-82 isn't a normative document.

As said, I have not been using ANSI X3/TR-1-82 in any discussion.
I think it was used in Doug Gwyn's contribution.

> john
>
> Note that, to a degree greater than usual, these comments probably do
> not represent the positions of the UN Member Body.

So the UN became a member body of ISO? Now we can have truly
international standards :-)

Keld