From Clive@cisga48.demon.co.uk Wed Jul 10 19:21:45 1996
Received: from relay-4.mail.demon.net (relay-4.mail.demon.net [158.152.1.108])
    by dkuug.dk (8.6.12/8.6.12) with SMTP id TAA20159;
    Wed, 10 Jul 1996 19:21:38 +0200
Received: from post.demon.co.uk ([158.152.1.72]) by relay-4.mail.demon.net
    id bo01780; 10 Jul 96 17:21 GMT
Received: from cisga48.demon.co.uk ([194.159.208.70]) by relay-3.mail.demon.net
    id aa02589; 10 Jul 96 17:49 +0100
Message-ID:
Date: Wed, 10 Jul 1996 00:36:29 +0100
To: Keld Jørn Simonsen
Cc: sc22wg15@dkuug.dk, sc22wg14@dkuug.dk
From: "Clive D.W. Feather"
Reply-To: "Clive D.W. Feather"
Subject: Re: (SC22WG14.2683) WG14 N586: POSIX Alignment
In-Reply-To: <199607071531.RAA25002@dkuug.dk>
MIME-Version: 1.0
X-Mailer: Turnpike Version 1.10 <81yImECxEkLwWotvkdN7a29E6a>

Keld Jørn Simonsen writes
>This paper gives proposals for changes to the C standard to align it with
>the POSIX standards POSIX System API (C language) (POSIX-1), and ISO/IEC
>9945-2:1993 POSIX Shell and Utilities (POSIX-2).

It strikes me that this proposal goes further than just aligning with
these parts of POSIX. To me, aligning means ensuring there are no
incompatibilities, not importing pieces into C. That doesn't mean that
the proposals are necessarily bad, of course.

In many cases, seeing proposed wording would help.

>7.3.1 Character testing functions
>
>POSIX-2 adds in its section 2.5.2 a class "blank", consisting initially of
>the characters <space> and <tab>. This character class should be added,
>possibly by adding a function isblank() that is similar to the isspace()
>function except that the test is for a standard blank character, and the
>characters covered initially only are space (' ') and horizontal tab
>('\t'). Similarly a function iswblank() should be added.

Can you give a POSIX-independent description of "blank"? All the other
classes have such descriptions. How does it differ from isspace? What is
its definition in the "C" locale?

>7.3.2 Character case mapping functions
>
>C has no statement on locale dependence, nor how the correspondence is
>defined. This can be described by adding: "as specified by the current
>locale" to both the toupper() and tolower() descriptions, so it reads (for
>tolower):
>
>If the argument is a character for which isupper is true and there is a
>corresponding character as specified by the current locale for which
>islower is true, the tolower function returns the corresponding character;
>otherwise, the argument is returned unchanged.

I'm not sure that the introduction to 7.3 doesn't already cover this,
but I'm happy to have the wording clarified. To be consistent, there
ought to be something about the mapping being defined as the obvious one
when in the "C" locale.

>7.4 Localization
>
>The POSIX-2 standard was approved after adoption of the C standard, and it
>contains a format for specifying locales and accompanying charmaps. This is
>a valuable and standardized way of specifying locales, on the other hand
>many C compilers do not operate under a POSIX operating system. It is
>proposed to add in 7.4 after the macro (LC_ALL etc) section:
>
>"POSIX-2 specifies locale and charmap formats that may be used to specify
>locales for C."

This doesn't belong in the Standard. Perhaps a reference could be given
in the informative Annex giving the bibliography, or it could be in the
Rationale material.

>7.4.1 Locale control
>
>POSIX-2 adds a new category LC_MESSAGES to setlocale() in B.11; with two
>strings yesexpr and noexpr, it is further meant to invoke the right
>messages corresponding to a locale.
>
>This should be added in the macros description in 7.4:
>
>   LC_MESSAGES
>
>and in 7.4.1 in the first section add at the end: "LC_MESSAGES may be used
>to identify messages."
>There is no proposal at this time to specify further
>functionality, but yesexpr and noexpr should be included as strings in
>struct lconv (7.4), and further described somewhere, possibly in a new
>section.

There is no way a strictly conforming program can make use of any of
this. There is no proposal to provide functions to make use of these
strings, and therefore they don't belong in the C standard.

We are already aligned with POSIX to the extent that a conforming
implementation can provide all of this if it wants, so I don't see why
we need to do anything else.

>7.4.1.1 setlocale()
>
>There is no reference to standardized locales, except the "C" locale. A
>reference to registered locales of the international cultural registry
>should be done:
>
>Insert after the sentence "A value of "C" for locale specifies the minimal
>environment for C translation..." :
>"Locales that start with the string "std/" references POSIX locale entries
>in the international cultural register, CEN ENV 12005".
>
>A similar proposal has been done to the POSIX WG.

This just does not belong here in this form. As worded, it isn't
possible to construct a conformance test for it, so it shouldn't be
there. I can see three possible meanings:

(1) This is intended as a suggestion of good practice. Put it in the
    Rationale.

(2) It means "all locales in CEN ENV 12005 must be provided, and are
    given names of the form ...". I will resist this bitterly, and I
    think many others will as well.

(3) It means "if a locale has a name beginning with 'std/', it will be
    the one in CEN ENV 12005; an implementation need not provide any
    such locales". In other words, reserving the namespace for this use.
    This is not unreasonable, but since it is not possible to enumerate
    the locales available, it's not actually much use.

*** Side issue: do we need the following functions added in C9X? If so,
I am willing to work on a proposal.
(A) a function to enumerate the locales available;
(B) a function to state whether two locales are equivalent in some
    sense.

>7.4.2 Numeric formatting
>
>In POSIX-2 all strings values of the monetary/numeric specifications can be
>with multiple characters. It is proposed to clarify this, by in the 2nd
>paragraph after "The members of the structure with type char * are pointers
>to strings, any of which" add: "may be more than one character and" and
>change the following "can" to a "may".

Members decimal_point and thousands_sep are described as being "[a]
character". Thus they must have length 0 or 1. The former cannot have
length 0, as is explicitly stated. There are no other restrictions given
on the length of any member, and so all the other members are
unrestricted (including zero length). This change is not needed.

>In POSIX-2 all char variables may use -1 to indicate that the value is not
>available, instead of the value CHAR_MAX, so to be compatible, it is
>proposed that -1 is added to the values indicating this. Thus after
>"CHAR_MAX" add: "or -1". Also here change "can" to "may".

If we make this change (or rather, say "(unsigned char)-1"), we break
any application which has been written to the interface we have promised
it. This is a Quiet Change, and I see no reason to make it.

The alternative is for the code parsing POSIX locale files to do a
little more work. This seems vastly preferable to me; this code has got
to be written carefully anyway, and its external specifications are
being left constant rather than changing under its feet.

>7.4.2.1 p_sign_posn and n_sign_posn
>POSIX has added a 5th value and thus it is proposed to add:
>"5 A space separates the symbol and the sign string, if adjacent."

POSIX is talking nonsense. Firstly, this is what [np]_sep_by_space are
for. Secondly, it doesn't tell you where to put the sign relative to the
symbol and value.

>7.4.2.1 int_curr_symbol different from currency_symbol
[...]

Sounds okay to me.

>7.12.3.5 strftime
>
>The date utility in POSIX-2 4.15 has all of the formats of C strftime()
>plus more, all of which are proposed to be added to strftime:

I don't have a problem with any of these. However, can both these and
the ones already there, wherever possible, be expressed in terms of each
other (though without loops, of course)? For example, replace:

>%D is replaced by the date in the format mm/dd/yy

by

    %D is equivalent to "%m/%d/%y"

Are all of these in POSIX? If not, which ones aren't, and how likely are
they ever to be? How can we be sure POSIX won't keep changing?

>%V is replaced by the week of the year (Monday as the first day of the
>   week) as a decimal number (00-53). The method for determining the
>   week number shall be as specified in ISO 8601.

Can this be replaced by a self-contained algorithm like the rest?

>A number of modified field descriptors %O and %E are also defined
>(4.15.4.2). Some field descriptors can be modified by the E and O modifier
>characters to indicate a different format or specification as specified in
>the LC_TIME locale description. If the corresponding keyword (see era,
>era_year, era_d_fmt, and alt_digits) is not specified or not supported for
>the current locale, the unmodified field descriptor value shall be used.

I'm not clear what's going on here. If I understand correctly, there are
two issues:

(1) Some locales use a calendar other than modern Gregorian (e.g. the
    Islamic calendar).

(2) Some locales want to use a different set of 10 characters for the
    digits.

Presumably the intent is to show these with E and O respectively. So why
not just allow either of these with any descriptor?

And what's all this "alt_digits" stuff, anyway? Or this "alternate
representation":

>%Ow Weekday as number in the locale's alternate representation
>    (Sunday=0).

? How does it differ from the previous two?

>POSIX-1
>
>POSIX-1 defines the kernel interface, given in C language binding.
>For a lot of functionality, it does not define the I18N functionality, but
>relies on the C standard, which is included normatively with the C binding
>option of the standard. There is a separate section (8) in POSIX-1 giving
>the extensions defined by POSIX-1 in relation to the C standard.
>
>The extensions cover the following functions: setlocale, rename, getenv,
>ctime, gmtime, localtime, mktime and strftime. Also fseek and exit are
>specified further.
>
>Extensions to the time functions concern the use of the environment
>variable TZ, to override system defaults.
>
>A number of operating system considerations is done for various C I/O
>functions.
>
>POSIX-1 section 8 (12 pages) should be considered for technical
>corrigenda, or for inclusion in amendment/revision of the C standard.

From memory (it's been a long time), many of these things rightly belong
in an OS standard, and not in a language one. Nothing like that (for
example, fdopen) belongs here. Can we please have some specific
proposals as to what you want to add?

-- 
Clive D.W. Feather     | You should reply to
Associate Director     | (the Reply-To: header has been set to this);
Demon Internet Limited | this account is on my laptop and is only used
                       | occasionally when I am travelling.