From keld@dkuug.dk Fri Feb 14 22:14:54 1992
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA13112; Fri, 14 Feb 92 22:14:54 +0100
Date: Fri, 14 Feb 92 22:14:54 +0100
From: Keld J|rn Simonsen
Message-Id: <9202142114.AA13112@dkuug.dk>
To: wg15@dkuug.dk
Subject: Danish ballot on 9945-2 2CD
X-Charset: ASCII
X-Char-Esc: 29

The Danish member body votes "YES" to the ISO/IEC 9945-2 2nd DP.

A general comment is that we want the document to proceed to the DIS and IS stage as soon as possible.

We have the following comments:

1. Date specification: Sunday is not the first day of the week according to ISO 8601. So please change p.90 l.2702, l.2707, l.2658, l.2662, p.374 l.4025 and p.375 l.4056 to reference Monday as the first day of the week. Please also introduce a new format in date, e.g. %u, specifying Monday as the first day.

Page 374 l.4027: there is no week 00 in ISO 8601, so please change it to 01.

2. 8-bit identifiers: We would like to use extended character sets in identifiers in the shell and in other POSIX.2 small languages. For that purpose we want an additional class in LC_CTYPE called "ident", which specifies which characters can be used in place of a-zA-Z (together with digits etc.) in identifiers. Characters in the classes punct and space should not be allowed in this class. Characters in class alpha should automatically be included. This is in accordance with general SC22 principles as outlined in ISO/IEC JTC1/SC22 N623.

The proposal will help in the situation where normal letters in e.g. the Danish language are intended to be used in identifiers. Users find it strange that some letters can and some cannot be used in identifiers, and it is very culturally biased that this limitation exists in POSIX interfaces.
So DS proposes a new standard class in the LC_CTYPE section of the locale, for inclusion in POSIX.2 section 2.5.2.1:

ident: Define characters to be classified as identifier characters (allowed in identifiers in a programming language or the like) as an extension to the letters normally allowed. Characters specified for the keyword "alpha" shall automatically belong to this character class. Characters specified from the classes ,,, shall be excluded from this character class. Each programming language etc. may allow additional characters, such as digits or underscores, in identifiers, according to separate rules specified by the language.

The table on page 71 lines 1889-1899 should be adjusted accordingly.

3. The "substitute" statement in LC_COLLATE is needed for describing higher levels of Danish Standard DS 377 sorting, and should be included again.

4. The date utility lacks formats to specify the field width, which is needed e.g. in Danish long-form dates such as "2. oktober 1972", where the day number is without a leading space. Other fields need this too, including the month number, which may be without leading space or zero, and the day name, which may be 2, 3 or 4 characters. We propose the following to be added on page 373 about line 3997:

After the % the following may appear:

- an optional minimum field width. If the converted value has fewer characters than the field width, it will be padded with zeroes on the left to the field width, unless otherwise noted. The field width takes the form of an integer. If no minimum field width is given, the field width is the maximum width given for numbers.

- an optional precision that gives the minimum number of digits to appear for numbers and the maximum number of characters to be written from a string. The precision takes the form of a period (.) followed by a decimal integer.

(Modelled after the ISO C standard ISO/IEC 9899:1990 page 132.)

5. We miss a utility that can convert files based on charmaps or locales.
The charmaps are the formal place to specify the character sets, and this information should also be used to convert files. As heterogeneous environments become more commonplace, viz. world-wide networking, and some frequent Danish letters occur in different positions in various character sets, there is much need for such a specification for scripts and for user extensibility. We intend to have a proposal ready for a later issue of 9945-2, and we see a place for this in a revised "tr" utility. We would like a statement in 9945-2 that this is an area where work is to be done.

6. The word "immediate" should be deleted on pages 85-86 in lines 2533, 2535, 2547 and 2549, as a space can precede the currency symbol; see p_sep_by_space and n_sep_by_space with value 2.

7. We want the text for "pax -e" (in previous drafts) included, as we need a better quasi-portable way of transporting such files. It may be included in Annex F. It could be included in the normative part of the standard at a later stage, and we would like indications in the standard that an extended exchange format is being planned.

8. IEEE documents that are also official WG15 documents should be clearly marked as ISO documents, with the ISO document number on each page, and also with copyright notices that are in accordance with ISO copyright rules. ISO documents are copyright of ISO, via the Member bodies.

9. Much work is being done on locales, and on making them quite general. WG15 RIN has on its programme of work to harmonize locales as far as is feasible. The POSIX.2 Draft 11 introduced a copy command for all sections of the locale. This is good for many purposes and it ensures that two locales are equivalent for this category. A further step in building on previous art is proposed here. The collating sequences vary a bit from country to country, but generally much of the collating sequence is the same.
For instance the Danish sequence is very similar to the German, English or French, but it differs for about a dozen letters. The same can be said for Swedish or Spanish: generally the Latin collating sequence is the same, but a few characters are collated differently. With the advent of quite general, coded-character-set-independent locales like the Danish example in POSIX.2 draft 11 annex F, it would be convenient if the few differences could be specified just as changes to an existing locale. This would also give a better overview of what the changes really are.

Therefore DS proposes the following. For the LC_COLLATE definition, a new command is allowed:

    replace_after ... ... ... replace_after ... .... replace_end

This construct is also allowed when a "copy" statement has been given. More than one replace_after / replace_end construct can be given. The ... are removed from the current collating sequence and inserted after in the collating sequence.

For this to work the "copy" statement should be allowed to be used together with other statements in the LC_COLLATE section. This implies the semantics of a C #include directive, as is indicated in the description of "copy" with the word "source". Clarification of these semantics is sought, and a request for a new keyword, "include", is hereby made if "copy" does not work at source level.

The replace_after proposal can be included in Annex F, where its use is demonstrated. Then the specification can be moved to the normative part of 9945-2 in a later issue.

10. Page 30 lines 397-400, page 38 lines 649-653 $ and # page 49 lines 1065-1073. Either use of the currency sign and pound sign should not be allowed, or the currency sign and pound sign should be included among the characters of the portable character set. The clauses as they stand are very character set dependent (they talk about substitution of positions in character sets!), and character set dependence should be removed as much as possible from POSIX.
One consequence of the current specification is that you cannot see on a piece of paper or on other output media whether a program is valid or not, as it depends on the encoding. E.g. a shell script written with British 7-bit ISO 646 will be valid with pound signs, while an identical-looking script written in ISO 8859-1 will be non-conformant.

11. Page 78 lines 2212-2213, 2125, page 55 lines 1249-1250. We see no need for a specific encoding and collating order for the character NUL, and we request this to be removed. The current specifications make the POSIX specification character encoding dependent, and place unnecessary constraints on this character when collating.

12. Appropriate ISO information should also be included in the acknowledgement section of participants on page xiii, e.g. the list of members of WG15 and its rapporteur groups.

13. We are still not satisfied with the current regular expression syntax, but have no better solution at present.

14. We would have liked to see a general encode/decode utility like an enhanced uuencode in POSIX.2, but realize that time is running out. To compensate for this we would like this utility to be able to move to section 4 (in .2a first, in .2 when merged).

15. We are not happy about not having the specification of the PAX format in .2, but are mostly unhappy about having to wait for .2b (several years from now). This is just a comment.
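
As an illustration of comment 9, the intended effect of replace_after on a copied collating sequence might be sketched as below. This is only a sketch under stated assumptions, not proposed standard text: the Python function replace_after, its parameters, the representation of a collating sequence as a plain list, and the toy base order are all hypothetical. It also assumes that the operand of replace_after names the element after which the listed elements are re-inserted.

```python
def replace_after(sequence, anchor, moved):
    """Return a new collating order with `moved` placed right after `anchor`.

    Hypothetical helper: the name, the signature and the list-of-strings
    representation are assumptions made for this illustration only.
    """
    moved = list(moved)
    if anchor in moved:
        raise ValueError("anchor cannot be one of the moved elements")
    # Remove the listed elements from their current positions.
    remaining = [s for s in sequence if s not in moved]
    if anchor not in remaining:
        raise ValueError(f"anchor {anchor!r} is not in the sequence")
    # Re-insert them, in the order given, directly after the anchor.
    pos = remaining.index(anchor) + 1
    return remaining[:pos] + moved + remaining[pos:]


# Toy base order standing in for a locale brought in with "copy".
# It is NOT a real locale; it only places the Danish letters near a and o.
base = ["a", "å", "æ", "b", "o", "ø", "p", "y", "z"]

# The Danish alphabet ends with æ, ø, å, so move them after z.
danish = replace_after(base, "z", ["æ", "ø", "å"])
print(danish)
```

Several replace_after / replace_end constructs would correspond to calling the helper repeatedly on the result. The input list is left unchanged, which mirrors the idea that the copied locale itself is not modified.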