From keld@dkuug.dk Thu Mar 12 15:55:34 1992
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA25111; Thu, 12 Mar 92 15:55:34 +0100
Date: Thu, 12 Mar 92 15:55:34 +0100
From: Keld Jørn Simonsen
Message-Id: <9203121455.AA25111@dkuug.dk>
To: wg15rin@dkuug.dk
Subject: proposed resolutions to danish comments
X-Charset: ASCII
X-Char-Esc: 29

I am forwarding this, after agreement with Hal, for discussion here.

/Keld

-----

Keld, here are my proposed resolutions for all the Danish comments to 9945-2. These are offered in my role as IEEE working group chair and do not necessarily reflect the position of the United States. They are in IEEE resolution format and have obviously not been approved by WG15, but I hope they serve as a starting point for discussions in NZ. Those listed as Accepted are reflected in IEEE Draft 11.3, although this is taking a chance on my part, because they have not yet been approved by WG15 and they might have to be removed if WG15 so directs.

Best Regards,
Hal

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-8
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 0/0 c 8 m

8. IEEE documents that are also official WG15 documents should be clearly marked as ISO documents, with ISO document number on each page, and also with copyright notices that are in accordance with ISO copyright rules. ISO documents are copyright of ISO, via the Member bodies.
------------------------------------------------------
RESOLUTION:

The document approved as the DIS will follow the ITTF/IEEE formatting and copyright agreements.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-12
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 0/0 c 12 m

12. Appropriate ISO information should also be included in the acknowledgement section of participants on page xiii, e.g.
the list of members of WG15 and its rapporteur groups.
------------------------------------------------------
RESOLUTION:

We will add participant information in the place agreed to by the ITTF/IEEE formatting agreements. The xiii page is an IEEE list that may be deleted from the ISO DIS version, so the Acknowledgments may go behind the index, as in 9945-1.

@======================================================
@ Final= Comment, Original= Comment, TR= donn, BG= 174-10
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 2/2 c 10 m

10. Page 30 lines 397-400, page 38 lines 649-653 $ and # page 49 lines 1065-1073.

It should either not be allowed to use currency sign and pound sign, or currency sign and pound sign should be allowed as one of the characters in the portable character set. The clauses as they stand are very character set dependent (they talk about substitution of positions in character sets!), and character set dependence should be removed as much as possible from POSIX.

One consequence of the current specification is that you cannot tell from a piece of paper or other output medium whether a program is valid, as that depends on the encoding. For example, a shell script written in British 7-bit ISO 646 will be valid with pound signs, while an identical-looking script written in ISO 8859-1 will be non-conformant.
------------------------------------------------------
RESOLUTION:

We propose that this issue be discussed by WG15 in its resolution meeting.

We agree that we should be as much as possible codeset-independent, but on the other hand we don't want to lock out things arbitrarily. A strict reading of POSIX.2 would disallow an implementation that provided pounds-sterling instead of octothorpe, unless this clause was in the definition. This is because POSIX.2 is stated in terms of characters, not code points. (And for very good reasons.)
However, in this one special case, there are several world character sets that would be conformant if a 1:1 character substitution were allowed in the limited case of those two characters. It is a known, well understood, common, and safe special case. Allowing that substitution broadens the range of applicability of the standard without any significant damage.

We agree that the representation on paper is nearly irrelevant, as even if the paper is scanned in, the translation from graphic to code point is trivially under the control of the user. (Worst case, tr provides the means to get to the right code points.)

There is no ambiguity if there has been a substitution (different graphic at the same code point) because that is permitted solely in a particular special case. If there has been an addition then the standard is equally clear: use the character required by the standard. If, due to a mistake on the part of someone porting software, the wrong code point is represented in the program, POSIX already provides tools to make the transition (tr) back to the right code point, and this is no worse (and actually easier) than having the rest of the program represented in the wrong codeset (e.g. EBCDIC).

(Due to the unrestricted nature of the other national variant code points in 646, this doesn't work for any other characters.)

@======================================================
@ Final= Comment, Original= Comment, TR= greger, BG= 174-1
@ Resolution= Accepted
@------------------------------------------------------
@174 2/5 c 1 a

1. date specification: Sunday is not the first day in the week, according to ISO 8601. So please change p.90 l.2702, l.2707, l.2658, l.2662, p.374 l.4025, p.375 l.4056 to reference Monday as the first day of the week. And introduce a new format in date, e.g. %u, specifying Monday as first day.

Page 374 line 4027: there is no week 00 in ISO 8601. So please change it to 01.
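
Illustration of comment 1 (not part of the original comment or resolution): a minimal Python sketch of the difference between Sunday-based and ISO 8601 Monday-based numbering. The helper names are hypothetical; the sketch relies only on Python's standard `datetime` module and does not reproduce the wording of any POSIX.2 draft.

```python
from datetime import date

def iso_weekday(day):
    """Weekday number with Monday = 1 ... Sunday = 7 (ISO 8601 convention)."""
    return day.isoweekday()

def iso_week(day):
    """ISO 8601 week number; it runs from 1 to 53, so there is no week 0."""
    return day.isocalendar()[1]

def sunday_based_week(day):
    """Week number counted from the first Sunday of the year (C's %U).

    Days before the first Sunday fall in week "00", which is the case
    the comment objects to.
    """
    return day.strftime("%U")

if __name__ == "__main__":
    new_year = date(1992, 1, 1)          # a Wednesday
    print(iso_weekday(new_year))         # Monday-first weekday number
    print(iso_week(new_year))            # ISO week of 1 January 1992
    print(sunday_based_week(new_year))   # Sunday-based week of the same day
```

Under the Sunday-based scheme 1 January 1992 lands in week 00, while the ISO 8601 scheme places it in week 1.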
------------------------------------------------------
RESOLUTION:

Changed text for 2.5 and date utility in Draft 11.3.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-2
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 2/5 c 2 m

2. 8bit identifiers: We would like to use extended character sets in identifiers in the shell and in other POSIX.2 small languages. For that purpose we want an additional class in LC_CTYPE called "ident" which specifies which characters can be used in place of a-zA-Z (together with digits etc.) in identifiers. Characters in the classes punct and space should not be allowed in this class. Characters in class alpha should automatically be included. This is in accordance with general SC22 principles as outlined in ISO/IEC JTC1/SC22 N623.

The proposal will help in the situation where normal letters in, e.g., the Danish language are intended to be used in identifiers. Users find it strange that some letters can and some cannot be used in identifiers, and it is very culturally biased that this limitation exists in POSIX interfaces.

So DS proposes a new standard class in the LC_CTYPE section of the locale, for inclusion in POSIX.2 section 2.5.2.1:

ident: Define characters to be classified as identifier characters (allowed in identifiers in a programming language or the like) as an extension to the letters normally allowed. Characters specified for the keyword shall automatically belong to this character class. Characters specified from the classes ,,, shall be excluded from this character class. Each programming language etc. may allow additional characters, such as digits or underscores in identifiers, according to separate rules specified by the language.

The table page 71 lines 1889-1899 should be adjusted accordingly.
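
Illustration of comment 2 (not part of the original comment or resolution): a minimal Python sketch of how the proposed "ident" class might be derived and used. Everything here is an assumption made for illustration: the function names, the Danish letter list, and the shell-like rule that a language adds underscore and digits on top of the class.

```python
import string

def build_ident_class(alpha, extra, punct, space):
    """Hypothetical 'ident' class: alpha is included automatically,
    extra characters are added, punct and space are always excluded."""
    return (set(alpha) | set(extra)) - set(punct) - set(space)

def is_identifier(name, ident, first_extra="_", rest_extra="_0123456789"):
    """Check a name against an assumed language rule: the first character
    comes from ident or first_extra, later ones from ident or rest_extra."""
    if not name:
        return False
    if name[0] not in ident and name[0] not in first_extra:
        return False
    return all(ch in ident or ch in rest_extra for ch in name[1:])

# Assumed alpha class for a Danish locale: ASCII letters plus æ, ø, å.
DANISH_ALPHA = string.ascii_letters + "æøåÆØÅ"

# "$" and " " are requested as extras but dropped again, because they
# belong to punct and space respectively.
IDENT = build_ident_class(
    DANISH_ALPHA, extra="$ ", punct=string.punctuation, space=" \t\n"
)

if __name__ == "__main__":
    for candidate in ["blåbær_1", "1abc", "a b", "$x"]:
        print(candidate, is_identifier(candidate, IDENT))
```

The underscore is a punct character and so never enters the class itself; it is admitted only by the separate per-language rule, which is the kind of interaction the resolution says needs study.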
------------------------------------------------------
RESOLUTION:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred for the following reasons:

1. Alternate methods can be considered (such as allowing any chars in class [:alpha:] where the standard currently refers to alphabetics in the portable character set).

2. Many of the little languages have different identifier requirements unrelated to alphabetics and these need to be analyzed for implementation difficulty and the specific restrictions required for each language. One example is differing requirements for the first character in an identifier.

3. Proposals are being considered to allow application-specified locale categories and keywords and this proposal should be harmonized with that.

4. This proposal needs study for internal inconsistencies, such as the exclusion of [:punct:] class characters, but the inclusion of the underline in some languages.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-3
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 2/5 c 3 m

3. The "substitute" statement in LC_COLLATE is needed for describing higher levels of Danish Standard DS 377 sorting, and should be included again.
------------------------------------------------------
RESOLUTION:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because there currently exists no firm consensus on its necessity within the US or international communities.
@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-6
@ Resolution= Accepted
@------------------------------------------------------
@174 2/5 c 6 a

6. The word "immediate" should be deleted on pages 85-86 in lines 2533, 2535, 2547 and 2549, as a space can precede the currency symbol, see p_sep_by_space and n_sep_by_space with value 2.
------------------------------------------------------
RESOLUTION:

D11.3 changed as requested.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-9
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 2/5 c 9 m

9. Much work is done on locales, and making them quite general. WG15 RIN has on its programme of work to harmonize locales as far as it is feasible. The POSIX.2 Draft 11 introduced a copy command for all sections of the locale. This is good for many purposes and it ensures that two locales are equivalent for this category. A further step in building on previous art is proposed here.

The collating sequences vary a bit from country to country, but generally much of the collating sequence is the same. For instance the Danish sequence is very close to the German, English or French, but for about a dozen letters it differs. The same can be said for Swedish or Spanish: generally the Latin collating sequence is the same, but a few characters are collated differently.

With the advent of the quite general coded character set independent locales like the example Danish in POSIX.2 draft 11 annex F, it would be convenient if the few differences could be specified just as changes to an existing one. This would also improve the overview of what the changes really are.

Therefore DS proposes the following. For the LC_COLLATE definition, a new command is allowed:

    replace_after ... ... ... replace_after ... .... replace_end

This construct is allowed also when a "copy" statement has been given.
More than one replace_after / replace_end construct can be given. The ... are removed from the current collating sequence and inserted after in the collating sequence.

For this to work the "copy" statement should be allowed to be used together with other statements in the LC_COLLATE section. This implies the semantics of a C #include directive, as is indicated in the description of "copy" with the word "source". Clarification of these semantics is sought, and a request for a new keyword, "include", is hereby made, if "copy" does not work on source level.

The replace-after proposal can be included in the Annex F, where its use is demonstrated. Then the specification can be moved to the normative part of 9945-2 in a later issue.
------------------------------------------------------
RESOLUTION:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because there currently exists no firm consensus on its necessity within the US or international communities.

The following statement was received in opposition to this proposed change, and represents technical issues that must be addressed in studying the issue:

The intent of the copy statement was to allow the user to create a locale where some categories are identical to the same category in another, already existing locale. The intent is not to copy a source description, but an actual object description or category. The reason for this is that no source for the category may exist on the system or that it is a category for an implementation-defined locale. (If source copy is desired, an actual copy of the source is probably easier and more descriptive and subject to normal UNIX text control tools: sed, diff, SCCS, etc.).

With OBJECT copy, the above functionality becomes quite difficult, if not impossible.
In collation, for instance, it would probably require "object to source translation," replacement, and "compilation." Depending on the implementation, this may be impossible (note that the object doesn't know anything about source names (collation-elements, collation-names, or charmap names)).

The above requirement places restraints on implementations that are not warranted by the potential advantage, which lies only in the amount that needs to be specified in the source file. It also makes the documentation of the locales quite confusing.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-11
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 2/5 c 11 m

11. Page 78 lines 2212-2213, 2125, page 55 lines 1249-1250. We see no need for a specific encoding and collating order for a character NUL, and we request this to be removed. The current specifications make the POSIX specification character encoding dependent, and place unnecessary constraints on this character when collating.
------------------------------------------------------
RESOLUTION:

We propose that this issue be discussed by WG15 in its resolution meeting.

NUL is the only special character, and that is because it has a special meaning in POSIX: it cannot be included in text files, and it is used to delimit strings in C. Its value is required by ISO/IEC 9899, on which most POSIX.2 implementations will be based. Consequently, it IS special (see also regular expressions).

Most of the utilities using the collation definition are processing text strings; certainly neither strxfrm() nor strcoll() can handle nulls except as string terminators. Making NUL the lowest character makes the end-of-string processing simpler and in line with the standard POSIX sorting rules (shorter string sorts before longer). Also, leading ellipsis doesn't work if NUL isn't first.
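
Illustration of the end-of-string argument above (not part of the original resolution): a minimal Python sketch, assuming a hypothetical weight table in which every ordinary character has a weight of at least 1 and the string terminator is given weight 0. It is not how strcoll() or any particular implementation works; it only shows why a lowest-collating terminator yields "shorter sorts before longer" without a special case.

```python
def collation_key(text, weights):
    """Map each character to its weight, then append the terminator weight 0."""
    return [weights[ch] for ch in text] + [0]

def compare(a, b, weights):
    """Return -1, 0 or 1 by comparing weight sequences position by position.

    Because the terminator weighs less than any real character, a string
    that is a proper prefix of another compares lower at the position
    where it ends; no length check is needed.
    """
    for x, y in zip(collation_key(a, weights), collation_key(b, weights)):
        if x != y:
            return -1 if x < y else 1
    return 0

# Hypothetical weights: 'a' = 1, 'b' = 2, ... ; 0 is reserved for the terminator.
WEIGHTS = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}

if __name__ == "__main__":
    print(compare("ab", "abc", WEIGHTS))   # prefix against its extension
    print(compare("abc", "ab", WEIGHTS))
    print(compare("ab", "ab", WEIGHTS))
```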
@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-13
@ Resolution= Accepted
@------------------------------------------------------
@174 2/8 c 13 a

13. We are still not satisfied with the current regular expression syntax, but have no better solution at present.
------------------------------------------------------
RESOLUTION:

No action proposed.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-5
@ Resolution= Accepted
@------------------------------------------------------
@174 4/0 c 5 a

5. We miss a utility that can convert files based on charmaps or locales. The charmaps are the formal place to specify the character sets, and this information should be used also to convert files. As heterogeneous environments become more commonplace, viz. world-wide networking, and some frequent Danish letters occur in different positions in various character sets, there is much need for such a specification for scripts and for user extensibility. We intend to have a proposal ready for a later issue of 9945-2, and we see a place for this in a revised "tr" utility. We would like a statement in 9945-2 that this is an area where work is to be done.
------------------------------------------------------
RESOLUTION:

We have added a statement to the tr rationale. Such a statement of future intentions is limited by ISO rules to a footnote or informative annex.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-14
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 4/0 c 14 m

14. We would have liked to see a general encode/decode-utility like an enhanced uuencode in POSIX.2, but realize that time is running out. To compensate for this we would like to see that this utility would be able to move to section 4 (in .2a first, in .2 when merged).
------------------------------------------------------
RESOLUTION:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because there are difficult synchronization problems associated with moving functionality from .2a to .2 (i.e., they are being considered by separate balloting groups). In the meantime, a national profile can easily require that some or all utilities from section 5 be included in all systems.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-4
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@174 4/15 c 4 m

4. The date utility lacks formats to specify the width, which is needed, e.g., in Danish long form dates such as "2. oktober 1972", where the day number is without a leading space. Other formats need this too, including the month number, which may be without leading space or zero, and dayname, which may be 2, 3 or 4 characters.

We propose the following to be added on page 373 about line 3997:

After the % the following may appear:

- an optional minimum field width. If the converted value has fewer characters than the field width, it will be padded with zeroes on the left to the field width, unless otherwise noted. The field width takes the form of an integer. If no minimum field width is given, the field width is the maximum width given for numbers.

- an optional precision that gives the minimum number of digits to appear for numbers and the maximum number of characters to be written from a string. The precision takes the form of a period (.) followed by a decimal integer.

(modelled after the ISO C standard ISO/IEC 9899:1990 page 132).
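
Illustration of comment 4 (not part of the original comment or resolution): a minimal Python sketch of the proposed width and precision fields. The conversion letters handled (d, m, Y, A, B), the Danish name tables, the default widths, and the choice to pad strings with spaces are all assumptions made for this example; the comment itself only specifies zero padding for numbers and truncation for strings.

```python
import re
from datetime import date

# Assumed Danish names, indexed by month number - 1 and by Monday-first weekday.
MONTHS = ["januar", "februar", "marts", "april", "maj", "juni", "juli",
          "august", "september", "oktober", "november", "december"]
DAYS = ["mandag", "tirsdag", "onsdag", "torsdag", "fredag", "lørdag", "søndag"]

# Assumed maximum widths used when no field width is given for a number.
DEFAULT_WIDTH = {"d": 2, "m": 2, "Y": 4}

# "%" [width] ["." precision] conversion-letter
SPEC = re.compile(r"%(\d+)?(?:\.(\d+))?([dmYAB%])")

def format_date(fmt, day):
    """Expand a small subset of date conversions with optional width/precision."""
    def expand(match):
        width, precision, conv = match.groups()
        if conv == "%":
            return "%"
        if conv in DEFAULT_WIDTH:
            value = {"d": day.day, "m": day.month, "Y": day.year}[conv]
            # Precision is the minimum number of digits for numbers.
            digits = str(value).zfill(int(precision or 0))
            # Numbers are zero-padded on the left to the field width.
            return digits.zfill(int(width) if width else DEFAULT_WIDTH[conv])
        name = DAYS[day.weekday()] if conv == "A" else MONTHS[day.month - 1]
        if precision is not None:
            # Precision is the maximum number of characters for strings.
            name = name[:int(precision)]
        return name.rjust(int(width or 0))  # space padding is an assumption
    return SPEC.sub(expand, fmt)

if __name__ == "__main__":
    d = date(1972, 10, 2)
    print(format_date("%1d. %B %Y", d))  # Danish long form without padding
    print(format_date("%d/%m", d))       # default widths
    print(format_date("%.3A", d))        # day name cut to three characters
```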
------------------------------------------------------
RESOLUTION:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because study of the existing %O modifier in satisfaction of these requirements is warranted.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-7
@ Resolution= Accepted
@------------------------------------------------------
@174 4/48 c 7 a

7. We want the text for "pax -e" (in previous drafts) included, as we need a better quasi-portable way of transporting such files. It may be included in Annex F. It could be included in the normative part of the standard at a later stage, and we would like indications in the standard that an extended exchange format is being planned.
------------------------------------------------------
RESOLUTION:

The text has been added to Annex G (the previous F). Statements about future plans are already in the draft (see D11.2 page 551 lines 9965-68 and page 558 lines 10251-65).

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 174-15
@ Resolution= Accepted
@------------------------------------------------------
@174 4/48 c 15 a

15. We are not happy about not having the specification of the PAX-format in .2, but mostly unhappy about having to wait for .2b (several years from now). This is just a comment.
------------------------------------------------------
RESOLUTION:

The US member body intends to work closely with WG15RIN to complete .2b as soon as agreement can be reached on the outstanding issues. Substantive reviews of early drafts will accelerate the process.