From hlj@posix.com Tue Jul 21 22:57:03 1992
Received: from netcomsv.netcom.com by dkuug.dk via EUnet with SMTP
    (5.64+/8+bit/IDA-1.2.8) id AA13823; Tue, 21 Jul 92 22:57:03 +0200
Received: from posix.COM by netcomsv.netcom.com with UUCP (4.1/SMI-4.1)
    id AA25839; Tue, 21 Jul 92 13:56:37 PDT
Received: by posix.COM (5.64/A/UX-AMR-1.0) id AA26985;
    Tue, 21 Jul 92 13:46:11 PDT
Message-Id: <9207212046.AA26985@posix.COM>
Subject: POSIX.2 update
To: sc22wg15@dkuug.dk
Date: Tue, 21 Jul 92 13:46:09 PDT
From: Hal Jespersen
Organization: POSIX Software Group, 447 Lakeview Way, Redwood City, CA 94062
Phone: +1 (415) 364-3410
FAX: +1 (415) 364-4498
X-Mailer: ELM [version 2.2 PL0]
X-Charset: ASCII
X-Char-Esc: 29

Hi. This mail updates WG15 members on the status of DIS 9945-2 and
the IEEE P1003.2b project.

DIS 9945-2:1992 is printed and should make it through channels to SC22
in the next few weeks for IS balloting registration. It is technically
identical to IEEE P1003.2 Draft 12, which has been sent to the IEEE
Standards Board for approval in September. (It is the merger of .2 and
.2a).

The IEEE P1003.2 working group met in Chicago and began processing the
list of WG15 requirements for the next revision--what we call P1003.2b
(although the IEEE might rename it P1003.2 someday). The list,
extracted from Annex H of the DIS, is shown below. Each numbered item
is followed by a status comment. Please note that a few require
proposals from the member bodies shown; these correspond to action
items assigned at the WG15 New Zealand meeting.

Also, note that the P1003.2b working group is meeting in Utrecht on
22-23 October--just before WG15--expressly for the purpose of
discussing these international requirements. It would be very useful
to receive proposals at or before this meeting and for as many WG15
people as possible to attend. Please recall that you have an action
item to inform me of your proposed attendance at this meeting.

Thanks for your cooperation.

Hal Jespersen
Chair, P1003.2
Project Editor, WG15

Annex H List:

(1) Provisions should be made to allow characters beyond those in the
    portable character set in user-supplied identifiers for the shell,
    awk, bc, lex, make, and yacc. A proposal has been made by Denmark
    to extend the locale definition to specify the set of identifier
    characters for all programming languages.

        We [P1003.2 group] have asked for input on this subject from
        the various I18N groups, and are also expecting a proposal
        from Denmark that addresses the concerns raised in the
        Disposition of Comments document.

(2) The shell, awk, other small languages, and regular expressions
    should be supported by national variants of ISO/IEC 646 {1}. A
    proposal from Denmark is expected in this area.

        A Danish proposal is expected for Draft 5 or later.

(3) The LC_CTYPE (2.5.2.1) locale definition should be enhanced to
    allow user-specified additional character classes, similar in
    concept to the proposed C Standard {7} Multibyte Support Extension
    (MSE) is_wctype() function.

        We have adopted the localedef material related to is_wctype()
        in X/Open Portability Guide 4. This will appear in P1003.2b
        Draft 4. No further proposal is needed at this time.

(4) The LC_COLLATE (2.5.2.2) locale definition should be enhanced to
    allow user-specified names for collation weights. A proposal from
    Japan is expected in this area.

        A Japanese proposal is expected for Draft 5 or later.

(5) The collation substitute facility, removed from 2.5.2.2 in an
    early draft, should be restored.

        A Danish proposal is expected for Draft 5 or later. (We will
        need, at the least, rationale on why this is needed. Various
        I18N experts we consulted indicate that this might be
        convenient in some cases, but is no requirement for the types
        of sorting needed by portable applications. And we need to
        have a definition that does not recursively rely on the
        definition of regular expressions.)

(6) A facility should be added to allow simple modifications to
    existing locale collation definitions. A proposal for such a
    replace_after keyword in LC_COLLATE is being developed by Denmark.

        A Danish proposal is expected for Draft 5 or later.

(7) The specific encoding and collation requirements for the character
    NUL should be removed.

        Draft 4 will remove the collation requirements, but we cannot
        justify diverging from ISO 9899 for the NUL encoding
        requirement of all zero bits. We would require additional
        rationale to do so.

(8) The support of state-dependent (shift encoding) character sets
    should be addressed fully. See descriptions of these in 2.4. If
    such character encodings are supported, it is expected that this
    will impact 2.4 (charmap), 2.5 (locale definition), 2.8 (regular
    expressions), and the comm, cut, diff, grep, head, join, paste,
    and tail utilities. A proposal from Japan is expected in this
    area.

        A Japanese proposal is expected for Draft 5 or later.

(9) The definition of column position (see 2.2.2.36) relies on the
    implementation's knowledge of the integral width of the
    characters. The charmap (2.4) or LC_CTYPE (2.5.2.1) locale
    definitions should be enhanced to allow application specification
    of these widths. A proposal from Japan is expected in this area.

        Draft 4 will adopt the solution given in Japanese ballot
        comment ITSCJ.11. No further proposal is needed at this time.

(10) A utility (or feature of another utility, such as tr) should be
    provided that converts between character set encodings based on
    two charmap files. A proposal from Denmark is expected in this
    area.

        Draft 4 will adopt the XPG iconv utility, modified to allow
        for charmap inputs. No further proposal is needed at this
        time.

(11) The date utility should allow width specifications for its
    fields, similar to those in the printf() function.

        Draft 4 will include this change.

(12) The file utility should allow user-specified algorithms for file
    type recognition, similar to those used in the historical
    /etc/magic file.

        We would like to ask for a specific proposal from Denmark (or
        other interested bodies). We found that:

          - existing systems have incompatible extensions to their
            magic files;
          - there is a problem specifying what byte order to use;
          - there is disagreement about whether the user-provided
            magic information should add to or replace the system
            file. If it adds to the system file, will this cause some
            existing scripts to break? If it replaces it, how do we
            specify which tests have to be in that file (versus built
            into the command itself)?

(13) The pax utility should provide a new file interchange format, in
    addition to cpio and ustar, that allows extended characters in
    file, user, and group names. Rules should be given for the cases
    where an archived name cannot be represented by the local
    character set in the file system.

    NOTE: The example Danish Profile annex contains a feature that
    implements a form of this proposal. It is not clear whether this
    is the solution that will be selected for the next revision.

        An updated version of pax will be in Draft 4. This is an area
        in urgent need of WG15 comment. The solution we are using
        currently is based on ISO 10646 UTF, not the full charmap
        translation in the Danish annex. (Charmaps are used only in
        unloading file names from the tape where the system cannot
        itself convert from 10646.)

(14) The uuencode utility should support the BASE64-encoding specified
    in the MIME-RFC currently under consideration for Internet use.
    The uudecode utility should allow the user to override the output
    file name that is embedded in the file. Both utilities should be
    moved from Section 5 to Section 4.

        Draft 4 will include the BASE64 format. No further proposal is
        needed at this time.
        However, we cannot move the section location without
        additional rationale; we already have complaints that the
        mandatory Section 4 is too large (72 utilities) for some
        profiles as it is.

(15) The functions in Annex B that use the C-language char type should
    be modified to allow wide character (wchar_t) encodings, as
    suggested by the proposed MSE amendment to the C Standard {7}. A
    proposal from Japan is expected in this area.

        A Japanese proposal is expected for Draft 5 or later.
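
Illustration for item (14): the sketch below is not taken from any
draft. It only contrasts how the historical uuencode body encoding and
the MIME BASE64 encoding represent the same three bytes. It uses
Python's standard binascii and base64 modules. The helper names
uu_line and base64_text are hypothetical, and the file framing that
uuencode places around the encoded body is deliberately left out.

```python
import base64
import binascii


def uu_line(chunk: bytes) -> bytes:
    """Encode up to 45 bytes as one historical uuencode body line.

    The result starts with a length character and ends with a newline.
    """
    return binascii.b2a_uu(chunk)


def base64_text(data: bytes) -> bytes:
    """Encode data with the MIME BASE64 alphabet, without line wrapping."""
    return base64.b64encode(data)


if __name__ == "__main__":
    sample = b"Cat"
    print(uu_line(sample))      # historical form: length character, then data
    print(base64_text(sample))  # BASE64 form: A-Z, a-z, 0-9, '+' and '/'
```

Both encodings map each group of three input bytes to four printable
characters. They differ in the alphabet used, and the historical form
also prefixes every line with a character giving its length.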