Japanese Comments on ISO/IEC CD 9945-2.2 (POSIX.2/D11.2)

SSI/POSIX WG
IPSJ/ITSCJ, Japan
February 14, 1992

1. Introduction

These are the comments on ISO/IEC CD 9945-2.2 (POSIX.2 Draft 11.2) from the IPSJ/ITSCJ SSI/POSIX WG, the Japanese National Body for JTC1 SC22/WG15 (POSIX).

Since 1988 we have been sending comments on the ISO/IEC 9945 series of standards concerning byte and character issues in the internationalization of the POSIX specifications. Our aim is to make the POSIX standards truly "internationalized", and hence acceptable to all National Member Bodies and to the related industry worldwide.

As usual, we have been eagerly reviewing CD 9945-2.2 in terms of internationalization, namely the "byte vs. character" and "multibyte character" issues. Due to a shortage of time, however, we have not yet finished our review, but we have prepared the comments attached hereafter.

Although we strongly support the POSIX.2 approach to character handling, in the sense that "a character is a character, not a byte", we think the current draft is insufficient on this point. At this moment, Japan therefore would like to cast a negative vote against circulation of the draft as a DIS.

We would appreciate it if these comments were carefully reviewed at the appropriate SC22/WG15 and/or IEEE/POSIX meetings, so that we can contribute to the POSIX standardization efforts.

2. Overview

We have been carefully reviewing CD 9945-2.2 mainly from the point of view of "internationalization", or "standardization of national/regional language support". We in Japan believe that the most important goal of internationalization is to achieve a kind of "character independency". In light of this, the following aspects should be taken into consideration when defining the POSIX specifications:

   o Character counts != byte counts
   o Character counts != display width
   o Byte counts != display width
   o Only the "wchar_t" type in the C language (known as a "wide character") corresponds to the concept of a character.
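The first three points can be illustrated with a minimal C sketch, assuming a POSIX system: it measures one string three ways under the current LC_CTYPE locale, in bytes with strlen(), in characters with mbstowcs(), and in display columns with wcswidth(). Note that wcswidth() is an X/Open extension rather than ISO C, and the helper name measure_text() is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose wcswidth(), an X/Open extension */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Three different "lengths" of the same multibyte string. */
struct text_measures {
    size_t bytes;    /* storage size, as strlen() counts it */
    size_t chars;    /* number of characters (wchar_t values) */
    int    columns;  /* display width from wcswidth(), -1 if unprintable */
};

/* Measure s under the current LC_CTYPE locale; callers normally run
   setlocale(LC_ALL, "") first. Returns 0 on success, -1 if s is not a
   valid multibyte string or memory runs out. */
int measure_text(const char *s, struct text_measures *m)
{
    assert(s != NULL && m != NULL);

    /* POSIX lets mbstowcs() take a null destination to report the
       number of characters without storing them. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t)-1)
        return -1;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;
    mbstowcs(w, s, n + 1);   /* convert, including the terminating null */

    m->bytes = strlen(s);
    m->chars = n;
    m->columns = wcswidth(w, n);
    free(w);
    return 0;
}
```

For a two-character Japanese string in a UTF-8 locale, bytes is 6 while chars is 2; columns then depends on the width data the locale happens to carry.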
Our major concerns fall into the following categories:

   o ISO/IEC Conformance and Conformance
   o Byte and character
   o Character encoding
   o Width of a character/string
   o Wide character support and ISO/C MSE
   o Others

3. Comments

The pages below contain our objections and comments on CD 9945-2.2. All comments have been reviewed by our members; however, some of them are not unanimously agreed, so each comment carries the name of its originator.

To clarify the issues and to consider appropriate approaches to solving them, we have applied the following guidelines in the review process. We also strongly recommend that SC22/WG15 and the member body developing CD 9945-2.2 (US - IEEE) follow these guidelines in resolving the issues and the Japanese comments.

Guidelines for the standard interface review and design:

   o Determine which interfaces are character-oriented (arguments or operands, input data, output data, I/O format, etc.).
   o Classify the features of character-oriented interfaces as follows:
      - Character boundary recognition
      - Limit checks and truncation in various units; in particular, make clear which unit (byte, character, column, width, etc.) applies
      - Character/string width recognition
      - Character/string parsing and manipulation
      - Language dependency of text data, including message data
      - Culture dependency of representations

Contact information for the ITSCJ/SSI/POSIX WG is given below.
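The guideline on limit checks and truncation can be made concrete with a small C sketch (the function name fit_prefix_bytes() is hypothetical): it finds the longest prefix of a string that fits a byte limit without cutting a multibyte character in half, using the ISO C function mbrlen(). It assumes a stateless encoding; with a shift encoding, a truncated prefix may also need a sequence that returns to the initial shift state, which this sketch does not add.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Length in bytes of the longest prefix of s that is at most max_bytes
   long and ends on a character boundary in the current LC_CTYPE locale
   (callers normally run setlocale(LC_ALL, "") first). Scanning stops
   early at an invalid or incomplete sequence. */
size_t fit_prefix_bytes(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);       /* start in the initial shift state */
    size_t len = strlen(s);
    size_t used = 0;                 /* invariant: used <= max_bytes */

    while (used < len) {
        size_t r = mbrlen(s + used, len - used, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            break;                   /* invalid or truncated character */
        if (r > max_bytes - used)
            break;                   /* next character would exceed the limit */
        used += r;
    }
    return used;
}
```

In a UTF-8 locale, a 3-byte limit applied to "a" followed by one 3-byte kanji keeps only the "a", where a naive byte cut would leave a broken character.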
Please send questions and comments on this ballot to:

   Email: posix@ccut.cc.u-tokyo.ac.jp

   SSI/POSIX WG
   Information Technology Standards Commission of Japan
   Kikai-Shinko Kaikan Bldg., 3-5-8 Shiba-Koen, Minato-Ku, Tokyo 105, Japan
   Tel: +81 3-3431-2808
   Fax: +81 3-3431-6493

ITSCJ/SSI/POSIX WG Comments on CD 9945-2.2

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 1 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 0 c 1
Sect Global
COMMENT.

Problem: Shift encoding

It is unclear whether POSIX.2 has undocumented assumptions and/or restrictions about encoding schemes, including shift encodings. Hence, in Japan (and perhaps in other countries where character sets beyond ISO 646 IRV or ASCII are mandatory in the market), many implementers and users of POSIX systems face difficulties in supporting shift encodings.

The following are possible interpretations of the current POSIX.2 position:

   (a) Shift encoding is out of scope.
   (b) Shift encoding is allowed, but as an implementation-defined feature.
   (c) Support for shift encoding is an open issue, to be considered in a future draft.

Such assumptions/restrictions, if any, should be clearly stated in POSIX.2.

Action: Consider the above interpretations and take suitable action. Japan will then probably cooperate with the POSIX.2 developing member body (US - IEEE) on how to solve this issue.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 2 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 0 o 2
Sect Global
OBJECTION.

Problem: Width of a character/string

There are several utilities (e.g.
fold) in POSIX.2 and POSIX.2a that are sensitive to character width. The definition of "column" seems appropriate for so-called character cell terminals. However, there is no suitable way to specify or identify the width of a character, either in POSIX.2 (LC_CTYPE or charmap file) or in POSIX.1.

1992-02-14 Page 1

Action: A suitable specification should be developed in a future version, with appropriate collaboration between the IEEE/POSIX.2 WG and ISO WG15/RIN. Japan is now investigating the following two possibilities:

   Approach (A): Introduce an additional "width" field or operand for each character in the current charmap definition file, e.g.

      "%s %s:%s %s0 , , ,

   where the ":%s" part may be optional. See comment Seq 11 for more details.

   Approach (B): Introduce a character grouping capability, and then specify rules that give width information to the character groups. This grouping capability may also be useful for other locale information about characters, such as defining collation by character groups across several different character sets, e.g. Latin < Katakana < Hiragana < Kanji ...

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 3 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 0 o 3
Sect Global
OBJECTION.

Problem: Wide character support and the ISO/C MSE

I strongly support the POSIX.2 approach to character handling, in the sense that "a character is a character, not a byte", which means that a character is, in general, a multibyte character. The most important goal of I18N, achieving a kind of "character independency", requires taking the following aspects into account:
   - Character counts != byte counts
   - Character counts != display width
   - Byte counts != display width
   - Only the wchar_t type in C (a wide character) corresponds to the concept of a character.

In this sense, wide character support is essential for implementing the POSIX.2 specifications of "character" handling. Such wide character interfaces are useful not only for POSIX.2 implementations, but also for all POSIX applications that are meant to be portable worldwide.

However, neither the current POSIX.2 nor POSIX.1 (ISO 9945-1) provides such wide character interfaces. This will cause unnecessary, incompatible, non-standard divergence in multibyte character support, and hence POSIX systems and applications written for single-byte character environments will never be accepted in multibyte character environments.

Action: Consider the above and take suitable action. The following approach is strongly requested:

   - Add the ISO/C MSE features in one of the near-future POSIX.2/POSIX.1 extensions. POSIX.2b would be most preferable.
   - Re-examine, from a wide character point of view, all APIs that handle "character" (not byte) streams or text. In particular, the following POSIX.2 functions are currently candidates for such enhancement:
      - regcomp, regexec

Japan will probably be able to cooperate with the POSIX.2 developing member body (US - IEEE) on how to solve these issues.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 4 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 0 o 4
Sect Global
OBJECTION.
Problem: Additional character classes

In the market there is a need for a suitable interface to handle character classes that depend on national languages and culture-specific representations. Such additional character classes are expected to be valid in POSIX RE and ERE pattern matching as well.

Unfortunately, the current POSIX.2 provides no interface or mechanism to define or handle character classes beyond the current ANSI/C and/or Latin-based ones. The current draft says that such additional character classes may be supported by an implementation, but only as an implementation-defined feature.

Action: Since the ISO/C Multibyte Support Extension (MSE) is going to provide a new function, is_wctype(), a corresponding enhancement of the LC_CTYPE description file should be considered, so that "user/implementation-definable character classes" can be supported in POSIX environments in a standard manner. Japan will probably be able to cooperate with the POSIX.2 developing member body (US - IEEE) on how to solve these issues.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 5 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 1.3 c 5
Sect 1.3.2 (Application Conformance)
COMMENT.
page 15-16, line 542-590

Problem: Conformance

Section 1.3.2, Application Conformance, says that there are four categories of application conformance:
   - 1.3.2.1 Strictly Conforming POSIX.2 Application
   - 1.3.2.2.1 ISO/IEC Conforming POSIX.2 Application
   - 1.3.2.2.2 Conforming POSIX.2 Application
   - 1.3.2.3 Conforming POSIX.2 Application Using Extensions

The idea of " Conforming" is acceptable, but we think the relationship between ISO/IEC Conforming POSIX Applications and Conforming POSIX Applications needs to be reconsidered, since without suitable guidelines it will lead to incompatibilities among nations. For example, we think that character encoding and character handling using wchar_t are very important for applications as well as for implementations. However, if one nation defines a codeset in its National Profile while others do not, there will be serious problems with the international portability and/or compatibility of POSIX Conforming Applications.

Action: Provide suitable guidelines on which features and options should be specified, and how, for such ISO/IEC Conformance and Conformance.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 6 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.2 o 6
Sect 2.2.2.91 (NUL)
OBJECTION.
page 37, line 647

Problem: "NUL: A character with all bits set to zero" is ambiguous, since by the POSIX definition "a character" means, in general, "a multibyte character". With the phrase "with all bits ... zero", it is unclear whether this definition specifies a single-byte null character, a multibyte null character (in general), or both/neither (regardless of the number of bits).

   cf. ISO 9899 (ANSI/C): "null character: a byte with all bits set to 0."
       ISO 9945-1 (POSIX.1): the same as ISO/C, isn't it?
In a mixed character set environment with both single-byte characters and strictly multibyte characters, there is a need to distinguish a single-byte null character from a multibyte null character.

Action: If it means a single-byte null character, change it to:

   "NUL: A single-byte character with all CHAR_BIT bits set to zero."

If it specifies a unique null character in the POSIX environment regardless of the number of bits, change it to:

   "NUL: A character with all bits set to zero, which is defined as in the character set description file."

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 7 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.4 o 7
Sect 2.4 (Character set)
OBJECTION.
page 54, line 1232-1234

Problem: As line 1232 implies, implementations may support more than one coded character set. However, there are two cases:

   Case 1: The supported coded character sets are used exclusively, one at a time,
           e.g. ASCII/EBCDIC/ISO 646 variants/8859-1/ ...
   Case 2: The supported coded character sets are used together (mixed),
           e.g. ISO 646 IRV + JIS X 0208 + JIS X 0201 Katakana, ...

In case 2, each supported coded character set need not contain the *portable character set*. So the requirement in lines 1233-1234 is too strong; it addresses only case 1 above.

Action: Change lines 1233-1234 to:

   "Each supported locale shall include the portable character set specified in Table 2-3."

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 8 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.4 o 8
Sect 2.4 (Character set)
OBJECTION.
page 54-55, line 1232-1252

Problem: Does the current draft assume that the portable character set consists of single-byte characters? The only requirements that POSIX.2 places on coded character sets, on page 55, seem to allow a portable character set consisting of strictly multibyte characters. However, the current POSIX.1 and POSIX.2 specifications do not fit the case where the portable characters are implemented as characters of two or more bytes, even though the character set description file itself allows such a definition.

Action: If there is an undocumented assumption that the portable characters be single-byte, state it clearly in section 2.5.

If no requirement on the byte/bit size of the portable character set is assumed, other than that the minimum value of CHAR_BIT is 8, please say so, and then specify more "character"-oriented interfaces in both POSIX.2 and POSIX.1 instead of the current "byte"-oriented ones. For example, the proposed getopt() function does not work well in a two-byte portable character set environment, and several of the utility syntax guidelines in section 2.10.2 will conflict and fail.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 9 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.4 o 9
Sect 2.4.2 (Char Set Rationale)
OBJECTION.
page 58-56, line 1400-1405

Problem: This paragraph discusses several points about shift encoding. It seems to be the only place in the document where the standard addresses the topic, and it is merely a rationale section. As a result, the current draft gives no clear and concrete description of how to specify shift-encoded characters in the character set description file.
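On the C side, ISO C already models shift state with mbstate_t. The sketch below (count_chars_stateful() is a hypothetical name) counts characters with mbrtowc() while carrying that state from call to call, and uses mbsinit() to reject input that ends inside a character. It assumes that the locale's converter leaves the state in the initial shift state after a trailing sequence that returns to it, so a dangling shift state is reported the same way as a truncated character.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in the first n bytes of s (stopping at a null
   character) under the current LC_CTYPE locale, carrying the shift state
   from one mbrtowc() call to the next. Returns 0 on success, -1 on an
   invalid sequence, and -2 if the input ends inside a character or
   outside the initial shift state. */
int count_chars_stateful(const char *s, size_t n, size_t *out)
{
    assert(s != NULL && out != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);            /* initial shift state */
    size_t count = 0;

    while (n > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, n, &st);
        if (r == (size_t)-1)
            return -1;                    /* invalid multibyte sequence */
        if (r == (size_t)-2)
            break;                        /* remaining bytes went into st */
        if (r == 0)
            break;                        /* null character: end of string */
        s += r;
        n -= r;
        count++;
    }
    *out = count;
    /* A partial character (or, assuming the converter tracks it, an
       unreturned shift state) leaves st outside the initial state. */
    return mbsinit(&st) ? 0 : -2;
}
```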
Action: If the standard has an underlying philosophy (part of which happens to be described in rationale section 2.4.2), please state it more clearly as specification in a normative section rather than in a rationale section. In particular, the example of shift encoding in lines 1402-1405 is not sufficient.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 10 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.4 o 10
Sect 2.4.2 (Char Set Rationale)
OBJECTION.
page 59-60, line 1426-1446

Problem: Several assumptions about character sets and the portable character set are described here, in a rationale section. Such assumptions should be described in a normative section, e.g. at the beginning of section 2.4 on pages 55-56.

Action: Move the assumptions in 2.4.2, in an appropriate format, to the beginning of section 2.4. It is recommended that they be written as a list, like the list of requirements on encoded values on page 55.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Akio Kido)           Phone: +81-462-73-5436    Seq: 11 of 57
Email: jl01376@ymtvm8.vnet.ibm.com       FAX: +81-462-73-7425
-------------------------------------------------------------------------------
@ 2.4 o 11
Sect 2.4 (Character Set)
OBJECTION.
page 56, line 1312 - 1315

Problem: Although the standard specifies several column-sensitive utilities, e.g. 'fold', no mechanism is provided to define the column width (see column position, 2.2.2.31) of a character. This attribute of a character shall be defined in the LC_CTYPE category or in the charmap file.

Action: Enhance the syntax of the charmap file to allow the user to define the column width of a character.
A proposed syntax is as follows:

   "%s %s:%d %s0,,,,

or

   "%s...%s %s:%d %s0,,,,

   , : A non-negative integer

The column width and the preceding colon (:) are optional. When the column width is omitted, it shall be assumed to be one for printable characters and zero for non-printable characters. This modification should be considered in the POSIX.2b project.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 12 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.5 o 12
Sect 2.5.2.1 (LC_CTYPE)
OBJECTION.
page 67, line 1730-1731

Problem: This standard has no definition or specification of how to distinguish one encoded character set from another. So the sentence here, "The ellipsis specification only shall be valid within a single encoded character set.", is unclear and meaningless.

Action: Delete this sentence, or restate its intention clearly.

_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)    Phone: +81-428-32-0722    Seq: 13 of 57
Email: ynk@ome.toshiba.co.jp             FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 2.5 o 13
Sect 2.5.2.1 (LC_CTYPE)
OBJECTION.
page 67 - 71

Problem: In an international environment there is a need to handle character classes beyond those currently specified in this standard. However, the current specification of the LC_CTYPE category definition file does not seem to allow "implementation/user-definable" character classes; in particular, its BNF grammar of character classes on page 95, lines 2910 - 2913, seems to preclude such extensions.
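Comments Seq 4 and Seq 13 ask for locale-definable character classes. The C interface for looking up a class by name appears in ISO C Amendment 1 as wctype() and iswctype() (the ballot text refers to is_wctype()). In the sketch below, in_named_class() is a hypothetical helper: the class name is resolved in the current LC_CTYPE locale, so a locale that defined an extra class could be queried the same way, while a name the locale does not know is reported rather than treated as an empty class.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Test wc against a character class looked up by name in the current
   LC_CTYPE locale. Standard names such as "alpha" or "digit" are always
   known; any other name works only if the locale defines it.
   Returns 1 or 0 for membership, or -1 if the locale has no such class. */
int in_named_class(wint_t wc, const char *class_name)
{
    assert(class_name != NULL);
    wctype_t desc = wctype(class_name);
    if (desc == 0)
        return -1;                    /* class unknown in this locale */
    return iswctype(wc, desc) ? 1 : 0;
}
```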
Action: Add a capability for such extensions, or at least add a sentence mentioning that such extensions may be provided by an implementation or a future standard.

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 14 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 14
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 67, line 1737 - 1738

Problem: The current draft uses the term "automatically included" in the text of LC_CTYPE. The definition and usage of the term seem inappropriate in some places and may allow multiple interpretations. The following points seem ambiguous; they should be clarified and specified accordingly.

   1. Are there any differences between "automatically included" and "automatically belong to" in the text? The current specification gives a special meaning only to "automatically included".

   2. In the definition of "automatically included", it is ambiguous whether the term "missing" covers the case where the keyword is not specified. In the alpha class description, for example, the term "automatically belong to" is used. It is ambiguous whether the implementation is required to provide the upper and lower characters when the keyword alpha is not specified. This is relevant to the M (always) class in Table 2-6.

   3. When the keyword is specified but the characters in question are missing from its definition, is it required that:
         - the upper class include A-Z of the PCS,
         - the lower class include a-z of the PCS,
         - the digit class include 0-9 of the PCS,
         - the space class include the standard white-space characters of the PCS,
         - the xdigit class include 0-9, A-F and a-f of the PCS, and
         - toupper include the mapping from a-z to A-Z?

   4.
The definitions of character classification are related to the behavior of the is*() functions defined in the C standard. The current specifications for the classes listed in 3. are worded as "If the keyword is not specified, ... automatically included." This wording does not seem to prohibit defining an upper class without A-Z, etc., which would not conform to the C standard. Is that the intention of the text?

   5. If "automatically included" is effective only when the keyword is not specified for the classes listed in 3., can "automatically included" be replaced with "shall be included"?

   6. If the "automatically included" characters are intended to be provided for the classes listed in 3. even when the keyword is specified and their definitions are missing, then it should be possible to specify the keyword alone, without any character specifications. The current locale definition grammar does not allow this.

Action: If the answer to 3. above is "YES", the draft needs to be changed according to the following comments. If "NO", editorial changes are necessary to remove the ambiguities described above.

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 15 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 15
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 67, line 1737 - 1738

Problem: Both "automatically included" and "automatically belong to" are used in the description. The description of these terms should also cover the case where the keyword is not specified.
Action: Replace

   "In the description, the term "automatically included" means that it shall not be an error to either include the referenced characters or to omit them; the implementation shall provide them if missing and accept them silently if present."

with

   "In the description, the terms "automatically included" and "automatically belong to" mean that it shall not be an error to either include the referenced characters or to omit them; the implementation shall provide them if missing (including the case where the keyword is not specified) and accept them silently if present."

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 16 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 16
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 67, line 1746 - 1749

Problem: If the keyword is not specified, no character can be specified in the class, so there is no need to say "shall automatically belong to"; "shall belong to" suffices. However, according to the C language standard, the letters A-Z seem to be required to be in the upper class in any case.

Action: Replace

   "If this keyword is not specified, the uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class."

with

   "The uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class; implementation-defined character values are used if this keyword is not specified or if the characters are omitted from the uppercase definition."
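Item 3 of Seq 14 and comment Seq 16 turn on what ISO C already requires of the is*() functions. A small C check (first_misclassified() is a hypothetical name) makes those requirements concrete for the current locale: each portable uppercase letter must satisfy isupper(), each lowercase letter islower(), the decimal digits isdigit(), and the hexadecimal digits isxdigit(). A locale whose definition omitted these characters, without the "automatically included" behavior supplying them, would fail it.

```c
#include <assert.h>
#include <ctype.h>

/* Return the first portable character whose classification in the
   current locale falls short of what ISO C requires, or 0 if none. */
int first_misclassified(void)
{
    static const char upper[]  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[]  = "abcdefghijklmnopqrstuvwxyz";
    static const char digit[]  = "0123456789";
    static const char xdigit[] = "0123456789ABCDEFabcdef";
    const char *p;

    for (p = upper; *p != '\0'; p++)
        if (!isupper((unsigned char)*p)) return *p;
    for (p = lower; *p != '\0'; p++)
        if (!islower((unsigned char)*p)) return *p;
    for (p = digit; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p)) return *p;
    for (p = xdigit; *p != '\0'; p++)
        if (!isxdigit((unsigned char)*p)) return *p;
    return 0;
}
```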
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 17 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 17
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 67, line 1752 - 1755

Problem: If the keyword is not specified, no character can be specified in the class, so there is no need to say "shall automatically belong to"; "shall belong to" suffices. However, according to the C language standard, the letters a-z seem to be required to be in the lower class in any case.

Action: Replace

   "If this keyword is not specified, the lowercase letters a through z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class."

with

   "The lowercase letters a through z, as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class; implementation-defined character values are used if this keyword is not specified or if the characters are omitted from the lowercase definition."

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 18 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 18
Sect 2.5 (LC_CTYPE)
EDITORIAL COMMENT.
page 69, line 1813

Problem: If the keyword is not specified, no character can be specified in the class, so there is no need to say "shall automatically belong to"; "shall belong to" suffices.

Action: Replace "shall automatically belong to" with "shall belong to".
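Item 3 of Seq 14 and comment Seq 17 also raise whether the toupper mapping from a-z to A-Z is always required. Whether a given locale provides it can be probed with toupper() and tolower(); the sketch below (portable_case_mapping_ok() is a hypothetical name) checks the round trip for the 26 portable letters. It returns 1 in the POSIX locale; whether every locale must pass it is precisely the question the comments raise.

```c
#include <assert.h>
#include <ctype.h>

/* Return 1 if, in the current locale, toupper() maps each portable
   lowercase letter to its uppercase partner and tolower() maps that
   partner back; return 0 at the first letter where this fails. */
int portable_case_mapping_ok(void)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    for (int i = 0; lower[i] != '\0'; i++) {
        unsigned char lc = (unsigned char)lower[i];
        unsigned char uc = (unsigned char)upper[i];
        if (toupper(lc) != uc || tolower(uc) != lc)
            return 0;
    }
    return 1;
}
```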
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 19 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 19
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 69, line 1822 - 1826

Problem: If the keyword is not specified, no character can be specified in the class, so there is no need to say "shall automatically belong to"; "shall belong to" suffices. However, according to the C language standard, the characters listed (the standard white-space characters) seem to be required to be in the space class in any case.

Action: Replace

   "If this keyword is not specified, the characters , , , , , and , as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class, with implementation-defined character values."

with

   "The characters , , , , , and , as defined in Table 2-3 (see 2.4.1), shall automatically belong to this class; implementation-defined character values are used if this keyword is not specified or if the characters are omitted from the space definition."

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 20 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 20
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 67, line 1867 - 1871

Problem: If the keyword is not specified, no mapping can be specified, so there is no need to say "shall automatically be included"; "shall be included" suffices. However, if the mapping between the lowercase letters a-z and their corresponding uppercase letters A-Z is always required, the wording should be changed; if not, only the editorial change described above is required.
Action: Replace

   "If this keyword is not specified, the lowercase letters a through z, and their corresponding uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically be included, with implementation-defined character values."

with

   "The mapping of the lowercase letters a through z to their corresponding uppercase letters A through Z, as defined in Table 2-3 (see 2.4.1), shall automatically be included; implementation-defined character values are used if this keyword is not specified or if the characters are omitted."

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 21 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 21
Sect 2.5 (LC_CTYPE)
EDITORIAL COMMENT.
page 69, line 1843

Action: Replace "provided" with "specified".

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 22 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 22
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 70, line 1856

Problem: If the keyword is not specified, no character can be specified in the class, so there is no need to say "shall automatically belong to"; "shall belong to" suffices.

Action: Replace "shall automatically belong to" with "shall belong to".

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 23 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 23
Sect 2.5 (LC_CTYPE)
EDITORIAL COMMENT.
page 70, line 1859

Action: Replace "unspecified" with "not specified".

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 24 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 24
Sect 2.5 (LC_CTYPE)
EDITORIAL COMMENT.
page 70, line 1874

Action: Replace "are separated" with "shall be separated".

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 25 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 25
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 70, line 1879 - 1881

Problem: The draft does not require the mapping of the letters a-z to A-Z to be specified explicitly for the toupper keyword, so the description of tolower should be changed to be in line with that of toupper.

Action: Replace

   "If specified, the uppercase letters A through Z, as defined in Table 2-3, and their corresponding lowercase letter, shall be specified. If this keyword is not specified, the mapping shall be the reverse mapping of the one specified for toupper."

with

   "If this keyword is not specified, the mapping shall be the reverse mapping of the one specified for toupper. The mapping of the uppercase letters A through Z to their corresponding lowercase letters a through z, as defined in Table 2-3 (see 2.4.1), shall automatically be included."

______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)    Phone: +81-45-336-5361    Seq: 26 of 57
Email: suehiro@jrd.dec.com             FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 26
Sect 2.5 (LC_CTYPE)
OBJECTION.
page 71, line 1885 - 1899:
Problem: In Table 2-6, each class is marked with the code "-" (permitted) against itself. When a class is paired with itself, the code should be "M" (always), or no code should be assigned.
Action: Replace the "-" code with "M" in the entries where the "In Class" and "Can Also Belong To" classes are the same, or assign no code to those entries.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 27 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 27
Sect 2.5 (LC_CTYPE) OBJECTION.
page 71, line 1903:
Problem: "if not specified" is ambiguous. Does it mean that a character is not specified, or that the keyword is not specified?
Action: Replace "brings to class if not specified" with a more precise expression.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 28 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 28
Sect 2.5 (LC_CTYPE) OBJECTION.
page 71, line 1907:
Problem: "The character, which is part of the space and blank classes, cannot belong to punct or graph, but automatically shall belong to the print class." is not correct. In locales other than the POSIX locale, the character need not be defined as a member of the blank class. Moreover, the current draft may even allow the character to be left out of the space class. In addition, "automatically shall belong to" is ambiguous. No corresponding statement appears in the specification of the print class, and it is unclear whether the term "automatically" has a special meaning.
Action: Replace the text with a correct and more precise description.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 29 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 29
Sect 2.5 (LC_COLLATE) OBJECTION.
page 73, line 2004 - 2005:
Problem: Some examples in the LC_COLLATE text are inconsistent with the locale grammar in 2.5.3.2. In the definition of a collating element (page 73, line 2000), the grammar requires the operand to be enclosed in double quotes. However, the example on page 76, lines 2152-2154, does not have them. If the example text is correct, the following problem arises:
  - The string operand can be "from". This is lexically the same sequence as the preceding keyword "from" and can cause a parsing problem.
Given the semantics of the collating-element definition, the string operand is not a character string constant but a sequence of characters. Only charmap symbols should be allowed in the operand. This makes the concept of a multicharacter collating element clearer and avoids the problem above. The same comment applies to the collating weight definition (text example on page 80, line 2299 vs. grammar on page 96, line 2978).
Action:
Change line 2004 - 2005 from "The string operand shall be a string of two or more characters that shall collate as an entity." to "The string operand shall be a string of two or more charmap symbols that shall collate as an entity."
Remove line 2154: "collating-element from ll"
Change page 95, line 2944 - 2945 from

    collating_elements : 'collating-element' COLLELEMENT 'from' '"' char_list '"' EOL
                       ;

to

    collating_elements : 'collating-element' COLLELEMENT 'from' symbol_list EOL
                       ;
    symbol_list        : CHARSYMBOL
                       | symbol_list CHARSYMBOL
                       ;

Change page 96, line 2978 from

                       | '"' char_list '"'

to

                       | symbol_list

At the very least, the inconsistencies between the grammar and the text should be resolved.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 30 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 30
Sect 2.5 (LC_COLLATE) OBJECTION.
page 77, line 2168 - 2203:
Problem: For ideographic characters, users generally need to be able to choose the collation weights they want. Japanese sorting of Kanji, for example, has five possible collation weights: On-yomi (pseudo-Chinese pronunciation), Kun-yomi (Japanese pronunciation), number of strokes, radical (component of the Kanji), and Kanji character code. There could be more. The LC_COLLATE part of the localedef specification should allow a user to describe these weights and give them names. The user should also be able to select any combination of the defined weights at run time.
Action: Add a new sort-rule directive "name" with the "name = value" syntax.
Line 2179: Add the following sentence before "Operands shall be ....":
  "If an operand has a name directive, the order of the sort-rules applied when comparing strings can be changed."
After line 2199: Add the following text:
  name   Specifies the name of a collation weight as a string.
         The order of the weights may be specified by name at run time. The syntax of the name directive is:
           "name = %s"
         An implementation may allow a user to choose the order of the weights used for collation by listing the given names in that order. The way of specifying the order of weights is implementation defined.
A possible example of this would be:
    order_start forward,name="kunyomi";forward,name="radical"
A locale for collation can then be set as follows:
    LC_COLLATE=ja_JP.ujis@weights=radical,kunyomi
In this example, the sort-rule "radical" is the primary weight and "kunyomi" is the secondary weight. Functions affected by the LC_COLLATE category, such as strcoll(), shall change their behavior accordingly.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 31 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 31
Sect 2.5 (LOCALE Grammar) COMMENT.
page 93, line 2826 - 2827:
Problem: The sentence on lines 2826-2827 is not acceptable. All discrepancies should be resolved at the drafting stage.
Action: Remove "Any discrepancies found between this grammar and other descriptions in this clause shall be resolved in favor of this grammar".
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 32 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 c 32
Sect 2.5 (LOCALE Grammar) EDITORIAL COMMENT.
page 93, line 2840 -2841:
Action: Replace "CHARSYMBOL" with "COLLSYMBOL".
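The following small C sketch illustrates the multi-weight collation proposed in Seq 30. It is neither the proposed localedef syntax nor any existing strcoll() implementation. The weight names follow the proposal's examples. Everything else is hypothetical: the weight table, the ASCII letters standing in for Kanji, and the functions parse_weight_order() and multiweight_cmp(). A name list such as the "radical,kunyomi" part of "@weights=radical,kunyomi" sets the order of the levels. Strings are then compared level by level, with the first named weight as the primary weight.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical weight kinds, named after the examples in Seq 30. */
enum weight_kind { W_KUNYOMI, W_RADICAL, W_STROKES, W_NKINDS };

static const char *const weight_names[W_NKINDS] = { "kunyomi", "radical", "strokes" };

/* Hypothetical per-character weights; letters stand in for Kanji. */
struct coll_entry { char ch; int w[W_NKINDS]; };
static const struct coll_entry coll_table[] = {
    { 'a', { 2, 1, 3 } },
    { 'b', { 1, 2, 1 } },
    { 'c', { 3, 1, 2 } },
};

static int weight_of(char ch, enum weight_kind k)
{
    for (size_t i = 0; i < sizeof coll_table / sizeof coll_table[0]; i++)
        if (coll_table[i].ch == ch)
            return coll_table[i].w[k];
    return 0;   /* characters missing from the table sort first (arbitrary) */
}

/* Parse a comma-separated name list such as "radical,kunyomi" into order[].
   Returns the number of names, or -1 for an unknown name or too many names. */
int parse_weight_order(const char *spec, enum weight_kind *order, size_t max)
{
    size_t n = 0;
    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");
        int found = -1;
        for (int k = 0; k < W_NKINDS; k++)
            if (strlen(weight_names[k]) == len && strncmp(spec, weight_names[k], len) == 0)
                found = k;
        if (found < 0 || n == max)
            return -1;
        order[n++] = (enum weight_kind)found;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return (int)n;
}

/* Compare level by level: order[0] is the primary weight, order[1] the
   secondary, and so on. Returns <0, 0 or >0 like strcmp(). */
int multiweight_cmp(const char *a, const char *b,
                    const enum weight_kind *order, size_t norder)
{
    for (size_t lv = 0; lv < norder; lv++) {
        for (size_t i = 0; ; i++) {
            if (a[i] == '\0' || b[i] == '\0') {
                if (a[i] != b[i])
                    return a[i] == '\0' ? -1 : 1;   /* a proper prefix sorts first */
                break;                              /* tie at this level */
            }
            int wa = weight_of(a[i], order[lv]);
            int wb = weight_of(b[i], order[lv]);
            if (wa != wb)
                return wa < wb ? -1 : 1;
        }
    }
    return 0;
}
```

Changing the name list from "radical,kunyomi" to "kunyomi" reverses the order of "a" and "b" without touching the table. That is the run-time selection the comment asks for. Sort directions such as forward and backward are not modeled.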
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 33 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 2.5 o 33
Sect 2.5 (LOCALE Grammar) OBJECTION.
page 95, line 2907:
Problem: If a char-class keyword can be specified alone, without any character definition, the locale definition grammar for LC_CTYPE needs to be modified. Under the current specification, specifying the space keyword with no characters yields a space class containing only the characters defined in the blank class (the automatically included characters).
Action: Add the following line after line 2907:
    | charclass_keyword EOL
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 34 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.11 o 34
Sect 4.11 (comm) COMMENT.
page 350, line 3249 - 3254:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Suppose two lines are compared and one contains extra shift sequences that the other lacks, such as a shift out followed immediately by a shift in with no data between them. An implementation could treat the lines as either the same or different, although they are identical once the extra shift sequences are removed. Collation should handle extra shift codes appropriately.
Action: Add the following sentence to the description:
  If shift coding is applied, extra shift codes are ignored.
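The behavior proposed for comm in Seq 34, and likewise for diff and grep, can be sketched in C for the two-state SO/SI encoding used in the example of Seq 57. The sketch assumes that SO and SI never occur as data bytes. It ignores ISO 2022 designation escape sequences. shift_normalize() and shift_equal() are hypothetical names. Each line is rewritten into a canonical form in which a shift code appears only where the state actually changes before a data byte. Lines that differ only in redundant shift codes then compare equal.

```c
#include <stdlib.h>
#include <string.h>

#define SO 0x0e  /* Shift Out: switch to the other shift state */
#define SI 0x0f  /* Shift In: return to the initial shift state */

/* Write a canonical form of src into dst: a shift code is kept only where the
   state changes before a data byte, and the result always ends in the initial
   state. dst must hold at least strlen(src) + 2 bytes. */
void shift_normalize(const char *src, char *dst)
{
    int in_state = 0;   /* state implied by src so far: 0 = initial, 1 = shifted */
    int out_state = 0;  /* state of what has been written to dst */

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c == SO) { in_state = 1; continue; }   /* defer: may be redundant */
        if (c == SI) { in_state = 0; continue; }
        if (in_state != out_state) {               /* state really changes here */
            *dst++ = (char)(in_state ? SO : SI);
            out_state = in_state;
        }
        *dst++ = (char)c;
    }
    if (out_state != 0)
        *dst++ = (char)SI;                         /* end in the initial state */
    *dst = '\0';
}

/* Return 1 if a and b are equal once redundant shift codes are ignored,
   0 if they differ, and -1 if memory cannot be allocated. */
int shift_equal(const char *a, const char *b)
{
    char *na = malloc(strlen(a) + 2);
    char *nb = malloc(strlen(b) + 2);
    int result = -1;

    if (na != NULL && nb != NULL) {
        shift_normalize(a, na);
        shift_normalize(b, nb);
        result = strcmp(na, nb) == 0;
    }
    free(na);
    free(nb);
    return result;
}
```

A real comm orders lines by the collating sequence. It would therefore apply its comparison (for example strcoll()) to the normalized forms rather than only test them for equality.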
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 35 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.14 o 35
Sect 4.14 (cut) COMMENT.
page 368, line 3828 - 3830:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. When data is cut out of files, appropriate shift codes should be supplied at the head and tail of the data.
Action: Add the following sentence to the description:
  In cutting out data from files, appropriate shift codes should be supplied at the head and tail of the data.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 36 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.17 o 36
Sect 4.17 (diff) COMMENT.
page 388, line 4521 - 4523:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Suppose two lines are compared and one contains extra shift sequences that the other lacks, such as a shift out followed immediately by a shift in with no data between them. An implementation could treat the lines as either the same or different, although they are identical once the extra shift sequences are removed.
Action: Add the following sentence to the description:
  If shift coding is applied, extra shift codes are ignored.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 37 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.28 o 37
Sect 4.28 (grep) COMMENT.
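The following sketch shows one way to supply "appropriate shift codes" for cut (Seq 35). The same approach covers the tail of head output (Seq 38) and the head of tail output (Seq 55). It assumes the two-state SO/SI encoding of the example in Seq 57 and, as a simplification, one data byte per character. cut_shift_chars() is a hypothetical helper. It tracks the input shift state while counting characters. It writes SO before the first selected character if that character is in the shifted state, and appends SI at the tail if the output would otherwise end shifted.

```c
#include <stddef.h>

#define SO 0x0e  /* Shift Out */
#define SI 0x0f  /* Shift In */

/* Copy characters from..to (1-based, inclusive, like "cut -c from-to") of
   line into dst so that the output is self-contained in shift state.
   Assumes one data byte per character. dst must hold at least 2 * k + 2
   bytes, where k is the number of characters selected. Returns the number
   of bytes written, excluding the terminating null. */
size_t cut_shift_chars(const char *line, size_t from, size_t to, char *dst)
{
    char *start = dst;
    int state = 0;       /* shift state of the input at this point */
    int out_state = 0;   /* shift state of the output written so far */
    size_t pos = 0;      /* 1-based position of the current character */

    for (; *line != '\0'; line++) {
        unsigned char c = (unsigned char)*line;
        if (c == SO) { state = 1; continue; }
        if (c == SI) { state = 0; continue; }
        pos++;
        if (pos < from || pos > to)
            continue;                           /* character not selected */
        if (state != out_state) {               /* supply a shift code at the head or mid-output */
            *dst++ = (char)(state ? SO : SI);
            out_state = state;
        }
        *dst++ = (char)c;
    }
    if (out_state != 0)
        *dst++ = (char)SI;                      /* supply SI at the tail */
    *dst = '\0';
    return (size_t)(dst - start);
}
```

For example, cutting characters 4-5 from "ab", SO, "XYZ", SI, "cd" selects "YZ", which lies inside the shifted run. The output is therefore SO, "YZ", SI.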
page 453, line 6710 - 6741:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Suppose two patterns are compared and one contains extra shift sequences that the other lacks, such as a shift out followed immediately by a shift in with no data between them. An implementation could treat the patterns as either the same or different, although they are identical once the extra shift sequences are removed. Searching should handle extra shift codes appropriately.
Action: Add the following sentence to the description:
  If shift coding is applied, extra shift codes are ignored.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 38 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.29 o 38
Sect 4.29 (head) COMMENT.
page 459, line 6926 - 6930:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Appropriate shift codes shall be supplied, if necessary, at the tail of the data.
Action: Add the following sentence to the description:
  Appropriate shift codes shall be supplied, if necessary, at the tail of the data.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 39 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.31 o 39
Sect 4.31 (join) COMMENT.
page 466, line 7153 - 7171:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Concatenating fields can produce extra shift codes, which an implementation may remove. The handling of extra shift codes in the join field should be specified.
Action: Add the following sentence to the description:
  If shift coding is applied, extra shift codes are removed and/or ignored.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 40 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.31 o 40
Sect 4.31 (join) OBJECTION.
page 467, line 7178 and page 469, line 7246-7248:
Problem: It is unclear what output shall be produced for "unpairable" lines under the -a option. The 4.31.6.1 Standard Output clause on page 469 seems to provide formats only for "pairable" lines. Consider the following example:

    file1:       file2:
    A A1 A2      A a1 a2
    B B1 B2      C c1
    D D1         D d1 d2

Most existing implementations of "join -a1 -a2 -e '-' file1 file2" may output:

    A A1 A2 a1 a2
    B B1 B2
    C c1 -
    D D1 - d1 d2

If the output format on page 469, line 7248, is also intended to apply to "unpairable" lines, the output of the above example shall be:

    A A1 A2 a1 a2
    B B1 B2 - -
    C - - c1 -
    D D1 - d1 d2

Action: Clearly specify the output format for unpairable lines. The following change is recommended for lines 7247-7248 on page 469:
  When the -o option is not specified, the output, including that for unpairable lines, shall be:
      "%s%s%s\n", , ,
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 41 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.31 o 41
Sect 4.31 (join) OBJECTION.
page 467, line 7182:
Problem: It is unclear whether the "-e string" option also affects the field specified as the join field when combined with the "-a" option.
Consider the following example:

    file1:       file2:
    A A1 A2      A a1 a2
    B B1 B2      C c1
    D D1         D d1 d2

What does the command

    join -a1 -a2 -e '-' -o 1.1 1.2 1.3 2.2 2.3 file1 file2

output? Some implementations produce the following result (which I expect):

    A A1 A2 a1 a2
    B B1 B2 - -
    C - - c1 -
    D D1 - d1 d2

Other implementations may output:

    A A1 A2 a1 a2
    B B1 B2 - -
    - - - c1 -
    D D1 - d1 d2

This implies two things:
(1) Some interpret the -e option as not applying to the (non-empty) join field when combined with the -a option, while others do not.
(2) The current standard, and perhaps existing practice, seems to offer no way to specify the join field properly in the output list. In the above example, only -o 1.1 *or* -o 2.1 is allowed. For unpairable lines, however, those fields are not the same, and the output may differ.
Action:
1. Clearly specify the effect of the -e option on the join field. The following change is recommended:
   -e "string"  Replace empty output fields by string "string", except the join field.
2. Clarify how the join field is handled for unpairable lines when the -o "list" specifies it as "file_number.field" of either file. The following change is recommended:
   -o "list" ....... For unpairable lines under the -a option, the specified join field in the "list" shall be interpreted as the non-empty field of the two join fields. If both join fields are empty, it is unspecified whether such unpairable lines are output separately or "combined" by the "empty" join field.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 42 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.34 o 42
Sect 4.34 (locale) OBJECTION.
page 480, line 7617-7622:
Problem: The Synopsis allows multiple "name" operands, but no description of multiple "names" is provided.
Action: After line 7622, add a description of the behavior when multiple "category" names and/or "keyword" names are combined.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 43 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.34 o 43
Sect 4.34 (locale) OBJECTION.
page 481, line 7637-7645:
Problem: Neither "Sect 2.5 - Locale" nor "Sect 4.35 - localedef" gives a concrete definition of the name spaces of the "category" names and the "keyword" names. There is therefore no standard way to identify a given operand as a "category" name or a "keyword" name.
Action:
(1) There may be an undocumented assumption that the name spaces of the "category" names and the "keyword" names are orthogonal or clearly separated (I believe so). If so, add such definitions to Sect 2.5 - Locale and Sect 4.35 - localedef.
(2) If there is a consensus that all category names shall begin with "LC_", change Synopsis line 7608 as shown below. Also replace the description of "name" on lines 7637-7645 with descriptions of "LC_name" and "keyword".
      locale .... [ LC_name ... ] [ keyword ... ]
(3) If there is no intention to separate these name spaces, including implementation-definable "category" and "keyword" names, add new option flags to specify the name space, for example:
      locale .... [-C category] ... [-K keyword] ...
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 44 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.35 o 44
Sect 4.35 (localedef) OBJECTION.
page 487, line 7859-7862:
Problem: The statement "... shall be interpreted as a pathname where the created locale definition(s) shall be stored" is ambiguous. Is the pathname intended to name a regular file, a directory, or either?
Action: State the intention clearly. The recommended action is to add the following sentence after the above sentence:
  "It is implementation defined what type of file (e.g. a regular file or a directory) is allowed for the pathname target."
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 45 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.35 c 45
Sect 4.35 (localedef) EDITORIAL COMMENT.
page 488, line 7887:
Problem: "... any environment variables beginning with LC_. and LC_* variables as described in 2.6" appears to be an editorial error.
Action: For consistency with other clauses, replace the above with:
  "... any environment variables beginning with LC_" followed by a full stop.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 46 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 4.35 c 46
Sect 4.35 (localedef) COMMENT.
page 488, line 7890 - 7895:
Problem: The following description exists for LC_CTYPE:
  "This variable shall have no affect on the processing of localedef input data; the POSIX Locale shall be used for this purpose, regardless of the value of this value."
On the other hand, a locale definition file may contain the characters themselves (2.5.2 Locale Definition, page 65). If localedef input data contains characters not defined in the POSIX locale, localedef cannot parse strings correctly. For example, a byte that corresponds to a localedef special character (such as a separator) may occur as the second or a subsequent byte of a multibyte character.
Action: Reconsider whether localedef should refer to the LC_CTYPE environment variable, or reconsider the specification of the locale definition.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 47 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.36 o 47
Sect 4.36 (logger) OBJECTION.
page 491 - 493:
Problem: It is unclear whether this standard requires that messages in locales other than the "POSIX locale" also be readable later by a system administrator or programmer.
Action: Add the following sentence after the last paragraph of the 4.36.2 Description clause:
  "The messages of any supported locale are also expected to be readable at later reference by a system administrator; however, it is implementation defined whether saved messages in locales other than the "POSIX locale" are readable at later reference."
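The parsing problem described in Seq 46 can be made concrete with Shift_JIS. There the Kanji 表 is encoded as the bytes 0x95 0x5C, and 0x5C is also the code of <backslash>, the default localedef escape character. The C sketch below compares two searches. The first is byte-oriented, as a parser limited to the POSIX locale would be. The second knows the Shift_JIS lead-byte ranges (0x81-0x9F and 0xE0-0xFC). Shift_JIS is hard-coded only for illustration: localedef would need to know the encoding of its input, which is exactly the point of the comment. find_byte_naive() and find_char_sjis() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Shift_JIS lead bytes of double-byte characters: 0x81-0x9F and 0xE0-0xFC. */
static int sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-oriented search, as a parser using only the POSIX locale would do:
   it can stop on the trailing byte of a double-byte character. */
const char *find_byte_naive(const char *s, char ch)
{
    return strchr(s, ch);
}

/* Character-oriented search for a single-byte character ch: the trailing
   byte of each double-byte character is skipped, never compared. */
const char *find_char_sjis(const char *s, char ch)
{
    while (*s != '\0') {
        if (sjis_lead((unsigned char)*s) && s[1] != '\0') {
            s += 2;                 /* skip one whole double-byte character */
            continue;
        }
        if (*s == ch)
            return s;
        s++;
    }
    return NULL;
}
```

For the input bytes 0x95 0x5C followed by "ab\c", the naive search reports the escape character at offset 1, inside 表. The Shift_JIS-aware search finds the real <backslash> at offset 4.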
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yoichi Suehiro)  Phone: +81-45-336-5361  Seq: 48 of 57
Email: suehiro@jrd.dec.com  FAX: +81-45-336-5599
------------------------------------------------------------------------------
@ 4.40 c 48
Sect 4.40 (mailx) EDITORIAL COMMENT.
page 511, line 8638 - 8639:
Problem: In the specification of mailx, the term "message" means "mail message". To avoid confusion in the text for LC_MESSAGES, a few words should be added to distinguish the two kinds of "message".
Action: Replace "This variable shall determine the language in which messages should be written." with "This variable shall determine the language in which (diagnostic and informative) messages should be written.".
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Toshinori Numata)  Phone: +81-44-754-3343  Seq: 49 of 57
Email: numa@sysrap.cs.fujitsu.co.jp  Fax: +81-44-754-3522
------------------------------------------------------------------------------
@ 4.45 o 49
Sect 4.45 (od) OBJECTION.
page 533, line 9350-9353:
Problem: When dumping in character mode (option "-t c") with the "-j skip" and/or "-N count" option, the dump may start (or finish) in the middle of a multibyte character. In that case, the dumped result may not consist of valid multibyte characters.
Action: Add the following sentence after the description of the multibyte character dump (line 9353):
  When the "-j skip" and/or "-N count" option is specified, and the dump starts or finishes in the middle of a multibyte character, the result is implementation-defined.
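The situation in Seq 49 can be detected with the ISO C function mbrlen(). Walking the input one character at a time from the start shows whether a "-j skip" offset falls on a character boundary. is_char_boundary() is a hypothetical helper, not part of od. It uses the current LC_CTYPE locale, so the caller is assumed to have called setlocale(). For state-dependent encodings a boundary alone is not enough, because the shift state at the offset also matters (see Seq 57). In many implementations a shift sequence is counted together with the character that follows it.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return 1 if byte offset off in buf[0..n) starts a character (or equals n),
   0 if it falls inside a multibyte character, and -1 if off > n or an
   invalid or incomplete sequence is met before reaching off. */
int is_char_boundary(const char *buf, size_t n, size_t off)
{
    mbstate_t st;
    size_t pos = 0;

    if (off > n)
        return -1;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (pos < off) {
        size_t len = mbrlen(buf + pos, n - pos, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return -1;                  /* no valid character at pos */
        if (len == 0)
            len = 1;                    /* a null byte, treated as one byte */
        pos += len;                     /* step over one whole character */
    }
    return pos == off;                  /* overshooting means off was inside */
}
```

An od implementation could use such a check to decide when to print a placeholder for a partial character. The comment proposes leaving that choice implementation-defined.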
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 50 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.46 o 50
Sect 4.46 (paste) COMMENT.
page 538, line 9538 - 9545:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Concatenating lines can produce extra shift codes, which an implementation may remove.
Action: Add the following sentence to the description:
  If shift coding is applied, extra shift codes are removed.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 51 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.48 o 51
Sect 4.48 (pax) OBJECTION.
page 548, line 9587-9916:
Problem: Under the currently proposed interface, the "copy" function requires two flags: -r and -w. From a (naive) user's point of view, this complicates the description of the functionality and the option flags. It may therefore cause unnecessary misunderstanding and misuse. The standard should not introduce such misleading option flags, and should follow this basic rule:
  Rule: "one option flag for one function", in principle.
Action: Replace the "-r and -w" flags with -P (coPy) or -Y (copY), and rewrite the corresponding descriptions throughout clause 4.48.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 52 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.49 c 52
Sect 4.49 (pr) EDITORIAL COMMENT.
page 562, line 10432:
Problem: "Produce output that is *columns* wide" is misleading.
Action: Change the above to "Produce multicolumn output with *column* columns" or "Produce output in *column* columns", where *xxx* denotes italics.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 53 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.49 c 53
Sect 4.49 (pr) COMMENT.
page 563, line 10438-10439:
Problem: "When used with -t (neither header nor trailer option), use the minimum number of lines to write the output" is unclear. Multicolumn output with -a (i.e. "horizontal" multicolumn output) always produces the minimum number of lines. Does this sentence then refer only to "vertical" multicolumn output? Most existing implementations produce the "minimum" number of lines for "vertical" multicolumn output even without the -t flag. Does this standard nevertheless require this minimality only when -t is used?
On a related note, consider the word "balanced" in "Whether or not text columns are balanced is unspecified" (page 562, line 10436). Does it refer both to "lines" and to "the width and layout of each column"?
Action: Clarify the intention, and then change or delete the text.
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yasushi Nakahara)  Phone: +81-428-32-0722  Seq: 54 of 57
Email: ynk@ome.toshiba.co.jp  FAX: +81-428-32-0408
-------------------------------------------------------------------------------
@ 4.49 o 54
Sect 4.49 (pr) OBJECTION.
page 562-568:
Problem:
(1) Column width and character width
This standard requires the pr utility to produce output on a "column position" basis, not a "byte count" basis. I fully agree with this approach. However, the only statement I can find in this standard about the "column width" of each "character" (by definition, in general a multibyte character) is the following:
  "Each printable character in the *portable character set* has a column width of one. The standard utilities, when used as described in this standard, assume that all characters have integral column widths. The column width of a character is not necessarily related to the internal representation of the character (numbers of bits or octets)." [2.2.2.31 column position, on page 30.]
This means that the pr utility has to know the *column width* of every *character* other than the printable characters in the *portable character set*. The 4.49.5.3 Environment Variables clause refers to LC_CTYPE as if it provided such width information for "multibyte" characters as well as single-byte ones. However, this standard does not specify how to obtain width information, for example from LC_CTYPE and/or another character attribute database.
(2) Multibyte character allowance for the -e, -i, -n and -s options
This standard describes the "char" in the -e[char][gap], -i[char][gap], -n[char][width] and -s[char] options as a character in general, not a "single-byte" character. In other words, it allows a multibyte character as the optional "char".
I personally appreciate this specification. However, I am greatly concerned about whether allowing multibyte characters here is a clear intention of this standard. Both the proposed option parsing function getopt() and existing practice have a single-byte interface only, as does main(). Application programs that use such an option parsing function or a similar one, including the pr utility, may therefore not handle the optional [char] argument as a "multibyte" character unless the specification explicitly says so. *Automagical* support of multibyte characters merely by using the term "character" is misleading.
Action:
(1) a) The standard may intend that the pr utility support any character in the supported locale in a standard manner. If so, it is highly recommended that this standard introduce a specification of mechanisms for defining and recognizing character widths.
    b) If this is not the case, such support appears to be left to the implementation. The following explicit statement is therefore recommended to be added in a normative clause of the pr utility:
       "Although the standard expects that all characters in the supported locale are well handled by the pr utility in terms of their character width and character column positions, it is implementation defined whether characters other than the printable characters in the POSIX locale can be treated properly in their character width and positions."
       Also make clear in the LC_CTYPE environment variable paragraph that LC_CTYPE does not necessarily provide information about character column width.
(2) If allowing multibyte characters is really intended, please add an explicit statement to that effect. If not, the following description should be added in a normative clause of the pr utility:
  "Regarding *char* in the -e[char], -i[char], -n[char] and -s[char] options, it is implementation defined whether a multibyte character (a character that is not a single-byte character) is allowed as their optional character argument."
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Shin-ichi Yamada)  Phone: +81-44-548-4523  Seq: 55 of 57
Email: ymd@rd.nttdata.jp  FAX: +81-44-548-4521
-------------------------------------------------------------------------------
@ 4.60 o 55
Sect 4.60 (tail) COMMENT.
page 623, line 12476 - 12486:
Problem: For shift encodings, the current draft does not say how designate/invoke sequences are to be handled. Appropriate shift codes shall be inserted, if necessary, at the beginning of the output.
Action: Add the following sentence to the description:
  Appropriate shift codes shall be inserted, if necessary, at the beginning of the output.
______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Yukiharu Imafuku)  Phone: +81-44-548-4555  Seq: 56 of 57
Email: ima@rd.nttdata.jp  FAX: +81-44-548-4551
-------------------------------------------------------------------------------
@ 4.71 o 56
Sect 4.71 (wc) OBJECTION.
page 674, line 14065 - 14070:
Problem: The current options of the wc command support only byte, line and word counts. Multibyte characters should also be countable as characters.
Action: Extend the options of the wc command so that it can write the number of characters in each file. A proposed option is:
  -n  Write to the standard output the number of characters in each input file.
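The -n option proposed in Seq 56 would count characters rather than bytes. This can be sketched with the ISO C function mbrlen(), which returns the number of bytes making up the next character in the current LC_CTYPE locale. count_chars() is a hypothetical helper. In typical implementations of state-dependent encodings, shift sequences are consumed as part of the following character and so are not counted as characters.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Count the characters (not bytes) in buf[0..n) under the current LC_CTYPE.
   Returns (size_t)-1 if an invalid or truncated sequence is found. */
size_t count_chars(const char *buf, size_t n)
{
    mbstate_t st;
    size_t chars = 0;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (n > 0) {
        size_t len = mbrlen(buf, n, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete character */
        if (len == 0)
            len = 1;                    /* a null byte: counted as one character */
        buf += len;
        n -= len;
        chars++;
    }
    return chars;
}
```

A wc-like program would first call setlocale(LC_ALL, "") so that LC_CTYPE from the environment applies. It would also keep one mbstate_t across buffer refills instead of resetting it on every call.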
_______________________________________________________________________________
ITSCJ/SSI/POSIX WG (Toshinori Numata)    Phone: +81-44-754-3343   Seq: 57 of 57
Email: numa@sysrap.cs.fujitsu.co.jp      Fax: +81-44-754-3522
------------------------------------------------------------------------------
@ B.5 o 57
Sect B.5 (regcomp() family)                                          OBJECTION.
page 788, line 618:

Problem:
The functions regcomp() and regexec() should have a wchar_t version of their
interface, for the following reasons:

(1) To use regcomp() and regexec() in a program that keeps its internal
    character data in the wchar_t type, such as a text editor, the program
    must:
      1. convert the internal text data from a wchar_t array to a char array;
      2. search for the pattern using regexec().
    This conversion must be repeated for every line each time the program
    searches for a pattern. That imposes too heavy an overhead on such
    programs and makes wchar_t-based programming too difficult. If wchar_t
    versions of regcomp()/regexec() were provided, no wchar_t-to-char
    conversion would be needed.

(2) If regexec() is used on a system that uses a state-dependent encoding,
    the following problem occurs. When the pattern was compiled without the
    REG_NOSUB flag set in the cflags argument of regcomp() and a match is
    found, regexec() returns the matched positions in its pmatch argument.
    With a state-dependent encoding, this pmatch information may be useless,
    because it does not carry any shift-state information.

    For example, suppose we are using a state-dependent encoding with two
    shift states, which switches from the initial shift state to another
    shift state with the SO (Shift Out) code and returns to the initial shift
    state with the SI (Shift In) code.
    If the searched pattern is:

        #define SO 0x0e
        #define SI 0x0f

        char pattern[] = { SO, 'X', 'Y', 'Z', SI, '\0' };

    and the string is:

        char string[] = { SO, 'A', 'B', 'C', 'X', 'Y', 'Z', 'U', SI, '\0' };

    the regexec() function will return pmatch information that says:

        pmatch[0].rm_so = 4    (start of matched string)
        pmatch[0].rm_eo = 7    (offset just past the matched string)
        pmatch[1].rm_so = -1
        pmatch[1].rm_eo = -1

    In this case, however, a naive program will treat the matched string as
    { 'X', 'Y', 'Z' } in the INITIAL SHIFT STATE, not in ANOTHER SHIFT STATE,
    because the returned position information does not contain any
    shift-state information.

Action:
Define wchar_t versions of the regcomp() and regexec() functions that take a
string argument of type (wchar_t *) rather than (char *). Because a wchar_t
string carries no state-dependent information, this problem does not arise.
Such functions are also useful for programs that handle all character and
string data in the wchar_t type instead of the char type.
_______________________________________________________________________________
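Problem (2) of comment 57 can be made concrete with a short C sketch. Since a
regmatch_t offset carries no shift state, a program that wants to interpret
the matched bytes correctly has to rescan the string from its beginning to
find the state in effect at rm_so. The sketch assumes the two-state SO/SI
encoding of the example; the names shift_state_at, INITIAL_STATE and
OTHER_STATE are hypothetical.

```c
#include <stddef.h>

/* Shift codes of the two-state encoding assumed in the example. */
#define SO 0x0e   /* Shift Out: switch to the other shift state    */
#define SI 0x0f   /* Shift In: return to the initial shift state    */

/* Hypothetical names for the two shift states of the example encoding. */
enum shift_state { INITIAL_STATE = 0, OTHER_STATE = 1 };

/*
 * shift_state_at (hypothetical): return the shift state in effect at byte
 * offset `off` of the null-terminated string `s`. A byte offset alone says
 * nothing about the state, so every shift code before `off` must be
 * rescanned from the start of the string.
 */
enum shift_state shift_state_at(const char *s, size_t off)
{
    enum shift_state st = INITIAL_STATE;

    for (size_t i = 0; i < off && s[i] != '\0'; i++) {
        if (s[i] == SO)
            st = OTHER_STATE;
        else if (s[i] == SI)
            st = INITIAL_STATE;
    }
    return st;
}
```

Applied to the example string with pmatch[0].rm_so == 4, shift_state_at()
returns OTHER_STATE, so a careful program would have to surround the bytes
'X', 'Y', 'Z' with SO and SI before using them on their own. This rescan costs
time proportional to the match offset for every match; with wchar_t strings,
which carry no shift state, it would be unnecessary.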