ISO/IEC JTC1/SC22/WG15 N226
(WG15RIN SRTN 8)

Japanese Concerns about CD 9945-2.2 and related issues.

SSI/POSIX WG
IPSJ/ITSCJ, JAPAN
November 5, 1991

1. Japan strongly supports the POSIX.2 approach to character handling,
   in the sense that "a character is a character, not a byte", which
   means that a character is in general a multibyte character.

2. Japan believes that the most important goal of I18N is to achieve a
   kind of "character independency", taking the following aspects into
   account.

   - Character counts != byte counts
   - Character counts != display width
   - Byte counts != display width
   - Only the wchar_t type in C (a wide character) corresponds to the
     concept of a character.

3. Although Japan has only just started to review POSIX.2 (CD 9945-2.2)
   and has not completed the work, the following are its major concerns
   about the I18N features of POSIX.2:

   (1) Shift encoding

       It is unclear whether POSIX.2 has undocumented assumptions
       and/or restrictions about encoding schemes, including shift
       encoding. The following are possible interpretations.

       (a) Shift encoding is out of scope.
       (b) Shift encoding is allowed, but it is an
           implementation-defined feature.
       (c) Support for shift encoding is an open issue, and it would be
           considered in a future draft.

       Japan feels that such assumptions/restrictions, if any, should
       be clearly stated in POSIX.2.

   (2) Width of a character/string

       There are several utilities (e.g. fold) in POSIX.2 and POSIX.2a
       which are sensitive to character width. The definition of
       "column" seems appropriate for so-called character cell
       terminals. However, there is no suitable way to specify or
       identify the width of a character either in POSIX.2 (LC_CTYPE
       or charmap file) or in POSIX.1.

       Japan feels that some specification should be developed in a
       future version, after sufficient discussion within WG15/RIN.
   (3) Wide character support and the ISO/C MSE

       As stated earlier, Japan firmly believes that wide character
       support is essential to implement the POSIX.2 specifications of
       "character" handling. Such wide character interfaces are useful
       not only for POSIX.2 implementations, but also for all POSIX
       applications which really aim at world-wide portability from
       country to country.

       Japan thinks the following approach is the most convenient:

       o Add the ISO/C MSE features in one of the near-future
         POSIX.2/POSIX.1 extensions.
       o Re-examine all APIs which handle "character" (not byte)
         streams/text from a wide character point of view.
         In particular, the following functions in POSIX.2 are
         candidates for such enhancement, as of now.

         - regcomp
         - regex

   (4) Additional character classes

       Japan feels that there is a need to provide a suitable interface
       to handle various character classes which depend on national
       languages and culture-specific presentation. In conjunction
       with the new function is_wctype() in the ISO/C MSE, some
       enhancement of the LC_CTYPE description file should be
       considered for "user/implementation definable character
       classes".

========================== Memo of N226 Discussion ===========================

4.a SRTN8, Japanese concerns re CD 9945-2

- Japan would like to make this document visible to other countries.
- Need to assign a number, although Japan plans to expand the document
  and deliver a more detailed response before the end of the year.

Japan needs to handle multiple char sets simultaneously, per ISO 2022;
data files often contain various escape sequences which indicate which
char set the data that follows is in. Discussion of these requirements
in relation to the nature of LC_CTYPE:

- Hal indicated that he did not feel that LC_CTYPE would prevent
  interpretation of command line args consistent with Japanese needs.
- Item 3 on Pg2 of the comments really deals with 9945-1 features?
Japan has difficulty dealing with wide char data with traditional
Lib C; would like to see wide char handling capabilities in .2
utilities, both for functionality and as an example of wide char
handling for programmers.

Japan is not sure whether it would be more appropriate to include wide
char (ISO C/MSE) features in .1 or .2; .1a might be the appropriate
place to include these extensions. (Although it might be feasible to
include them in the LIS spec, WG15 has told the US body that the LIS
MUST be the same as the 1990 standard, thus no extensions could be
included.)

Hal suggested that these comments be included in the Japanese ballot,
so that they would be on record officially, and the US could deal with
them as work on .1 AND .2 proceeds. A Japanese "Yes" vote with this
comment, creating a WG15 issue, would allow Hal to insist that
extensions be included in .2b (and .1a).

RIN SRTN8 should be registered to raise visibility to other national
bodies; when the Japanese ballot comments on CD2.2 arrive, add items to
the issue log for resolution prior to final standard approval.
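
The three inequalities in item 2 of the Japanese comments (character
counts, byte counts, display width) can be made concrete with a small C
sketch. It assumes EUC-JP, one common Japanese multibyte encoding that
N226 itself does not name, and the usual character-cell convention
that half-width characters occupy one column and full-width characters
two. The names count_eucjp and struct text_counts are hypothetical, and
the sketch avoids wchar_t and the locale facilities so that it stands
alone; it is an illustration under these assumptions, not a proposed
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Three different "lengths" of the same text. */
struct text_counts {
    size_t bytes;    /* storage size, excluding the terminating NUL */
    size_t chars;    /* number of characters                        */
    size_t columns;  /* display width on a character-cell terminal  */
};

/*
 * Assumptions (not taken from N226): the input is well-formed EUC-JP.
 *   0x00-0x7F           ASCII                1 byte   1 column
 *   0x8E + 1 byte       half-width katakana  2 bytes  1 column
 *   0x8F + 2 bytes      JIS X 0212           3 bytes  2 columns
 *   0xA1-0xFE + 1 byte  JIS X 0208           2 bytes  2 columns
 * Any other lead byte is invalid EUC-JP; this sketch does not validate
 * it and simply treats it like a two-byte lead.
 */
struct text_counts count_eucjp(const char *s)
{
    struct text_counts c = {0, 0, 0};
    const unsigned char *p;

    assert(s != NULL);
    p = (const unsigned char *)s;

    while (*p != '\0') {
        size_t len, width, i;

        if (*p < 0x80)       { len = 1; width = 1; }
        else if (*p == 0x8E) { len = 2; width = 1; }
        else if (*p == 0x8F) { len = 3; width = 2; }
        else                 { len = 2; width = 2; }

        /* Stop at a truncated sequence instead of reading past the NUL. */
        for (i = 1; i < len; i++) {
            if (p[i] == '\0')
                return c;
        }

        c.bytes   += len;
        c.chars   += 1;
        c.columns += width;
        p += len;
    }
    return c;
}
```

For the five-byte string "A\xB4\xC1\x8E\xB1" (an ASCII letter, one
two-byte JIS X 0208 character, and one half-width katakana),
count_eucjp reports 5 bytes, 3 characters and 4 columns, so no two of
the three counts agree. A utility such as fold therefore cannot derive
column positions from either byte or character counts alone.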