From ynk@ome Tue Nov 12 13:19:17 1991
Received: from mcsun.EU.net by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
        id AA23832; Tue, 12 Nov 91 13:19:17 +0100
Received: by mcsun.EU.net via EUnet; id AA25476 (5.65a/CWI-2.123);
        Tue, 12 Nov 1991 13:19:26 +0100
Received: by kddlab.kddlabs.co.jp (5.61/6.2Junet)
        id AA24702; Tue, 12 Nov 91 20:46:02 +0900
Received: by tis1.tis.toshiba.co.jp (3.2/6.4J.6-R47)
        id AA26390; Tue, 12 Nov 91 20:36:51 JST
Return-Path:
Message-Id: <9111121136.AA26390@tis1.tis.toshiba.co.jp>
To: wg15@dkuug.dk, wg14@dkuug.dk
Cc: x3j16-intl@redbone.att.com
From: ynk@ome.toshiba.co.jp (Yasushi Nakahara)
Subject: WG15/N226: Japanese concerns about CD 9945-2.2 and MSE
Date: Tue Nov 12 17:45:50 JST 1991
X-Charset: ASCII
X-Char-Esc: 29

Hi all WG15 people and related I18N experts,

On Keld's request at the Kista meeting regarding on-line document
circulation, I'm sending the SC22/WG15 N226; "Japanese concerns about
the CD 9945-2.2 (POSIX.2) and related issues", which was submitted at
the WG15 November meeting by the Japanese member body based upon its
WG's discussion in Japan.

In order to help readers to understand the Japanese concerns and the
discussion made at the WG15 (subgroup, not as WG15RIN, but quite
similar) meeting in Kista, Stockholm, I'm also enclosing a part of the
raw draft minutes which Ralph Barker kindly took and offered for a
later review and comment.

One of the most important issues is about incorporation of, and
harmonization with, the "MSE for ISO/C" in the POSIX environment.
See 3. (3) in N226 and the corresponding discussion in the memo 4.a
for more details.

Please read through the attached two materials carefully and send your
comments to wg15rin@dkuug.dk or posix@u-tokyo.ac.jp (Japanese POSIX WG
alias).

Regards,

Yasushi Nakahara
TOSHIBA Corp.
Phone: +81 428-32-0722
Fax:   +81 428-32-0408
Email: ynk@ome.toshiba.co.jp | y.nakahara@ui.org |
       y.nakahara@xopen.co.uk | ..!tsbome!ynk

p.s.
Keld, I'm sending this to wg15@dkuug.dk rather than wg15rin@dkuug.dk.
So, if some people in WG15RIN are missing from the SC22WG15 mailing
list, please forward this to them. Thank you in advance, again.

========================== SC22/WG15 N226 ===================================

                                        ISO/IEC JTC1/SC22/WG15 N226
                                        (WG15RIN SRTN 8)

        Japanese Concerns about CD 9945-2.2 and related issues.

                                        SSI/POSIX WG
                                        IPSJ/ITSCJ, JAPAN
                                        November 5, 1991

1. Japan strongly supports the POSIX.2 approach to character handling,
   in the sense that "a character is a character, not a byte", which
   means that a character is, in general, a multibyte character.

2. Japan believes that the most important goal of I18N is to achieve a
   kind of "character independence", taking into account the following
   aspects.

   - Character counts != byte counts
   - Character counts != display width
   - Byte counts != display width
   - Only the wchar_t type in C (a wide character) corresponds to the
     concept of a character.

3. Although Japan has only just started to review POSIX.2
   (CD 9945-2.2) and has not completed the work, the following are its
   major concerns about the I18N features of POSIX.2:

   (1) Shift encoding

       It is unclear whether POSIX.2 makes undocumented assumptions
       about, and/or places restrictions on, encoding schemes,
       including shift encoding. The following are possible
       interpretations.

       (a) Shift encoding is out of scope.
       (b) Shift encoding is allowed, but it is an
           implementation-defined feature.
       (c) Support for shift encoding is an open issue, and it would
           be considered in a future draft.

       Japan feels that such assumptions/restrictions, if any, should
       be clearly stated in POSIX.2.

   (2) Width of a character/string

       There are several utilities (e.g. fold) in POSIX.2 and POSIX.2a
       which are sensitive to character width. And the definition of
       "column" seems appropriate for so-called character cell
       terminals.
       However, there is no suitable way to specify or identify the
       width of a character either in POSIX.2 (LC_CTYPE or the charmap
       file) or in POSIX.1. Japan feels that some specification should
       be developed in a future version, after sufficient discussion
       within WG15/RIN.

   (3) Wide character support and the ISO/C MSE

       As stated earlier, Japan firmly believes that wide character
       support is essential for implementing the POSIX.2
       specifications of "character" handling. Such wide character
       interfaces are useful not only for implementing POSIX.2, but
       also for all POSIX applications that really aim to be portable
       world-wide, from country to country.

       Japan thinks the following approach is the most convenient:

       o Add the ISO/C MSE features in one of the near-future
         POSIX.2/POSIX.1 extensions.

       o Re-examine all APIs which handle "character" (not byte)
         stream/text from a wide character point of view. In
         particular, the following functions in POSIX.2 are candidates
         for such enhancement, as of now.

         - regcomp
         - regex

   (4) Additional character classes

       Japan feels that there is a need for a suitable interface to
       handle the various character classes that depend on particular
       national languages and culture-specific presentation. In
       conjunction with the new function is_wctype() in the ISO/C MSE,
       some enhancement of the LC_CTYPE description file should be
       considered for "user/implementation definable character
       classes".

========================== Memo of N226 Discussion ===========================

4.a SRTN8, Japanese concerns re CD 9945-2

   - Japan would like to make this document visible to other countries
   - need to assign number, although Japan plans to expand the
     document and deliver more detailed response before the end of
     the year.

   Japan needs to handle multiple char sets simultaneously, per
   ISO 2022; data files often contain various escape sequences which
   indicate which char set data follows; discussion of these
   requirements in relation to nature of LC_CTYPE:

   - Hal indicated that he did not feel that LC_CTYPE would prevent
     interpretation of command line args consistent with Japanese
     needs
   - item 3 on Pg2 of comments really deal with 9945-1 features?

   Japan has difficulty dealing with wide char data with traditional
   Lib C; would like to see wide char handling capabilities in .2
   utilities, both for functionality and as an example of wide char
   handling for programmers.

   Japan is not sure whether it would be more appropriate to include
   wide char (ISO C/MSE) features in .1 or .2; .1a might be the
   appropriate place to include these extensions. (Although it might
   be feasible to include in the LIS spec, WG15 has told US body that
   LIS MUST be the same as the 1990 standard, thus no extensions could
   be included).

   Hal suggested that these comments be included in the Japanese
   ballot, so that they would be on record officially, and the US
   could deal with them as work on .1 AND .2 proceed.

   A Japanese "Yes" vote with this comment, creating a WG15 issue,
   would allow Hal to insist that extensions be included in .2b
   (and .1a)

   RIN SRTN8 should be registered to raise visibility to other
   national bodies, when Japanese ballot comments on CD2.2 arrive, add
   items to issue log for resolution prior to final standard approval.

===============================================================================
EOF
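
Illustration (not part of N226 or of the minutes). Item 2 of N226
treats byte count, character count and display width as three different
quantities, and item 3 (1) raises shift encoding. The sketch below
shows both on one short string. Everything in it is an assumption made
for illustration: the sample text, the choice of EUC-JP, Shift_JIS and
ISO-2022-JP as encodings, the helper names display_width and measure,
and the simple rule that East Asian wide or fullwidth characters take
two columns and all others one. It is written in Python rather than
with the wchar_t interfaces the paper discusses, only to keep it short.

```python
import unicodedata


def display_width(text):
    """Columns on a character-cell terminal.

    Assumed rule (not from N226): East Asian Wide/Fullwidth characters
    occupy 2 columns, everything else 1. Combining and control
    characters are ignored by this sketch.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def measure(text, encoding):
    """Byte count, character count and column count of text in one encoding."""
    return {
        "bytes": len(text.encode(encoding)),
        "chars": len(text),
        "columns": display_width(text),
    }


# Hypothetical sample: "POSIX", two kanji, two halfwidth katakana.
SAMPLE = "POSIX\u6f22\u5b57\uff76\uff85"

if __name__ == "__main__":
    # The same nine characters need a different number of bytes in each
    # encoding, and neither count has to match the column count.
    for encoding in ("euc_jp", "shift_jis"):
        print(encoding, measure(SAMPLE, encoding))

    # ISO-2022-JP is a shift (stateful) encoding: escape sequences switch
    # between character sets, so a byte's meaning depends on the state.
    print(repr("POSIX\u6f22\u5b57".encode("iso2022_jp")))
```

A utility such as fold needs the column count, a buffer allocation
needs the byte count, and a character-oriented pattern match needs the
character count, which is why the three cannot be used interchangeably.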