From noda@lang2.bs1.mt.nec.co.jp Tue Dec 1 07:54:56 1992
Received: from TYO.gate.nec.co.jp ([192.135.93.2]) by dkuug.dk with SMTP
        id AA21998 (5.65c8/IDA-1.4.4j for ); Tue, 1 Dec 1992 07:54:56 +0100
Received: from mailsv.nec.co.jp ([133.200.254.203]) by TYO.gate.nec.co.jp
        (5.65c/6.4J.6-TYO_gate) id AA23028; Tue, 1 Dec 1992 15:54:41 +0900
Received: from ideon.d1.bs2.mt.nec.co.jp ([133.201.52.2]) by mailsv.nec.co.jp
        (5.65c/6.4J.6-NEC_OSDAD) id AA25017; Tue, 1 Dec 1992 15:55:06 +0900
Received: from audi.lang2.bs1.mt.nec.co.jp by ideon.d1.bs2.mt.nec.co.jp
        (4.0/6.4J.6) id AA06118; Tue, 1 Dec 92 15:52:05 JST
Received: by audi.lang2.bs1.mt.nec.co.jp (4.0/6.4J.6)
        id AA00655; Tue, 1 Dec 92 15:53:53 JST
Received: by euphony.lang2.bs1.mt.nec.co.jp (4.1/6.4J.6)
        id AA01130; Tue, 1 Dec 92 15:54:35 JST
Date: Tue, 1 Dec 92 15:54:35 JST
From: noda@lang2.bs1.mt.nec.co.jp (Makoto Noda)
Return-Path:
Message-Id: <9212010654.AA01130@euphony.lang2.bs1.mt.nec.co.jp>
To: xopenj!xopen!g.miller@necbs6.bsd.mt.nec.co.jp
Cc: XoJIG@necbs6.bsd.mt.nec.co.jp, wg15@dkuug.dk, wg20@dkuug.dk
Subject: Re: Latest version of MSE
X-Charset: ASCII
X-Char-Esc: 29

We, the Japanese C Working Group, received the comments on MSE from
Mr. Gary Miller, X/Open i18n WG chair, through the SC22/WG20 mailing
list (SC22WG20.488). Gary-san, thank you very much.

The following is the Japanese response to Gary-san's comments. We feel
that a few points remain to be discussed at the next WG14 meeting in
Washington.

1. Mixed use of narrow- and wide-oriented streams

> In section 2.2.2 States of a stream, the paragraph which reads:
>
>     When one of the byte input/output functions is performed on an
>     initial stream, the stream shall become narrow-oriented. When
>     one of the wide character input/output functions is performed on
>     an initial stream, the stream shall become wide-oriented.
>     If any of the byte input/output functions is performed on a
>     wide-oriented stream or any of the wide input/output functions is
>     performed on a narrow-oriented stream, the behavior is
>     undefined.
>     ^^^^^^^^^
>
> seems to be over specified. The standard should make this behavior
> unspecified instead of undefined.

The Japanese opinion is still that it should be "undefined", because we
think that mixing narrow-oriented and wide-oriented use of a stream
should be prohibited. Anyway, this point seems to be one of the open
issues.

2. Metric of formatted I/O functions

> There is an inconsistency in the way that wide characters are
> handled by the formatted I/O functions:
>
> In section 3.4.2.1 The fwprintf function, the paragraph which
> describes the %s format directive reads:
>
>     The argument shall be a pointer to an array of wchar_t type.
>     Wide characters from the array are converted to multibyte
>     characters and written up to (but not including) a terminating
>     null wide character; if the precision is specified, no more
>     than that many wide characters are converted to multibyte
>     characters and written. If the precision is not specified
>     or is greater than the size of the array, the array shall
>     contain a null wide character.
>
> In section 3.4.2.2 The fwscanf function, the paragraph which
> describes the %s format directive reads:
>
>     Matches a sequence of non-white-space wide characters. The
>     corresponding argument shall be a pointer to the initial wide
>     character of an array large enough to accept the sequence
>     and a terminating null wide character, which will be added
>     automatically.
>
> In section 3.4.3.1 The fprintf function, the paragraph which
> describes the %S format directive reads:
>
>     The argument shall be a pointer to an array of wchar_t type.
>     Wide characters from the array are converted to multibyte
>     characters (each as if by a call to the wctomb function,
>     except that the shift state of the wctomb function is not
>     affected) up to (but not including) the terminating null wide
>     character. The resulting multibyte characters are written. If
>     no precision is specified, the array shall contain a null wide
>     character. If a precision is specified, no more than that many
>     characters (bytes) are written (including shift sequences, if
>     any), and the array shall contain a null wide character if, to
>     equal or to surpass the multibyte character sequence length
>     given by the precision, the function would need to access a
>     wide character one past the end of the array. In no case
>     shall a partial multibyte character be written. Redundant
>     shift sequences may result if multibyte characters have a
>     state-dependent encoding.
>
> In section 3.4.3.2 The fscanf function, the paragraph which
> describes the %S format directive reads:
>
>     Matches a sequence of multibyte characters that begins in the
>     initial shift state none of which are also single byte white
>     space characters (as specified by the isspace function). Each
>     multibyte character is converted as if by a call to the mbtowc
>     function, except that the shift state of the mbtowc function
>     is not affected. The corresponding argument shall be a
>     pointer to an array of wchar_t large enough to accept the
>     sequence and the terminating null wide character, which will
>     be added automatically.
>
> In all of the above cases, with the exception of the fprintf
> function %S format directive with precision specified, wide
> characters are dealt with as complete characters. It is only in
> this one case that the metric deviates from wide characters to
> bytes. The proper metric for precision in this case is number of
> wide characters.
>
> The fprintf %S description should be changed to read:
>
>     The argument shall be a pointer to an array of wchar_t type.
>     Wide characters from the array are converted to multibyte
>     characters (each as if by a call to the wctomb function,
>     except that the shift state of the wctomb function is not
>     affected) up to (but not including) the terminating null
>     wide character. The resulting multibyte characters are
>     written. If a precision is specified, no more than that many
>     wide characters are converted to multibyte characters and
>     written. If the precision is not specified or is greater
>     than the size of the array, the array shall contain a null
>     wide character. Redundant shift sequences may result if
>     multibyte characters have a state-dependent encoding.
>
> In ISO/IEC 9899:1990 subclause 7.9.6.1, the fprintf precision
> description, change
>
>     "or the maximum number of characters to be written from a
>     string in s conversion."
>
> to
>
>     "the maximum number of characters (bytes) to be written from
>     a string in s conversion, or the maximum number of wide
>     characters to be written from a wide character string in S
>     conversion."

A programming model that uses the fprintf and fscanf functions should
be distinguished from the wide-oriented model, so we intentionally
specified that the metric for precision in the %S conversion of the
fprintf function is "byte". Moreover, the metric for the fscanf
function is actually "byte" as well. If it is difficult to read from
the current text that the metric for fprintf is bytes, we would rather
change the description of fprintf.

3. Extended multibyte functions

> In section 3.8 Extended Multibyte Functions, there are function
> descriptions that look a lot like functions that I proposed back
> in December 1989:
>
>     int s_mblen (const char *s, size_t n, state_t *state);
>     int s_mbtowc (wchar_t *pwc, const char *s, size_t n, state_t *state);
>     int s_wctomb (char *s, wchar_t wchar, state_t *state);
>     size_t s_mbstowcs (wchar_t *pwcs, const char *s, size_t n, state_t *state);
>     size_t s_wcstombs (char *s, const wchar_t *pwcs, size_t n, state_t *state);
>
> Would anyone like to comment :-) ?

I remember Gary-san's proposed functions. December 1989 was too early
for us to investigate introducing these basic functions into the
framework of MSE.

If you have comments on such functions, please let us know.

Thank you,

-- makoto noda (nec) phone:+81.3.3456.7446; fax:+81.3.3456.7448
-- e-mail : noda@LANG2.BS1.mt.nec.co.jp
-- xopen : m.noda@xopen.co.uk - or - xopenj!necbs6!noda
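
As an illustration of the byte metric discussed in point 2, here is a
minimal sketch (not part of MSE) of how a precision counted in bytes
limits the number of wide characters written without ever splitting a
multibyte character. The helper name wide_chars_within_bytes is
hypothetical, and the sketch assumes a present-day C library that
provides wcrtomb and mbstate_t, rather than the s_wctomb and state_t
names quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return how many leading wide characters of ws
 * can be converted so that the multibyte output (including any shift
 * sequences produced by wcrtomb) stays within max_bytes, never
 * splitting a multibyte character.  Returns (size_t)-1 if a wide
 * character cannot be represented in the current locale.
 * Simplification: bytes needed to return to the initial shift state
 * at the end are not counted.
 */
size_t wide_chars_within_bytes(const wchar_t *ws, size_t max_bytes)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];
    size_t used = 0;   /* bytes consumed so far */
    size_t count = 0;  /* wide characters accepted so far */

    assert(ws != NULL);
    memset(&state, 0, sizeof state);  /* start in the initial shift state */

    while (ws[count] != L'\0') {
        size_t len = wcrtomb(buf, ws[count], &state);
        if (len == (size_t)-1)
            return (size_t)-1;        /* conversion error */
        if (used + len > max_bytes)
            break;                    /* next character would exceed the limit */
        used += len;
        count++;
    }
    return count;
}
```

In the default "C" locale each of the wide characters in L"wide"
converts to a single byte, so wide_chars_within_bytes(L"wide", 3)
returns 3 and the byte and wide-character metrics coincide. In a locale
with a multibyte encoding, the same byte limit can admit fewer wide
characters, which is exactly the difference between the two metrics.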