From kido@vnet.IBM.COM Mon Mar 15 04:03:14 1993
Received: from vnet.IBM.COM ([192.239.48.4]) by dkuug.dk with SMTP id AA29551
  (5.65c8/IDA-1.4.4j); Mon, 15 Mar 1993 03:01:20 +0100
Message-Id: <199303150201.AA29551@dkuug.dk>
Received: from YMTVM8 by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP id 4020;
  Sun, 14 Mar 93 21:00:37 EST
Date: Mon, 15 Mar 93 11:00:58 JST
From: "Akio Kido"
To: sc22wg15@dkuug.dk, sc22wg20@dkuug.dk, XoJIG@xopen.co.uk,
  sig-international@osf.org, uojlg-bse@uiap.ui.org,
  efischer@donald.aix.kingston.ibm.com
Subject: MSE 4.5.3.3.mm
X-Charset: ASCII
X-Char-Esc: 29

.SK
.H 4 "Formatted input/output functions"
.P
The formatted input/output functions from \*(AC subclause 7.9.6
are adjusted to include two additional conversion specifiers,
.Cf C
and
.Cf S ,
which provide input and output of wide characters and wide strings.
.H 5 "The \*(Cwfprintf\fP function"
.eX 7.9.6.1
.P
Adjust the precision description to include
.Cf S
with
.Cf s
so that it contains the following fragment:
.DS I F 5
or the maximum number of characters (bytes) to be written
from a string in
.Cf s
or
.Cf S
conversion
.DE
.P
Add the following two paragraphs to occur within the description
of the conversion specifiers.
.P
.VL \w'\*(Cwx,Xm'u
.LI "\*(CwC\fP"
The
.Cf wchar_t
argument is processed as if by the
.Cf S
conversion specifier with no precision and an argument that
points to a two-element array of
.Cf wchar_t ,
the first element containing the
.Cf wchar_t
argument to the
.Cf C
conversion specifier and the second a null wide character.
.LI "\*(CwS\fP"
The argument shall be a pointer to an array of
.Cf wchar_t
type.
Wide characters from the array are converted to multibyte characters
(each as if by a call to the
.Cf wcrtomb
function, with the conversion state described by an
.Cf mbstate_t
object initialized to zero before the first wide character
is converted) up to and including a terminating null wide character.
The resulting multibyte characters are written up to
(but not including) the terminating null character (byte).
If no precision is specified, the array shall contain a null wide character.
If a precision is specified, no more than that many characters (bytes)
are written (including shift sequences, if any), and the array shall
contain a null wide character if, to equal or to surpass the multibyte
character sequence length given by the precision, the function would
need to access a wide character one past the end of the array.
In no case shall a partial multibyte character be written.\*F
.FS
Redundant shift sequences may result if multibyte characters
have a state-dependent encoding.
.FE
.LE
.P
The above extension is applicable to all the formatted
output functions specified in \*(AC.
.HU Examples
.P
The examples are adjusted to include the following:
.P
In this example, multibyte characters do not have a state-dependent
encoding, and the multibyte members of the extended character set
consist of two bytes, the first of which is denoted here by a
.Cf \(sq
and the second by an uppercase letter.
.P
Given the following wide string with length seven,
.Cb
static wchar_t wstr[] = L"\z\(sq\0X\z\(sq\0Yabc\z\(sq\0Z\z\(sq\0W";
.Ce
the seven calls
.Cb
fprintf(stdout, "|1234567890123|\en");
fprintf(stdout, "|%13S|\en", wstr);
fprintf(stdout, "|%-13.9S|\en", wstr);
fprintf(stdout, "|%13.10S|\en", wstr);
fprintf(stdout, "|%13.1S|\en", wstr);
fprintf(stdout, "|%13.15S|\en", &wstr[2]);
fprintf(stdout, "|%13C|\en", wstr[5]);
.Ce
will print the following seven lines:
.Cb
|1234567890123|
|  \z\(sq\0X\z\(sq\0Yabc\z\(sq\0Z\z\(sq\0W|
|\z\(sq\0X\z\(sq\0Yabc\z\(sq\0Z    |
|    \z\(sq\0X\z\(sq\0Yabc\z\(sq\0Z|
|             |
|      abc\z\(sq\0Z\z\(sq\0W|
|           \z\(sq\0Z|
.Ce
.H 5 "The \*(Cwfscanf\fP function"
.eX 7.9.6.2
.P
Adjust the definition of input failure to include encoding errors
so that it contains the following fragment:
.DS I F 5
(if an encoding error occurs or due to the unavailability
of input characters)
.DE
.P
Adjust the initial white space character skip description
additionally to include
.Cf C
so that it reads as follows:
.DS I F 5
Input (single byte) white space characters (as specified by the
.Cf isspace
function) are skipped, unless the specification includes a
.Cf [ ,
.Cf c ,
.Cf C ,
or
.Cf n
specifier.
.DE
.P
Add the following two paragraphs to occur within the description
of the conversion specifiers.
.P
.VL \w'\*(Cwx,Xm'u
.LI "\*(CwC\fP"
Matches a sequence of multibyte characters that begins and ends
in the initial shift state.
Each multibyte character in the sequence is converted to a
wide character as if by a call to the
.Cf mbrtowc
function, with the conversion state described by an
.Cf mbstate_t
object initialized to zero before the first multibyte character
is converted.
The number of wide characters matched is specified by the field width
(1 if no field width is present in the directive).
The corresponding argument shall be a pointer to the initial element
of an array of
.Cf wchar_t
large enough to accept the resulting sequence of wide characters.
No null wide character is added.
.LI "\*(CwS\fP"
Matches a sequence of multibyte characters that begins and ends
in the initial shift state.
None of the multibyte characters in the sequence are also
single byte white space characters (as specified by the
.Cf isspace
function).
Each multibyte character is converted to a wide character
as if by a call to the
.Cf mbrtowc
function, with the conversion state described by an
.Cf mbstate_t
object initialized to zero before the first multibyte character
is converted.
The corresponding argument shall be a pointer to the initial element
of an array of
.Cf wchar_t
large enough to accept the sequence and the terminating
null wide character, which shall be added automatically.
.LE
.P
The above extension is applicable to all the formatted
input functions specified in \*(AC.
.HU Examples
.P
The examples are adjusted to include the following:
.P
In these examples, multibyte characters do have a state-dependent
encoding, and multibyte members of the extended character set
consist of two bytes, the first of which is denoted here by a
.Cf \(sq
and the second by an uppercase letter,
but are only recognized as such when in the alternate shift state.
The shift sequences are denoted by
.Cf \(ua
and
.Cf \(da ,
in which the first causes entry into the alternate shift state.
.AL 1
.LI
After the call:
.Cb
#include <stdio.h>
/*...*/
char str[50];
fscanf(stdin, "a%s", str);
.Ce
with the input line:
.Cb
a\z\(ua\0\z\(sq\0X\z\(sq\0Y\z\(da\0 bc
.Ce
.Cf str
will contain
.Cf \z\(ua\0\z\(sq\0X\z\(sq\0Y\z\(da\0\e0
assuming that none of the bytes of the shift sequences
(or of the multibyte characters, in the more general case)
appears to be a single byte white space character.
.LI
In contrast, after the call:
.Cb
#include <stdio.h>
#include <stddef.h>
/*...*/
wchar_t wstr[50];
fscanf(stdin, "a%S", wstr);
.Ce
with the same input line,
.Cf wstr
will contain the two wide characters that correspond to
.Cf \(sqX
and
.Cf \(sqY
and a terminating null wide character.
.LI
However, the call:
.Cb
#include <stdio.h>
#include <stddef.h>
/*...*/
wchar_t wstr[50];
fscanf(stdin, "a\z\(ua\0\z\(sq\0X\z\(da\0%S", wstr);
.Ce
with the same input line will return 0 due to a matching failure
against the
.Cf \(da
sequence in the format string.
.LI
Even worse, assuming that the first byte of the multibyte character
.Cf \(sqX
is the same as the first byte of the multibyte character
.Cf \(sqY ,
after the call:
.Cb
#include <stdio.h>
#include <stddef.h>
/*...*/
wchar_t wstr[50];
fscanf(stdin, "a\z\(ua\0\z\(sq\0Y\z\(da\0%S", wstr);
.Ce
with the same input line, 0 will again be returned, but
.Cf stdin
will be left with a partially consumed multibyte character.
.LE
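.P
The output rules for the S conversion specifier (byte-counted precision, byte-counted field width, and no partial multibyte character) can be illustrated with a minimal sketch in C.
The helper names wide_to_bytes and format_wide_field are hypothetical and are not part of this proposal.
The sketch assumes an encoding that is not state-dependent, so it does not emit the closing shift sequence that converting the terminating null wide character could otherwise produce.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (an illustration, not the proposal's interface).
 * Converts ws to multibyte form in out, storing at most `precision`
 * bytes and never a partial multibyte character. A negative precision
 * means "no precision given". Returns the number of bytes stored, not
 * counting the terminating null byte, or (size_t)-1 on an encoding error.
 * Assumption: out is large enough for the converted bytes plus one.
 */
size_t wide_to_bytes(char *out, const wchar_t *ws, int precision)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];
    size_t total = 0;

    assert(out != NULL && ws != NULL);
    memset(&st, 0, sizeof st);      /* conversion state starts at zero */

    for (;;) {
        size_t n;

        /* Precision already met: do not read another wide character. */
        if (precision >= 0 && total >= (size_t)precision)
            break;
        if (*ws == L'\0')
            break;

        n = wcrtomb(buf, *ws, &st);
        if (n == (size_t)-1)
            return (size_t)-1;      /* encoding error */

        /* Stop rather than write part of a multibyte character. */
        if (precision >= 0 && total + n > (size_t)precision)
            break;

        memcpy(out + total, buf, n);
        total += n;
        ++ws;
    }
    out[total] = '\0';
    return total;
}

/*
 * Hypothetical helper: applies a field width, counted in bytes, to the
 * result of wide_to_bytes, padding with spaces on the left (or on the
 * right when left_justify is nonzero). Returns the total number of
 * bytes stored, or (size_t)-1 on an encoding error.
 * Assumption: out has room for max(width, converted length) + 1 bytes.
 */
size_t format_wide_field(char *out, const wchar_t *ws,
                         int width, int precision, int left_justify)
{
    size_t n = wide_to_bytes(out, ws, precision);

    if (n == (size_t)-1)
        return n;
    if (width > 0 && n < (size_t)width) {
        size_t pad = (size_t)width - n;

        if (left_justify) {
            memset(out + n, ' ', pad);
        } else {
            memmove(out + pad, out, n);
            memset(out, ' ', pad);
        }
        n = (size_t)width;
        out[n] = '\0';
    }
    return n;
}
```

Under these assumptions, calling format_wide_field with the wide string from the fprintf examples, a width of 13 and a precision of 10 would store nine bytes of character data preceded by four spaces, since the tenth byte would begin a two-byte character that the precision does not allow to be completed.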