From kido@vnet.IBM.COM Mon Mar 15 05:04:18 1993
Received: from vnet.IBM.COM ([192.239.48.4]) by dkuug.dk with SMTP id AA00572
  (5.65c8/IDA-1.4.4j); Mon, 15 Mar 1993 03:09:28 +0100
Message-Id: <199303150209.AA00572@dkuug.dk>
Received: from YMTVM8 by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP id 4040;
  Sun, 14 Mar 93 21:08:47 EST
Date: Mon, 15 Mar 93 11:09:02 JST
From: "Akio Kido"
To: sc22wg15@dkuug.dk, sc22wg20@dkuug.dk, XoJIG@xopen.co.uk,
  sig-international@osf.org, uojlg-bse@uiap.ui.org,
  efischer@donald.aix.kingston.ibm.com
Subject: MSE 4.5.6.mm
X-Charset: ASCII
X-Char-Esc: 29

.SK
.H 3 "Extended multibyte and wide character conversion utilities"
.P
The header
.Cf
declares an extended set of functions useful for conversion between
multibyte characters and wide characters.
.P
Most of the following functions\(em\c
those that are listed as ``restartable,'' subclauses 4.5.6.3 and 4.5.6.4\(em\c
take as a last argument a pointer to an object of type
.Cf mbstate_t
that is used to describe the current
.I "conversion state"
from a particular multibyte character sequence to a wide character sequence
(or the reverse) under the rules of a particular setting for the
.Cf LC_CTYPE
category of the current locale.
.P
The initial conversion state corresponds to the initial shift state of the
associated multibyte sequence; a zero-valued
.Cf mbstate_t
object describes an initial conversion state.\*F\ 
.FS
The easiest way to construct a zero-valued
.Cf mbstate_t
object is to initialize it when declared:
.Fb
mbstate_t desc = { 0 };
.Fe
.P 0
Another is to make use of a constant zero-valued object when a reset is
appropriate:
.Fb
static const mbstate_t init = { 0 };
/*...*/
desc = init; /* \fIreset to initial\fP */
.Fe
.FE
Zero-valued
.Cf mbstate_t
objects are not yet bound to a particular multibyte character sequence or
.Cf LC_CTYPE
category setting; if an
.Cf mbstate_t
object with a nonzero value is used with a different multibyte character
sequence (or in the other conversion direction) or is used with a different
.Cf LC_CTYPE
category setting than on earlier function calls, the behavior is undefined.\*F
.FS
Thus a particular
.Cf mbstate_t
object can be used, for example, with both the
.Cf mbrtowc
and
.Cf mbsrtowcs
functions as long as they are used to step sequentially through the same
multibyte character string.
.FE
.H 4 "Single-byte wide character conversion utilities"
.H 5 "The \*(Cwwctob\fP function"
.HU Synopsis
.Cb
#include
#include
int wctob(wint_t c);
.Ce
.HU Description
.P
The
.Cf wctob
function determines whether
.Cf c
corresponds to a member of the extended character set whose multibyte
character representation is as a single byte when in the initial shift state.
.HU Returns
.P
The
.Cf wctob
function returns
.Cf EOF
if
.Cf c
does not correspond to a multibyte character with length one; otherwise,
it returns the single byte representation.
.H 4 "Conversion state utilities"
.H 5 "The \*(Cwsisinit\fP function"
.HU Synopsis
.Cb
#include
int sisinit(const mbstate_t *ps);
.Ce
.HU Description
.P
If
.Cf ps
is not a null pointer, the
.Cf sisinit
function determines whether the pointed-to
.Cf mbstate_t
object describes an initial conversion state.
.HU Returns
.P
The
.Cf sisinit
function returns nonzero if
.Cf ps
is a null pointer or if the pointed-to object describes an initial
conversion state; otherwise, it returns zero.
.H 4 "Restartable multibyte/wide character conversion utilities"
.P
These functions differ from the corresponding internal-state multibyte
character functions of \*(AC subclause 7.10.7
.Cs ( mblen ,
.Cf mbtowc ,
and
.Cf wctomb )
in that they have an extra parameter,
.Cf ps ,
of type pointer to
.Cf mbstate_t
that points to an object that can completely describe the current conversion
state of the associated multibyte character sequence.
If
.Cf ps
is a null pointer, each function uses its own internal
.Cf mbstate_t
object instead.
The implementation shall behave as if no library function calls these
functions with a null pointer for
.Cf ps .
.P
Also unlike their corresponding functions, the return value does not
represent whether the encoding is state-dependent.
.P
If the encoding is state-dependent, on entry each function takes the described
conversion state (either internal or pointed to by
.Cf ps )
as current.
The conversion state described by the pointed-to object is altered as needed
to track the shift state of the associated multibyte character sequence.
For encodings without state dependency, the pointer to
.Cf mbstate_t
parameter shall be ignored.
.H 5 "The \*(Cwmbrlen\fP function"
.HU Synopsis
.Cb
#include
int mbrlen(const char *s, size_t n, mbstate_t *ps);
.Ce
.HU Description
.P
The
.Cf mbrlen
function is equivalent to the following call:
.Cb
mbrtowc((wchar_t *)0, s, n, ps != 0 ? ps : &\fIinternal\fP)
.Ce
where
.I
.Cf & internal
.R
is the address of the internal
.Cf mbstate_t
object for the
.Cf mbrlen
function.
.HU Returns
.P
The
.Cf mbrlen
function returns a value between \-2 and
.Cf n ,
inclusive.
.rF
the
.Cf mbrtowc
function (4.5.6.3.2).
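.P
As an illustration only, the following sketch steps through a string one
multibyte character at a time with mbrlen and a caller-supplied conversion
state. It assumes the function as it appears in today's standard C library
header <wchar.h>, where the return type is size_t and the \-1 and \-2 results
are reported as (size_t)-1 and (size_t)-2, rather than the int-returning
draft interface above. The helper name count_mb_chars is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the multibyte characters in the
   null-terminated string s under the current LC_CTYPE setting.
   Returns -1 on an encoding error or an incomplete trailing character. */
long count_mb_chars(const char *s)
{
    mbstate_t st;
    size_t left, r;
    long count = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);   /* zero-valued: initial conversion state */
    left = strlen(s);
    while (left > 0) {
        r = mbrlen(s, left, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return -1;           /* invalid or incomplete sequence */
        if (r == 0)
            break;               /* null character reached */
        s += r;                  /* skip the bytes just examined */
        left -= r;
        count++;
    }
    return count;
}
```

Because the state object belongs to the caller, two strings can be scanned
in an interleaved fashion by giving each its own mbstate_t object.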
.H 5 "The \*(Cwmbrtowc\fP function"
.HU Synopsis
.Cb
#include
int mbrtowc(wchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
.Ce
.HU Description
.P
If
.Cf s
is a null pointer, the
.Cf mbrtowc
function determines the number of bytes necessary to enter the initial shift
state (zero if encodings are not state-dependent or if the initial conversion
state is described).
In this case, the value of the
.Cf pwc
parameter shall be ignored, and the resulting state described shall be the
initial conversion state.
.P
If
.Cf s
is not a null pointer, the
.Cf mbrtowc
function determines the number of bytes that are contained in the multibyte
character (plus any leading shift sequences) pointed to by
.Cf s ,
produces the value of the corresponding wide character and then, if
.Cf pwc
is not a null pointer, stores that value in the object pointed to by
.Cf pwc .
If the corresponding wide character is the null wide character, the resulting
state described shall be the initial conversion state.
.HU Returns
.P
If
.Cf s
is a null pointer, the
.Cf mbrtowc
function returns the number of bytes necessary to enter the initial shift
state.
The value returned shall not be greater than that of the
.Cf MB_CUR_MAX
macro.
.P
If
.Cf s
is not a null pointer, the
.Cf mbrtowc
function shall return the first of the following that applies:
.VL \w'\fIpositive\fRm'u
.LI 0
if the next
.Cf n
or fewer bytes form the multibyte character that corresponds to the null
wide character.
.LI \fIpositive\fR
if the next
.Cf n
or fewer bytes form a valid multibyte character; the value returned is the
number of bytes that constitute that multibyte character.
.LI \-2
if the next
.Cf n
bytes form an incomplete (but potentially valid) multibyte character, and all
.Cf n
bytes have been processed; it is unspecified whether this can occur when
the value of
.Cf n
is less than that of the
.Cf MB_CUR_MAX
macro.\*F
.FS
When
.Cf n
has at least the value of the
.Cf MB_CUR_MAX
macro, this case can only occur if
.Cf s
points at (too many) adjacent shift sequences (for implementations with
state-dependent encodings).
.FE
.LI \-1
if an encoding error occurs (when the next
.Cf n
or fewer bytes do not form a complete and valid multibyte character);
the value of the macro
.Cf EILSEQ
shall be stored in
.Cf errno ,
but the conversion state shall be unchanged.
.LE
.H 5 "The \*(Cwwcrtomb\fP function"
.HU Synopsis
.Cb
#include
int wcrtomb(char *s, wchar_t wc, mbstate_t *ps);
.Ce
.HU Description
.P
If
.Cf s
is a null pointer, the
.Cf wcrtomb
function determines the number of bytes necessary to enter the initial shift
state (zero if encodings are not state-dependent or if the initial conversion
state is described).
The resulting state described shall be the initial conversion state.
.P
If
.Cf s
is not a null pointer, the
.Cf wcrtomb
function determines the number of bytes needed to represent the multibyte
character that corresponds to the wide character given by
.Cf wc
(including any shift sequences), and stores the resulting bytes in the array
whose first element is pointed to by
.Cf s .
At most
.Cf MB_CUR_MAX
bytes shall be stored.
If
.Cf wc
is a null wide character, the resulting state described shall be the initial
conversion state.
.HU Returns
.P
If
.Cf s
is a null pointer, the
.Cf wcrtomb
function returns the number of bytes necessary to enter the initial shift
state.
The value returned shall not be greater than that of the
.Cf MB_CUR_MAX
macro.
.P
If
.Cf s
is not a null pointer, the
.Cf wcrtomb
function shall return the number of bytes stored in the array object
(including any shift sequences) when
.Cf wc
is a valid wide character; otherwise (when
.Cf wc
is not a valid wide character), an encoding error occurs, the value of the
macro
.Cf EILSEQ
shall be stored in
.Cf errno
and \-1 shall be returned, but the conversion state shall be unchanged.
.H 4 "Restartable multibyte/wide string conversion utilities"
.P
These functions differ from the corresponding internal-state multibyte
string functions of \*(AC subclause 7.10.8
.Cs ( mbstowcs
and
.Cf wcstombs )
in that they have an extra parameter,
.Cf ps ,
of type pointer to
.Cf mbstate_t
that points to an object that can completely describe the current conversion
state of the associated multibyte character sequence.
If
.Cf ps
is a null pointer, each function uses its own internal
.Cf mbstate_t
object instead.
The implementation shall behave as if no library function calls these
functions with a null pointer for
.Cf ps .
.P
Also unlike their corresponding functions, the conversion source parameter,
.Cf src ,
has a pointer-to-pointer type.
When the function is storing the conversion results (that is, when
.Cf dst
is not a null pointer), the pointer object pointed to by this parameter
shall be updated to reflect the amount of the source processed by that
invocation.
.P
If the encoding is state-dependent, on entry each function takes the described
conversion state (either internal or pointed to by
.Cf ps )
as current and then, if the destination pointer,
.Cf dst ,
is not a null pointer, the conversion state described by the pointed-to
object is altered as needed to track the shift state of the associated
multibyte character sequence.
For encodings without state dependency, the pointer to
.Cf mbstate_t
parameter shall be ignored.
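.P
To illustrate the pointer-to-pointer source parameter, the sketch below
converts a multibyte string a few wide characters at a time with the
mbsrtowcs function specified in the next subclause. It assumes the function
as it appears in today's standard C library header <wchar.h>; the helper
name convert_in_chunks and the chunk size of 4 are hypothetical. Only the
returned count is relied upon, so the sketch does not depend on whether a
terminating null wide character is stored.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts mbs in chunks of at most 4 wide
   characters, copying up to outlen of them into out.
   Returns the number of wide characters stored in out,
   or (size_t)-1 on an encoding error. */
size_t convert_in_chunks(const char *mbs, wchar_t *out, size_t outlen)
{
    wchar_t chunk[4];
    mbstate_t st;
    const char *src = mbs;
    size_t total = 0, n, i;

    assert(mbs != NULL && out != NULL);
    memset(&st, 0, sizeof st);   /* start in the initial conversion state */
    while (src != NULL) {
        n = mbsrtowcs(chunk, &src, 4, &st);
        if (n == (size_t)-1)
            return (size_t)-1;
        for (i = 0; i < n && total < outlen; i++)
            out[total++] = chunk[i];
        /* src now points just past the last character converted,
           or is a null pointer once the terminating null was reached */
    }
    return total;
}
```

The same mbstate_t object is passed to every call, so a shift state left
pending at the end of one chunk carries over to the next.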
.H 5 "The \*(Cwmbsrtowcs\fP function"
.HU Synopsis
.Cb
#include
size_t mbsrtowcs(wchar_t *dst, const char **src, size_t len, mbstate_t *ps);
.Ce
.HU Description
.P
The
.Cf mbsrtowcs
function converts a sequence of multibyte characters that begins in the shift
state described by
.Cf ps
from the array indirectly pointed to by
.Cf src
into a sequence of corresponding wide characters, which, if
.Cf dst
is not a null pointer, are then stored into the array pointed to by
.Cf dst .
Conversion continues up to and including a terminating null character, but
the terminating null wide character shall not be stored.
Conversion shall stop earlier in two cases: when a sequence of bytes is
reached that does not form a valid multibyte character, or (if
.Cf dst
is not a null pointer) when
.Cf len
codes have been stored into the array pointed to by
.Cf dst .\*F\ 
.FS
Thus, the value of
.Cf len
is ignored if
.Cf dst
is a null pointer.
.FE
Each conversion takes place as if by a call to the
.Cf mbrtowc
function.
.P
If
.Cf dst
is not a null pointer, the pointer object pointed to by
.Cf src
shall be assigned either a null pointer (if conversion stopped due to reaching
a terminating null character) or the address just past the last multibyte
character converted.
If conversion stopped due to reaching a terminating null character and if
.Cf dst
is not a null pointer, the resulting state described shall be the initial
conversion state.
.HU Returns
.P
If the input string does not begin with a valid multibyte character, an
encoding error occurs: The
.Cf mbsrtowcs
function stores the value of the macro
.Cf EILSEQ
in
.Cf errno
and returns
.Cf (size_t)-1 ,
but the conversion state shall be unchanged.
Otherwise, it returns the number of multibyte characters successfully
converted, which is the same as the number of array elements modified when
.Cf dst
is not a null pointer.
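.P
One way the null-destination case might be used is to measure a string
before allocating storage for it. The sketch below is an assumption-laden
illustration, not part of the specification: it uses mbsrtowcs as it
appears in today's standard C library header <wchar.h>, and the helper name
mb_to_wide is hypothetical. The result is terminated explicitly, so the
sketch does not rely on the function storing a terminating null wide
character.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns a newly allocated wide string converted
   from mbs, or a null pointer on an encoding error or allocation
   failure. The caller frees the result. */
wchar_t *mb_to_wide(const char *mbs)
{
    mbstate_t st;
    const char *src;
    wchar_t *w;
    size_t n;

    assert(mbs != NULL);
    memset(&st, 0, sizeof st);
    src = mbs;
    n = mbsrtowcs(NULL, &src, 0, &st);  /* dst null: count only, len ignored */
    if (n == (size_t)-1)
        return NULL;
    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    memset(&st, 0, sizeof st);          /* second pass starts afresh */
    src = mbs;
    if (mbsrtowcs(w, &src, n, &st) == (size_t)-1) {
        free(w);
        return NULL;
    }
    w[n] = 0;                           /* terminate explicitly */
    return w;
}
```

The conversion state is reset between the two passes because both begin at
the start of the same multibyte string.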
.H 5 "The \*(Cwwcsrtombs\fP function"
.HU Synopsis
.Cb
#include
size_t wcsrtombs(char *dst, const wchar_t **src, size_t len, mbstate_t *ps);
.Ce
.HU Description
.P
The
.Cf wcsrtombs
function converts a sequence of wide characters from the array indirectly
pointed to by
.Cf src
into a sequence of corresponding multibyte characters that begins in the
shift state described by
.Cf ps ,
which, if
.Cf dst
is not a null pointer, are then stored into the array pointed to by
.Cf dst .
Conversion continues up to and including a terminating null wide character,
but the terminating null character (byte) shall not be stored.
Conversion shall stop earlier in two cases: when a code is reached that does
not correspond to a valid multibyte character, or (if
.Cf dst
is not a null pointer) when the next multibyte character would exceed the
limit of
.Cf len
total bytes to be stored into the array pointed to by
.Cf dst .
Each conversion takes place as if by a call to the
.Cf wcrtomb
function.
.P
If
.Cf dst
is not a null pointer, the pointer object pointed to by
.Cf src
shall be assigned either a null pointer (if conversion stopped due to reaching
a terminating null wide character) or the address just past the last wide
character converted.
If conversion stopped due to reaching a terminating null wide character and if
.Cf dst
is not a null pointer, the resulting state described shall be the initial
conversion state.
.HU Returns
.P
If the first code is not a valid wide character, an encoding error occurs: The
.Cf wcsrtombs
function stores the value of the macro
.Cf EILSEQ
in
.Cf errno
and returns
.Cf (size_t)-1 ,
but the conversion state shall be unchanged.
Otherwise, it returns the number of bytes in the resulting multibyte
character sequence, which is the same as the number of array elements
modified when
.Cf dst
is not a null pointer.
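.P
As a rough illustration of the byte limit and of the updated source pointer,
the following sketch converts a wide string into a fixed-size buffer and
reports whether anything was left unconverted. It assumes wcsrtombs as it
appears in today's standard C library header <wchar.h>; the helper name
wide_to_mb and its truncated parameter are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts ws into buf, which holds cap bytes
   (cap > 0), and always null-terminates buf. Returns the number of
   bytes written, not counting the terminator, or (size_t)-1 on an
   encoding error. *truncated is set nonzero if part of ws did not fit. */
size_t wide_to_mb(const wchar_t *ws, char *buf, size_t cap, int *truncated)
{
    mbstate_t st;
    const wchar_t *src = ws;
    size_t n;

    assert(ws != NULL && buf != NULL && truncated != NULL && cap > 0);
    memset(&st, 0, sizeof st);   /* start in the initial conversion state */
    n = wcsrtombs(buf, &src, cap - 1, &st);
    if (n == (size_t)-1)
        return (size_t)-1;
    buf[n] = 0;                  /* reserve the last byte for the terminator */
    /* src is null if the terminating null wide character was reached;
       otherwise it points at the first wide character not converted */
    *truncated = (src != NULL && *src != 0);
    return n;
}
```

Passing cap - 1 as the limit keeps one byte free, so the buffer can be
terminated whether or not the function itself stores a null byte.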