From suehiro@jrd.dec.com Thu Dec 3 23:00:10 1992
Received: from inet-gw-1.pa.dec.com by dkuug.dk with SMTP id AA10500
	(5.65c8/IDA-1.4.4j for ); Thu, 3 Dec 1992 06:37:35 +0100
Received: by inet-gw-1.pa.dec.com; id AA26968; Wed, 2 Dec 92 21:02:30 -0800
Message-Id: <9212030500.AA11858@jrdvax.jrd.dec.com>
Received: from localhost by jrdvax.jrd.dec.com (5.65/J-ULTRIX4.2A)
	id AA11858; Thu, 3 Dec 92 14:00:16 +0900
To: wg15rin@dkuug.dk, posix-dot2@spartan.eng.sun.com
Cc: wg14@dkuug.dk, suehiro@jrd.dec.com
Subject: LC_CTYPE extension for widechar mapping
Date: Thu, 03 Dec 92 14:00:10 +0900
From: suehiro@jrd.dec.com
X-Mts: smtp
X-Charset: ASCII
X-Char-Esc: 29

Attached is the online version of the document I submitted at the
POSIX meetings in Utrecht and Reading.

  Title: LC_CTYPE extension for Additional Character Mapping
  IEEE POSIX.2 group document number: P1003.2-N162
  WG15 RIN temporary document number: RTN 010

Because the original document contained many multibyte characters in a
sample LC_CTYPE definition, I modified the second page of the document
so that it does not use multibyte characters.

I put wg14 in the Cc field for your information.

regards,
Yoichi
-----------------------------------------------------
Yoichi Suehiro
Digital Equipment Corporation Japan, TNSG/ISE


LC_CTYPE extension for Additional Character Mapping
====================================================

October, 1992
Yoichi Suehiro
Digital Equipment Corporation Japan
A member of POSIX WG in Japan

In the POSIX.2 project, an LC_CTYPE extension for additional character
classes has been specified at the request of Japan. In the same way,
another extension is necessary for additional character mapping, at
least for Japan.

The current POSIX and C language standards provide character mapping
functionality only between uppercase and lowercase letters. In other
languages, the same kind of character mapping functionality will be
required for operations such as string search.
A typical example is the mapping between Hiragana and Katakana
characters in the Japanese character set. Katakana and Hiragana are two
different Japanese syllabaries for the same set of syllables: the
pronunciation is the same, but the representation is different. The two
sets are used interchangeably in the same context. However, unlike
uppercase and lowercase letters, Hiragana and Katakana characters are
not normally mixed within the same string.

Most Japanese systems already provide this mapping functionality, but
each through a different interface. To move internationalized
applications forward with the POSIX locale methods, additional character
mappings should be allowed in the standard specification. The current
LC_CTYPE specification for character mapping (toupper and tolower) can
be reused when this functionality is added, and the extension should
follow the same approach as the one for additional character classes.

I wonder whether this kind of requirement exists in countries other than
Japan. If it is general enough to be included in the standard
specification, I'd like to see it in the .2 project, and I (or Japan)
can provide the proposal. Otherwise, I would be happy to get comments on
how to cope with this issue.

[This page has been changed a lot because the original text had many
kanji characters in it. -ys]

# Sample definition

LC_CTYPE

hiragana
# JIS X0208 Hiragana characters
[Hiragana characters were listed here. -ys]

# JIS X0208 Katakana characters
katakana
[Katakana characters were listed here. -ys]

english
# JIS X0208 English alphabet
[A-Z, a-z in JIS X0208 were listed here. -ys]

number
# JIS X0208 Numbers
[0-9 in JIS X0208 were listed here. -ys]

############################################################################
tojhira
# JIS X0208 Katakana -> JIS X0208 Hiragana
# Three katakana characters have no corresponding hiragana characters.
(,);... abridged ...;(,)

tojkata
# JIS X0208 Hiragana -> JIS X0208 Katakana
(,);... abridged ...;(,)

tojzenkaku
# ASCII -> JIS X0208
(,);... abridged ...;(<~>,)

END LC_CTYPE


Wide Character mapping functions
================================

The wide character mapping functions wcmap and towcmap map wide
characters in the same style as the traditional character mapping
functions (toupper, tolower), according to the rules of the coded
character set defined by the character type information in the
program's locale (category LC_CTYPE).

The wcmap function
==================

Synopsis

        #include
        wcmap_t wcmap(const char *mname);

Description

The wcmap function tests whether the mapping rule specified by the
argument is valid in the current locale. The string mname identifies a
rule that maps a wide character in one class to the corresponding wide
character in another class. The function returns a value of type
wcmap_t, which can be used as the second argument to a call of towcmap.

The wcmap function determines values of wcmap_t according to the rules
of the coded character set defined by the character type information in
the program's locale (category LC_CTYPE). Values returned by wcmap are
valid until a call to setlocale that modifies the category LC_CTYPE.

Returns

The wcmap function returns (wcmap_t) 0 if the given mapping name is not
valid for the current locale (category LC_CTYPE); otherwise, it returns
a value of type wcmap_t that can be used in calls to towcmap.

The towcmap function
====================

Synopsis

        #include
        wint_t towcmap(wint_t c, wcmap_t wc_map);

Description

The towcmap function converts a wide character to the corresponding
wide character according to the rule designated by wc_map. If the value
of wc_map is invalid (that is, it was not obtained by a previous call to
wcmap, or it has been invalidated by a subsequent call to setlocale that
affected category LC_CTYPE), the behavior is implementation-defined.
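To make the intended semantics of wcmap and towcmap concrete, here is a minimal sketch of how an implementation might realize them. It rests on assumptions that are not part of the proposal: wcmap_t is taken to be a plain int handle (0 meaning "not valid"), the rules are compiled into a fixed table instead of being read from the locale's LC_CTYPE category, and wide characters are assumed to be ISO 10646 (Unicode) code points rather than JIS X0208 codes. The rule names tojhira and tojkata come from the sample definition; the helper types map_range and map_rule are hypothetical. Where the proposal leaves an invalid wc_map implementation-defined, this sketch chooses to return c unchanged.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strcmp */
#include <wchar.h>    /* wint_t */

/* Hypothetical handle type: 0 means "no such mapping in this locale". */
typedef int wcmap_t;

/* Hypothetical: a contiguous run of source characters shifted by a
   fixed offset. */
struct map_range {
    wint_t first, last;  /* inclusive source range */
    long delta;          /* offset added to reach the target character */
};

/* Hypothetical: a named mapping rule, as a locale's LC_CTYPE might
   define it. */
struct map_rule {
    const char *name;
    const struct map_range *ranges;
    size_t nranges;
};

/* Assumption: wide characters are Unicode code points.
   KATAKANA LETTER SMALL A (U+30A1) .. KATAKANA LETTER N (U+30F3) map onto
   HIRAGANA LETTER SMALL A (U+3041) .. HIRAGANA LETTER N (U+3093).
   U+30F4..U+30F6 (VU, SMALL KA, SMALL KE) are left unmapped, mirroring
   the three JIS X0208 katakana that have no hiragana counterpart. */
static const struct map_range kata_to_hira[] = {
    { 0x30A1, 0x30F3, -0x60 },
};

static const struct map_range hira_to_kata[] = {
    { 0x3041, 0x3093, +0x60 },
};

/* Illustrative fixed table; a real implementation would build this from
   the locale's LC_CTYPE definition when setlocale is called. */
static const struct map_rule rules[] = {
    { "tojhira", kata_to_hira, sizeof kata_to_hira / sizeof kata_to_hira[0] },
    { "tojkata", hira_to_kata, sizeof hira_to_kata / sizeof hira_to_kata[0] },
};

#define NRULES (sizeof rules / sizeof rules[0])

/* Look up a mapping by name; returns (wcmap_t) 0 if it is not defined. */
wcmap_t wcmap(const char *mname)
{
    size_t i;
    for (i = 0; i < NRULES; i++)
        if (strcmp(mname, rules[i].name) == 0)
            return (wcmap_t)(i + 1);  /* handles are 1-based; 0 = invalid */
    return (wcmap_t)0;
}

/* Map c through the rule wc_map; characters the rule does not cover are
   returned unchanged. */
wint_t towcmap(wint_t c, wcmap_t wc_map)
{
    const struct map_rule *r;
    size_t i;

    /* The proposal leaves an invalid wc_map implementation-defined;
       this sketch chooses to return c unchanged. */
    if (wc_map <= 0 || (size_t)wc_map > NRULES)
        return c;

    r = &rules[wc_map - 1];
    for (i = 0; i < r->nranges; i++)
        if (c >= r->ranges[i].first && c <= r->ranges[i].last)
            return (wint_t)((long)c + r->ranges[i].delta);
    return c;
}
```

With this table, towcmap(0x30A2, wcmap("tojhira")) yields 0x3042 (KATAKANA LETTER A to HIRAGANA LETTER A), towcmap(0x30F4, wcmap("tojhira")) returns 0x30F4 unchanged, and wcmap("tojzenkaku") returns 0 because the sketch's table omits that rule. Representing each rule as offset ranges works here only because the Unicode katakana and hiragana blocks are laid out in parallel; an implementation reading arbitrary (from,to) pairs from LC_CTYPE would more likely use a sorted pair table with binary search.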
Returns

If the argument c is a wide character for which a corresponding wide
character is defined in the mapping, the towcmap function returns that
corresponding wide character; otherwise, the argument is returned
unchanged.

Example

If the mapping tojhira, a rule that maps a katakana character to the
corresponding hiragana character, is defined in the current locale, the
following calls to wcmap and towcmap can be used for such a conversion.

        #include
        /* ... */
        wint_t c, ch;
        wcmap_t to_hira = wcmap("tojhira");
        /* ... */
        if (to_hira != (wcmap_t) 0)   /* mapping defined in this locale? */
            ch = towcmap(c, to_hira);
        /* ... */

[End]