From daemon@dkuug.dk Sun Nov 18 20:36:15 1990
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
	id AA01611; Sun, 18 Nov 90 20:40:48 +0100
Received: from [140.186.1.4] by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA01521; Sun, 18 Nov 90 20:36:15 +0100
Received: by meillet.ila.com (4.1/ILA-4.10)
	id AA01206; Sun, 18 Nov 90 14:15:57 EST
Date: Sun, 18 Nov 90 14:15:57 EST
From: glenn@ila.com (Glenn Adams)
Message-Id: <9011181915.AA01206@meillet.ila.com>
To: erik@sra.co.jp
Cc: Becker.OSBU_North@xerox.com, unicode@sun.com, i18n@dkuug.dk,
	arnet@hpda.cup.hp.com
In-Reply-To: Erik M. van der Poel's message of Sun, 18 Nov 90 12:56:17 +0900 <9011180356.AA24323@sran8.sra.co.jp>
Subject: Han Character Code Ordering
X-Charset: ASCII
X-Char-Esc: 29

   From: Erik M. van der Poel
   Date: Sun, 18 Nov 90 12:56:17 +0900

   String-based sorting is desirable because of the change in
   pronunciation of a character when it is combined with other
   characters. Example:

	KAZE (1 character) means "wind"
	TAI FUU (2 characters) means "typhoon"

   Here, the KAZE and FUU are the same character.

   The implications of this are staggering. Not only do we need a
   large dictionary with all the different pronunciations, but we may
   in some cases also need to parse sentences. But this should
   probably be left to sophisticated applications.

Alternatively, one could retain the yomi at input conversion time and
annotate jiritsugo accordingly. The annotation could be retained for
cases where recovery would be difficult or impossible (unambiguously).

Unfortunately, this will be impossible for most conversion interfaces
which remove this structure. I believe this is a good reason for
demanding a richer conversion interface.

Glenn
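
To illustrate the annotation idea above, here is a minimal sketch in Python. It is not taken from any actual conversion interface: the names AnnotatedWord and sort_key are hypothetical, and it assumes the input method hands back the yomi (written here in hiragana) together with the converted word. Comparing hiragana by code point only approximates dictionary order, so treat the key function as a placeholder for a real collation routine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedWord:
    """Hypothetical record: a converted word plus the reading it was typed with."""
    surface: str          # the kanji form stored in the text
    yomi: Optional[str]   # reading captured at conversion time, None if it was discarded


def sort_key(word: AnnotatedWord):
    # Words that kept their reading sort by it (group 0).
    # Words whose reading was lost fall back to the surface form and sort last (group 1).
    if word.yomi is not None:
        return (0, word.yomi, word.surface)
    return (1, word.surface, "")


words = [
    AnnotatedWord("台風", "たいふう"),  # TAI FUU, "typhoon"
    AnnotatedWord("風", "かぜ"),        # KAZE, "wind" (same character as FUU above)
    AnnotatedWord("風", None),          # reading discarded by the conversion interface
]

ordered = sorted(words, key=sort_key)
for w in ordered:
    print(w.surface, w.yomi)
```

With the reading retained, KAZE and TAI FUU sort by pronunciation without a dictionary lookup or sentence parsing; the third entry shows the fallback when the conversion interface has already thrown the reading away.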