From daemon@dkuug.dk Mon Nov 19 07:20:48 1990
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
    id AA17121; Mon, 19 Nov 90 07:21:11 +0100
Received: from MCSUN.EU.NET by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
    id AA17110; Mon, 19 Nov 90 07:20:48 +0100
Received: by mcsun.EU.net with SMTP; Mon, 19 Nov 90 07:24:10 +0100
Received: from srava.sra.co.jp by srawgw.sra.co.jp (5.64WH/1.4)
    id AA23533; Mon, 19 Nov 90 15:23:16 +0900
Received: from sran8.sra.co.jp by srava.sra.co.jp (5.64b/6.4J.6-BJW)
    id AA19238; Mon, 19 Nov 90 15:23:19 +0900
Received: from localhost by sran8.sra.co.jp (4.0/6.4J.6-SJ)
    id AA25707; Mon, 19 Nov 90 15:21:54 JST
Return-Path:
Message-Id: <9011190622.AA25707@sran8.sra.co.jp>
Reply-To: erik@sra.co.jp
From: Erik M. van der Poel
To: glenn@ila.com
Cc: unicode@sun.com, i18n@dkuug.dk, arnet@hpda.cup.hp.com
Subject: Re: Han Character Code Ordering
Date: Mon, 19 Nov 90 15:21:50 +0900
Sender: erik@sran8.sra.co.jp
X-Charset: ASCII
X-Char-Esc: 29

> From: Erik M. van der Poel
>
> String-based sorting is desirable because of the change in
> pronunciation of a character when it is combined with other
> characters. Example:
>
>     KAZE (1 character) means "wind"
>     TAI FUU (2 characters) means "typhoon"
>
> Here, the KAZE and FUU are the same character. The implications of
> this are staggering. Not only do we need a large dictionary with all
> the different pronunciations, but we may in some cases also need to
> parse sentences.
>
> Alternatively, one could retain the yomi at input conversion time and
> annotate jiritsugo accordingly. The annotation could be retained for
> cases where recovery would be difficult or impossible (unambiguously).
> Unfortunately, this will be impossible for most conversion interfaces
> which remove this structure. I believe this is a good reason for
> demanding a richer conversion interface.
>
> Glenn

Yes, I have often thought about this idea, and it seems like a good
idea, but I think there would be several problems.

1. Existing unannotated text may be difficult to reverse-convert,
   especially when it's ambiguous, as you say. So you can only use
   your idea on newly converted and annotated text.

2. In some cases, it is easier to convert to a particular Kanji by
   entering a different yomi. I.e. you can convert quickly if you
   enter a yomi that is unique so that you don't have to waste time
   disambiguating. Of course, this has a lot to do with the
   limitations of the input conversion systems of today. Nevertheless,
   this may be a problem.

3. If we are going to annotate text, we had better do it consistently,
   so that we can send each other text even if we use different input
   conversion systems. This is not a problem with the idea itself, but
   more with the actual implementation of this idea.

By the way, who or what is "arnet@hpda.cup.hp.com"? It'd be kinda
nice to know who I'm sending mail to! :-)

Erik
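
For illustration only, the annotation idea discussed above could be
sketched roughly as follows in Python. Everything here is an
assumption made for the example, not part of any existing conversion
interface: the names AnnotatedWord and sort_by_yomi are hypothetical,
the yomi is stored as lower-case romaji purely to keep the example
simple, and plain string comparison stands in for a real collation
rule. Words without a retained yomi (problem 1 above) fall back to
code order and are placed last.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AnnotatedWord:
    """A converted word plus the reading typed at input conversion time.

    Hypothetical structure for illustration only.
    """
    surface: str                 # the Kanji text as displayed
    yomi: Optional[str] = None   # retained reading, or None if unannotated


def sort_key(word: AnnotatedWord):
    # Annotated words sort by their yomi. Unannotated words cannot be
    # sorted by pronunciation without a dictionary, so they go into a
    # second group ordered by raw character code.
    if word.yomi is not None:
        return (0, word.yomi)
    return (1, word.surface)


def sort_by_yomi(words: List[AnnotatedWord]) -> List[AnnotatedWord]:
    """Return the words ordered by reading where one is available."""
    return sorted(words, key=sort_key)


words = [
    AnnotatedWord("台風", "taifuu"),  # TAI FUU, "typhoon"
    AnnotatedWord("風", "kaze"),      # KAZE, "wind" (same character as FUU)
    AnnotatedWord("風"),              # old text with no annotation
]

sorted_words = sort_by_yomi(words)
for w in sorted_words:
    print(w.surface, w.yomi)
```

Note that the yomi used as the key is whatever was typed at conversion
time, so a "shortcut" yomi entered only to reach a Kanji quickly
(problem 2 above) would sort the word in the wrong place unless it is
corrected afterwards.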