From derek@knosof.uucp Wed Jan 26 21:19:14 1994
Received: from eros.Britain.EU.net by dkuug.dk with SMTP id AA27414
  (5.65c8/IDA-1.4.4j for ); Wed, 26 Jan 1994 23:22:24 +0100
Received: from pyra.co.uk by eros.britain.eu.net with UUCP id ;
  Wed, 26 Jan 1994 22:22:06 +0000
Received: by knosof.UUCP (anilla/UUCP-Project/rel-1.0/11-05-86)
  id AA14605; Wed, 26 Jan 94 21:19:14 GMT
Date: Wed, 26 Jan 94 21:19:14 GMT
From: derek@knosof.uucp (Derek M Jones)
Message-Id: <9401262119.AA14605@knosof.UUCP>
To: wg15@pyra.co.uk
X-Charset: ASCII
X-Char-Esc: 29

All,

Felt that I had to comment on the following.

>Date: Wed, 26 Jan 1994 10:24:20 -0500
>From: isaak@csac.zko.dec.com (Jim Isaak-NEW: respond via isaak@csac.enet.dec.com)
>Subject: (wg15-uk 463) (SC22WG15.323) Character-less programming Article
>
>
>                    Character-less Programming
>                    --------------------------
>
>                                by
>                      K Hopper & RH Barbour
>     Dept of Computer Science, University of Waikato, New Zealand
>
>...
> The task of providing the
>internal representation of the syntax tokens is a simple one of mapping
>short sequences of lexemes/phonemes into a token. The encoding of the
>lexemes/phonemes is irrelevant -- provided that a mapping is made available
>to the translator for its use when translating!

It may be irrelevant but it is the hard bit that we have yet to figure out
how to do, properly.

>
> A mapping provided for a translator in this way should be made
>available in the environment in which it is to execute, as part of the
>culture-dependent components of the host operating system. A multi-

This is the wrong way around. The translator should get the information
from the environment.

>
>Summary
>-------
> The paper identifies an urgent need for the expansion of information
>technology to all nations and cultures. A generic mechanism has been

No it does not.

>proposed to extend the way in which the lexis of programming language
>standards is defined to improve the cultural sensitivity of

Bit of an exaggeration.

>implementations. A similar approach to the definition of shared cultural
>concepts permits greater cultural independence of application programs
>written using existing programming languages. The key components of this
>are :--
>
> a. A named list of lexical tokens for a language.

Standards already provide this.

>
> b. A locally-defined mapping between representation encodings and
>    the items in this list.
>

This sort of thing has to be done globally.

The above paper only addresses a subset of the issues. I would
recommend reading:

"Understanding Japanese Information Processing" by Ken Lunde,
O'Reilly & Associates ISBN 1-56592-043-0.

This provides useful background on the problems being faced. Chapter 4
on encoding methods will confirm your worst nightmares.

CD 9899:1990 PDAM 1, Amendment 1 to ISO 9899:1990 Programming language C
on: Integrity Addendum.

This does a good job (I guess I'm a bit biased here) of handling the
runtime issues. Ignore the Integrity Addendum bit. This document is
really about significant extensions to the runtime handling of multibyte
characters.

Now what about compile time?

C 'solves' this problem for literals by introducing wide characters (the
current method is really a bit of a cop out).

For identifiers there are two known solutions:

1) Rely on a compiler option to switch on the recognition/handling
   of additional character sequences.

   This has the disadvantage of reducing source portability. Unless
   your compiler supports the appropriate locale you cannot compile
   source that uses it.

2) Have the compiler accept anything as an identifier (a few rules
   will have to be laid down to govern source representation issues so
   that the entire program does not get accepted as a single identifier).

   This has the advantage that it does not reduce source portability.
   It works because compilers only ever need to compare identifiers for
   equality (is this identifier the same as that one), so collating
   sequence is not an issue (but it does impose the restriction that
   where multiple external representations are possible, the same one is
   always used).

After much discussion WG14 failed to agree on either option as being the
one to use. With the standard coming up for revision this issue will be
visited again.

If you have a third option please let WG14 know.

derek jones
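
As a footnote to option 2, here is a minimal sketch in C of what
"compare identifiers for equality only" might look like. The struct
ident type and the same_identifier and check_examples names are made up
for illustration; they are not taken from any compiler or from the WG14
discussion. The byte values are the Shift-JIS and EUC-JP encodings of
the same two kanji, chosen only to show why one external representation
has to be used consistently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: an identifier is just a counted run of bytes.
   The compiler attaches no meaning to the individual bytes. */
struct ident {
    const unsigned char *bytes;
    size_t len;
};

/* Two identifiers are the same if and only if their bytes match
   exactly. Only equality is tested; no ordering or collation is used. */
static int same_identifier(struct ident a, struct ident b)
{
    return a.len == b.len && memcmp(a.bytes, b.bytes, a.len) == 0;
}

/* The same two kanji in two different external representations. */
static const unsigned char kanji_sjis[] = { 0x8A, 0xBF, 0x8E, 0x9A }; /* Shift-JIS */
static const unsigned char kanji_euc[]  = { 0xB4, 0xC1, 0xBB, 0xFA }; /* EUC-JP */

static void check_examples(void)
{
    struct ident a = { kanji_sjis, sizeof kanji_sjis };
    struct ident b = { kanji_sjis, sizeof kanji_sjis };
    struct ident c = { kanji_euc,  sizeof kanji_euc };

    assert(same_identifier(a, b));   /* same representation: a match */
    assert(!same_identifier(a, c));  /* same text, other encoding: no match */
}
```

The second assertion is the restriction mentioned above in practice: a
byte-wise comparison treats the two spellings as different identifiers,
so a program has to stick to one representation throughout.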