From keld@dkuug.dk Mon Oct 21 17:33:32 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
        id AA22002; Mon, 21 Oct 91 17:33:32 +0100
Date: Mon, 21 Oct 91 17:33:32 +0100
From: Keld J|rn Simonsen
Message-Id: <9110211633.AA22002@dkuug.dk>
To: wg15rin@dkuug.dk
Subject: 10646/UNICODE status
X-Charset: ASCII
X-Char-Esc: 29

Here is an update on the Unicode/10646 merger that has been posted
to several news groups.

---------
Date: Fri, 18 Oct 91 14:59:06 EDT
From: schein@TOROLAB5.vnet.ibm.com
Subject: Universal Code Set (UCS) update

An update on Unicode/10646:

Significant progress was made during the recent ISO work group
(SC2/WG2) meetings in Geneva (August 91) and Paris (October 91).
Interest was very high, with 30 people (representing 13 countries)
attending the meetings. A broad consensus was reached on all
technical issues (see details below).

The ISO SC2 plenary (Rennes, October 91) unanimously authorized WG2
to issue a new DIS 10646 in January 1992 for a 4-month vote.
It is expected that the 2nd DIS will be approved and that the
International Standard (IS 10646) will be completed by 3Q 92.

2nd DIS 10646 (UCS) contents:
=============================

Architecture:

 > 4-byte canonical form (UCS-4)
 > 2-byte Base Multilingual Plane on Plane 0 (UCS-2)
   - No characters are currently defined on any other planes
   - No 'swapping' of the other planes is defined
   - Compaction methods 1, 3, and 5 are eliminated
   - Single Graphic Character Introducer (SGCI) is eliminated
 > Graphic characters are coded in the C0/C1 area
   (except row 0, which is identical to 8859/1)
 > Two implementation levels are defined for the 'combining' characters
   (previously called 'non-spacing marks' or 'floating diacritics'):
   - Implementation level 1 does not allow combining marks
   - Implementation level 2 allows both combining marks and
     precomposed characters
 > The Unified Set of ideographic characters is defined on the BMP
 > Formatting characters are included to control text in
   bidirectional data streams (for Arabic and Hebrew scripts)
 > The UCS Transformation Format (UTF) is defined in an informative
   annex to specify a variable-length encoding of the data that avoids
   C0, C1, NUL and SPACE octets
 > A claim of conformance should identify the form (UCS-2 or UCS-4),
   the implementation level, and the identified subset of characters

Structure of the UCS-2:

      00                             FF
     |---------------------------------|
   00|                                 |
     | Alphabetics, Symbols,           |  A-zone (19968 positions)
     | CJK auxiliary, Hangul,...       |
     |                                 |
     |---------------------------------|
   4E|                                 |
     |                                 |  I-zone (20992 positions)
     | Unified Ideographic             |
     |                                 |
     |---------------------------------|
   A0|                                 |
     |                                 |  O-zone (16384 positions)
     | Reserved for future use         |
     |                                 |
     |---------------------------------|
   E0|                                 |
     | Private Use (6K), Compatibility |  R-zone (8192 positions)
     | Area, Arabic presentation       |
     | forms, Arabic ligatures, ...    |
     |---------------------------------|

Unicode status:
===============

The Unicode 1.0 book, containing the non-ideographic part, is complete
and available. It is published by Addison-Wesley and will be in
bookstores by November 91.

Although 10646 (UCS-2) is based on Unicode 1.0, some differences exist.
The Unicode Technical Committee (UTC) has decided to incorporate all
adjustments needed to align with UCS-2 in Unicode 1.1, after DIS 10646
is approved by the ISO ballot.

+----------------------------------------------------------------------+
| Isai Scheinberg                        A3/979/895/TOR                |
| IBM Canada, Inc.                                                     |
| phone: (416) 448-2260                  895 Don Mills Road            |
| fax:   (416) 448-2114                  North York, Ontario M3C 1W3   |
| email: schein@torolab5.vnet.ibm.com    CANADA                        |
+----------------------------------------------------------------------+
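
To illustrate the UCS-2 zone structure described in the message, here
is a minimal Python sketch that maps a 16-bit code value to its zone by
looking at the row (the high octet). The names ZONES, ucs2_zone and
zone_size are hypothetical and chosen only for this illustration. The
first row of each zone (00, 4E, A0, E0) is taken from the diagram; the
last row of each zone is an assumption, namely that a zone runs up to
the row just before the next zone starts.

```python
# Zone boundaries of UCS-2, keyed by row (the high octet of a 16-bit value).
# First rows (00, 4E, A0, E0) come from the structure diagram; last rows are
# an assumption: each zone is taken to end just before the next one starts.
ZONES = [
    (0x00, 0x4D, "A-zone"),  # alphabetics, symbols, CJK auxiliary, Hangul
    (0x4E, 0x9F, "I-zone"),  # unified ideographic
    (0xA0, 0xDF, "O-zone"),  # reserved for future use
    (0xE0, 0xFF, "R-zone"),  # private use, compatibility, presentation forms
]

CELLS_PER_ROW = 256  # one row holds 256 code positions (low octet 00..FF)


def ucs2_zone(code):
    """Return the zone name for a UCS-2 code value (0x0000..0xFFFF)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("not a UCS-2 code value: %r" % (code,))
    row = code >> 8  # high octet selects the row
    for first_row, last_row, name in ZONES:
        if first_row <= row <= last_row:
            return name
    raise AssertionError("unreachable: the zones cover every row")


def zone_size(name):
    """Return the number of code positions in the named zone."""
    for first_row, last_row, zone_name in ZONES:
        if zone_name == name:
            return (last_row - first_row + 1) * CELLS_PER_ROW
    raise KeyError(name)


if __name__ == "__main__":
    for zone in ("A-zone", "I-zone", "O-zone", "R-zone"):
        print(zone, zone_size(zone))
    print(hex(0x4E00), ucs2_zone(0x4E00))
```

Under these assumed boundaries the computed sizes agree with the
position counts given in the diagram (19968, 20992, 16384 and 8192),
and the four zones together cover all 65536 positions of UCS-2.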