SC22/WG20 N223

ISO/IEC JTC1 SC22/WG20 Internationalization

Date: October 22, 1993
Title: WG20's current and intended work
Source: WG20
Action: For information to SC22 and its Working Groups
Distribution: SC22

Introduction

At its Plenary meeting in Paris, September 1993, SC22 requested WG20 to provide a detailed report on its current work and anticipated output, so that the members of SC22 might better understand the scope of WG20 and its relationship to the work of other SC22 WGs. This report was prepared by WG20 at its meeting in Tokyo, 18-22 October 1993, in response to SC22's request.

As the 20th century draws to a close, the use of computers is moving very rapidly from a primarily professional, American/European oriented user base to a much more general usage, encompassing users from a wide range of cultures and speaking many different languages. This has led to a need for software to be used in ways which are quite impossible unless the underlying hardware/software architecture is capable not only of supporting alternate character representations but also of adapting a wide range of functions to the environment in which the user is most comfortable. The process of creating software that can readily be adapted (or can adapt itself) to differing cultural environments is known as internationalization - often abbreviated to i18n.

To deal with the implications of internationalization, SC22 established WG20 to "identify elements relevant to the work of SC22 that may be affected by differences in language, culture, customs and habits; for these elements, [to] develop standards that enable applications to be portable across differing cultural practices; [and to] develop a Technical Report that describes a framework for nations to provide those elements [SC22 N1424]." WG20 also has a wider brief with regard to activities of JTC1, but these are not relevant to this report.
The relationship of WG20 to other SC22 WGs

The topic of internationalization is an extremely important one for the use of information processing systems. Since information processing systems rely on underlying programming languages and system software interfaces, internationalization is, therefore, extremely important to programming languages and system software interfaces. The role of WG20 is to provide the information and, where appropriate, the tools to enable SC22 WGs to rise to the challenge of updating their standards to support internationalization facilities.

WG20 recognises that the incorporation of facilities to support internationalization in programming languages will impose an additional burden on programming language WGs, and has therefore divided its work into three parts.

The first of these, and WG20's highest priority, is the production of a Technical Report, addressed primarily to the members of SC22 and its WGs, explaining the problems that need to be addressed, and the tools and capabilities that programming languages need to provide, or to support, in order to enable programs to be internationalized. It is the intention of WG20 that this TR will provide valuable background information for SC22 WGs, together with sufficient detailed information to enable them to establish plans and schedules for the incorporation of i18n facilities into their languages.

This TR will be followed by a revision of TR 10176 - Guidelines for the preparation of programming language standards. This revision will consist of the addition of new sections to the existing TR to address the particular problems of internationalization.

In addition to these two TRs, there will be several international standards and registries, which will define tools and information sources that can be used by programming languages to provide specific i18n functionalities.

Each of WG20's projects is described briefly in the remainder of this report.
The descriptions indicate the timescale of the projects and the particular benefits of each project to SC22 WGs.

Projects being undertaken by WG20

For each project the following information is provided:

1) Title of a project / task
2) Project number
3) Anticipated output
4) Project editor
5) Status / stage of the technical work
6) Priority
7) Schedule of the technical work
8) Benefits

1. Framework document

1.1 Framework -- requirements and model for internationalization
1.2 JTC1 22.30.01
1.3 A new type 3 Technical Report
1.4 T. K. Sato
1.5 Stage 2 (working draft)
1.6 First priority
1.7 June 1994, PDTR registration and ballot
1.8 Benefits

This technical report presents the framework and reference model for internationalization, and identifies the services required for the internationalization of information technologies, including programming languages and their environments. The report describes the underlying principles for the specification of internationalization features, including:

- a list and brief discussion of internationalization related requirements,
- methods to provide internationalization features for information technologies.

This technical report is intended for standards providers as a planning reference for the specification of internationalization services in their respective work. It is also being used as the base document for all ISO/IEC JTC1 SC22/WG20 activities. The report is a prerequisite for the revision of TR 10176.

2. Amendment of TR 10176

2.1 Guidelines for the preparation of programming language standards
2.2 JTC1 22.13
2.3 An amendment to a type 3 Technical Report
2.4 WG20 is proposing A. Kido and M. Noda
2.5 Stage 2 (working draft)
2.6 Second priority
2.7 December 1994, PDTR registration and ballot
2.8 Benefits

The existing TR 10176 (1991) provides guidelines to which all programming language standards should conform. There are, however, two important areas that are not addressed in the current version.
The first of these, internationalization, is the subject of the Framework TR. The revision of TR 10176 will provide guidelines for the incorporation of these i18n features into programming language standards.

The other is related to ISO/IEC 10646-1 (Universal Multiple-Octet Coded Character Set). Commonly used codesets and encoding methods, such as ISO/IEC 646 (ASCII), ISO/IEC 8859-1, and Japanese EUC, include characters for a single language or a small group of languages. Because of this, current codesets support only a limited number of languages. If, for example, ISO/IEC 8859-1, which supports Western European language characters only, were used, it would not be possible to include Japanese, Greek, Arabic, or other non-Western European language characters in the text. Some applications and users need mixtures of languages that current codesets do not support. Therefore, the goal in creating ISO/IEC 10646 was to include all characters from all significant languages; to be what the standard calls a "Universal Multiple-Octet Coded Character Set" (UCS).

Large amounts of 8-bit data and devices will continue to exist for a considerable time to come, and systems that support ISO/IEC 10646 will have to coexist with those that only support single-byte encodings. Because of the diversity of character sets and character encoding schemes, standards are necessary to assist programming languages and operating systems in dealing with certain character set related issues, and WG20 will be proposing standards to cover these. The revision of TR 10176 will also highlight these issues, and will provide guidelines for the handling of large character sets in programming language standards.

3.
International tailorable ordering

3.1 International ordering of ISO/IEC 10646
3.2 JTC1 22.30.02.02
3.3 An International Standard
3.4 Alain LaBonté
3.5 Stage 2 (working draft)
3.6 Third priority
3.7 December 1994, CD registration and ballot
3.8 Benefits

A default international ordering mechanism, built on the richest character set standard that JTC1 has ever produced, will be an immediate universal reference for international developers using standard programming languages. The definition of this ordering standard will be done independently of the coding of characters, so that it can be adapted to whatever codeset is used. Furthermore, as the character repertoire is the richest to be expected by any programming language, it will be easy to subset the reference repertoire by eliminating entries, should the need arise.

The ordering will, for the first time, be a universally available collating reference that fulfills both the requirement of cultural correctness (which has never been guaranteed) and that of full predictability of results.

The ordering standard is intended to be used at the character string comparison level, so as to allow consistency of operation in all compare/sort/search/merge operations at the lowest level, for programs, indexing methods, and database indexing engines.

National ordering standards are in existence, European ordering standards are in preparation, and a framework already exists in POSIX to handle those standards in a formal way. User requirements about services expected from programming environments are also available (SHARE Europe national language architecture). The WG20 international ordering project will build on all these works and extend them to cover most of the world's human languages, with a possibility of easy adaptation when necessary.

The ordering services will be provided either at the operating system level, as programming language features, or via library routines.

4.
Functionality of Internationalization

4.1 Functionality of the internationalization of applications
4.2 JTC1 22.30.02.01
4.3 A number of standards, see subproject descriptions below
4.4 Keld Simonsen
4.5 This is an umbrella project
4.6 Not applicable, see subproject descriptions below
4.7 Not applicable, see subproject descriptions below
4.8 Benefits

This project is the umbrella project for the development of specific standards for functionality of the internationalization of applications, such that users can specify their cultural requirements, and obtain consistent behaviour across system platforms and applications.

WG20 considers that SC22 member bodies will be the main producers of the cultural specifications, by submitting such specifications as are relevant to their territories. WG20 believes that there will be a need for at least one specification for each country of the world, and that an international registry of such specifications will therefore be the most suitable way of handling this, since the information will then be universally available with a unique identification.

WG20 considers that SC22 WGs will be important providers of standards for tools for application development, and that SC22 WGs should thus include provision for i18n support in their standards, using i18n specifications provided by the member bodies.
WG20 is planning to produce a number of standards to help SC22 member bodies and SC22 WGs easily accommodate user needs for i18n support in applications, for example:

- a specification standard for cultural conventions
- an API standard for internationalization
- a registration standard for cultural convention-sets
- a standard for a modifiable ordering of IS 10646 repertoire strings
- a standard for the classification of IS 10646 repertoire characters

Note that, in response to requests for clarification of the terminology, WG20 proposes to use the term "cultural convention" to refer to a single culturally-dependent entity, such as date formatting or string ordering, and "cultural convention-set" to refer to the set of cultural conventions which relate to a particular user environment, such as French-speaking Canada or the Japanese banking community.

Of the above-mentioned proposed standards, only the ordering standard is currently approved as a subdivision of the original project JTC1.22.30, and is described in section 3, above. The others will be proposed for subdivisions of JTC1.22.30.02.01.

The benefits of this approach to the member bodies are:

The member bodies need only provide a limited set of specifications (of the order of one specification per language used in their country), to obtain consistent behaviour of applications for a number of programming languages and POSIX, and applications utilising those facilities.

The specification standard will ensure that there is a formal well specified notation to specify the requirements, and the registration standard will ensure that specifications are universally available with a unique identification.

The API standard will provide uniform semantics and access to the registered cultural convention-sets, in order to enable consistent behaviour across programming languages and POSIX.
The two IS 10646 related standards will make it feasible for SC22 member bodies to have their i18n specifications applicable to the full character repertoire of IS 10646, as the member bodies only need to address the deviations from these specifications instead of specifying the total behaviour on the vast repertoire of IS 10646. WG20 expects the two IS 10646 related standards to cover all of the i18n specification requirements with respect to character sets, and these specifications can cover all encodings of IS 10646 or subsets thereof, including other coded character sets.

The benefits of the WG20 approach to SC22 WGs are:

The WGs will need to consider the revised guidelines for the preparation of programming language standards (TR 10176), decide which changes to make to their languages to accommodate i18n, and provide the appropriate binding to the Language Independent API for i18n. The WGs will then have access to the vast body of i18n specifications in a consistent way across the products implementing WG20 specifications.

4a. Internationalization API

4a.1 Internationalization API
4a.2 A proposed subdivision of JTC1.22.30.02.01
4a.3 An International Standard
4a.4 Keld Simonsen (nominated)
4a.5 No input papers yet, but described in the Framework WD
4a.6 Not yet decided
4a.7 Needs coordination with the Framework TR and the i18n specification standard
4a.8 Benefits

A Language Independent API standard on internationalization services.
The services defined will include:

- date format service
- time format service
- day numbering service
- week numbering service
- numeric formatting service
- currency formatting service
- measuring system service
- compare, sort and search services
- paper format service
- cultural convention-set invocation service
- character set handling service
- character set identification service
- data class definition service
- case mapping service
- data presentation service
- data announcement service
- data communication service
- data input service
- character set invocation service
- multilingual support service
- message service
- language selection service

The benefits for the SC22 WGs will be that they will only need to specify the syntax for the APIs for their language, and that they can then use the body of cultural convention specifications in a way consistent with other programming languages.

4b. Cultural convention specification standard

4b.1 Cultural convention specification standard
4b.2 A proposal for subdivision of JTC1.22.30.02.01 exists
4b.3 An International Standard
4b.4 Keld Simonsen (nominated)
4b.5 The subdivision has been proposed
4b.6 Not yet defined
4b.7 Not yet available
4b.8 Benefits

This standard will provide a specification method for cultural conventions, such as classification of characters, date and numbering formats, string ordering, monetary formatting, and message handling, so that users can specify their cultural needs. Specification for character sets will also be covered. Programming languages can use this information via a binding to the i18n APIs, and thus just rely on this specification instead of providing all of this functionality themselves. The standard will include full ISO/IEC 9945-2 (POSIX) compatibility. This standard, together with the i18n APIs, will also ensure a uniformity of interface for the users across applications and programming languages.

4c.
Cultural convention-set registry

4c.1 Cultural convention-set registry
4c.2 A proposal for a new project exists
4c.3 An International Standard for registry procedure and an International Registry
4c.4 Keld Simonsen and the Danish UNIX User Group are nominated as the project editor and registration authority
4c.5 The NP is under consideration (stage 1)
4c.6 Not yet defined
4c.7 Needs coordination with cultural convention specification standard
4c.8 Benefits

This will be a registration standard for cultural convention-sets, so that a user can expect the same behaviour according to his cultural expectations across a variety of platforms. The programming languages thus need only provide a binding to the i18n services to get access to this vast variety of information. Member bodies can specify cultural IT needs for their territories in a uniform way that will be available to all programming languages, operating systems, and system software interfaces. The registry will also accommodate ISO/IEC 9945-2 (POSIX) locales, and there is thus no need for a specific POSIX locale and charmap registration standard.

4d. Classification of the characters of the IS 10646 repertoire

4d.1 Classification of the characters of the IS 10646 repertoire
4d.2 A proposal for a subdivision of JTC1.22.30.02.01 will be needed
4d.3 An International Standard
4d.4 Keld Simonsen (nominated)
4d.5 No work done yet
4d.6 Not yet defined
4d.7 Not yet available
4d.8 Benefits

This standard will define the classification of the characters of the IS 10646 repertoire, such as upper/lowercase, alphabetic and control characters, and mappings between them, so that this need not be done by SC22 member bodies for their cultural convention-set specifications, and also so that there is little need for separate specification for their language.
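
As an illustration only, the kind of character classification described in 4d.8 can be sketched in Python. This is not the WG20 standard, on which no work has yet been done. The sketch assumes the Unicode Character Database shipped with Python's standard unicodedata module as a stand-in for a classification of the IS 10646 repertoire. The function name classify and the fields of the record it returns are hypothetical and chosen only for this example.

```python
import unicodedata


def classify(ch: str) -> dict:
    """Return a small classification record for one character.

    Assumption: the Unicode general category stands in for the
    classification that the proposed standard would define.
    """
    category = unicodedata.category(ch)  # two-letter code such as "Lu" or "Cc"
    return {
        "char": ch,
        # Control characters have no name, so a placeholder is supplied.
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": category,
        "alphabetic": ch.isalpha(),
        "upper": category == "Lu",
        "lower": category == "Ll",
        "control": category == "Cc",
        # Mappings between the upper and lower case classes.
        "to_upper": ch.upper(),
        "to_lower": ch.lower(),
    }


if __name__ == "__main__":
    # Latin, accented Latin, Greek, a control character, and a Han ideograph.
    for ch in ["A", "é", "Σ", "\t", "日"]:
        print(classify(ch))
```

With a shared table of this kind, a cultural convention-set specification would only have to state where it deviates from the default classification, rather than classify every character itself.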