From isaak@csac.zko.dec.com Thu Aug 19 07:13:00 1993
Received: from ns.dknet.dk by dkuug.dk with SMTP id AA03948 (5.65c8/IDA-1.4.4j for ); Thu, 19 Aug 1993 17:26:34 +0200
Received: from crl.dec.com by ns.dknet.dk with SMTP id AA03825 (5.65c8/IDA-1.4.4j for ); Thu, 19 Aug 1993 17:26:46 +0200
Received: by crl.dec.com; id AA26544; Thu, 19 Aug 93 11:11:13 -0400
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93);id AA02948; Thu, 19 Aug 1993 11:13:00 -0400
Date: Thu, 19 Aug 1993 11:13:00 -0400
From: isaak@csac.zko.dec.com (Jim Isaak-respond via isaak@decvax.dec.com)
Message-Id: <9308191513.AA02948@csac.zko.dec.com>
To: sc22wg15@dkuug.dk
Subject: i18n part 5/7
X-Charset: ASCII
X-Char-Esc: 29

Return-Path: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93); id AA23272; Wed, 18 Aug 1993 20:30:47 -0400
Date: Wed, 18 Aug 1993 20:30:45 -0400
From: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
To: isaak@decvax.dec.com
Subject: WD3A(5/7)

.......................................................................

Part 5 of 7 of WORKING DRAFT ver. 3A (WD3A(5/7)) of
"Framework and requirements for internationalization"

6. Expectations and Obligations

This chapter introduces service requirements that satisfy the needs of programmers in creating applications geared towards the international community of users. The examples included in this chapter show the functionality expected by the programmer using the programming language or operating system environment, and describe in greater detail the internationalization services that need to be supported.

It is anticipated that most programming languages will provide similar services in their native syntax or by accessing platform-provided services, and that the services will have an equivalent behavior for every specific cultural element supported by the programming language. For example, it is anticipated that programming languages able to format numeric values will be able to do so in a manner satisfactory to the users in the supported cultural environments. A proposed extension is the data model for textual data, which has to accommodate character repertoires other than the single-byte character model.

Services are described here as enabling technologies. Programming languages will need to incorporate these techniques to facilitate communication between the user and the computer in the user's native language, for example in computer-generated messages, source code literals, and comments embedded in the program, or by providing a wider range of characters for naming program identifiers.

The diversity of cultures to be supported argues for an implementation strategy that minimizes the number of versions required. For example, current internationalization standards recommend the dynamic selection of cultural elements at run-time and the support of multiple character encodings.

This chapter also refers to some standards such as ISO/IEC 10646 and POSIX, and to industry proposals such as the X/OPEN Company object-oriented internationalization specifications. These examples should be treated as a base for further discussion and not as an endorsement or mandatory requirement for supporting the described services.

6.1 Service Requirements

The service requirements for internationalization are identified in this section. In this context, programming languages can be considered to be applications running on a (HW/SW) platform.
The services can be provided by the platform (such as through POSIX interfaces for internationalization to the operating system) or they can be provided as part of the application. Clearly, a solution where these services are provided by the platform for all applications is preferable. Examples of standards related to the required internationalization services are identified.

6.2 Character Set and Data Representation Service

6.2.1 User Requirements

Dialogues between international users and system platforms or applications in the local language require support for language-specific character sets. For example, German text contains "umlauts" and the "sharp S"; these characters do not exist in the English language, but are essential in German. Most languages have similar requirements for characters which are not included in the basic ASCII set (American Standard Code for Information Interchange). The written languages can be classified into various groups based on fundamental characteristics as described in xxxx.

6.2.2 Character Set Repository Service

The Character Set Repository Service provides a central character set repository that contains coded character sets and relevant information about them. It supports character set and data representation related services. Entries in the repository may include:

- Code format
- Escapement rules: some languages such as Hebrew or Arabic are written from right to left, while numbers within the text of these languages are written from left to right. It is necessary to maintain this information with the character set information.
- Character set identifier
- Data Classes
- Mapping rules
- Code extension techniques

Some standards related to Character Set Repository Service:

- ISO 2375, 3ed, 1985 : Procedure for Registration of Escape Sequences
- ISO 7350, 2ed, 1990 : Registration of Graphic Character Subrepertoires
- ISO 2022 : Code extension techniques

6.2.3 Character Set Handling Service

The Character Set Handling Service provides the capability to recognize, process, store, retrieve, communicate, and present different character sets.

Some standards related to Character Set Handling Service:

- ISO 8859-1, 1ed, 1987 : Latin Alphabet No. 1
- ISO/IEC 10646-1 : Universal Multiple Octet Coded Character Set
- JIS X0208:1990 : Code for the Japanese graphic character set
- JIS X0201:1976 : Code for information interchange in Japan

6.2.4 Character Set Identification Service

The Character Set Identification Service provides unique identification of character sets. This service allows different character sets to be used concurrently on a system or in an application without the danger of data corruption. It also provides information for the exchange of data between systems or networks, and makes it possible to identify appropriate translation tables between different character encodings.

6.2.5 Data Class Definition Service

Characters belong to different character classes. Since processing often depends on whether a character is considered a space, numeric, alphabetic, or special character, a service is provided to identify the class of a character.

6.2.6 Case Mapping Service

The Case Mapping Service provides upper case to lower case and lower case to upper case mapping.

6.2.7 Data Presentation Service

The Data Presentation Service provides the capability to present data on different display units, printers, or other output devices. According to rules in a repository, the service includes escapement of characters and selection of different shapes.
Preparing data for presentation may involve extensive translation and/or transliteration due to hardware selections or limitations. The service also provides default presentation forms for coded characters that have no associated graphic shape.

6.2.8 Data Announcement Service

The Data Announcement Service provides the capability to recognize the coded character set of data entities (files, messages, etc.). This capability allows the processing and storage of data in different encoding schemes on the same system without the danger of data corruption. International standards bodies are presently addressing the announcement mechanism.

6.2.9 Data Communication Service

The Data Communication Service provides the capability to transmit and receive data to and from communication systems while maintaining the integrity of the data. In international communication environments, this may include data translation due to different coded character sets being used in different service categories.

6.2.10 Data Input Service

The Data Input Service provides support for keyboards with local characters and other complex input methods, especially for Far East pictographic character sets. Potentially, input data can carry character set identification information.

A standard related to Data Input Service:

- ISO/IEC 9995-1,-2,-3,-4,-5,-6,-7 : Keyboard standard

6.2.11 Character Set Invocation Service

The Character Set Invocation Service provides the capability to specify the character set to be used for input, processing, and output of data. This functionality can be invoked through:

- user selection
- default specification
- data announcement techniques
- information about the presentation capabilities of specific output devices

The service will potentially also allow the user to dynamically switch from one character set to another, if required.
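
As a rough illustration of the invocation options listed in 6.2.11, the following Python sketch chooses a character set from a user selection, a data announcement, or a default, and then decodes data with it. The function names, the default of ISO 8859-1, and the order of precedence are assumptions made for this example; the draft does not specify any of them.

```python
import codecs
from typing import Optional

# Assumed platform default; the draft does not name one.
DEFAULT_CHARSET = "iso8859-1"


def select_charset(user_choice: Optional[str] = None,
                   announced: Optional[str] = None,
                   default: str = DEFAULT_CHARSET) -> str:
    """Return the first charset given: user selection, announcement, default.

    The precedence order is an assumption made for this sketch.
    """
    for name in (user_choice, announced, default):
        if name:
            codecs.lookup(name)  # raises LookupError for an unknown charset
            return name
    raise LookupError("no character set specified")


def read_text(data: bytes, **selection: Optional[str]) -> str:
    """Decode data using the charset chosen by select_charset()."""
    return data.decode(select_charset(**selection))


# The German word "Gruesse" written with u-umlaut (0xFC) and sharp s (0xDF),
# encoded in ISO 8859-1 and announced as such.
latin1_bytes = b"Gr\xfc\xdfe"
text = read_text(latin1_bytes, announced="iso8859-1")
print(text)
```

Decoding the same bytes under a different character set would yield different characters or an error, which is the data corruption that the identification and announcement services of 6.2.4 and 6.2.8 are meant to prevent.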

6.3 Cultural Elements Service

6.3.1 See chapter xxxx

6.3.2 Cultural Elements Repository Service

The Cultural Elements Repository Service provides the capability to maintain and access rules and conventions for cultural entities. These might be areas with a common language, geographic areas, or areas with a common cultural or historic background. The repository contains information that supports other cultural elements services.

A standard related to Cultural Elements Repository Service:

- ISO/IEC DIS 9945-2 : POSIX shell and utilities

6.3.3 Date Format Service

The presentation of day, month, and year varies between countries, as do the habits of using long or short names for days and months and of using prefixes in long date formats. For example, in the US the date is mostly presented as mm/dd/yy, while in Europe the forms dd/mm/yy or yy-mm-dd are commonly used. Taking 5 October 2001 as an example, the following easily confused forms arise: 10/05/01 for the US, and 05/10/01 or 01/10/05 for Europe. Japan counts years within the era of the emperor. The Date Format Service provides the capability to use these formats.

Some standards related to Date Format Service:

- ISO 8601, 1ed, 1988 : Representation of Dates and Time
- JIS X0301:1977 : Identification Code of Dates in Japan
- ISO/IEC DIS 9945-2 : POSIX shell and utilities

6.3.4 Time Format Service

While some countries prefer the 12-hour cycle with a.m. or p.m., others use the 24-hour clock. The Time Format Service has the capability to handle these formats as well as world time zones and their offset values relative to UTC.

6.3.5 Day Numbering Service

Weeks begin on Monday in certain countries, on Sunday in some other countries, and on Saturday in Islamic countries. The Day Numbering Service provides the number of the day.

6.3.6 Week Numbering Service

In some applications it is more convenient to use week numbers for calculations than months and days. The first week in a year is defined differently in various countries.
The Week Numbering Service supports these conventions and provides conversion routines.

6.3.7 Numeric Formatting Service

Interpretation of numeric fields in unfamiliar formats is one of the major contributors to human errors in data processing. The Numeric Formatting Service provides the capability to handle the different cultural conventions: the point as the decimal delimiter is most commonly used in America; most of Europe uses a comma instead. Spaces or periods are used to separate groups of digits, normally three. Negative numbers are identified with leading or trailing minus signs, or by surrounding an unsigned value with parentheses.

6.3.8 Currency Formatting Service

The Currency Formatting Service describes the handling of currency fields and symbols: not only do the symbols for currencies vary from country to country, but so does their placement before, after, or between the integer and the fractional part of the amount. The field lengths and the number of digits after the decimal point depend on the monetary system. Negative amounts are indicated according to local rules and regulations.

6.3.9 Measuring System Service

Dimensions presented in inches, feet, yards, and miles differ from those presented in millimeters, centimeters, meters, and kilometers; ounces and pounds convert into grams and kilograms, cups and gallons into liters, and degrees Fahrenheit into degrees centigrade. Conversion facilities and country-specific presentation are provided by the Measuring System Service, based on the cultural convention repository.

6.3.10 Compare, Sort, and Search Service

See annex 1

6.3.11 Paper Format Service

This service provides the capability to select various paper formats as defined in the cultural elements repository.

6.3.12 Cultural Elements Invocation Service

This service provides the capability to invoke the cultural elements requested by the user, by default from the repository, by the application, or as defined in the user profile.
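
The relationship between a cultural elements repository (6.3.2) and the date and numeric formatting services (6.3.3, 6.3.7) can be sketched in Python as follows. The repository layout, the environment names, and the function names are hypothetical and chosen only for illustration; the date patterns reproduce the three forms of 5 October 2001 given in 6.3.3, and the comma as the US digit group separator is an assumption not stated in the draft.

```python
from datetime import date

# Hypothetical repository: one entry per cultural environment.
# Keys and field names are assumptions made for this sketch.
CULTURAL_ELEMENTS = {
    "us":         {"date": "%m/%d/%y", "decimal": ".", "group": ","},
    "europe_dmy": {"date": "%d/%m/%y", "decimal": ",", "group": "."},
    "europe_ymd": {"date": "%y/%m/%d", "decimal": ",", "group": " "},
}


def format_date(day: date, env: str) -> str:
    """Format a date with the pattern stored for the environment."""
    return day.strftime(CULTURAL_ELEMENTS[env]["date"])


def format_number(value: float, env: str, places: int = 2) -> str:
    """Format a number with the environment's decimal and group separators."""
    elements = CULTURAL_ELEMENTS[env]
    plain = f"{value:,.{places}f}"  # "," as group and "." as decimal separator
    # Swap through a placeholder so the two replacements do not collide.
    return (plain.replace(",", "\0")
                 .replace(".", elements["decimal"])
                 .replace("\0", elements["group"]))


sample_day = date(2001, 10, 5)
for env in CULTURAL_ELEMENTS:
    print(env, format_date(sample_day, env), format_number(1234567.5, env))
```

Because the conventions live in the repository rather than in the formatting code, supporting a further environment in this sketch means adding an entry, which is in line with the run-time selection of cultural elements described at the start of this chapter.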

6.4 Natural Language Support Services

6.4.1 User Requirements and Background Information

The use of computers today is no longer limited to the domain of highly trained specialists; they are now a commodity in homes, schools, and businesses, where they must be used by people who do not have significant "data processing" skills. It is impractical to expect that everybody working on a computer will understand English. Instead, the computer must learn to "speak" the local language of the individual user.

A service must be provided with the capability to present messages, menus, help information, and online documentation in the language selected by the user, even when more than one language is required in a single document. This service must enable dialogs with applications and operating systems in local languages. Finally, for text processing, the service must include hyphenation, spell checking, and a thesaurus for each language. Only when these facilities are provided can the computer be considered "useful" on a worldwide basis.

6.4.2 Multi-Lingual Support Services

The Multi-Lingual Support Service provides the capability to support more than one natural language simultaneously. For example, a text processor might work with text in Japanese and French on the same page with synchronized paragraphs.

6.4.3 Message Service

The Message Service provides the capability to present (display, print, etc.) messages, menus, forms, help information, and online documentation in the language selected by the user. Different languages can be used simultaneously. The service keeps the messages independent of the applications, allows for variable message length (German translations of English messages tend to be 30% longer), has a delivery service to insert parameters into translated messages, and uses the cultural convention repository for format definitions.

It also allows users to interact with applications and operating systems in the language of their choice.
It allows input in the local language, using local characters, and the parsing of local formats as defined in the cultural elements repository.

6.4.5 Language Selection Service

This service provides the capability for the user to specify the language of interaction with the application. If none is chosen, the default language is selected.

6.5 Internationalization in Fortran - A Possible Approach

The following description indicates one way in which the functionality and services required for support of internationalization might be provided in Fortran. It is not intended to imply that this is a recommended approach; it is simply presented as a form of existence proof to indicate that it would not be difficult to add the necessary functionality.

This example shows one way in which two key issues might be approached, namely the identification and/or specification of an appropriate cultural environment, and the use of that cultural environment to automatically invoke the culturally appropriate form of string comparison, so that an array, or other collection, of textual items can be sorted in the correct order for the environment.

In this approach three new intrinsic procedures provide all the necessary control, and these are briefly described first in a simplified version of the style used in the Fortran Standard (ISO/IEC 1539 : 1991).

(i) CULTURAL_ENVIRONMENT (ELEMENT)

Description. Returns the processor-dependent code for the cultural environment specified.

Class. Inquiry function.

Argument. ELEMENT is optional, but must be scalar and of type default character if present.

Result Type. Default integer.

Result Value. If ELEMENT is present then it must represent the name of a cultural environment supported by the processor; the result will be a processor-dependent integer which will identify this cultural environment within this processing system.
If ELEMENT is not present the result will be the processor-dependent integer which identifies the cultural environment in which the processor is currently operating.

(ii) REPERTOIRE_KIND (ENVIRONMENT)

Description. Returns the kind type of the character repertoire associated with the specified or current cultural environment.

Class. Inquiry function.

Argument. ENVIRONMENT is optional, but must be scalar and of type default integer if present.

Result Type. Default integer.

Result Value. If ENVIRONMENT is present then it must represent the integer code for a cultural environment supported by the processor; the result will be the kind type of the character repertoire which is associated by default with this cultural environment. If ENVIRONMENT is not present the result will be the kind type of the character repertoire which is associated by default with the cultural environment in which the processor is currently operating.

(iii) SET_ENVIRONMENT (ENVIRONMENT,CHAR_KIND)

Description. Changes the current cultural environment.

Class. Subroutine.

Arguments.

ENVIRONMENT must be scalar and of type default integer. It specifies the cultural environment to be used for subsequent processing.

CHAR_KIND (optional) must be scalar and of type default integer. If present, it specifies the kind type to be used by default for any subsequent character declarations; if absent, the kind type associated by default with the environment specified will be used.

An example of the use of these procedures to automatically localise a program to the environment which is current when the program commences execution might be as follows:

    PROGRAM Automatic_Localization
    IMPLICIT NONE

    ! Establish current cultural environment
    INTEGER, PARAMETER :: environment=CULTURAL_ENVIRONMENT(), &
                          ch_kind=REPERTOIRE_KIND()

    ! Character variable declarations
    CHARACTER(KIND=ch_kind,LEN=20) :: string1,string2    ! etc.
       .
       .
    ! Start of execution
    CALL SET_ENVIRONMENT(environment)    ! This is not strictly
       .                                 ! necessary, but is
       .                                 ! probably good practice

One of the effects of setting an environment could be to overload the comparison operators between character strings of the character kind specified or implied, so as to use the culturally correct comparison algorithm. Thus the statement

    IF (string1 < string2) THEN
       .
       .

which would normally compare two strings character-by-character using the rules specified in the Fortran Standard, would compare them using the correct culturally sensitive algorithm once a call to SET_ENVIRONMENT had been executed. A call to a standard sorting routine would then carry out the sorting using this same algorithm, without the need to make any adjustment to the sorting procedure at all.

There are many other internationalization functionalities which could be incorporated into Fortran in a similar way with relatively little effort, and with only minimal extensions to the language. It is believed, moreover, that this functionality could initially be added by means of a module, as has been suggested for Fortran's varying length string datatype in CD 1539-2.

-------end of WD3A(5/7)-------------