From isaak@csac.zko.dec.com Thu Aug 19 07:11:58 1993
Received: from crl.dec.com by dkuug.dk with SMTP id AA04642 (5.65c8/IDA-1.4.4j for ); Thu, 19 Aug 1993 17:45:35 +0200
Received: by crl.dec.com; id AA26479; Thu, 19 Aug 93 11:10:11 -0400
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93);id AA02925; Thu, 19 Aug 1993 11:11:58 -0400
Date: Thu, 19 Aug 1993 11:11:58 -0400
From: isaak@csac.zko.dec.com (Jim Isaak-respond via isaak@decvax.dec.com)
Message-Id: <9308191511.AA02925@csac.zko.dec.com>
To: sc22wg15@dkuug.dk
Subject: I18n part 2/7
X-Charset: ASCII
X-Char-Esc: 29

Return-Path: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93); id AA23242; Wed, 18 Aug 1993 20:29:37 -0400
Date: Wed, 18 Aug 1993 20:29:37 -0400
From: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
To: isaak@decvax.dec.com
Subject: WD3A(2/7)

.......................................................................

Part 2 of 7 of Working Draft ver.3A (WD3A(2/7)) of the
"Framework, requirements and models for Internationalization"

3. Internationalization and Localization

The requirements described in section 2 are all external from the point of view of the system. If the system is sufficiently "user friendly", then the methodology used to ensure this "friendliness" is not of primary concern to the user(s). On the other hand, the methodology used is of considerable concern to suppliers. Since many different approaches to providing "friendliness" are possible, this section describes the internationalization/localization method as the approach recommended by SC22WG20.

3.1 Current approach

Most application developers incorporate all the code necessary to support different cultural environments into the product design. Code with similar functionality is therefore often developed repeatedly.
This is a waste of effort on the part of designers and programmers, and carries with it the risk of inadequate or inconsistent implementation. Also, the cost of developing applications for multiple cultural environments is high. This means that many applications are developed for a single cultural environment, which automatically limits the potential market for the application.

The weaknesses of the current approach, described above, are:

- High cost due to the reinvention of the same functionality for different culture(s)

- This reinvention is not only costly, but also results in a timing gap in the introduction of the application in the marketplace. This gap may cause serious problems when systems are components of a worldwide network.

- Inconsistencies may arise between versions of an application for different cultures, even though the external functionality is the same at the beginning
  * Again, this can cause problems within a worldwide network, and
  * lead to updating/maintenance problems.

These differences could prevent consistent upgrading to next-generation systems. The ownership of reinvented applications might be unclear, which could hinder the original inventor's ability to provide maintenance support.

For these reasons, different systematic approaches should be considered.

3.2 Internationalization/Localization approaches

If support services were provided which simplified the development of applications for multiple cultural environments, the effort required to produce applications that are usable in a number of cultural environments would be substantially reduced. Also, the implementation of different cultural features would be more uniform among applications, which means that users would become familiar with what to expect in different circumstances.

Internationalization

To permit the design and implementation of an application which can accommodate users with a variety of cultural backgrounds, services are required which insulate the application from cultural differences that are not relevant to its functionality. A system which provides this service is called an internationalized system in this document.

Localization

The internationalized application must then adapt to the specific cultural interfaces required by users with shared cultural needs. This adaptation process is called localization in this document.

Localization can be provided for specific single cultures, multiple cultures, or for the "Global uniformity" or "Cross-cultural friendliness" principles. Since localization is not needed for only a single culture, the ideal internationalized system would be the basis for any worldwide (internationalized) system.

Note: Once a real internationalized system is in place, even localization to USASCII and the American culture will be necessary, so that the American user can use the system. It is not necessary to start with the traditional ASCII system in order to internationalize.

3.3 Relationship to Application Portability

Internationalization as described above is, in other words, very similar to Application Portability.

In principle, applications can be considered to exist in an environment made up of human users and the application platform. Most aspects of Application Portability deal with moving an application from one Application Platform to another (i.e. changing the application platform) while keeping the user requirements the same. Internationalization, however, involves changes to the external interface of the application so as to adapt it to different user requirements, while keeping the application platform the same or within the same family, and the application functionality the same as well.
In principle, therefore, internationalization does not need to consider portability across different platforms.

It should be noted here that most applications only communicate with the user via the application platform, and thus Figure 1 provides a realistic view where an application's portability is concerned.

+-------------------------------+
|     Application Platform      |     User
|                               |
|   +---------------------+     |      ___
|   |                     |     |     (^_^)
|   |     Application     |     |       |
|   |                     |     |   ----+----
|   |                     |     |       |
|   |                     |     |       |
|   |                     |     |       |
|   +---------------------+     |      / \
|                               |     /   \
|                               |    /     \
+-------------------------------+

Figure 1 Applications Portability View of the Relationship between User, Application and Application Platform

Because of this relationship between application, platform and user, internationalization should be considered not only for applications, but also for the platforms themselves, together, as a paired set.

In view of this relationship, internationalization can be defined as a "High level of adaptability to different user interface requirements" or "High localizability".

4. Culture-dependent requirements

This section describes the differences in requirements from one culture to another. It itemizes the various specific requirements of cultures, but does not discuss taxonomy or the relationship between the requirements.

Also, the methodologies for finding solutions to these requirements are described in section XXXX of this document. There is not necessarily a direct link between a given requirement and a given solution, but secondary requirements which are derived from original requirements may have a close relationship with the implementation methodologies. For example, an environmental switching mechanism is NOT a user requirement, but it is a secondary requirement and one of the choices necessary in order to fulfill the original requirements.

4.1 Requirements for Cultural Dependencies

It is necessary to adapt the system surface to handle culture-dependent representation and description.
Cultural dependencies can be divided into two categories: the first is SCRIPTS, which present natural language in native form, and the other relates to culture-dependent items such as national conventions. Section 4.2 of this document is concerned with scripts, and culture-dependent items are listed in section 4.3.

In general, the installation of local requirements onto internationalized systems ensures the desired behavior of the localized system. This installation process is called localization.

There are many other customer requirements which can be categorized as cultural requirements. However, those requirements which do NOT stem from cultures, geographically and socially speaking, are NOT included in this document. Such requirements would be categorized as Application field culture or similar.

NOTE: This is only a list of the differences or requirements; it is not always necessary to support all items under internationalization. To try to accommodate all (or any) requirements to make systems friendly to all (or any) users does mean DIVERSIFICATION, which is somewhat contradictory to STANDARDIZATION (which is what ISO is aiming for).

4.2 Script

At the present time, more than 3000 languages are spoken throughout the world. Just over a hundred or so of these languages are actually written. About one half of the world's population uses some version of the Latin script. The other half uses other major or minor scripts.

Information systems represent these non-ASCII scripts within four writing schemes: alphabetic, (diacritical), syllabic and ideographic:

a. In alphabetic scripts, vowels and consonants have equal importance. Vowels are distributed within the alphabet, rather than grouped at the beginning. Moreover, alphabetic scripts are the only ones in which letters have uppercase (i.e. capital) and lowercase (i.e. small) forms, and most of them do. Typical non-Latin alphabetic scripts are Cyrillic, Greek, Arabic, Hebrew and Japanese Katakana/Hiragana.

---- Samples of alphabetic character to be here ------

Figure 2 Sample of Alphabetic character

Some of the alphabetic scripts are used with diacritical marks and some are not. From an information technology viewpoint, supporting diacritical marks as combining marks (so-called non-spacing characters, for example) requires significantly different technology. Therefore, if necessary, alphabetic scripts may be categorized into two groups (with and without diacritical marks).

b. In syllabic scripts, a vowel can appear above, below, within or beside its associated consonant, or a vowel and its associated consonant are combined as a single independent symbol. Most South-East Asian scripts are the former case and Korean Hangul is the latter. In some of these scripts, vowels are not separate characters.

----- Samples of Syllabic and Ideographic character to be here -----

Figure 3 Sample of Syllabic and Ideographic character

c. In ideographic scripts, e.g. Chinese, each character symbolizes a concept and one or more sounds. Moreover, ideographic scripts have an open-ended nature in terms of the number of characters within the script.

Alphabetic and syllabic scripts are all phonetic, i.e., without specific meaning attached to the individual character.

A specific script is not systematically attached to a given linguistic family. For example, Persian is an Indo-European language written with Arabic characters which were designed for a Semitic language.

For the writing schemes discussed above, present-day computer systems and their data entry, processing and display facilities must be rendered capable of supporting any operation that can be supported in English.

4.3 List of Cultural Dependent Items

Culture-dependent items recognized as relevant to internationalization are listed below. The more widely used an information system is, the more culture-dependent items there are to be identified. Addition of such items to the list will be done on a demand/registration basis.

It is necessary to note that these cultural elements carry different weight depending on the culture of the user. For example, tolerance of input methods that are difficult to use varies with how often the affected characters appear in the data: American users may accept the use of the ALT key to enter accented characters more easily than French users, because French users often encounter these characters in their native language text.

/* Editor's note */
/* Questionnaire response should be reflected in this section, if needed */

4.3.1 Character encoding and handling

Character sets used in data, literals, source code, search functions, and identifiers vary in terms of contents (e.g., national characters), container size (e.g., multi-octet), and encoding (different codings of the same set of characters in containers of the same or different size).

4.3.2 Text/String comparison/ordering process (Collating sequence)

The collating sequence depends upon the natural language used. For example, the German sharp-s sorts as ss, and Spanish ch sorts after cz.

4.3.3 Conversion mapping of characters/Case conversion

Mapping of characters for conversion (including case conversion) is required to handle character data in some character sets, while it is not allowed in other character sets. Examples of such conversions are Normalized-Character, Uppercase/Lowercase, Free-Standing/Initial-Form/Medial-Form/Final-Form, Subscript/Superscript, Simplified-Form/Variation-of-Form/Traditional-Form (CJK) and so on.

4.3.4 Character property classification

Character property classification (e.g., alphabetic characters, numeric characters, and special characters in Latin alphabetic character sets; Hanja and Hangul in the Korean character set) differs among character sets.

4.3.5 Hyphenation of words, Spacing/Punctuation in text

Hyphenation of words is applicable to some natural languages (e.g., English), while it is not applicable to other natural languages (e.g., Chinese).
The ways of hyphenating words differ from one natural language to another. Also, rules for spacing words (no word spacing is needed for Japanese) and punctuation rules/marks are different.

4.3.6 Word representation of numbers

Word representation of numbers may be different even though the number formatting is the same.

4.3.7 Messages and dialogs

Natural languages may be used for computer-human dialog. The ways of presenting headings, prompts, error messages, and warnings differ among the natural languages used.

4.3.8 Documentation

Documentation (e.g., user manuals) should be provided in the users' natural language.

4.3.9 Character(Glyph) size, line size, and line spacing

Printed/displayed character size, line size, and line spacing differ among cultures and scripts (e.g., Han characters are normally wider than Latin letters).

4.3.10 Preferred Font style

Preferred font styles differ among cultures, even for the same glyph. For example, Chinese has a more "brush writing" flavor than does Japanese. An unfamiliar style may give the reader a strong foreign impression.

4.3.11 Writing directions

Writing direction (e.g., embedded left-to-right numbers in right-to-left Arabic text) is language and culture-dependent. Writing direction differences can also have an impact on the usage of mirrored characters (e.g. open/close parenthesis vs. left/right parenthesis).

4.3.12 Voice message

Some applications, such as television news programs, may need to have voice messages translated to the user's natural language. Others, such as music, should not be translated.

4.3.13 Date and time calendar

Presentations of date, time, and calendar are culture-dependent (e.g., the sequence of presenting the day, month, and year), and different presentations can be used in a single culture (e.g., 09/18/90 and September 18, 1990, 2:00pm and 14:00). Some cultures need an era name for the year and some do not. Also, some cultures still use the lunar calendar.
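
As a rough illustration of 4.3.13, the sketch below renders one internal timestamp in several presentations. It is only a minimal sketch: the table DATE_ORDER, the style names, and the functions format_date and format_time are hypothetical names chosen for this example, not part of the draft or of any standard interface. Era names and lunar calendars are not covered.

```python
from datetime import datetime

# Hypothetical table of field orders (an assumption for illustration only).
DATE_ORDER = {
    "month-day-year": ("month", "day", "year"),
    "day-month-year": ("day", "month", "year"),
    "year-month-day": ("year", "month", "day"),
}


def format_date(moment, order, separator="/"):
    """Render a short numeric date using the field order of one convention."""
    fields = {
        "day": f"{moment.day:02d}",
        "month": f"{moment.month:02d}",
        "year": f"{moment.year % 100:02d}",  # two-digit year, as in 09/18/90
    }
    return separator.join(fields[name] for name in DATE_ORDER[order])


def format_time(moment, style):
    """Render the time of day in 24-hour or 12-hour (am/pm) presentation."""
    if style == "24-hour":
        return f"{moment.hour:02d}:{moment.minute:02d}"
    if style == "12-hour":
        hour12 = moment.hour % 12 or 12  # hours 0 and 12 both display as 12
        suffix = "am" if moment.hour < 12 else "pm"
        return f"{hour12}:{moment.minute:02d}{suffix}"
    raise ValueError(f"unknown time style: {style}")


if __name__ == "__main__":
    moment = datetime(1990, 9, 18, 14, 0)
    for order in DATE_ORDER:
        print(order, format_date(moment, order))
    print(format_time(moment, "12-hour"), format_time(moment, "24-hour"))
```

The application keeps a single internal value, and only the presentation step consults the convention table, which is the separation that section 3.2 describes.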

4.3.14 Currency

The presentation of currency symbols can be at the beginning (e.g., $15.23 in the US), in the middle (e.g., 15$23 in Portugal) or at the end (e.g., 15,23F in France). Also, currency signs, monetary field size, formatting, etc., are different.

4.3.15 Price expression

On top of the currency presentation, price expressions differ in some cases (e.g. $ 123.45++ means tax and service charge are not included in some places).

4.3.16 Number formatting

The presentation of numbers is culture-dependent (e.g., 99,999.99 in one place and 99.999,99 in another).

4.3.17 Number Rounding

The way in which numbers are expected to be rounded in format conversion for presentation, when reducing the number of places after the decimal point, is culture-dependent. Some cultures expect truncation, others rounding, and sometimes the action differs depending on whether the value is positive or not.

4.3.18 Mathematical symbols

Mathematical symbols in everyday use are different in some cultures (e.g. a dot above and a dot below a horizontal bar is the division symbol in most cultures, but it means minus in Denmark).

4.3.19 Telephone number formatting

The presentation of telephone numbers varies from country to country. Also, the same telephone number has different formats depending on whether it is international, domestic, long-distance or local. For example, (5432)-9876 is the local number in Tokyo, 03-(5432)-9876 is the number from within Japan but outside of Tokyo, and +81 3 5432 9876 is the international number.

4.3.20 Postal address formatting

Presentations of postal addresses vary across countries (e.g. the state-town-street sequence in China and the street-town-county sequence in the UK).

4.3.21 Measurement systems

Measurement systems (e.g., distance, weight, speed) used are culture-dependent. Moreover, most cultures have both a modern measurement system and traditional units.
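
The currency and number presentations in 4.3.14 and 4.3.16 above can be sketched in a few lines. This is only an illustrative sketch: the table SEPARATORS, its keys, and the functions format_number and format_currency are hypothetical names assumed for this example, and the rounding behavior discussed in 4.3.17 is deliberately left out.

```python
from decimal import Decimal

# Hypothetical convention table (an assumption for illustration only).
SEPARATORS = {
    "comma-group": {"group": ",", "decimal": "."},  # 99,999.99
    "dot-group": {"group": ".", "decimal": ","},    # 99.999,99
}


def format_number(value, convention, places=2):
    """Format a number with the grouping and decimal marks of one convention."""
    marks = SEPARATORS[convention]
    text = f"{value:,.{places}f}"  # always "," for groups and "." for decimals
    # translate() maps both characters in one pass, so the marks cannot collide
    return text.translate(str.maketrans({",": marks["group"], ".": marks["decimal"]}))


def format_currency(whole, fraction, symbol, position, decimal="."):
    """Place the currency symbol before, in the middle of, or after the amount."""
    if position == "before":
        return f"{symbol}{whole}{decimal}{fraction:02d}"
    if position == "middle":
        return f"{whole}{symbol}{fraction:02d}"  # symbol replaces the decimal mark
    if position == "after":
        return f"{whole}{decimal}{fraction:02d}{symbol}"
    raise ValueError(f"unknown symbol position: {position}")


if __name__ == "__main__":
    amount = Decimal("99999.99")
    print(format_number(amount, "comma-group"))
    print(format_number(amount, "dot-group"))
    print(format_currency(15, 23, "$", "before"))
    print(format_currency(15, 23, "$", "middle"))
    print(format_currency(15, 23, "F", "after", decimal=","))
```

Keeping the marks and the symbol position in data rather than in code means that a further convention can be added without touching the formatting functions.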

4.3.22 Icons and symbols

Icons and standard symbols are different depending on the country and culture (e.g., the U.S. icon for a trash can does not look like a trash can to Japanese users).

4.3.23 Use of color

The use of color differs depending on the culture (e.g., white clothing is used to dress the dead in some Asian cultures).

4.3.24 Paper size

There are several standard paper sizes depending upon the culture (e.g., ISO standard A4 size and North-American letter size).

4.3.25 Input mechanism

In some natural languages, two or more methods for entering characters (e.g., Kana to Kanji conversion in Japanese) are available, and are selected by users based on their preferences. Preferred input methods differ from culture to culture as well.

4.3.26 Message length

The space required to store messages differs by language and character set. The structure of sentences and the order of words also differ (e.g. the equivalent of "open file" in some languages is "file open").

4.3.27 Spelling

The spelling of the same word may differ from culture to culture (e.g. Center vs. Centre, Color vs. Colour).

4.3.28 Function names

There are cases in which the words of a natural language are used as function names. The names may be required to carry the meaning within each culture, and must be appropriately translated.

4.3.29 Page Layout

Special page layouts for documents (mainly for legal use) are required in some cultures. The center-folded, double-sided layout (Fukurotoji) in Japan is an example of such a layout requirement. Business letter format is also in this category.

4.3.30 Legal/Regulatory requirements

Each country has its own regulatory/legal requirements; they are not necessarily the same from country to country.

4.3.31 Taboo words

Each culture has its own taboo words which are of no significance in other cultures.

4.3.32 Person's title

Methods of addressing a person differ from culture to culture.

------- end of WD3A(2/7)---------