From isaak@csac.zko.dec.com Thu Aug 19 07:12:37 1993
Received: from crl.dec.com by dkuug.dk with SMTP id AA04812 (5.65c8/IDA-1.4.4j for ); Thu, 19 Aug 1993 17:48:46 +0200
Received: by crl.dec.com; id AA26515; Thu, 19 Aug 93 11:10:50 -0400
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93);id AA02941; Thu, 19 Aug 1993 11:12:37 -0400
Date: Thu, 19 Aug 1993 11:12:37 -0400
From: isaak@csac.zko.dec.com (Jim Isaak-respond via isaak@decvax.dec.com)
Message-Id: <9308191512.AA02941@csac.zko.dec.com>
To: sc22wg15@dkuug.dk
Subject: i18n part 4/7
X-Charset: ASCII
X-Char-Esc: 29

Return-Path: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
Received: by csac.zko.dec.com (5.65/fma-100391/BobG-15-Feb-93); id AA23266; Wed, 18 Aug 1993 20:30:42 -0400
Date: Wed, 18 Aug 1993 20:30:41 -0400
From: codjig::abyss::SATO_TAKAYUKI_K/HP8900_HQ////////HPMEXT1/TAKAYUKI#b#K#b#SATO#o#HP8900#o#HQ@opnmail2.corp.hp.com
To: isaak@decvax.dec.com
Subject: WD3A(4/7)

.......................................................................
Part 4 of 7 of Working Draft ver.3A (WD3A(4/7)) of the
"Framework, requirements and models for internationalization"

5.4 Architectural model

Internationalization can be analyzed from more than the perspective of its surface functional components. System implementations differ depending upon the programming language, other tools and the system environment; the software architecture of the underlying operating system (for example, the relative performance of managing large areas of memory versus processing language dependent data on the spot); the model of the application (such as the decision of where the language dependencies are kept); and the hardware facilities that enable efficient system implementation. This is why architectural modeling for internationalization is necessary alongside the surface functionality models.

5.4.1 Interface model

The simplest information system model is made up of the following elements:

- User (Application User and Operator)
- Application Platform
- Application Software

It is necessary to describe the relations and interfaces between these elements. Usually, there are fixed interfaces between the above elements as follows:

- API (Application Program Interface) - between Application Software and Application Platform
- User Interface - between User and Application Platform/Application Software

Each of the above three elements plays a necessary role in internationalization. Since the user sees user interfaces not only through application software but also through application platforms, both interfaces must have the necessary services to support the surface functional requirements for internationalization described earlier.

The generic structure of one element is shown in Figure 9.

     +-------------------------------------+
     | culture dependent specification     |
     |                                     |
     +-------------------------------------+
     | universal specification data        |
     |                                     |
      =====================================
     |                                     |
     | data representation(code etc.)      |
     +-------------------------------------+
     | culture independent layer           |
     |                                     |
     +-------------------------------------+
                      |   |
=================================================
interface for the other elements or external environments

Figure 9 Generic structure of one element

The culture-dependent specifications should be described by universal data (a culture-independent form), and the data should be represented in a well-defined form, such as code, font, graphic icon and so on. Once represented, the data takes a single, culture-independent form, such as binary coded data. Once it is culture independent, the data can be sent to other elements of the information system or to the external environment outside it.
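
As a rough illustration of the structure in Figure 9, the sketch below writes a culture-dependent specification as plain data and then represents it in a coded form that can cross an interface. Everything named here is an assumption made for illustration and does not come from the draft: the `JAPANESE_DATE_SPEC` contents, the two helper functions, and the choice of ASCII-encoded JSON as the "well-defined form".

```python
import json

# Hypothetical culture-dependent specification, described as universal
# (culture-independent) data: plain keys and values, nothing executable.
JAPANESE_DATE_SPEC = {
    "culture": "ja_JP",
    "date_order": ["year", "month", "day"],
    "decimal_point": ".",
}


def to_interface_form(spec: dict) -> bytes:
    """Represent the specification in a well-defined coded form (ASCII JSON)."""
    # json.dumps escapes non-ASCII characters by default, so ASCII is enough.
    return json.dumps(spec, sort_keys=True).encode("ascii")


def from_interface_form(payload: bytes) -> dict:
    """Recover the specification on the other side of the interface."""
    return json.loads(payload.decode("ascii"))


payload = to_interface_form(JAPANESE_DATE_SPEC)
restored = from_interface_form(payload)
```

The receiving element only needs to understand the coded form, not the culture it describes; that is the property the culture-independent layer is meant to provide.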

5.4.2 Layered Architecture model

Software internationalization technologies can be divided into four major categories:

- Input services
- Character handling services
- Internal design methodology
- Presentation services

5.4.2.1 Input services

----- To be filled in later -------

5.4.2.2 Character handling services

Character handling services range from the physical representation of data to the interpretation and processing of text.

5.4.2.2.1 Physical layer

The physical layer describes the encoding of data in the computer. Simple encoding techniques such as fixed-length characters allow very low-cost support of textual data when the number of symbols to be represented is small. This is the case with most of the European languages, where each symbol can be encoded in a single byte.

5.4.2.2.2 Logical representation layers

In some cases, the logical representation of symbols can be further encoded in a more efficient scheme to reduce the amount of space required to store the data. For example, compaction algorithms in data communications may be assigned to this level of the internationalization model.

SYNTACTIC LAYER

The syntactic layer describes the logical representation of the characters, allowing the identification of encoded characters in an array of both data and meta-data (information related to the data itself).

The syntactic layer is the subject of many standardization efforts currently underway. Recent developments recommend either a simple scheme, in which symbols are identified by a rule external to the data, or the introduction of data tags that differentiate the data elements.

For example, both UNICODE and ISO 10646 allow for fixed-length (in bits) encoding of symbols. Character identification tools can identify the symbols in an array by very simple algorithms.
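
A minimal sketch of such a "very simple algorithm" follows. It assumes a fixed width of four bytes per symbol and uses Python's `utf-32-be` codec as a stand-in for a fixed-length form; the helper name `split_fixed_width` is hypothetical and is not taken from any standard.

```python
from typing import List


def split_fixed_width(data: bytes, width: int) -> List[bytes]:
    """Identify symbols in a fixed-length encoding by position alone."""
    if width <= 0 or len(data) % width != 0:
        raise ValueError("data length must be a positive multiple of the width")
    # The rule is external to the data: every `width` bytes is one symbol.
    return [data[i:i + width] for i in range(0, len(data), width)]


text = "A\u00f1o"                      # three symbols, one of them non-ASCII
encoded = text.encode("utf-32-be")     # assumed fixed width: 4 bytes per symbol
symbols = split_fixed_width(encoded, 4)
decoded = [chr(int.from_bytes(s, "big")) for s in symbols]
```

No symbol has to be inspected to find the boundary of the next one, which is what keeps the identification cost low.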

Encoding schemes based on the ISO 2022 standard (such as Compound Text) may allow a much more compact representation of data when the set of symbols is very large but the set of frequently used symbols is relatively small. The cost of the efficient representation is the complexity and processing time required to handle individual symbols.

SEMANTIC LAYER

The semantic layer deals with mappings between single symbols (identified in the previous layer) and actual characters. Semantics is used here to mean the set of rules that deal with the characters. This layer is void when characters are represented by a sequence of a constant number of symbols. For example, ASCII characters are represented by a single symbol, and for any encoding scheme where only ASCII data is represented, the semantic layer is void. In other cases (e.g. some coded character sets support floating diacritics) the character identification layer includes the rules for identifying characters from a set of symbols (basic character shape and diacritics).

5.4.2.3 Internal design methodology

The internal design methodology includes the specification of the language programming interfaces for software designers and developers. Software internationalization specifications such as those proposed by POSIX describe the tools that decouple the application from the specific behavior required by international users.

5.4.2.3.1 System service layer

System services make source code (macros), libraries, commands, and shell programs available to the users of specific operating systems. Examples of these services include the tools to interact with the computer in the user's preferred language, the presentation of numeric and date information according to the customs of where the machine operates, and the definition or customization of the behavior of native languages.
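
The decoupling described for the system service layer can be sketched with Python's standard `locale` and `time` modules, which wrap the C library's locale services. The function name `format_for_current_locale` is hypothetical, and the "C" locale is selected only because it is available everywhere; a deployed program would more likely call `locale.setlocale(locale.LC_ALL, "")` to adopt the user's environment.

```python
import locale
import time


def format_for_current_locale(amount: float, when: time.struct_time) -> str:
    """Format a number and a date using whatever locale is currently active."""
    # The application names no culture here; the system service supplies it.
    number = locale.format_string("%.2f", amount, grouping=True)
    date = time.strftime("%x", when)   # %x is the locale's date representation
    return f"{number} on {date}"


# Assumption for the sketch: use the portable "C" locale.
locale.setlocale(locale.LC_ALL, "C")
report = format_for_current_locale(1234.5, time.gmtime(0))
print(report)
```

Switching the locale changes the output without any change to `format_for_current_locale`, which is the separation between application logic and culture-specific behavior that this layer aims for.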

5.4.2.3.2 Programming language layer

An alternative approach to software internationalization is to modify the semantics of the programming language constructs to introduce the native language processing variants in the object program. For example, a programming language such as COBOL could support a sentence such as "CULTURE XXXXXX" to indicate run-time selection of language: dates and numbers are then formatted according to the definition of the specified culture supported by the platform, and comparisons, sub-string determination, character array indexing, and message output follow the conventions of the language.

Note that the support of internationalization at this layer can be implemented by using the system services of the previous layer, but the actual implementation in each case may differ.

5.4.2.4 Presentation services

The presentation services in the architecture include the specific language variants of the product that communicate with the user. These services are related to the software product but they are not subsets or modules of the application. They are included in the architectural model to recognize the importance assigned to them by the user (the first internationalization aspect required by users is to have localized messages) and to show the relationship between character handling, internal design, and presentation services.

There are two distinct layers in the presentation services: the Application functionality layer and the Localization tools layer.

5.4.2.4.1 Application functionality layer

Application functionality includes the definition of the behavior of the language-dependent features for a particular native language. For example, in both the system services and the programming language layers, the internal design components of the architecture define how the programmer can invoke services to perform language-dependent functions such as displaying the date and time.
At the application functionality level, the standards specify that dates are formatted using full month names and the Emperor year when the application runs in a Japanese environment.

5.4.2.4.2 Localization tools layer

Localization is the process of setting the appropriate parameters and translating the messages and related documentation of the product to conform to the native language requirements of the users.

Standardization at this level has not been fully developed. Some early attempts in the standardization of the system services include the normalization of the formats of messages (such as indicating that text and parameters in a message could be re-ordered according to the language), but there are still many components open to standardization:

- Program source code analysis tools, in order to determine the areas where the program has to be internationalized. Some of these areas are obvious, such as issuing messages or invoking formatting tools. Others may require more sophisticated techniques, such as identifying the use of bytes either as characters in textual information or just as storage units for non-textual data in the program.
- Computer-aided translation for internationalized documentation and user messages.
- Testing tools for localized products, both for the product itself (to check for erroneous assumptions about the data that the program may encounter) and the localized components.

Since many of the localization tools in the industry are of a proprietary nature and the number of users of these tools is very small (usually, localization is done at centers staffed by professional localizers), there is no critical mass to start standardization activities at this level of the architecture.

Figure 10 represents the software internationalization layered model and shows the dependencies between the individual components of the architecture.

+---------------------------+--------------------------------+
| Localization tools        |                                |
+---------------------------+ Presentation services          |
| Application functionality |                                |
+---------------------------+--------------------------------+
| Programming language      |                                |
+---------------------------+ Internal Design methodology    |
| System services           |                                |
+---------------------------+--------------------------------+
| Character identification  |                                |
+---------------------------+                                |
| Logical representation    | Character handling services    |
+---------------------------+                                |
| Physical                  |                                |
+---------------------------+--------------------------------+
| Data conversion           |                                |
| (Normalization)           | Input services                 |
+---------------------------+                                |
| Physical                  |                                |
+---------------------------+--------------------------------+

Figure 10 Software internationalization layered model

----------end of WD3AC(4/7)-----------------------------