ISO/IEC JTC1/SC22/WG15 N214

To: National Standards Bodies, ISO member countries.
From: Internationalization Rapporteur group SC22/WG15
Subj: National conventions.

Below, please find a questionnaire concerning national conventions associated with handling information that is often processed using computers. The purpose of the questionnaire is to identify conventions that vary from country to country, with the intent of making applications more adaptable and software development more effective by making it easy for applications to adapt to local conventions.

As computers become more prevalent, they must deal with local and national cultural conventions, rather than reflecting the conventions of limited populations. To do this, much information must be gathered so that the mechanisms can deal successfully with all the relevant conventions, rather than finding that some were omitted and cannot be easily retrofitted. The technology is not ready to deal with issues such as natural-language translation, but issues such as time and date, currency, and timezones are ready to be considered. The issue of character sets is being addressed in SC2. We presume that the necessary characters can be represented.

This survey is intended to be qualitative in nature, and to identify the nature of the issues in this area, not to identify some statistical characteristic of the information gathered.

We would like to have the questionnaire filled out by as many nations, representing as many cultures, as possible. Within a nation that has more than one culture or set of conventions, please fill it out for each culture or set of conventions. The viewpoint reflected should be that of the culture, rather than that of a computer expert who is able to deal with representations that do not match the culture.

This questionnaire is written with the intent that it can be filled out by someone not particularly familiar with computers.
The questionnaire first explains what the issue is, and then shows examples of what the current technology can deal with. This is to give you an idea of what the problem is, as it is currently perceived. We would ask you to answer several questions in each area:

1) Is the current technology (as represented in the questionnaire, not in terms of actual products) minimally acceptable; can you operate successfully in your culture, for computer use only, with what is available?
2) Is the current technology adequate for most computer usage? Does it meet all your national or cultural needs when computers are being used as data processing devices? If not, please describe the problem, and how that information should be represented to meet local needs.
3) Is the current technology adequate for non-expert usage? Are there situations where people who do not normally use computers would be presented with information in an unfamiliar form if the current technology were not extended? Again, we would ask for descriptions of the problem if the needs are not fully met.
4) We realize that there are also historical usages, such as obsolete currencies, that would need to be represented in textual documents. If those would also be used by computers, in terms of manipulating them, please describe them. If, however, a computer would not have to deal with them (except possibly as uninterpreted text) they are not within the goals of this questionnaire, as it is the manipulation, not representation, of that information that is important.
5) The goal of this questionnaire is more to identify new (to us) issues than to develop a prioritization among the known issues. Although the questions above are intended to guide standards developers when technological or resource tradeoffs must be made, they are not intended to be statistical in nature. Subjective indications of "this is important, this is not" help standards developers develop general solutions to as many of the problems as possible.
In most cases, a simple indication that the known solutions cover all known needs is preferable to a list of the needs. The purpose of the survey is to identify the unknown and the exceptional, not the currently known problems.

There are also several classes of concepts where we are aware there are issues, but which are for one reason or another not currently of interest within this questionnaire. These include:

- Character sets: there are other activities dealing with that issue (SC2, some other SC22 work). This questionnaire assumes that the necessary characters can be processed, represented, and displayed as needed.
- Anything internal to a computer program (such as identifiers, comments or function names). This does not relate to "applications portability". (Presumably things such as tagging of data for the culture that it comes from must occur, but this is not typically visible to the application user.)
- Processing of mixed language text is a need, and taken as a given, but is not specific to a given culture and information on that is not being requested here. (Since this remains an area of research, suggested solutions would be helpful, however.)
- Functions which are not "generic" to most applications: for example, portability of payroll programs (including such issues as tax law) is not of interest. Document layout also falls in this category, as only document processing programs are concerned with this.
- Units of measure: although translation of such units is desirable, it must in general be the application that does the translation. This is because the unit is as likely to be a function of the application as of the culture: aircraft applications could display distances in meters, altitude in feet, and fuel in pounds. The same fuel could be taxed in either liters or gallons. Cooking measure is sometimes by volume and sometimes by weight, and requires knowledge of the density of the ingredient to translate.
- Issues involving understanding natural language: translation (obviously), hyphenation of words, and the like.

Please use the examples as a guideline both to understand the questions we are asking, and also to help us understand your response. In no case can the examples be complete. If you are unsure whether the needs of your culture are met, indicate that, and we can evaluate the situation to see if the technology can already do it.

Because of the diversity of cultures, it may not be possible to represent every concept in all possible ways at a reasonable cost. However, by knowing of the issues, we can hope to do a better job than otherwise. It remains up to the programmer to actually use these facilities, so they will not automatically be present in programs even when they are available.

Where we suspect that there might be a problem, a list of "possible issues" is given as a starting point for thinking about the problems.

Background information:

Name and Address:
Telephone:
Fax:
Electronic Mail (if available):
Country Described:
Locale/Culture within the country:
Your background within the Locale/Culture described (e.g.: librarian, computer expert, historian, linguist):
If you are representing a group please give the name of the group (e.g. SC22 in Denmark, librarians in Denmark, DEC users):
References that we could use to understand further details. (National standards that are not also ISO standards, in particular.)
Other information about the cultural conventions you think we should have: (That is, did we miss something that is a known problem? Is there some other information we should have?)
Did we miss any classes of conventions that affect that culture particularly?
Any other comments:

Note: Remember to fill in the list as an average user, and not as a specialist in a particular area. User-specific information can be given as comments. As mentioned in point 5 above, we are looking for exceptions, not lists of known solutions.
If the suggested solutions are sufficient, all you need to indicate is that they are sufficient. If not, then that's the information we're looking for.

Date and time.

Dates and times can be converted from an internal representation (representing UTC) to external forms with the following rules:

The month can be represented as:
- a one or two digit number (or ordinal)
- a two digit number (or ordinal)
- a month name abbreviation
- an arbitrary length month name
- Capitalization of the month name can be varied.

The day of month can be represented as:
- a one or two digit number (or ordinal)
- a two digit number

The year can be represented as:
- The four digits of the Western era
- the last two digits of the Western era (or ordinal)
- Other eras:
  + Dates can be started from other bases than the Western era
  + Names of eras can be attached
  + Years can be named as ordinals.

The day of the week can be represented as:
- A day of week name abbreviation
- An arbitrary length week day name
- Numeric day of the week (0=Sunday) (or ordinal)
- Capitalization of the day name can be controlled.

Hours can be represented as:
- One or two decimal digits (or ordinal)
- Two decimal digits (or ordinal)
- In 12 or 24 hour time, with or without AM/PM notation.
  + The AM and PM notation can be changed.

Minutes and seconds can be represented as:
- Two decimal digits (or ordinal)

Weeks can be represented as the week number of the year. Either Sunday or Monday can be used as the first day of the week.

The current timezone name can be printed; it is an arbitrary string.

The above elements can be combined in arbitrary order, with any fixed punctuation between them.

Wherever numbers can be printed, alternate number strings can be used instead (for the range 0-99). These are arbitrary strings, and thus can be natural language ordinals (either western ordinals or Kanji digits), alternate (e.g. Hindi) digits, or the like.
(The examples below use English, but Hindi digits or Asian Era dates and times are the primary intent of this functionality.)

The locale allows for two default date formats, so that, for example, both a Gregorian and a locale-specific date can be used.

Some example dates that can be generated include:

    Feb 28, 1990
    february 28, 1990
    HH2Y2M28D (Where HH Y M and D would be Kanji)
    Wednesday 28 February, 1990
    02/28/1990
    28/02/90
    28 II, 1990 (The month name would be a Roman Numeral)
    The 28th Day in February in 1990.
    The TwentyEighth Day of February of 1990.

    10:01 PM
    2201
    PM 1001
    10:01:02
    22:01:02
    10:10 PM EST

Possible issues: solar time, lunar calendars, calendars that do not align with the Western one.

Note: In answering the questions below, please keep in mind that what is interesting is not what we already know how to do, but rather things we don't know about. If the solutions mentioned above work for your culture, the answers to the questions below are all simply "yes". If the answer is "no", we will need details of the problem. This philosophy applies throughout this questionnaire.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Timezones:

The timezone in which a date or time is to be presented needs to be represented as an offset from GMT.
Timezones can be represented in terms of:
- Offset from UTC, in hours, minutes and seconds, up to plus or minus 24 hours.
- The start and end of daylight/summer time:
  + In terms of a day number of the year
  + In terms of a particular day of week, week of month, and month number
  + At a specified time.
- The offset (in hours, minutes, and seconds) of daylight/summer time from the normal time.
- The names of the normal and summer timezones

Examples:

7 hours west of GMT, with one hour for summer time starting on the first Sunday of April and ending on the last Sunday of October, both at 0200. Names MST and MDT.

One hour east of GMT, one hour for summer time, starting on the last Sunday in March and ending the last Sunday in September. Names MEZ and MESZ. (Or MET and METDST.)

Nine and 1/2 hours east of GMT, one hour for summer time, starting on the first Sunday in October, ending the first Sunday in March. Names CST, CDT (Australia).

Five hours west of GMT. No summer time, but the timezone name changes in the summer by the same rules as the first example. Names EST and CDT (Indiana, USA).

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?
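
To make the timezone elements above concrete, the first example (7 hours west of GMT, names MST and MDT) can be written out as a small Python sketch. This is only an illustration under stated assumptions, not part of the questionnaire or of any standard interface: the names nth_weekday and to_local are hypothetical, the rule is hard-coded rather than read from a locale description, and "at 0200" is assumed to mean 0200 local wall-clock time.

```python
import calendar
from datetime import datetime, time, timedelta

STD_OFFSET = timedelta(hours=-7)   # 7 hours west of GMT
DST_SAVING = timedelta(hours=1)    # one hour for summer time


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month; n = -1 selects the last one."""
    days = [d for d in calendar.Calendar().itermonthdates(year, month)
            if d.month == month and d.weekday() == weekday]
    return days[n - 1] if n > 0 else days[n]


def to_local(utc):
    """Convert a naive UTC datetime to (local datetime, zone name)."""
    std = utc + STD_OFFSET
    # Summer time starts at 0200 standard time on the first Sunday of April.
    start = datetime.combine(
        nth_weekday(std.year, 4, calendar.SUNDAY, 1), time(2, 0))
    # It ends at 0200 summer time, which is 0100 standard time,
    # on the last Sunday of October (assumed wall-clock interpretation).
    end = datetime.combine(
        nth_weekday(std.year, 10, calendar.SUNDAY, -1), time(2, 0)) - DST_SAVING
    if start <= std < end:
        return std + DST_SAVING, "MDT"
    return std, "MST"


print(to_local(datetime(1990, 2, 28, 12, 0)))  # a winter instant
print(to_local(datetime(1990, 7, 1, 12, 0)))   # a summer instant
```

The rule is expressed with the same three elements listed above: a base offset, a start and end given as day of week, week of month, and month at a specified time, and a pair of zone names. The fourth example (Indiana) would need only the names to change while the offset stays fixed.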
Character set characteristics:

Characters can be classified into the following classes:
- Upper Case
- Lower Case
- Numeric
- Punctuation
- White space (characters that just move the print position)
(Plus several that are primarily for computer usage.)

Translation of characters between upper and lower case can be done with or without loss of accent marks.

These concepts need not be applied to languages which do not have the concepts of case or other character classes.

Examples:

The character a-accent-grave can be translated to either A or A-accent-grave.

The three Russian characters that never occur in upper case can be left alone during translation.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Collation:

Collation is the ordering of textual material into some predefined order. The rules which can be used to determine the collation of text include:
- The specification of a collation order different from that which occurs naturally in the computer character set. (French and Canadian French use the same character codes, but collate in different orders.)
- Certain characters do not participate in collation decisions.
  For example, as required on page 10 of Webster's Ninth New Collegiate Dictionary: "The main entries follow one another in alphabetical order letter by letter without regard to intervening spaces or hyphens."
- Certain characters should collate equally even if they are different characters. (E.g. in some languages the accented vowels are all equal and the accents do not participate in collation decisions.)
- Certain characters should collate equally until they are the only difference, and then collate in a specified order. (As in the example above: the order is specified only when two strings differ only by accent marks.)
- Certain pairs of characters should be treated as a single character. (E.g. ll and ch in Spanish.)
- Certain characters should collate as if they were two characters. (Eszett in German, the ae diphthong.)
- Collation can be done either with upper and lower case characters distinct, or with the upper and lower case characters treated equivalently. The upper to lower translations mentioned under character set characteristics can be done.
- Collation order for large character sets (e.g. Asian) can be specified by ranges of character codes.

Examples:

German requires the following:
- the ability to process a single character as two distinct collation elements, each of which is distinct from all other collation elements. An example is the character which looks similar to the Greek beta and is also referred to as Eszett (sharp s). It is collated as two identical collation elements (two s elements), which are ordered between r and t.

Experts understand the issues of Chinese "character collation", French collation concerns, and Japanese "word collation". They are too long to give as examples here.

Due to the complexity of collation issues, a reference to a standard work on collation for your culture or language would be very useful.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Numbers:

Numbers can be represented with or without "thousands separators" (where the number of digits in a group can be varied) and with any character (e.g. `.' or `,') as the radix point.

Examples:

    123456.7890
    123 456.789 0
    123,456.7890
    123.456,7890
    12 3456.7890

Possible issue: Some countries use Hindi digits. Are there other digit systems in use that would ever be used in portable computer programs?

The representation of numbers as words is an issue: are there situations where in your culture you would represent a number as words differently than a similar culture which uses the same numeric representation and/or language? (An example is the difference between the British and American meaning of "billion".)

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Currency.

Currency can be represented using any of the numeric formats, but can be separately identified from the numeric formats. (That is, numeric formats could use a different thousands separator from monetary formats.) Separate local and international currency symbols are maintained. The currency symbol can be placed at the beginning or the end, and can be multiple characters. It can be separated from the amount by a space. The decimal delimiter can be specified. Specific strings can be used for specific signs.

    $123456.45
    $ 123 456.45
    ( 123 456.45 )
    123 456.45 CR
    123 456$45

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Messages and Responses:

Messages, and the strings that the user uses to respond to messages, can be kept separate from the program, and can be separately translated (not automatically, however) to any supported language. The order in which substitutions (such as amounts or names) appear can be controlled. The text of message responses can be stored in the same way.
A single string for "yes" and a single string for "no" are always available.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Text presentation:

Not all natural languages are read and written in the European left-to-right, top-to-bottom order of presenting characters. Presentation in either right-to-left, top-to-bottom or top-to-bottom, right-to-left order is also currently available. Inclusion of left-to-right digits in right-to-left text is understood as a need.

Possible issues: directions that vary within a single document; other major direction patterns. Also, in a computer environment, displays typically "scroll". Does this present problems?

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?
For each of these questions, if the need is not met, please also indicate why not, and how big a problem it presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Color usage:

It has often been expressed that conventions on the use of colors to indicate various states vary culturally. However, there has been little concrete information collected. An example might be the conventions (based on traffic lights) of red for stop or emergency, yellow for caution, and green for OK or "go". Another might be that certain Native American languages do not distinguish between blue and green (treating them as a single color). Would situations such as this affect the actual use of those colors as clues to meaning within your culture?

Could you provide information on the use of colors, particularly where you are aware of cultural conflicts?

Icons.

Icons are symbolic objects used to indicate a concept, such as the international traffic signs. The number of possible icons is quite large, and ultimately depends upon the application. However, there are some generic icons that are used worldwide, such as the slashed circle for "do not" or "no".

Are you aware of icon usage where an icon used in your culture would conflict with icons used in other cultures, either because it would be confusing or meaningless? By "confusing" we mean icons that clash with a visually similar icon in a different culture that has another meaning, and very particularly an opposite meaning. Icon components, such as the slashed circle, are particularly of interest.

Even if a general solution as discussed for some of the problems above is not found, documentation of these issues will help application writers avoid possible pitfalls in this area.