From donn@hpfcrn.fc.hp.com Thu Mar 14 18:49:20 1991
Received: from hpfcla.fc.hp.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8) id AA13135; Thu, 14 Mar 91 18:49:20 +0100
Received: from hpfcrn.fc.hp.com by hpfcla.fc.hp.com with SMTP (15.11.1.6/15.5+IOS 3.20) id AA12704; Thu, 14 Mar 91 10:43:40 -0700
Received: from hpfcdonn by hpfcrn.HP.COM; Thu, 14 Mar 91 10:49:39 -0700
Message-Id: <9103141749.AA07530@hpfcrn.HP.COM>
To: wg15rin@dkuug.dk
Subject: Questionaire (finally)
Date: Thu, 14 Mar 91 10:48:36 MST
From: Donn Terry
X-Charset: ASCII
X-Char-Esc: 29

**** DRAFT **** For comment on the questionnaire itself

To: WG15 RIN
From: Donn Terry
Subj: Questionnaire

Here's the next draft of the questionnaire (after MUCH delay; my apologies). Again, this is for review of the questionnaire itself, not for answering (except possibly as a tool to QA it). Feel free to distribute it informally, as long as this note is attached indicating that it is *not yet time* to actually use it for its intended purpose. Your comments back to me to further improve it are more than welcome.

-------------------------------------------------------------------------

To: National Standards Bodies, ISO member countries.
From: Internationalization Rapporteur group SC22/WG15
Subj: National conventions.

Below, please find a questionnaire concerning national conventions associated with handling information that is often processed using computers. The purpose of the questionnaire is to identify conventions that vary from country to country, with the intent of making applications more adaptable and software development more effective by making it easy for applications to adapt to local conventions.

As computers become more prevalent, they must deal with local and national cultural conventions, rather than reflecting the conventions of limited populations.
To do this, much information must be gathered so that the mechanisms can deal successfully with all the relevant conventions, rather than finding that some were omitted and cannot be easily retrofitted. The technology is not ready to deal with issues such as natural-language translation, but issues such as time and date, currency, and timezones are ready to be considered. The issue of character sets is being addressed in SC2. We presume that the necessary characters can be represented.

We would like to have the questionnaire filled out by as many nations, representing as many cultures, as possible. Within a nation that has more than one culture or set of conventions, please fill it out for each culture or set of conventions. The viewpoint reflected should be that of the culture, rather than that of a computer expert who is able to deal with representations that do not match the culture.

The questionnaire first explains what the issue is, and then shows examples of what the current technology can deal with. This is to give you an idea of what the problem is, as it is currently perceived. We would ask you to answer several questions in each area:

1) Is the current technology (as represented in the questionnaire, not in terms of actual products) minimally acceptable; can you operate successfully in your culture, for computer use only, with what is available?
2) Is the current technology adequate for most computer usage? Does it meet all your national or cultural needs when computers are being used as data processing devices? If not, please describe the problem, and how that information should be represented to meet local needs.
3) Is the current technology adequate for non-expert usage? Are there situations where people who do not normally use computers would be presented with information in an unfamiliar form if the current technology were not extended? Again, we would ask for descriptions of the problem if the needs are not fully met.
4) We realize that there are also historical usages, such as obsolete currencies, that would need to be represented in textual documents. If those would also be used by computers, in terms of manipulating them, please describe them. If, however, a computer would not have to deal with them (except possibly as uninterpreted text), they are not within the goals of this questionnaire, as it is the manipulation, not the representation, of that information that is important.
5) There are also several classes of concepts where we are aware there are issues, but which are for one reason or another not currently of interest within this questionnaire. These include:
- Character sets: there are other activities dealing with that issue (SC2, some other SC22 work). This questionnaire assumes that the necessary characters can be processed, represented, and displayed as needed.
- Anything internal to a computer program (such as identifiers, comments or function names). This does not relate to "applications portability". (Presumably things such as tagging of data for the culture that it comes from must occur, but this is not visible to the application user.)
- Functions which are not "generic" to most applications: for example, portability of payroll programs (including such issues as tax law) is not of interest. Document layout also falls in this category, as only document processing programs are concerned with it.
- Issues involving understanding natural language: translation (obviously), hyphenation of words,

Please use the examples as a guideline both to understand the questions we are asking, and also to help us understand your response. In no case can the examples be complete. If you are unsure whether the needs of your culture are met, indicate that, and we can evaluate the situation to see if the technology can already do it.

Because of the diversity of cultures, it may not be possible to represent every concept in all possible ways at a reasonable cost.
However, by knowing of the issues, we can hope to do a better job than otherwise. It remains up to the programmer to actually use these facilities, so they will not automatically be present in programs even when they are available.

Where we suspect that there might be a problem, a list of "possible issues" is given as a starting point for thinking about the problems.

Background information:

Name and Address:
Telephone:
Fax:
Electronic Mail (if available):
Country Described:
Locale/Culture within the country:
Other information about the cultural conventions you think we should have:
Did we miss any classes of conventions that affect that culture particularly?
Any other comments:

Date and time.

Dates and times can be converted from an internal representation (representing UCT) to external forms with the following rules:

The month can be represented as:
- a one or two digit number (or ordinal)
- a two digit number (or ordinal)
- a month name abbreviation
- an arbitrary length month name
- Capitalization of the month name can be varied.

The day of month can be represented as:
- a one or two digit number (or ordinal)
- a two digit number

The year can be represented as:
- The four digits of the Western era
- The last two digits of the Western era (or ordinal)
- Other eras:
  + Dates can be started from other bases than the Western era
  + Names of eras can be attached
  + Years can be named as ordinals.

The day of the week can be represented as:
- A day of week name abbreviation
- An arbitrary length week day name
- Numeric day of the week (0=Sunday) (or ordinal)
- Capitalization of the day name can be controlled.

Hours can be represented as:
- One or two decimal digits (or ordinal)
- Two decimal digits (or ordinal)
- In 12 or 24 hour time, with or without AM/PM notation.
  + The AM and PM notation can be changed.

Minutes and seconds can be represented as:
- Two decimal digits (or ordinal)

Weeks can be represented as the week number of the year.
Either Sunday or Monday can be used as the first day of the week.

The current timezone name can be printed; it is an arbitrary string.

The above elements can be combined in arbitrary order, with any fixed punctuation between them.

Wherever numbers can be printed, alternate number strings can be used instead (for the range 0-99). These are arbitrary strings, and thus can be natural language ordinals (either western ordinals or Kanji digits), alternate (e.g. Hindi) digits, or the like. (The examples below use English, but Hindi digits or Asian Era dates and times are the primary intent of this functionality.)

The locale allows for two default date formats, so that, for example, both a Gregorian and a locale-specific date can be used.

Some example dates that can be generated include:
- Feb 28, 1990
- february 28, 1990
- HH2Y2M28D (where HH, Y, M and D would be Kanji)
- Wednesday 28 February, 1990
- 02/28/1990
- 28/02/90
- 28 II, 1990 (the month name would be a Roman numeral)
- The 28th Day in February in 1990.
- The TwentyEighth Day of February of 1990.

Some example times:
- 10:01 PM
- 2201
- PM 1001
- 10:01:02
- 22:01:02
- 10:10 PM EST

Possible issues: solar time, calendars that do not align with the Western one.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?
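
As an illustration of the date and time rules above, the following Python sketch splits a timestamp into separately selectable elements and then combines them in arbitrary order with fixed punctuation. It is only a sketch under stated assumptions: the element names (month_abbr, day_alt, hour12 and so on) and the functions elements(), render() and english_ordinal() are invented for this example and do not correspond to any existing interface, and only English names are included.

```python
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday"]  # numeric day of week: 0 = Sunday


def english_ordinal(n):
    """One possible 'alternate number string': 1st, 2nd, 3rd, 4th, ..."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def elements(dt, alt=str):
    """Break a datetime into individually selectable elements.

    'alt' is a hypothetical hook that maps a number to an alternate
    string (ordinals, other digit systems, and so on).
    """
    return {
        "month": str(dt.month),                    # one or two digits
        "month2": f"{dt.month:02d}",               # always two digits
        "month_abbr": MONTHS[dt.month - 1][:3],
        "month_name": MONTHS[dt.month - 1],
        "day": str(dt.day),
        "day2": f"{dt.day:02d}",
        "day_alt": alt(dt.day),
        "year": f"{dt.year:04d}",
        "year2": f"{dt.year % 100:02d}",
        "weekday": DAYS[(dt.weekday() + 1) % 7],   # Python counts Monday as 0
        "hour12": str(dt.hour % 12 or 12),
        "hour24": f"{dt.hour:02d}",
        "minute": f"{dt.minute:02d}",
        "second": f"{dt.second:02d}",
        "ampm": "AM" if dt.hour < 12 else "PM",
    }


def render(dt, pattern, alt=str):
    """Combine elements in any order, with fixed punctuation between them."""
    return pattern.format(**elements(dt, alt))


when = datetime(1990, 2, 28, 22, 1, 2)
print(render(when, "{month_abbr} {day}, {year}"))            # Feb 28, 1990
print(render(when, "{weekday} {day} {month_name}, {year}"))
print(render(when, "{day2}/{month2}/{year2}"))               # 28/02/90
print(render(when, "The {day_alt} Day in {month_name} in {year}.",
             alt=english_ordinal))
print(render(when, "{hour12}:{minute} {ampm}"))              # 10:01 PM
```

Keeping the pattern separate from the element table is what would let a different culture supply its own names, digit strings and ordering without changes to the program.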

Timezones:

The timezone in which a date or time is to be presented needs to be represented as an offset from GMT. Timezones can be represented in terms of:
- Offset from UCT, in hours, minutes and seconds, within plus or minus 24 hours.
- The start and end of daylight/summer time:
  + In terms of a day number of the year
  + In terms of a particular day of week, week of month, and month number
  + At a specified time.
- The offset (in hours, minutes, and seconds) of daylight/summer time from the normal time.
- The names of the normal and summer timezones

Examples:
- 7 hours west of GMT, with one hour for summer time starting on the first Sunday of April and ending on the last Sunday of October, both at 0200. Names MST and MDT.
- One hour east of GMT, one hour for summer time, starting on the last Sunday in March and ending the last Sunday in September. Names MEZ and MESZ. (Or MET and METDST.)
- Nine and 1/2 hours east of GMT, one hour for summer time, starting on the first Sunday in October, ending the first Sunday in March. Names CST, CDT (Australia).
- Five hours west of GMT. No summer time, but the timezone name changes in the summer by the same rules as the first example. Names of EST and CDT (Indiana, USA).

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?
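
To make the timezone rules above concrete, here is a small Python sketch that encodes two of the examples as data and looks up the zone name and offset for a local wall-clock time. It is a sketch under assumptions, not a description of any particular system: the Zone class, its field names and the functions nth_weekday() and lookup() are invented for this illustration, week number 5 is taken to mean "last" by assumption, and wall-clock times that are skipped or repeated at a changeover are not handled.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


def nth_weekday(year, month, week, weekday):
    """Date of the week-th given weekday of a month (weekday 0 = Sunday).

    By assumption in this sketch, week 5 means the last such day.
    """
    first = date(year, month, 1)
    first_wd = (first.weekday() + 1) % 7          # convert to 0 = Sunday
    day = 1 + (weekday - first_wd) % 7 + 7 * (week - 1)
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    days_in_month = (next_month - first).days
    while day > days_in_month:                    # "last" may be week 4
        day -= 7
    return date(year, month, day)


@dataclass
class Zone:                      # hypothetical container for one rule set
    std_name: str
    dst_name: str
    std_offset: timedelta        # positive = east of GMT
    saving: timedelta            # summer time offset from normal time
    dst_start: tuple             # (month, week of month, day of week)
    dst_end: tuple
    switch: timedelta            # local time of day of the changeover


def lookup(zone, local):
    """Return (zone name, offset from GMT) for a naive local datetime."""
    def at(rule):
        return datetime.combine(nth_weekday(local.year, *rule), time(0)) + zone.switch
    start, end = at(zone.dst_start), at(zone.dst_end)
    if start < end:                               # northern-style year
        in_dst = start <= local < end
    else:                                         # summer spans New Year
        in_dst = local >= start or local < end
    if in_dst:
        return zone.dst_name, zone.std_offset + zone.saving
    return zone.std_name, zone.std_offset


# First example: 7 hours west of GMT, names MST and MDT.
MOUNTAIN = Zone("MST", "MDT", timedelta(hours=-7), timedelta(hours=1),
                (4, 1, 0), (10, 5, 0), timedelta(hours=2))
# Third example: nine and a half hours east of GMT, names CST and CDT.
# The 0200 changeover time is an assumption; the example gives none.
AUSTRALIA = Zone("CST", "CDT", timedelta(hours=9, minutes=30),
                 timedelta(hours=1), (10, 1, 0), (3, 1, 0), timedelta(hours=2))

print(lookup(MOUNTAIN, datetime(1990, 7, 4, 12, 0)))
print(lookup(AUSTRALIA, datetime(1990, 1, 15, 12, 0)))
```

The fourth example (Indiana) would fit the same structure with a zero saving and two different names, which is one reason the names are stored separately from the offsets here.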

Character set characteristics:

Character sets can be classified into the following classes:
- Upper Case
- Lower Case
- Numeric
- Punctuation
- White space (characters that just move the print position)
(Plus several that are primarily for computer usage.)

Translation of characters between upper and lower case can be done with or without loss of accent marks.

These concepts need not be applied to languages which do not have the concepts of case or other character classes.

Examples:
- The character a-accent-grave can be translated to either A or A-accent-grave.
- The three Russian characters that never occur in upper case can be left alone during translation.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Collation:

Collation is the ordering of textual material into some predefined order. The rules which can be used to determine the collation of text include:
- The specification of a collation order different from that which occurs naturally in the computer character set. (French and Canadian French use the same character codes, but collate in different orders.)
- Certain characters do not participate in collation decisions.
  For example, as required on page 10 of Webster's Ninth New Collegiate Dictionary: The main entries follow one another in alphabetical order letter by letter without regard to intervening spaces or hyphens:
- Certain characters should collate equally even if they are different characters. (E.g. in some languages the accented vowels are all equal and the accents do not participate in collation decisions.)
- Certain characters should collate equally until they are the only difference, and then collate in a specified order. (As in the example above: the order is specified only when two strings differ solely by accent marks.)
- Certain pairs of characters should be treated as a single character. (E.g. ll and ch in Spanish.)
- Certain characters should collate as if they were two characters. (S-zed in German, the ae diphthong.)
- Collation can be done either with upper and lower case characters distinct, or with the upper and lower case characters treated equivalently. The upper to lower translations mentioned for character collation can be done.
- Collation order for large character sets (Asian) can be specified by ranges.

Examples:

German requires the following:
- the ability to process a single character as two distinct collation elements each of which is distinct from all other collation elements. An example is the character which looks similar to the Greek beta and is also referred to as . is collated as two identical collation elements which are ordered between and .

Experts understand the issues of Chinese "character collation", French collation concerns, and Japanese "word collation". They are too long to give as examples here.

Due to the complexity of collation issues, a reference to a standard work on collation for your culture or language would be very useful.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Numbers:

Numbers can be represented with or without "thousands separators" (where the number of digits in a group can be varied) and with either . or , as the radix point.

Examples:
- 123456.7890
- 123 456.789 0
- 123,456.7890
- 123.456,7890
- 12 3456.7890

Possible issue: Some countries use Hindi digits. Are there other digit systems in use that would ever be used in portable computer programs?

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Currency.

Currency can be represented using any of the numeric formats, but can be separately identified from the numeric formats.
(That is, numeric formats could use a different thousands separator from monetary formats.) Separate local and international currency symbols are maintained. The currency symbol can be placed at the beginning or the end, and can be multiple characters. It can be separated from the amount by a space. The decimal delimiter can be specified. Specific strings can be used for specific signs.

Examples:
- $123456.45
- $ 123 456.45
- ( 123 456.45 )
- 123 456.45 CR
- 123 456$45

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Messages and Responses:

Messages, and the strings that the user uses to respond to messages, can be kept separate from the program, and can be separately translated (not automatically, however) to any supported language. The order in which substitutions (such as amounts or names) appear can be controlled. The text of message responses can be stored in the same way. A single string for "yes" and a single string for "no" are always available.

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Text presentation:

Not all natural languages are read and written in the European left-to-right, top-to-bottom order of presenting characters. Presentation in either right-to-left, top-to-bottom or top-to-bottom, left-to-right order is currently available. Inclusion of left-to-right digits in right-to-left text is understood as a need.

Possible issues: varying directions, other major direction patterns; in a computer environment, displays typically "scroll". Does this present problems?

Is some combination of the elements above minimally acceptable for usage limited strictly to computer issues?
Is some combination of the elements above suitable for use by users of applications, who have some training in the application?
Is some combination of the elements above suitable for use by the average individual, not trained in the use of an application?
Is some combination of the elements above suitable for use by people who only see the products of the application, and may not even be aware that a computer was used?

For each of these questions, if the need is not met, please also indicate why not and how big a problem this presents. How much, in terms of increased costs (in terms of money or application usability), is it worth to fix it?

Color usage:

It has often been said that conventions on the use of colors to indicate various states vary culturally. However, there has been little concrete information collected.
An example might be the conventions (based on traffic lights) of red for stop or emergency, yellow for caution, and green for OK or "go". Another might be that certain Native American languages do not distinguish between blue and green (treating them as a single color). Would situations such as this affect the actual use of those colors as clues to meaning? Within your culture?

Could you provide information on the use of colors, particularly where you are aware of cultural conflicts?

--------