ISO/IEC JTC1/SC22/WG15 Rapporteur Group on Internationalization

Minutes of London Meeting, 90-10-04 - 05

Document JTC1/SC22/WG15RIN-N031

Attendance

RB  Ralph Barker (U.S.)           UniForum Technical Committee Subcommittee on internationalization; (Expert invited by U.S.)
DD  Dominic Dunlop                U.K. rapporteur
GL  Greger Leijonhufvud (Sweden)  Technical reviewer, IEEE 1003.2 (Expert invited by U.K.)
SN  Shigekatsu Nakao              Alternate for Japanese rapporteur, Prof. Nobuo Saito
KS  Keld Simonsen                 Danish rapporteur
DT  Donn Terry                    U.S. rapporteur

1. Opening of Meeting

The meeting convened at the BSI Conference Centre, London, England at 10:00 on 90-10-04.

1.1 Introductions and roll call of rapporteurs

The rapporteurs introduced themselves. The names and affiliations of the internationalization rapporteurs and invited experts appear above. A full address list appears as an attachment to this document.

1.2 Selection of chair, secretary, drafting

As agreed at the group's previous meeting, Keld Simonsen chaired the meeting. Dominic Dunlop agreed to act as secretary, and Greger Leijonhufvud to supervise drafting, with help in reviewing from Shigekatsu Nakao and the rest of those present.

1.3 Adoption of agenda

The agenda appearing as an attachment to these minutes was adopted.

90-11-15 Page 1 RIN Minutes

1.4 Approval of minutes

The minutes of the Paris meeting of 90-06-11 - 12, document JTC1/SC22/WG15RIN-N028, were approved.

2. Status, liaison and action item reports

2.1 Review of action items

DD: Write up minutes (WG15RIN-N028), circulate by electronic mail, and forward to convener for entry into document register. Done.
KS: (Carried over.) Set up electronic forum to canvass European opinions on and solutions to the problem of accommodation of IS 646 within the framework of POSIX. In progress. (Carried over again.)
KS: Work up point 10 from JTC1/SC22/WG15-N091, Danish national POSIX.2 locale definition, into a full proposal for evaluation by group. Done.
All: Review JTC1/SC22/WG15/RIN-N021, draft questionnaire, and respond, by electronic mail if possible, to DT. In progress. (Carried over again.)

2.2 Report on IEEE 1003.x internationalization status

DT reported to the group that, following the incorporation of "tiny editorial changes", printing plates for the new revision of POSIX.1 would be sent to ITTF (the ISO Information Technology Task Force). While there is a chance that the document will appear as IS 9945-1:1990, it is more likely to be IS 9945-1:1991, as it is the date of publication, rather than that of approval, which is significant. ANSI/IEEE Std. 1003.1:1990 will be identical in content, and may appear sooner.

ITTF still has slight problems over copyright notices covering AT&T and UniForum material appearing in the standard. It was hoped that these issues could be resolved at an SC22 meeting during the week of 90-10-08.

A few internationalization issues raised in ballot comments remain outstanding from POSIX.1, presumably to be addressed by later revisions of the standard. Discussion of these was deferred to §4, New Business.

The extensions to POSIX.1 defined in the current draft of 1003.1 no longer have significant internationalization implications, with the exception of the ISO labeled tape data interchange format, which raises the same issues on the handling of filename characters outside the portable filename character set as does POSIX.2's pax. Again, discussion was deferred.

Reporting on draft 10 of 1003.2, DT said that over 300 objections had been received, with around 80 still outstanding -- many touching on internationalization. Although a considerable improvement over the response to draft 9, the level of objections argued against submitting draft 10 to ISO for acceptance as a CD. Draft 11, due over the new year, may be stable enough to submit. The final standard is likely to correspond to draft 12 or 13.

A discussion on mechanisms for obtaining timely international comments on IEEE drafts ensued, culminating in the formulation of resolution 4.

GL reported that 1003.2a (User Portability Extension) was in ballot, but had little internationalization content.

Briefly summarizing activity elsewhere in IEEE TCOS, DT identified 1003.8, Transparent File Access, as having clear implications for internationalization. There were also issues, many of them not yet clearly identified, in other groups. Resolutions 5 and 6 aim to establish a dialogue between RIN and the working groups involved.

Electronic mail is clearly an important element in such dialogues. The group decided to set up an open mailing list, i18n@dkuug.dk, in addition to the existing closed list (wg15rin@dkuug.dk). The new list will be publicized among interested parties, and steps will be taken to serialize and archive the traffic on both lists. (See §6.3, Action items.)

2.3 Liaison reports: SC22 internationalization working group; SC22 ad hoc working group on character sets

KS reported that the main development of interest to RIN was DS' recommendation that future SC22 internationalization activity should be based on the work of WG15 -- even if a few of the requirements which had emerged in SC22's work, such as a method of handling measurement units, were not yet covered by POSIX. The group agreed that the proposal made sense, particularly since the localedef mechanism is extensible to address future needs. Resolution 7 commends the Danish proposal.

2.4 TSG1 internationalization activity

DT reported that he was unaware of any developments in TSG1, which is in the process of producing its final report. The future of the group following its publication is unclear.

2.5 Other internationalization activity

GL reported briefly that the UniForum Technical Committee Subcommittee on Internationalization, having completed its messaging proposal, had shifted its focus to two areas: internationalization architecture; and data announcement. Both were to be discussed at a subcommittee meeting commencing on 90-10-08.

GL and DT reported that X/Open was close to freezing the functional content of XPG4, with a projected availability date of 1992. The document will represent a considerable increase in internationalization content relative to XPG3. In order that the group can be better informed of X/Open's direction, an informal report is to be requested from the organization. (See §6.3, Action items.)

GL expressed concern at the number of organizations currently in the process of defining Application Binary Interfaces (ABIs) with the aim of creating de facto standards. If ABIs are too tightly written, it may prove difficult or impossible to preserve binary compatibility in the face of future standards designed only to preserve source compatibility with previous standards. This would be a barrier to the adoption of new consensus standards. The group agreed that developments in this area should be monitored.

SN drew the group's attention to PortSoft, a grouping of Pacific rim UNIX user groups. It has a specific interest in the promotion of the use of multi-byte character sets.

3. Balloting Activities

These issues had been discussed under §2.2 above, and are revisited in §4.4.

4. New Business

4.1 Example national profiles and locales

4.1.1 Danish

The group discussed the details of the Danish locale material in Annex E of IS 9945-1:1990 and Annex F of IEEE 1003.2, draft 10. While there was no problem with the former, the group had a number of questions about the latter.
In particular, the choice of code 29 as an escape character was questioned, as was the use of short, but often opaque, charmap names for characters outside the portable filename character set. KS stated that both reflect compromises intended to facilitate the entry of any character in the charmap from terminals incapable of generating all encodings. Similar, but inevitably mutually incompatible, methods are used in existing commercial systems. Tests carried out to date suggest that the method works, but further testing is required.

The group agreed that information about the charmap names and rationale should be included in draft 11 of 1003.2, and that amended versions of the Annex should be circulated to interested parties for comment as soon as possible. (See §6.3, Action items.)

4.1.2 Japanese

The group discussed a preliminary draft of a Japanese national profile for POSIX under development by the Japanese SSI POSIX group. (A later draft of the document will be made available for registration by the group.) The material raised many issues, which were discussed in detail by the group, and will be summarized in a separate document by DD. A number of action items resulted. (See §6.3, Action items.)

Of the issues raised, a few appeared to be specific to Japan, and hence within the scope of that country's national profile. The majority are probably relevant to several territories, suggesting that they should be addressed by the mechanisms defined in international standards for POSIX. For example, the definition of an LC_MEASUREMENT locale category may be desirable.

Knowledge gained through the discussion of the issues raised by Japan should be fed into the questionnaire, in order to determine which are of international interest. (See §6.3, Action items.)

Collation was among the issues discussed.
It was agreed that material describing multi-level collation should be incorporated into the next draft of 1003.2 in order to give greater guidance to those defining locales. (See §6.3, Action items.)

4.2 Harmonization issues

4.2.1 Harmonization and collection of national locales and profiles

Japan proposed, and the group agreed, that it was important to discourage needless divergence among the means used to express locales. For example, the charmap names corresponding to a given glyph should not differ between locale definitions without good reason. Clearly, it is the responsibility of ISO to define guidelines for harmonization of this type, as the matter is outside the scope of any one member body (such as ANSI).

DT suggested that a future resolution could express the group's concerns to WG15, and request a national working group harmonization meeting under existing rules. This would, among other things, serve notice on interested parties that they should be concerning themselves with the topic.

Before the group can put forward such a resolution, however, it needs to do more work in defining the precise areas of concern. Among these are

- Creation of new categories by national bodies.
- Creation of new mechanisms by national bodies.
- Extension by national bodies of limits in the standard. (For example, of equivalence classes.)

An action for the production of a working paper on the topic was put on the Japanese rapporteur, Dr. Nobuo Saito. (See §6.3, Action items.)

The group will discuss the eventual disposition of this information, which should be as easily available as the standards to which it refers, yet not appear to be POSIX-specific, at a later meeting.

4.2.2 Character naming

The group formulated resolution 8 in connection with the harmonization of character names. (See also §4.8 below.)

4.2.3 Yen symbol (¥)

The Japanese have a specific problem in that JIS X 0201, a very widely-used Japanese character encoding, gives the Yen symbol, ¥, the same encoding as ASCII's backslash (\). The Japanese wish to find a way of using the same encoding both for single-character quoting in the shell and elsewhere, and as a currency symbol. The context of the code should determine its display representation and its treatment by parsers. On the face of it, this does not appear to be possible.

DD stated that the U.K. position on a similar issue, where the British pounds symbol could either be a currency symbol, or flag a shell comment, had been to resolve the issue in favour of shell syntax, as this corresponded to common usage. A conforming shell script could use new POSIX.2 mechanisms unambiguously to deliver the currency symbol in the current locale if it were needed for display.

In Japan's case, the issue is further clouded by the wording of 1003.2, §2.2.2, which can be read as disallowing any alternate currency symbol other than pounds and $. The group did not discuss alternative wording.

The group discussed a number of work-arounds, such as character doubling, and heuristic translation of narrower to wider character sets, but reached no resolution.

KS observed that DS had submitted a potential solution to the U.S. member body, but had not received any formal comment. It was agreed that the proposal should be resubmitted, and that there should be a written response. (See §6.3, Action items.)

The specific issue of the Yen sign will be rolled into the ongoing exploration by KS of seven-bit character set issues.

4.3 Character classes (Kanji, Katakana, Cyrillic etc.)

GL proposed, and the group agreed, that "All conforming implementations shall support at least, and all portable applications shall use no more than, ". (The final phrase is shorthand for the classes recognized by common-usage C and other tools of similar vintage.)

It is expected that the ISO standard for the C programming language will (sooner or later) define a mechanism for creating arbitrary character classes by adding new keywords to the LC_CTYPE category. A short discussion on implementation ensued, reaching the conclusion that the cost of supporting a new and more general class mechanism need not be high.

4.4 Localedef discussion

The IEEE 1003.2 ballot on draft 10 has precipitated a number of objections both to the need for support of internationalization in general, and to particular aspects of the support proposed. These objections can be characterized as coming from the Berkeley UNIX community, and as reflecting concern that the proposed internationalization support, which does not correspond to existing practice, will force POSIX-conforming implementations to be bigger and slower than would otherwise be the case. One proposed solution is to delete all internationalization support from the draft, or to make it completely optional.

The 1003.2 reviewers, represented by DT and GL, would like to find a compromise which would allow these objections to be withdrawn, as a document without some level of mandatory internationalization support would clearly be unacceptable to ISO member bodies. The guidance of RIN was sought.

Giving close consideration to the internationalization requirements of current drafts, the group felt that it was already the case that implementations would have considerable latitude in the degree of change that they would allow to the default and/or POSIX locales. Further wording about error returns from localedef could make this clear. Of course, the conformance document for such an implementation would have to make it apparent that support for arbitrary locales was not provided. (Although support for a small number of implementor-supplied locales might be provided.)

The group was of the opinion that such implementations might be successful in niche markets, but would ultimately be unsuccessful in world markets. The editors will return to the objectors with this information.

4.5 TZ, era

Work on defining national locales has revealed a need to accommodate timezone names including digits preceded by a plus or minus sign, as in GMT+7. (The current definition excludes both digits and signs.) It was agreed that the problem could be solved in a backward-compatible manner by introducing less-than and greater-than (< and >) as quoting characters for timezone names. Thus, TZ="+0" would set the TZ environment variable to a permissible value. IEEE Std. 1003.1 will be modified accordingly at its next revision, and the change will work its way through to a future version of 9945-1.

The Japanese need for a national locale which deals with era dating linked to the years of accession of Emperors, had been discussed under §4.1.2 above. It is clear that the currently defined mechanisms are inadequate to this task. It is less clear how many locales outside Japan have similar requirements, and whether it will be possible to define common mechanisms which meet most or all of these needs as part of the POSIX standards. The questionnaire should seek answers to these questions. However, the group agreed that the TZ specification should be modified to permit era dates, and to allow the specification of date names for the first to thirty-first days of the month.

4.6 Pax utility

The group considers that the as-yet unresolved Canadian comment about the treatment in archive file formats of characters from outside the portable filename character set stems from a wording problem in the existing text, and can be fixed by an editorial change. (A comment on LOGNAME has already been resolved in a similar manner.)

KS remarked that Denmark was working on a general solution whereby arbitrary filenames could be represented on an archive using only an escape character and sequences of portable filename characters spelling out names defined in the charmap. (Of course, it will be necessary for format-creating and format-reading utilities to agree on these names, emphasizing the desirability of harmonization.) A proposal to the IEEE will be forthcoming.

DD enquired about the status of the public-domain version of the pax utility, the development of which had been sponsored by USENIX. DT replied that, as far as he was aware, it was at the 1003.2 draft 9 level, and, while little effort would be needed to bring it into line with draft 10, USENIX was no longer funding development.

4.7 Layered locales

Work on the Danish locale has shown the value of a mechanism which could facilitate the definition of families of closely-related locales. For example, the Faroe Islands locale differs from that of Denmark only in its TZ, LC_TIME and LC_RESPONSE descriptions. It was agreed to raise the issue of the addition of an or directive to the input format for the 1003.2 localedef utility with the 1003.2 technical editor. (See §6.3, Action items.)

4.8 Character set terminology

Reviewing JTC1/SC22/WG15RIN-N030, the group decided to defer requests for formal action from other JTC1 groupings and/or member bodies until hearing the opinion of the POSIX technical editor on the impact of incorporating the terminology into 1003.2. (See §6.3, Action items.)

The word "byte", used by POSIX documents in the context of a "unit of storage", is particularly problematic. To substitute "octet", the word preferred by ISO, would not be technically correct, and would result in many objections in the IEEE balloting process. There are also problems in POSIX' usage of the term "string".

Briefly revisiting the issue of character names, DT commented that U.S.
and British dictionaries tended to agree more with POSIX usage than with that of ISO.

4.9 Messages

This issue was not discussed because of the pressure of time. The group had heard earlier that the IEEE 1003.1 working group had rejected messaging proposals from UniForum and X/Open at its July 1990 meeting, and that UniForum was not devoting resources to the further development of its proposal.

4.10 Summary of internationalization issues

Owing to a lack of time for discussion, DT requested that group members review the draft document that he circulated at the meeting, and comment by electronic mail.

4.11 Questionnaire

The notes for the previous subsection apply also to this topic.

5. Review and Approval of Resolutions

5.1 Resolutions

The resolutions appearing as JTC1/SC22/WG15RIN-N032 were approved.

5.2 Recommendations for WG15

Resolutions 4 - 9 require action by WG15. The group decided that it was not necessary to forward any documents to WG15 from the meeting.

6. Closing Procedures

6.1 Future meeting considerations -- request for invitations

There was some feeling that the group should meet again well before the April 1991 meeting of WG15. Ultimately, however, it was decided to schedule the next meeting for the days immediately prior to the WG15 meeting, and to process as much business as possible by electronic mail until then.

Next meeting: 91-04-22 - 23 The Netherlands (exact location t.b.a)

[Note: Since the October 1990 meeting of WG15 rescheduled its Netherlands meeting to 91-05-13 - 17, it is suggested that RIN's next meeting should be moved to 91-05-13 - 14. A formal notice to this effect may be expected from the WG15 convener.]

No further meetings were scheduled.

6.2 Document number assignment

The following document numbers were assigned:

N029  Japan                    Draft raw answer to Donn Terry's questionnaire
N030  9945-1 technical editor  Memo on standard terminology related to character issues
N031  RIN                      Minutes, agenda and attendance list from London, 90-10-04 - 05
N032  RIN                      Resolutions from London, 90-10-04 - 05

6.3 Review of action items

A list of action items compiled by DD was informally reviewed. Due to the pressure of time, it was agreed that final approval should take place once those present had seen these minutes:

RB: Investigate the possibility of links between wg15rin and i18n mail lists and the uniforum-intl mail list.
RB: Send KS mail list serializing/archiving software used by UniForum at Sun Microsystems.
DD/KS: Send rapporteur list to RB by electronic mail for coordination purposes.
DD: Obtain example costing figures on TCOS institutional representative responsibilities from John Quarterman, and forward to KS.
DD: Write up minutes, agenda and attendance list (JTC1/SC22/WG15RIN-N031) and resolutions (JTC1/SC22/WG15RIN-N032), circulate by electronic mail, and forward to convener for entry in document register.
DD: Summarize issues raised in discussion of Japanese locale; circulate to group.
GL: Add % descriptor to date format descriptions, allowing for explicit printing of AM or PM (or their equivalents in a particular locale).
GL: Amend 9945-2 draft to replace the LC_DATE concept of abbreviated month name with alternate month name.
GL: Investigate the possibility of links between wg15rin mail list and UNIX International mail lists.
SN: Request that a representative of X/Open Company Ltd. provides a report to RIN either in written form, or in person at the next meeting of RIN.
KS: (Ongoing) Continue canvassing European opinions on and solutions to the problem of accommodation of IS 646 within the framework of POSIX.
KS: Capture RIN comments on informally-circulated 1003.2 draft 10, pass them back to Hal Jespersen, and subsequently circulate amended result to selected members of WG15 etc.
KS: Send copy of proposal on escape character substitution to DT by electronic mail.
KS: Include issue of collision between Yen-symbol (¥) and escape character in discussions on seven-bit coded character set issues.
KS: Pass additional materials on collating, such as further papers from Alain LaBonte of CSA, to GL for use in production of future drafts of 1003.2.
KS: Promote seeking of TCOS Institutional Representative status by EUUG.
KS: Arrange to put serial numbers on all postings to mail-lists.
KS: Set up public mail-list, i18n@dkuug.dk; announce to those who need to know and tell them to tell their friends.
Dr. Saito: Produce paper on issues of locale production for group. (May already be action from WG15 Paris meeting.)
DT et al: (Ongoing) Continue work on questionnaire.
DT: Review following topics, and, where necessary, amend draft questionnaire to solicit specific information:
    - Need for locale-dependent binary time and date information from system (struct tm)
    - Interest in and requirements for an LC_MEASUREMENT locale
    - Interest in greater flexibility in representation of AM, PM and related concepts, including noon and midnight, in output of date, and input to at.
DT: Advise X/Open that usage of upper-case for LANG codes in XPG is contrary to the recommendations of IS 639:1988, Code for the representation of names of languages; and that @ () should not be used in locale names, as it is an IS 646 national variant symbol.
DT: Investigate the possibility of links between wg15rin mail list and X/Open mail lists.
DT: Request X/Open to give consideration to the usefulness of an LC_MEASUREMENT locale, and to its potential contents.
DT: Respond in writing to KS proposal on escape character substitution.
DT: Review following topics, and, where necessary, amend draft survey to solicit specific information.
DT: Request 9945-1 technical editor to report informally on the impact of incorporating the character-related terminology outlined in JTC1/SC22/WG15RIN-N030 into 1003.2.

6.4 Thanks to host

As a representative of BSI, DD was thanked on behalf of the group for the provision of facilities and accommodation for the meeting.

The meeting adjourned at 18:10 on 90-10-05.

RIN Agenda
Meeting of 90-10-04 - 05, London
Attachment to Minutes

1. Opening of Meeting
    1.1 Introductions and roll call of rapporteurs
    1.2 Selection of chair, secretary, drafting
    1.3 Adoption of agenda
    1.4 Approval of minutes
2. Status, liaison and action item reports
    2.1 Review of action items
    2.2 Report on IEEE 1003.x internationalization status
    2.3 Liaison reports: SC22 internationalization working group; SC22 ad hoc working group on character sets
    2.4 TSG1 internationalization activity
    2.5 Other internationalization activity
3. Balloting Activities
4. New Business
    4.1 Example national profiles and locales
    4.2 Harmonization issues
    4.3 Character classes (Kanji, Katakana, Cyrillic etc.)
    4.4 Localedef discussion
    4.5 TZ, era
    4.6 Pax utility
    4.7 Layered locales
    4.8 Character set terminology
    4.9 Messages
    4.10 Summary of internationalization issues
    4.11 Questionnaire
5. Review and Approval of Resolutions
    5.1 Resolutions
    5.2 Recommendations for WG15
6. Closing Procedures
    6.1 Future meeting considerations -- request for invitations
    6.2 Document number assignment
    6.3 Review of action items
    6.4 Thanks to host