ISO/IEC TR 14766 WD1

Guidelines for POSIX National Profile and National Locale

ISO/IEC TR 14766
Information technology -
Guidelines for POSIX National Profile and National Locale

Working Draft 1
1997-10-16


FOREWORD

ISO (the International Organization for Standardization) and IEC
(the International Electrotechnical Commission) together form a system
for worldwide standardization as a whole. National bodies that
are members of ISO or IEC participate in the development of
International Standards through technical committees established
by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate
in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC,
also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical
committee, ISO/IEC JTC 1.

The main task of a technical committee is to prepare
International Standards but in exceptional circumstances, the
publication of a Technical Report of one of the following types
may be proposed:

- type 1, when the required support cannot be obtained for
the publication of an International Standard, despite repeated efforts;

- type 2, when the subject is still under technical
development or where for any other reason there is the
future but not immediate possibility of an agreement on an
International Standard;

- type 3, when a technical committee has collected data of a
different kind from that which is normally published as an
International Standard ("state of the art", for example).

Technical Reports of types 1 and 2 are subject to review within
three years of publication, to decide whether they can be
transformed into International Standards. Technical Reports of
type 3 do not necessarily have to be reviewed until the data they
provide are considered to be no longer valid or useful.

ISO/IEC PDTR 14766, which is a Technical Report of type 3, has
been prepared by ISO/IEC JTC 1/SC22/WG15 - POSIX.

Suggestions and comments for improvement of this document are
welcome. They should be sent to:

Keld Simonsen
Sankt Jørgens Allé 8
DK-1615 Copenhagen V
Denmark

Tel: +45 3122-6543 Fax: +45 3325-6543 Email: keld@dkuug.dk

CONTENTS

1. Scope

2. Normative References

3. Definitions
3.1 Terms defined in this report
3.1.1 POSIX Profile
3.1.2 POSIX National Profile
3.1.3 POSIX National Locale
3.1.4 POSIX National Body Conformance
3.2 Terms defined in other documents

4. Abbreviations

5. Purpose of National Profiles and National Locales
5.1 Purpose of National Profiles
5.2 Purpose of National Locales

6. Concept of National Profiles
6.1 The relationship to base standards
6.2 The relationship to Registration Authority
6.3 Principles of National Profile Content
6.3.1 General Principles
6.3.2 Principles of National Profile Content
6.3.3 Main elements of a National Profile Definition
6.4 The meaning of conformance to a National Profile
6.5 Conformance requirements of POSIX National Profiles
6.6 Implementation Conformance
6.6.1 General
6.6.2 Requirements
6.7 POSIX Application Conformance for National Profiles
6.7.1 <National Body> Conforming POSIX Application
6.7.2 <National Body> Conforming POSIX Application Using
Extensions

7. Contents of National Profile

8. Concept and consideration of National Locale
8.1 Concept of POSIX National Locale and charmap
8.2 Contents of National Locale
8.3 Consideration on character classification and transformation
8.4 Consideration on numeric format
8.5 Consideration on monetary format
8.6 Consideration on date-time format
8.7 Consideration on collating sequence
8.8 Consideration on messages

9. Using existing locale
9.1 WG15 locale collection
9.2 replace-after technique

Annex A. Locale related descriptions in POSIX

Annex B. Symbolic character name

Annex C. Convenient tools for producing National Locale

Annex D. Examples of National Profile - Japan

Annex E. Examples of National Locale - Denmark

Annex F. Use of ISO/IEC 10646 in POSIX standards

1. Scope

This Technical Report provides a guideline for ISO Member
Bodies in the process of making National Profiles and
National Locales for the ISO/IEC 9945 POSIX series of
standards.

- National Profiles provide requirements for making POSIX
suitable for a given culture, by specifying the options needed
from the POSIX standards and the national standards to be applied.
Implementers can then comply with the POSIX National Profile to
make their products suited for the market, and ISO member
bodies can facilitate procurement by making National Profiles
which are national standards. Users can obtain products which
are suited to their needs and which behave consistently
across applications and platforms. A National Profile may
include National Locale specifications.

- National Locales specify options to POSIX standards in
POSIX locale format, for data that varies culturally.
Applications can be written in an internationally portable way
by removing hard-coded culturally dependent data or
functions, and using the POSIX National Locale data instead.
Implementers can, by using the National Locales, be relieved
of specifying the often very complex internationalization
data themselves and can instead rely on a credible source such
as the ISO member bodies. Users can benefit from products that
are suited to their cultural needs and obtain consistent
behaviour across applications and platforms. ISO member
bodies can facilitate this process and provide procurement
specifications via national standards on National Locales.
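
As an illustration of replacing hard-coded cultural data with
locale data, the following Python sketch takes the decimal point
from the active locale rather than from the program text. The
function name format_decimal is hypothetical, and the "C" locale is
used only because it is available on every system; a national
locale name would be substituted where one is installed.

```python
import locale

def format_decimal(value, places=2):
    """Format value using the decimal point of the active locale."""
    conv = locale.localeconv()        # culturally dependent data from the locale
    text = f"{value:.{places}f}"      # always uses "." as the decimal point
    return text.replace(".", conv["decimal_point"])

# The "C" locale always exists; a national locale name (an assumption,
# depending on what is installed) would be passed here in practice.
locale.setlocale(locale.LC_ALL, "C")
print(format_decimal(3.14159))
```

The application itself contains no knowledge of the decimal
punctuation; switching the locale changes the output without any
change to the code.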

__________

1. Hereafter in this document, for simplicity of wording,
the term National Profile is used as a synonym for POSIX
National Profile, unless otherwise stated.

2. References

The following standards contain provisions which constitute
provisions of this report.

ISO/IEC 9945-1:1990, Information technology - Portable Operating
System Interface (POSIX) - Part 1: System Application Program
Interface (API) [C Language]

ISO/IEC 9945-2:1993 Information technology - Portable Operating
System Interface (POSIX) - Part 2: Shell and Utilities

ISO/IEC 646:1991, Information processing - ISO 7-bit coded
character set for information interchange.

ISO 2022, Information processing - 7-bit and 8-bit coded
character sets - Code extension techniques.

ISO 8859, Information processing - 8-bit single-byte coded
graphic character sets - Part 1, .., Part 10.

ISO/IEC 10646-1:1993, Information technology - Universal Coded
Character Set (UCS)

ISO/IEC Directives:1990, Procedures for the technical work of
ISO/IEC JTC 1 on Information Technology.

ISO/IEC Directives Part 2:1989, Methodology for the development
of International Standards.

ISO/IEC Directives Part 3:1989?, Drafting and presentation of
International Standards.

ISO/IEC 9899:1990, Programming languages - C.

ISO/IEC 9899 AM 1:1993, Multibyte Support Extensions.

TSG-1 Final Report (ISO/IEC JTC 1 N1335).

IEEE P1003.0/D16 (August 1993), ISO/IEC JTC1/SGFS N1030, Draft
Guide to the POSIX Open Systems Environment.

IEEE P1003.18/D5 (September 1991), Draft Standard for Information
Technology - Standardized Profile - USI-P1001 Platform.

ISO/IEC TR 10000-1:1990, Information technology - Framework and
taxonomy of International Standardized Profiles - Part 1:
Framework.

ISO/IEC TR 10000-2:1990, Information technology - Framework and
taxonomy of International Standardized Profiles - Part 2:
Taxonomy of Profiles.

3. Definitions

For the purpose of this technical report the following
definitions apply.

3.1 Terms defined in this report

3.1.1 POSIX Profile

A profile for an International Standard is a set of
specifications of the parameters, the selections of the optional
items and the recommendations of the implementation related
matters. A POSIX Profile corresponds to the Profile concept for
the POSIX International Standard.

3.1.2 POSIX National Profile

A National Profile is a subset of a POSIX Profile which is
strongly related to the culture-dependent aspects of POSIX.
It also contains the definitions and recommendations for the
usage of national/regional standards which support the handling
of nation-specific and/or area-specific aspects (e.g. the use of
coded character sets).

3.1.3 POSIX National Locale

A National Locale is a subset of a National Profile, which gives
profile options in the POSIX localedef format.

3.1.4 POSIX National Body Conformance

POSIX National Body Conformance is the degree to which the
specifications of an implemented POSIX system agree with the
POSIX National Profile. Since the POSIX National Profile
is not necessarily included in the POSIX Profile, systems which
conform to the POSIX National Body Conformance may not pass the
POSIX Conformance requirements.

3.2 Terms defined in other documents

This part of the report uses the following terms defined in other
relevant documents:

a. Internationalization -- TSG-1 Final Report, IEEE P1003.0

b. Localization -- TSG-1 Final Report, IEEE P1003.0

c. Portability -- TSG-1 Final Report

d. Locale -- ISO/IEC 9945-1, ISO/IEC 9945-2, ISO/IEC 9899

4. Abbreviations

Note: removed ISP because SGFS does not consider POSIX National
Profiles as Profiles.

5. Purpose of National Profile and National Locale

5.1 Purpose of National Profiles

National Profiles for POSIX based international standards define
culture- and language-dependent adaptation and interpretation of
POSIX for the following purposes.

- A National Profile identifies the base international and
national/regional standards and clarifies the relationships
among them.

- A National Profile identifies the base standards, together
with appropriate culture- and language-specific classes,
subsets, options and parameters, which are necessary to
assure a higher degree of portability.

- A National Profile gives a detailed description of
locale-dependent functions that are out of the scope of the Base
International Standard, which provides frameworks for
internationalization so that national bodies can define
appropriate language- and culture-dependent adaptation and
interpretation based on it.

- A National Profile provides reference systems on top of which
culture- and language-dependent applications can be built to
promote POSIX based standards among users and vendors.

- A National Profile promotes the development of conformance
tests that produce consistent results for the systems
compliant with POSIX and a given national profile.

Various bodies throughout the world are undertaking work in the
definition of National Profiles for POSIX based international
standards.

This Guideline for POSIX National Profile Writers has been
developed by SC22/WG15 to make the National Profiles consistent
and to make the harmonization of the National Profiles easier by
doing the following:

- Define style, documentation scope and classification scheme
for National Profiles.

- Define those items that should be written in National
Profiles

- Define those items that should not be written in National
Profiles

5.2 Purpose of National Locales

The purpose of the national locale is to specify, for a given
culture identified by country and language and specified by an
ISO member body, a POSIX locale that is directed towards that
culture, so that users can refer to this locale and obtain
consistent behaviour across the hardware and software platforms
conforming to this locale. It is expected that many national
standardisation organisations will make national standards on
their locales, which can then also be used for procurement.

The national locale will in most cases build on already existing
national standards, for example on formatting and collation, but
will sometimes reflect customary specifications; for date and
time, for example, there is often no adequate national
standard.

6. Concept of National Profiles

POSIX is a platform of Open System Environment (OSE), and an AEP
(Application Environment Profile) is a set of parameters and the
selection of options for the base standards included in OSE to
support the execution of application programs for a given
application field. It includes the parameters and option
selections for the relevant base standards such as the platform
standards like POSIX and application specific standards like GKS,
SQL and so on.

A National Profile for a specific cultural region or a nation is
a set of parameters and option selections for several base
standards like POSIX. These standards may be National Standards
like JIS X0208, and they may be extensions of international
standards. A National Profile cannot avoid such non-international
standards because it must specify local cultural aspects.

Application Environment Profile and National Profile may be based
on National Standards, and therefore it is necessary to
coordinate in defining the parameters and option selections from
the view point of international harmonization to support
international application portability and interoperability.

Granting this fact, there are several levels of conformance both
for a given POSIX application environment profile and a given
POSIX National Profile as follows:

For Application Environment Profile:

(1) Strictly Conforming POSIX Application for POSIX AEP

An application that can be executed for any parameters and
options for POSIX

(2) ISO/IEC Conforming POSIX Application for POSIX AEP

An application that requires only specific POSIX related
parameters and options.

(3) ISO/IEC Conforming POSIX Application using Extensions for
POSIX AEP

An application that requires not only specific POSIX related
parameters and options but also other ISO/IEC standards and their
international profiles.

For POSIX National Profile:

(1) National Body Conforming POSIX Application for POSIX NP

An application that requires only the POSIX related parameters
and options defined in POSIX National Profile.

(2) National Body Conforming POSIX Application using Extensions
for POSIX NP

An application that requires POSIX related parameters and options
defined in POSIX National Profile, national profiles for other
ISO/IEC standards, and national body standards.

6.1 The relationship to base standards

Base standards specify procedures and formats that facilitate the
development of internationally portable applications across many
countries/regions. They may provide mechanisms for supporting
language/cultural dependent (locale specific) aspects, hopefully
in a locale-independent way as much as possible.

National profiles promote applicability of the base standards to
specific countries/regions by defining how to use mechanisms
specified in the base standards for a specific country/region
with appropriate choice/value-setting of options/parameters.
National profiles may also specify additional standards which are
required for locale specific features support.

National profiles shall not contradict base standards but shall
make specific choices where options and ranges of values are
available. The choice of the base standard options should be
restricted so as to maximize the application portability across
National profiles, consistent with achieving the objectives of
the National profiles.

6.2 The relationship to Registration Authority

Some objects specified in National Profile shall be administered
and registered to keep identification and to avoid conflict of
values or names adopted by each of the countries.

The administration and registration of such objects may be
performed by Registration Authorities, e.g. ISO/IEC JTC 1/SC22/WG15,
or an appropriate organization authorized by ISO/IEC JTC 1,
with a procedure recognized and agreed internationally.

The following objects specified in National Profile should be
registered and maintained by Registration Authorities.

(a) locale definitions and their names
(b) symbolic character names
(c) coded character sets and their names
(d) character class names

6.3 Principles of National Profile Content

6.3.1 General Principles

General Principles for a Profile specified in ISO/IEC/TR 10000-1,
subclause 6.3 are applied to a POSIX National Profile.

6.3.2 Principles of National Profile Content

A National Profile places a set of requirements which are useful
in maximizing the portability of applications for a specific
country/region. It does not specify all of the functionality of
a system, but only that part relevant to the functions being used
for locale-specific operation.

The content of a National Profile shall be specified in a coded
character set independent way wherever possible. When some
requirements are recognized as locale-specific but no clear
indication can be given by a National Profile, it may include
informative guidance to implementors.

6.3.3 Main elements of a National Profile Definition

The definition of a National Profile shall comprise the following
elements:

(a) a definition of the scope of the countries/regions for which
the National Profile is defined, and of its purpose;

(b) normative reference to base standards, including precise
identification of the actual texts of the base standards being
used and of any approved amendments and technical corrigenda
(errata), conformance to which is identified as potentially
having an impact on achieving portability using the National Profile;

(c) normative and informative reference to any other relevant
source documents, including National Body standard;

(d) specification of the application or the function of each
referenced base standard, covering recommendations on the choice
of classes or subsets, and on the selection of options, ranges
of parameter values, etc.;

(e) specification of the locale information of each referenced
base standard;

(f) a statement defining the requirements to be observed by
systems claiming conformance to the National Profile.

6.4 The meaning of conformance to a National Profile

The concepts of Implementation Conformance and Application
Conformance are incorporated in the concept of National Profiles.
The conformance requirements defined in a National Profile apply
only to an application platform, for interoperability and
for portability of applications and data. A real system is said
to exhibit conformance if it complies with the requirements of
the applicable POSIX standards.

A National Profile shall address the following two topics:

(a) Implementation Conformance requirements (details as given in
6.6);

(b) Application Conformance requirements (details as given in
6.7);

These requirements are stated in a POSIX National Profile.

In order to conform to a National Profile, a system shall perform
correctly all the capabilities defined in the POSIX as mandatory
and also any options of the POSIX which it claims to include.
Conformance to a base standard in this context is conformance to
a particular identified publication of a referenced base
standard.

A National Profile shall be defined in such a way that testing of
its implementation can be carried out in the most complete way
possible, given the available testing methodologies.

6.5 Conformance requirements of POSIX National Profiles

(to be completed)

6.6 Implementation Conformance

[ NOTE: The two chapters "Static Conformance" and "Dynamic
Conformance" are changed into "Implementation Conformance" and
"Application Conformance". - 1991-10-18 ]

6.6.1 General

The choices of interfaces and functional behaviour made in a
National Profile's implementation conformance requirements are
specific to that National Profile and provide added facilities to
the base standards.

The choices are not, therefore, arbitrary but need to be
consistent with the purpose of the National Profile and
consistent across the base standards referenced by it.

In order to avoid ambiguity between the National Profiles and the
base standards, the implementation conformance requirements of a
National Profile shall be specified, where possible, by reference
to the conformance requirements of the referenced base standards.

6.6.2 Requirements

All systems claiming conformance to a National Profile shall
support the required interface and functionality defined in the
National Profile. The system may provide additional functions or
facilities not required by the National Profile.

6.7 POSIX Application Conformance for National Profiles

All POSIX applications claiming conformance to the National
Profile shall use only language-dependent services for one or
more of the Language Options defined in the National Profile and
the facilities provided by the National Profile and referenced
base standards, and shall fall within one of the following
categories:

6.7.1 <National Body> Conforming POSIX Application

A <National Body> Conforming POSIX Application requires only the
parameters and options defined in POSIX National Profile for the
said National Body. Such an application shall include a statement
of conformance that documents all options and limit dependencies,
and all other <National Body> standards used.

6.7.2 <National Body> Conforming POSIX Application Using
Extensions

A <National Body> Conforming POSIX Application Using Extensions
is an application that requires not only the parameters and
options defined in POSIX National Profile but also other ISO/IEC
standards and their National Profiles and several National
Standards for the said National Body. Such an application shall
fully document its requirements for these extended facilities, in
addition to the documentation required of a <National Body>
Conforming POSIX Application.

(to be completed)

7. Contents of National Profile

POSIX National Profile shall have the following structure.

1. General

1.1 Scope

The scope of the National Profile shall be described.
Provision of this section is mandatory.

1.2 Normative Reference

The standards which are referred to by the National Profile shall
be listed. Provision of this section is mandatory.

1.3 Objectives

The objectives of the National Profile shall be described.
Provision of this section is mandatory.

1.4 Conformance

1.4.1 Levels of conformance

If the National body enacts some levels of conformance, the
levels shall be specified. Provision of this section is
mandatory.

1.4.2 System conformance

The requirements for a National body conforming implementation
shall be specified. Provision of this section is mandatory.

1.4.3 Application conformance

The requirements for a National body conforming application
shall be specified. Provision of this section is mandatory.

2. Registry

The names which must not conflict with other National Profile
shall be listed. The names described here shall be registered to
ISO, when official registration mechanism is established.
Provision of this section is mandatory.

2.1 Locale names

The names of the locales which are specified in the National
Profile shall be listed. Provision of this section is mandatory.

2.2 Symbolic name of characters

The list of symbolic names of extended characters, or the naming
conventions for symbolic names of extended characters, shall be
specified. Provision of this section is mandatory.

2.3 Name of coded character sets

The names of the coded character sets which are referred to by
the National Profile shall be listed. The names may also be used
for code conversion utilities/functions. Provision of this section
is mandatory.

2.4 Character classes

If the National body specifies extra character class in LC_CTYPE
category, the names and descriptions shall be specified. This
section is optional.

2.5 Environment variables

If the National body specifies environment variables which are
not specified in the POSIX standard, the names of the environment
variables and their descriptions shall be specified. This section
is optional.

2.6 Others

3. Parameters

3.1 POSIX

The range of POSIX related parameters which are allowed by the
National Profile shall be specified. Provision of this section is
mandatory.

3.1.1 Charmap

The contents of Charmaps shall be specified. Provision of this
section is mandatory.

3.1.2 Locale definition

The contents of locale definitions shall be specified. Provision
of this section is mandatory.

3.1.3 System parameter

The range of values of system parameters, e.g.
_POSIX_NO_TRUNC and NAME_MAX, shall be specified.
Provision of this section is mandatory.

3.2 C Language

The range of C Language related parameters which are allowed by
the National Profile shall be specified, e.g. CHAR_BIT. Provision
of this section is mandatory.

4. Options

Options which are required to be implemented shall be specified.

4.1 POSIX

The required optional facilities which are related to POSIX
standard shall be listed, e.g. charmap option of localedef
utility. Provision of this section is mandatory.

4.2 C Language

The required optional facilities which are related to the C
Language standard shall be listed, e.g. ISO/IEC 9899 Amendment 1
(Multibyte Support Extensions). Provision of this section is
mandatory.

Note from editor: the MSE is always needed to be conforming to IS
9899?

5. Error/exception handling

If the National body specifies the error/exception handling of
some functions, the methods shall be specified. This section is
optional.

6. Extensions

6.1 POSIX Extension

If the National body requires implementation of any enhanced
facility, e.g. addition of environment variable, function,
utility and option parameter of utility, the enhanced facilities
shall be specified. Provision of this section is mandatory.

6.2 Other Standards

If the National body requires implementation of any standards
other than POSIX standard to the National body conforming
systems, the standards shall be listed. Provision of this section
is mandatory.

7. Data exchange

If the National body specifies any formats and mechanism, or
requires implementation of standards, the facilities shall be
specified. This section is optional.

7.1 Archive file format

The format of archive files, e.g. tar and cpio, shall be specified.

7.2 Identification of coded character set

The mechanism to identify coded character sets in a file shall be
specified.

7.3 Protocols

Communication protocols which a National body conforming
implementation must implement shall be listed.

7.4 Profile for OSI

The profile which the National body has specified for OSI shall
be referenced.

7.5 Media

If the National body has requirements on media which is used for
data exchange, the requirements shall be specified.

Annex A Informative reference

If the National body has any recommended parameters, options and
extensions, though not required for the profile conformance,
these features should be listed in this section. This section is
optional.

Annex B Notes and Rationale

(to be completed)

8. Concept and consideration of National Locale

8.1 Concept of POSIX national locale and charmap

The benefits of a national locale are exemplified by the Danish
example locale included in ISO/IEC 9945-2.

Work with the Danish locale produced a quite elaborate locale
defined for a lot of character sets, including parts of ISO/IEC
10646-1 and almost all of the ISO 2375 registry (done by JIS),
and some 60 vendor specific character sets, in all about 140
character sets. The locale is available electronically together
with the 140 charmaps.

Thus with just one specification of a national locale, uniform
collating for many character sets is defined - the characters
will always come in the same sequence regardless of which
character set is employed. Also, the date format and the other
cultural items need to be defined only once, and that
specification is then valid for many character sets.
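
The following Python sketch illustrates the idea under stated
assumptions: a collation order is written once in terms of
symbolic character names, and a separate charmap per coded
character set maps each name to its encoded value. The names
COLLATION, CHARACTERS, build_charmap and sort_encoded are
hypothetical, the symbolic names other than <aa> are assumed for
illustration, and only five characters are covered.

```python
# Collation order stated once, in terms of symbolic character names.
COLLATION = ["<a>", "<z>", "<ae>", "<o/>", "<aa>"]

# Characters behind the symbolic names (illustrative subset only).
CHARACTERS = {"<a>": "a", "<z>": "z", "<ae>": "æ", "<o/>": "ø", "<aa>": "å"}

def build_charmap(codec):
    """Map each symbolic name to its encoded byte in one coded character set."""
    return {name: ch.encode(codec) for name, ch in CHARACTERS.items()}

def sort_encoded(items, codec):
    """Sort byte strings of the given coded character set by the shared order."""
    charmap = build_charmap(codec)
    weight = {charmap[name][0]: pos for pos, name in enumerate(COLLATION)}
    return sorted(items, key=lambda data: [weight[byte] for byte in data])

for codec in ("latin-1", "cp865"):
    words = [s.encode(codec) for s in ("å", "z", "ø", "a", "æ")]
    print(codec, [w.decode(codec) for w in sort_encoded(words, codec)])
```

The byte values of the three Danish letters differ between the two
coded character sets, yet the resulting sequence is the same,
because only the charmap changes and the collation rules do not.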

8.2 Contents of national locale

In creating a national locale, many things must be considered.
Some data may be easier to determine than others. For each locale
category some recommendations are given below.

8.3 Consideration on character classification and transformation

The character classification section of the locale is normally
straightforward; an "A" is considered a letter in nearly all
languages and is mapped to an "a" when the lower case letter
is to be found. Normally the LC_CTYPE definition in POSIX.2
Annex G can be used without change.

8.4 Consideration on numeric format

The data here is normally easy to determine; the ISO convention
is to use the comma as decimal punctuation and the period as the
thousands delimiter.
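
A minimal Python sketch of how such numeric data might be applied
is given below. The function format_number and its parameter names
are hypothetical (they loosely mirror the decimal point, thousands
separator and grouping items of a numeric locale category), and the
default values simply follow the comma and period convention
described above.

```python
def format_number(value, decimal_point=",", thousands_sep=".",
                  grouping=3, places=2):
    """Format a number with locale-style punctuation (illustrative only)."""
    sign = "-" if value < 0 else ""
    whole, _, frac = f"{abs(value):.{places}f}".partition(".")
    groups = []
    # Split the integer part into groups of `grouping` digits from the right.
    while len(whole) > grouping:
        groups.insert(0, whole[-grouping:])
        whole = whole[:-grouping]
    groups.insert(0, whole)
    text = thousands_sep.join(groups)
    return sign + text + (decimal_point + frac if frac else "")

print(format_number(1234567.891))
```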

8.5 Consideration on monetary format

The monetary formats may be a bit difficult to specify. The ISO
4217 currency code must be specified for the international
format. The local specification may be a matter of choice, but
there may be guidelines in national orthography specifications.
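
The difference between the international and the local format can
be sketched as follows in Python. The table MONETARY and the
function format_money are hypothetical; the ISO 4217 code DKK is
standardized, but the local symbol placement, the separators and
the handling of the sign are assumptions made for illustration and
would have to be taken from the relevant national guidelines.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical monetary data; only the ISO 4217 code is standardized.
MONETARY = {
    "int_curr_symbol": "DKK ",
    "currency_symbol": "kr",
    "mon_decimal_point": ",",
    "mon_thousands_sep": ".",
    "frac_digits": 2,
}

def format_money(amount, international=False, conv=MONETARY):
    """Format an amount in the international or the local monetary format."""
    quantum = Decimal(1).scaleb(-conv["frac_digits"])      # e.g. 0.01
    value = Decimal(str(amount)).quantize(quantum, rounding=ROUND_HALF_UP)
    whole, _, frac = f"{abs(value):f}".partition(".")
    grouped = f"{int(whole):,}".replace(",", conv["mon_thousands_sep"])
    symbol = (conv["int_curr_symbol"] if international
              else conv["currency_symbol"] + " ")
    sign = "-" if value < 0 else ""     # simplified sign placement (assumption)
    number = grouped + (conv["mon_decimal_point"] + frac if frac else "")
    return sign + symbol + number

print(format_money("1234.5"))
print(format_money("1234.5", international=True))
```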

8.6 Consideration on date-time format

There may be problems with specifying the date format, including
time zone names, which may not be well defined. In the Danish
case we consulted as many official sources as possible, including
orthography definitions and numeric rendering standards. One
thing we changed late in the process was to write day names with
an initial small letter - which was in accordance with the Danish
orthography dictionary.
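
As a small illustration of day names written with an initial small
letter, the Python sketch below formats a date from a table of
Danish day names. The table DAY_NAMES, the function format_date
and in particular the date pattern are assumptions for
illustration; they are not taken from the Danish locale itself.

```python
from datetime import date

# Danish day names, Monday first, written with an initial small letter.
DAY_NAMES = ["mandag", "tirsdag", "onsdag", "torsdag",
             "fredag", "lørdag", "søndag"]

def format_date(d, pattern="{day} {dd:02d}-{mm:02d}-{yyyy:04d}"):
    """Format a date using the day-name table and an assumed pattern."""
    return pattern.format(day=DAY_NAMES[d.weekday()],
                          dd=d.day, mm=d.month, yyyy=d.year)

print(format_date(date(1997, 10, 16)))
```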

8.7 Consideration on Collating sequence

The Danish collating sequence was hard to define. There are many
levels of complication for collation. One example is the telephone
level, with Mc the same as Mac, numbers spelled out, and certain
words like "the" ignored or moved to the end. Danish actually
has some rules like that, also in the official collating standard
DS 377 from 1980. Another level is the phonetic level (soundex),
which is a little less complicated. A third level is transcribed
characters, as when librarians see a Greek alpha and order it as
a normal "a".

The level that Danish Standards has decided on for its POSIX.2
locale is the systems interface level. The collating order should
be usable in POSIX systems tools like ls and sort. A requirement
has been that it is deterministic; if two strings are different
they will also differ when compared. Another issue has been
efficiency. POSIX has provisions for substituting "Mc" with
"Mac", but this is considered too inefficient and is avoided in
the Danish example national locale.

The problem of pronunciation and transliteration has not been
addressed. Instead it has been considered adequate just to look
at the characters themselves - only considering characters at the
systems level - and not sounds. The level provided by the Danish
locale is a service for comparing strings which is intended as
a replacement for the standard strcmp() and similar routines, just
a little more intelligent and adhering to Danish collating rules.

We have, however, put as much intelligence in there as possible
at this level. The two letters <a><a> are sorted as the single
letter <aa> (A WITH RING), but the single letter <aa> comes
before <a><a> in homonyms. The four-level scheme of the
Canadian-French sorting is used, with the four levels being
letter, accent, case and special character. This was actually
also specified in DS 377. For the sake of harmonization we
decided to use reverse sorting for the accents, as the Canadians
do; the natural choice might have been forward sorting here too,
but as most of these words would be of French origin anyway, we
decided to follow their rules. For <ss> we implemented what we
think is the German rule, as seen in several German dictionaries:
<ss> is ordered as <s><s> but before it in homonyms.
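
The multi-level scheme described above can be illustrated with a
small Python sketch. It is not the Danish locale itself: the
function name collation_key, the toy alphabet, the use of Unicode
code points as accent weights and the choice of lowercase before
uppercase are all assumptions made for the example, and the
special character level is left out. The last level only records
whether <aa> was written as one letter or as <a><a>.

```python
import unicodedata

# Toy letter order (an assumption): a-z followed by ae, o-slash, a-ring.
ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"
PRIMARY = {letter: weight for weight, letter in enumerate(ALPHABET)}


def collation_key(word):
    """Return a sort key with the levels letter, accent, case, spelling."""
    word = unicodedata.normalize("NFC", word)
    letters, accents, cases, spellings = [], [], [], []
    i = 0
    while i < len(word):
        char = word[i]
        lower = char.lower()
        if lower == "a" and word[i + 1:i + 2].lower() == "a":
            # <a><a> collates as <aa> on the letter level, but is marked
            # so that the single letter comes first in homonyms.
            base, accent, spelling, step = "\u00e5", 0, 1, 2
        elif lower in PRIMARY:
            base, accent, spelling, step = lower, 0, 0, 1
        else:
            # Split an accented letter into base letter and accent mark.
            parts = unicodedata.normalize("NFD", lower)
            base = parts[0]
            accent = ord(parts[1]) if len(parts) > 1 else 0
            spelling, step = 0, 1
        # Characters outside the toy alphabet sort after all letters.
        letters.append(PRIMARY.get(base, len(ALPHABET) + ord(base)))
        accents.append(accent)
        cases.append(1 if char != lower else 0)
        spellings.append(spelling)
        i += step
    # Accents are compared from the end of the word (reverse sorting).
    return (letters, accents[::-1], cases, spellings)


french = ["cot\u00e9", "c\u00f4te", "cote", "c\u00f4t\u00e9"]
danish = ["Aarhus", "Zebra", "\u00c5rhus", "Abe"]
print(sorted(french, key=collation_key))
print(sorted(danish, key=collation_key))
```

Because every level is part of the key, two different strings can
only compare equal if they agree on all levels, which is how the
sketch approximates the determinism requirement.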

For the accents there were some rules indicated in DS 377 and in
the official Danish orthography dictionary, but they were far
from complete. Where there was no clear Danish rule, the accent
sequence found in several ISO standards was used. About 25
accents have been ordered.

For the non-Latin scripts we decided not to transcribe. This also
allows us to use the native collation order for these scripts,
like alpha, beta, gamma for Greek and a, be, ve, ghe for Cyrillic.
Accented Greek and Cyrillic letters and ligatures have been
placed in their proper positions.

The sequence of the scripts follows the ISO 10646 draft, which
settles the question of which scripts should come before others.
The scripts currently addressed are: Latin, Greek, Cyrillic,
Hebrew, Arabic, Kana and special characters. Ideographic
characters are in preparation.

Together with the Danish collating sequence a more general
collating sequence was specified. This collating sequence could
be used as a reference sequence, as mentioned below, and it
should produce an order which is compliant with at least English,
French, German, Italian, Dutch, Portuguese, Greek, Russian,
Hebrew and Arabic.

We recommend that similar decisions are taken when producing a
new collating sequence.

8.8 Consideration on messages

The messages category is a hook for providing a real message
service in applications; only yes/no responses are considered by
the POSIX standard.

For the yes/no responses it is recommended that only the first
letter of the answer in the natural language be required, and
that the English forms "Yes"/"No" and the more culturally neutral
0/1 also be allowed as answers.
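
As an illustration of this recommendation, the following Python
sketch matches only the first character of an answer, in the style
of the yesexpr and noexpr keywords of the LC_MESSAGES category.
The patterns are an assumption for a Danish locale ("ja"/"nej"
together with the English forms), and the sketch assumes that 1
means yes and 0 means no. The function name parse_answer is
hypothetical.

```python
import re

# Assumed patterns: Danish "ja"/"nej", English "yes"/"no", and 1/0.
YESEXPR = re.compile(r"^[jJyY1]")
NOEXPR = re.compile(r"^[nN0]")


def parse_answer(answer):
    """Return True for yes, False for no, None if not recognised."""
    answer = answer.strip()
    if YESEXPR.match(answer):
        return True
    if NOEXPR.match(answer):
        return False
    return None


for reply in ("Ja", "nej", "Yes", "0", "maybe"):
    print(reply, parse_answer(reply))
```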

9. Using existing locale

Much work is being done on locales and on making them quite
general. WG15 has on its programme of work to harmonize locales
as far as is feasible. The POSIX.2 standard introduced a copy
command for all categories of the locale. This is useful for many
purposes, and it ensures that two locales are equivalent for the
copied category. A further step in building on previous art is
proposed here.

The collating sequences vary a bit from country to country, but
generally much of the collating sequence is the same. For
instance the Danish sequence is very similar to the German,
English or French sequences, but differs for about a dozen
letters. The same can be said for Swedish or Spanish: generally
the Latin collating sequence is the same, but a few characters
are different.

With the advent of quite general, coded character set independent
locales, like the example Danish locale in POSIX.2 annex G, it
would be convenient if the few differences could be specified
just as changes to an existing locale. This would also give a
better overview of what the changes really are. It is therefore
recommended to use the replace-after construct in the LC_COLLATE
section of the locale file format when producing new national
locales.
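
The idea of expressing a national sequence as a few changes to an
existing one can be sketched in Python as follows. This only
models the effect on an ordered list of collating symbols; it is
not the locale file syntax. The function name replace_after and
the sample symbols are assumptions for the example.

```python
def replace_after(base, anchor, moved):
    """Return a copy of base with the moved symbols placed after anchor.

    Symbols already present in base are removed from their old position,
    so each symbol occurs only once in the result. The anchor itself must
    not be among the moved symbols.
    """
    moved_set = set(moved)
    result = [symbol for symbol in base if symbol not in moved_set]
    position = result.index(anchor) + 1
    result[position:position] = moved
    return result


# Toy base sequence; the real sequences hold several thousand symbols.
base = ["<a>", "<u>", "<u:>", "<y>", "<z>"]
print(replace_after(base, "<y>", ["<u:>"]))
```

Only the changed symbols need to be listed, which keeps the
national differences visible at a glance.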

9.1 WG15 locale collection

WG15 has been collecting POSIX locales for a number of years, and
about 40 locales and 100 charmaps are available now.

9.2 replace-after technique

See the description in the CEN ENV 12005 registration standard.

Annex A. Locale related descriptions in POSIX

We have an extract in source form from the POSIX editor, with
permission to reproduce it. It is not reproduced here due to
considerations for the rain forests, as it is about 70 pages. It
is an extract of the first sections of POSIX.2, including 2.5
locales, and the 4.13 date format.

Annex B. Symbolic character names

As in POSIX.2 annex G. As it is about 40 pages, it is not
reproduced here.

Annex C. Convenient tools for producing national locale

A script has been written in the "awk" language defined in
POSIX.2 to implement the "replace-after" construct.

BEGIN {
   comment = "%";
   back[0]= follow[0] = 0
   }

/LC_COLLATE/ { coll=1 }

/END LC_COLLATE/ { coll=0; for (lnr= 1; lnr; lnr= follow[lnr]) print cont[lnr] }

{ if (coll == 0) print $0 ;
 else {
   if ($1 == "copy") {
	file = $2
	while ((getline < file) > 0)
	if ( $1 == "LC_COLLATE" ) copy_lc = 1
	else if ( $1 == "END"
	&& $2 == "LC_COLLATE" ) copy_lc =0
	else if (copy_lc) {
	     lnr++
	     follow[lnr-1] = lnr
	     back [ lnr ] = lnr-1
	     cont[lnr] = $0
	     symb[ $1 ] = lnr
	}
	close (file )
   }
   else if ($1 == "replace-after") { ra = 1 ; after = symb[ $2 ] }
   else if ($1 == "replace-end") ra = 0
   else {
	lnr++
	# inside replace-after: splice the new line in after "after"
	if (ra) follow[ lnr ] = follow[ after ]
	if (ra) back[ follow[ after ] ] = lnr
	follow[ after ] = lnr
	back[ lnr ] = after
	cont[ lnr ] = $0
	if ( ra && $1 != comment && $1 != "" ) {
	     # unlink the old line of a symbol that was defined before
	     if ( $1 in symb ) {
		  old = symb[ $1 ]
		  follow[ back[ old ] ] = follow[ old ]
		  back[ follow[ old ] ] = back[ old ]
	     }
	     symb[ $1 ] = lnr
	}
	after = lnr
   }
 }
}

Annex D. Examples of National Profile - Japan

[An example of a Japanese National Profile is ready for inclusion
here. Since the text is so large, the example is intentionally
omitted from this review version of the document. Please contact
the Japanese National Body for details of the Japanese National
Profile.]

Annex E. Examples of National Locale - Denmark

[An example of a Danish National Locale will be provided here.]

Annex F. Use of ISO/IEC 10646 in POSIX standards

F.1. Introduction and scope

For servicing the widest possible audience, POSIX standards
should be able to handle the most encompassing character set, and
the best candidate for this is the new ISO/IEC 10646-1:1993
standard.

WG15RIN was asked by WG15 to give guidance on how to utilize UCS
in POSIX standards, as also requested by SC22 policies. RIN
believes this to be of use in many areas, such as global
organisations interested in just one character set
organisation-wide, European government institutions, eastern
Asia and many other places.

ISO/IEC 10646-1:1993, the Universal Multiple-Octet Coded
Character Set (UCS), provides the capability to encode multi-
script text within a single coded character set.

However, because UCS is designed to use all available code
points, null bytes and the code values of the other ISO/IEC
646:1991 IRV (also known as ASCII) characters, including the code
value of the ISO 646 solidus ("/") character, are not protected.
This makes the UCS character encoding incompatible with many
existing ISO 646 based POSIX operating system implementations.
UCS also uses code points that are used for ISO 6429 control
characters, which introduces further problems for communication
and application software. From these problems it was clear that a
POSIX internal encoding was required.

This paper first gives a survey of the possible coded
representation forms of UCS and the UCS transformation formats,
with their respective characteristics. Then each of the handling
areas of POSIX operation (data storage, file names, internal
processing, communications, interprocess communication) is
analyzed. Finally a recommendation is given for POSIX standards.

JTC1/SC22/WG20 is revising TR 10176 with guidelines for support
of IS 10646, and there may be further recommendations in this
work of relevance to POSIX. The work is out for CD ballot ending
in May 1996.

F.2. UCS coded representation forms and UCS transformation
formats

F.2.1. POSIX internal encoding

For the POSIX internal encoding UTF-8 was considered suitable.

The objective of UTF-8 is to provide a UCS transformation format
which also meets the requirement of being usable on historical
POSIX operating system file systems in a non-disruptive manner.

The UTF-8 transformation format represents both UCS-2 and UCS-4
in a compatible format using multiple-octet coded characters of
lengths 1, 2, 3, 4, 5, and 6 octets:

Octets Bits Hex Min  Hex Max  Byte Sequence in Binary
1       7   00000000 0000007F 0vvvvvvv
2      11   00000080 000007FF 110vvvvv 10vvvvvv
3      16   00000800 0000FFFF 1110vvvv 10vvvvvv 10vvvvvv
4      21   00010000 001FFFFF 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
5      26   00200000 03FFFFFF 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
6      31   04000000 7FFFFFFF 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

The UCS value is the concatenation of the v-bits in the multiple-
octet encoding, where the v-bits are the 0's and 1's that
constitute the UCS value.
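
The table can be turned into a short encoder. The following
Python sketch is only an illustration of the bit layout above; the
function name utf8_encode is an assumption, and the sketch accepts
any value up to 7FFFFFFF hex without checking whether a character
is actually assigned to it.

```python
def utf8_encode(ucs):
    """Encode one UCS value (0 to 0x7FFFFFFF) as 1 to 6 octets."""
    if not 0 <= ucs <= 0x7FFFFFFF:
        raise ValueError("UCS value out of range")
    if ucs < 0x80:
        # ISO 646 values are represented by themselves.
        return bytes([ucs])
    # (largest value, number of octets, fixed bits of the first octet)
    ranges = (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    )
    for limit, length, prefix in ranges:
        if ucs <= limit:
            break
    trailing = []
    for _ in range(length - 1):
        # Each trailing octet carries six v-bits after the bits "10".
        trailing.append(0x80 | (ucs & 0x3F))
        ucs >>= 6
    # The remaining v-bits go into the first octet.
    return bytes([prefix | ucs] + trailing[::-1])


print(utf8_encode(0x2F).hex())        # solidus
print(utf8_encode(0xE5).hex())        # a with ring
print(utf8_encode(0x7FFFFFFF).hex())  # largest UCS-4 value
```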

Thus UTF-8 has the capability of handling existing ISO 646 files
without change, and all codes in the ISO 646 range (having an
octet value in the range 0-127) can be safely assumed to be
representing the normal ISO 646 character.

F.2.2. Other forms of IS 10646

IS 10646 has two forms: UCS-2 and UCS-4, a 16-bit and a 31-bit
coded representation of the character set, respectively. It is
clear from work in JTC1/SC2/WG2 that IS 10646 may have more
characters than can be represented in 64K code positions, so we
are here considering the general case of UCS-4.

ISO/IEC 10646-1:1993 had a transformation format UTF-1, which was
informative, and it has now been removed from the standard by the
amendment ISO/IEC 10646-1 AM4:1996. UTF-8 is aimed at the same
purpose, and has more capability. UTF-8 has been approved as part
of UCS via the amendment ISO/IEC 10646-1 AM2:1996.

Another transformation format of IS 10646, UTF-16, has also been
approved, as ISO/IEC 10646-1 AM1:1996. It cannot accommodate all
of IS 10646 (it accommodates about 1 million characters), and
like UTF-8 it uses ranges of code values to indicate how many
octets are required to form one character, but without the added
functionality of being backwards compatible with ISO 646 and ISO
2022 encodings (which is a functionality of UTF-8).

The most general of the above encodings of IS 10646 is thus
UCS-4. It has the property of being constant-width, which may be
easier to handle than the multiple-octet UTF-8. As a file code
and as an interchange code, however, it has several problematic
properties: it uses codes in conflict with ISO 646, ISO 2022 and
ISO 6429, it depends on the byte ordering (little-endian versus
big-endian) of the hosting machine architecture, and it uses 4
octets per character. Here UTF-8 is clearly superior as POSIX
internal encoding. UCS-4 may have advantages as an internal
processing code and as an inter-process encoding, for C language
widechar-like encodings, but with the new ISO C language
amendment with full support for multibyte coded character sets,
that advantage may be diminishing. UTF-8 is equally well defined
and capable of representing all IS 10646 characters, and given
its strengths in other areas it may well be chosen also for
internal processing and inter-process communication. Internal
processing is not in the scope of POSIX interfaces, anyway.
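
The byte-ordering and null-octet properties mentioned above can be
seen in a small Python sketch. It uses Python's "utf-32-be" and
"utf-32-le" codecs as stand-ins for the two byte orders of UCS-4,
which is an assumption that holds for the characters shown; the
function name ucs4_and_utf8 is hypothetical.

```python
def ucs4_and_utf8(text):
    """Return (UCS-4 big-endian, UCS-4 little-endian, UTF-8) octets."""
    return (
        text.encode("utf-32-be"),
        text.encode("utf-32-le"),
        text.encode("utf-8"),
    )


big, little, utf8 = ucs4_and_utf8("a/b")
print(big.hex())     # depends on byte order, contains zero octets
print(little.hex())  # same text, different octet sequence
print(utf8.hex())    # identical to the ISO 646 representation
```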

F.2.3. UCS levelling

IS 10646 has 3 levels of support: level 1 without combining
characters, level 2 with combining characters in some scripts,
and level 3 with unrestricted use of combining characters. SC22
has, by resolutions from the 1993 Paris plenary, recommended that
all SC22 standards be enabled for level 3 data, but that the
semantics of combining characters not be addressed currently.
Thus there is no specific SC22 request for further support of
levels 2 and 3, but eventually there could be a need for support
of these levels. SC22 also recommended the use of IS 10646
terminology throughout SC22 standards. This may require an
alignment of current POSIX work, though it is believed that
current POSIX work is already well aligned with IS 10646 with
respect to terminology.

F.3. Problems in POSIX handling of UCS

There are several challenges presented by UCS which must be dealt
with by present implementations of the POSIX operating system.

F.3.1. Data storage

The most significant of these challenges is the encoding scheme
used by UCS. More precisely, the challenge is the marrying of the
UCS standard with existing programming languages and existing
operating systems. Prominent among the operating system UCS
handling concerns is the representation of the contents of data
in files. An underlying assumption is that there is an absolute
requirement to maintain the existing operating system software
investments while at the same time taking advantage of the large
number of characters provided by UCS.

For UTF-8 the representation of ISO 646 data is exactly the same,
and the right-hand side characters of the ISO/IEC 8859 parts will
need two octets for representation. For ideographic characters in
the BMP the representation will be three octets. This does not
dramatically change the amount of space currently consumed for
data storage.
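
The octet counts can be checked with Python's built-in UTF-8
codec. This is only a sketch; the function name utf8_octets and
the sample characters (an ISO 646 letter, a letter from the
right-hand side of ISO/IEC 8859-1, and a BMP ideograph) are chosen
for the example.

```python
def utf8_octets(text):
    """Return the number of UTF-8 octets used by each character."""
    return [len(char.encode("utf-8")) for char in text]


# "A" (ISO 646), ae ligature (ISO/IEC 8859-1), U+4E2D (ideograph)
print(utf8_octets("A\u00e6\u4e2d"))
```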

F.3.2. File names and internal processing

The UTF-8 transformation format was originally conceived as a
file system safe transformation format of UCS, to allow
historically ISO 646 based POSIX operating systems to cope with
the representation and handling in file names of the large number
of characters that can be encoded by UCS. From an internal
operating system (kernel) viewpoint, the handling of a large
character set is only a problem for file names, which are only
analyzed for the solidus ("/") delimiter to parse a name into
filename components. As UTF-8 can represent the full encoding of
IS 10646 and is backwards compatible with ISO 646, UTF-8 handling
is sufficient for POSIX internal encoding.
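
The following Python sketch illustrates why octet-wise parsing on
the solidus is safe for UTF-8 but not for a 16-bit form. The
function name split_path and the sample path are assumptions, and
Python's "utf-16-be" codec is used as a stand-in for big-endian
UCS-2, which is valid for the BMP character shown.

```python
def split_path(raw):
    """Split a UTF-8 encoded path on the solidus octet and decode it."""
    return [part.decode("utf-8") for part in raw.split(b"/")]


path = "/home/s\u00f8ren/\u262f.txt"
print(split_path(path.encode("utf-8")))

# U+262F is 26 2F in UCS-2: the second octet looks like a solidus.
ucs2 = "\u262f".encode("utf-16-be")
print(ucs2.hex(), b"/" in ucs2)
```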

F.3.3. Communications

Current ISO POSIX standards do not address communication. As ISO
6429 control characters are often used in communication, and the
UTF-1 transformation format was originally created to avoid
control character problems in communication, UTF-1 could have
been the choice. However, as UTF-1 is being removed from UCS and
UTF-8 is being introduced, with the same capabilities with
respect to solving control character problems, UTF-8 should be
the recommended choice in POSIX communication interfaces.

F.3.4. Interprocess communication

Communication between POSIX processes would probably use internal
data formats; for example, integers would be transferred in
binary form. As it could be recommended that programs internally
use a C language widechar style encoding of characters, a UCS-2
or UCS-4 format could be recommended.

On the other hand, interprocess communication is often across
networks and between heterogeneous systems. Since UCS-2 and UCS-4
are dependent on machine architecture, UTF-8 may therefore be the
preferred candidate. UTF-8 would in many cases also be less
space-consuming, which may be a significant plus when using
low-capacity network lines.

F.4. Recommendation

According to the above analysis, UTF-8 is the best candidate for
POSIX internal encoding of UCS in the areas of data storage, file
names and internal operating system (kernel) processing, and
communication, where otherwise UCS-2 or UCS-4 would have been
used for coded data. Furthermore, UTF-8 is a good candidate for
UCS representation in interprocess communication.

It is thus the recommendation of WG15RIN to use the UTF-8
transformation format whenever UCS is used in POSIX interfaces.

As POSIX interfaces in principle should be coded character set
independent, there is no general need to require the use of UTF-8
in POSIX standards, but guidance could be given in rationales.

A specific recommendation is that the portable archive exchange
utility "pax" be revised to be able to specifically use UTF-8 for
file names, and the use of UTF-8 should be clearly identified.

F.5. Consequences

X/Open has raised a number of problems with use of ISO/IEC 10646
in POSIX in the document WG15 N621. With the preceding
recommendation the problems can be addressed as follows:

- In UTF-8 the repertoire of ASCII is encoded as ASCII (ISO 646 IRV).

- We know of no codesets with control characters encoded in the
full single octet range 00 through 7F hex, but many use 00
through 1F hex and 7F, and some the range 80 through 9F. UTF-8
keeps the octets 00 through 1F and 7F for control characters
only; octets 80 through 9F occur only as trailing octets of
multiple-octet sequences.

- Zero value octets and octets with the value of '/' appear in
UTF-8 only as representations of the NUL and '/' characters
respectively.

- "combining characters" need not have special processing as
per SC22 resolutions, except for possibly a width
specification in a locale.

- According to the ISO/IEC 10646 standard there are no
equivalences prescribed between sequences of characters with
combining characters and "precomposed" characters, and the SC22
plenary recommendation is that there need not be special
handling of this.

- It should not be needed to process composite sequences in a
special way.
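
Two of the points above can be checked with a short Python
sketch, using Python's built-in UTF-8 codec. The function names
are assumptions, and the check is limited to the BMP, with the
surrogate positions skipped because Python does not encode them.

```python
def only_high_octets(code_point):
    """True if every UTF-8 octet of the character is 80 hex or above."""
    return all(octet >= 0x80 for octet in chr(code_point).encode("utf-8"))


def is_surrogate(code_point):
    return 0xD800 <= code_point <= 0xDFFF


# No character above ISO 646 produces an octet in the ISO 646 range,
# so NUL and '/' octets can only stand for themselves.
safe = all(
    only_high_octets(cp) for cp in range(0x80, 0x10000) if not is_surrogate(cp)
)
print(safe)

# A precomposed character and a composite sequence are simply two
# different octet sequences; no equivalence is applied.
precomposed = "\u00e5"   # a with ring
composite = "a\u030a"    # a followed by combining ring above
print(precomposed.encode("utf-8").hex(), composite.encode("utf-8").hex())
```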