JTC1/SC22/WG15 N675
                                                        WG15 RIN  SD-3
                                                        1996-Jun-25
ISO/IEC JTC1/SC22/WG15 RIN Issues List -:- FINAL

Source: WG15 and RIN

Status: Approved by the joint WG15 and RIN meeting of 20/23 May 1996

        Rationale:
        At the WG15 RIN meeting in Twente, 11-12 May 1995, it was
        decided to move the Agenda Items traditionally listed under
        3.1 to entries in this document, the WG15 RIN Issues List.  This
        was because the status and raison d'etre of these items had been
        obscured over time, and the debate on each item was being
        revisited at each meeting.
        A triumvirate of David Cannon (UK), Keld Simonsen (DK) and
        George Kriger (Ca) was charged with exhuming the argument and
        status of each item from past RIN and WG15 papers and minutes,
        and encapsulating them here.
        Strangely, RIN has been here before:
From WG15 RIN Stockholm, November 1991:
        Keld Simonsen suggested that the group should have an issues log.
        There was some discussion of the function of such a log, where it
        should appear, and of whether the group has any issues suitable
        for such a log.
From WG15 RIN Annapolis, October 1993:
        Canada proposes to remove a swathe of items under RIN Agenda
        item 3.1, and focus the agenda more closely on the papers
        submitted.  The UK and US agreed.  It was intended that items
        which were still relevant but had no immediate input should be
        moved to an issues list, which was to be visited and reviewed
        at each meeting.
        ...we made it, eventually...
        WG15, at its May 1996 meeting, debated the then current version
        of this document, and resolved that it be updated in line with
        that discussion.  The updated document would be preserved as a
        WG15 paper, and any outstanding issues would be copied into the
        WG15 Issues List to ensure they remained under review.
        Executive Summary:
                Closed: The Issue is closed in RIN - not necessarily
                        everywhere else.  MBs or WG15 may still regard
                        the Issue as active.  This is the RIN Issues
                        list - no-one else's.  Closed in RIN means that
                        RIN has no further legitimate interest in the
                        Issue.  WG15, at its discretion, may request that
                        RIN re-open it.
                Open:   The Issue is open in RIN - RIN regards the Issue
                        as receiving its active attention.  WG15 has
                        asked RIN to consider the Issue, and RIN has not
                        yet reached a conclusion on it.  Upon
                        conclusion RIN shall advise WG15 of its
                        recommendations.
From WG15 RIN Orlando, October 1995:
        RIN adopts the following process with regard to this document,
        SD-3:
                . Upon Closure of an Issue, RIN shall advise WG15 of the
                  status of the Issue by transferring to it the complete
                  section of this document, RIN SD-3, which concerns the
                  newly Closed Issue.
                . In order to advise WG15 of RIN's Open Issues which are
                  under active consideration, RIN shall advise WG15's May 
                  1996 meeting of those Issues by copying to WG15 a summary 
                  of those Open Issues. The summary shall consist of the 
                  'Title' of the Issue, together with the 'Keywords', 
                  'Description', 'Originator', 'Alternatives', 'Documents',
                  'Solution' and 'Status' sections from this document.

Index:

0. Extended Identifiers in 1003.2b                    [Closed]
1. localedef iswctype()                               [Closed] 
2. localedef user-specified collation weight names    [Open] 
3. localedef "substitute"                             [Closed] 
4. localedef "reorder-after"                          [Closed] 
5. removal of NUL special handling                    [Closed] 
6. full support for state-dependent charsets          [Closed] 
7. charmap-based charset conversion                   [Closed] 
8. "file" user-specified recognition algorithm        [Closed] 
9. "pax" extended character set support               [Closed] 
10. C MSE widechar support                            [Closed] 
11. Invariant ISO 646 support                         [Closed] 
12. charsymb/CHARIDS                                  [Closed] 
13. regexps                                           [Closed]
14. Canadian Collation Weight minimum levels          [Closed] 
15. Japanese proposal for LC_CTYPE extension          [Open] 
16. Character concepts in POSIX                       [Closed] 
17. Range expression                                  [Open]

0. Title: Extended Identifiers in 1003.2b [Closed]

Keywords:
                characterset, lex, awk, shell, scripts, small, language
Description:
                A proposal to permit a more extensive set of characters
                in the small languages supported by the POSIX Shell and
                Utilities standards.
Originator:
                WG20, DK
Alternatives:
                To remain with the status quo.
Documents:
    RIN N047    A representation for the shell in ISO 646
        N264    SC22/WG20 N085: Extended identifiers
        N283    SC22/WG15 liaison statement to WG20
        N294    P1003.2b D4 (Shell & Utilities Amd)
        N417    WG20 liaison report to WG15
        N420    Extended characterset in Posix identifiers
        N515    US Action Item Report
        N532    WG15 minutes and resolutions, Oct 1994
        AN12    WG20 current and intended work (WG20 N223)
Solution:
                WG15 and the US development body have accepted the
                proposal contained in N420.
Status:
                Issue in RIN is closed, the proposal in N420 having been
                accepted.  WG15 is requested to (re-)endorse N420.
History:
        N264 was the first relevant identifiable WG15 paper input on
        this subject.
From WG15 Hamilton, May 1992:
        The plenary considered N264 and prepared the following liaison
        statement to WG20 as WG15 N283:
        WG15 has reviewed WG20 document N085 entitled "Extended
        Identifiers", which encouraged discussion of its proposal, and
        offers the following comments:
        1)  The POSIX Shell and Utilities standard (DIS 9945-2) provides
            facilities for locale-dependent specifications of character
            attributes that optionally are adjustable by the user or
            application.
            WG15 recognises that allowing characters outside the POSIX
            portable character set is a feature that directly impacts
            portability, but it is a desirable localisation facility in
            some environments.
        2)  WG15 believes that any extensions to programming language
            identifier requirements should be accomplished within the
            framework described in 1) above.
        3)  9945-2 contains several "small languages", such as shell and
            awk, that WG15 intends to enhance in this area.  It believes
            that the proper approach would be to allow characters in
            classification "alpha" in the current locale wherever the
            current specifications allow alphabetics from the portable
            character set (equivalent to the ISO 646 repertoire).  (The
            "alpha" classification may include syllabic and ideographic
            characters, and is named "alpha" for historic reasons.)
            Because of differing requirements in the various languages,
            WG15 considers any additional degree of flexibility to be
            infeasible across all languages.
        WG15 plenary resolved to pass the above statement through its
        liaison to WG20:
        RESOLUTION 201. LIAISON STATEMENT TO WG20
        WG15 instructs its liaison to WG20 to transmit WG15 N283 as a
        WG15 liaison statement to WG20.
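        The approach in point 3) can be sketched in C.  The function
        below is a hypothetical illustration, not text from N283 or
        9945-2: it checks a shell-style name (a letter or underscore,
        then letters, digits or underscores), treating "letter" either
        as an alphabetic from the portable character set or as any
        character in the "alpha" class of the current locale, via the
        standard iswalpha().  Keeping digits to 0-9 in both modes is an
        assumption of the sketch.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Alphabetics of the POSIX portable character set (ASCII letters). */
static bool portable_alpha(wchar_t c)
{
    return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z');
}

static bool portable_digit(wchar_t c)
{
    return c >= L'0' && c <= L'9';
}

/* Hypothetical check: true if s is a letter or underscore followed by
   letters, digits or underscores.  With use_locale set, "letter" means
   any character in the current locale's "alpha" class. */
bool is_identifier(const wchar_t *s, bool use_locale)
{
    if (s == NULL || *s == L'\0')
        return false;
    for (size_t i = 0; s[i] != L'\0'; i++) {
        wchar_t c = s[i];
        bool letter = use_locale ? iswalpha((wint_t)c) != 0
                                 : portable_alpha(c);
        if (c == L'_' || letter)
            continue;
        if (i > 0 && portable_digit(c))
            continue;          /* digits allowed after the first char */
        return false;
    }
    return true;
}
```

        A program would call setlocale(LC_CTYPE, "") first so that the
        locale-dependent mode reflects the user's environment; names
        accepted only in that mode are, as the US response in N515 puts
        it, non-portable.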
From WG15 Annapolis, October 1993:
        4.4 Liaison statements & actions related thereto
                                        [N417, AN12, N420, N421, N422]
        N420 is intended to be an amendment to the POSIX 'small'
        languages.  It proposes an extended character set for lex, awk
        and shell scripts, and as such might break them as they are
        currently specified.
        Re N417 point 7:  Keld maintains that N420 is implied by areas
                        of work defined in AN12 (WG20 N223).  This is
                        the one which may break things.  KS suggests
                        that this is solved via the locales mechanism.
                        No action is required... ???
        22.41          additional utilities   {2b}    CD reg:
                                                        [N416, N420]
        Proposed action on the US to take these on board.  NL accepts
        the N420 proposal, but regards the N416 document as representing
        old technology superseded by ISO 10646.
        The original action was on DK to provide these papers as
        additional information to the US.  N416 and N420 will be passed
        to the US for comment.
        [The action item was carried forward to the May 1994 meeting]
From WG15 RIN Annapolis, October 1993:
        Resolution RIN 9310-04:  Internationalisation Concerns in 1003.2b
        WG15 RIN notes that the new Annex H to 9945-2 addresses the
        concerns of the international community, specifically of Japan
        and of Denmark.  9945-2 Annex H indicates that input is required
        from WG15 MBs on a number of specific issues and therefore WG15
        RIN requests an indication of the latest dates by which such
        input is required by the US development body, in order to
        maintain synchronisation of the ISO/IEC and IEEE work.
        ...after input from Arnie Powell it was decided to convert the
        Resolution on Annex H to an action item on the US RIN Rapporteur
        in order to achieve it in a more timely fashion.
From WG15 Tokyo, May 1994:
        9405-52 United States:  Review N416 and N420 and forward them to
        PASC for consideration.
From WG15 Vancouver, October 1994:
        The 9405-52 action was noted as Complete, the response being
        included in N515, the US Action Item Report:
        re N420...The languages specified by POSIX.2 specify behaviour
        when identifier names are chosen from the portable character
        set.  We have not found anything to preclude an implementation
        from recognising extended characters as part of an identifier.
        However, an application making use of those extensions would be
        non-portable.
From 9945-2:1993 Annex H.1
                7: 2.5 Locale
     (1)  Provisions should be made to allow characters beyond those in
          the portable character set in user-supplied identifiers for the
          shell, awk, bc, lex, make, and yacc.  A proposal has been made
          by Denmark to extend the locale definition to specify the set of
          identifier characters for all programming languages.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 Copenhagen, May 1996:
   |    Extended identifiers work in real compilers, but not for the
   |    small languages of lex, awk, etc.  WG15 does not support the use
   |    of extended identifiers in these POSIX small languages.  The .2b
   |    WG cannot see how to do this in a way which allows locales to
   |    drive the lexical analysers of these utilities 'on the fly'.
   |    Action on MBs to bring forward any technical means to solve this
   |    implementation problem.   The Issue remains closed.

1. Title: localedef iswctype() [Closed]

Keywords:
                locale, localedef, iswctype()
Description:
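                iswctype(c, p) determines whether the wide character c
                has the character property p, a value of type wctype_t
                obtained from wctype().  For example:
                        iswctype(c, wctype("lower"));
                where wctype("lower") returns the wctype_t value that
                describes the property "lower" in the current locale.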
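        A minimal C sketch of the iswctype()/wctype() pairing described
        above, using only the standard <wctype.h> interfaces; the helper
        name has_class() is hypothetical.  Because wctype() returns 0
        for a class name the current locale does not define, the sketch
        checks that value before calling iswctype().

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: nonzero if wide character c has the named
   property in the current LC_CTYPE, 0 if it does not or if the class
   name is not defined in this locale. */
int has_class(wchar_t c, const char *class_name)
{
    wctype_t prop = wctype(class_name);
    if (prop == 0)
        return 0;  /* class name unknown in this locale */
    return iswctype((wint_t)c, prop) != 0;
}
```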
Originator:
                J
Alternatives:
                None
Documents:
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N294    P1003.2b D4 (Shell & Utilities Amd)
        N531    IEEE P1003.2b D10: Shell & Utility Extensions
        N602    RIN N158: Japanese Action Item report to WG15, October '95
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                The issue is Closed.  The 1003.2b document includes
                appropriate support for iswctype().
Status:
                Closed
History:
        N245 was the first relevant identifiable paper on this subject:
From WG15 Stockholm, November 1991:
        The Japanese MB comments on CD 9945-2, quoted from N245, raise
        an objection [@ O o 4 <ITSCJ.4>] relating to "...additional
        character classes suitable for classes beyond the current ANSI/C
        and/or Latin based character classes.  The current draft says
        that such additional character classes may be supported by
        implementation, but which is implementation defined.
        "Action: As the ISO/C Multibyte Support Extension (MSE) is going
        to provide a new function iswctype(), some corresponding
        enhancement of LC_CTYPE description file should be considered so
        that 'user/implementation definable character classes' can be
        supported in the POSIX environments in the standard manner.
        "Japan will probably be able to cooperate with the POSIX.2
        developing member body (US - IEEE) on how to solve these issues."
        N281 contained the following disposition:
        We also believe that this functionality should be studied for
        inclusion in the POSIX.2b revision and the full international
        standard.  We are aware of efforts within X/Open to address this
        area and would like to take advantage of their developments.
        An action 9111-23 was devised to reformat the Japanese comments
        on 9945-2 to items in the WG15 Issues list.
From WG15 Hamilton, May 1992:
        At WG15 Hamilton, this was transformed into:
        9205-32: Japan to provide to the US Member Body proposals for
        areas identified in their 9945-2.2 comments #s 2, 3, 4, 10, 11,
        54, and 57 addressing resolution comments in N281.
From WG15 Reading, October 1992:
        Action 9205-32 was noted as Complete.  No document is cited, no
        action recommended.
        WG15 plenary considered N294, the P1003.2b Draft 4 document.
        This contained on Page 5 the following:
        2.5.2.1 LC_CTYPE Add the following keyword items between the
        items labeled blank and toupper:
        charclass   Define one or more locale-specific character class
                    names as strings separated by semicolons.  Each
                    named character class can then be defined
                    subsequently in the LC_CTYPE definition. ...
        charclass-name
                    Define characters to be classified as belonging to
                    the named locale-specific character class.  In the
                    POSIX Locale, the locale-specific named character
                    classes need not exist. ...
        This addition was adopted from XPG4 to satisfy the following
        requirement from ISO/IEC DIS 9945-2:1992 Annex H:
        (3) The LC_CTYPE (2.5.2.1) locale definition should be enhanced
        to allow user-specified additional character classes, similar in
        concept to the proposed C Standard {7} Multi-byte Support
        Extension (MSE) iswctype() function.
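        With a charclass keyword in LC_CTYPE, an application can query a
        locale-specific class by name through the same wctype() call
        used for the standard classes.  The sketch below counts the
        characters of a wide string that belong to a named class; the
        function name and the (size_t)-1 error convention are assumptions
        for illustration.  In the C locale only the standard classes
        exist, so a locale-specific name such as a hypothetical "kanji"
        class would be reported as undefined.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the characters of s in the class named
   class_name in the current locale's LC_CTYPE.  Returns (size_t)-1 if
   the locale defines no class of that name (wctype() returned 0). */
size_t count_in_class(const wchar_t *s, const char *class_name)
{
    wctype_t cls = wctype(class_name);
    if (cls == 0)
        return (size_t)-1;
    size_t count = 0;
    for (; *s != L'\0'; s++)
        if (iswctype((wint_t)*s, cls))
            count++;
    return count;
}
```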
From WG15 RIN Reading, October 1992:
        RIN considered N088, a proposal for an LC_CTYPE extension to
        support additional character mappings.  There is no record of
        further action on this document.
From WG15 Vancouver, October 1994:
        N531, Draft 10 of P1003.2b, was made available and contains only
        minor changes to references in the above section.
From WG15 Copenhagen, May 1996:
   |    Closed, 1003.2b already includes support for this.

2. Title: localedef user-specified collation weight names [Open]

Keywords:
                localedef, collation, weight, LC_COLLATE
Description:
                A mechanism for the specification of named collation
                weights in the LC_COLLATE section of locales,
                particularly to support non-Latin scripts for which
                several different sorting methods are in use.
Originator:
                J
Alternatives:
                None
Documents:
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N330    Japanese comments on Posix .2b/D4
    RIN N106    Japanese Proposal to POSIX 1003.2b
        N602    Japanese Action Item Report to WG15, October 1995
        N640r   US TAG N573, N587: AI 9510-14, Report on POSIX.2b Issues
Solution:
                None as yet.  The proposal has been accepted in
                principle.  The US development body has asked for
                specific wording to be supplied by Japan for inclusion
                in a revision to the standard.
Status:
                Open.  Awaiting input from the Japanese MB to 9945-2Amd2b.
History:

From WG15 Hamilton, May 1992:

        N245, the comments on CD 9945-2, and N281, the disposition of
        those comments, contained the Japanese MB objection <ITSCJ.30>
        relating to collation weight names; a similar later version
        (below) was recorded at the WG15 Reading meeting.  The proposed
        disposition of <ITSCJ.30> is contained in N281 as:
        We believe that this change, or something similar to accomplish
        the same objective, should be studied for inclusion in the
        POSIX.2b revision and the full international standard.
From WG15 Reading, October 1992:
        N330 contained the Japanese MB comments on POSIX.2b D4; they
        included:
        <ITSCJ.2b.9>  Sect 2.5.2.2.3  (LC_COLLATE)  PROPOSAL
        Problem:
        In most cases of ideographic characters, it is a requirement
        that a user be able to specify collation weights as he/she
        wants.  In case of Japanese characters (Kanji), for example,
        there are five possible collation weights for supporting
        Japanese SORT.  The five weights are On-yomi (psuedo-Chinese
        pronunciation), Kun-yomi (Japanese pronunciation, number of
        strokes, radical (components of Kanji), and Kanji character
        code.  There could be more weights.  The LC_COLLATE part of
        localedef specifications should allow a user to describe these
        weights and give names to the weights.  Any combinations of the
        defined weights should be able to be specified by the user at
        run-time.
        Proposal:
        LC_COLLATE extension for specifying weight name
        => 2.5.2.2.3 order start Keyword.  Add the following directive
        description and the Example.
            It is implementation defined whether the following optional
            directive shall be recognised.  If it is not supported, but
            is present in a localedef source, it shall be ignored.
            name    specifies the name of a collation weight by a
                    string.  An order of weights may be specified by
                    using the name at run time.
                    The syntax for the name directive shall be:
                    name="weight_name"
            Example:
                    order_start
                    forward,name="kunyomi";forward,name="radical"
            If an operand has a name directive, the definition of the
            primary, secondary, or subsequent weights for the collation
            element may be different from the order of operands to the
            order_start keyword.
        => 2.5.3.2 Locale Grammar.  Modify the opt_word description as
        follows:
            opt_word            : 'forward' | 'backward' | 'position'
                                  | 'name' '=' weight_name
            weight_name         : '"' char_list '"'
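        The proposed grammar can be illustrated with a small C parser
        for the operands of an order_start line.  This is a sketch under
        stated assumptions, not part of the Japanese proposal:
        parse_order_start() and struct weight_op are hypothetical names,
        MAX_NAME is an assumed length limit, words are split naively at
        ',' and ';' (so a weight name may not contain those characters),
        and the mutual exclusion of forward and backward is not checked.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 32  /* assumed limit on the length of a weight name */

/* One order_start operand: direction flags and optional weight name. */
struct weight_op {
    int backward;         /* 1 if "backward" was given, else forward */
    int position;         /* 1 if "position" was given */
    char name[MAX_NAME];  /* from name="...", or "" if absent */
};

/* Parses one opt_word of length len into *w: 'forward', 'backward',
   'position', or 'name' '=' '"' char_list '"'.  Returns 0 or -1. */
static int parse_word(const char *s, size_t len, struct weight_op *w)
{
    if (len == 7 && strncmp(s, "forward", 7) == 0) {
        w->backward = 0;
        return 0;
    }
    if (len == 8 && strncmp(s, "backward", 8) == 0) {
        w->backward = 1;
        return 0;
    }
    if (len == 8 && strncmp(s, "position", 8) == 0) {
        w->position = 1;
        return 0;
    }
    if (len >= 8 && strncmp(s, "name=\"", 6) == 0 && s[len - 1] == '"') {
        size_t n = len - 7;  /* characters between the quotes */
        if (n >= MAX_NAME)
            return -1;
        memcpy(w->name, s + 6, n);
        w->name[n] = '\0';
        return 0;
    }
    return -1;
}

/* Parses operands such as forward,name="kunyomi";forward,name="radical"
   Operands are separated by ';' and words within an operand by ','.
   Stores up to max operands in out[]; returns their number, or -1. */
int parse_order_start(const char *ops, struct weight_op out[], int max)
{
    int count = 0;
    const char *p = ops;
    for (;;) {
        if (count == max)
            return -1;
        struct weight_op *w = &out[count];
        memset(w, 0, sizeof *w);
        for (;;) {
            size_t len = strcspn(p, ",;");
            if (parse_word(p, len, w) != 0)
                return -1;
            p += len;
            if (*p != ',')
                break;
            p++;  /* next word of the same operand */
        }
        count++;
        if (*p == '\0')
            return count;
        p++;  /* skip ';' before the next operand */
    }
}
```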
        Rationale:
        User's requirements for character collation in Asia are diverse.
        Ideographic characters have several sorting rules, such as by
        pronunciation, strokes, etc., and combinations of these rules
        are used for their sorting.  Properties of a character such as
        pronunciation can be assigned as weights for a collating
        element.  However, no standard primary weight, secondary weight
        and so on exists for these weights (properties).  The weight name
        extension for LC_COLLATE allows the order of multiple weights to
        be defined at run time in an order different from the order of
        operands to the order_start keyword.  To make the different
        order effective, the weight names can be specified in the
        setting of the LC_COLLATE category.
                order_start forward,name="kunyomi";forward,name="radical"
        When a ja_JP.eucJP locale has the above definition in the
        LC_COLLATE part, the order of sorting rules can be specified as
        follows by using the weight names:
                LC_COLLATE = ja_JP.eucJP@weights=radical,kunyomi
        This means that the sort-rule "radical" is used as the primary
        weight and "kunyomi" is used as the secondary weight.
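        The run-time reordering can be sketched as a mapping from the
        @weights modifier to the declaration order on order_start.  The
        helpers below are hypothetical: weights_modifier() locates the
        text after "@weights=" in a locale name, and weights_to_order()
        turns "radical,kunyomi" into level indices, so that with the
        declaration above order[0] is 1 (radical) and order[1] is 0
        (kunyomi).

```c
#include <stddef.h>
#include <string.h>

/* Returns the text after "@weights=" in a locale name such as
   "ja_JP.eucJP@weights=radical,kunyomi", or NULL if there is none. */
const char *weights_modifier(const char *locale)
{
    const char *m = strstr(locale, "@weights=");
    return m != NULL ? m + strlen("@weights=") : NULL;
}

/* names[] holds the weight names in order_start declaration order.
   For each name in the comma-separated spec, stores its declaration
   index in order[], so order[0] selects the primary weight, order[1]
   the secondary, and so on.  Returns the number of levels, or -1 if a
   name is unknown or repeated, or there are more than max_levels. */
int weights_to_order(const char *const names[], int nnames,
                     const char *spec, int order[], int max_levels)
{
    int levels = 0;
    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");
        int found = -1;
        for (int i = 0; i < nnames; i++)
            if (strlen(names[i]) == len && strncmp(names[i], spec, len) == 0)
                found = i;
        if (found < 0 || levels == max_levels)
            return -1;
        for (int j = 0; j < levels; j++)
            if (order[j] == found)
                return -1;  /* the same weight named twice */
        order[levels++] = found;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return levels;
}
```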
From WG15 RIN Heidelberg, May 1993:
        3.1.3 User-specified collation weight names based upon phonetic,
        character-based (radical), or code-based ordering.  Dynamic
        control of collation based upon sort key.  The ability to switch
        pointers dynamically to bring collation tables into the correct
        sequence.
        Japanese delegation has submitted two written requests without
        supporting material.[?]  Next version would be submitted by June
        18, 1993.
From WG15 RIN Annapolis, October 1993:
        Action Item reports:
        The action list was lost.  The minutes of the previous meeting
        were scanned to recover as many action items as possible;  these
        were determined to be as follows:
        9305-01 Requirement for user-specified collation weights.
                MDR-02 contains the Japanese proposal on collation
                weights.  (Closed)
                MDR-02  ->    RIN N106: Japanese Proposal to POSIX 1003.2b
        3.1    I18N in POSIX.2b
        Specific actions were taken in Annex H to address Danish and
        Japanese concerns for the May 1993 Heidelberg meeting.  Japan
        needs feedback on the timeline for producing material for
        coordination with 1003.2b.  A resolution is to be produced asking
        for a timeline for national body contributions.  The rest of 3.1
        [including N106] was postponed to the next meeting, due to lack
        of knowledge of the current status of .2b and lack of input
        papers received in time.
        9310-09 Lead Rapporteur:  distribute documents N105, N106, N109
                and N113 to the RIN mailing list together with a cover
                note indicating that these documents will be discussed
                at the next WG15 RIN meeting, May 1994, and also
                indicating which agenda items will be touched by the
                documents.
From WG15 RIN Vancouver, October 1994:
        9405-05 Member Bodies to review N105 (Japanese comments on .1a),
        N106 (Japanese comments on .2b), N109 (SC22/WG20 guidelines for
        the use of extended identifiers in programming languages), N113
        (CEN standard for string ordering) for determination of
        appropriate action prior to Oct. Meeting 10/94:  OPEN:  Prof.
        Saito noted they are preparing a Japanese standard for character
        ordering.
        The above action item was carried through from May 1994 to the
        May 1995 meeting.
From WG15 RIN Twente, May 1995:
        3.1.3  localedef user-specified collation weight names--Japan
                making proposal for Annex H--removed to issues list
From 9945-2:1993 Annex H.1:
     (4)  The LC_COLLATE (2.5.2.2) locale definition should be enhanced to
          allow user-specified names for collation weights.  A proposal
          from Japan is expected in this area.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 RIN Orlando, October 1995:
        N158 [WG15 N602] includes new input to this item; Japan is still
        working on this item, and solutions to some of the problems are
        not yet obvious.  Japan needs discussion of its paper to help it
        go forward.
        [N602 includes the following:]
        LC_COLLATE extension for user-specified names of collation weights
        Title:  Japanese proposal to POSIX.2b on LC_COLLATE extension for
                user-specified names of collation weights
        Status: Japanese position
        Short description: 
                Japan proposes to extend LC_COLLATE locale definition in 
                POSIX.2b so that names can be assigned to collation 
                weights. This proposal is the response to the item (4) of 
                ISO/IEC 9945-2:1993 Annex H.1 in which a proposal from 
                Japan is expected.
        Text of contribution:
        [Note: The page numbers refer to the ones of P1003.2/D10.]
        Sect 2.5.2.2.3 (LC_COLLATE)  PROPOSAL.                  page 10:
        Problem:
        1. General Requirements
        In most cases of ideographic characters, it is a requirement that
        a user be able to specify the combination of collation weights as
        he/she wants. Japanese kanji characters, for example, have five
        (or more) typical collation weights to support Japanese SORT.
        The five weights are On-yomi (pseudo-Chinese pronunciation),
        Kun-yomi (Japanese pronunciation), Number of strokes, Radical
        (components of Kanji), and Kanji character code. There are many
        possible combinations of these weights and the requirements for
        them (number and order of weights) may change according to the
        type of data sorted, the purpose of sorting, user's preference,
        etc. Users (or applications) want to specify the method of
        sorting by specifying the primary weight and the secondary
        weight, and so on. Because no names are available for the
        combination of multiple weights, it is a reasonable requirement
        that users can use the name of each collation weight for
        specifying the method of collation. That is the way in which most
        sorting utilities existing in Japan are implemented.
        The concepts of each weight for kanji characters mentioned above
        are common knowledge for Japanese. However, there are no
        standards for the weights of Japanese kanji characters. So the
        detail of assigning weights can be slightly different among
        implementations depending on which information source
        (dictionary, etc.) is used for making the weights. It is
        difficult to handle such difference by using pre-defined sorting
        method. If each weight can be handled independently, it will be
        easier to manage.
        ISO 10646 (UCS) is now a standard. UCS can be used as a codeset
        for any locale whose character set it includes. Even if UCS
        can be used for many different countries, the requirements for
        sorting characters differ from country to country. The size of
        locale databases is a concern when using UCS. It is a
        requirement that the above kanji sorting requirements can be
        met without problems when UCS is used as the codeset.
        2. Problem in using current POSIX.2 standards specification
        The current locale model seems to assume a well-defined
        collation definition for each locale. However, this does not
        match the requirements for sorting ideographic characters. One
        view is that the current .2 specification does permit an
        implementation satisfying most (though not all) of the above
        requirements: the possible solution based on the existing
        standards specification is to produce a locale for every
        possible combination of weights and to name each such locale.
        Besides not being a complete solution, this approach seems
        impractical for the following reasons.
            a. Size of locale databases
            There are about 12,000 kanji characters defined in JIS standards
            (JIS X0208 + JIS X0212). Because each possible combination of
            available weights needs to have a database, the total size of
            locale databases containing such big number of characters cannot
            be ignored. (for examples, 12,000 characters x 20 databases) When
            a local for ISO 10646 code set is defined, the problem must be
            more serious.
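        The size concern can be quantified with a little arithmetic.
        Assuming one locale database per ordered choice of weights, and
        that any non-empty ordered subset of the n named weights may be
        used, the number of databases is the sum over k of n!/(n-k)!.
        For the five kanji weights this is 5 + 20 + 60 + 120 + 120 = 325;
        primary/secondary pairs alone already give 5 x 4 = 20.  A C
        sketch of the count, with hypothetical function names:

```c
/* Number of ordered selections of k weights out of n: n!/(n-k)!. */
unsigned long arrangements(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    unsigned long r = 1;
    for (unsigned i = 0; i < k; i++)
        r *= n - i;
    return r;
}

/* Number of distinct collation methods using 1..n of n named weights,
   i.e. the number of locale databases needed without named weights. */
unsigned long collation_methods(unsigned n)
{
    unsigned long total = 0;
    for (unsigned k = 1; k <= n; k++)
        total += arrangements(n, k);
    return total;
}
```

        Multiplied by about 12,000 kanji, the 325 databases would hold
        some 3,900,000 character entries.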
            b. Identification of each collation method
            "Onyomi", "Kunyomi", etc. are well-known names as methods of
            sorting kanji characters. However, the problem is that no names
            are available for the combinations of the primitive methods.
            Implementors need to invent new names for the methods (for
            example, onyomi_strokes_radical, kanji0102, etc.). The
            likelihood of a standard or de facto standard emerging for the
            names of these combinations is very low. Hence, this approach
            will not be portable.
            Considering these problems, without extending current
            specification of LC_COLLATE, standard collation API such as
            wcscoll can support only limited ways of collation for kanji
            data, for example JIS code values. In this situation,
            applications which handle character orderings (for example,
            database applications) cannot rely on locale databases to sort
            kanji data. Some applications will support several collating
            methods by having their own ordering databases. Some applications
            will simply neglect the various sorting requirements for Kanji.
        3. Overview of LC_COLLATE proposal
        By extending the LC_COLLATE specification, a single locale
        database can define multiple weights for kanji, each with its
        own name. It is envisioned that the order of multiple weights can
        be specified at run time in an order different from the order of
        operands to the order_start keyword. To make the different order
        effective, an extension of another part of the POSIX standards
        may be necessary. The weight names specified in the database
        should be referenced by a user or an application, and the
        behavior of the collation API needs to be modified according to
        the specified sorting method.
        The proposal for allowing users to specify collation methods is
        expected to work as follows.
            a. Define collation weights with names in LC_COLLATE
            Define collation weights with names in the locale database.
            EXAMPLE
             order_start forward,name="kunyomi";forward,name="radical"
             <char-1>   <kunyomi weight for char-1>;<radical weight for char-1>
             <char-2>   <kunyomi weight for char-2>;<radical weight for char-2>
                :
                :
             order_end
            b. Specify sorting methods
            There are two possible extensions for specifying the preferred
            collation. One is to introduce a new environment variable (b.1),
            and the other is to use LC_COLLATE (b.2).
            b.1 Set the environment variable COLLWEIGHTS to preferred
               collation combination using names defined in the locale database.
               EXAMPLE
                COLLWEIGHTS=radical,kunyomi
                (Primary weight=radical, Secondary weight=kunyomi)
            b.2 Alternatively, the existing LC_COLLATE environment variable
                can be used to specify the user's preference. The weight
                names are specified after the "@weights=" modifier.
               EXAMPLE
                LC_COLLATE=ja_JP.eucJP@weights=radical,kunyomi
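        A minimal sketch of how an implementation might recover the
        requested weight order from either form. It assumes that in the
        b.2 form the weight list follows "@weights=" and runs to the end of
        the value, and that a COLLWEIGHTS value (b.1) is the bare
        comma-separated list. The helper names weights_modifier() and
        split_weights() and the 32-byte name limit are hypothetical; neither
        method is standardized.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WEIGHT_NAME 32   /* hypothetical limit on one weight name */

/* Return a pointer to the weight list that follows "@weights=" in an
 * LC_COLLATE value (method b.2), or NULL if the modifier is absent. */
const char *weights_modifier(const char *lc_collate)
{
    static const char tag[] = "@weights=";
    const char *p = strstr(lc_collate, tag);
    return p ? p + (sizeof tag - 1) : NULL;
}

/* Split a comma-separated weight list (a COLLWEIGHTS value, method b.1,
 * or the result of weights_modifier()) into at most `max` names.
 * Blanks around each name are trimmed; over-long names are truncated.
 * Returns the number of names stored. */
int split_weights(const char *list, char names[][MAX_WEIGHT_NAME], int max)
{
    int n = 0;
    const char *p = list;

    while (*p != '\0' && n < max) {
        size_t seg = strcspn(p, ",");          /* bytes up to the next comma */
        const char *start = p;
        const char *end = p + seg;

        while (start < end && *start == ' ')   /* trim leading blanks */
            start++;
        while (end > start && end[-1] == ' ')  /* trim trailing blanks */
            end--;

        size_t len = (size_t)(end - start);
        if (len >= MAX_WEIGHT_NAME)
            len = MAX_WEIGHT_NAME - 1;
        memcpy(names[n], start, len);
        names[n][len] = '\0';
        n++;

        p += seg;
        if (*p == ',')                         /* step over the separator */
            p++;
    }
    return n;
}
```

        Because blanks around each name are trimmed, a list written with
        a space after the comma yields the same two names, radical and
        kunyomi, as one written without it.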
            c. Initialize collation data
                There are two possible extensions for setting the collation
                method at run time. One is to introduce a new API (c.1), and
                the other is to use setlocale() (c.2).
            c.1 A call to setweights() initializes the collation method
                from the setting of the COLLWEIGHTS environment variable.
                The setweights function can be used to change the method of
                collation at run time.
            c.2 A call to setlocale(LC_ALL, "") initializes the collation
                method from the setting of the COLLWEIGHTS (or LC_COLLATE)
                environment variable.  The setlocale function can be used
                to change the method of collation at run time.
            d. API behavior
               Collation APIs such as wcscoll work depending on the current
               setting of collation method.
        The details of the proposed extended use of environment
        variables and of initialization through the API have not yet been
        decided. The proposed extension to the locale definition file is
        described below; detailed proposals for the other parts are not
        ready yet.
        4. Proposal for POSIX.2b LC_COLLATE locale definition file
        Proposal: [LC_COLLATE extension for specifying weight name]
        The LC_COLLATE part of localedef specifications should allow a
        user to give names to the weights.
        => 2.5.2.2.3 order_start Keyword. Add the following directive
           description and the Example.
                It is implementation-defined whether the following optional
                directive is recognized. If it is not supported but is
                present in a localedef source, it shall be ignored.
                name    specifies the name of a collation weight by a string.
                        An order of weights may be specified by using the name
                        at run time.
                        The syntax for the name directive shall be:
                                "name = \"%s\"", <weight-name>
                Example:
                    order_start forward,name="kunyomi";forward,name="radical"
                If an operand has a name directive, the definition of the
                primary, secondary, or subsequent weights for the collation
                element may be different from the order of operands to the
                order_start keyword.
        => 2.5.3.2 Locale Grammar. Modify the opt_word description as follows:
                opt_word        : 'forward' | 'backward' | 'position'
                                | 'name' '=' weight_name
                                ;
                weight_name     : '"' char_list '"'
                                ;
        [Attachment : Example]
        Possible LC_COLLATE definition
        ==============================
        # Stroke
        collating-symbol <3stroke>
        collating-symbol <4stroke>
        collating-symbol <6stroke>
        collating-symbol <7stroke>
        collating-symbol <10stroke>
        # Onyomi
        collating-symbol <a>
        collating-symbol <i>
        collating-symbol <ka>
        collating-symbol <san>
        # Radical
        collating-symbol <ninben>
        collating-symbol <kuchi>
        collating-symbol <yama>
    
        order_start     forward,name="stroke";forward,name="onyomi";\
                        forward,name="radical";forward,name="JISnumber"
        <j1602>         <10stroke>;<a>;<kuchi>;<j1602>
        <j1643>         <6stroke>;<i>;<ninben>;<j1643>
        <j1644>         <7stroke>;<i>;<ninben>;<j1644>
        <j1829>         <4stroke>;<ka>;<ninben>;<j1829>
        <j2719>         <3stroke>;<san>;<yama>;<j2719>
    
        Changing the order by assigning values to LC_COLLATE (b.2 method)
        ====================================================
        LC_COLLATE=ja_JP.eucJP@weights=stroke,onyomi,radical,JISnumber
    
    
        Behavior of collation functions
        ===============================
    
        Output from weights=stroke,onyomi,radical,JISnumber (default)
                <j2719> < <j1829> < <j1643> < <j1644> < <j1602>
    
        Output from weights=radical,onyomi,stroke,JISnumber
                <j1643> < <j1644> < <j1829> < <j1602> < <j2719>
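        The two outputs above can be reproduced with a short sketch. It
        assumes that each collating-symbol's weight is its position in the
        order in which it is declared in the definition (so <3stroke> sorts
        before <4stroke>, and <ninben> before <kuchi> before <yama>), and it
        uses the JIS number itself as the JISnumber weight. The table layout
        and the names weight_index(), compare_kanji() and sort_kanji() are
        hypothetical illustrations, not part of the proposal.

```c
#include <string.h>

/* Indices of the four named weights, in the order they appear in order_start. */
enum { STROKE, ONYOMI, RADICAL, JISNUMBER, NWEIGHTS };

static const char *const weight_names[NWEIGHTS] = {
    "stroke", "onyomi", "radical", "JISnumber"
};

struct kanji {
    const char *name;     /* collating element, e.g. "j1602" */
    int w[NWEIGHTS];      /* one rank per named weight */
};

/* Ranks assume declaration order of the collating-symbols:
 * stroke 3,4,6,7,10 -> 0..4; onyomi a,i,ka,san -> 0..3;
 * radical ninben,kuchi,yama -> 0..2; JISnumber = the JIS number. */
static const struct kanji table[] = {
    { "j1602", { 4, 0, 1, 1602 } },   /* 10 strokes, a,   kuchi  */
    { "j1643", { 2, 1, 0, 1643 } },   /*  6 strokes, i,   ninben */
    { "j1644", { 3, 1, 0, 1644 } },   /*  7 strokes, i,   ninben */
    { "j1829", { 1, 2, 0, 1829 } },   /*  4 strokes, ka,  ninben */
    { "j2719", { 0, 3, 2, 2719 } },   /*  3 strokes, san, yama   */
};

#define NKANJI (sizeof table / sizeof table[0])

/* Map a weight name (as given in name="...") to its index, or -1. */
int weight_index(const char *name)
{
    for (int k = 0; k < NWEIGHTS; k++)
        if (strcmp(weight_names[k], name) == 0)
            return k;
    return -1;
}

/* Compare level by level, using the run-time order of weights in `order`. */
int compare_kanji(const struct kanji *a, const struct kanji *b,
                  const int order[NWEIGHTS])
{
    for (int level = 0; level < NWEIGHTS; level++) {
        int k = order[level];
        if (a->w[k] != b->w[k])
            return a->w[k] < b->w[k] ? -1 : 1;
    }
    return 0;
}

/* Write the element names into out[0..NKANJI-1] in collation order.
 * Insertion sort is used because qsort() cannot take the order as context. */
void sort_kanji(const char *out[], const int order[NWEIGHTS])
{
    const struct kanji *v[NKANJI];

    for (size_t i = 0; i < NKANJI; i++)
        v[i] = &table[i];
    for (size_t i = 1; i < NKANJI; i++) {
        const struct kanji *key = v[i];
        size_t j = i;
        while (j > 0 && compare_kanji(v[j - 1], key, order) > 0) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
    for (size_t i = 0; i < NKANJI; i++)
        out[i] = v[i]->name;
}
```

        With the order {RADICAL, ONYOMI, STROKE, JISNUMBER} the sketch gives
        j1643, j1644, j1829, j1602, j2719: j1643 and j1644 tie on radical
        and onyomi and are separated only at the third level, by stroke
        count.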
From WG15 Copenhagen, May 1996:
   |     PASC WG has captured this issue and has emailed an awk script
   |     (in N640r) which solves the problem.  Japan would like to take
   |     the proposed solution back to Technical Experts to ensure it
   |     answers their concerns.  The US DB would like comments ASAP to
   |     ensure it hits the .2b ballot window.  Action on Denmark and
   |     Japan to ensure the script works for them.  The issue remains
   |     open - the US DB believes their solution will not be changed.

3. Title: localedef "substitute" [Closed]

Keywords:
                locale, localedef, substitute, LC_COLLATE
Description:
                The "substitute" statement in LC_COLLATE is needed for
                describing higher levels of Danish Standard DS 377
                sorting, and should be re-introduced.
Originator:
                DK
Alternatives:
                None identified.
Documents:
(WG15RIN.136)   substitute in LC_COLLATE
(WG15RIN.246)   substitute
        N170r   WG15 RIN N036: Minutes & resolutions, Rotterdam, May 1991
        N213    WG15 RIN N046: Japanese national profile for POSIX: Vn 1.2
        N215    WG15 RIN N051, N052: RIN Minutes and resolutions, November 1991
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N323r   WG15 RIN N096: Minutes & resolutions, Reading, October 1992
        N370    RIN N103: RIN Minutes from Heidelberg, 10-11 May 1993
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                Substitute is requested only by Denmark; other
                potentially interested MBs - Canada, Japan, US, UK and
                the Netherlands have indicated that they do not require
                the substitute feature.
                The consensus is that this support can best be provided
                at the application level; Denmark disagrees.
Status:
                The Issue has been revisited many times in RIN without
                consensus being reached.
                WG15 at its Copenhagen meeting resolved that 'substitute'
                is not required, and that the Issue is closed.
History:

From WG15 RIN Rotterdam, May 1991:

        N170r noted a debate on substitute:
        3.2.2.  localedef
        ...  A particular problem is the substitute command, and its use
        of regular expressions.  It has been suggested that string-for-
        string substitution would be adequate; however, the CSA -- and,
        by implication, most western -- collation standards cannot be
        met without regular expressions.  Given a rationale that regexps
        are not necessary for practical national collation sequences,
        Greger Leijonhufvud would be happy to drop them.  [Actions
        9105-08 and 9105-20 were devised to check if Japan and Canada
        needed 'substitute']
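        The string-for-string alternative mentioned above can be sketched
        as a preprocessing step: each string is rewritten with a table of
        substitutions, and the results are compared with strcmp(). No
        regular expressions are involved. The single rule below, which
        rewrites "aa" as the byte 0x7F so that it orders after "z" (in the
        spirit of the Danish practice of treating "aa" like "å"), is an
        illustrative assumption and is not taken from DS 377; the names
        substitute() and subst_compare() are hypothetical, and a real
        implementation would work on collation weights rather than bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string-for-string substitution rule. */
struct subst {
    const char *from;
    const char *to;
};

/* Illustrative rule only: rewrite "aa" as the byte 0x7F, which strcmp()
 * orders after 'z'.  Not taken from DS 377. */
static const struct subst rules[] = {
    { "aa", "\x7f" },
};

/* Copy src into dst (capacity cap bytes, cap > 0), replacing each occurrence
 * of a rule's `from` string, scanning left to right.  Output is truncated if
 * it would not fit. */
void substitute(const char *src, char *dst, size_t cap)
{
    size_t out = 0;

    while (*src != '\0' && out + 1 < cap) {
        int matched = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t flen = strlen(rules[r].from);
            size_t tlen = strlen(rules[r].to);
            if (strncmp(src, rules[r].from, flen) == 0 && out + tlen < cap) {
                memcpy(dst + out, rules[r].to, tlen);
                out += tlen;
                src += flen;
                matched = 1;
                break;
            }
        }
        if (!matched)
            dst[out++] = *src++;   /* no rule applies: copy one byte */
    }
    dst[out] = '\0';
}

/* Compare two strings after applying the substitutions to both. */
int subst_compare(const char *a, const char *b)
{
    char ta[256], tb[256];   /* inputs longer than 255 bytes are truncated */

    substitute(a, ta, sizeof ta);
    substitute(b, tb, sizeof tb);
    return strcmp(ta, tb);
}
```

        With this rule "aalborg" sorts after "zebra", although a plain
        strcmp() would place it first.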
From WG15 Stockholm, November 1991:
        RIN9105-8  Erik van der Poel: Determine whether substitute is
        necessary to implement Japanese collation.
        Closed.  The substitute operation is not required -- see RIN N046.
        RIN9105-20  Patric Dempster: Clarify, through discussion with Alain
        LaBonte, whether the CSA ordering standard requires the substitute
        operation.
        Closed.  The substitute operation is not required.
From WG15 Hamilton, May 1992:
        N245 included a number of Danish MB comments on the 2nd CD of
        9945-2.  Item 3 of the Danish comments was the request to
        re-introduce the "substitute" facility.
        N281, the Disposition of Comments, proposed the following:
        We believe that this change, or something similar to accomplish
        the same objective, should be studied for inclusion in the
        POSIX.2b revision and the full international standard.  It
        should be deferred because there currently exists no firm
        consensus on its necessity within the US or international
        communities.  An informative statement concerning future
        directions for 'substitute' will be included.
From WG15 RIN Reading, October 1992:
        (WG15RIN.246) substitute:
        From: keld@dkuug.dk
        Substitute specification in the LC_COLLATE section of localedef
        DS proposes to use the wording contained in ISO/IEC 9945-2 DIS
        annex G.
        3.1.4   12.  The use of 'substitute' in collation was suggested.
        A review of the history of this shows that this gives recursive
        definitions between the locale and regular expressions - which
        cannot in general be shown to be finite.  DIN 5007 and the
        Canadian standard on sorting do not use this, but the highest
        level of the Danish sorting standard (DS377) does.
                13.  The Danish national body is to produce a paper
        before the next meeting on its perceived need for the use of
        substitution in the collating order category of a locale
        vis-a-vis DS377 and in particular the level at which that
        appears to be necessary (RIN AI 9210-01)
From WG15 RIN Heidelberg, May 1993:
        2.0  Action Item Reports:  9210-01  Defer discussion [to 3.1.4]
        [The minutes do not record a paper responding to 9210-01]
        3.1.4  Canada has trouble with nested substitute routines, which
        allow no character control within the application.
From WG15 Twente, May 1995:
        Denmark: One thing has not been provided - text for "substitute"
        facility, from an old draft of .2.  Denmark believes that US has
        text in its archives.
From 9945-2:1993 Annex H.1:
                10: 2.5.2.2 LC_COLLATE
     (5)  The collation substitute facility, removed from 2.5.2.2 in an
          early draft, should be restored.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 RIN Orlando, October 1995:
        Denmark indicated that the problem was not a simple one, and
        that various other MBs would need it, if only they thought about
        it for a while.  HW said he would go back to check what was
        required by the Netherlands.
        9510-02 HW to check on requirement for 'substitute' by the Netherlands.
        DB said that while this was required in telephone book sorting in 
        Canadian English, this was an application issue, not an API one. 
        KS disagreed; there should be API support at this level to prevent 
        repetition of this functionality within multiple applications, with
        the possibility of them differing. DC indicated that the UK felt 
        this could be supported by means other than the API.
        RIN has identified no widespread need for the functionality. UK,
        US, Canada and Japan do not need it.  Netherlands are checking.
        The Issue is closed.
From WG15 Copenhagen, May 1996:
   |    The Netherlands reported that they saw no requirement for
   |    'substitute'.  WG15 maintained that the Issue remain closed.

4. Title: localedef "reorder-after" [Closed]

Keywords:
                locale, reorder-after, replace_after
Description:
                A mechanism for building on the collation sequence
                constructed for one locale by allowing the specification
                of a set or sets of differences in the construction of
                other, similar collation sequences for other locales.
Originator:
                DK
Alternatives:
                reorder_after was substituted for replace_after in
                mid-1992.
Documents:
    RIN N035    Proposal for building on other locales (replace_after)
    RIN N092    Danish note on reorder_after and replace_after
    RIN N127    Procedures for European Registration of Cultural Elements, CEN draft 5
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N391    DIS 9945-2 Disposition of Comments ballot
    RIN N154    RIN Minutes, Orlando, 26/27 October
        N640r   US TAG N573, N587: AI 9510-14, Report on POSIX.2b Issues
Solution:
                WG15 RIN resolved at its October 1992 Reading meeting
                NOT to proceed with either replace_after or
                reorder_after.
Status:
                The issue is closed, having been reopened since the
                Reading meeting.  RIN believes that there is no
                requirement for this functionality.
                WG15 is advised by RIN that 'reorder-after' is not
                required.
                NB:  The above Resolution and Status is disputed by
                Denmark, which believes that the functionality is
                required as specified in CEN ENV 1205.
History:

From WG15 Stockholm, November 1991:

                d.  RIN N035, Proposal for building on other locales
                    (replace after)
>>  Consensus in RIN was that functionality of "replace after" should
        be explored (Canada volunteered to do some prototyping)
>>  Denmark should include proposal as part of their ballot comments.
        COPY statement exists in .2.2 but may work on binary data
        only (e.g. contents of locale after compilation)
        Canada had no technical objections to exploring functionality
        but was concerned about the effect on existing consensus if a
        change is made at a late point in balloting, and potential
        effect on portability.
        Denmark's position is not final, but it is seeking consensus on the issue;
        if consensus is to explore inclusion in later extension of
        standard, that would be OK.  [This is in relation to the
        original 9945-2 standard]
From WG15 Hamilton, May 1992:
        N245 included Danish MB comments on CD 9945-2:
        9.  ...collating sequences vary a bit from country to country,
        but generally much of the collating sequence is the same.  For
        instance the Danish sequence is much the same as the German,
        English or French, but for about a dozen letters it differs.
        The same can be said for Swedish or Spanish; generally the
        collating sequence is the same, but a few characters are
        collated differently.
        With the advent of the quite general coded character set
        independent locales like the example Danish in POSIX.2 Draft 11
        annex F, it would be convenient if the few differences could be
        specified just as changes to an existing one.  This would also
        improve the overview of what the changes really are.  Therefore
        DS proposes the following.
        For the LC_COLLATE definition, a new command is allowed:
        replace_after <collating element>
        <collating-el1> ...
        <collating-el2> ...
        ...
        replace_after ...
        ...
        replace_end
        This construct is allowed also when a "copy" statement has been
        given.  More than one replace_after / replace_end construct can
        be given.
        The <collating-el1> ... are removed from the current collating
        sequence and inserted after <collating-element> in the collating
        sequence.
        For this to work the "copy" statement should be allowed to be
        used together with other statements in the LC_COLLATE section
        ...
        The replace-after proposal can be included in the Annex F, where
        its use is demonstrated.  Then the specification can be moved to
        the normative part of 9945-2 in a later issue.
        N281 contained the response to this proposal:
        We believe that this change, or something similar to accomplish
        the same objective, should be studied for inclusion in the
        POSIX.2b revision and the full international standard.  It
        should be deferred because there currently exists no firm
        consensus on its necessity within the US or international
        communities.
        The response goes on to indicate that the original concept of
        the "copy" statement was to duplicate an actual object
        description - the source text may not exist on the current
        system - and therefore replace-after would require the locale be
        'de-compiled'.
From WG15 RIN Reading, October 1992:
        RIN N092 renamed 'replace_...' to 'reorder_...' and proposed:
        The following section is inserted in the description of LC_COLLATE
        keywords in POSIX.2 D11.3 section 2.5.5.2.
        2.5.2.2.6 'reorder_after' keyword
        The 'reorder_after' keyword specifies a starting point for
        reordering collating elements. It is followed by one or more
        collation reorder statements, reassigning character
        collation weights to collating elements. The syntax is:
                 "reorder_after %s\n",<collating-symbol>
        2.5.2.2.7 Collation Reordering
        Each 'reorder_after' statement shall be followed by one or more
        collation element reordering entries. The definition of
        collation element reordering entries is equivalent to that of the
        collating element entries in 2.5.2.2.4, specifying collation
        elements and associated weights. The collating element reordering
        entries are terminated by a 'reorder_after' keyword or a
        'reorder_end' keyword.
        Each collation element specified via a collation element
        reordering entry is removed from the current collating sequence,
        if present, and inserted in the collating sequence after the
        previous reordering collation elements. The collating element
        specified on the previous 'reorder_after' statement specifies
        the first reordering collation element. The last reordering
        collation element is followed by the successor of the collation
        element specified on the 'reorder_after' statement.
        Example:
         order_start
         <collating-el1>
         <collating-el2>
         <collating-el3>
         <collating-el4>
         <collating-el5>
         order_end

         reorder_after <collating-el4>
         <collating-el1>
         <collating-el2>
         reorder_after <collating-el2>
         <collating-el6>
         reorder_end

        The resulting order is then:
         <collating-el3>
         <collating-el4>
         <collating-el1>
         <collating-el2>
         <collating-el6>
         <collating-el5>
        2.5.2.2.8 'reorder_end' keyword
        The collating reorder entries shall be terminated with a
        'reorder_end' keyword.
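        The reordering rule can be checked against the example with a
        small simulation. This sketch treats the collating sequence as an
        array of element names and applies each reorder_after block in
        turn: every listed element is removed (if present) and reinserted
        directly after the previously placed element, starting with the
        element named on the reorder_after line. The function
        reorder_after(), its helpers and the MAX_SEQ limit are hypothetical;
        weights are ignored and only the order is modelled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEQ 16   /* hypothetical capacity of the collating sequence */

/* Position of `name` in seq[0..n-1], or -1 if absent. */
static int find(const char *seq[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(seq[i], name) == 0)
            return i;
    return -1;
}

/* Delete seq[pos], shifting the later elements down by one. */
static void remove_at(const char *seq[], int *n, int pos)
{
    memmove(&seq[pos], &seq[pos + 1], (size_t)(*n - pos - 1) * sizeof seq[0]);
    (*n)--;
}

/* Insert name at seq[pos], shifting the later elements up by one. */
static void insert_at(const char *seq[], int *n, int pos, const char *name)
{
    memmove(&seq[pos + 1], &seq[pos], (size_t)(*n - pos) * sizeof seq[0]);
    seq[pos] = name;
    (*n)++;
}

/* Apply one reorder_after block.  Returns 0 on success, or -1 if the
 * anchor is missing or the sequence is full. */
int reorder_after(const char *seq[], int *n, const char *anchor,
                  const char *elems[], int count)
{
    const char *prev = anchor;

    for (int i = 0; i < count; i++) {
        int pos = find(seq, *n, elems[i]);
        if (pos >= 0)
            remove_at(seq, n, pos);          /* take it out of its old place */
        int at = find(seq, *n, prev);
        if (at < 0 || *n >= MAX_SEQ)
            return -1;                       /* unknown anchor or no room */
        insert_at(seq, n, at + 1, elems[i]); /* place it after `prev` */
        prev = elems[i];
    }
    return 0;
}
```

        Applying the two blocks of the example to el1 ... el5 gives el3,
        el4, el1, el2, el6, el5, the resulting order stated above; the new
        element el6 is simply inserted, since it had no earlier position.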
        WG15 RIN minuted the following:
        3.1.5   18.  Discussion of RTN014 [RIN N092] resulted in a
        decision not to proceed with either 'reorder_after' or
        'replace_after' mechanism in locale ordering.
                ...the debate was however pursued through both the
                Heidelberg and Annapolis meetings through a series of
                WG15 action items: 9205-31, 9210-10, 9305-06 - RIN
                needs to advise WG15 of its decision at Reading.
From WG15 Heidelberg, May 1993:
        5.2.1 (JTC1 22.21.02.01) Shell and Utilities base {2} DIS
        The DIS ballot on 9945-2 closes June 6, 1993.  Comments and
        negative ballots are expected.  Member Bodies are requested to
        send electronic copies of ballot comments to the Project Editor
        (hlj@posix.com).  The Project Editor will prepare a preliminary
        Disposition of comments and circulate this to WG15 in July,
        1993.  The US will host an Editor's Meeting in conjunction with
        the October, 1993 WG15 meeting (see open action items 9305-41
        and 9305-42).
        N391 presented the Disposition of Comments on DIS 9945-2: they
        included -
        5. Other.  The following comments will result in no changes to
        the IS, for the reasons indicated:   ...
        Denmark 4:  The concept of "binary" or "compiled" locales has
        been quite popular among implementors of the standard and no
        attempt has been made to mandate interfaces that would make such
        implementations non-conforming.  The "localedef copy" and
        "replace-after" modifications proposed here would make binary
        locales extremely difficult to support.  Furthermore, they are
        merely alternatives to existing, standard UNIX (tm) text-file
        manipulation tools.  Since these modifications have received
        little support in WG15/RIN after repeated discussions, and none
        from the US development body or any known implementors, they
        should not be required.
From 9945-2:1993 Annex H.1:
     (6)  A facility should be added to allow simple modifications to
          existing locale collation definitions.  A proposal for such a
          replace_after keyword in LC_COLLATE is being developed by
          Denmark.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 RIN Orlando, October 1995:
        Canada indicated that this functionality is not required.  KS
        indicated that this is a major building-block for WG20 work:  he
        went on to outline the mechanism for the proposal.
        HW proposed that the Issue be recorded as Closed.  DC pointed
        out that RIN (at Reading) had already closed the Issue.  The
        consensus was that the Issue is Closed.
        9510-03 Canada to check its view of the status of the 'reorder-after'
        Issue at the request of Denmark, and to report back to the next
        WG15 meeting.
        Add Issue .. re Dk concerns with the COPY statement - is this
        source or binary? (Ref Pp 18 of the [then] existing Issues list).
                Debate on revisiting the Issues list decided to remove
                the above as an Issue for the time being, reinstating it
                if the response from 9510-12 fails to resolve the problem:
        9510-12 US to request clarification on the COPY issue from the US
                development body dealing with .2b and report back to RIN.
        [DS currently (27-Oct-95) believes that COPY works at source
         level - the IEEE development group believes it works at binary
         level.  The COPY functionality may become the focus of a
         separate RIN Issue if the response is inconclusive.]
From WG15 Copenhagen, May 1996:
   |    WG15 N640r responds to this.  An awk script to give this
   |    functionality will be added to the rationale of 1003.2b.  The
   |    Issue remains closed.

5. Title: removal of NUL special handling [Closed]

Keywords:
                NUL, character, byte
Description:
                Clarification of the form of NUL, to address the
                problems of null bytes (an eight-bit sequence with all
                the bits set to zero) appearing in multibyte character
                strings and appearing to be string terminators to C
                language library routines.
Originator:
                DK, J
Alternatives:
                None
Documents:
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N294    P1003.2b D4 (Shell & Utilities Amd)
Solution:
                NUL: A character with all bits set to zero, which is
                defined as <NUL> in the character set description file.
Status:
                Closed.  The resolution was reached in 1992.
History:

From WG15 Hamilton, May 1992:

        N245 contained the Danish MB comments on 9945-2, including:
        11.  Page 78 line 2212-2213, 2215, page 55 line 1249-1250:
        We see no need for a specific encoding and collating order for a
        character NUL, and we request that this be removed.  The current
        specifications make the POSIX specification character-encoding
        dependent, and make unnecessary constraints on this character
        when collating.
        N281 contained the following disposition:
        This will be considered as part of the P1003.2b revision.  NUL
        is the only special character, and that is because it has a
        special meaning in POSIX: it cannot be included in text files,
        and it is used to delimit strings in C.  Its value is required
        by ISO/IEC 9899, on which most POSIX.2 implementations will be
        based.  Consequently, it IS special (see also regular
        expressions).  Most of the utilities using the collation
        definition process text strings; certainly neither
        strxfrm() nor strcoll() can handle nulls except as string
        terminators.  Making NUL the lowest character makes the
        end-of-string processing simpler and in line with the standard
        POSIX sorting rules (a shorter string sorts before a longer one).
        Also, a leading ellipsis does not work if NUL is not first.
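        The two properties of NUL relied on here, that it terminates C
        strings and that it compares lower than any other byte, can be seen
        with the standard C string functions. The sketch below shows that
        strcmp() reaches the terminating NUL of a prefix first, so the
        shorter string sorts first, and, as the flip side described for
        this Issue, that a byte string with zero bytes inside its characters
        appears empty to strlen(). Big-endian UCS-2 is chosen only for
        illustration; it is unsuitable as a C multibyte encoding for exactly
        this reason. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* strcmp() compares bytes as unsigned char and stops at the terminating
 * NUL.  When one string is a prefix of the other, the shorter string's NUL
 * (value 0, the lowest possible) meets a nonzero byte, so it sorts first. */
int shorter_prefix_sorts_first(void)
{
    return strcmp("abc", "abcd") < 0;
}

/* "AB" in big-endian UCS-2 is 00 41 00 42: every character contains a
 * zero byte, so the C string functions see an empty string. */
size_t ucs2_length_seen_by_strlen(void)
{
    static const char ucs2_be_ab[] = { 0x00, 0x41, 0x00, 0x42, 0x00, 0x00 };
    return strlen(ucs2_be_ab);   /* stops at the first zero byte: 0 */
}
```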
        N245 contained the Japanese MB comments on 9945-2, including:
        <ITSCJ.6> Sect 2.2.2.91 (NUL) OBJECTION.  page 37, line 647:
        Problem:
            "NUL: A character with all bits set to zero" is ambiguous,
            since by the POSIX definition "a character" means "a
            multibyte character" in general.
            It is unclear whether the phrase "with all bits .. zero" in this
            definition specifies a single-byte null character, a
            multibyte null character (in general), or both or neither
            (regardless of the number of bits).
        Action:
            If it implies a single-byte null character, change to:
            "NUL: a single byte character with all CHAR_BIT bits set to zero."
            If it specifies a unique null character regardless of the
            number of bits in the POSIX environment, change to:
            "NUL: A character with all bits set to zero, which is
            defined as <NUL> in the character set description file."
        N281 contained the response to this proposal:
        It is the second choice.  We added a forward pointer to 2.4 in
        2.2.2.91, where the requirements for NUL are already listed.
From WG15 Reading, October 1992:
        N294, the Shell & Utilities Amendment, Draft 4 contained the
        following entry:
        => 2.5.2.2.4 Collation Sequence.  Remove the following sentence
        from the second paragraph:
        The NUL character shall compare lower than any other character.
        Rationale:  This change partially satisfies the following
        requirement from ISO/IEC DIS 9945-2:1992 Annex H:
            (7) The specific encoding and collation requirements for the
                character NUL should be removed.
        The specific encoding was retained because the C Standard {7}
        requires it.
From WG15 RIN Reading, October 1992:
        3.1.6  19.  It was reported that the requirement for NUL to be
        handled separately had been dropped.  It was suggested that NUL
        would be defined as in ISO 6429:1988 for all possible character
        sets.  This is to be checked.
        921003 The Danish national body is to provide a proposal for a 
        definition of NUL to this group and to the US development body 
        for consideration at its January meeting (Minute 20).
From WG15 RIN Heidelberg, May 1993:
        The RIN Lead Rapporteur was unable to attend.  There was no
        input on the above action item.
From WG15 RIN Annapolis, October 1993:
        The action list was lost.  The minutes of the previous meeting
        [Heidelberg] were scanned to recover as many action items as
        possible.  The action item on NUL was not amongst them.
From WG15 Copenhagen, May 1996:
   |    Special handling of NUL was dropped in .2b; however, NUL itself
   |    was not dropped, because it had to be kept to allow the POSIX
   |    locale to be a superset of the C locale.  This is acceptable to
   |    Denmark.  The Issue is closed.

6. Title: full support for state-dependent charsets [Closed]

Keywords:
                charmap, character, encoding, shift-state, state-
                dependent, stateful
Description:
                A mechanism to allow otherwise-identical byte values to
                be interpreted as different characters by preceding them
                with implementation-defined escape sequences.  The escape
                sequence forces a change of state, and thus a different
                interpretation of:
                    . a subsequent byte  (single-shift encoding)  or
                    . subsequent bytes  (locking-shift encoding).
                In the latter case, a further escape sequence is
                necessary to force further state-changes.
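                Because a locking shift stays in effect until the next
                escape, the meaning of a byte depends on everything before
                it in the string. A minimal sketch, assuming the two-state
                SO/SI encoding used in the ITSCJ.57 comment quoted in this
                Issue's history (SO = 0x0e, SI = 0x0f), finds the state in
                effect at a given offset by scanning from the start.
                Single-shift encodings are not modelled, and the names
                state_at() and enum shift_state are hypothetical.

```c
#include <stddef.h>

#define SO 0x0e   /* Shift Out: switch to the other shift state */
#define SI 0x0f   /* Shift In: return to the initial shift state */

enum shift_state { INITIAL_STATE, SHIFTED_STATE };

/* Locking-shift decoding: return the shift state in effect at byte
 * `offset` by scanning bytes[0..offset-1] from the start.  The state
 * persists until the next SO or SI, so it cannot be read off the byte
 * at `offset` alone. */
enum shift_state state_at(const unsigned char *bytes, size_t offset)
{
    enum shift_state st = INITIAL_STATE;

    for (size_t i = 0; i < offset; i++) {
        if (bytes[i] == SO)
            st = SHIFTED_STATE;
        else if (bytes[i] == SI)
            st = INITIAL_STATE;
    }
    return st;
}
```

                For the bytes { SO, 'A', 'B', 'C', 'X', 'Y', 'Z', 'U', SI,
                ' ' }, state_at() reports the shifted state at offset 4,
                where "XYZ" begins, and the initial state again at offset
                9. A bare byte offset, such as a regexec() pmatch value,
                does not carry this state.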
Originator:
                J
Alternatives:
                None
Documents:
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N330    Japanese comments on Posix .2b/D4
        N362    Japan action item report
        N365    US Action Item Report
        N436    Japanese action item response for October 1993
        N602    RIN N158: Japanese Action Item report to WG15, October 1995
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                Japan believes that, in view of more recent developments
                (the adoption of ISO/IEC 10646-1 and the imminent
                standardisation of UTF-8), POSIX has an alternative way
                to represent multi-script/multi-lingual text without
                using state-dependent encodings.
                Japan therefore proposes not to pursue support for
                state-dependent encodings; however, Denmark asked for the
                opportunity to offer new input to this Issue before May
                1996.
Status:
                Closed.  While Japan has decided not to pursue this
                approach, their decision was reached only a few days
                before the October '95 RIN meeting.  At that meeting
                Denmark requested the Issue be held open until the May
                meeting of WG15/RIN to allow time for additional input.
                RIN determined that the Issue will be closed in May 1996
                if no further input is received.
                No further input was received.  The Issue is closed.
History:

From WG15 Hamilton, May 1992:

        N245 and N281 (Disposition of comments on CD 9945-2 in N245)
        were considered by WG15 Hamilton.  They contained:
        <ITSCJ.57> Sect B.5 (regcomp() family)       OBJECTION.    page 788,
                                                               line 618:
        Problem:
        The functions regcomp() and regexec() should have wchar_t
        versions, for the following reasons:
        (1) To use the regcomp() and regexec() functions in a program which
        handles its internal character data as the wchar_t data type,
        for example a text editor, the program must do the following:
                1. convert the internal text data from a wchar_t
                   array to a char array.
                2. search for the pattern using regexec().
        The conversion must be done every time the program
        searches for a pattern, for each line.  This imposes too heavy an
        overhead on such programs, and it will make wchar_t-based
        programming too hard.  If wchar_t versions of the regcomp()/regexec()
        functions are provided, no wchar_t-to-char conversion is needed.
        (2) If regexec() is used on a system that uses a state-dependent
        encoding, the following problem can occur.
        When the REG_NOSUB flag is not set in the cflags argument to
        regcomp() and regexec() finds a match, regexec() returns the
        matched position in the pmatch argument.
        If a state-dependent encoding is used, this pmatch information
        may be useless, because it does not carry any shift-state
        information.
        For example, suppose we are using a state-dependent encoding
        that has two shift states, switches from the initial shift
        state to the other shift state with the SO (Shift Out) code,
        and returns from the other shift state to the initial shift
        state with the SI (Shift In) code.
        If the searched pattern is:
                #define SO      0x0e
                #define SI      0x0f
                char pattern[] = { SO, 'X', 'Y', 'Z', SI, '\0' };
        and the string is:
                char string[] = { SO, 'A', 'B', 'C', 'X', 'Y', 'Z', 'U', SI, '\0' };
        the regexec() function will return pmatch information which
        says:
                pmatch[0].rm_so = 4 (start of matched string)
                pmatch[0].rm_eo = 7 (offset just past the end of the match)
                pmatch[1].rm_so = -1
                pmatch[1].rm_eo = -1
        But in this case, a naive program will treat the matched
        string as
                { 'X', 'Y', 'Z' }
        in the INITIAL SHIFT STATE, not in the OTHER SHIFT STATE,
        because the returned string position information does not
        contain any state information.
        Action:
        Define wchar_t versions of the regcomp() and regexec() functions
        that take a (wchar_t *) string argument instead of (char *).
        Because a wchar_t string carries no state-dependent
        information, this problem does not arise.
        Such versions are also useful for programs that handle all
        character and string data as wchar_t instead of char.
        _______________________________________________________________
        RESOLUTION:
        We believe that this subject should be studied for inclusion in
        the POSIX.2b revision and the full international standard.  See
        resolution ITSCJ.3.
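        The difficulty in (2) can be made concrete.  Because rm_so and
        rm_eo are plain byte offsets, a caller working on a stateful
        string has to rescan it from the beginning to learn which shift
        state applies at the match.  A minimal C sketch, assuming the
        same two-state SO/SI encoding as the ITSCJ.57 example; the
        function and enum names are hypothetical, not part of POSIX:

```c
#include <stddef.h>

/* Shift codes from the ITSCJ.57 example: SO enters the other shift
   state, SI returns to the initial shift state. */
#define SO 0x0e
#define SI 0x0f

enum shift_state { INITIAL_STATE = 0, SHIFTED_STATE = 1 };

/* Return the shift state in effect at byte offset `off` of the
   NUL-terminated string `s`.  A pmatch offset such as rm_so carries
   no state of its own, so the string is rescanned from its start. */
enum shift_state state_at(const char *s, size_t off)
{
    enum shift_state st = INITIAL_STATE;

    for (size_t i = 0; i < off && s[i] != '\0'; i++) {
        if (s[i] == SO)
            st = SHIFTED_STATE;
        else if (s[i] == SI)
            st = INITIAL_STATE;
    }
    return st;
}
```

        With the example string, state_at(string, 4) reports
        SHIFTED_STATE, which is exactly the information a naive caller
        of regexec() loses; the cost is a linear rescan per match.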
From WG15 RIN Reading, October 1992:
        3.1  7. H Jesperson reported on the WG15 9945-2 ad hoc meetings
        in Utrecht as follows:-
            a.  State-dependent encoding was discussed and it was agreed
            that individual utility options should not handle the
            problem.
        3.1.7 21. A review of Uniforum and X/Open documents on state-
        dependent text encodings has led the Japanese C-language group
        to develop a minimal set of functions for their manipulation.
        Support for state-dependent encoding as a whole is agreed to be
        necessary, but exactly what needs to be included is left for
        later consideration and further discussion.


From WG15 Reading, October 1992:

        SC22/WG14 is working on an amendment to C; Derek Jones is the
        Project Editor.  It is also looking at locale specifications.
        Japan pointed out that concern has been voiced in RIN about
        "stateful" encoding.  The SC22/WG14 Multibyte Support Extension
        (MSE) will introduce this into the standard, and the issue should
        be reviewed carefully.  The MSE as proposed by Japan does not
        support stateful encoding; however, it is being changed to
        introduce 6 new functions to support it.  There could therefore
        be a mismatch between the POSIX and WG14 directions on stateful
        encoding.
        N330, Japanese MB comments on POSIX.2b Draft 4, contained three
        references to state-dependent encoding problems:
        <ITSCJ.2b.1> Sect 2.4.x (State-dependent encoding)  DISCUSSION.
        Discussion:
        [Background]
        ISO CD POSIX.2/D11.2 Ballot resolution on shift (state-dependent)
        encoding issues raised by ITSCJ (Japan) chose the option (c)
        among the following candidates:
                (a) State-dependent encoding is out of scope.
                (b) State-dependent encoding is allowed, but it is an
                    implementation-defined feature.
                (c) To support state-dependent encoding is one of the
                    issues, and it would be considered in the future draft.
        [Goal of POSIX.2b]
        ISO DIS POSIX.2/D12 Annex H says:
                (8) The support of state-dependent character encoding (*)
                    should be addressed fully.
                    [*: Original text of POSIX.2/D11 Annex H uses "state-
                        dependent character sets".  However, it is not an
                        appropriate expression.]
        [Current status of POSIX.2b/D4]
        As the first cut, it keeps space holders for
                (a) 2.4 Character Set section
                (b) 2.5 Locale section
                (c) 2.8 Regular Expression Notation section
                (d) 4-5 several utilities sections
        [What is required]
                (1) give a definition of "state-dependent encoding" or
                    "state-dependent encoded character set"
                (2) give a clear scope of POSIX(.2) on what kind of state-
                    dependent encodings shall/should/may be supported.
                (3) give specification on how to define a state-dependent
                    encoding in charmap file and/or locale
                (4) give specification on how to handle state-dependent
                    encodings (by what utilities/functions)
        <ITSCJ.2b.2> Sect Global (State-dependent encoding)  OBJECTION.
        Problem:
        State-dependent encoding features are generic over almost all the
        string/character handling functions and utilities.  For example,
        the following operations are very sensitive.  They have to keep
        track of "state" transition.
                - string/character search
                - substring/character manipulations (add/delete/modify/
                                                     insert/...)
        However, the current POSIX.2b/D4 singles out only a few utilities
        for enhanced state-dependent encoding support.  Since the
        Japanese Ballot Comments on POSIX.2/D11.2 in terms of state-dependent
        encoding issues may not cover all the utilities that would be affected
        by state-dependent support, POSIX.2b/D4 may mislead readers into
        believing that other utilities have no problems with state-dependent
        encoding support.
        Action:
        Instead of addressing state-dependent encoding support in each
        potential utility section (except for requirements specific to a
        particular utility), create a new subsection in Section 2 to
        describe global issues and generic requirements regarding
        state-dependent encoding support.
        In particular, list all the possible character/string processing
        operations that must be done carefully in state-dependent
        encoding environments, and specify the desired result of
        such operations.
        <ITSCJ.2b.3> Sect 2.4.x (state-dependent encoding)  DISCUSSION.
        Discussion:
        [ Support of State-dependent  Encoding ]
        Charmap cannot describe character sets encoded by stateful encoding
        schemes well because, in a stateful encoding, there is no one-to-
        one correspondence between octet values and characters: the same
        sequence of bytes represents different characters according to
        the state, which is changed by locking shift escape sequences.
        It is possible to write a charmap for such characters by placing
        a locking shift on both sides of each character:
                <locking shift><character><locking shift>
        where the second locking shift specifies the default state.
        Although this virtually makes a state-dependent coding
        stateless, it is not common practice, as it uses a lot of
        extra bytes.
        Single shift is an exception.  This form of shift is used to change
        the state temporarily for interpreting the character that
        immediately follows it.  In other words, every character in a
        character set invoked by a single shift has that single shift
        preceding it.  Therefore, in charmap, it can be treated as part
        of a multibyte character.  Unfortunately, single shifts are used
        far less than locking shifts.
        Besides their description in charmap, the support of state-dependent
        character sets poses the following problems:
        (1) In searching or comparing statefully encoded strings,
            byte-by-byte comparison does not always yield valid results,
            because locking shifts may be inserted at arbitrary character
            boundaries even when they are redundant.
        (2) In dividing, truncating or making substrings of statefully
            encoded strings, simply returning part of them can produce
            strange results, because the parts do not contain the
            preceding and/or following locking shifts.
        (3) Concatenated strings may have redundant locking shifts, which
            cause the comparison problem mentioned above.
        In order to alleviate these difficulties, an implementation that
        supports state-dependent character sets shall:
        (1) process statefully encoded strings as a concatenation of
            state-independent characters.
        (2) insert (if necessary) locking shifts at the beginning and at
            the end of a substring to retain correct state information
            when extracting substrings of a string.
        (3) eliminate redundant locking shifts whenever possible.
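        Requirements (1) to (3) can be illustrated together with a small
        C sketch.  It assumes a toy encoding in which every character is
        one byte and the only locking shifts are the one-byte SO/SI codes
        of the ITSCJ.57 example; real ISO/IEC 2022 shift sequences are
        longer.  The function name stateful_substr is hypothetical.  It
        treats the input as a sequence of (state, character) pairs,
        recovers the state at the start of the substring, and emits a
        locking shift only when the next character needs a different
        state than the output is in:

```c
#include <stddef.h>

#define SO 0x0e  /* locking shift into the other shift state */
#define SI 0x0f  /* locking shift back to the initial shift state */

/* Copy the characters of src[from..to) into dst as a self-contained
   string.  The state in effect at `from` is recovered by scanning src
   from its start (requirement 2); a shift is written only just before
   a character whose state differs from the last state written, so
   redundant shifts disappear (requirement 3); and the result always
   ends in the initial state.  Offsets must not point inside a shift
   sequence.  dst needs room for 2 * (to - from) + 2 bytes.
   Returns the number of bytes written, excluding the final NUL. */
size_t stateful_substr(const char *src, size_t from, size_t to, char *dst)
{
    int logical = 0;  /* state implied by the input (0 = initial) */
    int emitted = 0;  /* state the output is currently in */
    size_t n = 0;

    for (size_t i = 0; i < to && src[i] != '\0'; i++) {
        if (src[i] == SO) {
            logical = 1;
        } else if (src[i] == SI) {
            logical = 0;
        } else if (i >= from) {
            if (logical != emitted) {
                dst[n++] = logical ? SO : SI;
                emitted = logical;
            }
            dst[n++] = src[i];
        }
    }
    if (emitted)
        dst[n++] = SI;  /* close the substring in the initial state */
    dst[n] = '\0';
    return n;
}
```

        Calling it with from = 0 and to = strlen(src) normalizes a whole
        string into a canonical form, after which two strings can be
        compared byte by byte.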
        WG15 Plenary produced the following action items:
        9210-22: Member Bodies: Review WG15/N330 and provide feedback
        through their RIN rapporteurs.
        9210-23: Member Bodies: Bring the issues of stateful encoding
        within the new WG14 activities to the attention of their national
        experts, with special care given to issues that may conflict with
        9945-2.
From WG15 Heidelberg, May 1993:
        The 9210-22 action was noted as CLOSED: the referenced documents
        [N362, N365] (US and Japanese AI reports) contain no substantive
        argument.
        The 9210-23 action item was noted as Open and redesignated
        9305-10: the assignee was changed to Japan: see [N362, N365]
From WG15 Annapolis, October 1993:
        9305-10 was flagged as Complete at Annapolis.  N436, the
        Japanese MB report to WG15, included an attachment on State-
        Dependent Encoding Support in POSIX.2:
        RATIONALE:
        State-dependent encoding is widely used in Japan and other
        countries for data communication and data processing.  There are
        several examples:
            - When using terminals with a terminal server that does not
            allow 8-bit non-parity transmission, Japanese characters are
            transmitted to/from the terminal in a 7-bit stateful encoding.
            If the host is using an 8-bit non-stateful encoding, which is
            a very common situation, code conversion is done within the
            terminal driver.
            - For the Internet mail and news message transmission, 7-bit
            stateful encodings are used in Japan, Korea and Taiwan,
            because the underlying message transmission protocol, SMTP,
            does not allow 8-bit transmission (See RFC 821 and RFC 822).
            For detailed description of the encoding used in Japan, see
            RFC 1468.
            - On IBM-compatible mainframes using EBCDIC-based encodings,
            stateful encodings are used to process multibyte characters.
            This is true not only in Japan, but in Taiwan, Korea and
            mainland China.
        However, the current POSIX standards do not fully address the
        support of state-dependent encodings, as stated in the "2.4
        Character Set" section of POSIX.2 (page 61 in DIS 9945-2).
        So as not to prohibit implementing POSIX interfaces on systems
        that use state-dependent encodings, some description of state-
        dependent encoding is necessary.  Please note that our intention
        is not to mandate the support of state-dependent encodings on
        all POSIX-conforming systems, but just to allow state-dependent
        encodings as an optional feature.
        THE CURRENT DISCUSSIONS IN JAPAN:
        (charmap syntax extension)
        One proposal currently under discussion would extend the charmap
        syntax to allow the definition of state-dependent encodings.  It
        is still a very raw idea and has not been fully agreed, so some
        feasibility study is needed to complete the proposal.
        The idea is to introduce a "shift state declaration" syntax in
        the charmap file.  A shift state declaration declares the "shift
        sequence" (one or more bytes that indicate a change of shift
        state) used to switch into that shift state.  When a shift state
        declaration appears, the character set mapping definitions
        following it define characters in that shift state.
        The proposed syntax for shift state declaration is as follows:
            "<shift_state_%d> %s %s\n", <shift_num>, <shift_seq>, <comments>
        where:
            <shift_num> Indicates shift state number (0, 1, 2...).
                        <shift_state_0> shall be the initial shift state.

            <shift_seq> Indicates shift sequence.  The syntax of the shift
                        sequence is the same as that of the <encoding>
                        part of a character set mapping definition.
            <comments>  Indicates comments.
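        A reader for the proposed declaration line could look like the
        following C sketch.  It assumes the default charmap escape
        character (backslash) and the decimal (\dNNN), hexadecimal (\xHH)
        and octal (\NNN) byte forms of the <encoding> syntax.  The
        function names are hypothetical; the sample sequences ESC ( B and
        ESC $ B are the ASCII and JIS X 0208-1983 designations used by
        the RFC 1468 encoding.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Decode a charmap <encoding> such as "\x1b\x24\x42", "\d27" or
   "\033" into raw bytes.  Returns the byte count, or -1 on error. */
static int decode_encoding(const char *enc, unsigned char *out, int max)
{
    int n = 0;

    while (*enc != '\0') {
        const char *digits;
        char *end;
        int base;
        long v;

        if (*enc != '\\' || n >= max)
            return -1;                    /* each byte starts with '\' */
        enc++;
        if (*enc == 'x') {
            base = 16; digits = enc + 1;
        } else if (*enc == 'd') {
            base = 10; digits = enc + 1;
        } else {
            base = 8;  digits = enc;
        }
        if (!isxdigit((unsigned char)*digits))
            return -1;                    /* reject signs, blanks, empty */
        v = strtol(digits, &end, base);
        if (end == digits || v < 0 || v > 255)
            return -1;
        out[n++] = (unsigned char)v;
        enc = end;
    }
    return n;
}

/* Parse "<shift_state_N> <shift_seq> [comments]".  Stores N in *state
   and the shift-sequence bytes in seq; returns the sequence length, or
   -1 if the line is not a well-formed shift state declaration. */
int parse_shift_decl(const char *line, int *state,
                     unsigned char *seq, int max)
{
    char enc[64];

    if (sscanf(line, "<shift_state_%d> %63s", state, enc) != 2
        || *state < 0)
        return -1;
    return decode_encoding(enc, seq, max);
}
```

        A charmap reader would then switch its current shift state each
        time such a line appears, and tag the following mapping lines
        with that state.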
From WG15 RIN Orlando, October 1995:
        N602, the Japanese Action Item Report, offered the following:
        Input from Japan to POSIX.2b:
        It has been an action item assigned to Japan that Japan propose
        an extension of the charmap syntax for supporting state-dependent
        encodings.  When Japan raised the issue of state-dependent
        encoding support by ISO/IEC 9945-2, ISO/IEC 2022, which is a
        typical state-dependent encoding, was the only international
        standard code extension technique for including multiple scripts
        (multi-lingual text) in a character stream or character string.
        However, since ISO/IEC 10646-1 became available in 1993 and
        UTF-8 is now being standardized, users of POSIX standards have
        an alternative way to represent multi-script/multi-lingual text
        without using state-dependent encodings, and that is surely the
        way that POSIX standards will endorse.
        Therefore, Japan believes that the requirements for supporting
        state-dependent encodings on POSIX systems are now very small.
        It is difficult to get support from vendors and users for any
        proposed extension on this topic.
        Considering the above situation, Japan would propose not to
        pursue the extension for supporting state-dependent encodings.
        [Discussions in the RIN minutes [RIN N154] indicated:]
        Japan have indicated that they do not wish to pursue this Issue
        in very recent email.  Denmark indicated that it wishes to pick
        up the problem and attempt to resolve it.  UK has no objection.
        Canada proposed applying a time limit on holding the issue open
        - if no input is forthcoming within 6 months RIN will close the
        Issue; this was agreed.
        9510-04 Dk to supply an input paper to Issue 6 within 6 months 
        or the Issue will be closed.


From WG15 Copenhagen, May 1996:

   |    This issue is now Closed in RIN: no additional input has been
   |    received by RIN.  WG15 resolved to regard this issue as closed.

7. Title: charmap-based charset conversion [Closed]

Keywords:
                charmap, iconv, code-set, locale, character
Description:
                A coded character-set conversion technique based on the
                charmap mechanism, with a charmap- or locale-based
                fallback.
Originator:
                WG20, DK
Alternatives:

Documents:

    RIN N111    WG20 NP on Cultural Convention-Set Registry
    RIN N112    WG20: Subdivision for cultural convention specification standard
    RIN N113    CEN: Information Technology-European Multilingual Ordering
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
        N284    WG15 minutes, Hamilton, May 1992
        N294    P1003.2b D4 (Shell & Utilities Amd)
        N330    Japanese comments on Posix .2b/D4
        N444    CEN cultural elements registry
        N462    Ca: Proposal for inclusion of CHARIDS in next amd 9945-2
        N515    US Action Item Report.
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                The proposal has been accepted in principle by RIN and
                the development body.  Charmap-based conversion appears
                in 1003.2b Draft 11 but is not yet fully developed.
Status:
                Closed.
History:

From WG15 Stockholm, November 1991:

        New DS Issues:
            3.  Want command to convert between code sets based on charmaps
                Keld has indicated that DS has done this.  The DS solution,
                however, is not known to the other members of the small
                group.
>>  KS to submit proposal.  That proposal should be reviewed
                by RIN, with coordination with the IEEE working group, with
                the potential of being included in P1003.2b
                The ultimate solution should align, where possible, with
                technology of XPG4 iconv
        [I could find no record of an appropriately-titled document to
         either RIN or WG15 in response to this]
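        For reference, the XPG4 iconv technology is a C interface built
        around iconv_open(), iconv() and iconv_close().  A minimal sketch
        follows; the helper name convert is hypothetical, and codeset
        names such as "ISO-8859-1" and "UTF-8" are implementation
        defined (glibc accepts these, for example).  The second iconv()
        call, with a null input buffer, asks the converter to write any
        bytes needed to return the output to its initial shift state,
        which matters when the target encoding is stateful.

```c
#include <iconv.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from codeset `from` to codeset
   `to`, writing a NUL-terminated result into out[0..outsize).  Returns
   the number of output bytes, or (size_t)-1 on failure.  outsize must
   be at least 1. */
size_t convert(const char *to, const char *from,
               const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open(to, from);   /* note: tocode comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;              /* iconv() takes char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;        /* keep room for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)                /* flush to the initial state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}
```

        For example, converting the ISO 8859-1 byte 0xE6 (the Danish
        letter ae) to UTF-8 yields the two bytes 0xC3 0xA6.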
From WG15 RIN Stockholm, November 1991:
         4.11.  Interface routines for locale and charmap
         Keld Simonsen introduced Danish suggestions for interface routines for
         locales and charmaps, adding that it was related to work in progress
         within X/Open.  Donn Terry pointed out that, when a finished
         proposal was forthcoming, it should be accompanied by a
         statement justifying the requirement for such a facility.  Given such
         justification, the facility appeared to him to be suitable as a
         component of a revision to 9945-1.
From WG15 Hamilton, May 1992:
        N245 included Danish member body comments on 9945-2:
        5. We miss a utility that can convert files based on charmaps or
        locales.  The charmaps are the formal place to specify the
        character sets, and this information should be used also to
        convert files.  As heterogeneous environments (for example,
        world-wide networking) become more commonplace, and as some
        frequently used Danish letters occur in different positions in
        various character sets, there is much need for a specification
        for scripts and for user extensibility.  We intend to have a
        proposal ready for a
        later issue of 9945-2, and we see a place for this in a revised
        "tr" utility.  We would like a statement in 9945-2 that this is
        an area where work is to be done.
        N281 contained the response to this proposal:
        We have added a statement to the tr rationale.  Such a statement
        of future intentions is limited by ISO rules to a footnote or
        informative annex.
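        The idea behind charmap-based conversion is that two charmaps
        share symbolic character names, so a byte can be mapped to its
        name in the source charmap and from that name to a byte in the
        target charmap.  The C sketch below shows this for three Danish
        letters.  It is illustrative only: the tables are hand-written
        fragments rather than parsed charmap files, the symbolic names
        are assumptions, and bytes missing from the source table are
        simply copied.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a hand-written charmap fragment. */
struct charmap_entry {
    const char *name;      /* symbolic character name */
    unsigned char code;    /* single-byte encoding */
};

/* Fragments for three Danish letters.  The symbolic names are
   illustrative; the byte values are those of ISO 8859-1 and of
   IBM code page 865. */
static const struct charmap_entry latin1[] = {
    { "<ae>", 0xe6 }, { "<o/>", 0xf8 }, { "<aa>", 0xe5 },
};
static const struct charmap_entry cp865[] = {
    { "<ae>", 0x91 }, { "<o/>", 0x9b }, { "<aa>", 0x86 },
};
#define NENT(a) (sizeof (a) / sizeof (a)[0])

/* Convert len bytes from the `from` charmap to the `to` charmap by way
   of the shared symbolic names.  Bytes absent from `from` are copied
   unchanged (a simplification standing in for the portable character
   set).  Returns 0 on success, or -1 if a character has no encoding in
   `to`, which is where a charmap- or locale-based fallback would
   take over. */
int charmap_convert(const struct charmap_entry *from, size_t nfrom,
                    const struct charmap_entry *to, size_t nto,
                    const unsigned char *in, size_t len,
                    unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        const char *name = NULL;

        for (size_t j = 0; j < nfrom; j++)
            if (from[j].code == in[i]) {
                name = from[j].name;
                break;
            }
        if (name == NULL) {
            out[i] = in[i];
            continue;
        }

        size_t k;
        for (k = 0; k < nto; k++)
            if (strcmp(to[k].name, name) == 0)
                break;
        if (k == nto)
            return -1;
        out[i] = to[k].code;
    }
    return 0;
}
```

        Because only the names are shared, adding a character set means
        writing one more charmap, not one more conversion table per pair
        of character sets.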
From WG15 Reading, October 1992:
        WG15 Plenary considered the responses to the following action
        item from Stockholm:
        9205-09 Danish Member Body to prepare and submit a specific
        proposal regarding conversions between code sets (based on
        charmaps, or otherwise; proposal should give appropriate
        consideration to XPG iconv). (open action item 9111-20)
        Status: Done - proposal is included in P1003.2b.
        [I could find no appropriately-titled document to either RIN or
         WG15 describing the proposal]
        N294, the 1003.2b (Shell & Utilities Amd) Draft 4, was available
        at the meeting.  The draft included a new iconv utility to
        convert codesets.
        N330, the Japanese MB comments on N294, included a number of
        objections to the iconv section:
        <ITSCJ.2b.11> Sect 4.73.3 (iconv)  OBJECTION.       page 72, line 2022:
        Problem:
        [iconv command option]
        The description of the "-f fromcode" option says that "If the
        option-argument is the pathname of a readable file, iconv shall
        attempt to use it as a charmap file, as defined in 2.4.1."  These
        semantics may cause unexpected results depending on the current
        working directory, because if a file or a directory in the
        current directory happens to have the same name as "fromcode" (or
        "tocode"), iconv will treat it as a charmap file.  This behavior
        prevents users from giving files the same names as codesets.
        Because there are no standards for charmap file names, it will
        be impossible to use the iconv command in a portable manner.  I
        think there should be a means for users to specify explicitly
        that the "fromcode" and "tocode" arguments are charmap files.
        Action:
        There are three proposals for modifying the iconv
        specification.
        (1) The first proposal is to add a new option, "-c", to specify
        that the "fromcode" and "tocode" option-arguments are charmap file
        names.  If the "-c" option is not specified, iconv will treat the
        "fromcode" and "tocode" option-arguments as implementation-
        defined codeset names.
        Change the description of "-f fromcode" option (lines 2021-2028)
        to:
        -f fromcode
                  Identify the codeset of the input file.  Valid values
                  for fromcode are specified in the system documentation.
                  If this option is omitted, the codeset of the current
                  locale shall be used.
        and add the following option description after the line 2030:
        -c        Treat the fromcode and tocode option-arguments as the
                  names of charmap files.  If the option-arguments are
                  the pathnames of readable files, iconv shall attempt to
                  use them as charmap files, as defined in 2.4.1.  If the
                  readable file is not a valid charmap file, the results
                  are undefined.  If the option-argument is not the
                  pathname of a readable file, the results are
                  implementation defined.
        (2) The second proposal is to add a new set of options that
        specify charmap file names.  In this proposal, the "-f fromcode"
        option is always used to specify a codeset name.  To specify a
        charmap file, you must use the "-F fromcharmap" option.
        Change the description of "-f fromcode" option (lines 2021-2028)
        to:
        -f fromcode
                  Identify the codeset of the input file.  Valid values
                  for fromcode are specified in the system documentation.
                  If this option is omitted, the codeset of the current
                  locale shall be used.
        and add the following option description after the line 2030:
        -F fromcharmap
                  Identify the codeset of the input file.  If the option-
                  argument is the pathname of a readable file, iconv shall
                  attempt to use it as a charmap file, as defined in
                  2.4.1.  If the readable file is not a valid charmap
                  file, the results are undefined.  If the option-
                  argument is not the pathname of a readable file, the
                  results are implementation defined.  If this option is
                  omitted and -f fromcode option is not specified, the
                  codeset of the current locale shall be used.  If both
                  of the -F fromcharmap and the -f fromcode options are
                  specified, the results are undefined.
        -T tocharmap
                  Identify the codeset of the output file.  The semantics
                  are equivalent to the -F fromcharmap option.
        (3) The third proposal is to add a mechanism to identify whether
        the fromcode (or tocode) option-argument is a charmap filename.
        In the following description, if the fromcode or tocode
        option-argument contains a <slash> character, it will be used as
        a charmap file.
        Change the description of "-f fromcode" option (lines 2021-2028)
        to:
        -f fromcode
                  Identify the codeset of the input file.  If the option-
                  argument contains a <slash> character and is the
                  pathname of a readable file, iconv shall attempt to use
                  it as a charmap file, as defined in 2.4.1.  If the
                  readable file is not a valid charmap file, the results
                  are unspecified.  If the option-argument does not
                  contain a <slash> character, the results are
                  implementation defined.  If this option is omitted, the
                  codeset of the current locale shall be used.
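        Proposal (3) reduces to a simple test on the option-argument.
        The following C sketch assumes POSIX access() as the readability
        check; the enum and function names are hypothetical.  The
        proposal text does not say what happens when an argument contains
        a <slash> but is not a readable file, so this sketch simply
        reports that case separately.

```c
#include <string.h>
#include <unistd.h>

enum code_arg { CODESET_NAME, CHARMAP_FILE, UNREADABLE_PATH };

/* Classify an iconv -f/-t option-argument under proposal (3):
   only an argument containing a <slash> is ever treated as a
   charmap pathname, so a file in the current directory that happens
   to share a codeset's name is never picked up by accident. */
enum code_arg classify_code_arg(const char *arg)
{
    if (strchr(arg, '/') == NULL)
        return CODESET_NAME;     /* implementation-defined codeset name */
    if (access(arg, R_OK) == 0)
        return CHARMAP_FILE;     /* try to parse it as a charmap */
    return UNREADABLE_PATH;      /* case not covered by the proposal */
}
```

        Under this rule, a user who does want a charmap from the current
        directory writes a relative pathname with a slash, for example
        "-f ./mycharmap" (a hypothetical file name).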
        <ITSCJ.2b.12> Sect 4.73.5.3 (iconv)  OBJECTION. page 73, line 2058:
        Problem:
        [LC_CTYPE environment variable description of iconv command]
        In the description of "-t tocode" option of iconv command, it
        says that "The semantics are equivalent to the -f fromcode
        option." and the last sentence of "-f fromcode" says "If this
        option is omitted, the codeset of the current locale shall be
        used."  It means that if the "-f fromcode" option is specified
        and the "-t tocode" option is omitted, the codeset of the current
        locale is used as the output file's codeset.  This behavior
        should also be noted in the LC_CTYPE description.
        Action:
        Add the following sentence after the line 2058:
                  If -t tocode option is omitted, this variable shall
                  determine the codeset of the output file.
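        The default described in ITSCJ.2b.12 can be expressed with the
        nl_langinfo(CODESET) call from <langinfo.h>, which reports the
        codeset of the current LC_CTYPE locale.  A minimal C sketch; the
        function name is hypothetical, and the program is assumed to
        have called setlocale(LC_ALL, "") at start-up so that LC_CTYPE
        from the environment takes effect.

```c
#include <langinfo.h>
#include <locale.h>

/* Resolve the codeset for one side of the conversion: an explicit
   -f/-t option-argument wins; if the option was omitted (NULL), fall
   back to the codeset of the current locale as set by LC_CTYPE. */
const char *effective_codeset(const char *optarg)
{
    if (optarg != NULL)
        return optarg;
    return nl_langinfo(CODESET);
}
```

        With "-f fromcode" given and "-t tocode" omitted, the output
        codeset is then effective_codeset(NULL), which is exactly the
        behaviour the proposed LC_CTYPE sentence makes explicit.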
From WG15 RIN Annapolis, October 1993:
        Mapping locales on to the underlying character set is
        problematic.  There is the charmap approach, but there are
        misgivings that this is inelegant at best and inefficient in the
        case of large character sets, such as used by the Japanese.
        9310-07 MBs are asked to consider the impact and problems
        associated with the support of locales by the charmap mechanism,
        and to consider the need for the establishment of a charmap
        registry.  Responses to RIN Lead Rapporteur prior to the WG15
        meeting, May 1994.
        9310-08 Lead Rapporteur to report to WG15 that RIN is considering
        the need and possible alternatives for charmaps.  RIN is looking
        for technical input on whether charmaps provide the best solution
        to the problem.  RIN notes that CEN is currently constructing a
        charmap registry, <MDR-12>, and that WG20 are also taking this
        approach - <MDR-10> and <MDR-11> refer.
                        MDR-10  ->  RIN N111
                        MDR-11  ->  RIN N112
                        MDR-12  ->  RIN N113
From WG15 Annapolis, October 1993:
        Plenary considered N515, the US action item report, which
        responded to AI 9405-56:
        9405-56 United States:  Forward N444 to PASC for possible
        inclusion 1003.2b and report back to WG15 on actions taken;
        reference WG15 resolution 94-283.               (Closed)
        CLOSED...The US has identified two proposals for change to
        9945-2 presented in N444.  The first of these is the Charsymbmap
        proposal described in section 6.9.  We believe this proposal to
        be essentially the same as the Canadian CHARIDS proposal
        contained in N462.  See the response to action item 9405-55.
        The second proposal is the "replace-after" proposal described in
        Annex A.  The US believes this extension to be unnecessary as
        demonstrated in Annex A.4 of the same document.
        Denmark had problems with the US responses here.  This was
        discussed in WG15 Plenary as follows:
        4.9.2   Charsymbmap (US report back on [N444])     [N515]
        Denmark believes they have consensus on this proposal now.
        Canada disagrees.  The US response in N515 to Action item 9405-56
        states that they believe the proposed extension to be
        unnecessary, the functionality being provided by the CHARID
        proposal - see above.  Germany noted that if CEN adopts the
        charsymbmap proposal then Europe would have two incompatible
        standards - Posix and charsymbmap.  Denmark suggested that the
        WG15 review of 1003.2b D10 should resolve any outstanding
        issues.  The Canadian (CHARID) solution addresses a smaller set
        of problems than the Danish (charsymbmap) proposal.  It may be
        possible to resolve any shortfall in CHARIDs by suitable
        proposals to enhance it from the European members.
From WG15 RIN Orlando, October 1995:
        KS reported from his discussions with the .2 group that this
        work was in process of being added to the draft.  The Issue is
        Closed.
From WG15 Copenhagen, May 1996:
   |    Accepted in principle.  WG15 awaits the 1003.2b group, which is
   |    working on an appropriate mechanism in iconv.

8. Title: "file" user-specified recognition algorithm [Closed]

Keywords:
                file, utility, locale, file-types, LC_CTYPE
Description:
                A proposal to extend the set of file types recognised by
                the "file" utility by adding a command-line parameter
                specifying a file containing descriptions of file types.
Originator:
                DK
Alternatives:
                None
Documents:
        N271    DK: Danish comments on 9945-2 Amd 1
        N282    Disposition of comments on CD 9945-2 Amd 1
Solution:
                This proposal was accepted and will be added in the
                final standard.
Status:
                Accepted and closed.
History:

From WG15 Stockholm, November 1991:

        N271 is the first relevant document on the subject of the "file"
        utility:
        Danish comments on 9945-2 Amd 1
        Sect 5.14 OBJECTION, page 163
        Problem:
        The specification of the FILE utility covers too small a subset
        of the functionality normally seen in implementations.
        a.  It should as a minimum be possible to extend the number of
            file types recognised, in a reliable (or unreliable) way.  We
            need something like the /etc/magic file-type specification.
        b.  It should be possible to test whether a file is of type text
            according to the LC_CTYPE class printable.
        Action:
        1.  Add a file-format specification.  Use /etc/magic if nothing
            better is available.  This could be an option like
                                                [-m file]
        2.  Add the ability to recognise (printable) text according to
            the locale.  This may also be done with an option like -t or
            with a separate utility.
        _______________________________________________________________
        RESOLUTION:
        1.  This will be considered for inclusion in POSIX.2b.
        2.  This will be added in the final standard.
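        The two requested capabilities can be sketched as follows.  This
        is a minimal Python illustration under stated assumptions, not
        the behaviour of any particular "file" implementation: the
        three-entry table stands in for a user-supplied magic file (the
        proposed [-m file] option), and Python's str.isprintable()
        stands in for the locale's LC_CTYPE print class, which a C
        implementation would consult with isprint() after setlocale().

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical, simplified magic entries: (offset, signature, description).
# Real /etc/magic entries are far richer (types, masks, continuation levels).
MagicEntry = Tuple[int, bytes, str]

DEFAULT_MAGIC: List[MagicEntry] = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"%PDF-", "PDF document"),
    (0, b"#!", "script text"),
]

def classify(data: bytes,
             magic: Iterable[MagicEntry] = DEFAULT_MAGIC,
             is_printable: Callable[[str], bool] = str.isprintable,
             encoding: str = "utf-8") -> str:
    """Try each magic entry first; otherwise report 'text' if every
    character passes the printable test (or is common whitespace)."""
    for offset, signature, description in magic:
        if data[offset:offset + len(signature)] == signature:
            return description
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return "data"
    if all(is_printable(ch) or ch in "\t\n\r" for ch in text):
        return "text"
    return "data"
```

        Passing a different magic table or printable predicate is the
        Python analogue of the two extension points the Danish comment
        asks for.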
From WG15 Copenhagen, May 1996:
   |    Accepted and closed in RIN and WG15.

9. Title: "pax" extended character set support [Closed]

Keywords:
                file, exchange, portable, format, character,
                character-set, transport
Description:
                A mechanism whereby the exchange format may accommodate
                the full set of characters in a portable way.
Originator:
                Ca
Alternatives:
                Status quo.
Documents:
(WG15RIN.185) pax -e comments
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N266    SC22/WG14 N197: Support for symbolic character names
        N281    Disposition of comments on CD 9945-2.2
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                The revision to pax in 1003.2b Draft 11 satisfies the
                requirement.  'pax' extended headers include support for
                ISO 10646.
Status:
                Closed
History:

From WG15 Stockholm, November 1991:

        The Danish MB's comments on CD 9945-2, quoted from N245 include:
        "7.  We want the text for 'pax -e' (in previous drafts) to be
        included, as we need a better quasi-portable way of transporting
        such files.  It may be included in Annex F.
        "It could be included in the normative part of the standard at a
        later stage, and we would like indications in the standard that
        an extended exchange format is being planned."
        The response, in N281, was:
        RESOLUTION:
        The text has been added to Annex G (the previous F).  Statements
        about future plans are already in the draft (See D11.2 page 551
        lines 9965-68 and page 558 lines 10251-65).
From the plenary:
        New DS Issues:
            1.  Want pax -e in -2.2
                Canada needs to have a meeting of their TAG to determine
                position.  Will meet in December and advise Hal.
        (WG15RIN.185) pax -e comments:
        Keld, here are the pax -e objection texts.    Hal
        The -e stuff is very complicated and there is a lack of
        standardized C language support to implement this feature.
        Trying to standardize this at this point is a mistake.  Why not
        place an optional record in one of the archive headers that
        states "this archive was created in the foobar locale" and leave
        it up to the recipient to handle the foobar locale. Even with -e
        the way it is stated, there is no guarantee that any locale but
        the portable one will be properly handled by recipients.

        ----------------------
        Problem:
        I stated this once before -- it deserves repeating: The creeping
        proliferation of charmap is getting out of control.
        The charmap started out as a simple and straightforward
        device to allow code-set-independent specifications of locale
        definitions. It is trying to take on a life of its own. It is
        this type of thing that causes those who do not have an
        appreciation for internationalization to oppose anything and
        everything having to do with internationalization and characters
        and character sets beyond ASCII.
        I am strongly opposed to the -e option of the pax utility and
        the introduction of charmap where it should not be.
        The introduction of the -e option and charmap to the pax utility
        only serves to reduce consensus on POSIX.2.
          Action:
          Delete "[-e charmap]" from lines 9614, 9615, and 9617.
          Delete lines 9694-9713.
          Delete lines 10140-10170.
        ----------------------
        Drop this whole mess.  It's too new, I don't think that it's
        well thought out in the context of the full problem.  The time
        to address this class of issue is when the new file format is
        addressed.  When the full file format is addressed, this can be
        done in concert with controlling the format and having the
        ability to represent both very long file names and to indicate
        the character set in use.  (The use of -e could cause distinct
        filenames to be truncated to the same name.)
        Asking for warnings when a name might not translate is OK with
        me.
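        The truncation concern in the last objection, and the warnings it
        would accept, can be illustrated with a small Python sketch.  The
        translation table is a hypothetical stand-in for a charmap-driven
        mapping and does not reproduce the proposed pax -e semantics:
        unmapped characters are replaced by a placeholder, and the
        function reports both lossy translations and distinct names that
        collapse to the same result.

```python
import string
from typing import Dict, List, Tuple

# Hypothetical target repertoire: letters, digits and "._-", each mapped to itself.
PORTABLE: Dict[str, str] = {c: c for c in string.ascii_letters + string.digits + "._-"}

def translate_names(names: List[str], table: Dict[str, str] = PORTABLE,
                    placeholder: str = "_") -> Tuple[Dict[str, str], List[str]]:
    """Translate each name through `table`, replacing unmapped characters
    with `placeholder`; return the translations and a list of warnings."""
    translated: Dict[str, str] = {}
    first_owner: Dict[str, str] = {}  # translated name -> first original name
    warnings: List[str] = []
    for name in names:
        out = "".join(table.get(ch, placeholder) for ch in name)
        if any(ch not in table for ch in name):
            warnings.append(f"{name!r}: some characters do not translate")
        if out in first_owner:
            warnings.append(f"{name!r} collides with {first_owner[out]!r} as {out!r}")
        else:
            first_owner[out] = name
        translated[name] = out
    return translated, warnings
```

        With the names "rødgrød.txt" and "r_dgr_d.txt", both translate to
        "r_dgr_d.txt": one warning flags the untranslatable characters and
        a second flags the clash between two distinct original names.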
From WG15 Hamilton, May 1992:
        Keld's Proposal (N266):
        Danish proposal adding two functions was discussed. One function
        takes a code point and returns the symbolic character name. The
        other function takes a symbolic character name and returns the
        code point.
        DK explained these would be used in the implementation of things
        like pax -e, and iconv().
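        The pair of functions described in N266 can be sketched as two
        lookups over one charmap-style table.  The Python below is
        illustrative only: the class and method names and the small
        ISO 8859-1 excerpt (including the RFC 1345-style mnemonic <o/>
        for LATIN SMALL LETTER O WITH STROKE) are assumptions, not the
        interface N266 proposed.

```python
from typing import Dict, Optional

# Hypothetical excerpt of an ISO 8859-1 charmap: symbolic name -> code point.
LATIN1_EXCERPT: Dict[str, int] = {
    "<space>": 0x20,
    "<slash>": 0x2F,
    "<zero>": 0x30,
    "<A>": 0x41,
    "<a>": 0x61,
    "<o/>": 0xF8,  # LATIN SMALL LETTER O WITH STROKE
}

class SymbolicNames:
    """Bidirectional lookup between symbolic character names and code points."""

    def __init__(self, charmap: Dict[str, int]) -> None:
        self._code_of: Dict[str, int] = dict(charmap)
        self._name_of: Dict[int, str] = {}
        for name, code in charmap.items():
            # If several names share a code point, keep the first one seen.
            self._name_of.setdefault(code, name)

    def name_of(self, code: int) -> Optional[str]:
        """Code point -> symbolic name (None if the charmap has no entry)."""
        return self._name_of.get(code)

    def code_of(self, name: str) -> Optional[int]:
        """Symbolic name -> code point (None if the name is unknown)."""
        return self._code_of.get(name)
```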
        Some discussions about the first record of the new pax format
        containing a character set name.
        There still needs to be a translation between code pages, which
        the symbolic name routines do not help with. Keld is concerned
        that industry groups are leaning towards the use of symbolic
        character names. Additionally, there are a number of Danish
        proposals in the pipeline which depend on this particular
        proposal.
        Donn Terry is still concerned with general portable applicability.
        Because the timing of iconv() and pax -e is still indeterminate,
        and these routines are being proposed solely to support them,
        it was felt to be too soon.
        9205-40 US Member Body: Forward the Danish proposal, N266 to IEEE
        POSIX.1 for their review.
From WG15 Reading, October 1992:
        This action was noted as Complete at the start of the WG15
        Reading meeting.
From WG15 RIN Orlando, October 1995:
        Canada indicated that it was satisfied with the text as
        presented in 1003.2b D11.  The meeting agreed that the Issue is
        Closed.
From WG15 Copenhagen, May 1996:
   |    Accepted and closed in RIN and WG15.

10. Title: C MSE widechar support [Closed]

Keywords:
                wide, char, character, MSE, multibyte, encoding
Description:
                POSIX interfaces should normatively reference the C MSE
                wide character support APIs.
Originator:
                J
Alternatives:

Documents:

    RIN N105    Japanese Comments on POSIX.1a (MSE)
    RIN N106    Japanese Proposal to POSIX 1003.2b
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                To normatively reference the amended 9899:1995 C
                standard in POSIX standards is a necessary but
                insufficient resolution of the problem.  WG15 at its
                Copenhagen meeting acknowledged the requirement and
                requested the US development body to include an
                acceptable solution in the next draft (12) of the
                1003.2b document, following expert advice.
Status:
                Closed.  WG15 and the IEEE development body accept the
                requirement.
History:

From WG15 Stockholm, November 1991:

            a.  SRTN8, Japanese concerns re CD 9945-2
        - Japan would like to make this document visible to other
        countries; it needs a document number, although Japan plans to
        expand the document and deliver a more detailed response before
        the end of the year.
        Japan needs to handle multiple character sets simultaneously, per
        ISO 2022; data files often contain various escape sequences which
        indicate which character set the following data is in.  Discussion
        of these requirements in relation to the nature of LC_CTYPE:
        - Hal indicated that he did not feel that LC_CTYPE would prevent
        interpretation of command line args consistent with Japanese
        needs
        - does item 3 on page 2 of the comments really deal with 9945-1
        features?
        Japan has difficulty dealing with wide char data with
        traditional Lib C; would like to see wide char handling
        capabilities in .2 utilities, both for functionality and as an
        example of wide char handling for programmers.  Japan is not
        sure whether it would be more appropriate to include wide char
        (ISO C/MSE) features in .1 or .2; .1a might be the appropriate
        place to include these extensions. (Although it might be
        feasible to include in the LIS spec, WG15 has told US body that
        LIS MUST be the same as the 1990 standard, thus no extensions
        could be included).
        Hal suggested that these comments be included in the Japanese
        ballot, so that they would be on record officially, and the US
        could deal with them as work on .1 AND .2 proceed.
        A Japanese "Yes" vote with this comment, creating a WG15 issue,
        would allow Hal to insist that extensions be included in .2b
        (and .1a)
        [N245 includes the Japanese MB comments on 9945-2, and details
        the Japanese MSE proposal].
From WG15 Reading, October 1992:
        SC22/WG14 is working on an amendment for C; Derek Jones is the
        Project Editor.  It is also looking at locale specifications.
        Japan pointed out that concern has been voiced in RIN about
        "stateful" encoding. The SC22/WG14 Multibyte Support Extension
        will introduce this into the standard. The issue should be
        reviewed carefully. The Japanese proposed MSE does not support
        stateful encoding; however, it is being changed to introduce 6
        new functions to support this. It is possible that there could
        be a mismatch between POSIX and WG14 directions on stateful
        encoding.
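        The gap between bytes and characters, and the role of escape
        sequences in a stateful ISO 2022 encoding, can be seen with
        Python's built-in iso2022_jp codec.  This is only an
        illustration, chosen as a convenient stand-in for the C MSE
        functions (such as mbrtowc()) that carry shift state in an
        mbstate_t object; it does not show those functions themselves.

```python
text = "日本語 text"  # three kanji, a space, four Latin letters
encoded = text.encode("iso2022_jp")

# The byte string carries shift state: ESC $ B switches to JIS X 0208,
# and ESC ( B switches back to ASCII before the space.
print(len(text), "characters,", len(encoded), "bytes")
print(encoded)

# Decoding must track that state to recover the original characters.
assert encoded.decode("iso2022_jp") == text
```

        A program that counts bytes, or that starts decoding in the middle
        of such a string without knowing the current shift state, gets the
        wrong answer; this is the class of problem the stateful-encoding
        concern refers to.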
        WG15 Reading produced the following resolution:
        RESOLUTION 92-223       9945 multibyte/wide character handling
        Whereas the current ISO/IEC 9945-1 (POSIX.1) does not support any
        APIs for multibyte/wide character handling that are defined by
        ISO/IEC 9899 (C Language), and
        Whereas the DIS 9945-2 (POSIX.2) does specify generic character
        handling features based upon a character definition that "a
        character means a sequence of one or more bytes representing a
        single symbol", and
        Whereas an amendment to ISO/IEC 9899 is scheduled in 1993, in
        which Multibyte character Support Extensions (MSE) are proposed
        to provide a set of functions for multibyte/wide character handling,
        aiming at improvement of worldwide portability of C programs that
        need generic character handling capabilities, and
        Whereas the CD 9945-2 Ballot Dispositions and the POSIX.2b Draft
        has indicated that certain extensions will be needed in conjunction
        with the proposed ISO C MSE and its derivatives in the POSIX
        environment, and an API part of which should be included in a
        future amendment to 9945-1,
        Therefore, SC22/WG15 requests that the US:
        1.      Consider the LIS and language-binding interface changes
                necessary to handle character-oriented features as symbol
                and not storage patterns for a future revision of 9945-1.
        2.      Inform SC22/WG15 of any plans for supporting such features
                in future revisions of all parts of the 9945 Standard.
From WG15 RIN Annapolis, October 1993:
        RIN considered two papers submitted by Japan, touching on the
        MSE issue - N105, N106
From WG15 Annapolis, October 1993:
        22.39          Extensions to base     {1a}   na
        Japan will be proposing the inclusion of the 'C' MSE amendment
        in the Posix series of standards.  This is still under
        discussion in WG14.  Flags have been raised within RIN that this
        will happen.
From WG15 RIN Twente, May 1995:
        3.1.11  C MSE widechar support --Japan will make a proposal--open
From WG15 RIN Orlando, October 1995:
        This Issue was originated by Japan.  The C MSE amendment is now
        a full international standard; it should be supported by 9945-1.
        9510-05 Japan to check if the reference to IS 9899:1995 would satisfy 
        their requirements for MSE support in 9945-1, and to report their 
        findings back to WG15.
        9510-06 KS to investigate the possibility of having the latest 
        versions of the 9945-1 standard reference the 9899:1995 C standard, 
        including the MSE addendum.


From WG15 Copenhagen, May 1996:

   |    Reference to the MSE C standard is not sufficient to resolve the
   |    problem.
   |
   |    Debate diverted to what the real problem was here, and whether
   |    it was better solved in the locale or the charmap regimes.  The
   |    US offered to take the problem back to the IEEE development
   |    body, and proposed closing the issue based on the understanding
   |    that Draft 12 of 1003.2b would include a resolution of the
   |    issue.  The requirement for the functionality is accepted by
   |    WG15.  The Issue is closed.

11. Title: Invariant ISO 646 support [Closed]

Keywords:
                ISO 646inv, shell, awk, 9945-2, ISO 10646
Description:
                A proposal to permit the shell and the other small
                languages specified by the POSIX Shell and Utilities
                standards to be written using only the invariant
                character set of ISO 646 (ISO 646inv).
Originator:
                DK
Alternatives:
                a)      No change
                b)      Support ISO 10646
Documents:
    RIN N047    A representation for the shell in ISO 646
        N323r   WG15 RIN N096: Minutes & resolutions, Reading, October 1992
        N416    Invariant ISO 646 support in Posix 9945-2
        N640r   US TAG N573, N587: AI 9510-14, Report on POSIX.2b Issues
Solution:
                RIN regards the issue as closed.  WG15 and the US
                development body also regard the proposal as being
                rejected.
Status:
                Issue in RIN is Closed: the issue is now between DK and
                WG15, which has invited DK to supply further
                documentation to support its proposal.
                DK has been in contact with the development body and
                has submitted a proposal in their ballot comments to
                the CD registration of 1003.2b D11, October 1995.
                WG15, following advice and debate, rejected the proposal
                at its Copenhagen meeting, May 1996.  The Issue is
                closed.
History:

From WG15 Stockholm, November 1991:

            c.  RIN SRTN7/N047, A representation for the shell in ISO 646
        Proposal from Denmark relates to a long-identified problem and an
        inconsistency with the recommendations of ISO TR10176 (programming
        languages should not use certain characters; note that TR10176 states
        that it may not be globally applicable, and seeks further input;
        9945-2 may be a case in point), but the Danish proposal should be
        expanded and clarified so that it:
            1)  addresses all aspects of proposed standard, rather
                than JUST the shell, (e.g. it should work with
                not only shell, but also regular expressions,
                awk, etc)
            2)  should allow use of all features of the proposed standard,
                maintaining conformance,  (e.g. currently proposed
                use of "--" would conflict with existing use)
            3)  should provide a general solution for similar requirements
                of other countries
            4)  should be sensitive to the cost/benefit ratio of imposing
                the solution in relation to existing implementations.
        The issue that the proposal addresses is the ability to use
        national characters within file names etc. without impact on
        shell interpretation (e.g. the Danish "slashed-O" occupies the
        same code position as the POSIX pipe symbol, so file names
        cannot include a slashed-O without the shell interpreting that
        character as a pipe).
        Presentation of national characters on displays and printers is
        a separate issue.
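        The slashed-O clash can be made concrete by decoding the same
        bytes twice: once as ASCII, and once under a table for the six
        national positions that the Danish ISO 646 variant is assumed
        here to redefine (Æ Ø Å at 0x5B-0x5D, æ ø å at 0x7B-0x7D).  The
        table is a hand-written assumption for illustration, not a
        registered codec.

```python
from typing import Dict

# Assumed national positions of the Danish ISO 646 variant; every other
# byte is read as ASCII.
DANISH_646: Dict[int, str] = {
    0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å",
    0x7B: "æ", 0x7C: "ø", 0x7D: "å",
}

def decode_646(data: bytes, national: Dict[int, str]) -> str:
    """Decode 7-bit data, substituting the national characters."""
    return "".join(national.get(b, chr(b)) for b in data)

line = b"cat br\x7cd"
as_ascii = line.decode("ascii")           # what a POSIX shell sees
as_danish = decode_646(line, DANISH_646)  # what a Danish user typed
print(as_ascii)   # cat br|d  : a pipeline into a command named "d"
print(as_danish)  # cat brød  : a single file name
```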
From WG15 RIN Reading, October 1992:
        3.1.15  28. The Danish draft on invariant ISO 646 is seen as a
        rehash of the original trigraph proposals, recast as digraphs.
        This should be approved by WG14 [!] before this issue may be
        re-opened in this group.  Closed pending such approval.
From WG15 Heidelberg, May 1993:
        9305-04 Denmark: Expand and clarify proposal contained in RIN
        N047 regarding usage of national characters (as defined in ISO
        646 national positions), giving consideration that such proposal:
        1) addresses all aspects of proposed standard, rather than JUST
        the shell, (e.g. it should work with not only shell, but also
        regular expressions, awk, etc.)
        2) should allow use of all features of the proposed standard
        maintaining conformance, (e.g. currently proposed use of "--"
        would conflict with existing use)
        3) should provide a general solution for similar requirements of
        other countries
        4) should be sensitive to the cost/benefit ratio of imposing
        the solution in relation to existing implementations
          (open action item 9111-25, 9205-11, 9210-4)
From WG15 Annapolis, October 1993:
        The above action was noted as closed.
From WG15 RIN Annapolis, October 1993:
        RIN AI 9305-05 Invariant ISO 646:  Input required from Denmark.
        This action was noted as (Open) going into the RIN meeting - but
        was not present in the list of actions at the end of the meeting,
        possibly due to the appearance in WG15 of:
        N416    Invariant ISO 646 support in Posix 9945-2
        22.41          additional utilities   {2b}    CD reg:
                                                        [N416, N420]
        Proposed action on the US to take these on board.  NL accepts the
        N420 proposal, but regards the N416 document as representing old
        technology superseded by ISO 10646.
        The original action was on DK to provide these papers as
        additional information to the US.  Done deal.  N416 and N420
        will be passed to the US for comment.
From WG15 Tokyo, May 1994:
        9405-52 United States:  Review N416 and N420 and forward them to
        PASC for consideration.
From WG15 Vancouver, October 1994:
        This action was flagged as (Closed) in the review of action
        items going into the WG15 Vancouver meeting; debate on the item
        was summarised as:
        5.2.3 22.41          additional utilities   {2b}     CD reg:
                                                        [N416,N420]
        Denmark is not happy with the response (not going to include
        extended character set support because it would reduce consensus)
        to its request and would like to enter into a dialogue with the
        IEEE group responsible.  Denmark is invited to offer further
        supportive argument.
From 9945-2:1993 Annex H.1:
     (2)  The shell, awk, other small languages, and regular expressions
          should be supported by national variants of ISO/IEC 646 {1}.  A
          proposal from Denmark is expected in this area.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 Copenhagen, May 1996:
   |    N640r responds to this at length.
   |
   |    The IEEE development body does not believe this proposal is
   |    useful - its incorporation would reduce consensus.  Adding this
   |    to RegExp support would comprehensively break it.  The extension
   |    in its effect on meta-characters in the small languages would
   |    introduce grammar inconsistencies which would be difficult to
   |    gain approval for.
   |
   |    WG15 regards the issue as closed.  Technical experts view the
   |    problem as insoluble in the POSIX small languages.  WG15 invites
   |    technical contributions which would indicate the problem is
   |    soluble, or has been solved.

12. Title: charsymb/CHARIDS [Closed]

Keywords:
                CHARIDS, charmap, locale, localedef, UCS, code-point,
                code-set
Description:
                A mechanism to enable the automated production of a
                charmap file through the addition of a reference to a
                code-point in ISO 10646 for each symbol in the CHARID
                file.
Originator:
                Ca, DK
Alternatives:
                charmap
Documents:
    RIN N127    Procedures for European Registration of Cultural Elements, CEN draft 5
        N316    Canadian contribution to SC22/WG20 - Short character names
        N462    Ca: Proposal for inclusion of CHARIDS in next amd 9945-2
        N515    US Action Item Report
        N554    Ca Action Item Report
        N555    US Action Item Report
        N558    RIN N150: DK Action Item Report
        N566    CEN/TC 304 N437: Procedures for the registration of cultural
                elements:  Draft 9
        N605    RIN N160: DS Additional comments on P1003.2b/D11
(SC22WG15.498)  Comments on WG15 Action Item 9410-24 (Canadian questions)
    RIN N154    RIN Minutes, Orlando, 26/27 October
Solution:
                The Issue is Closed.  Canadian and Danish inputs have
                been accepted into 1003.2b Draft 11 or later.
Status:
                Closed.
                The proposal to extend the charmap file to accommodate
                references to code points has been accepted.  The US
                development body is developing text in 1003.2b draft 12
                to address the requirement.
History:

From WG15 Tokyo, May 1994:

        Plenary considered N462, the Canadian MB contribution on CHARIDS:
        Introduction:  As defined in the current text of ISO/IEC 9945-2
        a locale definition file that uses mnemonic character naming
        cannot stand alone, but must be associated with a Charmap file
        that maps the mnemonic names to code points.  This mapping is
        necessarily dependent on the character set in use.
        Therefore any locale definition requires:
                - the locale definition file;
                - at least one CHARMAP file;
                - for each CHARMAP file, a statement of what character
                  set it corresponds to;
        Further there is no standardized machine-readable way of
        specifying the second and third items.  As a result it is not
        possible to write a locale definition that is independent of
        implementation.  ...
        Proposal:  We are in the process of defining a Canadian Locale
        and we need to make this definition both unambiguous and
        implementation independent.  We propose a "CHARIDS" file to
        address this deficiency.  We feel that this is an international
        requirement and should be included as a normative amendment to
        ISO/IEC 9945-2.
        The "CHARIDS" file would be very similar to CHARMAPS.  The only
        differences are that the file/header name is CHARIDS and that
        the character value operand is a reference to a code point in
        ISO 10646.  This permits any implementation, given a way of
        mapping ISO 10646 to the desired character set, to produce a
        corresponding CHARMAP file, without human intervention.  Note
        that the existence of a CHARIDS mechanism does not preclude the
        use of CHARMAP files as currently specified.  Document ISO/IEC
        JTC1 SC22/WG15 N316 outlines an approach based on ISO 10646 that
        we feel satisfies the CHARID requirement.
        The header and trailer would be as follows:
                CHARIDS
                END CHARIDS
        Between these two statements the symbol definitions would look
        like
                <symbol> <Uxxxx> "optional comment"
        where:
                <symbol> is a symbol representing a character and used
                         in the LOCALE definition:
                <Uxxxx>  would be U (standing for UCS) followed by the
                         hexadecimal coding value attributed to that
                         character in ISO/IEC 10646 (4 hexadecimal
                         digits);  mapping of UCS coding to the actual
                         code used by an environment would be
                         implemented by this particular environment's
                         designers/implementors/providers, based on this
                         standard reference.
        It should be noted that X/Open already uses this approach
        although it is not standardized.  Canada plans to use this
        syntax in its LOCALE definition.  ...
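        The CHARIDS layout above is simple enough that the automatic
        CHARMAP generation it is meant to enable can be sketched in a
        few lines of Python.  Assumptions: the function names are
        hypothetical, only the four-hex-digit <Uxxxx> form shown above
        is accepted, Python's codecs play the role of an
        implementation's mapping from ISO 10646 to the target character
        set, and encodings are written in the charmap hexadecimal \xhh
        form.  Characters the target set cannot represent are omitted.

```python
import re
from typing import Dict

# <symbol> <Uxxxx> ["optional comment"]
_CHARIDS_LINE = re.compile(r'^(<[^>\s]+>)\s+<U([0-9A-Fa-f]{4})>(?:\s+"[^"]*")?$')

def parse_charids(text: str) -> Dict[str, int]:
    """Return symbol -> ISO 10646 code point for the lines between
    CHARIDS and END CHARIDS."""
    table: Dict[str, int] = {}
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "CHARIDS":
            in_body = True
        elif line == "END CHARIDS":
            break
        elif in_body and line:
            match = _CHARIDS_LINE.match(line)
            if match is None:
                raise ValueError(f"malformed CHARIDS line: {raw!r}")
            table[match.group(1)] = int(match.group(2), 16)
    return table

def charids_to_charmap(table: Dict[str, int], codeset: str) -> str:
    """Produce CHARMAP text for `codeset` (a Python codec name)."""
    lines = ["CHARMAP"]
    for symbol, code_point in table.items():
        try:
            encoded = chr(code_point).encode(codeset)
        except UnicodeEncodeError:
            continue  # not representable in the target code set
        lines.append(symbol + " " + "".join(f"\\x{b:02x}" for b in encoded))
    lines.append("END CHARMAP")
    return "\n".join(lines)
```

        The same CHARIDS input yields "<o/> \xf8" for ISO 8859-1 and
        "<o/> \xc3\xb8" for UTF-8, with no human intervention, which is
        the property the Canadian proposal asks for.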
        The discussion took place at agenda point 6.6:
        6.6)    CHARID (Canada)                            Reference N462
        This is a better way of doing charmaps, based on Canada's
        experience in this area.  This document has been presented to
        WG20, which has accepted it.  The CEN registry and X/Open are
        aligned with this proposal.  Canada would like to give this to
        PASC for inclusion in 1003.2b.  Resolution forwarded to the
        drafting committee to forward this to PASC.  Action item 9405-55
        on the United States to forward N462 to PASC for inclusion in
        1003.2b and report back to WG15 on actions taken.
From WG15 RIN Vancouver, October 1994:
        3.1.13    Charsymb/CHARIDs (N119, N127)
        There was discussion over conflicting proposals (conflicting to
        a minor extent) presented by Mr. Kriger and Mr. Simonsen.  Mr.
        Kriger noted he believes the US-proposed changes will not be
        upwardly compatible.  Mr. Simonsen explained why they would.
        Mr. Hill noted that the US response to SC22/WG15 action item
        9405-55 is relevant, and that the US expects substantive
        discussion of this item to take place in SC22/WG15.
From WG15 Vancouver, October 1994:
        Action item 9405-55 was noted as complete.  The US AI report,
        N515 refers:
        CLOSED...The US believes the proposal is not complete since it
        does not provide any way to transform CHARIDS files into
        charmap files.  Therefore there still isn't a way to create
        portable locale definitions.  A couple of straightforward
        extensions to the localedef utility and the charmap files in
        9945-2 will provide a portable way to define locales.  We
        believe this is the intent of the Canadian proposal.
        The following list summarises changes the US proposes as an
        alternative solution to this problem:
        1. Expand the legal values for the RHS of the charmap file to
           include UCS2 and UCS4 values.  These values would be of the
           form <Uxxxx> and <Uxxxxxxxx>, respectively.
        2. Add a -u <code-set-name> option to localedef to indicate the
           target code-set to be used by the compiled locale.  If the -u
           option is given then all the values of the forms <Uxxxx> and
           <Uxxxxxxxx> will be translated from those UCS2 and UCS4
           values to corresponding code-points in the code-set specified
           by the -u option.
        3. That implementations have localedef predefine mappings for
           the standard symbolic names for characters in the character
           set defined by 9945-2 Section 2.4.
        The US believes that these changes would allow application
        writers to build portable charmap and locale source definition
        files that could be used on any implementation providing the
        9945-2 option that includes the localedef utility as long as the
        implementation recognised the target code-set for the compiled
        locale.
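        The US alternative moves the UCS values into the charmap itself
        and lets localedef translate them for a named target code set.
        A minimal Python sketch of items 1 to 3 follows; the function and
        table names are hypothetical, Python codec names stand in for -u
        code-set names, and the portable-set table is only a small
        excerpt of the portable character set, not the full table.

```python
import re
from typing import Dict, Optional

# Item 1: <Uxxxx> (UCS-2) or <Uxxxxxxxx> (UCS-4) as a charmap encoding value.
_UCS_VALUE = re.compile(r"^<U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})>$")

# Item 3: implicit UCS values for portable-character-set names (excerpt only).
PORTABLE_UCS: Dict[str, int] = {
    "<space>": 0x0020, "<slash>": 0x002F, "<zero>": 0x0030,
    "<A>": 0x0041, "<a>": 0x0061, "<vertical-line>": 0x007C,
}

def ucs_to_target(value: str, target_codeset: str) -> Optional[bytes]:
    """Item 2: translate a UCS-form value into bytes of the target code set;
    return None for values that are not in UCS form."""
    match = _UCS_VALUE.match(value)
    if match is None:
        return None  # an ordinary octal/decimal/hex encoding, handled elsewhere
    # chr() rejects values above 0x10FFFF, so such UCS-4 values raise ValueError.
    return chr(int(match.group(1), 16)).encode(target_codeset)

def implicit_encoding(symbol: str, target_codeset: str) -> Optional[bytes]:
    """Item 3: encoding for a portable-set symbol that the charmap omits."""
    code_point = PORTABLE_UCS.get(symbol)
    return None if code_point is None else chr(code_point).encode(target_codeset)
```

        Because the target is chosen at translation time, the same
        source yields b"A" for an ASCII target and b"\xc1" for an EBCDIC
        one such as Python's cp037 codec.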
        The US intends to flesh out this proposal for inclusion in the
        next distributed draft for IEEE ballot of P1003.2b.  The
        proposal was not received by the US in time for distribution to
        SC22/WG15 in Draft 10.  If you have any comments, the US would
        appreciate receiving them in time for discussion at our January
        IEEE PASC meetings.
        The WG15 Plenary discussion on this was as follows:
        4.9.1   Charid (US report back on [N462])          [N515]
        Canada raised a query on why the US response to 9405-55 in N515
        offered the changes it did, and what the rationale for them was.
        The US could offer no immediate explanation, and offered to get
        a more detailed response, to be distributed by email.  Canada to
        consider whether the changes have the effect required.  The US
        had brought a number of copies of Draft 10 of 1003.2b, currently
        being distributed through the SC22 secretariat, which they
        invited comments on from WG15 MBs, preferably direct to the IEEE
        group.
        WG15 AI 9410-24 was created to require the US to provide Canada
        with the rationale.
From WG15 Twente, May 1995:
        N555, the US report, included the following:
        9410-24 United States: Distribute to the WG15 Email list the
        details on its proposal on CHARIDS, (see action item 9405-55)
        and US Response (SC22/WG15 N515)
        Response:  CLOSED The resulting changes to P1003.2b will appear
        in Draft 11 of that document.  Draft 11 was being prepared at
        the 4/95 PASC Meeting and is already approved for distribution
        as CD/PDAM Registration and Ballot.  This was mailed to
        cpwg-mail@revcan.ca and SC22WG15 mailing list on 4/27/95.  [As
        (SC22WG15.498)]:
        IEEE P1003.2 N269        April 26, 1995
        SC22/WG15 US TAG N520
        Topic:    Response to SC22/WG15 Action Item A9410-24
        From:    Donald W. Cragun
        The questions submitted by Canada with our responses are below:
        1) a)    Could the US present the precise format of the proposed new
            charmap file?
            Draft 10.9 will be available from the US delegation at the
            Enschede meeting.  Draft 11 will be distributed for concurrent
            registration and ballot soon.
           b)    Specifically, could the US explain the relationship of the
            new proposed field to the portion of each line that is now
            considered "comment" or explanatory material?
            The proposal does not include a new field.  It just allows two
            additional forms for specifying the <encoding> part of the
            existing forms.
            The <comments> portion of the lines between CHARMAP and END
            CHARMAP are not changed.
           c)    Has the <comment_char> been used to delimit RHS comments (i.e.
            those comments that do not start at the beginning of the line)?
            Empty lines and lines starting with the <comment_char> are
            comments.  The <comments> field can contain any characters
            (within the context of a line in a text file).  Comments are
            separated from the <encoding> by one or more <blank> characters.
            A <comment_char> could be used after the required <blank> as a
            convention to make the charmap files easier to read by humans,
            but is not required by the current standard or the proposed
            changes.
        2) a)    Could the US explain the need for the addition of a new
            parameter to the localedef utility?
            The new option (-u code_set_name), specifies the name of a code
            set to be used as the target mapping of character symbols and
            collating element symbols whose encodings are defined in terms
            of ISO 10646 position constant values.
           b)    Would not a similar effect be achieved by manipulating the
            charmap with the standard text utilities and then using the
            existing localedef utility?
            None of the other standard utilities specified in 9945-2 (even
            the iconv utility in P1003.2b) is designed to translate from
            ISO 10646 16- or 32-bit values encoded as strings of the form
            <Uxxxx> or <Uxxxxxxxx> to octal, decimal, or hexadecimal
            encodings of the forms expected in charmap files by localedef.
            Scripts could be created using awk or sed to perform these
            translations manually, but the P1003.2 working group believes
            that implementations should be able to translate from 10646 to
            codesets supported by the implementation without manual
            assistance.
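        The translation that the US response argues localedef should
        perform can be sketched as follows.  This is a minimal Python
        illustration under stated assumptions, not the Draft 11 syntax:
        a charmap body line is taken to be <symbolic-name> <encoding>
        [comments], '#' is assumed to be the comment character, and the
        function names translate_encoding and translate_charmap_line are
        hypothetical.  Python's codecs stand in for the implementation's
        code set converters; the output uses the \xNN hexadecimal form
        of charmap encodings.

```python
import re

# ISO 10646 position constants in the forms <Uxxxx> and <Uxxxxxxxx>.
UCS_FORM = re.compile(r"<U([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})>")


def translate_encoding(field: str, codeset: str) -> str:
    """Rewrite a <Uxxxx>/<Uxxxxxxxx> field as \\xNN bytes in the target codeset.

    Any other field (e.g. one already in \\d or \\x form) is returned
    unchanged.  Raises UnicodeEncodeError if the codeset lacks the
    character, or ValueError if the position is outside Unicode.
    """
    m = UCS_FORM.fullmatch(field)
    if m is None:
        return field
    char = chr(int(m.group(1), 16))
    return "".join(f"\\x{b:02x}" for b in char.encode(codeset))


def translate_charmap_line(line: str, codeset: str) -> str:
    """Translate one CHARMAP body line: <symbolic-name> <encoding> [comments]."""
    if not line.strip() or line.startswith("#"):  # empty line or comment (assumed '#')
        return line
    parts = line.split(None, 2)
    if len(parts) < 2:
        return line
    parts[1] = translate_encoding(parts[1], codeset)
    return " ".join(parts)


print(translate_charmap_line("<e-acute> <U00E9> LATIN SMALL LETTER E WITH ACUTE", "latin-1"))
print(translate_charmap_line("<e-acute> <U00E9> LATIN SMALL LETTER E WITH ACUTE", "utf-8"))
```

        A codec name such as "latin-1" or "utf-8" plays the role of the
        code_set_name given to the proposed -u option.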
        3) a)    Could the US explain what is meant by "... have localedef
            predefine mappings for the standard symbolic names for
            characters in the character set defined by 9945-2 Section 2.4"?
            Canada is aware that 9945-2 specifies standard symbolic names
            for the characters referenced in Section 2.4.  Canada's
            question relates to the "... localedef predefine mappings
            ...".
            Since the 10646 encodings for all of the characters in Table
            2-4 in section 2.4 of 9945-2 are always the same, they need not
            be specified in charmap files that are encoded using the new
            formats; localedef will be required to supply the encoding
            information implicitly, using the <symbolic-name> values
            specified in Table 2-4.
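        The implicit supply of Table 2-4 encodings can be pictured as a
        merge of a built-in table into the charmap.  In this Python
        sketch, PORTABLE_UCS holds only a few illustrative entries, not
        the whole of Table 2-4, and complete_charmap is a hypothetical
        helper.  Letting an explicit charmap entry override the
        predefined one is an assumption; the response does not say what
        happens if both are present.

```python
# Illustrative excerpt only: a few portable-character-set symbolic names
# with their ISO 10646 positions.  Table 2-4 of 9945-2 defines the full set.
PORTABLE_UCS = {
    "<NUL>": 0x0000,
    "<newline>": 0x000A,
    "<space>": 0x0020,
    "<period>": 0x002E,
    "<slash>": 0x002F,
    "<zero>": 0x0030,
    "<A>": 0x0041,
    "<a>": 0x0061,
}


def complete_charmap(explicit: dict[str, int]) -> dict[str, int]:
    """Return the charmap with the predefined portable names supplied implicitly.

    Assumption: an entry given explicitly in the charmap file wins over
    the predefined one.
    """
    merged = dict(PORTABLE_UCS)
    merged.update(explicit)
    return merged


charmap = complete_charmap({"<e-acute>": 0x00E9})
print(hex(charmap["<space>"]), hex(charmap["<e-acute>"]))
```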
        N558, the Danish report, responded to 9410-35 as follows:
        9410-35 Member bodies: Look at the technical aspects of SC22/WG15
                N444 and the applicable portion of SC22/WG15 N515, [the
                US AI report] in time for the May 1995 SC22/WG15 meeting.
        DS:  ...
        1. specify a repertoire format, as earlier decided in WG15 and WG20
        3. specify repertoiremap files for locale and charmap with
           localedef, this is a further enhancement of the US recommendation
           2 of N515 9405-55 response,
        4. There is no need for proposal 1. in the US contribution
           if 1. and 3. above are specified.  This is also in line with
           current X/Open work.  The <Uxxxxx> information can still be
           provided, as a form of comments.
From WG15 RIN Orlando, October 1995:
        Canada expects that 1003.2b Draft 11 will resolve the Issue.
        Denmark expects that their concerns will be addressed in Draft
        12, following their discussions with the IEEE group.
From WG15 Copenhagen, May 1996:
   |    Closed.  WG15 accepts the proposal; the US development group is
   |    working on it in .2b draft 12.

13. Title: regexps [Closed]

Keywords:
                regular, expression, small language, NUL, special character
Description:
                Internationalisation of regular expressions.
Originator:
                DK
Alternatives:

Documents:

        N170r   WG15 RIN N036: Minutes & resolutions, Rotterdam, May 1991
        N245    Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
        N281    Disposition of comments on CD 9945-2.2
    RIN N154    RIN Minutes, Orlando, 26/27 October 1995
Solution:
                None.
Status:
                Closed.  Insufficient expertise currently exists to
                solve the problem within the IEEE, RIN, and possibly the
                known universe.
History:

From WG15 RIN Rotterdam, May 1991:

        3.2.1.3.  Regular expressions
        There was a serious error in the definition of longest leftmost
        match for regular expressions in the last draft of 1003.2. This
        will be fixed.
        The issue of when '$' and '^' are special in regular expressions
        is contentious.  Some want 'ab$cd' to be allowed ('$' not
        special); others want it to be illegal (as it is in extended
        regular expressions). Traditionalists counter by saying that
        this would break too many existing scripts, and will probably
        win the day.  RIN is happy with this situation.
        Applying a regexp to a sequence of characters containing an
        embedded null is currently permitted; there has been an
        objection to this, as current practice in the C language and in
        utilities written in it is that null is special.  This suggests
        that the issue is language-dependent: RIN favours putting
        language in the LIS which does not require null to be special,
        while allowing bindings to make it (or perhaps some other
        character) special if they wish.
        tr no longer knows about multi-character collating sequences,
        or, indeed, anything much relating to regular expressions.
From WG15 Hamilton, May 1992:
        N245 included a number of Danish MB comments on the 2nd CD of
        9945-2, including:
        13.  We are still not satisfied with the current regular
        expression syntax, but we have no better solution at present.
        N281, the Disposition of Comments, responded:
        No action proposed.
From 9945-2:1993 Annex H.1:
     (2)  The shell, awk, other small languages, and regular expressions
          should be supported by national variants of ISO/IEC 646 {1}.  A
          proposal from Denmark is expected in this area.
        This text has been removed from P1003.2b Draft 11, May 1995.
From WG15 RIN Orlando, October 1995:
        KS's discussions at IEEE last week indicate that a new PAR will
        be forthcoming to address internationalisation issues in regular
        expressions.  It is accepted by the development body that there
        are problems with the existing specification.  BN said that the
        US TAG looked at this in some detail.  It is the area which
        receives most interpretation requests.  The .2 group does not
        currently have sufficient expertise to handle the existing
        problems, together with known internationalisation problems.
        It is not anticipated that these problems can be solved in the
        current work on .2b.
     9510-07 Lead Rapporteur to investigate the availability of expertise
        to apply to the problem of regular expressions within 1003.2x and
        to report back to WG15 at its October 1996 meeting.
From WG15 Copenhagen, May 1996:
   |    Closed.  WG15 believes this request cannot be accommodated.

14. Title: Canadian Collation Weight minimum levels [Closed]

Keywords:
                collation weight, LC_COLLATE, locale, natural language
Description:
                The minimum number of weights for the LC_COLLATE feature
                is too small for the requirements of certain National
                Bodies.  IS 9945-2 specifies 2; Canada requires at
                least 7, and other NBs require 4 or more.
Originator:
                Ca
Alternatives:

Documents:

        N388    Minutes of WG15 meeting, Heidelberg, May 1993
        N577    WG15 Minutes, Enschede, 8-10 May 1995
    RIN N154    RIN Minutes, Orlando, 26/27 October 1995
Solution:
                The Canadian requirement for extra collation weights was
                accepted by the development body.  Draft 12 of 1003.2b
                is anticipated to include the appropriate changes.
Status:
                Closed.
History:

From WG15 Heidelberg, May 1993:

        RESOLUTION 93-230   Collation Weights
             Whereas ISO/IEC DIS 9945-2, Utility Limit Minimum Value,
             Table 2-17, specifies that the maximum number of weights
             that can be assigned to an entry of the LC_COLLATE order
             keyword in the locale definition file is 2, and
             Whereas the value of 2 is insufficient to process natural
             language collation sequences,
             Therefore SC22/WG15 instructs the Project Editor to notify
             its development body that the collation weight is dependent
             on the language of the country and that Canada requires a
             minimum weight of 7.
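        Why the number of weights matters can be sketched with a minimal
        multi-level comparison in Python.  The element names and weight
        values are hypothetical, and the sketch ignores LC_COLLATE
        directives such as backward and position.  It only shows that
        each additional level lets a locale express one more kind of
        distinction, so a limit of 2 forces any distinction beyond the
        second level to be dropped.

```python
from typing import Sequence

# Hypothetical per-element weights, primary level first:
#   level 1: base letter, level 2: accent, level 3: case.
WEIGHTS = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),
    "a-grave": (1, 2, 1),
    "b": (2, 1, 1),
}


def compare(s: Sequence[str], t: Sequence[str], levels: int) -> int:
    """Multi-level comparison of two strings of collating elements.

    Compares the whole string at level 1, then, only on a tie, at
    level 2, and so on up to `levels` levels.  Returns -1, 0 or 1.
    """
    for level in range(levels):
        ks = [WEIGHTS[e][level] for e in s]
        kt = [WEIGHTS[e][level] for e in t]
        if ks != kt:
            return -1 if ks < kt else 1
    return 0


print(compare(["A"], ["a"], 2))  # 0: with two levels, case is invisible
print(compare(["A"], ["a"], 3))  # 1: a third level separates "A" from "a"
```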
From WG15 Twente, May 1995:
        9410-03 Project Editor: Notify the development body of collation
                weight requirements (resolution 93-230, open action item
                9305-60, 9310-23, 9405-12)  (Closed: has become 9505-02)
        9505-02 Canada - Provide collation weight question to the US again.
From WG15 RIN Orlando, October 1995:
        KS solved this one at the IEEE meeting!!  The IEEE accepted the
        Canadian proposal for 7 collation weights.
From WG15 Copenhagen, May 1996:
   |    Closed.  1003.2b draft 12 will support 7 levels.

15. Title: Japanese proposal for LC_CTYPE extension [Open]

Keywords:
                locale, char, character, character map, LC_CTYPE,
                wctrans(), towctrans(), charconv, charclass
Description:
                Japan proposes that the LC_CTYPE locale definition be
                extended to allow locale-specific character mappings to
                be specified.  This extension is necessary to implement
                the wctrans() and towctrans() functions of the ISO C
                Amendment on a POSIX-conforming system.
Originator:
                J
Alternatives:

Documents:

        N602    RIN N158: Japanese Action Item report to WG15
        N657    Data specification format for transliteration and transcription
        N664    Proposal for culturally dependent fallback: Response
Solution:

Status:

                Open.
History:

From WG15 RIN Orlando, October 1995:

        N602 proposed the following extension to 1003.2b:
        [Note: The page numbers refer to the ones of P1003.2/D10.]
        Sect 2.5 (Locale) PROPOSAL.                             Page 8-9,12:
        Problem:
         The LC_CTYPE (2.5.2.1) locale definition should be enhanced to allow
         user-specified additional character mappings, similar in concept
         to the user-specified additional character classes. In the Amendment
         to the ISO C standard, extended character mapping functions
         (wctrans/towctrans) are specified. The following proposed extension
         will serve as the machinery to define the locale-specific character
         mappings used by these functions. Without this extension,
         POSIX conforming systems need their own extensions to
         implement the ISO C Amendment specifications.
        Proposal:[LC_CTYPE extension for specifying character mapping]
         The proposed extension for character mapping is similar to the
         extension of character class, which is already specified in .2b
         draft.  New keyword 'charconv' is introduced to define locale-
         specific character mappings instead of 'charclass' keyword for
         character class.  The way of defining character mapping is not
         extended with this proposal.  The same specification for toupper/
         tolower mapping can be used for locale-specific character mappings.
            EXAMPLE:
             LC_CTYPE
             # define the names of locale-specific character mappings
             charconv tojkata;tojhira
             # tojkata: hiragana => katakana mapping
             tojkata (<j0401>,<j0501>);(<j0402>,<j0502>);\
                     .....definition.....
             # tojhira: katakana => hiragana mapping
             tojhira (<j0501>,<j0401>);(<j0502>,<j0402>);\
                     .....definition.....
             END LC_CTYPE
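        The effect of a charconv mapping such as tojkata can be sketched
        in Python, working on symbolic names rather than encoded
        characters.  parse_mapping and apply_mapping are hypothetical
        helpers.  Leaving unmapped symbols unchanged is an assumption
        made by analogy with towupper(), which returns a character that
        has no uppercase form unchanged.

```python
import re

# One "(<from>,<to>)" pair from a charconv-name operand list.
PAIR = re.compile(r"\(\s*(<[^>]+>)\s*,\s*(<[^>]+>)\s*\)")


def parse_mapping(operands: str) -> dict[str, str]:
    """Collect the (<from>,<to>) pairs of one named character mapping.

    Semicolon separators and backslash-newline continuations are simply
    skipped, since only the parenthesized pairs are matched.
    """
    return dict(PAIR.findall(operands))


def apply_mapping(symbols: list[str], mapping: dict[str, str]) -> list[str]:
    """Map each symbol; symbols without a mapping pass through unchanged."""
    return [mapping.get(s, s) for s in symbols]


tojkata = parse_mapping("(<j0401>,<j0501>);(<j0402>,<j0502>);\\\n")
print(apply_mapping(["<j0401>", "<a>", "<j0402>"], tojkata))
```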
        [Proposed extension to .2b text]
        [Page 8]
        => 2.5.2.1 LC_CTYPE. Add the following keyword items after the item
           labeled tolower:
        charconv  Define one or more locale-specific character mapping names as
                  strings separated by semicolons. Each named character mapping
                  can then be defined subsequently in the LC_CTYPE definition.
                  A character mapping name shall consist of at least one and at
                  most fourteen bytes of alphanumeric characters from the
                  portable filename character set. The first character of a
                  character mapping name cannot be a digit. The name cannot
                  match any of the LC_CTYPE keywords defined in this standard.
        charconv-name
                  Define the named locale-specific character mapping.
                  In the POSIX Locale, the locale-specific named character
                  mapping need not exist.
                  If a mapping name is defined by a charconv keyword, but no
                  character mappings are subsequently assigned to it, this
                  is not an error; it shall represent a mapping without any
                  character pairs belonging to it.
        [Page 12]
        => 2.5.3.1 Locale Lexical Conventions. Add the following token
                   description:
        CHARCONV  A string of alphanumeric characters from the portable
                  character set, the first of which shall not be a digit,
                  consisting of at least one and at most fourteen bytes,
                  and optionally surrounded by double-quotes.
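        The naming rules above can be checked mechanically.  In this
        hypothetical Python sketch, LC_CTYPE_KEYWORDS lists the 9945-2
        keywords plus charclass and charconv, and is illustrative rather
        than exhaustive.  Because only ASCII letters and digits are
        allowed, the limit of fourteen bytes equals a limit of fourteen
        characters.

```python
import string

# Letters and digits of the portable filename character set (ASCII only,
# so the byte length of a name equals its character length).
ALNUM = set(string.ascii_letters + string.digits)

# Illustrative, not exhaustive: LC_CTYPE keywords a mapping name may not reuse.
LC_CTYPE_KEYWORDS = {
    "upper", "lower", "alpha", "digit", "space", "cntrl", "punct",
    "graph", "print", "xdigit", "blank", "toupper", "tolower",
    "copy", "charclass", "charconv",
}


def valid_charconv_name(name: str) -> bool:
    """Check a mapping name against the charconv/CHARCONV rules above."""
    if len(name) >= 2 and name[0] == name[-1] == '"':
        name = name[1:-1]  # the CHARCONV token may be double-quoted
    return (
        1 <= len(name) <= 14
        and all(c in ALNUM for c in name)
        and not name[0].isdigit()
        and name not in LC_CTYPE_KEYWORDS
    )


print(valid_charconv_name("tojkata"), valid_charconv_name("2kata"))
```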
        [Page 12]
        => 2.5.3.2 Locale Grammar. Modify the ctype_keyword and
                   charconv_keyword descriptions as follows:
           ctype_keyword        : charclass_keyword charclass_list EOL
                                | charwidth_keyword charclass_list EOL
                                | defwidth_keyword defwidth_value EOL
                                | charconv_keyword charconv_list EOL
                                | 'charclass' charclass_namelist EOL
                                | 'charconv' charconv_namelist EOL
                                ;
           charconv_namelist    : charconv_namelist ';' CHARCONV
                                | CHARCONV
                                ;
           charconv_keyword     : 'toupper' | 'tolower'
                                | CHARCONV
                                ;
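        The grammar changes can be exercised with a small Python sketch.
        It assumes simplified input (one space after the keyword, no
        backslash continuations, no comments), ignores the other
        ctype_keyword alternatives, and parse_ctype_lines is a
        hypothetical name.  It shows the dependency stated in the
        charconv description: a name becomes usable as a
        charconv_keyword only after it has appeared in a
        charconv_namelist.

```python
import re

# CHARCONV token: a letter, then letters/digits, 1-14 bytes in all,
# optionally surrounded by double-quotes (per the 2.5.3.1 addition).
NAME = r"[A-Za-z][A-Za-z0-9]{0,13}"
CHARCONV = re.compile(rf'"({NAME})"|({NAME})')
# One (<from>,<to>) pair of a charconv_list.
PAIR = re.compile(r"\(\s*(<[^>]+>)\s*,\s*(<[^>]+>)\s*\)")


def parse_ctype_lines(lines: list[str]) -> dict[str, dict[str, str]]:
    """Apply the proposed ctype_keyword alternatives to simplified lines.

    'charconv' declares names (charconv_namelist).  A line starting with
    'toupper', 'tolower' or a declared name is a charconv_keyword followed
    by its charconv_list.  Any other keyword is ignored by this sketch.
    """
    declared: set[str] = set()
    mappings: dict[str, dict[str, str]] = {}
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "charconv":
            for token in rest.split(";"):
                m = CHARCONV.fullmatch(token.strip())
                if m is None:
                    raise ValueError(f"invalid CHARCONV token: {token!r}")
                declared.add(m.group(1) or m.group(2))
        elif keyword in ("toupper", "tolower") or keyword in declared:
            mappings.setdefault(keyword, {}).update(PAIR.findall(rest))
    return mappings


example = [
    "charconv tojkata;tojhira",
    "tojkata (<j0401>,<j0501>);(<j0402>,<j0502>)",
    "tojhira (<j0501>,<j0401>);(<j0502>,<j0402>)",
]
print(parse_ctype_lines(example))
```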
From WG15 Copenhagen, May 1996:
   |    N657 and N664 refer.  N657 is an expert contribution from
   |    Denmark, N664 is not an official US response - it comes direct
   |    from the .2b group.
   |
   |    The US development body asked for clarification of the Japanese
   |    proposal: does it require just character-to-character translation,
   |    or character-to-string, which is a much larger problem.
   |
   |    WG15 actioned KS to provide details of existing implementations
   |    of the proposal in N657 by 15-June.
   |
   |    WG15 further actioned KS to respond to the queries raised in N664
   |    by 1-July for consideration by the IEEE 1003.2b DB.
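        The distinction the US development body asks about can be shown
        in Python.  In C, towctrans() returns a single wint_t, so any
        mapping it applies is character-to-character.  Python's
        str.upper(), used here only for contrast, applies full Unicode
        case mapping, which can turn one character into two.  The
        to_katakana function is a hypothetical stand-in for a mapping
        like tojkata, relying on the fixed 0x60 offset between the
        Unicode hiragana and katakana blocks.

```python
def is_char_to_char(mapping, chars: str) -> bool:
    """True if mapping sends every character in chars to exactly one character."""
    return all(len(mapping(c)) == 1 for c in chars)


def to_katakana(c: str) -> str:
    """Hiragana U+3041..U+3096 -> katakana U+30A1..U+30F6; others unchanged."""
    return chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c


# str.upper() is character-to-string: U+00DF LATIN SMALL LETTER SHARP S
# becomes the two-character string "SS".
print("\u00df".upper())                                    # SS
print(is_char_to_char(str.upper, "abc"))                   # True
print(is_char_to_char(str.upper, "a\u00dfc"))              # False
print(is_char_to_char(to_katakana, "\u3042\u3044\u3046"))  # True
```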

16. Title: Character concepts in POSIX [Closed]

Keywords:
                wchar_t, character, byte, internationalisation,
                localisation
Description:
                Japan expressed a concern that POSIX standards blurred
                the terms byte and character.
Originator:
                J, DK
Alternatives:
                None.
Documents:
        N372    I18N Guidelines
        N388    Minutes of WG15 meeting, Heidelberg, May 1993
        N434    WG15 minutes and resolutions, October 1993
        N441    Character concepts in Posix standards
        N482    US TAG N472:  US Action Item Report
        N499    WG15 minutes and resolutions, May 1994
        N515    US Action Item Report.
        N532    WG15 minutes and resolutions, Oct 1994
Solution:
                RIN was actioned to produce guidelines to assist the
                Development Body and Project Editor to write interface
                definitions which clearly differentiated between the two,
                allowing better support of international character sets.
                Guidelines were offered in N372, and further comments in
                N441.
Status:
                Closed.  The standards take sufficient care to
                distinguish 'character' and 'byte'.
History:

From WG15 Heidelberg, May 1993:

        The Plenary minutes, N388, contained the following Action Item
        and response:
        Action 9210-32: RIN Lead Rapporteur: Investigate the production
        of guidelines for standards developers for the usage of the terms
        character and byte in the definition of interfaces, with especial
        attention to the internationalisation issues arising from
        character-based interfaces.
                CLOSED: see [N372]
        No specific action was assigned to N372 in WG15's minutes.
        N372, authored by Yasushi Nakahara, included the following:
        To help in understanding this action item, some background
        information may be required.  If I remember correctly, this
        action was derived from my comments at the plenary session, so
        I am adding some explanation.  See an excerpt from the Reading
        minutes and my comments below.
        > 2.8 Rapporteur Group report/status
        > 2.8.1 Security
        >
        > ...
        >
        > Japan further identified problems in the usage of the terms
        > "character" and "byte" in the P1003.6 document.  RIN should be
        > requested to provide guidance to standards developers in order
        > to avoid such problems in the future.  The specification of
        > character-oriented interfaces requires careful consideration of
        > internationalisation issues that do not affect interfaces
        > specified in terms of bytes.
        The last paragraph was an actual (partial) log of that
        discussion, although at the time, in conjunction with Jon's
        comment on I18N issues, I added that not only P1003.6 but
        almost all the P1003.x documents may have I18N issues wherever
        "character" interfaces are specified.  More specifically, I
        explained that the recent P1003.4 and P1003.7(.x) drafts have
        I18N issues similar to those on which the Japanese POSIX WG has
        been actively commenting in the POSIX.1 and POSIX.2
        specifications since 1989, in terms of I18N/L10N features and
        "character vs. byte" issues, and that Japan has had to send
        similar comments again and again on each POSIX.n draft, which
        is neither effective nor productive.  So, rather than such
        patchwork, I suggested that concerned National Bodies and/or
        RIN should develop design/review guidelines (or an appropriate
        template) for I18N/L10N specifications, in order to make the
        ballot/disposition process for each POSIX.n draft more
        productive and consistent (in terms of I18N/L10N
        specifications).
        In fact, the Japanese ballot comments on CD 9945-2 pointed out
        such cross-functional aspects of I18N/L10N issues and introduced
        some proposed design/review guidelines for I18N/L10N
        specifications.
        With these things in mind, I'm enclosing draft proposed
        reviewing/designing guidelines for I18N/L10N specifications.
        _______________________________________________________________
          Draft Proposed I18N/L10N Guidelines for
          (POSIX) Standard Interface Design and Review
          1. Take into account the following aspects:
             - Character counts != byte counts
             - Character counts != display width
             - Byte counts != display width
             - Only the "wchar_t" type in C language (known as a "wide
               character") corresponds to the concept of a character.
          2. Do not use the term "character" either in the meaning of
             "byte" or in the meaning of "display width" or "column
             position".
          3. Determine which interfaces are character-oriented (arguments
             or operands, input data, output data, I/O format, etc.).
             If the interface in question is byte-oriented, carefully use
             the term "byte" or appropriate wording so that the
             specification cannot be confused with the concept
             (definition) of a character, and skip the following
             guidelines (which are fully character-oriented).
          4. Carefully study the features of character-oriented interfaces
             and give appropriate specifications (or review the proposed
             specifications in reviewing process) in terms of the following
             aspects:
              - Character boundary recognition
                [This shall be generic "character" based.]
               - Limit check & truncation in various units; in particular,
                 make clear which units (byte, character, column, width,
                 etc.) shall be applied.
              - Character/string width recognition
                [This shall be generic "character" based.]
              - Character/string parsing & manipulation
                [This shall be generic "character" based.]
                Also, locale dependency such as LC_CTYPE and LC_COLLATE
                shall be well defined.
              - Language dependency of text data including message data
                [Make clear what natural language dependencies are
                 (explicitly/implicitly) included in the target text.]
              - Culture dependency of representations
                [Make clear what (other) locale dependencies are covered
                 by the specification via suitable LC_XXX (such as LC_TIME,
                  LC_NUMERIC, LC_MONETARY, LC_MESSAGES) and LANG variables.]
        No specific action was assigned for N372 in WG15's minutes.
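        The first guideline can be made concrete with a short Python
        sketch.  UTF-8 is used only as an example encoding, and
        display_width is a hypothetical approximation of wcwidth()-style
        column counting based on the Unicode East Asian Width property.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate column width: East Asian Wide/Fullwidth characters take two.

    A simplification: a real wcwidth() also handles combining and
    non-printing characters, and ambiguous-width characters vary by locale.
    """
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1 for c in s)


text = "A\u00e9\u30ab"  # LATIN CAPITAL A, e WITH ACUTE, KATAKANA KA
print(len(text))                  # 3 characters
print(len(text.encode("utf-8")))  # 6 bytes (1 + 2 + 3)
print(display_width(text))        # 4 columns (1 + 1 + 2)
```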
From WG15 Tokyo, May 1994:
        N441, submitted by Keld Simonsen, contained the following:
        Action: for US NB consideration on WG15 action item 9305-24
        A comment on Nakahara-san's paper, specifically his statement in
        clause 1 of the draft guidelines that only the "wchar_t" type in
        C corresponds to the concept of a character.
        I would say that it is the multibyte character type of POSIX
        which corresponds to the concept of a character.
        The "wchar_t" type of C places restrictions on the
        representation of characters: they must all be represented by
        the same number of bits, and there are restrictions on the
        values, which must be harmonized with the "char" type.  This is
        not the case with POSIX multibyte characters.  C multibyte
        characters cannot contain a null byte, but allowing null bytes
        is needed for a general representation of characters.  I
        believe that the POSIX multibyte character concept does not
        have this limitation; if it does, the limitation should be
        removed.
        As POSIX standards currently use the multibyte character as
        their "character" concept, there is no need to change this.  But
        there is a great need to use the character terms consistently
        across POSIX standards.
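        The point about null bytes can be illustrated in Python.  UTF-8
        uses a zero byte only for NUL itself, whereas a 16-bit encoding
        such as UTF-16 places zero bytes inside ordinary characters, so
        an interface built on NUL-terminated strings truncates the text.
        Neither encoding is named in N441; they are examples only, and
        c_strlen is a hypothetical helper mimicking C's strlen().

```python
def c_strlen(buf: bytes) -> int:
    """Length as a NUL-terminated C string would see it: bytes before the first NUL."""
    nul = buf.find(b"\x00")
    return len(buf) if nul < 0 else nul


text = "A\u00e9"
utf8 = text.encode("utf-8")       # b'A\xc3\xa9': no zero bytes
utf16 = text.encode("utf-16-be")  # b'\x00A\x00\xe9': a zero byte in each character
print(c_strlen(utf8), len(utf8))    # 3 3
print(c_strlen(utf16), len(utf16))  # 0 4
```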
        N499, the WG15 Plenary minutes, perpetuated an action on the US
        as follows:
        9310-10 United States:
        1)      Consider the LIS and language-binding interface changes
                necessary to handle character-oriented features as a symbol
                and not storage patterns for a future revision of 9945-1.
        2)      Inform SC22/WG15 of any plans for supporting such
                features in future revisions of all parts of the 9945
                standard (resolution 226, open action items 9210-71, 9305-24)
                Open.  Status in N482.  Pending response from PASC.  New
                action item 9405-03.
        The Danish comments in N441 were dealt with under Agenda Item
        6.4, as:
                6.4)    Character concept.
        Reference:      WG15 N441.
                No action
From WG15 Vancouver, October 1994:
        N532, the WG15 Plenary minutes, noted Action 9405-03 on the US
        as 'Closed', with no comment.  N515, the US Action Item Report,
        contained the following:
        9405-03 United States: ... Status: CLOSED...Character interfaces 
        defined in ISO/IEC 9945-1 use containers for representations of 
        character strings, with size in bytes, since this is existing 
        behaviour. These interfaces support multi-byte character encodings 
        (with some restrictions), as defined in the C standard.
        Support for abstract characters is being considered in the
        9945-1 LIS.  There are no plans to add support for abstract
        characters in the C binding for 9945-1 or in the C-Language
        Bindings Option (Annex B) in 9945-2.
From WG15 Copenhagen, May 1996:
   |    The US DB is not aware of any blurring of the term byte and
   |    character in the current standards.
   |
   |    Japan and Denmark believe the current drafts are clean.  The US
   |    DB has done the right thing and adopted wherever possible the
   |    correct usage.  The Issue is closed.

17. Title: Range expression dependency [Open]

Keywords:
   |            collation, element, regular, expression, pattern,
                LC_COLLATE, localedef
Description:
   |            The user-defined ordering of collation elements in an
   |            LC_COLLATE table is inadequately specified.  Different
   |            but equally valid tables can produce differing results
   |            when used as the basis of regular expressions, pattern
   |            matching, etc.
Originator:
                DK
Alternatives:
                None.
Documents:
        N605    RIN N160: DS Additional comments on P1003.2b/D11
Solution:

Status:

                Open.
History:

From WG15 RIN Orlando, October 1995:

        @ 2.8 o 5
        line 379: The range expression should not depend on the
        collation element order, but rather on the result of the
        comparison using the relevant collation.  Using the collating
        element order is not proper, and is confusing to users whose
        only expectations are those defined by the collation rules.
From WG15 Copenhagen, May 1996:
   |    1003.2 is ambiguous on this point and 1003.2b will not be able
   |    to fix the problem.  There are two fairly simple solutions,
   |    but they are mutually exclusive, and the proponents of each
   |    solution do not readily admit to the possibility that the
   |    alternative solution may be valid.
   |    This issue remains open.


Additional historical notes:

   |    This request was forwarded to IEEE from X/Open at the end of
   |    1993 for interpretation:
   |      (Section 2.5.2.2, LC_COLLATE)
   |      "User-defined ordering of collating elements. Each collating
   |      element shall be assigned a collation value defining its order
   |      in the character (or basic) collation sequence. This ordering
   |      is used by regular expressions and pattern matching and, unless
   |      collation weights are explicitly specified, also as the collation
   |      weight to be used in sorting."
   |    Given this passage, assume there are two similar LC_COLLATE
   |    fragments.  The fragments include lowercase letters only to
   |    simplify the examples.  Here is the first fragment:
   |    <a>        <a>;<a>;<a>
   |    <a-grave>  <a>;<a-grave>;<a-grave>
   |    <a-acute>  <a>;<a-acute>;<a-acute>
   |    <b>        <b>;<b>;<b>
   |    <c>        <c>;<c>;<c>
   |    <d>        <d>;<d>;<d>
   |    . . .
   |    <z>        <z>;<z>;<z>
   |    . . .

   |    Here is the second fragment:

   |    <a>        <a>;<a>;<a>
   |    <b>        <b>;<b>;<b>
   |    <c>        <c>;<c>;<c>
   |    <d>        <d>;<d>;<d>
   |    . . .
   |    <z>        <z>;<z>;<z>
   |    <a-grave>  <a>;<a-grave>;<a-grave>
   |    <a-acute>  <a>;<a-acute>;<a-acute>
   |    . . .

   |    Suppose a user wanted to find all words that begin with a letter
   |    in the range a-c. An XoJIG meeting agreed that a locale
   |    built using the first fragment returns words that begin with <a>,
   |    <a-grave>, <a-acute>, <b>, and <c>. However, there were varying
   |    opinions about whether the second fragment would return the same
   |    results, or would exclude <a-grave> and <a-acute>. So the
   |    question is this:
   |    Should an RE run against a locale built using the second fragment
   |    include the accented a's in the range because they are defined as
   |    being in the same equivalence class as <a>, or should it exclude
   |    the accented a's because they are listed outside the range of a-c?
   |    A preliminary response was obtained from IEEE in Feb 1994:
   |    The standard is unclear on this issue, and as such no conformance
   |    distinction can be made between alternative implementations based
   |    on this.  This is being referred to the Sponsors of the standard
   |    for clarifying wording in the next amendment.
   |    This response will be incorporated in an IEEE interpretations
   |    publication, and will be also made available on-line on the IEEE
   |    SPAsystem.
   |    IEEE Interpretation for 1003.2-1992
   |    -----------------------------------
   |    The standard is ambiguous in this area, since it is not clear
   |    what the phrase "collation sequence order" means or is.  The two
   |    possibilities are "the order in locale file", or "the order
   |    determined by the weights in the locale file".  The standard
   |    allows either behavior.  Concern over the wording of this area
   |    has been forwarded to the Sponsors of the standard.
   |    Rationale for Interpretation:
   |    -----------------------------
   |    None.
   |    ________________________________________________________________
   |    (c) 1994 The Institute of Electrical and Electronics Engineers, Inc.
   |    Not to be published without prior written permission of the IEEE.
   |    Andrew Josey
   |    PASC Vice-Chair Interpretations
   |    ------
   |    DS finds it unnecessarily complex to introduce two levels of
   |    comparison: one related to the comparison functions, and
   |    another related to the order in which the weights appear in
   |    a localedef definition file.  The latter is normally not part of
   |    the definition of the collation order, but becomes significant
   |    if this interpretation is favoured.  The first interpretation
   |    should be favoured, as the algorithm is already known to the
   |    user and gives the least surprising result.
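        The two readings can be contrasted on an abbreviated copy of the
        second fragment.  In this Python sketch each weight is taken to
        be the position in the file of the collating element it names,
        and only single collating elements are compared; both are
        simplifying assumptions, and the function names are
        hypothetical.  range_by_file_order follows the "order in locale
        file" reading; range_by_weights follows the "order determined
        by the weights" reading favoured by DS.

```python
# Abbreviated second fragment: (element, (primary, secondary, tertiary)).
FRAGMENT2 = [
    ("a", ("a", "a", "a")),
    ("b", ("b", "b", "b")),
    ("c", ("c", "c", "c")),
    ("d", ("d", "d", "d")),
    ("z", ("z", "z", "z")),
    ("a-grave", ("a", "a-grave", "a-grave")),
    ("a-acute", ("a", "a-acute", "a-acute")),
]


def range_by_file_order(lo, hi, table):
    """Reading 1: the elements listed between lo and hi in the locale file."""
    names = [name for name, _ in table]
    return names[names.index(lo) : names.index(hi) + 1]


def range_by_weights(lo, hi, table):
    """Reading 2: the elements whose collation weights fall between lo's and hi's."""
    position = {name: i for i, (name, _) in enumerate(table)}
    # Each weight names a collating element; its value is that element's position.
    key = {name: tuple(position[w] for w in weights) for name, weights in table}
    return [name for name, _ in table if key[lo] <= key[name] <= key[hi]]


print(range_by_file_order("a", "c", FRAGMENT2))
print(range_by_weights("a", "c", FRAGMENT2))
```

        Under the first reading the range a-c yields a, b and c; under
        the second it also yields a-grave and a-acute, which share the
        primary weight <a>.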

0. Title: <> [Open/Closed]

Keywords:

Description:

Originator:

Alternatives:

Documents:

Solution:

Status:

                Open/Closed.
History: