From D.Cannon@exeter.ac.uk Sat Oct 14 20:47:48 1995 Received: from hermes (hermes.ex.ac.uk [144.173.6.14]) by dkuug.dk (8.6.12/8.6.12) with ESMTP id UAA08334 for ; Sat, 14 Oct 1995 20:47:35 +0100 From: D.Cannon@exeter.ac.uk Received: from cen by hermes via ESMTP (UAA29434); Sat, 14 Oct 1995 20:47:24 +0100 Message-Id: <17779.199510141947@cen> Subject: Putative WG15 RIN Issues list To: wg15rin@dkuug.dk (WG15 RIN) Date: Sat, 14 Oct 1995 20:47:22 +0100 (BST) Acknowledge-to: D.Cannon@Exeter.ac.uk X-Organisation: University of Exeter, IT Services X-Disclaimer: The following is a personal statement and does not reflect University of Exeter policy or agreement. X-Mailer: ELM [version 2.4 PL23] Content-Type: text Content-Length: 97321 No-one has asked for me not to email the current version of the issues list document to this list, so here it is. Look forward to seeing you all in a week, Cheers, Dave. _____________________________________________________________ ISO/IEC JTC1/SC22/WG15 RIN Issues List Rationale: At the WG15 RIN meeting in Twente, 11-12 May 1995, it was decided to remove the Agenda Items traditionally listed under 3.1 to entries in this document, the WG15 RIN Issues List. This was because the status and raison d'etre of these items had been obscured over time, and the debate on each item was being revisited at each meeting. A triumvirate of David Cannon (UK), Keld Simonsen (Dk) and George Kriger (Ca) was charged with exhuming the argument and status of each item from past RIN and WG15 papers and minutes, and encapsulating them here. The intention is that this document will be maintained and updated as the argument on each of the Issues develops. Strangely, we have been here before: From WG15 RIN Stockholm, November 1991: Keld Simonsen suggested that the group should have an issues log. There was some discussion of the function of such a log, where it should appear, and of whether the group has any issues suitable for such a log. 
From WG15 RIN Annapolis, October 1993:

Canada proposes to remove a swathe of items under RIN Agenda item 3.1, and focus the agenda more closely on the papers submitted. The UK, US agreed. It was intended that items which were still relevant but had no immediate input, should be moved to an issues list. The issues list to be visited and reviewed at each meeting.

...we made it, eventually...

Executive Summary:

Closed: The Issue is closed in RIN - not necessarily everywhere else. MBs or WG15 may still regard the Issue as active. This is the RIN Issues list - no-one else's. Closed in RIN means that RIN has no further legitimate interest in the Issue. WG15, at its discretion, may request RIN re-open it.

Open: The Issue is open in RIN - RIN regards the Issue as receiving its active attention. WG15 has asked RIN to consider the issue, and RIN has not yet reached conclusion on the Issue. Upon conclusion RIN shall advise WG15 of its recommendations.

Index:

 0. Extended Identifiers in 1003.2b [Closed]
 1. localedef is_wctype() [Closed ?]
 2. localedef user-specified collation weight names [Open]
 3. localedef "substitute" [Closed]
 4. localedef "reorder-after" [Closed]
 5. removal of NUL special handling [Closed]
 6. full support for state-dependent charsets [Open]
 7. charmap-based charset conversion
 8. "file" user-specified recognition algorithm [Closed]
 9. "pax" extended character set support
10. C MSE widechar support [Open]
11. Invariant ISO 646 support [Closed]
12. charsymb/CHARIDS [Open]
13. regexps

0. Title: Extended Identifiers in 1003.2b [Closed]
Keywords: characterset, lex, awk, shell, scripts, small, language
Description: A proposal to permit a more extensive set of characters in the small languages supported by the POSIX Shell and Utilities standards.
Originator: WG20, Dk
Alternatives: To remain with the status quo.
Documents: RIN N047 A representation for the shell in ISO 646 N264 SC22/WG20 N085: Extended identifiers N283 SC22/WG15 liaison statement to WG20 N294 P1003.2b D4 (Shell & Utilities Amd) N416 Invariant ISO 646 support in Posix 9945-2 N417 WG20 liaison report to WG15 N420 Extended characterset in Posix identifiers N515 US Action Item Report. N532 WG15 minutes and resolutions, Oct 1994 AN12 WG20 current and intended work (WG20 N223) Resolution: WG15 and the US development body have rejected the proposal in N416, and accepted that contained in N420. Status: Issue in RIN is closed, N416 having been rejected, N420 accepted. Any remaining issue re N416 is now between Dk and WG15 who have invited Dk to supply further argument to support the proposal. History: N264 was the first relevant identifiable WG15 paper input on this subject. From WG15 Stockholm, November 1991: c. RIN SRTN7/N047, A representation for the shell in ISO 646 >> Proposal from Denmark relates to a long identified problem and an inconsistency with the recommendations of ISO TR10176 (programming languages should not use certain characters; note that TR10176 states that it may not be globally applicable, and seeks further input; 9945-2 may be a case in point), but the Danish proposal should be expanded and clarified so that it: 1) addresses all aspects of proposed standard, rather than JUST the shell, (e.g. it should work with not only shell, but also regular expressions, awk, etc) 2) should allow use of all features of the proposed standard, maintaining conformance, (e.g. currently proposed use of "--" would conflict with existing use) 3) should provide a general solution for similar requirements of other countries 4) should be sensitive to the cost/benefit ratio of imposing the solution in relation to existing implementations. Issue that proposal addresses is the ability of using national characters within file names etc, without impact on shell interpretation (e.g. 
Danish "slashed-O" occupies the same space as the POSIX pipe symbol, thus file names cannot include a slashed-O without the shell interpreting that character as a pipe). Presentation of national characters on displays and printers is a separate issue.

From WG15 Hamilton, May 1992:

The plenary considered N264 and prepared the following liaison statement to WG20 as WG15 N283:

WG15 has reviewed WG20 document N085 entitled "Extended Identifiers", which encouraged discussion of its proposal, and offers the following comments:

1) The POSIX Shell and Utilities standard (DIS 9945-2) provides facilities for locale-dependent specifications of character attributes that optionally are adjustable by the user or application. WG15 recognises that allowing characters outside the POSIX portable character set is a feature that directly impacts portability, but it is a desirable localisation facility in some environments.

2) WG15 believes that any extensions to programming language identifier requirements should be accomplished within the framework described in 1) above.

3) 9945-2 contains several "small languages", such as shell and awk, that WG15 intends to enhance in this area. It believes that the proper approach would be to allow characters in classification "alpha" in the current locale wherever the current specifications allow alphabetics from the portable character set (equivalent to the ISO 646 repertoire). (The "alpha" classification may include syllabic and ideographic characters, and is named "alpha" for historic reasons.) Because of differing requirements in the various languages, WG15 considers any additional degree of flexibility to be infeasible across all languages.

WG15 plenary resolved to pass the above statement through its liaison to WG20:

RESOLUTION 201. LIAISON STATEMENT TO WG20
WG15 instructs its liaison to WG20 to transmit WG15 N283 as a WG15 liaison statement to WG20.
From WG15 Annapolis, October 1993:

4.4 Liaison statements & actions related thereto [N417, AN12, N420, N421, N422]

N420 is intended to be an amendment to the Posix 'small' languages. It proposes an extended characterset for lex, awk, shell scripts, and as such might break them as they are currently specified.

Re N417 point 7: Keld maintains that N420 is implied by areas of work defined in AN12 (WG20 N223). This is the one which may break things. KS suggests that this is solved via the locales mechanism. No action is required... ???

22.41 additional utilities {2b} CD reg: [N416, N420]

Proposed action on the US to take these on board. Nl accepts N420 proposal, but regards the N416 document as representing old technology superseded by ISO 10646. The original action was on Dk to provide these papers as additional information to the US. N416 and N420 will be passed to the US for comment.

(The action item was carried forward to the May 1994 meeting)

From WG15 RIN Annapolis, October 1993:

Resolution RIN 9310-04: Internationalisation Concerns in 1003.2b

WG15 RIN notes that the new Annex H to 9945-2 addresses the concerns of the international community, specifically of Japan and of Denmark. 9945-2 Annex H indicates that input is required from WG15 MBs on a number of specific issues and therefore WG15 RIN requests an indication of the latest dates by which such input is required by the US development body, in order to maintain synchronisation of the ISO/IEC and IEEE work.

...after input from Arnie Powell it was decided to convert the Resolution on Annex H to an action item on the US RIN Rapporteur in order to achieve it in a more timely fashion.

From WG15 Tokyo, May 1994:

9405-52 United States: Review N416 and N420 and forward them to PASC for consideration.

From WG15 Vancouver, October 1994:

The 9405-52 action was noted as Complete, the response being included in N515, the US Action Item Report:

CLOSED...re N416...POSIX.2b does not plan to include the suggested changes.
The proposal provides separate sets of "tri-graphs" for each of the languages specified in POSIX.2. The sets of tri-graphs vary from language to language making it difficult for users to remember which tri-graph means what. The proposed sets of tri-graphs do not reflect historic practice. Some of these sets of tri-graphs introduce ambiguities into the language. Some of these sets of tri-graphs have not been completed. POSIX.2 has attempted to reflect historic practice and make the implementation of these utilities more consistent with one another, so that users will have less difficulty learning to use the standard utilities. The proposed changes would be useful to a very small subset of the intended audience of the standard and would make it much more difficult for all users to write portable scripts.

re N420...The languages specified by POSIX.2 specify behaviour when identifier names are chosen from the portable character set. We have not found anything to preclude an implementation from recognising extended characters as part of an identifier. However, an application making use of those extensions would be non-portable.

The following discussion occurred:

5.2.3 22.41 additional utilities {2b} CD reg: [N416,N420]

Denmark is not happy with the response (rejection of the proposal in N416 because it would reduce consensus) to its request and would like to enter into a dialogue with the IEEE group responsible. Denmark is invited to offer further supportive argument.

From 9945-2:1993 Annex H.1
| 7: 2.5 Locale
| (1) Provisions should be made to allow characters beyond those in
| the portable character set in user-supplied identifiers for the
| shell, awk, bc, lex, make, and yacc. A proposal has been made
| by Denmark to extend the locale definition to specify the set of
| identifier characters for all programming languages.
|
| This text has been removed from P1003.2b Draft 11, May 1995.

1. Title: localedef is_wctype() [Closed ?]
Keywords: locale, localedef, is_wctype()
Description: is_wctype() determines whether the wide character c has the property p. For example:
    is_wctype(c, wctype("lower"));
where wctype("lower") returns a value of type p.
Originator: J
Alternatives: None
Documents:
    RIN N088 LC_CTYPE extension for additional character mappings
    N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
    N281 Disposition of comments on CD 9945-2.2
    N294 P1003.2b D4 (Shell & Utilities Amd)
    N531 IEEE P1003.2b D10: Shell & Utility Extensions
Resolution:
? An audit trace of WG15 and WG15/RIN minutes and
? resolutions indicates that this proposal was accepted
? and is complete. The current draft of P1003.2b flags
? that the US DB is expecting input from Japan despite
? Action 9205-32 being complete...
Status: Closed
History:

N245 was the first relevant identifiable paper on this subject:

From WG15 Stockholm, November 1991:

The Japanese MB comments on CD 9945-2, quoted from N245, raise an objection [@ O o 4 ] relating to

"...additional character classes suitable for classes beyond the current ANSI/C and/or Latin based character classes. The current draft says that such additional character classes may be supported by implementation, but which is implementation defined.

"Action: As the ISO/C Multibyte Support Extension (MSE) is going to provide a new function is_wctype(), some corresponding enhancement of LC_CTYPE description file should be considered so that 'user/implementation definable character classes' can be supported in the POSIX environments in the standard manner.

"Japan will probably be able to cooperate with the POSIX.2 developing member body (US - IEEE) on how to solve these issues."

N281 contained the following disposition:

We also believe that this functionality should be studied for inclusion in the POSIX.2b revision and the full international standard. We are aware of efforts within X/Open to address this area and would like to take advantage of their developments.
An action 9111-23 was devised to reformat the Japanese comments on 9945-2 to items in the WG15 Issues list. From WG15 Hamilton, May 1992: At WG15 Hamilton, this was transformed into: 9205-32: Japan to provide to the US Member Body proposals for areas identified in their 9945-2.2 comments #s 2, 3, 4, 10, 11, 54, and 57 addressing resolution comments in N281. From WG15 Reading, October 1992: Action 9205-32 was noted as Complete. No document is cited, no action recommended. WG15 plenary considered N294, the P1003.2b Draft 4 document. This contained on Page 5 the following: 2.5.2.1 LC_CTYPE Add the following keyword items between the items labeled blank and toupper: charclass Define one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the LC_CTYPE definition. ... charclass-name Define characters to be classified as belonging to the named locale-specific character class. In the POSIX Locale, the locale-specific named character classes need not exist. ... This addition was adopted from XPG4 to satisfy the following requirement from ISO/IEC DIS 9945-2:1992 Annex H: (3) The LC_CTYPE (2.5.2.1) locale definition should be enhanced to allow user-specified additional character classes, similar in concept to the proposed C Standard {7} Multi-byte Support Extension (MSE) is_wctype() function. From WG15 RIN Reading, October 1992: RIN considered N088, a proposal for an LC_CTYPE extension to support additional character mappings. There is no record of further action on this document. From WG15 Vancouver, October 1994: N531, Draft 10 of P1003.2b, was made available and contains only minor changes to references in the above section. 2. 
Title: localedef user-specified collation weight names [Open]
Keywords: localedef, collation, weight, LC_COLLATE
Description: A mechanism for the specification of named collation weights in the LC_COLLATE section of locales, particularly to support non-latin character sets where sorting requirements are more extensive and complex.
Originator: J
Alternatives: None
Documents:
    N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
    N281 Disposition of comments on CD 9945-2.2
    N330 Japanese comments on Posix .2b/D4
    RIN N106 Japanese Proposal to POSIX 1003.2b
Resolution: None as yet. The proposal has been accepted in principle. The US development body has asked for specific wording to be supplied by Japan for inclusion in a revision to the standard.
Status: Awaiting input from the Japanese MB to 9945-2 Amd 2b.
History:

From WG15 Hamilton, May 1992:

N245, the comments on CD 9945-2, and N281, the disposition of those comments, contained the Japanese MB objection relating to collation weight names; a similar later version (below) was recorded at the WG15 Reading meeting. The proposed disposition is contained in N281 as:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard.

From WG15 Reading, October 1992:

N330 contained the Japanese MB comments on POSIX.2b D4; they included:

Sect 2.5.2.2.3 (LC_COLLATE) PROPOSAL

Problem: In most cases of ideographic characters, it is a requirement that a user be able to specify collation weights as he/she wants. In case of Japanese characters (Kanji), for example, there are five possible collation weights for supporting Japanese SORT. The five weights are On-yomi (pseudo-Chinese pronunciation), Kun-yomi (Japanese pronunciation), number of strokes, radical (components of Kanji), and Kanji character code. There could be more weights.
The LC_COLLATE part of localedef specifications should allow a user to describe these weights and give names to the weights. Any combinations of the defined weights should be able to be specified by the user at run-time.

Proposal: LC_COLLATE extension for specifying weight name

=> 2.5.2.2.3 order start Keyword. Add the following directive description and the Example.

It is implementation defined whether the following optional directive shall be recognised. If they are not supported, but present in a localedef source, they shall be ignored.

name specifies the name of a collation weight by a string. An order of weights may be specified by using the name at run time.

The syntax for the name directive shall be:
    "name =

Example:
    order_start forward,name="kunyomi";forward,name="radical"

If an operand has a name directive, the definition of the primary, secondary, or subsequent weights for the collation element may be different from the order of operands to the order_start keyword.

=> 2.5.3.2 Locale Grammar. Modify the opt_word description as follows:
    opt_word : 'forward' | 'backward' | 'position'
             | 'name' '=' weight_name
    weight_name : '"' char_list '"'

Rationale: User's requirements for character collation in Asia are diverse. Ideographic characters have several rules to sort such as by pronunciations, strokes, etc. and the combination of the rules are used for their sorting. Those properties for a character such as pronunciation can be assigned as weights for a character element. However, no standard primary weight, secondary weight and so on exists for the weights (properties). The weight name extension for LC_COLLATE allows the order of multiple weights to be defined at run time in a different order than the order of operands to the order_start keyword. To make the different order effective, the weight names can be specified in the setting of LC_COLLATE category.
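One way to picture the run-time selection of named weights described in this rationale is as a sort key assembled from per-character properties. The sketch below is illustrative only: the table WEIGHTS, its invented values, and the helpers collation_key and sort_with_weights are assumptions made for the example, and are not part of the Japanese proposal or of any localedef implementation.

```python
# Hypothetical per-character weights, keyed by weight name.
# The numbers are invented for illustration; they are not real Kanji data.
WEIGHTS = {
    "X": {"kunyomi": 2, "radical": 1},
    "Y": {"kunyomi": 1, "radical": 2},
    "Z": {"kunyomi": 3, "radical": 1},
}


def collation_key(text, weight_order):
    """Build a multi-level key: compare whole strings on the first named
    weight, then on the second, and so on."""
    return tuple(
        tuple(WEIGHTS[ch][name] for ch in text) for name in weight_order
    )


def sort_with_weights(strings, weight_order):
    """Sort strings using the named weights in the order given at run time."""
    return sorted(strings, key=lambda s: collation_key(s, weight_order))


words = ["X", "Y", "Z"]
print(sort_with_weights(words, ["kunyomi", "radical"]))  # ['Y', 'X', 'Z']
print(sort_with_weights(words, ["radical", "kunyomi"]))  # ['X', 'Z', 'Y']
```

Swapping the names in weight_order changes which property acts as the primary weight without touching the weight table itself, which is the effect the proposal seeks from naming the weights.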
order_start forward,name="kunyomi";forward,name="radical" When a ja_JP.eucJP locale has the above definition in the LC_COLLATE part, the order of sorting rules can be specified as follows by using the weight names: LC_COLLATE = ja_JP.eucJP@weights=radical,kunyomi This means that the sort-rule "radical" is used as the primary weight and "kunyomi" is used as the secondary weight. From WG15 Heidelberg, May 1993: RESOLUTION 93-230 Collation Weights Whereas ISO/IEC DIS 9945-2, Utility Limit Minimum Value, Table 2-17, specifies that the maximum number of weights that can be assigned to an entry of the LC_COLLATE order keyword in the locale definition file is 2, and Whereas the value of 2 is insufficient to process natural language collation sequences, Therefore SC22/WG15 instructs the Project Editor to notify its development body that the collation weight is dependent on the language of the country and that Canada requires a minimum weight of 7. From WG15 RIN Heidelberg, May 1993: 3.1.3 user-specified collation weight names based upon phonetic, character based(radical), or code based. Dynamic based control of collation based upon sort key. The ability to switch pointer dynamically to bring collation tables into correct sequence. Japanese delegation has submitted two written requests without supporting material.[?] Next version would be submitted by June 18, 1993. From WG15 RIN Annapolis, October 1993: Action Item reports: The action list was lost. The minutes of the previous meeting were scanned to recover as many action items as possible; these were determined to be as follows: 9305-01 Requirement for user-specified collation weights. MDR-02 contains the Japanese proposal on collation weights. (Closed) MDR-02 -> RIN N106: Japanese Proposal to POSIX 1003.2b 3.1 I18N in POSIX.2b Specific actions were taken in Annex H to address Denmark and Japanese concerns for May 93 Heidelberg meeting. 
Japan needs feedback for timeline to produce material for coordination with 1003.2b Resolution to be produced asking for timeline for national body contributions. The rest of 3.1 [including N106] was postponed to the next meeting, due to lack of knowledge of the current status of .2b and lack of input papers received in time. 9310-09 Lead Rapporteur: distribute documents N105, N106, N109 and N113 to the RIN mailing list together with a cover note indicating that these documents will be discussed at the next WG15 RIN meeting, May 1994, and also indicating which agenda items will be touched by the documents. From WG15 RIN Vancouver, October 1994: 9405-05 Member Bodies to review N105 (Japanese comments on .1a), N106 (Japanese comments on .2b), N109 (SC22/WG20 guidelines for the use of extended identifiers in programming languages), N113 (CEN standard for string ordering) for determination of appropriate action prior to Oct. Meeting 10/94: OPEN: Prof. Saito noted they are preparing a Japanese standard for character ordering. The above action item was carried through from May 1994 to the May 1995 meeting. From WG15 Twente, May 1995: 9410-03 Project Editor: Notify the development body of collation weight requirements (resolution 93-230, open action item 9305-60, 9310-23, 9405-12) (Closed: has become 9505-02) 9505-02 Canada - Provide collation weight question to the US again. From WG15 RIN Twente, May 1995: 3.1.3 localedef user-specified collation weight names--Japan making proposal for Annex H--removed to issues list From 9945-2:1993 Annex H.1: | (4) The LC_COLLATE (2.5.2.2) locale definition should be enhanced to | allow user-specified names for collation weights. A proposal | from Japan is expected in this area. | | This text has been removed from P1003.2b Draft 11, May 1995. 3. 
Title: localedef "substitute" [Closed]
Keywords: locale, localedef, substitute, LC_COLLATE
Description: The "substitute" statement in LC_COLLATE is needed for describing higher levels of Danish Standard DS 377 sorting, and should be re-introduced.
Originator: Dk
Alternatives: None identified.
Documents:
    (WG15RIN.136) substitute in LC_COLLATE
    (WG15RIN.246) substitute
    N170r WG15 RIN N036: Minutes & resolutions, Rotterdam, May 1991
    N213 WG15 RIN N046: Japanese national profile for POSIX: Vn 1.2
    N215 WG15 RIN N051, N052: RIN Minutes and resolutions, November 1991
    N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
    N281 Disposition of comments on CD 9945-2.2
    N323r WG15 RIN N096: Minutes & resolutions, Reading, October 1992
    N370 RIN N103: RIN Minutes from Heidelberg, 10-11 May 1993
Resolution: Substitute is requested only by Denmark; other potentially interested MBs - Canada and Japan - have indicated that they do not require the substitute feature. It is suggested that further discussion of substitute in RIN is likely only to revisit old argument, therefore it is recommended that Denmark make a final submission of requirement for consideration by WG15 and the US development body, accompanied by this section of the RIN Issues list.
Status: The Issue in RIN has been revisited many times without consensus being reached. WG15 should be invited to make an executive decision, based on this section of the RIN Issues list.
History:

From WG15 RIN Rotterdam, May 1991:

N170r noted a debate on substitute:

3.2.2. localedef ... A particular problem is the substitute command, and its use of regular expressions. It has been suggested that string-for-string substitution would be adequate; however, the CSA -- and, by implication, most western -- collation standards cannot be met without regular expressions. Given rationale that regexps are not necessary for practical national collation sequences, Greger Leijonhufvud would be happy to drop them.
[Actions 9105-08 and 9105-20 were devised to check if Japan and Canada needed 'substitute']

From WG15 Stockholm, November 1991:

RIN9105-8 Erik van der Poel: Determine whether substitute is necessary to implement Japanese collation. Closed. The substitute operation is not required -- see RIN N046.

RIN9105-20 Patric Dempster: Clarify, through discussion with Alain LaBonte, whether the CSA ordering standard requires the substitute operation. Closed. The substitute operation is not required.

From WG15 Hamilton, May 1992:

N245 included a number of Danish MB comments on the 2nd CD of 9945-2. Item 3 of the Danish comments was the request to re-introduce the "substitute" facility. N281, the Disposition of Comments, proposed the following:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because there currently exists no firm consensus on its necessity within the US or international communities. An informative statement concerning future directions for 'substitute' will be included.

From WG15 RIN Reading, October 1992:

(WG15RIN.246) substitute: From: keld@dkuug.dk
Substitute specification in the LC_COLLATE section of localedef
DS proposes to use the wording contained in ISO/IEC 9945-2 DIS annex G.

3.1.4
12. The use of 'substitute' in collation was suggested. A review of the history of this shows that this gives recursive definitions between the locale and regular expressions - which cannot in general be shown to be finite. DIN 5007 and the Canadian standard on sorting do not use this, but the highest level of the Danish sorting standard (DS377) does.
13.
The Danish national body is to produce a paper before the next meeting on its perceived need for the use of substitution in the collating order category of a locale vis-a-vis DS377 and in particular the level at which that appears to be necessary (RIN AI 9210-01) From WG15 RIN Heidelberg, May 1993: 2.0 Action Item Reports: 9210-01 Defer discussion [to 3.1.4] [The minutes do not record a paper responding to 9210-01] 3.1.4 Canada has trouble with nested substitute routines which allows no character control within application. From WG15 Twente, May 1995: Denmark: One thing has not been provided - text for "substitute" facility, from an old draft of .2. Denmark believes that US has text in its archives. From 9945-2:1993 Annex H.1: | 10: 2.5.2.2 LC_COLLATE | (5) The collation substitute facility, removed from 2.5.2.2 in an | early draft, should be restored. | | This text has been removed from P1003.2b Draft 11, May 1995. 4. Title: localedef "reorder-after" [Closed] Keywords: locale, reorder-after, replace_after Description: A mechanism for building on the collation sequence constructed for one locale by allowing the specification of a set or sets of differences in the construction of other, similar collation sequences for other locales. Originator: Dk Alternatives: reorder_after was substituted for replace_after in 'mid 1992. Documents: RIN N035 Proposal for building on other locales (replace_after) RIN N092 Danish note on reorder_after and replace_after RIN N127 Procedures for European Registration of Cultural Elements, CEN draft 5 N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N391 DIS 9945-2 Disposition of Comments ballot Resolution: WG15 RIN resolved at its October 1992 Reading meeting NOT to proceed with either replace_after or reorder_after. Status: This Issue is dead. Its corpse is exhumed periodically. WG15 has not yet been advised by RIN that 'reorder-after' is not required. History: From WG15 Stockholm, November 1991: d. 
RIN N035, Proposal for building on other locales (replace after)

>> Consensus in RIN was that functionality of "replace after" should be explored (Canada volunteered to do some prototyping)
>> Denmark should include proposal as part of their ballot comments.

COPY statement exists in .2.2 but may work on binary data only (e.g. contents of locale after compilation)

Canada had no technical objections to exploring functionality but was concerned about effect on existing consensus if a change is made at a late point in balloting, and potential effect on portability.

Denmark position not final but is seeking consensus on issue; if consensus is to explore inclusion in later extension of standard, that would be OK.

From WG15 Hamilton, May 1992:

N245 included Danish MB comments on CD 9945-2:

9. ...collating sequences vary a bit from country to country, but generally much of the collating sequence is the same. For instance the Danish sequence is quite equal to the German, English or French, but for about a dozen letters it differs. The same can be said for Swedish or Spanish; generally the collating sequence is the same, but a few characters are collated differently. With the advent of the quite general coded character set independent locales like the example Danish in POSIX.2 Draft 11 annex F, it would be convenient if the few differences could be specified just as changes to an existing one. This would also improve the overview of what the changes really are.

Therefore DS propose the following. For the LC_COLLATE definition, a new command is allowed:

    replace_after ... ... ... replace_after ... ... replace_end

This construct is allowed also when a "copy" statement has been given. More than one replace_after / replace_end construct can be given. The ... are removed from the current collating sequence and inserted after in the collating sequence. For this to work the "copy" statement should be allowed to be used together with other statements in the LC_COLLATE section ...
The replace-after proposal can be included in the Annex F, where its use is demonstrated. Then the specification can be moved to the normative part of 9945-2 in a later issue.

N281 contained the response to this proposal:

We believe that this change, or something similar to accomplish the same objective, should be studied for inclusion in the POSIX.2b revision and the full international standard. It should be deferred because there currently exists no firm consensus on its necessity within the US or international communities.

The response goes on to indicate that the original concept of the "copy" statement was to duplicate an actual object description - the source text may not exist on the current system - and therefore replace-after would require the locale be 'de-compiled'.

From WG15 RIN Reading, October 1992:

RIN N092 renamed 'replace_...' to 'reorder_...' and proposed:

The following section is inserted in the description of LC_COLLATE keywords in POSIX.2 D11.3 section 2.5.5.2.

2.5.2.2.6 'reorder_after' keyword

The 'reorder_after' keyword specifies a starting point for reordering collating elements. It is followed by one or more collation reorder statements, reassigning character collation weights to collating elements. The syntax is:
    "reorder_after %s\n",

2.5.2.2.6 Collation Reordering

Each 'reorder_after' statement shall be followed by one or more collation element reordering entries. The definition of collation element reordering entries are equivalent to the collating element entries in 2.5.2.2.4, specifying collation elements and associated weights. The collating element reordering entries are terminated by a 'reorder_after' keyword or a 'reorder_end' keyword.

Each collation element specified via a collation element reordering entry is removed from the current collating sequence, if present, and inserted in the collating sequence after the previous reordering collation elements.
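The removal-and-reinsertion rule quoted above can be sketched in a few lines. This is a minimal illustration and not text from RIN N092: the function name reorder_after, the symbol names, and the modelling of a collating sequence as a plain list are assumptions, and weights are ignored so that only the positional effect is shown.

```python
def reorder_after(sequence, anchor, elements):
    """Return a new collating sequence with `elements` moved to follow `anchor`.

    Hypothetical helper: a collating sequence is modelled as a list of
    symbol names; collation weights are not represented.
    """
    if anchor in elements:
        raise ValueError("anchor cannot be one of the reordered elements")
    # Remove each reordered element from the current sequence, if present.
    result = [sym for sym in sequence if sym not in elements]
    # Insert the elements, in the order given, directly after the anchor;
    # the anchor's former follower now follows the last reordered element.
    pos = result.index(anchor) + 1
    result[pos:pos] = elements
    return result


base = ["a", "b", "c", "d", "e"]
print(reorder_after(base, "a", ["d", "e"]))  # ['a', 'd', 'e', 'b', 'c']
```

An element that is not yet in the sequence is simply inserted, matching the "if present" wording; an anchor that is missing from the sequence raises ValueError in this sketch.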
The collating element specified on the previous 'reorder_after' statement specifies the first reordering collation element. The last reordering collation element is followed by the follower to the collation element specified on the 'replace-after' statement. Example: order_start order_end reorder_after reorder_after reorder_end The resulting order is then: 2.5.2.2.8 'reorder_end' keyword The collating reorder entries shall be terminated with a 'reorder_end' keyword. WG15 RIN minuted the following: 3.1.5 18. Discussion of RTN014 [RIN N092] resulted in a decision not to proceed with either 'reorder_after' or 'replace_after' mechanism in locale ordering. ...the debate was however pursued through both the Heidelberg and Annapolis meetings through a series of WG15 action items: 9205-31, 9210-10, 9305-06 - RIN needs to advise WG15 of its decision at Reading. From WG15 Heidelberg, May 1993: 5.2.1 (JTC1 22.21.02.01) Shell and Utilities base {2} DIS The DIS ballot on 9945-2 closes June 6, 1993. Comments and negative ballots are expected. Member Bodies are requested to send electronic copies of ballot comments to the Project Editor (hlj@posix.com). The Project Editor will prepare a preliminary Disposition of comments and circulate this to WG15 in July, 1993. The US will host an Editor's Meeting in conjunction with the October, 1993 WG15 meeting (see open action items 9305-41 and 9305-42). N391 presented the Disposition of Comments on DIS 9945-2: they included - 5. Other. The following comments will result in no changes to the IS, for the reasons indicated: ... Denmark 4: The concept of "binary" or "compiled" locales has been quite popular among implementors of the standard and no attempt has been made to mandate interfaces that would make such implementations non-conforming. The "localedef copy" and "replace-after" modifications proposed here would make binary locales extremely difficult to support. 
Furthermore, they are merely alternatives to existing, standard UNIX (tm) text-file manipulation tools. Since these modifications have received little support in WG15/RIN after repeated discussions, and none from the US development body or any known implementors, they should not be required. From 9945-2:1993 Annex H.1: | (6) A facility should be added to allow simple modifications to | existing locale collation definitions. A proposal for such a | replace_after keyword in LC_COLLATE is being developed by | Denmark. | | This text has been removed from P1003.2b Draft 11, May 1995. 5. Title: removal of NUL special handling [Closed] Keywords: NUL, character, byte Description: Clarification of the form of NUL, to address the problems of null bytes (an eight-bit sequence with all the bits set to zero) appearing in multibyte character strings and appearing to be string terminators to C language library routines. Originator: Dk, J Alternatives: None Documents: N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N281 Disposition of comments on CD 9945-2.2 N294 P1003.2b D4 (Shell & Utilities Amd) Resolution: NUL: A character with all bits set to zero, which is defined as in the character set description file. The NUL character shall compare lower than any other character. Status: The resolution was reached in 1992. Debate recurs regularly but does not add weight to the proposal. History: From WG15 Hamilton, May 1992: N245 contained the Danish MB comments on 9945-2, including: 11. Page 78 line 2212-2213, 2215, page 55 line 1249-1250: We see no need for a specific encoding and collating order for a character NUL, and we request that this be removed. The current specifications make the POSIX specification character-encoding dependent, and make unnecessary constraints on this character when collating. N281 contained the following disposition: This will be considered as part of the P1003.2b revision. 
NUL is the only special character, and that is because it has a special meaning in POSIX: it cannot be included in text files, and it is used to delimit strings in C. Its value is required by ISO/IEC 9899, on which most POSIX.2 implementations will be based. Consequently, it IS special (see also regular expressions). Most of the utilities using the collation definition are processing text strings; certainly neither strxfrm() nor strcoll() can handle nulls except as string terminators. Making NUL the lowest character makes the end-of-string processing simpler and in line with the standard POSIX sorting rules (shorter string sorts before longer). Also leading ellipsis doesn't work if NUL isn't first.

N245 contained the Japanese MB comments on 9945-2, including:

Sect 2.2.2.91 (NUL) OBJECTION. page 37, line 647:

Problem:
"NUL: A character with all bits set to zero" is ambiguous, since by the POSIX definition "a character" means "a multibyte character" in general. It is unclear whether, with the phrase "with all bits .. zero", this definition specifies a single byte null character, a multibyte null character (in generic), or both/neither (regardless of number of bits).

Action:
If it implies a single byte null character, change to:
"NUL: a single byte character with all CHAR_BIT set to zero."
If it specifies a unique null character regardless of number of bits in the POSIX environment, change to:
"NUL: A character with all bits set to zero, which is defined as in the character set description file."

N281 contained the response to this proposal:

It is the second choice. We added a forward pointer to 2.4 in 2.2.2.91, where the requirements for NUL are already listed.

From WG15 Reading, October 1992:

N294, the Shell & Utilities Amendment, Draft 4 contained the following entry:

=> 2.5.2.2.4 Collation Sequence. Remove the following sentence from the second paragraph:

The NUL character shall compare lower than any other character.
Rationale: This change partially satisfies the following requirement from ISO/IEC DIS 9945-2:1992 Annex H: (7) The specific encoding and collation requirements for the character NUL should be removed. The specific encoding was retained because the C Standard {7} requires it. From WG15 RIN Reading, October 1992: 3.1.6 19. It was reported that the requirement for NUL to be handled separately had been dropped. It was suggested that NUL would be defined as in ISO 6429:1988 for all possible character sets. This is to be checked. 921003 The Danish national body is to provide a proposal for a definition of NUL to this group and to the US development body for consideration at its January meeting (Minute 20). From WG15 RIN Heidelberg, May 1993: The RIN Lead Rapporteur was unable to attend. There was no input on the above action item. From WG15 RIN Annapolis, October 1993: The action list was lost. The minutes of the previous meeting [Heidelberg] were scanned to recover as many action items as possible. The action item on NUL was not amongst them. 6. Title: full support for state-dependent charsets [Open] Keywords: charmap, character, encoding, shift-state, state- dependent, stateful Description: A mechanism to allow otherwise-identical byte values to be interpreted as different characters by preceding them by implementation-defined escape sequences. The escape sequence forces a change of state, and thus a different interpretation of: . a subsequent byte (single-shift encoding) or . subsequent bytes (locking-shift encoding). In the latter case, a further escape sequence is necessary to force further state-changes. Originator: J Alternatives: None Documents: N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N281 Disposition of comments on CD 9945-2.2 N330 Japanese comments on Posix .2b/D4 N362 Japan action item report N365 US Action Item Report N436 Japanese action item response for October 1993 Resolution: This is agreed to be necessary. 
The US Development body is expecting input from the Japanese MB to extend the charmap syntax. Status: Open, awaiting comment on Japanese MB proposals in N436. | It appears that this has fallen through the cracks. | There is no further reference to indicate action on this | paper. | It is recommended that RIN request WG15 to action all | MBs to review and comment on N436 with a view to | forwarding a final version to the US development body | for inclusion in 1003.2b History: From WG15 Hamilton, May 1992: N245 and N281 (Disposition of comments on CD 9945-2 in N245) were considered by WG15 Hamilton. They contained: | Sect B.5 (regcomp() family) OBJECTION. page 788, | line 618: | Problem: | | The functions regcomp() and regexec() should have wchar_t | version interface because of the following reasons: | | (1) To use regcomp() and regexec() functions in a program which | handles its internal character data in wchar_t data type, | for example a text editor, it should do the following process: | | 1. convert internal text data from wchar_t | array to char array. | | 2. search pattern using regexec(). | | The conversion should be done every time the program | searches a pattern, for each line. It is too heavy overhead | to such programs and it will make wchar_t based programming | too hard. If wchar_t version of regcomp()/regexec() functions | are provided, no wchar_t-to-char conversion is needed. | | (2) If regexec() is used on a system which uses state-dependent | encoding, the following problem should occur. | | When the function regexec() is called with REG_NOSUB flag in | the cflags argument is not set, and when a match is found, | the function returns matched position in pmatch argument. | | If state-dependent encoding is used, this pmatch information | may be useless because it sometimes will not returns state | information. 
|
| For example, suppose we are using a state-dependent
| encoding, which has two shift states and switches initial
| shift state to another shift state by SO (Shift Out) code
| and returns from another shift state to initial shift state
| by SI (Shift In) code.
|
| If searched pattern is:
| #define SO 0x0e
| #define SI 0x0f
|
| char pattern[] = { SO, 'X', 'Y', 'Z', SI, ' ' };
|
| and the string is:
|
| char string[] = { SO, 'A', 'B', 'C', 'X', 'Y', 'Z', 'U', SI, ' ' };
|
| the regexec() function will return pmatch information which
| says:
|
| pmatch[0].rm_so = 4 (start of matched string)
| pmatch[0].rm_eo = 7 (end of matched string)
| pmatch[1].rm_so = -1
| pmatch[1].rm_eo = -1
|
| But in this case, a naive program will treat the matched
| string as
|
| { 'X', 'Y', 'Z' }
|
| in INITIAL SHIFT STATE, not in ANOTHER SHIFT STATE, because
| returned string position information does not contain any
| state information.
|
| Action:
|
| Define wchar_t version of regcomp(), regexec() functions,
| which takes (wchar_t *) type string argument, not (char *)
| type. Because wchar_t string has no state dependent
| information, this problem does not happen.
|
| It is also useful for programs which treat all character/string
| information in wchar_t type, instead of char type.
| _______________________________________________________________
| RESOLUTION:
| We believe that this subject should be studied for inclusion in
| the POSIX.2b revision and the full international standard. See
| resolution ITSCJ.3.

From WG15 RIN Reading, October 1992:

3.1 7. H Jesperson reported on the WG15 9945-2 ad hoc meetings in Utrecht as follows:-

a. State-dependent encoding was discussed and it was agreed that individual utility options should not handle the problem.

3.1.7 21. A review of Uniforum and X/Open documents on state-dependent text encodings has led the Japanese C-language group to develop a minimal set of functions for their manipulation.
The whole matter of state-dependent encoding is agreed to be necessary, but the question of exactly what needs to be included is left for later consideration and further discussion.

From WG15 Reading, October 1992:

SC22/WG14 working on an amendment for C, Derek Jones is the Project Editor. It is also looking at locale specifications. Japan pointed out that concern has been voiced in RIN about "stateful" encoding. The SC22/WG14 Multibyte Support Extension will introduce this into standard. The issue should be reviewed carefully. The Japanese proposed MSE does not support stateful encoding; however, it is being changed to introduce 6 new functions to support this. It is possible that there could be a mis-match between POSIX and WG14 directions on stateful encoding.

N330, Japanese MB comments on POSIX.2b Draft 4, contained three references to state-dependent encoding problems:

| Sect 2.4.x (State-dependent encoding) DISCUSSION.
|
| Discussion:
|
| [Background]
| ISO CD POSIX.2/D11.2 Ballot resolution on shift (state-dependent)
| encoding issues raised by ITSCJ (Japan) chose the option (c)
| among the following candidates:
|
| (a) State-dependent encoding is out of scope.
| (b) State-dependent encoding is allowed, but it is a
| feature of implementation defined.
| (c) To support state-dependent encoding is one of the
| issues, and it would be considered in the future draft.
|
| [Goal of POSIX.2b]
| ISO DIS POSIX.2/D12 Annex H says:
|
| (8) The support of state-dependent character encoding (*)
| should be addressed fully.
| [*: Original text of POSIX.2/D11 Annex H uses "state-
| dependent character sets". However, it is not an
| appropriate expression.]
| | [Current status of POSIX.2b/D4] | As the first cut, it keeps space holders for | (a) 2.4 Character Set section | (b) 2.5 Locale section | (c) 2.8 Regular Expression Notation section | (d) 4-5 several utilities sections | | [What are must] | (1) give a definition of "state-dependent encoding" or | "state-dependent encoded character set" | (2) give a clear scope of POSIX(.2) on what kind of state- | dependent encodings shall/should/may be supported. | (3) give specification on how to define a state-dependent | encoding in charmap file and/or locale | (4) give specification on how to handle state-dependent | encodings (by what utilities/functions) | | Sect Global (State-dependent encoding) OBJECTION. | | Problem: | | State-dependent encoding features are generic over almost all the | string/character handling functions and utilities. For example, | the following operations are very sensitive. They have to keep | track of "state" transition. | | - string/character search | - substring/character manipulations (add/delete/modify/ | insert/...) | | However, the current POSIX.2b/D4 picked up several utilities for | enhancement of stateful-dependent encoding support. Since the | Japanese Ballot Comments on POSIX.2/D11.2 in terms of state-dependent | encoding issues may not cover all the utilities that would be effected | by state-dependent support, the POSIX.2b/D4 may mislead that other | utilities have no problems on state-dependent encoding support. | | Action: | | In stead of addressing state-dependent encoding support in each | potential utility section (except specific requirements for a | specific utility), create a new subsection in Section 2 to describe | global issues and generic requirements regarding state-dependent | encoding support. | | In particular, list up all the possible character/string processing | operations which shall be carefully done in state-dependent | encoding environments and specify desirable/requested result of | such operations. 
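
Editorial illustration (not part of N330): the objection above says that search and substring operations have to keep track of state transitions. A minimal C sketch of what that tracking might involve follows, assuming a two-state encoding that uses the SO/SI locking shifts from the regexec() example earlier in this entry, and assuming every buffer starts in the initial shift state. The names state_at() and extract_span() are hypothetical and do not come from any POSIX draft or from N330.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift codes, as in the regexec() example quoted earlier in this entry. */
#define SO 0x0e /* Shift Out: lock into the alternate state */
#define SI 0x0f /* Shift In: return to the initial state */

enum shift_state { STATE_INITIAL = 0, STATE_ALTERNATE = 1 };

/*
 * Hypothetical helper: return the shift state in effect just before the
 * byte at `offset`, assuming the buffer starts in the initial state.
 */
enum shift_state state_at(const unsigned char *buf, size_t len, size_t offset)
{
    enum shift_state state = STATE_INITIAL;
    size_t i;

    for (i = 0; i < len && i < offset; i++) {
        if (buf[i] == SO)
            state = STATE_ALTERNATE;
        else if (buf[i] == SI)
            state = STATE_INITIAL;
    }
    return state;
}

/*
 * Hypothetical helper: copy buf[start..end) into out so that the copy is
 * meaningful on its own. SO is prepended when the span begins in the
 * alternate state, and SI is appended when it ends in the alternate state.
 * Returns the number of bytes written, or 0 on bad arguments.
 * The caller must supply room for (end - start) + 2 bytes.
 */
size_t extract_span(const unsigned char *buf, size_t len,
                    size_t start, size_t end,
                    unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(buf != NULL && out != NULL);
    if (start > end || end > len || outsize < (end - start) + 2)
        return 0;

    if (state_at(buf, len, start) == STATE_ALTERNATE)
        out[n++] = SO;
    memcpy(out + n, buf + start, end - start);
    n += end - start;
    if (state_at(buf, len, end) == STATE_ALTERNATE)
        out[n++] = SI;
    return n;
}
```

Applied to the string from the regexec() example with offsets 4 and 7, state_at() reports the alternate state, and extract_span() yields SO, 'X', 'Y', 'Z', SI rather than the bare bytes 'X', 'Y', 'Z'. The sketch rescans from the start of the buffer on every call, which is simple but not efficient; it is only meant to show the bookkeeping that byte offsets alone do not carry.
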
| | Sect 2.4.x (state-dependent encoding) DISCUSSION. | | Discussion: | | [ Support of State-dependent Encoding ] | | | Charmap cannot describe character sets encoded by stateful encoding | schemes well because, in a stateful encoding, there is no one-to- | one correspondence between octet values and characters, and the | same sequence of bytes represent different characters according | to the state that is changed by locking shift escape sequences. | | It is possible to write a charmap for such characters by placing | locking shift to the both sides of character, where the second | locking | | shift specifies the default state. Although this virtually makes a | state-dependent coding stateless, it is not the common practice | as it uses a lot of extra bytes. | | Single shift is an exception. This form of shift is used to change | the state temporarily for interpreting a character that immediately | follows it. In other words, every character in a character set | invoked by a single shift has that single shift preceding it. | Therefore, in charmap, it can be treated as a part of multibyte | characters. Unfortunately, single shifts are by far the less used | than the locking shifts. | | Besides their description in charmap, the support of state-dependent | character sets poses the following problems: | (1) In searching or comparing statefully encoded strings, | byte-par-byte comparison does not always yield valid results. | It is allowed to insert locking shifts at arbitrary character | boundaries even if they are redundant. | (2) In dividing, truncating or making substrings of statefully | encoded strings, simply returning part of them can produce | strange results because they do not contain preceding and/or | following locking shifts. | (3) Concatenated strings may have redundant locking shifts which | causes the comparison problem mentioned above. 
| | In order to alleviate these difficulties, an implementation that | supports state-dependent character sets shall: | (1) process the statefully encoded strings as a concatenation of | state-independent character. | (2) insert (if necessary) locking shifts at the beginning and at | the end of substring to retain correct state information when | extracting substrings of a string. | (3) eliminate redundant locking shifts whenever possible. | WG15 Plenary produced the following action items: 9210-22: Member Bodies: Review WG15/N330 and provide feedback through their RIN rapporteurs. 9210-23: Member Bodies: Bring the issues of stateful encoding within the new WG14 activities to the attention of their national experts, with special care given to issues that may conflict with 9945-2. From WG15 Heidelberg, May 1993: The 9210-22 action was noted as CLOSED: the referenced documents [N362, N365] (US and Japanese AI reports) contain no substantive argument. The 9210-23 action item was noted as Open and redesignated 9305-10: the assignee was changed to Japan: see [N362, N365] From WG15 Annapolis, October 1993: 9305-10 was flagged as Complete at Annapolis. N436, the Japanese MB report to WG15, included an attachment on State- Dependent Encoding Support in POSIX.2: RATIONALE: State-dependent encoding is widely used in Japan and other countries for data communication and data processing. There are several examples: - When using terminals with a terminal server that do not allow 8-bit non-parity transmission, Japanese characters are transmitted to/from terminal with 7-bit stateful encoding. If the host is using 8-bit non-stateful encoding, which is very common situation, code conversion is done within the terminal driver. - For the Internet mail and news message transmission, 7-bit stateful encodings are used in Japan, Korea and Taiwan, because the underlying message transmission protocol, SMTP, does not allow 8-bit transmission (See RFC 821 and RFC 822). 
For detailed description of the encoding used in Japan, see RFC 1468. - On IBM-compatible mainframes using EBCDIC-based encodings, stateful encodings are used to process multibyte characters. This is true not only in Japan, but in Taiwan, Korea and mainland China. But in the current description of the POSIX standards does not fully address the support of state-dependent encodings, as written in the "2.4 Character Set" section of POSIX.2 (Page 61 in DIS 9945-2). Not to prohibit implementing POSIX interfaces on the systems that use state-dependent encodings, some description for state- dependent encoding is necessary. Please note that our intention is not to mandate the support of state-dependent encodings on all POSIX-conforming systems, but just to allow state-dependent encodings as an optional feature. THE CURRENT DISCUSSIONS IN JAPAN: (charmap syntax extension) Currently one proposal to extend charmap syntax to allow definition of state-dependent encodings is proposed. It is very raw idea and not fully agreed one, so some feasibility study is needed to complete the proposal. The idea is to introduce "shift state declaration" syntax in the charmap file. A shift state declaration declares the "shift sequence" (one or more bytes which indicate the change of shift states) to switch into the shift state. If a shift state declaration is appeared, the character set mapping definitions following the definition defines characters in that shift state. The proposed syntax for shift state declaration is as follows: " %s %s\n", , , where: Indicates shift state number (0, 1, 2...). shall be the initial shift state. Indicates shift sequence. The syntax of shift sequence is the same as that of part of character set mapping definition. Indicates comments. 7. 
Title: charmap-based charset conversion [Open/Closed]
Keywords: charmap, iconv, code-set, locale, character
Description:
Originator: WG20, Dk
Alternatives:
Documents:
RIN N111 WG20 NP on Cultural Convention-Set Registry
RIN N112 WG20: Subdivision for cultural convention specification standard
RIN N113 CEN: Information Technology-European Multilingual Ordering
N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities
N281 Disposition of comments on CD 9945-2.2
N284 WG15 minutes, Hamilton, May 1992
N294 P1003.2b D4 (Shell & Utilities Amd)
N330 Japanese comments on Posix .2b/D4
N444 CEN cultural elements registry
N462 Ca: Proposal for inclusion of CHARIDS in next amd 9945-2
N515 US Action Item Report.
Resolution:
Status:
History:

From WG15 Stockholm, November 1991:

New DS Issues:
3. Want command to convert between code sets based on charmaps
Keld has indicated that DS has done this. The DS solution, however, is not known to the other members of the small group.
>> KS to submit proposal. That proposal should be reviewed by RIN, with coordination with the IEEE working group, with the potential of being included in P1003.2b
Ultimate solution should align, where possible, with technology of XPG4 iconv

[I could find no record of an appropriately-titled document to either RIN or WG15 in response to this]

From WG15 RIN Stockholm, November 1991:

4.11. Interface routines for locale and charmap
Keld Simonsen introduced Danish suggestions for interface routines for locales and charmaps, adding that it was related to work in progress within X/Open. Donn Terry pointed out that, when a corresponding well-finished proposal was forthcoming, it should be accompanied by a statement justifying the requirement for such a facility. Given such justification, the facility appeared to him to be suitable as a component of a revision to 9945-1.

From WG15 Hamilton, May 1992:

N245 included Danish member body comments on 9945-2:

5.
We miss a utility that can convert files based on charmaps or locales. The charmaps are the formal place to specify the character sets, and this information should be used also to convert files. As heterogeneous environments become more commonplace, viz. world-wide networking, and some frequent Danish letters occur in different positions in various character sets, there is much need for a specification for scripts and for user extensibility. We intend to have a proposal ready for a later issue of 9945-2, and we see a place for this in a revised "tr" utility. We would like a statement in 9945-2 that this is an area where work is to be done. N281 contained the response to this proposal: We have added a statement to the tr rationale. Such a statement of future intentions is limited by ISO rules to a footnote or informative annex. From WG15 Reading, October 1992: WG15 Plenary considered the responses to the following action item from Stockholm: 9205-09 Danish Member Body to prepare and submit a specific proposal regarding conversions between code sets (based on charmaps, or otherwise; proposal should give appropriate consideration to XPG iconv). (open action item 9111-20) Status: Done - proposal is included in P1003.2b. [I could find no appropriately-titled document to either RIN or WG15 describing the proposal] N294, the 1003.2b (Shell & Utilities Amd) Draft 4, was available at the meeting. The draft included a new iconv utility to convert codesets. N330, the Japanese MB comments on N294, included a number of objections to the iconv section: | Sect 4.73.3 (iconv) OBJECTION. page 72, line 2022: | | Problem: | | [iconv command option] | | The description of the "-f fromcode" option says that "If the | option-argument is the pathname of a readable file, iconv shall | attempt to use it as a charmap file, as defined in 2.4.1." 
This | semantics may cause unexpected results depending on the current | working directory, because if a file or a directory in the | current directory happens to be the same name of "fromcode" (or | "tocode"), iconv will treat the file as charmap file. This | behavior restricts users to use file name same as codeset name. | Because there are no standards for charmap file name, it will be | impossible to use iconv command in a portable manner. I think | there should be a mean for users to specify explicitly the | "fromcode" and "tocode" arguments to be used as charmap files. | | Action: | | There are three proposals for the modification of iconv | specification. | | (1) The first proposal is to add a new option, "-c", to specify | the "fromcode" and "tocode" option-arguments are charmap file | names. If "-c" option is not specified, iconv will treat | "fromcode" and "tocode" option-arguments as implementation- | defined codeset names. | | Change the description of "-f fromcode" option (lines 2021-2028) | to: | | -f fromcode | Identify the codeset of the input file. Valid values | for fromcode are specified in the system documentation. | If this option is omitted, the codeset of the current | locale shall be used. | | and add the following option description after the line 2030: | | -c Treat the fromcode and tocode option-arguments as the | names of charmap files. If the option-arguments are | the pathnames of readable files, iconv shall attempt to | use them as charmap files, as defined in 2.4.1. If the | readable file is not a valid charmap file, the results | are undefined. If the option-argument is not the | pathname of a readable file, the results are | implementation defined. | | (2) The second proposal is to add new set of options which | specify charmap file names. In this proposal, "-f fromcode" | option is always used to specify codeset name. To specify | charmap file, you must use "-F fromcharmap" option. 
| | Change the description of "-f fromcode" option (lines 2021-2028) | to: | | -f fromcode | Identify the codeset of the input file. Valid values | for fromcode are specified in the system documentation. | If this option is omitted, the codeset of the current | locale shall be used. | | and add the following option description after the line 2030: | | -F fromcharmap | Identify the codeset of the input file. If the option- | argument is the pathname of readable file, iconv shall | attempt to use them as charmap file, as defined in | 2.4.1. If the readable file is not a valid charmap | file, the results are undefined. If the option- | argument is not the pathname of a readable file, the | results are implementation defined. If this option is | omitted and -f fromcode option is not specified, the | codeset of the current locale shall be used. If both | of the -F fromcharmap and the -f fromcode options are | specified, the results are undefined. | | -T tocharmap | Identify the codeset of the output file. The semantics | are equivalent to the -F fromcharmap option. | | (3) The third proposal is to add a mechanism to identify fromcode | (or tocode) option-argument is charmap filename or not. In the | following description, if fromcode or tocode option-argument has | a character in it, it will be used as charmap file. | | Change the description of "-f fromcode" option (lines 2021-2028) | to: | | -f fromcode | Identify the codeset of the input file. If the option- | argument contains character in it and the | pathname of a readable file, iconv shall attempt to use | it as a charmap file, as defined in 2.4.1. If the | readable file is not a valid charmap file, the results | are unspecified. If the option-argument does not | contain character, the results are | implementation defined. If this option is omitted, the | codeset of the current locale shall be used. | | | Sect 4.73.5.3 (iconv) OBJECTION. 
page 73, line 2058: | | Problem: | | [LC_CTYPE environment variable description of iconv command] | | In the description of "-t tocode" option of iconv command, it | says that "The semantics are equivalent to the -f fromcode | option." and the last sentence of "-f fromcode" says "If this | option is omitted, the codeset of the current locale shall be | used." It means that if the "-f fromcode" option is specified | and the "-t tocode" option is omitted, the codeset of the current | locale is used as the output file's codeset. This behavior | should also be noted in the LC_CTYPE description. | | Action: | | Add the following sentence after the line 2058: | | If -t tocode option is omitted, this variable shall | determine the codeset of the output file. | From WG15 RIN Annapolis, October 1993: Mapping locales on to the underlying character set is problematic. There is the charmap approach, but there are misgivings that this is inelegant at best and inefficient in the case of large character sets, such as used by the Japanese. 9310-07 MBs are asked to consider the impact and problems associated with the support of locales by the charmap mechanism, and to consider the need for the establishment of a charmap registry. Responses to RIN Lead Rapporteur prior to the WG15 meeting, May 1994. 9310-08 Lead Rapporteur to report to WG15 that RIN is considering the need and possible alternatives for charmaps. RIN is looking for technical input on whether charmaps provide the best solution to the problem. RIN notes that CEN is currently constructing a charmap registry, , and that WG20 are also taking this approach - and refer. MDR-10 -> RIN N111 MDR-11 -> RIN N112 MDR-12 -> RIN N113 From WG15 Annapolis, October 1993: Plenary considered N515, the US action item report, which responded to AI 9405-56: 9405-56 United States: Forward N444 to PASC for possible inclusion 1003.2b and report back to WG15 on actions taken; reference WG15 resolution 94-283. 
(Closed)

CLOSED...The US has identified two proposals for change to 9945-2 presented in N444. The first of these is the Charsymbmap proposal described in section 6.9. We believe this proposal to be essentially the same as the Canadian CHARIDS proposal contained in N462. See the response to action item 9405-55. The second proposal is the "replace-after" proposal described in Annex A. The US believes this extension to be unnecessary as demonstrated in Annex A.4 of the same document.

Denmark had problems with the US responses here. This was discussed in WG15 Plenary as follows:

4.9.2 Charsymbmap (US report back on [N444]) [N515]
Denmark believes they have consensus on this proposal now. Canada disagrees. The US response in N515 to Action item 9405-56 states that they believe the proposed extension to be unnecessary, the functionality being provided by the CHARID proposal - see above. Germany noted that if CEN adopts the charsymbmap proposal then Europe would have two incompatible standards - Posix and charsymbmap. Denmark suggested that the WG15 review of 1003.2b D10 should resolve any outstanding issues. The Canadian (CHARID) solution addresses a smaller set of problems than the Danish (charsymbmap) proposal. It may be possible to resolve any shortfall in CHARIDs by suitable proposals to enhance it from the European members.

8. Title: "file" user-specified recognition algorithm [Closed]
Keywords: file, utility, locale, file-types, LC_CTYPE
Description: A proposal to extend the set of file types recognised by the "file" utility by adding a command-line parameter specifying a file containing descriptions of file types.
Originator: Dk
Alternatives: None
Documents:
N271 Dk: Danish comments on 9945-2 Amd 1
N282 Disposition of comments on CD 9945-2 Amd 1
Resolution: This proposal was accepted and will be added in the final standard.
Status: Accepted and closed.
History: From WG15 Stockholm, November 1991: N271 is the first relevant document on the subject of the "file" utility: Danish comments on 9945-2 Amd 1 Sect 5.14 OBJECTION, page 163 Problem: The specification of the FILE-utility is too small a subset of implementations normally seen. a. It should as a minimum be possible to extend the number of file-types recognised in a reliable (or unreliable way). We need something like the /etc/magic-filetype-specification. b. It should be possible to test, if a file is of type text according to the LC_CTYPE class printable. Action: 1. Add a fileformat-specifications. Use /etc/magic if nothing better is available. Could be an option like [-m file] 2. Add the ability to recognise (printable) text according to the locale. This may also be done with an option like -t or with a separate utility. _______________________________________________________________ RESOLUTION: 1. This will be considered for inclusion in POSIX.2b. 2. This will be added in the final standard. 9. Title: "pax" extended character set support [Open/Closed] Keywords: Description: Originator: Alternatives: Documents: (WG15RIN.185) pax -e comments N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N266 SC22/WG14 N197: Support for symbolic character names N281 Disposition of comments on CD 9945-2.2 Resolution: Status: History: From WG15 Stockholm, November 1991: The Danish MB's comments on CD 9945-2, quoted from N245 include: "7. We want the text for 'pax -e' (in previous drafts) to be included, as we need a better quasi-portable way of transporting such files. It may be included in Annex F. "It could be included in the normative part of the standard at a later stage, and we would like indications in the standard that an extended exchange format is being planned." The response, in N281, was: RESOLUTION: The text has been added to Annex G (the previous F). 
Statements about future plans are already in the draft (See D11.2 page 551 lines 9965-68 and page 558 lines 10251-65). From the plenary: New DS Issues: 1. Want pax -e in -2.2 Canada needs to have a meeting of their TAG to determine position. Will meet in December and advise Hal. (WG15RIN.185) pax -e comments: Keld, here are the pax -e objection texts. Hal The -e stuff is very complicated and there is a lack of standardized C language support to implement this feature. Trying to standardize this at this point is a mistake. Why not place an optional record in one of the archive headers that states "this archive was created in the foobar locale" and leave it up to the recipient to handle the foobar locale? Even with -e the way it is stated, there is no guarantee that any locale but the portable one will be properly handled by recipients. ---------------------- Problem: I stated this once before -- it deserves repeating: The creeping proliferation of charmap is getting out of control. The charmap started out to be a simple and straightforward device to allow code set independent specifications of locale definitions. It is trying to generate a life of its own. It is this type of thing that causes those who do not have an appreciation for internationalization to oppose any and everything having to do with internationalization and characters and character sets beyond ASCII. I am strongly opposed to the -e option of the pax utility and the introduction of charmap where it should not be. The introduction of the -e option and charmap to the pax utility only serves to reduce consensus on POSIX.2. Action: Delete "[-e charmap]" from lines 9614, 9615, and 9617. Delete lines 9694-9713. Delete lines 10140-10170. ---------------------- Drop this whole mess. It's too new, I don't think that it's well thought out in the context of the full problem. The time to address this class of issue is when the new file format is addressed.
When the full file format is addressed, this can be done in concert with controlling the format and having the ability to represent both very long file names and to indicate the character set in use. (The use of -e could cause distinct filenames to be truncated to the same name.) Asking for warnings when a name might not translate is OK with me. From WG15 Hamilton, May 1992: Keld's Proposal (N266): Danish proposal adding two functions was discussed. One function takes a code point and returns the symbolic character name. The other function takes a symbolic character name and returns the code point. Dk explained these would be used in the implementation of things like pax -e, and iconv(). Some discussions about the first record of the new pax format containing a character set name. There still needs to be a translation between code pages, that the symbolic name routines do not help with. Keld is concerned that industry groups are leaning towards the use of symbolic character names. Additionally, there are a number of Danish proposals in the pipeline which depend on this particular proposal. Donn Terry is still concerned with general portable applicability. Because the timing of iconv() and pax -e are still indeterminate and these routines are being proposed solely because of these, it was felt it is too soon. 9205-40 US Member Body: Forward the Danish proposal, N266 to IEEE POSIX.1 for their review. From WG15 Reading, October 1992: This action was noted as Complete at the start of the WG15 Reading meeting. 10. Title: C MSE widechar support [Open] Keywords: Description: Originator: Alternatives: Documents: RIN N105 Japanese Comments on POSIX.1a (MSE) RIN N106 Japanese Proposal to POSIX 1003.2b N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N281 Disposition of comments on CD 9945-2.2 Resolution: Status: History: From WG15 Stockholm, November 1991: a. 
SRTN8, Japanese concerns re CD 9945-2 - Japan would like to make this document visible to other countries - need to assign number although Japan plans to expand the document and deliver more detailed response before the end of the year. Japan needs to handle multiple char sets simultaneously, per ISO 2022; data files often contain various escape sequences which indicate which char set data follows; discussion of these requirements in relation to nature of LC_CTYPE: - Hal indicated that he did not feel that LC_CTYPE would prevent interpretation of command line args consistent with Japanese needs - item 3 on Pg2 of comments really deal with 9945-1 features? Japan has difficulty dealing with wide char data with traditional Lib C; would like to see wide char handling capabilities in .2 utilities, both for functionality and as an example of wide char handling for programmers. Japan is not sure whether it would be more appropriate to include wide char (ISO C/MSE) features in .1 or .2; .1a might be the appropriate place to include these extensions. (Although it might be feasible to include in the LIS spec, WG15 has told US body that LIS MUST be the same as the 1990 standard, thus no extensions could be included). Hal suggested that these comments be included in the Japanese ballot, so that they would be on record officially, and the US could deal with them as work on .1 AND .2 proceed. A Japanese "Yes" vote with this comment, creating a WG15 issue, would allow Hal to insist that extensions be included in .2b (and .1a) [N245 includes the Japanese MB comments on 9945-2, and details the Japanese MSE proposal]. From WG15 Reading, October 1992: SC22/WG14 working on an amendment for C, Derek Jones is the Project Editor. It is also looking at locale specifications. Japan pointed out that concern has been voiced in RIN about "stateful" encoding. The SC22/WG14 Multibyte Support Extension will introduce this into standard. The issue should be reviewed carefully. 
The Japanese proposed MSE does not support stateful encoding, however is being changed to introduce 6 new functions to support this. It is possible that there could be a mis-match between POSIX and WG14 directions on stateful encoding. WG15 Reading produced the following Resolution: RESOLUTION 92-223 9945 multibyte/wide character handling Whereas the current ISO/IEC 9945-1 (POSIX.1) does not support any APIs for multibyte/wide character handling that are defined by ISO/IEC 9899 (C Language), and Whereas the DIS 9945-2 (POSIX.2) does specify generic character handling features based upon a character definition that "a character means a sequence of one or more bytes representing a single symbol", and Whereas an amendment to ISO/IEC 9899 is scheduled in 1993, in which Multibyte character Support Extensions (MSE) are proposed to provide a set of functions for multibyte/wide character handling, aiming at improvement of worldwide portability of C programs that need generic character handling capabilities, and Whereas the CD 9945-2 Ballot Dispositions and the POSIX.2b Draft has indicated that certain extensions will be needed in conjunction with the proposed ISO C MSE and its derivatives in the POSIX environment, and an API part of which should be included in a future amendment to 9945-1, Therefore, SC22/WG15 requests that the US: 1. Consider the LIS and language-binding interface changes necessary to handle character-oriented features as symbol and not storage patterns for a future revision of 9945-1. 2. Inform SC22/WG15 of any plans for supporting such features in future revisions of all parts of the 9945 Standard. From WG15 RIN Annapolis, October 1993: RIN considered two papers submitted by Japan, touching on the MSE issue - N105, N106 From WG15 Annapolis, October 1993: 22.39 Extensions to base {1a} na Japan will be proposing the inclusion of the 'C' MSE amendment in the Posix series of standards. This is still under discussion in WG14. 
Flags have been raised within RIN that this will happen. From WG15 RIN Twente, May 1995: 3.1.11 C MSE widechar support --Japan will make a proposal--open 11. Title: Invariant ISO 646 support [Closed] Keywords: ISO 646inv, shell, awk, 9945-2, ISO 10646 Description: A proposal to permit the characterset defined by ISO 646 inv in the shell and the small languages supported by the POSIX Shell and Utilities standards. Originator: Dk Alternatives: a) No change b) Support ISO 10646 Documents: RIN N047 A representation for the shell in ISO 646 N323r WG15 RIN N096: Minutes & resolutions, Reading, October 1992 N416 Invariant ISO 646 support in Posix 9945-2 Resolution: RIN regards the issue as dead. WG15 and the US development body also regard the proposal as being rejected. Status: Issue in RIN is closed: the issue is now between Dk and WG15 who have invited Dk to supply further documentation to support their proposal. History: From WG15 Stockholm, November 1991: c. RIN SRTN7/N047, A representation for the shell in ISO 646 >> Proposal from Denmark relates to a long identified problem and an inconsistency with the recommendations of ISO TR10176 (programming languages should not use certain characters; note that TR10176 states that it may not be globally applicable, and seeks further input; 9945-2 may be a case in point) Issue that proposal addresses is the ability of using national characters within file names etc, without impact on shell interpretation (e.g. Danish "slashed-O" occupies the same space as the POSIX pipe symbol, thus file names cannot include a slashed-O without the shell interpreting that character as a pipe). Presentation of national characters on displays and printers is a separate issue. From WG15 RIN Reading, October 1992: 3.1.15 28. The Danish draft on invariant ISO 646 is seen as a rehash of the original trigraph proposals to digraphs. This should be approved by WG14 [!] before this issue may be re-opened in this group. Closed pending such approval. 
From WG15 Heidelberg, May 1993: 9305-04 Denmark: Expand and clarify proposal contained in RIN N047 regarding usage of national characters (as defined in ISO 646 national positions), giving consideration that such proposal: 1) addresses all aspects of proposed standard, rather than JUST the shell, (e.g. it should work with not only shell, but also regular expressions, awk, etc.) 2) should allow use of all features of the proposed standard maintaining conformance, (e.g. currently proposed use of " " would conflict with existing use) 3) should provide a general solution for similar requirements of other countries 4) should be sensitive to the cost/benefit ratio of imposing the solution in relation to existing implementations (open action item 9111-25, 9205-11, 9210-4) From WG15 Annapolis, October 1993: The above action was noted as closed. From WG15 RIN Annapolis, October 1993: RIN AI 9305-05 Invariant ISO 646: Input required from Denmark. This action was noted as (Open) going into the RIN meeting - but was not present in the list of actions at the end of the meeting, possibly due to the appearance in WG15 of: N416 Invariant ISO 646 support in Posix 9945-2 22.41 additional utilities {2b} CD reg: [N416, N420] Proposed action on the US to take these on board. Nl accepts N420 proposal, but regards the N416 document as representing old technology superseded by ISO 10646. The original action was on Dk to provide these papers as additional information to the US. Done deal. N416 and N420 will be passed to the US for comment. From WG15 Tokyo, May 1994: 9405-52 United States: Review N416 and N420 and forward them to PASC for consideration.
From WG15 Vancouver, October 1994: This action was flagged as (Closed) in the review of action items going into the WG15 Vancouver meeting; debate on the item was summarised as: 5.2.3 22.41 additional utilities {2b} CD reg: [N416,N420] Denmark is not happy with the response (not going to include extended characterset support because it would reduce consensus) to its request and would like to enter into a dialogue with the IEEE group responsible. Denmark is invited to offer further supportive argument. From 9945-2:1993 Annex H.1: | (2) The shell, awk, other small languages, and regular expressions | should be supported by national variants of ISO/IEC 646 {1}. A | proposal from Denmark is expected in this area. | | This text has been removed from P1003.2b Draft 11, May 1995. 12. Title: charsymb/CHARIDS [Open] Keywords: CHARIDS, charmap, locale, localedef, UCS, code-point, code-set Description: A mechanism to enable the automated production of a charmap file through the addition of a reference to a code-point in ISO 10646 for each symbol in the CHARID file. Originator: Ca Alternatives: charmap (?) Documents: RIN N127 Procedures for European Registration of Cultural Elements, CEN draft 5 N316 Canadian contribution to SC22/WG20 - Short character names N462 Ca: Proposal for inclusion of CHARIDS in next amd 9945-2 N515 US Action Item Report. N554 Ca Action Item Report N555 US Action Item Report (SC22WG15.498) Comments on WG15 Action Item 9410-24 (Canadian questions) Resolution: None as yet. Status: Open. The proposal to extend the charmap file to accommodate references to code points appears to have been accepted. Debate currently addresses how this is best achieved.
History: From WG15 Tokyo, May 1994: Plenary considered N462, the Canadian MB contribution on CHARIDS: Introduction: As defined in the current text of ISO/IEC 9945-2 a locale definition file that uses mnemonic character naming cannot stand alone, but must be associated with a Charmap file that maps the mnemonic names to code points. This mapping is necessarily dependent on the character set in use. Therefore any locale definition requires: - the locale definition file; - at least one CHARMAP file; - for each CHARMAP file, a statement of what character set it corresponds to; Further there is no standardized machine-readable way of specifying the second and third items. As a result it is not possible to write a locale definition that is independent of implementation. ... Proposal: We are in the process of defining a Canadian Locale and we need to make this definition both unambiguous and implementation independent. We propose a "CHARIDS" file to address this deficiency. We feel that this is an international requirement and should be included as a normative amendment to ISO/IEC 9945-2. The "CHARIDS" file would be very similar to CHARMAPS. The only differences are that the file/header name is CHARIDS and that the character value operand is a reference to a code point in ISO 10646. This permits any implementation, given a way of mapping ISO 10646 to the desired character set, to produce a corresponding CHARMAP file, without human intervention. Note that the existence of a CHARIDS mechanism does not preclude the use of CHARMAP files as currently specified. Document ISO/IEC JTC1 SC22/WG15 N316 outlines an approach based on ISO 10646 that we feel satisfies the CHARID requirement.
The header and trailer would be as follows: CHARIDS END CHARIDS Between these two statements the symbol definitions would look like "optional comment" where: is a symbol representing a character and used in the LOCALE definition: would be U (standing for UCS) followed by the hexadecimal coding value attributed to that character in ISO/IEC 10646 (4 hexadecimal digits); mapping of UCS coding to the actual code used by an environment would be implemented by this particular environment's designers/implementors/providers, based on this standard reference. It should be noted that X/Open already uses this approach although it is not standardized. Canada plans to use this syntax in its LOCALE definition. ... The discussion took place at agenda point 6.6: 6.6) CHARID (Canada) Reference N462 This is a better way of doing charmaps based on Canada's experience in this area. This document has been presented to WG20 who has accepted it. The CEN registry and X/Open are aligned with this proposal. Canada would like to give this to PASC for inclusion in 1003.2b. Resolution forwarded to the drafting committee to forward this to PASC. Action item 9405-55 on the United States to forward N462 to PASC for inclusion in 1003.2b and report back to WG15 on actions taken. From WG15 RIN Vancouver, October 1994: 3.1.13 Charsymb/CHARIDs (N119, N127) There was discussion over conflicting proposals (conflicting to a minor extent) presented by Mr. Kriger and Mr. Simonsen. Mr. Kriger noted he believes the US-proposed changes will not be upwardly compatible. Mr. Simonsen explained why they would. Mr. Hill noted that the US response to SC22/WG15 action item 9405-55 is relevant. Mr. Hill noted the US expects substantive discussion of this item to take place in SC22/WG15. From WG15 Vancouver, October 1994: Action item 9405-55 was noted as complete.
The US AI report, N515 refers: CLOSED...The US believes the proposal is not complete since it does not provide any way to transform CHARIDS files into charmap files. Therefore there still isn't a way to create portable locale definitions. A couple of straightforward extensions to the localedef utility and the charmap files in 9945-2 will provide a portable way to define locales. We believe this is the intent of the Canadian proposal. The following list summarises changes the US proposes as an alternative solution to this problem: 1. Expand the legal values for the RHS of the charmap file to include UCS2 and UCS4 values. These values would be of the form and , respectively. 2. Add a -u option to localedef to indicate the target code-set to be used by the compiled locale. If the -u option is given then all the values of the forms and will be translated from those UCS2 and UCS4 values to corresponding code-points in the code-set specified by the -u option. 3. That implementations have localedef predefined mappings for the standard symbolic names for characters in the character set defined by 9945-2 Section 2.4. The US believes that these changes would allow application writers to build portable charmap and locale source definition files that could be used on any implementation providing the 9945-2 option that includes the localedef utility as long as the implementation recognised the target code-set for the compiled locale. The US intends to flesh out this proposal for inclusion in the next distributed draft for IEEE ballot of P1003.2b. The proposal was not received by the US in time for distribution to SC22/WG15 in Draft 10. If you have any comments, the US would appreciate receiving them in time for discussion at our January IEEE PASC meetings.
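[Editorial illustration, not part of N462 or N515.] Both the Canadian and the US proposals rest on the same step: given a symbol tied to an ISO 10646 code point, derive the byte encoding in a target code set without human intervention. The following is a minimal sketch of that step under stated assumptions. The "U" plus four hexadecimal digits value form is the one described in N462; the angle-bracket symbol names follow existing 9945-2 charmap practice; the sample symbols, the comments, and the function name charids_to_charmap are hypothetical; and Python's built-in codec tables stand in for whatever mapping from ISO 10646 to the target code set an implementation would actually supply. It does not reproduce the syntax finally adopted in P1003.2b.

```python
import re

# Hypothetical CHARIDS-style input. Value form "U" + 4 hex digits is taken
# from N462; the symbol names and trailing comments are invented examples.
CHARIDS_TEXT = """\
CHARIDS
<A>        U0041  LATIN CAPITAL LETTER A
<ae>       U00E6  LATIN SMALL LETTER AE
<o-slash>  U00F8  LATIN SMALL LETTER O WITH STROKE
END CHARIDS
"""

# symbol, whitespace, U + exactly four hex digits, optional comment
ENTRY = re.compile(r"^(<[^>]+>)\s+U([0-9A-Fa-f]{4})(?:\s+(.*))?$")


def charids_to_charmap(text, codeset):
    """Return charmap lines for `codeset`; unmappable symbols are skipped.

    `codeset` is a Python codec name, used here as an assumed stand-in for
    the target code-set name an implementation would recognise.
    """
    out = ["CHARMAP"]
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # header, trailer, blank line, or anything unparsed
        symbol, hexval, comment = m.groups()
        try:
            encoded = chr(int(hexval, 16)).encode(codeset)
        except UnicodeEncodeError:
            continue  # character does not exist in the target code set
        # Charmap hexadecimal constants, concatenated for multi-byte values.
        value = "".join(f"\\x{b:02X}" for b in encoded)
        out.append(f"{symbol} {value}" + (f" {comment}" if comment else ""))
    out.append("END CHARMAP")
    return out


if __name__ == "__main__":
    print("\n".join(charids_to_charmap(CHARIDS_TEXT, "iso-8859-1")))
```

Silently skipping symbols that the target code set cannot represent is a choice made for brevity here; a real tool would more plausibly report them, since a locale source that references a dropped symbol would no longer compile cleanly.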
The WG15 Plenary discussion on this was as follows: 4.9.1 Charid (US report back on [N462]) [N515] Canada raised a query on why the US response to 9405-55 in N515 offered the changes it did, and what the rationale for them was. The US could offer no immediate explanation, and offered to get a more detailed response, to be distributed by email. Canada to consider whether the changes have the effect required. The US had brought a number of copies of Draft 10 of 1003.2b, currently being distributed through the SC22 secretariat, which they invited comments on from WG15 MBs, preferably direct to the IEEE group. WG15 AI 9410-24 was created to require the US to provide Canada with the rationale. From WG15 Twente, May 1995: N555, the US report, included the following: 9410-24 United States: Distribute to the WG15 Email list the details on its proposal on CHARIDS, (see action item 9405-55) and US Response (SC22/WG15 N515) Response: CLOSED The resulting changes to P1003.2b will appear in Draft 11 of that document. Draft 11 was being prepared at the 4/95 PASC Meeting and is already approved for distribution as CD/PDAM Registration and Ballot. This was mailed to cpwg-mail@revcan.ca and SC22WG15 mailing list on 4/27/95. [As (SC22WG15.498)]: IEEE P1003.2 N269 April 26, 1995 SC22/WG15 US TAG N520 Topic: Response to SC22/WG15 Action Item A9410-24 From: Donald W. Cragun The questions submitted by Canada with our responses are below: 1) a) Could the US present the precise format of the proposed new charmap file? Draft 10.9 will be available from the US delegation at the Enschede meeting. Draft 11 will be distributed for concurrent registration and ballot soon. b) Specifically, could the US explain the relationship of the new proposed field to the portion of each line that is now considered "comment" or explanatory material? The proposal does not include a new field. It just allows two additional forms for specifying the part of the existing forms.
The portion of the lines between CHARMAP and END CHARMAP are not changed. c) Has the been used to delimit RHS comments (i.e. those comments that do not start at the beginning of the line)? Empty lines and lines starting with the are comments. The field can contain any characters (within the context of a line in a text file). Comments are separated from the by one or more characters. A could be used after the required as a convention to make the charmap files easier to read by humans, but are not required by the current standard or the proposed changes. 2) a) Could the US explain the need for the addition of a new parameter to the localedef utility? The new option (-u code_set_name), specifies the name of a code set to be used as the target mapping of character symbols and collating element symbols whose encodings are defined in terms of ISO 10646 position constant values. b) Would not a similar effect be achieved by manipulating the charmap with the standard text utilities and then using the existing localedef utility? None of the other standard utilities specified in 9945-2 (even the iconv utility in P1003.2b) is designed to translate from ISO 10646 16- or 32-bit values encoded as strings of the form or to octal, decimal, or hexadecimal encodings of the forms expected in charmap files by localedef. Scripts could be created using awk or sed to perform these translations manually, but the P1003.2 working group believes that implementations should be able to translate from 10646 to codesets supported by the implementation without manual assistance. 3) a) Could the US explain what is meant by "... have localedef predefine mappings for the standard symbolic names for characters in the character set defined by 9945-2 Section 2.4"? Canada is aware that 9945-2 specifies standard symbolic names for the characters referenced in Section 2.4. Canada's question relates to the "... localedef predefine mappings ...". 
Since the 10646 encodings for all of the characters in Table 2-4 in section 2.4 of 9945-2 are always the same, they need not be specified in charmap files that are encoded using the new formats; localedef will be required to supply the encoding information using the values specified in Table 2-4 implicitly. 13. Title: regexps [Open/Closed] Keywords: Description: Originator: Dk Alternatives: Documents: N170r WG15 RIN N036: Minutes & resolutions, Rotterdam, May 1991 N245 Summary of voting & comments on 2nd CD 9945-2: Shell & Utilities N281 Disposition of comments on CD 9945-2.2 Resolution: Status: History: From WG15 RIN Rotterdam, May 1991: 3.2.1.3. Regular expressions There was a serious error in the definition of longest leftmost match for regular expressions in the last draft of 1003.2. This will be fixed. The issue of when '$' and '^' are special in regular expressions is contentious. Some want 'ab$cd' to be allowed ('$' not special); others want it to be illegal (as it is in extended regular expressions). Traditionalists counter by saying that this would break too many existing scripts, and will probably win the day. RIN is happy with this situation. The result of the application of a regexp to a sequence of characters containing an embedded null is currently permitted; there has been an objection to this, as current practice in the C language and utilities written therein is that null is special. This suggests that the issue is language-dependent: RIN is in favour of putting language in the LIS which does not require that null is special, but allowing bindings to make it (or perhaps some other character) special if they wish. tr no longer knows about multi-character collating sequences, or, indeed, anything much relating to regular expressions. From WG15 Hamilton, May 1992: N245 included a number of Danish MB comments on the 2nd CD of 9945-2, including: 13. 
We are still not satisfied with the current regular expression syntax, but we have no better solution at present. N281, the Disposition of Comments, responded: No action proposed. From 9945-2:1993 Annex H.1: | (2) The shell, awk, other small languages, and regular expressions | should be supported by national variants of ISO/IEC 646 {1}. A | proposal from Denmark is expected in this area. | | This text has been removed from P1003.2b Draft 11, May 1995.