Dear ISO/POSIX and IEEE/POSIX.2 members,

Although the SC22/WG15 9210 Draft Minutes, which Martin Kirk is now distributing through email, do not record my comments at the 9945-2 session (4.2), I said at that time, in conjunction with Hal's report on the POSIX.2b Ad Hoc Meeting in Utrecht, that we, the Japanese Member Body, would like to make our POSIX.2b/D4 comments visible to all WG15 member bodies and concerned experts. Hal and Keld also asked me to send an online version to them or to the group(s). I am therefore enclosing it for your ease of review and reference.

Please note that (as Hal reported at the WG15 plenary) the Japanese comments were discussed at length at the POSIX.2b Ad Hoc in Utrecht (not at the RIN meeting in Reading). As a result, some controversial points were resolved by consensus with the IEEE dot2 experts, some discussions led to new ideas and proposals, and some issues (such as stateful encoding support) remain open.

Based upon the Utrecht discussion, the IEEE POSIX.2b development group (or its technical editor, Hal) is expected to draft a new version of POSIX.2b soon, and the IEEE P1003.2 9210 meeting minutes are expected to record these discussions. For more details, please refer to the IEEE minutes and/or the upcoming POSIX.2b Draft 5, or contact Hal (hlj@posix.com) or the Japanese POSIX WG (posix@ccut.cc.u-tokyo.ac.jp) directly if you have any (urgent) questions or comments.

Thank you for your co-operation.

Best Regards,

Yasushi Nakahara
TOSHIBA Corp.
Phone: +81 428-33-1346|1347
Fax:   +81 428-32-0018
Email: ynk@ome.toshiba.co.jp | tsbome!ynk@u-tokyo.ac.jp | ..!tsbome!ynk


ISO/IEC JTC 1/SC22/WG15 N330
ISO/IEC JTC 1/SC22/WG15 RIN N087 (RTN-009)
IEEE TCOS P1003.2/N160

Japanese Comments on POSIX.2b Draft 4

JSC22/POSIX WG
IPSJ/ITSCJ, Japan
October 23, 1992

1. Introduction

These are the comments on POSIX.2b Draft 4 (August 1992) from the IPSJ/ITSCJ JSC22/POSIX WG, the Japanese National Body of JTC1 SC22/WG15 (POSIX). The comments focus mainly on the Annex H issues, particularly the following points:

(3)  User-specified additional character classes [LC_CTYPE and (E)RE]
(4)  User-specified names for collation weights [LC_COLLATE and related APIs and Utilities]
(8)  State-dependent encoding support [LC_CTYPE or charmap, and related APIs and Utilities]
(9)  Column position/width support [LC_CTYPE or charmap, and related APIs and Utilities]
(15) Wide character counterparts of POSIX.2 related APIs

We also include other (new) comments on POSIX.2 and POSIX.2b.

The following email address is available for the ITSCJ/JSC22/POSIX WG. Please send questions and comments to:

    Email: posix@ccut.cc.u-tokyo.ac.jp
    JSC22/POSIX WG
    Information Technology Standards Commission of Japan
    Kikai-Shinko Kaikan Bldg., 3-5-8 Shiba-Koen, Minato-Ku,
    Tokyo 105, Japan
    Tel: +81 3-3431-2808
    Fax: +81 3-3431-6493

2. General Discussion and Comments

2.1 Additional Character Class [Annex H: (3)]

The proposed additional features and their grammar in POSIX.2b/D4 seem acceptable to the JSC22/POSIX WG, except for the following points:

- A new (set of) API(s) to handle these features should also be included in POSIX.1a or another appropriate amendment to POSIX.1.
- Harmonization with ISO/C MSE and XPG4 is requested.

2.2 Ordering and Naming for Collation Weights [Annex H: (4)]

We are proposing new specifications. See the detailed discussion section below.

2.3 State-dependent Encoding [Annex H: (8)]

We consider the following points to be our major concerns.

(A) What kinds of state-dependent encodings should be supported within the POSIX scope?
(B) How should the various (types of) state-dependent encodings be specified in an LC_CTYPE or charmap source file?
(C) How should they be named?
    This matters because one state-dependent encoding may include several character sets, each of which may have its own charmap or LC_CTYPE database with its own name and definition.
(D) How should the behavior of each utility (of POSIX.2) and each API (of POSIX.1x) be specified with respect to state-dependent encoding support? Is it sufficient simply to take the following approach?

- Since state-dependent encoding features are generic over almost all string/character handling utilities and APIs, one specific new section should be created in POSIX.2b to address them.
- Assuming that the word "character" is used correctly throughout POSIX.2 (or the set of POSIX standards) [this should be checked, though] in the sense that POSIX.2 defines, i.e. "a sequence of one or more bytes representing a single graphic symbol, i.e. a multibyte character" (regardless of whether the encoding is stateless or stateful), the individual utility sections may not need additional descriptions of state-dependent encoding handling unless a specific requirement or remark is needed.

See the detailed section for further discussion and our proposals.

2.4 Character Width and Column Position [Annex H: (9)]

The ISO POSIX.2 CD Ballot Comments Disposition (SC22/WG15 N281) and the recent Project Editor's report on "POSIX.2 update" (email SC22.WG15.110) said:

    Certain approach(es) should be studied for inclusion in the POSIX.2b revision and the full international standard. [N281]

    Draft 4 will adopt the solution given in Japanese ballot comment ITSCJ.11. No further proposal is needed at this time. [SC22WG15.110]

However, we could not find any description of that solution in POSIX.2b Draft 4. We therefore include the same proposal as our earlier comment ITSCJ.11, together with an improved version, in the detailed discussion section below.
2.5 Wide Character Version of APIs [Annex H: (15)]

We have accepted the proposed approach for the possible inclusion of appropriate wide character counterparts of the functions in Annex B (of POSIX.2/D12) in POSIX.2b Draft 5 or later. We have just started to study more detailed proposals, inheriting the spirit of the ISO/C MSE wide character handling approach. Our proposals will be submitted to the next ISO/POSIX meeting in 1993 or later, in harmony with the ISO/C MSE amendment ballot process.

3. Detailed Discussion, Comments and Proposals

The pages below offer a collection of our comments and proposals on POSIX.2b and on POSIX.2 itself. All comments have been reviewed by our members. However, some of them are not unanimously agreed, so each comment carries the name of its originator.

ITSCJ/JSC22/POSIX WG Comments on POSIX.2b/D4

_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 1 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect 2.4.x (State-dependent encoding)   DISCUSSION.

Discussion:

[Background]

The ISO CD POSIX.2/D11.2 ballot resolution on the shift (state-dependent) encoding issues raised by ITSCJ (Japan) chose option (c) among the following candidates:

(a) State-dependent encoding is out of scope.
(b) State-dependent encoding is allowed, but it is an implementation-defined feature.
(c) Support for state-dependent encoding is one of the open issues, and it will be considered in a future draft.

[Goal of POSIX.2b]

ISO DIS POSIX.2/D12 Annex H says:

    (8) The support of state-dependent character encoding (*) should be addressed fully.

    [*: The original text of POSIX.2/D11 Annex H uses "state-dependent character sets". However, that is not an appropriate expression.]
[Current status of POSIX.2b/D4]

As a first cut, it keeps placeholders in:

(a) the 2.4 Character Set section
(b) the 2.5 Locale section
(c) the 2.8 Regular Expression Notation section
(d) several utility sections in Sections 4-5

[What is required]

(1) Give a definition of "state-dependent encoding" or "state-dependent encoded character set".
(2) Give a clear statement of the POSIX(.2) scope, i.e. what kinds of state-dependent encodings shall/should/may be supported.
(3) Give a specification of how to define a state-dependent encoding in a charmap file and/or locale.
(4) Give a specification of how state-dependent encodings are handled (and by which utilities/functions).

_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 2 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect Global (State-dependent encoding)   OBJECTION.

1992-10-23                                                               Page 4

Problem:

State-dependent encoding features are generic over almost all string/character handling functions and utilities. For example, the following operations are very sensitive, because they have to keep track of "state" transitions:

- string/character search
- substring/character manipulation (add/delete/modify/insert/...)

However, the current POSIX.2b/D4 picks out only a few utilities for enhancement of state-dependent encoding support. Since the Japanese ballot comments on POSIX.2/D11.2 concerning state-dependent encoding may not cover all the utilities that would be affected by state-dependent encoding support, POSIX.2b/D4 may mislead readers into thinking that the other utilities have no problems with state-dependent encoding support.
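The search and substring hazards listed in this Problem can be reproduced with any locking-shift encoding. The sketch below uses Python's built-in iso2022_jp codec as a stand-in (an assumption for illustration; the comment does not name a particular encoding). It shows a byte search finding a false match inside a shifted run, a byte slice that loses its shift, and a character-level slice that re-encodes the result with its own shifts.

```python
# Sketch only: Python's "iso2022_jp" codec stands in for a
# state-dependent (locking-shift) encoding.
CODEC = "iso2022_jp"

text = "a\u3042b"           # "a", HIRAGANA LETTER A, "b"
raw = text.encode(CODEC)    # b'a\x1b$B$"\x1b(Bb': ESC $ B ... ESC ( B

# A byte-oriented search ignores the shift state: the two bytes that encode
# the kana in the JIS X 0208 state are also the ASCII characters '$"'.
naive_hit = b'$"' in raw    # True: a false match
char_hit = '$"' in text     # False: the text contains no '$'

# Cutting the bytes without the preceding locking shift changes their meaning.
naive_piece = raw[4:6]      # b'$"', read in the initial (ASCII) state


def state_aware_slice(data: bytes, start: int, stop: int) -> bytes:
    """Return characters [start, stop) of `data`, re-encoded on their own so
    the result carries the locking shifts it needs (decode, slice, re-encode)."""
    return data.decode(CODEC)[start:stop].encode(CODEC)


print(naive_piece.decode(CODEC))                   # prints $" (wrong characters)
print(state_aware_slice(raw, 1, 2).decode(CODEC))  # prints the kana again
```

Working on decoded characters and re-encoding is only one possible strategy; an implementation could equally track the shift state while scanning the bytes.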
Action:

Instead of addressing state-dependent encoding support in each potentially affected utility section (except for requirements specific to a particular utility), create a new subsection in Section 2 that describes the global issues and generic requirements regarding state-dependent encoding support. In particular, list all the character/string processing operations that must be handled carefully in state-dependent encoding environments, and specify the desired/required results of such operations.

_______________________________________________________________________________
ITSCJ/POSIX WG (Shin-ichi Yamada)       Phone: +81-44-548-4523   Seq: 3 of 13
Email: ymd@rd.nttdata.jp                FAX:   +81-44-548-4521
-------------------------------------------------------------------------------
Sect 2.4.x (state-dependent encoding)   DISCUSSION.

Discussion:

[ Support of State-dependent Encoding ]

A charmap cannot describe character sets encoded by stateful encoding schemes well because, in a stateful encoding, there is no one-to-one correspondence between octet values and characters: the same sequence of bytes represents different characters depending on the state, which is changed by locking-shift escape sequences.

It is possible to write a charmap for such characters by placing a locking shift on both sides of each character, where the second locking shift restores the default state. Although this virtually makes a state-dependent encoding stateless, it is not common practice because it uses a lot of extra bytes.

Single shift is an exception. This form of shift changes the state temporarily, for interpreting only the character that immediately follows it. In other words, every character in a character set invoked by a single shift is preceded by that single shift. Therefore, in a charmap, the single shift can be treated as part of a multibyte character. Unfortunately, single shifts are used far less often than locking shifts.
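The cost of the "virtually stateless" charmap trick described above, and the comparison problem it creates, can be sketched with the same kind of stand-in: Python's iso2022_jp codec (the helper names are hypothetical). Encoding each character separately brackets it with shifts; the text is unchanged, but the bytes are longer and no longer compare equal, and decoding then re-encoding removes the redundant shifts.

```python
# Sketch only: Python's "iso2022_jp" codec stands in for a locking-shift
# encoding; bracket_each_char and canonical are illustrative names.
CODEC = "iso2022_jp"


def bracket_each_char(text: str) -> bytes:
    """Encode each character separately, so every character carries its own
    locking shift and a shift back to the default (ASCII) state."""
    return b"".join(ch.encode(CODEC) for ch in text)


def canonical(data: bytes) -> bytes:
    """Remove redundant shifts by decoding and re-encoding the text."""
    return data.decode(CODEC).encode(CODEC)


text = "\u65e5\u672c"                # two JIS X 0208 characters ("Nihon")
compact = text.encode(CODEC)         # one shift in, one shift out: 10 bytes
bracketed = bracket_each_char(text)  # shifts around each character: 16 bytes

# Same characters, different bytes: a byte-by-byte comparison fails ...
assert bracketed.decode(CODEC) == text and bracketed != compact
# ... until the redundant shifts are removed.
assert canonical(bracketed) == compact
```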
Besides their description in a charmap, the support of state-dependent character sets poses the following problems:

(1) In searching or comparing statefully encoded strings, byte-by-byte comparison does not always yield valid results, because locking shifts may be inserted at arbitrary character boundaries even when they are redundant.
(2) In dividing, truncating or taking substrings of statefully encoded strings, simply returning part of the bytes can produce wrong results, because that part does not contain the preceding and/or following locking shifts.
(3) Concatenated strings may contain redundant locking shifts, which causes the comparison problem mentioned above.

In order to alleviate these difficulties, an implementation that supports state-dependent character sets shall:

(1) process statefully encoded strings as a concatenation of state-independent characters;
(2) insert (if necessary) locking shifts at the beginning and at the end of a substring to retain correct state information when extracting substrings of a string;
(3) eliminate redundant locking shifts whenever possible.

_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 4 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect Global (Character Width/Position)   PROPOSAL.

Problem: Handling of character/string width and column position

There are several utilities (e.g. fold, pr, printf) in POSIX.2/D12 that are sensitive to character width and/or column position. However:

(1) There is no appropriate description of how to handle characters that are not single-width (single-column) at a column (field) boundary.
(2) Although string width or column position is a different quantity from both "byte count" and "character count", the related utility sections do not clearly state this difference.
    [E.g., are field widths and precisions counted in characters, bytes or columns? Note also that in POSIX.1 and C, the printf() function treats a 'character' as a single-byte character (of single width), while in POSIX.2 "character" means a multibyte character in general (of varying width); hence the word "character" in the printf utility does not exactly match its meaning for the printf() function. The same care should be taken with the date/time formats of the date utility.]

These points could be covered in each utility section on a case-by-case basis. However, width/position handling has many aspects in common. Unfortunately, the current POSIX.2 does not provide such general principles, which may cause unnecessary inconsistency among width/column-sensitive utilities. [See the date, pr and printf utilities, for example.]

Action:

Create a new subsection in Section 2 that describes the general requirements and principles of width/column handling. Describe only the requirements specific to particular features in each utility section.

_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 5 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect 2.4.y (Character Width)   PROPOSAL.

Problem: Width of a character/string

There are several utilities (e.g. fold) in POSIX.2 and POSIX.2a that are sensitive to character width, and the definition of "column" seems appropriate for so-called character-cell terminals. However, there is no suitable way to specify or identify the width of a character, either in POSIX.2 (LC_CTYPE or charmap file) or in POSIX.1.

Action:

A specification should be developed in a future version, in appropriate collaboration between the IEEE/POSIX.2 WG and ISO WG15/RIN.
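To make the byte/character/column distinction concrete, here is a small sketch. Since POSIX.2 provides no width data, it assumes Unicode's East Asian Width property (via Python's unicodedata module) as the per-character width source: W and F count as two columns, everything else as one. fold_columns is a hypothetical fold-like helper that breaks lines by column count without splitting a double-width character.

```python
import unicodedata


def columns(ch: str) -> int:
    """Assumed rule: East Asian Wide (W) and Fullwidth (F) characters take
    two columns, everything else one. This stands in for the per-character
    width data POSIX.2 does not provide; zero-width and non-printable
    characters are not handled."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def string_columns(s: str) -> int:
    return sum(columns(ch) for ch in s)


def fold_columns(s: str, width: int):
    """Split s into lines of at most `width` columns, never splitting a
    double-width character across the boundary."""
    lines, line, used = [], "", 0
    for ch in s:
        w = columns(ch)
        if line and used + w > width:
            lines.append(line)
            line, used = "", 0
        line += ch
        used += w
    if line:
        lines.append(line)
    return lines


s = "a\uff71\u3042"  # 'a', HALFWIDTH KATAKANA A, HIRAGANA A
# characters, EUC-JP bytes, columns: three different counts
print(len(s), len(s.encode("euc_jp")), string_columns(s))  # 3 5 4
```

For example, folding three Hiragana characters at width 5 yields a two-character line followed by a one-character line, rather than cutting the third character in half.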
Japan is now investigating the following two possibilities:

Approach (A): Introduce an additional "width" field or operand for each character in the current charmap definition file, e.g.

    "%s %s:%s %s\n", <symbolic-name>, <encoding>, <width>, <comments>

where ":%s" for <width> may be optional. See another ballot comment for more details.

Approach (B): Introduce a character grouping capability, and then specify rules that give width information to the character groups. This grouping capability may also be useful for other locale information about characters, such as defining collation by character groups across several different character sets, e.g. Latin < Katakana < Hiragana < Kanji ...

_______________________________________________________________________________
ITSCJ/POSIX WG (Akio Kido)              Phone: +81-462-73-5436   Seq: 6 of 13
Email: kido@ymtvm8.vnet.ibm.com         FAX:   +81-462-73-7425
-------------------------------------------------------------------------------
Sect 2.4.y (Character Width)   PROPOSAL.

Problem:

Although several column-sensitive utilities (e.g. 'fold') are specified in the standard, no mechanism is provided to define the column width (see column position, 2.2.2.31) of a character. This attribute of a character shall be defined in the LC_CTYPE category or in the charmap file.

Action:

Enhance the syntax of the charmap file to allow the user to define the column width of each character. A proposed syntax is as follows:

    "%s %s:%d %s\n", <symbolic-name>, <encoding>, <width>, <comments>

or

    "%s...%s %s:%d %s\n", <symbolic-name>, <symbolic-name>, <encoding>, <width>, <comments>

    <width>: A non-negative integer

The column width and the preceding colon (:) are optional. When the column width is omitted, it shall be assumed to be one for printable characters and zero for non-printable characters.
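A parser for the proposed extended charmap entry might look like the sketch below. It handles only the single-name form (not the range form), assumes whitespace-separated fields, and takes printability as a caller-supplied flag, because a charmap line alone does not say whether a character is printable. parse_charmap_line and the <j3021> name are illustrative, not part of any proposal.

```python
import re

# Hypothetical parser for the proposed extended charmap entry
#   <symbolic-name> <encoding>[:<width>] [<comments>]
# Only the single-name form is handled; the range form is omitted.
ENTRY = re.compile(r"(<[^>]+>)\s+(\S+?)(?::(\d+))?(?:\s+(.*))?$")


def parse_charmap_line(line: str, printable: bool = True):
    """Return (symbolic_name, encoding, width, comment)."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError("not a charmap entry: " + line)
    name, encoding, width, comment = m.groups()
    if width is None:
        # Proposed default: one column if printable, zero otherwise.
        width = 1 if printable else 0
    return name, encoding, int(width), comment


print(parse_charmap_line(r"<j3021> \xb0\xa1:2 KANJI"))
```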
_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 7 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect 2.4.y (Character Width)   DISCUSSION AND PROPOSAL.

Discussion:

In some countries, such as China, Korea and Japan, the characters in everyday use do not all have the same width. However, this does not mean that they generally use "proportional" character widths. Rather, they use several different character sets, each of which has a uniform character width.

For example, in Japan a typical character-cell terminal provides single-width Latin characters (letters, digits, symbols), single-width Katakana characters, double-width (full-width) Katakana characters, double-width Hiragana characters, double-width Kanji characters, and even double-width Latin, Greek and Russian (Cyrillic) characters! The double-width characters are all defined in a single JIS standard coded character set (or pair of sets), i.e. JIS X 0208 (and JIS X 0212), while the single-width Katakana characters are defined in another JIS standard coded character set.

Although the ability to specify the width of each character individually is very useful, not only in the cases above but also for proportional-pitch character sets, it may be tedious to define the width of thousands of Kanji characters that all have the same value. So some kind of character grouping capability is also useful in this case. (User-specified character classes are another example of character grouping; POSIX.2b/D4 is introducing such extensions now.)
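Group-based width assignment can be very compact. As an illustration (an assumption, not one of the proposals in this comment), the sketch below derives column widths for EUC-JP text purely from the code set each character belongs to, following the terminal convention described above: double-width for JIS X 0208 and JIS X 0212, single-width for ASCII and the single-shift (SS2) Katakana set. euc_jp_columns is a hypothetical name, and the sketch assumes well-formed input.

```python
def euc_jp_columns(data: bytes) -> int:
    """Column width of EUC-JP text, assigned per code set rather than per
    character (assumed terminal convention):
      code set 0: ASCII                    1 byte,  1 column
      code set 1: JIS X 0208               2 bytes, 2 columns
      code set 2: SS2 + JIS X 0201 Kana    2 bytes, 1 column
      code set 3: SS3 + JIS X 0212         3 bytes, 2 columns
    Control characters are not treated specially here."""
    cols, i = 0, 0
    while i < len(data):
        b = data[i]
        if b < 0x80:          # code set 0
            i, cols = i + 1, cols + 1
        elif b == 0x8E:       # SS2: single-width Katakana
            i, cols = i + 2, cols + 1
        elif b == 0x8F:       # SS3: JIS X 0212
            i, cols = i + 3, cols + 2
        else:                 # lead byte 0xA1-0xFE: JIS X 0208
            i, cols = i + 2, cols + 2
    return cols


print(euc_jp_columns("a\uff71\u3042".encode("euc_jp")))  # 4
```

The point of the design is that four rules cover every character in the encoding, which is the kind of economy a grouping capability in LC_CTYPE or the charmap would offer.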
Proposal:

Approach (A) - Define character width information as an attribute of a character class in LC_CTYPE (taking advantage of the newly introduced ability to specify additional character classes).

Proposed extended grammar:

    ctype_keyword     : charclass_keyword charclass_list EOL
                      | charclass_keyword charclass_list ':' NUMBER EOL    +
                      | charconv_keyword charconv_list EOL
                      ;

Approach (B) - Define character width by a new statement in LC_CTYPE.

Proposed extended grammar:

    ctype_keyword     : charclass_keyword charclass_list EOL
                      | charconv_keyword charconv_list EOL
                      | charwidth_keyword charclass_list EOL               +
                      | charwidth_keyword class_list ':' NUMBER EOL        +
                      ;
    charwidth_keyword : 'char_width'
                      ;

Approach (C) - Define character width by a new statement in the charmap.

[The details of the proposed character grouping and charwidth statements in the charmap file should be discussed at the POSIX.2b Ad Hoc meeting.]

_______________________________________________________________________________
ITSCJ/POSIX WG (Yukiharu Imafuku)       Phone: +81-44-548-4555   Seq: 8 of 13
Email: ima@rd.nttdata.jp                FAX:   +81-44-548-4551
-------------------------------------------------------------------------------
Sect 2.4.y (Character width)   PROPOSAL.

Proposal-1:

1-1) Replace the charmap definition forms (see 2.4.1, "Character Set Description File") with the following extended forms:

    "%s %s:%s %s\n", <symbolic-name>, <encoding>, <width>, <comments>

or

    "%s...%s %s:%s %s\n", <symbolic-name>, <symbolic-name>, <encoding>, <width>, <comments>

1-2) Add the following text after the encoding description:

    The width part defines the width of the character. The character width and the preceding colon (:) are optional. The value of width shall be a non-negative integer in the following format:

        "%d", <decimal value>

    When the width is omitted, the character width shall be assumed to be one for printable characters. The width of non-printable characters is always zero. The value of width for in the portable character set is used as the basic measurement unit of column width.
Proposal-2:

2-1) Add the following token to the lexical conventions and the grammar:

    %token width

2-2) Replace the grammar for charclass_list with the following:

    charclass_list : charclass_list ';' char_symbol EOL
                   | charclass_list ';' ELLIPSIS ';' char_symbol EOL
                   | charclass_list ';' char_symbol ':' NUMBER EOL
                   | charclass_list ';' ELLIPSIS ';' char_symbol ':' NUMBER EOL
                   ;

Proposal-3:

3-1) Add the following text to the end of the LC_CTYPE description:

    The keyword width may be specified as a character attribute in the LC_CTYPE category.

    width   Define the character width of printable characters. The operands shall consist of the value of the character width and a charwidth_list. If the keyword width is omitted from the locale definition, the character width for each character class is assumed to be one.

3-2) Add the following token to the lexical conventions and the grammar (in addition to Suehiro-san's proposal):

    %token width

Replace the grammar for ctype_keyword with the following:

    ctype_keyword     : charclass_keyword charclass_list EOL
                      | charconv_keyword charconv_list EOL
                      | charwidth_keyword NUMBER charwidth_list EOL
                      ;
    charwidth_keyword : 'width'
                      ;
    charwidth_list    : charwidth_list ';' char_symbol EOL
                      | charwidth_list ';' ELLIPSIS ';' char_symbol EOL
                      | char_symbol EOL
                      ;

_______________________________________________________________________________
ITSCJ/POSIX WG (Yoichi Suehiro)         Phone: +81-45-336-5361   Seq: 9 of 13
Email: suehiro@jrd.dec.com              FAX:   +81-45-336-5599
-------------------------------------------------------------------------------
Sect 2.5.2.2.3 (LC_COLLATE)   PROPOSAL.

Problem:

For most ideographic characters, users must be able to specify collation weights as they want. For Japanese characters (Kanji), for example, there are five possible collation weights for supporting Japanese sorting.
The five weights are On-yomi (pseudo-Chinese pronunciation), Kun-yomi (Japanese pronunciation), number of strokes, radical (component of the Kanji), and Kanji character code. There could be more weights.

The LC_COLLATE part of the localedef specification should allow a user to describe these weights and to give names to them. The user should be able to specify any combination of the defined weights at run time.

Proposal:

LC_COLLATE extension for specifying weight names
================================================

=> 2.5.2.2.3 order_start Keyword

Add the following directive description and example.

It is implementation defined whether the following optional directive is recognized. If it is not supported but is present in a localedef source, it shall be ignored.

name    Specifies the name of a collation weight as a string. An order of weights may be specified at run time by using the names.

The syntax for the name directive shall be name=weight_name, where weight_name is a double-quoted string (see the grammar below).

Example:

    order_start forward,name="kunyomi";forward,name="radical"

If an operand has a name directive, the definition of the primary, secondary, or subsequent weights for the collation element may differ from the order of the operands to the order_start keyword.

=> 2.5.3.2 Locale Grammar

Modify the opt_word description as follows:

    opt_word    : 'forward'
                | 'backward'
                | 'position'
                | 'name' '=' weight_name
                ;
    weight_name : '"' char_list '"'
                ;

Rationale:

User requirements for character collation in Asia are diverse. Ideographic characters have several sorting rules, such as by pronunciation or by strokes, and combinations of these rules are used for sorting. Properties of a character such as its pronunciation can be assigned as weights for a collation element. However, there is no standard primary weight, secondary weight and so on among these weights (properties).
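The run-time reordering that named weights allow can be sketched as follows. The weight table holds Kun-yomi readings and Kangxi radical numbers for three Kanji purely as illustration; the helper names are hypothetical, and the locale modifier syntax follows the @weights= example in this comment's rationale.

```python
import re

# Illustrative weights for three Kanji: Kun-yomi reading and radical number.
WEIGHTS = {
    "\u4e09": {"kunyomi": "mi", "radical": 1},     # 三 (radical 一)
    "\u4eba": {"kunyomi": "hito", "radical": 9},   # 人 (radical 人)
    "\u5c71": {"kunyomi": "yama", "radical": 46},  # 山 (radical 山)
}


def weight_names(order_start_line: str):
    """Collect name="..." directives from an order_start line, in the order
    they are defined."""
    return re.findall(r'name="([^"]*)"', order_start_line)


def runtime_order(locale_name: str, defined):
    """Use an @weights=... modifier if present; otherwise keep the
    defined order."""
    _, _, modifier = locale_name.partition("@")
    if modifier.startswith("weights="):
        return modifier[len("weights="):].split(",")
    return defined


def sort_key(order):
    """Build a key that compares the named weights in the given order."""
    return lambda ch: tuple(WEIGHTS[ch][name] for name in order)


defined = weight_names('order_start forward,name="kunyomi";forward,name="radical"')
order = runtime_order("ja_JP.eucJP@weights=radical,kunyomi", defined)
print(order)  # ['radical', 'kunyomi']
```

Sorting the same three characters with sort_key(defined) orders them by reading (hito, mi, yama), while sort_key(order) orders them by radical number (1, 9, 46), without changing the locale source.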
The weight name extension for LC_COLLATE allows multiple weights to be ordered at run time differently from the order of the operands to the order_start keyword. To make a different order effective, the weight names can be specified when setting the LC_COLLATE category. For example, suppose a ja_JP.eucJP locale has the following definition in its LC_COLLATE part:

    order_start forward,name="kunyomi";forward,name="radical"

The order of the sorting rules can then be specified by using the weight names, as follows:

    LC_COLLATE=ja_JP.eucJP@weights=radical,kunyomi

This means that the sort rule "radical" is used as the primary weight and "kunyomi" as the secondary weight.

_______________________________________________________________________________
ITSCJ/POSIX WG (Yoichi Suehiro)         Phone: +81-45-336-5361   Seq: 10 of 13
Email: suehiro@jrd.dec.com              FAX:   +81-45-336-5599
-------------------------------------------------------------------------------
Sect 2.5.2.5 LC_TIME (POSIX.2/D12)   OBJECTION.
page 68, line 2289-2290:

Problem: [era_d_t_fmt for the LC_TIME category]

In the date utility, some field descriptors can be modified by the E modifier character to indicate era representation. The description of %Ec is as follows:

    %Ec   Alternate appropriate date and time representation of the locale.

However, no corresponding LC_TIME keyword is defined in the locale definition for this modified field descriptor. Combining era_d_fmt and t_fmt does not serve this purpose, because t_fmt can be defined without regard to era notation. For example, the time representation 10:20:30 is not usually used with era notation.

Action:

Add era_d_t_fmt (the format of the date and time in alternate era notation) as a keyword to LC_TIME.
_______________________________________________________________________________
ITSCJ/POSIX WG (Toshinori Numata)       Phone: +81-44-754-3343   Seq: 11 of 13
Email: numa@sysrap.cs.fujitsu.co.jp     Fax:   +81-44-754-3522
-------------------------------------------------------------------------------
Sect 4.73.3 (iconv)   OBJECTION.
page 72, line 2022:

Problem: [iconv command option]

The description of the "-f fromcode" option says: "If the option-argument is the pathname of a readable file, iconv shall attempt to use it as a charmap file, as defined in 2.4.1."

These semantics may cause unexpected results depending on the current working directory: if a file or directory in the current directory happens to have the same name as "fromcode" (or "tocode"), iconv will treat that file as a charmap file. This behavior prevents users from using file names that are the same as codeset names. Because there is no standard for charmap file names, it will be impossible to use the iconv command in a portable manner. I think there should be a means for users to specify explicitly whether the "fromcode" and "tocode" arguments are to be used as charmap files.

Action:

There are three proposals for modifying the iconv specification.

(1) The first proposal is to add a new option, "-c", specifying that the "fromcode" and "tocode" option-arguments are charmap file names. If the "-c" option is not specified, iconv will treat the "fromcode" and "tocode" option-arguments as implementation-defined codeset names.

    Change the description of the "-f fromcode" option (lines 2021-2028) to:

    -f fromcode   Identify the codeset of the input file. Valid values for fromcode are specified in the system documentation. If this option is omitted, the codeset of the current locale shall be used.

    and add the following option description after line 2030:

    -c            Treat the fromcode and tocode option-arguments as the names of charmap files.
                  If the option-arguments are the pathnames of readable files, iconv shall attempt to use them as charmap files, as defined in 2.4.1. If a readable file is not a valid charmap file, the results are undefined. If an option-argument is not the pathname of a readable file, the results are implementation defined.

(2) The second proposal is to add a new set of options that specify charmap file names. In this proposal, the "-f fromcode" option is always used to specify a codeset name; to specify a charmap file, the "-F fromcharmap" option must be used.

    Change the description of the "-f fromcode" option (lines 2021-2028) to:

    -f fromcode      Identify the codeset of the input file. Valid values for fromcode are specified in the system documentation. If this option is omitted, the codeset of the current locale shall be used.

    and add the following option descriptions after line 2030:

    -F fromcharmap   Identify the codeset of the input file. If the option-argument is the pathname of a readable file, iconv shall attempt to use it as a charmap file, as defined in 2.4.1. If the readable file is not a valid charmap file, the results are undefined. If the option-argument is not the pathname of a readable file, the results are implementation defined. If this option is omitted and the -f fromcode option is not specified, the codeset of the current locale shall be used. If both the -F fromcharmap and the -f fromcode options are specified, the results are undefined.

    -T tocharmap     Identify the codeset of the output file. The semantics are equivalent to those of the -F fromcharmap option.

(3) The third proposal is to add a mechanism that identifies whether the fromcode (or tocode) option-argument is a charmap file name. In the following description, if the fromcode or tocode option-argument contains a <slash> character, it will be used as a charmap file.

    Change the description of the "-f fromcode" option (lines 2021-2028) to:

    -f fromcode   Identify the codeset of the input file.
                  If the option-argument contains a <slash> character and is the pathname of a readable file, iconv shall attempt to use it as a charmap file, as defined in 2.4.1. If the readable file is not a valid charmap file, the results are unspecified. If the option-argument does not contain a <slash> character, the results are implementation defined. If this option is omitted, the codeset of the current locale shall be used.

_______________________________________________________________________________
ITSCJ/POSIX WG (Toshinori Numata)       Phone: +81-44-754-3343   Seq: 12 of 13
Email: numa@sysrap.cs.fujitsu.co.jp     Fax:   +81-44-754-3522
-------------------------------------------------------------------------------
Sect 4.73.5.3 (iconv)   OBJECTION.
page 73, line 2058:

Problem: [LC_CTYPE environment variable description of the iconv command]

The description of the "-t tocode" option of the iconv command says "The semantics are equivalent to the -f fromcode option.", and the last sentence of the "-f fromcode" description says "If this option is omitted, the codeset of the current locale shall be used." This means that if the "-f fromcode" option is specified and the "-t tocode" option is omitted, the codeset of the current locale is used as the codeset of the output file. This behavior should also be noted in the LC_CTYPE description.

Action:

Add the following sentence after line 2058:

    If the -t tocode option is omitted, this variable shall determine the codeset of the output file.
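The third iconv proposal and the Seq 12 default can be combined into a small decision sketch. It assumes the distinguishing character is <slash> ('/'); the function names and the codeset names used (eucJP, SJIS) are illustrative, and the sketch only classifies the option-arguments without reading any charmap file.

```python
def classify(arg: str):
    """Return ("charmap", path) if the option-argument contains a slash,
    else ("codeset", name) for an implementation-defined codeset name."""
    if "/" in arg:
        return ("charmap", arg)  # try to read it as a charmap file (2.4.1)
    return ("codeset", arg)


def resolve(fromcode, tocode, locale_codeset):
    """Return (source, target); an omitted -f or -t (None) falls back to the
    codeset of the current locale, which the caller passes in."""
    source = classify(fromcode) if fromcode else ("codeset", locale_codeset)
    target = classify(tocode) if tocode else ("codeset", locale_codeset)
    return source, target


print(resolve("SJIS", None, "eucJP"))
```

Under this rule, a file that happens to be named eucJP in the current directory no longer changes the meaning of -f eucJP; a user who wants that file used as a charmap writes -f ./eucJP instead.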
_______________________________________________________________________________
ITSCJ/POSIX WG (Yasushi Nakahara)       Phone: +81-428-33-1347   Seq: 13 of 13
Email: ynk@ome.toshiba.co.jp            FAX:   +81-428-32-0018
-------------------------------------------------------------------------------
Sect Global (POSIX.2/D11.2 Resolutions)

Problem:

After careful reading of the ISO CD POSIX.2/D11.2 Ballot Dispositions (ISO SC22/WG15 N281), we found that some of the resolutions are not clear, or that some of the problems are not yet solved in POSIX.2/D12. The following are our comments from the previous ballot that fall into this category:

    ITSCJ.5, ITSCJ.8, ITSCJ.46

Action:

We attach the previous comments below. Please give a clearer explanation, or update the DIS POSIX.2 as indicated in the Ballot Resolutions.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 182-5
@ Resolution= Accepted
@------------------------------------------------------
@182 1/3 c 5 a

Sect 1.3.2 (Application Conformance)   COMMENT.
page 15-16, line 542-590

Problem: Conformance

Section 1.3.2, Application Conformance, says that there are four categories of application conformance:

- 1.3.2.1   Strictly Conforming POSIX.2 Application
- 1.3.2.2.1 ISO/IEC Conforming POSIX.2 Application
- 1.3.2.2.2 <National Body> Conforming POSIX.2 Application
- 1.3.2.3   Conforming POSIX.2 Application Using Extensions

The idea of "<National Body> Conforming" is acceptable, but we think it is necessary to reconsider the relationship between an ISO/IEC Conforming POSIX Application and a <National Body> Conforming POSIX Application, since without suitable "guidelines" it will lead to incompatibilities among nations. For example, we think character encoding and character handling using "wchar_t" are very important for applications as well as for implementations.
However, if one National Body defines a codeset in its "National Profile" while others do not, there will be serious problems of international portability and/or compatibility for POSIX Conforming Applications.

Action: Provide suitable guidelines on which features and options to specify, and how to specify them, for ISO/IEC Conformance and <National Body> Conformance.
------------------------------------------------------
RESOLUTION: There are two sources of guidelines. The first is the existing pointer to the 9945-1 rationale, where the hierarchy of application conformance classes is described. That hierarchy implies that strictly conforming applications and those that rely on international standards (such as 10646) are more portable than those that require national standards (such as ASCII). The second is the new Annex F on portability considerations, as mandated by the TSG-1 Final Report.

@======================================================
@ Final= Objection, Original= Objection, TR= greger, BG= 182-8
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@182 2/4 o 8 m

Sect 2.4 (Character set) OBJECTION. page 54-55, line 1232-1252:

Problem: In the current draft, is there an assumption that the portable character set shall consist of single-byte characters? The only requirements that POSIX.2 places on the coded character set (page 55) seem to allow a portable character set that consists strictly of multibyte characters. However, the current POSIX.1 and POSIX.2 specifications do not fit the case where the portable characters are implemented as characters of two or more bytes, even though the character set definition file itself can allow such a definition.

Action: If there is an undocumented assumption that portable characters are single-byte, state it clearly in section 2.5.
If no requirement on the byte/bit size of the portable character set is assumed, except that the minimum value of CHAR_BIT is 8, please say so, and then specify more "character"-oriented interfaces in both POSIX.2 and POSIX.1 instead of the current "byte"-oriented interfaces. For example, the proposed getopt() function does not work well in an environment with a two-byte portable character set, and several of the utility syntax guidelines in section 2.10.2 would conflict and fail.
------------------------------------------------------
RESOLUTION: We have modified D11.3 in clause 2.4 and the introduction to Annex B to describe the character set restrictions. These changes were sent to you in email prior to publication.

@======================================================
@ Final= Comment, Original= Comment, TR= hlj, BG= 182-46
@ Resolution= Accepted[Modified]
@------------------------------------------------------
@182 4/35 c 46 m

Sect 4.35 (localedef) COMMENT. page 488, line 7890 - 7895:

Problem: The LC_CTYPE description says: "This variable shall have no affect on the processing of localedef input data; the POSIX Locale shall be used for this purpose, regardless of the value of this value." On the other hand, a locale definition file allows characters themselves to be specified in it (2.5.2 Locale Definition, page 65). When characters that are not defined in the POSIX locale are present in localedef input data, localedef cannot parse strings correctly. For example, a byte that corresponds to a localedef special character (such as a separator) may occur as the second or subsequent byte of a multibyte character.

Action: Reconsider how localedef refers to the LC_CTYPE environment variable, or reconsider the specification of the locale definition.
------------------------------------------------------
RESOLUTION: We propose that this issue be discussed by WG15 in its resolution meeting.
We regard localedef in this context in the same way one would look at a compiler; i.e., there must be a known state for it to work in. The conclusion is, of course, that the only characters you can portably use in a localedef source are those in the portable character set; for anything else you had better be using either symbolic names or hex/oct/dec constants.