From ynk@ome.toshiba.co.jp Tue Jun 22 04:18:22 1993
Received: from tiswd-gw.toshiba.co.jp by dkuug.dk with SMTP id AA04265
    (5.65c8/IDA-1.4.4j for ); Tue, 22 Jun 1993 04:18:22 +0200
Received: by tiswd-gw.toshiba.co.jp (5.67+1.6W/2.8Wb) id AA08452;
    Tue, 22 Jun 93 11:24:57 JST
Received: from tis4.tis.toshiba.co.jp by tis2.tis.toshiba.co.jp
    (4.1/6.4J.6-R04) id AA27671; Tue, 22 Jun 93 11:07:03 JST
Received: from ome-relay by tis4.tis.toshiba.co.jp (5.52/6.4J.6-R05)
    id AA17329; Tue, 22 Jun 93 11:05:39 JST
Received: by tsbome.ome.toshiba.co.jp (5.64/6.4J.5-OMgw0.1) id AA05609;
    Tue, 22 Jun 93 11:06:09 +0900
Return-Path:
Message-Id: <9306220206.AA05609@tsbome.ome.toshiba.co.jp>
To: sc22wg15@dkuug.dk
From: ynk@ome.toshiba.co.jp (Yasushi Nakahara)
Subject: Full text of Japanese comments on DIS 9945-2
Reply-To: ynk@ome.toshiba.co.jp
Date: Tue, 22 Jun 93 11:05:09 JST
X-Charset: ASCII
X-Char-Esc: 29

Hi all the WG15 experts,

Here is the full text of the Japanese comments on the DIS 9945-2 ballot.
Although I've already sent the online version to Jim and Hal, I would
like to make our Japanese comments visible online for all concerned
people. Please take these into consideration for the disposition of
comments.

Thanks and Regards,
- ynk

Yasushi Nakahara                 Phone: +81 428-33-1346|1347
TOSHIBA Corp. ISE Lab. OSA       Fax:   +81 428-32-0018
2-9 Suehiro-cho, Ome-shi,        Email: ynk@ome.toshiba.co.jp
Tokyo 198, JAPAN
_______________________________________________________________________________

Japanese Comments on ISO/IEC DIS 9945-2 (POSIX.2)

1. Introduction

The following are the comments on ISO/IEC DIS 9945-2 (POSIX.2) from the
Japanese National Body. Since 1988, we have been sending comments on the
ISO/IEC 9945 series of standards concerning byte and character issues in
the internationalization of the POSIX specifications, in order to make
the POSIX standards truly "internationalized" and hence acceptable to
all National Member Bodies and the related industry worldwide.
As usual, we have been eagerly reviewing the DIS 9945-2 in terms of
internationalization, namely the "byte vs. character" and "multibyte
character" issues. Although we strongly support the POSIX.2 approach to
character handling, in the sense that "a character is a character, not a
byte", we consider the current draft insufficient on this point with
regard to its locale sensitivity and string handling capabilities.

Therefore, the Japanese National Body votes "No" on ISO/IEC DIS 9945-2.
If the comments are accepted, the vote will be changed to "Yes".

2. Overview

We have been carefully reviewing the DIS 9945-2 mainly from an
"internationalization" or "standardization of national/regional language
support" point of view.

Japan believes that the most important goal of internationalization is
to achieve "character independency" and "locale independency". In the
light of this, we would like to stress once again that the following
aspects should be taken into consideration when defining and finalizing
the POSIX.2 specifications.

  o Character counts != byte counts
  o Character counts != display width
  o Byte counts != display width
  o Only the "wchar_t" type in C language (known as a "wide character")
    corresponds to the concept of a character.
  o Specifications of character-oriented interfaces shall be carefully
    designed in terms of the following points:
      - Character boundary recognition
      - Limit check & truncation in various units; in particular, make
        clear which unit (byte, character, column, width, etc.) shall
        be applied.
      - Character/string width recognition
      - Character/string parsing & manipulation
      - Language dependency of text data including message data
      - Culture dependency of representations

As such, our major concerns in this ballot can be categorized into the
following groups:

  o Copyright issue
  o UPE option
  o POSIX Locale
  o "Word" handling in ex and vi
  o Byte, character, and column position issues
  o Others

3. Comments

The pages below offer a collection of our detailed objections and
comments on the DIS 9945-2.
_______________________________________________________________________________

Japanese Comments on DIS 9945-2
_______________________________________________________________________________
Sect Global (copyright) OBJECTION (editorial).

Problem:
  The document organization does not follow the standard style defined
  by ISO/IEC Directives Part 3, Drafting and Representation of the
  International Standard. The document should not contain any
  description claiming the IEEE copyright.

Action:
  Follow the ISO/IEC standard style and remove the IEEE copyright
  notices from every page.
_______________________________________________________________________________
Sect Global (Merging UPE options) OBJECTION.

Problem:
  Merging the previous POSIX.2 and POSIX.2a into a single document,
  DIS 9945-2, is a big change. It does not simply add a new Section 5.
  It also incorporates many changes into the existing parts of
  "POSIX.2 classic" in Section 2, Section 3, and Section 4 as
  "optional" extensions with the symbolic name {POSIX2_UPE}. However,
  such "optional" extension parts cannot be easily identified except
  in Section 5.

  Also, the two wordings "User Portability Utility Option" and
  "User Portability Extension" are both used (in the same meaning?),
  which is misleading to ordinary readers.

  In an international standard (at its final stage), such optional
  parts are expected to be easily identifiable.

Action:
  As for the additional parts of the UPE (or UPUO?), at least the
  entirely new sections or subsections or sub-subsections ... should
  have a title of the following form:

      (Sub)Section # xxxxxxx (UPE option)

  by analogy with normative/informative annex titles such as:

      Annex X (normative) xxxxxx
      Annex Y (informative) yyyy

  For example, the "3.3.1 Alias Substitution" subsection of the shell
  language as a whole is solely for the UPE extension.
  So, the title could be:

      3.3.1 Alias Substitution (UPE option)
_______________________________________________________________________________
Sect Global (the POSIX Locale) OBJECTION.

Problem:
  In this draft of the standard, there are many descriptions such as
  "In the POSIX Locale, ...." that restrict normative specifications to
  a limited locale or give strict meanings to the descriptions.

  Such wording is very much appreciated in terms of *strict*
  specification. However, it is unclear whether or not the term
  "the POSIX Locale" can be used for a locale that is a superset of
  *the POSIX Locale*, where *the POSIX Locale* means a locale which is
  strictly created and supported by a conforming implementation as if
  the locale was defined via the localedef utility with input data from
  Table 2-6, Table 2-8, Table 2-9, Table 2-10, Table 2-11, Table 2-12,
  all in 2.5.2. (See 2.5.1 POSIX Locale)

  It seems that the current usage of the term "in the POSIX Locale"
  throughout this draft may mean either of the following two things,
  on a case by case basis.

  1. Strict meaning: in *the POSIX Locale*, which is nothing more and
     nothing less than what the 2.5.1 tables define.

  2. Non-strict meaning: in the "POSIX" locale, which is a superset of
     *the POSIX Locale*, or compatible with *the POSIX Locale*.

Action:
  If this standard has a clear intention that the locale name "POSIX"
  shall not be used for such a superset or a compatible locale, please
  say so somewhere appropriate in this document, probably in Section
  2.5.1.

  Re-examine the wording "in the POSIX Locale" throughout this draft
  from this point of view, since it is believed that if the above is
  the case (i.e. a locale compatible with *the POSIX Locale* cannot be
  named "POSIX"), several instances of "in the POSIX Locale" should be
  reworded as "in the POSIX compatible locale".
_______________________________________________________________________________
Sect Global ("word" handling) OBJECTION and PROPOSAL.
Problem:
  [Background]

  There are several utilities in POSIX.2 which are related to a "word"
  handling capability, such as vi, ex, wc and talk, whose target text
  files are written not only in specific programming languages (e.g. C,
  shell, awk and others), but also in natural languages, such as
  English, French, German, Japanese, Chinese and so forth. These
  utilities are expected to handle "words" in such natural language
  files in an appropriate manner.

  Unfortunately, however, the actual definition of a "word" in natural
  language text may vary from country to country or from language to
  language, i.e., from "locale" to "locale".

  Another observation about POSIX.2 is that there is neither a common
  definition of "word", nor a mechanism to handle such locale-dependent
  "words" in a portable manner. Rather, each utility defines its own
  "word" in a fully ASCII/English dependent manner, and only in the
  "POSIX Locale".

  On the other hand, in Japan and China, for example, normal Japanese
  or Chinese text has no "white space" at all. White space may appear
  only in a mixture of Japanese/Chinese text and English or Latin text.

  This may force implementors in each country (mainly non-English
  speaking countries) to develop their own versions of these utilities
  to meet market needs.

  The situation is quite similar to the one regular expressions faced
  before POSIX.2 defined generic and flexible regular expressions to
  eliminate their ASCII/English dependency. Now is the time to provide
  a generic and flexible mechanism to handle such various types of
  "word", with the ability to specify the definition of a "word".

  [Direction of a proposal]

  As described, the concept of a "word" is fully dependent on the
  natural language or culture of each country. If we wanted to go into
  detail about a linguistic definition of "word", we would need experts
  on linguistics. But that does not meet our POSIX objectives and goals
  in terms of computer information processing. It would be a waste of
  time.
  Instead, we should try to give a common or reasonable definition of
  "word", using (extended) regular expressions. It is also important
  that we re-examine the purpose of "word" processing in each command:
  how useful or meaningful it is, and what convenience is intended.

  [Rough definition of each "word" by (extended) regular expressions]

  In Latin text,

    (1) [^[:space:]]+[:space:]
        corresponds to:
          - a generic (Latin) word
          - a wc's word
          - a vi's bigword

    (2) [[:alnum:]_]+[^[:alnum:]_]
        corresponds to:
          - an ex's word

    (3) ([:alnum:]+[^[:alnum:]])|([:SPECIAL:]+[^[:SPECIAL:]])
        matches:
          - a vi's (small) word.

  In Japanese text,

    (4) ([:KANJI:]+)|([:HIRAGANA:]+)|([:KATAKANA:]+)|([:OTHER-JAPANESE:]+)
        may meet a requirement to handle a "word" in Japanese text by
        these utilities.

  In Korean text,

    (5) ([:HANJA:]+)|([:HANGUL:]+) or [^[:space:]]+[:space:]
        may meet a requirement to handle a "word" in Korean text by the
        above utilities.

  In Chinese text,

    (6) [:HANZI:] or [:HANZI:]+
        may meet a requirement to handle a "word" in Chinese text by
        the above utilities.

  Please note that the character classes named in capital letters
  above are not defined in the current POSIX documents. However, since
  implementations and/or even system users are able to define new
  character classes, such character classes may be defined in a locale
  supported by the system.

Proposal:
  [Proposed definition of a "word" and introduction of LC_WORD]

  The above examples show that a definition of "word" requires certain
  character classes to define it clearly. In other words, a "word"
  (small "word") can be defined by using some character classes as
  follows:

  - A "word" is a maximal string of certain character classes (that
    are dependent on language, and whose information should be given
    in a "locale" specified by LC_CTYPE and LC_WORD). A "word" is
    normally delimited by white space.

  - In a strict meaning, a "word" does not include the following white
    spaces.
    But, where it is convenient, a "word" may include the following
    white spaces.

  Here, a new environment variable LC_WORD shall be defined by:

    LC_WORD  This variable shall determine the locale category for
             detecting generic word boundaries, whose consequent
             information (note that this part of the standard does not
             specify the semantics of the direct value of this
             variable; it may be a pathname in which an actual
             definition of a word is provided) shall provide the
             definition of a word in an extended regular expression
             form. If this variable is unset or set to null, the
             default word definition:

                 a maximal string of non-space characters

             shall be applied. Some utilities (such as vi and ex) may
             provide another mechanism through their own command
             interface to override the definition of their own "word".

  Please also note that each programming language, such as shell, C,
  awk and so forth, definitely needs to define a word of its language
  by itself. Hence, LC_WORD shall not affect their own lexical
  processing in terms of their own "word" handling, unless so stated
  by such programming languages.

  [Extension of a "word" handling of vi and ex.]

  The above mechanism to detect word boundaries through the LC_WORD
  environment variable is not sufficient for more sophisticated
  utilities such as ex and vi. In particular, vi may require a more
  flexible mechanism of its own to handle different types of word
  (a bigword and a smallword) at the same time.

  This could be solved by introducing a new set of option variables for
  vi and ex.

  (1) Ex's extension

      Add a "wordexpr" optional string variable:

      5.10.7.5.xx wordexpr, wd
          [ Default: [[:alnum:]_]+[^[:alnum:]_] ]

          The wordexpr option shall define a "word" for detecting word
          boundaries of the text. The wordexpr option can be set to a
          character string consisting of extended regular expression
          notation.

          If this option is unset or set to null, the default value
          shall be set via LC_WORD.
          If LC_WORD is unset or set to null, then the default shall
          be applied as if it is set to "[[:alnum:]_]+[^[:alnum:]_]".

          In the visual mode, both this option and an additional
          "swordexpr" option shall be regarded as (two different types
          of) definitions of a "small word".

  (2) Vi's extension

      Add two optional string variables: "swordexpr" and "bwordexpr"

      5.10.7.5.yy "swordexpr", swd
          [ Default: [:SPECIAL:]+[^[:SPECIAL:]] ]

          The swordexpr option shall specify an additional definition
          of "word" (small word) for detecting word boundaries of the
          text in the visual mode. The swordexpr option can be set to
          a character string consisting of extended regular expression
          notation.

          If this option is unset or set to null, the default value
          shall be applied as if it is set to
          "[:SPECIAL:]+[^[:SPECIAL:]]".

      5.10.7.5.zz "bwordexpr", bwd
          [ Default: [^[:space:]]+[:space:] ]

          The bwordexpr option shall specify a definition of "bigword"
          for detecting word boundaries of the text in the visual mode.
          The bwordexpr option can be set to a character string
          consisting of extended regular expression notation.

          If this option is unset or set to null, the default value
          shall be set via LC_WORD. If LC_WORD is unset or set to null,
          then the default shall be applied as if it is set to
          "[^[:space:]]+[:space:]".

Action:
  Start a feasibility study of the above proposal at the SC22/WG15
  level, with appropriate collaboration between the IEEE/POSIX.2 WG and
  the Japanese POSIX WG, aiming to incorporate such a proposal into the
  next DIS version or a near-future amendment, i.e. POSIX.2b.
_______________________________________________________________________________
Sect 1.3.1.3 (POSIX2_UPE) OBJECTION. page 6:

Problem:
  The following description of the POSIX2_UPE symbolic constant is not
  sufficient, since the associated optional extensions in parts other
  than Section 5 are not mentioned.

    {POSIX2_UPE}  The system supports the User Portability Utilities
                  Option in Section 5.
  It is believed that the associated optional extensions in other
  sections, such as Section 2, Section 3 and Section 4, shall be
  supported under the {POSIX2_UPE} constant with a value of 1.

Action:
  Change the description of the POSIX2_UPE constant as follows.

    {POSIX2_UPE}  The system supports the User Portability Utilities
                  Option in Section 5 and the associated optional
                  extensions in Section 2, Section 3 and Section 4.

  This fact should also be noted appropriately at the beginning of
  Section 5 and in Table 2.19 on page 118.
_______________________________________________________________________________
Sect 2.2.2.94 (job ID with string) OBJECTION. page 27:

Problem:
  Since this draft does not define a "string", the following job IDs
  with "string" are not well defined.

    %string    Job whose command begins with "string"
    %?string   Job whose command contains "string"

  It is unclear whether the "string" of a Job ID can contain arbitrary
  characters, has special delimiter characters, or has none.

Action:
  Add a clear description of the string which is allowed as a part of
  a Job ID.
_______________________________________________________________________________
Sect 2.4.1 (Character Set) EDITORIAL COMMENT. page 45:

Problem:
  The last paragraph of page 45 says:

    For the interpretation of the dollar-sign and the number-sign,
    see 2.2.2.45 and 2.2.2.110.

  It was there because the old draft of POSIX.2 (CD 9945-2.2) had a
  special interpretation of dollar-sign and number-sign in the General
  Terms section. Such special interpretation, like "permits the
  substitution of the pound sign for the number-sign #, and the
  currency symbol in ISO 646 for the dollar-sign $", has been dropped
  in the current draft. So this special reference is no longer needed.

Action:
  Delete the last paragraph of page 45.
_______________________________________________________________________________
Sect 2.5.2.3 (LC_MONETARY) EDITORIAL COMMENT.
page 62:

Problem:
  In the "Table 2-9 -- LC_MONETARY Category Definition in the POSIX
  Locale", the definition of frac_digits is missing.

Action:
  Add the following line:

    frac_digits -1

  after the "int_frac_digits -1" line in Table 2-9.
_______________________________________________________________________________
Sect 2.5.2.3 (LC_MONETARY) OBJECTION. page 62:

Problem:
  In the paragraph just below Table 2-9, there is a description:

    Keywords that are not provided, string values set to the empty
    string (""), or integer keywords set to -1, shall be used to
    indicate that the value is not available in the locale.

  But the definition of "not available" is not available. What is the
  intention of the wording "not available"? What is the difference
  between "undefined" and "not available"?

Action:
  Describe the meaning of "not available" in that paragraph, or in the
  Terminology section.
_______________________________________________________________________________
Sect 2.5.2.5 (LC_TIME) EDITORIAL COMMENT. page 67:

Problem:
  In the "Table 2-11 -- LC_TIME Category Definition in the POSIX
  Locale", t_fmt_ampm is defined using ' '. That space should be
  replaced with <space> for clear definition. There is also a typo in
  the same definition; should be .

Action:
  Replace the definition:

    t_fmt_ampm "\

" with: t_fmt_ampm "\

"
_______________________________________________________________________________
Sect 4.40.5.3 (mailx) EDITORIAL COMMENT. page 357:

Problem:
  The sentence in the LISTER paragraph, "If this variable is null or
  not set, the output command shall be ls (see 4.39).", already seems
  to specify the default.

Action:
  Remove the succeeding sentence "The default value shall be unset.".
_______________________________________________________________________________
Sect 4.40.5.3 (mailx) EDITORIAL COMMENT. page 358:

Problem:
  The style of the description is not consistent with the others.

Action:
  Replace "The name of a preferred command interpreter." with "This
  variable shall be interpreted as the name of a preferred command
  interpreter.".
_______________________________________________________________________________
Sect 4.40.7.2.11 (mailx) EDITORIAL COMMENT. page 368:

Problem:
  A reference to the LISTER environment variable is missing from the
  text of the folders command.

Action:
  Add the following text at the end of this paragraph: "The command
  specified by the LISTER environment variable shall be used (see
  4.40.5.3).".
_______________________________________________________________________________
Sect 5.2 (at) OBJECTION. page 514-515:

Problem:
  It is unclear what should be done when the following argument
  combinations, i.e. "batch queue with time specifier(s)", are
  specified.

    Case (1): $ at -q b -t time
    Case (2): $ at -q b timespec

  First interpretation: In both cases, should the at utility schedule
  the job(s) to be submitted to the batch queue at the specified time
  (given by -t time or timespec)?

  Second interpretation: Or, should the at utility submit the job
  immediately to the batch queue, indicating that this batch job should
  be scheduled at (around) the specified time? This interpretation may
  be wrong, since the batch queue does not seem to accept any time
  constraints.
  Third interpretation: Should the at utility ignore any time
  specifier(s) and submit the job(s) to the batch queue "now"?

Action:
  Add clear descriptions to the "-q queuename" paragraph of the
  "5.2.3 Options" subsection for the case where a batch queue is
  specified together with a time specifier (-t time or timespec) for
  job execution.
_______________________________________________________________________________
Sect 5.2 (at) OBJECTION. page 514-516:

Problem:
  The descriptions of the "time" specifier are misleading, simply
  because the (italic) word "time" is used in two different ways: one
  is for the "-t time" option, the other is the "time" field element of
  the "timespec" operand, and the two unfortunately have different
  forms.

Action:
  Use two different words throughout this section. The following are
  our recommendations:

  Recommendation 1:
    "-t time"            -> "-t time_digits"
    "time" (of timespec) -> "time"

  Recommendation 2:
    "-t time"            -> "-t time"
    "time" (of timespec) -> "at_time"

    Note: In this case, the "date" field element of "timespec" should
    also be changed from "date" to "at_date" for consistent
    descriptions.

  Recommendation 3:
    "-t time"            -> "-t t_time" (implying "touch" time format)
    "time" (of timespec) -> "a_time" (implying "at" time format, and
                            "date" -> "a_date" as well)
_______________________________________________________________________________
Sect 5.2.6.2 (at) OBJECTION. page 520:

Problem:
  The format of the successful-submission notice written to standard
  error, which is specified in 5.2.6.2, should also apply in the POSIX
  Locale only.

Action:
  Change the first sentence of this subsection to:

    In the POSIX Locale, the following shall be written to standard
    error when a job has been successfully submitted:

        "job %s at %s\n", at_job_id, <date>

    where <date> shall have the same format as is described in
    Standard Output.
_______________________________________________________________________________
Sect 5.3 (batch) OBJECTION.
page 521:

Problem:
  The title of the batch utility, "Execute commands when the system
  load permits", is not appropriate for an International Standard.
  Also, the first sentence of the description subsection (5.3.2) is
  not appropriate.

Action:
  (1) Change the title:
      from "batch - Execute commands when the system load permits"
      to   "batch - Schedule commands to be executed in a batch queue"

  (2) Change the first sentence of 5.3.2:
      from "The batch utility shall read commands to be executed at a
            later time."
      to   "The batch utility shall read commands from standard input
            to be scheduled for their execution in a batch queue. Mail
            shall be sent to the invoking user after the batch job has
            run, announcing its completion even if the error
            termination has occurred."
_______________________________________________________________________________
Sect 5.3 (batch) OBJECTION. page 521-523:

Problem:
  Several subsections under 5.3.5 and 5.3.6 just say "See 5.2." This
  is not sufficient for a clear specification of the batch utility. In
  particular, locale dependency and output format specifications are
  unclear.

Action:
  Add clear descriptions to each subsection so that implementations
  and/or users can understand what exactly is meant by "See 5.2", i.e.
  whether the specifications are the same as those of the at utility
  or whether appropriate modification(s) apply in the case of the
  batch utility.
_______________________________________________________________________________
Sect 5.10.7.2 (ex) OBJECTION. page 551:

Problem:
  [Definition of "word"]

  The current draft defines a word of the ex utility only in the POSIX
  Locale, while the LC_CTYPE environment description in 5.10.5.3 does
  not describe any such constraint, which may lead a reader to
  understand that word boundary detection may be affected by the
  LC_CTYPE variable in any locale.

  As an international standard, POSIX.2 is expected to provide a more
  internationalized version of the ex (and vi) utility.
Action:
  Consider the proposed extensions of the ex/vi utility in another
  comment in this ballot, Sect Global ("word" handling), as one of
  such potential solutions.
_______________________________________________________________________________
Sect 5.10.7.2 (ex) OBJECTION. page 551:

Problem:
  [Description of "file" argument]

  While the Extended Description in 5.10.7 defines two special
  specifiers for file names, % (the current pathname) and # (the last
  mentioned pathname or the previous current pathname), the "file"
  paragraph in 5.10.7.2 does not address these special names.

  As a result, a strict reader could interpret that the file argument,
  including % and #, shall only be subjected to the shell word
  expansion process, which is not the intention of this standard, I
  observe.

Action:
  Add a special description for "%" and "#" as (part of) a "file"
  argument, based on the descriptions in 5.10.7 on page 547.
_______________________________________________________________________________
Sect 5.10.7.2.1 (ex) OBJECTION. page 551-552:

Problem:
  [Definition of "rhs"]

  The current draft defines nothing about the "rhs" of the "abbrev"
  command except that it is a "string" in the following synopsis:

    ab[brev] word rhs

  This is so ambiguous that the following two interpretations are both
  possible:

  Interpretation 1: The "rhs" is an entire string that begins with a
  nonblank character, is followed by any characters, and ends at the
  end of the line, excluding the newline.

  Interpretation 2: The "rhs" is a string consisting of nonblank
  characters.

Action:
  Add a clear description of the "rhs" string.
_______________________________________________________________________________
Sect 5.10.7.2.8 (ex) OBJECTION. page 553:

Problem:
  [Current line indicator for "edit" command]

  Regarding the current line indicator, the current draft defines the
  following:

  - If file is omitted or results in the current file, the current
    line indicator shall not be changed.
  - Otherwise, the current line indicator shall be the last line of
    the buffer; however, if this command is executed from within
    visual mode, the current line shall be the first line of the
    buffer.

  The last description, about the vi case, seems to break existing
  practice for "edit #" (edit the last mentioned or the previous
  current file) or similar, which is very useful; i.e. if file is the
  previous current file, the current line indicator is set to the
  previous current line of that file (to switch back easily to the
  previous position in the previous file).

Action:
  Change the second dash paragraph as follows.

  - Otherwise, the current line indicator shall be the last line of
    the buffer; however, if this command is executed from within
    visual mode, the following shall apply:

    1) if file is or results in the previous current file, the
       current line indicator shall be the previous current line of
       that file.
    2) otherwise, the current line indicator shall be the first line
       of the buffer.
_______________________________________________________________________________
Sect 5.10.7.2.13 (ex) OBJECTION. page 555:

Problem:
  [line folding of "list" command]

  The following description is not appropriate,

    "Long lines shall be folded; the length at which folding occurs
    is unspecified, but should be appropriate for the output device.",

  for the following reasons:

  1) Without an appropriate notice, a multi-column character may
     happen to be split into inappropriate pieces.
  2) The wording "appropriate for the output device" is not
     appropriate for this standard.

Action:
  Change the current description about line folding as follows.

    "Long lines shall be folded; the length at which folding occurs
    is unspecified, however, the following shall apply:

    - A multi-column character at the folding position shall neither
      be separated nor be discarded.
    - Folding should be as appropriate as possible for the output
      terminal through the information from the COLUMNS environment
      variable and other characteristics of the terminal."
_______________________________________________________________________________
Sect 5.10.7.2.14 (ex) OBJECTION. page 555:

Problem:
  [Definition of "rhs"]

  The definition of the "rhs" string in the following synopsis is
  unclear.

    map[!] [x rhs]

  The current draft seems to define that the "rhs" of the "map"
  command would be a string consisting of "printable" characters, with
  a special escaping character "control-V" (or control-Q in "nomagic"
  mode?) for "nonprintable" characters except for . This may lead to
  the following interpretation:

    The "rhs" is an entire string that begins with a nonblank
    character, is followed by any printable characters, and ends at
    the end of the line, excluding the newline. Nonprintable
    characters [note: in the current draft, a clause "except for " is
    added. However, by definition, characters are printable, aren't
    they?] can be specified in the rhs by escaping with a control-V
    (or control-Q).

Action:
  Add a clear description of the "rhs" string based on the above
  interpretation.
_______________________________________________________________________________
Sect 5.10.7.2.21 (ex) OBJECTION. page 557:

Problem:
  [line folding of "print" command]

  The following description is not appropriate,

    "Long lines shall be folded; the length at which folding occurs
    is unspecified, but should be appropriate for the output device.",

  for the following reasons:

  1) Without an appropriate notice, a multi-column character may
     happen to be split into inappropriate pieces.
  2) The wording "appropriate for the output device" is not
     appropriate for this standard.

Action:
  Change the current description about line folding as follows.
    "Long lines shall be folded; the length at which folding occurs
    is unspecified, however, the following shall apply:

    - A multi-column character at the folding position shall neither
      be separated nor be discarded.
    - Folding should be as appropriate as possible for the output
      terminal through the information from the COLUMNS environment
      variable and other characteristics of the terminal."
_______________________________________________________________________________
Sect 5.10.7.2.37 (ex) OBJECTION. page 561:

Problem:
  [appending "write" for non-existing file]

  It is not understandable why the result of an attempt to append (by
  >>) to a non-existing file is implementation defined. [Although
  there is the fact that the (old) SVID specifies that it is an
  error.] This is not consistent with the shell's behavior (create a
  new file and append).

  Also, it is unclear whether this draft specifies that the result of
  "w! >> file" (forced append) for a non-existing file is
  implementation defined, too.

Action:
  Change from

    "If the file does not exist, the result is implementation
    defined.",

  to

    "If the file does not exist, the file shall be created for initial
    append.",

  or to

    "If the file does not exist, and if the write is forced by the
    succeeding character !, then the file shall be created for append,
    otherwise, the result is implementation defined."
_______________________________________________________________________________
Sect 5.10.7.2.47 (ex) COMMENT (editorial). page 564:

Problem:
  Since the case of no buffer argument is specified in the description
  paragraph, the following are not correct:

    Synopsis: @ buffer
    Synopsis: * buffer

Action:
  Change the above to

    Synopsis: @ [buffer]
    Synopsis: * [buffer]
_______________________________________________________________________________
Sect 5.10.7.4 (ex) COMMENT (editorial). page 564:

Problem:
  The last sentence of the first paragraph of 5.10.7.4 "Replacement
  Strings" is not sufficient regarding \( and \).

Action:
Change the last sentence to

    "The sequence \n, where n is an integer, shall be replaced by the text matched by the pattern enclosed in the n-th set of parentheses \( and \) earlier in the same RE string."
_______________________________________________________________________________
Sect 5.10.7.4 (ex) COMMENT (editorial). page 564:

Problem:
The second line of the second paragraph of 5.10.7.4, "(using the \& or \)", could be a typo.

Action:
Change this to: "(using the & or \)"
_______________________________________________________________________________
Sect 5.10.7.5.9 (ex) OBJECTION. page 567:

Problem:
The description of the "magic" command, "If magic is set, change the interpretation ...", is not appropriate. It is misleading, and a reader may interpret it in a way totally different from the actual intention.

Action:
Replace this paragraph entirely with the following.

    "If magic is set [default], several characters have their special meaning for regular expression notation as described in the Regular Expression subsection. If nomagic is set, such characters (except ^ at the beginning of a pattern, $ at the end of a pattern, and \) shall be treated as ordinary characters unless preceded by a \; when preceded by a \ they shall regain their special meaning."
_______________________________________________________________________________
Sect 5.10.7.5.20 (ex) OBJECTION. page 569:

Problem: [unit of "shiftwidth"]
The unit of the "shiftwidth" value is not clearly described.

Action:
Explicitly add a unit of "shiftwidth" as follows.

    "The value of this option shall give the width in columns (neither in characters nor in bytes) of an indentation level used during autoindent and by the shift commands."
_______________________________________________________________________________
Sect 5.10.7.5.23 (ex) OBJECTION. page 569:

Problem: [unit of "tabstop"]
The unit of the "tabstop" value is not clearly stated.

Action:
Explicitly add a unit of "tabstop" as follows.

    "The value of this option shall specify the software tab stops in columns (neither in characters nor in bytes) to be used by the editor to expand tabs input."
_______________________________________________________________________________
Sect 5.11.5.3 (expand) OBJECTION. page 573:

Problem:
In the last sentence of the LC_CTYPE paragraph, the following clause is not necessarily needed,

    "on a constant-width-font output device."

because

    1) the assumption of such characteristics on the column width of each character is described more appropriately in 2.2.2.36 "column position", and
    2) although each "printable" character in the portable character set is defined to have a column width of one, all other characters are assumed to have integral column widths (one, two, and so on), and
    3) the clause "on a constant-width-font output device" may mislead readers into thinking that all the supported characters are associated with a single font of a constant width.

Action:
Change the last sentence to:

    " ..., and for the determination of the width in column positions each character would occupy on an output terminal."
_______________________________________________________________________________
Sect 5.18.7 (more) OBJECTION. page 599:

Problem: [<backspace> handling in "more" command]
The description of the first dash of page 599 says:

    A character, followed first by a <backspace>, then by an underscore (_), shall cause that character to be written as underlined text, ...

It implicitly assumes that one <backspace> moves the current horizontal position back by one CHARACTER when output. However, the "fold" and "expand" commands assume that one <backspace> moves the current position by one COLUMN (see page 301 for "fold" and page 572 for "expand"). This is fine if only single-column characters are handled, but if there is a multi-column character, the specification above will cause inconsistency. For example, if AA is a two-column character, to underline the character, you should write:

    AA<backspace>_

in the "more" sense.
According to the "fold" description, however, you should write:

    AA<backspace><backspace>__

to underline the character.

Action:
Change the first dash and the second dash of page 599 to:

    -- A character, followed first by n <backspace>s (where n is the same as the number of column positions that the character occupies), then by n underscores (_), shall cause that character to be written as underlined text, if the terminal type supports that. The n underscores, followed first by n <backspace>s, then any character with n column positions, also shall cause that character to be written as underlined text, if the terminal type supports that.

    -- n <backspace>s (where n is the same as the number of column positions that the previous character occupies) that appear between two identical printable characters shall cause the first of those two characters to be written as emboldened text (i.e., visually brighter, standout mode, or inverse-video mode), if the terminal type supports that, and the second to be discarded. Immediately subsequent occurrences of <backspace>s/character pairs for that same character also shall be discarded. (For example, the sequence a\ba\ba\ba is interpreted as a single emboldened a.)
_______________________________________________________________________________
Sect 5.26 (strings) OBJECTION. page 633:

Problem: [unit of "string length"]
The unit of the "string length" value is not clearly stated.

Action:
Explicitly add a unit of "string length" as follows.

    -n number
    -number (Obsolescent)
        Specify the minimum string length in character counts, where the number argument is a positive decimal integer. The default shall be 4 (characters).
_______________________________________________________________________________
Sect 5.32.5.3 (unexpand) OBJECTION. page 651:

Problem:
In the last sentence of the LC_CTYPE paragraph, the following clause is not necessarily needed,

    "on a constant-width-font output device."

because

    1) the assumption of such characteristics on the column width of each character is described more appropriately in 2.2.2.36 "column position", and
    2) although each "printable" character in the portable character set is defined to have a column width of one, all other characters are assumed to have integral column widths (one, two, and so on), and
    3) the clause "on a constant-width-font output device" may mislead readers into thinking that all the supported characters are associated with a single font of a constant width.

Action:
Change the last sentence to:

    " ..., and for the determination of the width in column positions each character would occupy on an output terminal."
_______________________________________________________________________________
Sect 5.33.5.3 (uudecode) OBJECTION. page 654:

Problem:
The data created by uuencode shall be locale-independent for the sake of wide-range portability. See the other objection, to the uuencode utility, for a more detailed discussion. From this point of view, the LC_CTYPE dependency for input files of the uudecode utility is not appropriate.

Action:
(1) Delete the LC_CTYPE dependency for input files in 5.33.5.3 on page 654.

(2) Because the uuencode utility does not provide any means to include/attach the encoding (locale) information (beyond the default encoding, ISO 646 or ISO 646 compatible) for a decode_pathname in the "begin" line that includes non-portable filename characters, add the following sentence after the second paragraph of the 5.33.2 Description section:

    "If the pathname of the file to be produced is encoded in a different codeset (locale), the result is unspecified."
_______________________________________________________________________________
Sect 5.34.6.1 (uuencode) OBJECTION. page 655-657:

Problem: [Encoding of "uuencode"]
The original design of the uuencode utility in the UNIX version seems to have the following two aims.

    (A) to convert a binary file into a "printable" text data format in ASCII or in an ASCII-compatible code.
    (B) to create a portable interchange format of a binary file for delivery as text data through an appropriate transfer mechanism (like the mailx utility).

It is observed that the current draft took only aim (A) and tried to eliminate its "ASCII" dependency by changing (A) to the following (A'):

    (A') to convert a binary file into a "printable" text data format in the CURRENT LOCALE.

Aim (B) seems to be discarded by the POSIX uuencode utility. More precisely, the current draft specifies the encoding algorithm as follows:

    Step 1: Take three octets as input and write four characters of output by splitting the input at six-bit intervals into four octets, containing data in the lower six bits only.
    Step 2: These octets shall be converted in the range 0x20-0x5f (and then it shall be assumed to represent a printable character in the ISO/IEC 646 encoded character set).
    Step 3: It then shall be translated into the corresponding character codes for the code set in use in the current locale.
    Step 4: Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.

Note that the translation processes in Step 2 and in Step 4 are quite new requirements introduced by POSIX.2 UPE.

From another point of view, however, such specifications (dependency of the encoded value on the current locale) introduce another difficulty for data interchange by the uuencode/uudecode utilities, simply because a recipient (uudecode) would have to know in what locale (or codeset) the data was uuencoded. That is, in terms of aim (B), it is NOT a good idea to translate the converted octets in the range 0x20-0x5f to/from the corresponding (printable) characters coded in the current locale.
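
As an illustration only, the quoted steps with the locale translation of Step 3 left out (that is, the locale-independent reading argued for here) can be sketched in Python. The function name uuencode_line and the zero padding of a short final group are assumptions of this sketch, not text from the draft.

```python
def uuencode_line(chunk: bytes) -> bytes:
    """Encode 1..45 octets as one line body (length byte + encoded bytes).

    Hypothetical helper: no locale translation is applied, and the
    trailing newline is not included.
    """
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line encodes between 1 and 45 octets")
    # Step 4: length byte = number of octets to be decoded plus 0x20.
    out = bytearray([len(chunk) + 0x20])
    # Assumption: pad the final group with zero octets to a multiple of 3.
    padded = chunk + b"\0" * (-len(chunk) % 3)
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        # Step 1: split three octets (24 bits) into four 6-bit values.
        sextets = (
            a >> 2,
            ((a & 0x03) << 4) | (b >> 4),
            ((b & 0x0F) << 2) | (c >> 6),
            c & 0x3F,
        )
        # Step 2: shift each value into the range 0x20-0x5f.
        out.extend(s + 0x20 for s in sextets)
    return bytes(out)


if __name__ == "__main__":
    print(uuencode_line(b"Cat"))
```

Because every output byte is a fixed function of the input octets, a decoder needs no knowledge of the sender's locale, which is the property aim (B) depends on.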

On systems (or in locales) where the code range 0x20-0x5f does not correspond to "printable" characters, if a user wants printable output in his/her locale, translation of the "uuencoded" data in the range 0x20-0x5f to an appropriate "printable" form should be done by another utility such as tr, iconv and so forth. In other words, such translation should be out of the scope of the uuencode/uudecode utilities. One possible compromise would be to introduce a special option flag for such locale-sensitive translation.

Action:
Taking both aims (A) and (B) into consideration, as an international standard, define the POSIX uuencode utility as an ISO 646 version (except for pathname encoding, see the detailed discussion below), instead of as a locale-sensitive version, i.e., the aims of the POSIX uuencode utility should be interpreted as follows:

    (A") to convert a binary file into a "printable" text data format in ISO 646 or in an ISO 646 compatible code.
    (B) to create a portable interchange format of a binary file for delivery as text data through an appropriate transfer mechanism (like the mailx utility).

Detailed proposed actions are:

(1) Drop the whole locale-sensitive translation specification from the encoding algorithm of the uuencode utility.

(2) Delete the LC_CTYPE dependency for input files in 5.34.5.3 on page 656, because the data of input files shall be regarded as binary data.

(3) Since the usage of the term "character" in the algorithm of Step 1 and Step 4 is not appropriate, change

    a) from
        "Take three octets as input and write four characters of output .."
       to
        "Take three octets as input and write four bytes of output .."

    b) from
        "Each encoded line shall contain a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45."
       to
        "Each encoded line shall start with a length byte, equal to the number of bytes to be decoded plus 0x20, followed by the encoded bytes. The maximum number of octets to be encoded on each line shall be 45."

(4) Since it is unclear whether the byte count of each line includes a trailing newline, add a clear description of the byte count of each encoded line ("newline inclusive" is expected, though).

(5) Change the first paragraph of 5.34.6.1 Standard Output:

    from
        "The standard output shall be a text file (encoded in the character set of the current locale) that begins with the line:
            "begin %s %s\n", <mode>, decode_pathname
        and ends with the line:
            "end\n" "
    to
        "The standard output shall be a text file (encoded in ISO 646 or ISO 646 compatible codeset) that begins with the line:
            "begin %s %s\n", <mode>, decode_pathname
        and ends with the line:
            "end\n" "

(6) In line with similar discussions of the (extended) tar/cpio format about pathname encoding support beyond ISO 646 in POSIX.1 Section 10, add the following description after the second paragraph of 5.34.6.1 Standard Output on page 657 ("In both cases, .."):

    "For maximum portability between implementations, decode_pathname should be selected from characters represented by the portable filename character set as 8-bit characters with most significant bit zero. If an implementation supports characters outside the portable filename character set in names for decode_pathname, one or more implementation-defined encodings of these characters shall be allowed for decode_pathname. However, if a decoding system (a system where the uudecode utility runs) does not support such special encodings, the results of uudecode are unspecified."
_______________________________________________________________________________
Sect 5.35.7.1 (vi) OBJECTION. page 664:

Problem: [Definition of "bigword"]
As an international standard, the current definition of "bigword" is not appropriate. See for more details.
POSIX.2 is expected to provide a more internationalized version of the vi (and ex) utility.

Action:
Consider the proposed extensions of the ex/vi utility in another comment in this ballot, , as one such potential solution. Adding "in the POSIX Locale" would be another solution in the worst case, but it does not provide any means or any solution for "word" handling in a non-POSIX Locale.
_______________________________________________________________________________
Sect 5.35.7.1 (vi) OBJECTION. page 666:

Problem: [Definition of "word"]
As an international standard, the current definitions of the two kinds of "words" are not appropriate. See for more details. POSIX.2 is expected to provide a more internationalized version of the vi (and ex) utility.

Action:
Consider the proposed extensions of the ex/vi utility in another comment in this ballot, , as one such potential solution. [Note that the original two definitions can be supported by their default values.]
_______________________________________________________________________________
Sect 5.35.7.1 (vi) OBJECTION. page 664:

Problem: [definition of "bigword" in "vi"]
The definition of "bigword" is too English specific and may not be appropriate in a non-English locale. For example, in Japanese, words are not delimited by <blank> characters.

Action:
Add "In the POSIX Locale," at the beginning of the paragraph.
_______________________________________________________________________________
Sect 5.35.7.1 (vi) OBJECTION. page 666:

Problem: [definition of "sentence" in "vi"]
The definition of "sentence" is too English specific. For example, in Japanese, IDEOGRAPHIC FULLSTOP can also delimit a sentence.

Action:
Add "In the POSIX Locale," at the beginning of the paragraph.
_______________________________________________________________________________
Sect 5.35.7.1 (vi) OBJECTION. page 667:

Problem: [exceptions in "vi" command]
There is another exception regarding column position: the multi-column character.
For example, in Japan, Japanese characters usually occupy two columns. When the current position is set on such a multi-column character, in most implementations of Japanese editors, including Japanized or internationalized versions of "vi", the cursor is placed on the first column of the multi-column character and never moves to its second column. The cursor position value should not be altered by this, the same as in the example described on page 667.

Action:
Add the following sentences after the end of the second paragraph of page 667:

    "Another exception is if the current column position is on a character that occupies two or more columns and the column position is at the second or subsequent column of the character. In this case, the cursor shall be placed on the first column of the multi-column character, but the current column position value shall not be altered by this, the same as described above."
_______________________________________________________________________________
Sect 5.35.7.1.12 (vi) EDITORIAL COMMENT. page 670:

Problem:
The wording "space characters" is ambiguous. If you mean only <space> characters and do not mean other space-like characters such as <tab>s, please write so explicitly.

Action:
Replace "space characters" with "<space> characters"
_______________________________________________________________________________
Sect 5.35.7.1.32 (vi) EDITORIAL COMMENT. page 675:

Problem:
The wording "wide character" is ambiguous. If you meant a character which occupies two or more columns, "multi-column character" is more appropriate.

Action:
Replace "wide character" with "multi-column character"
_______________________________________________________________________________
Sect 5.35.7.1.44 (vi) OBJECTION. page 679:

Problem: [upper/lower case conversion in "vi"]
The sentence "Lowercase alphabetic characters shall be changed to uppercase and uppercase characters changed to lowercase." would be inapplicable in some non-English locales.
For example, the German character (LATIN SMALL LETTER SHARP S), which is a lowercase letter, has no uppercase counterpart. And the sentence "This command shall have no effect on non alphabetic characters." is inappropriate because POSIX.2 allows alphabetic characters (the "alpha" character class) which do not have an uppercase/lowercase distinction.

Action:
Replace

    "Lowercase alphabetic characters shall be changed to uppercase and uppercase characters changed to lowercase. This command shall have no effect on non alphabetic characters."

with

    "Lowercase alphabetic characters which have uppercase counterparts shall be changed to uppercase characters and uppercase characters which have lowercase counterparts changed to lowercase, as prescribed by the current locale. This command shall have no effect on characters which have no uppercase/lowercase counterparts."
_______________________________________________________________________________
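
The proposed wording for the case-conversion command can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: toggle_case is a hypothetical name, and Python's Unicode case mappings stand in for the "toupper"/"tolower" mappings of the current locale. A character counts as having a counterpart here only when its mapping is a single character.

```python
def toggle_case(text: str) -> str:
    """Swap case only for characters that have a one-character counterpart.

    Assumption: Unicode case mappings approximate the locale's mappings.
    """
    out = []
    for ch in text:
        if ch.islower() and len(ch.upper()) == 1:
            out.append(ch.upper())   # lowercase with an uppercase counterpart
        elif ch.isupper() and len(ch.lower()) == 1:
            out.append(ch.lower())   # uppercase with a lowercase counterpart
        else:
            out.append(ch)           # no counterpart: leave unchanged
    return "".join(out)


if __name__ == "__main__":
    print(toggle_case("aB1\u00df\u3042"))
```

In this sketch LATIN SMALL LETTER SHARP S (U+00DF) is left unchanged because its uppercase mapping is two characters long, and a kana such as U+3042 is left unchanged because it is alphabetic without any case distinction.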