From keld@dkuug.dk Thu Sep 5 20:10:15 1991
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8) id AA09720; Thu, 5 Sep 91 20:10:15 +0200
Date: Thu, 5 Sep 91 20:10:15 +0200
From: Keld J|rn Simonsen
Message-Id: <9109051810.AA09720@dkuug.dk>
To: wg15rin@dkuug.dk
Subject: Ballot resolution
X-Charset: ASCII
X-Char-Esc: 29

I received this from Greger Leijonhufvud via other channels. I do not
remember it being posted to the RIN list, so I am posting it here, as
it is very relevant to the work of RIN.

/Keld

----

Here follow, for your perusal, the proposed changes to the locale
section of the draft as a result of the 11.1 balloting. If you have
any comments, please send them to me...

Also - we have received a proposal to allow only symbolic notation in
localedef sources -- do you agree or disagree? It would increase
portability....

Greger Leijonhufvud
g.leijonhufvud@xopen

CHANGES TO POSIX.2 DRAFT 11.1 (BALLOT RESOLUTION)
=================================================

Chapter 2.2:
===========

Replace line 367 with:

"The character order, as defined for the LC_COLLATE category in the
current locale (see 2.5.2.2), defines the relative order of all
collating elements, such that each element occupies a unique position
in the order. In addition, one or more collation weights may be
assigned for each collating element; these weights are used to
determine the relative order of strings in e.g. the sort utility."

Chapter 2.4:
===========

Replace the last sentence on lines 1303-1304 with:

"The default character shall be the number sign (#). This declaration
shall only be specified if the coded character set does not contain
the number sign character."

Replace lines 1346-1358 on page 55:

"Decimal constants shall be represented by two or three decimal
digits, preceded by the escape character and the lowercase letter d,
for example, \d05, \d97 or \d143.
Hexadecimal constants shall be represented by two hexadecimal digits,
preceded by the escape character and the lowercase letter x, for
example, \x05, \x61 or \x8f. Octal constants shall be represented by
two or three octal digits, preceded by the escape character, for
example, \05, \141 or \217. In a portable environment, each constant
shall represent an 8-bit byte. Implementations supporting other byte
sizes may allow constants to represent values larger than those that
can be represented in 8-bit bytes, and may allow additional digits in
constants. When constants are concatenated for multi-byte character
values, they shall be of the same type, and interpreted in byte order
from left to right. The manner in which constants are represented in
the character is implementation defined. All bytes of the multi-byte
character must be specified."

Chapter 2.5:
===========

Insert after the first sentence on line 1606:

"This declaration shall only be specified if the coded character set
does not contain the number sign character."

Change lines 1620-1621 to:

"Individual characters, characters in strings, and collating elements
shall be represented using symbolic names, as defined below. In
addition, characters may be represented using the characters
themselves, or as octal, hexadecimal or decimal constants. When
non-symbolic notation is used, the resultant locale definitions may
not be portable between implementations or installations. The left
angle bracket (<) is a reserved symbol, denoting the start of a
symbolic name; when used to represent itself it must be preceded by
the escape character. The following rules apply to character
representation:"

Change lines 1622-1625 to:

"(1) A character can be represented via a symbolic name, enclosed
within angle brackets (< and >).
The symbolic name, including the angle brackets, shall exactly match
a symbolic name defined in the charmap file specified via the
localedef -f option, and shall be replaced by the corresponding value
from the charmap file."

Change lines 1636-1642 to:

"(2) A character can be represented by the character itself, in which
case the value of the character is implementation-defined. Within a
string, the double quote character, the escape character and the
right angle bracket character must be escaped (preceded by the escape
character) to be interpreted as the character itself. Outside
strings, the characters " , ; < > escape_char must be escaped to be
interpreted as the character itself."

Change line 1644 to:

"(3) A character can be represented as an octal constant."

Change line 1649 to:

"(3) A character can be represented as a hexadecimal constant."

Change line 1654 to:

"(3) A character can be represented as a decimal constant."

Add after line 1662:

"If a charmap file is present, only characters defined in the charmap
shall be specified."

Change last sentence on lines 1735-1737 to:

"If not specified, the upper-case letters A through Z, as per Table
2-3 (see 2.4.1) shall automatically belong to this class, with
implementation-defined character values."

Change last sentence on lines 1740-1745 to:

"If not specified, the lower-case letters a through z, as per Table
2-3 (see 2.4.1) shall automatically belong to this class, with
implementation-defined character values."

Change lines 1746-1747 to:

"digit
    Define the characters to be classified as numeric digits. Only
    the characters 0 1 2 3 4 5 6 7 8 9 shall be specified, and in
    ascending sequence by numerical value. If this keyword is
    unspecified, an implementation defined sequence of characters
    shall belong to this class."

Change lines 1751-1753 to:

"If not specified, the characters , , , , , and as per Table 2-3 (see
2.4.1) shall automatically belong to this class, with
implementation-defined character values."
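As an illustration only (this is not part of the proposed draft text),
the constant notation from the Chapter 2.4 change can be sketched in
Python. The sketch assumes the escape character is the backslash and
handles only the portable case where each constant is one 8-bit byte;
the function name parse_constants is hypothetical.

```python
import re

# One pattern per constant type. The escape character is assumed to be
# the backslash; a locale source may declare a different one.
_PATTERNS = {
    "decimal": (re.compile(r"\\d([0-9]{2,3})"), 10),
    "hex": (re.compile(r"\\x([0-9A-Fa-f]{2})"), 16),
    "octal": (re.compile(r"\\([0-7]{2,3})"), 8),
}


def parse_constants(text):
    """Convert concatenated constants of a single type to bytes."""
    for pattern, base in _PATTERNS.values():
        pos, values = 0, []
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None:
                break
            values.append(int(match.group(1), base))
            pos = match.end()
        # Accept only if the whole text was consumed by one type, which
        # enforces the "same type" rule for concatenated constants.
        if values and pos == len(text):
            if any(v > 255 for v in values):
                raise ValueError("constant does not fit in an 8-bit byte")
            return bytes(values)
    raise ValueError("not a sequence of constants of one type: %r" % text)
```

With this sketch, \d97, \x61 and \141 all yield the single byte 97,
and \d143, \x8f and \217 all yield the single byte 143, matching the
examples in the replacement text. A mixed sequence such as \x61\141 is
rejected.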
Change lines 1774-1776 to:

"xdigit
    Defines the characters to be classified as hexadecimal digits.
    Only the characters defined for the class digit shall be
    specified, in ascending sequence by numerical value, followed by
    one or more sets of 6 characters representing the hexadecimal
    digits 10 through 15, with each set in ascending order (for
    example A, B, C, D, E, F, a, b, c, d, e, f). If this keyword is
    unspecified, an implementation defined sequence of characters
    shall belong to this class."

Change lines 1787-1789 to:

"If not specified, the lower-case letters a through z, as per Table
2-3 (see 2.4.1), and their corresponding upper-case letters A through
Z, shall automatically be included, with implementation-defined
character values."

Add to rationale (line 1800 in draft 11):

"The character classes lower, upper and space have a set of
automatically included characters. These only need to be specified if
the character values (i.e., encoding) differ from the implementation
default values.

The definition of character class digit requires that only 10
characters, the ones defining digits, can be specified -- alternate
digits (e.g. Hindi or Kanji) cannot be specified here. However, the
encoding may vary if an implementation supports more than one
encoding.

The definition of character class xdigit requires that the characters
included in character class digit are included here also, and allows
for different symbols for the hexadecimal digits 10 through 15."

On line 1812, change "X" for space/cntrl to a dash.
On line 1813, change "X" for cntrl/space and cntrl/blank to dash.
On line 1818, change "X" for blank/cntrl to a dash.

Replace 1936-1940 with:

"(7) Ordering by Weights. When two strings are compared to determine
their relative order, the two strings are first broken up into a
series of collating elements, and each successive pair of elements is
compared according to the relative primary weights for the elements.
If equal, and more than one weight has been assigned, then the pairs
of collating elements are recompared according to the relative
subsequent weights, until either a pair of collating elements compare
unequal, or the weights are exhausted."

Delete 1943-1947.
Delete lines 1958-1959. (substitute)
Delete lines 2137-2157. (substitute)
Delete lines 2180-2183. (substitute)

Change lines 2184-2188 to:

"position
    Specifies that comparison operations for the weight level shall
    consider the relative position of non-IGNOREd elements in the
    strings. The string containing a non-IGNOREd element after the
    fewest IGNOREd collating elements from the start of the compare
    shall collate first. If both strings contain a non-IGNOREd
    character in the same relative position, the collating values
    assigned to the elements shall determine the ordering. In case of
    equality, subsequent non-IGNOREd characters shall be considered
    in the same manner."

Change line 2191 to:

"order_start forward;backward"

Action: Begin new paragraph on line 2221.

Change lines 2233-2234 to:

"Collation shall behave as if, for each weight level, IGNORED
elements are removed."

Delete line 2288.

Replace lines 2333-2334 in draft 11 with the following:

"The directives that can be specified in an operand to the
order_start keyword are based on the requirements specified in
several proposed standards and in customary use. The following is a
rephrase of rules defined for "lexical ordering in English and
French" by the Canadian Standards Association (text in brackets is
re-phrased):

1. Once special characters ([punctuation]) have been removed from
   original strings, the ordering is determined by scanning forward
   (left to right) [disregarding case and diacriticals].

2.
   In case of equivalence, special characters are once again removed
   from original strings and the ordering is determined scanning
   backward (starting from the rightmost character of the string and
   back), character by character, [disregarding case but considering
   diacriticals].

3. In case of repeated equivalence, special characters are removed
   again from original strings and the ordering is determined
   scanning forward, character by character, [considering both case
   and diacriticals].

4. If there is still an ordering equivalence after rules 1 through 3
   have been applied, then only special characters and the position
   they occupy in the string are considered to determine ordering.
   The string that has a special character in the lowest position
   comes first. If two strings have a special character in the same
   position, the character [with the lowest collation value] comes
   first. In case of equality, the other special characters are
   considered until there is a difference or all special characters
   have been exhausted."

Delete lines 2344-2364 (Draft 11 line numbers).

Replace lines 2399-2401 with:

"Previous drafts contained a 'substitute' statement, which performed
a regular expression style replacement before string compares. It has
been withdrawn based on balloteer objections that it was not required
for the types of ordering this standard is aimed at."

Lines 2453-2454:

mon_decimal_point
    The operand is a string containing the symbol that shall be used
    as the decimal delimiter in monetary formatted quantities. In
    contexts where standards limit the mon_decimal_point to a single
    byte, the result of specifying a multi-byte operand shall be
    unspecified.

Lines 2455-2457:

mon_thousands_sep
    The operand is a string containing the symbol that shall be used
    as a separator for groups of digits to the left of the decimal
    delimiter in formatted monetary quantities.
    In contexts where standards limit the mon_thousands_sep to a
    single byte, the result of specifying a multi-byte operand shall
    be unspecified.

Lines 2464-2468:

If the last integer is not -1, then the size of the previous group
(if any) shall be repeatedly used for the remainder of the digits. If
the last integer is -1, then no further grouping is performed.

Lines 2558-2561:

    3;-1      123456'789      "\3\177"
    3         123'456'789     "\3"
    3;2;-1    1234'56'789     "\3\2\177"
    3;2       12'34'56'789    "\3\2"
    -1        123456789       "\177"

In the above example, the octal value of {CHAR_MAX} is 177.

Lines 2530-2531:

The currency_symbol does not appear in the LC_MONETARY category
definition in the POSIX locale because it is not defined in the C
Standard's {7} C locale.

The C Standard {7} limits the size of decimal points and thousands
delimiters to single-byte values. In locales based on multi-byte
coded character sets this cannot be enforced, obviously; this
standard does not prohibit such characters but makes the behavior
unspecified.

The grouping specification is based on, but not identical to, the C
Standard {7}. The "-1" signals that no further grouping shall be
performed (the equivalent of {CHAR_MAX} in the C standard).

Lines 2572-2577:

decimal_point
    The operand is a string containing the symbol that shall be used
    as the decimal delimiter in numeric, nonmonetary formatted
    quantities. This keyword must be specified and cannot be set to
    the empty string. In contexts where standards limit the
    decimal_point to a single byte, only the first byte shall be
    used.

thousands_sep
    The operand is a string containing the symbol that shall be used
    as a separator for groups of digits to the left of the decimal
    delimiter in numeric, nonmonetary formatted quantities. In
    contexts where standards limit the thousands_sep to a single
    byte, the result of specifying a multi-byte operand shall be
    unspecified.
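As an illustration only (not part of the proposed draft text), the
grouping rule stated for lines 2464-2468 can be sketched in Python.
The function name group_digits and the apostrophe separator are
assumptions chosen for the example; the operand is taken to be the
semicolon-separated integer list used in the locale source.

```python
def group_digits(digits, grouping, sep="'"):
    """Insert sep into a digit string according to a grouping operand."""
    sizes = [int(field) for field in grouping.split(";")]
    if any(size < 1 and size != -1 for size in sizes):
        raise ValueError("group sizes must be positive, or -1")

    groups = []
    rest = digits
    index = 0
    while rest:
        # Past the end of the list, the last size is reused repeatedly.
        size = sizes[min(index, len(sizes) - 1)]
        # -1 stops grouping; a size covering all remaining digits ends it.
        if size == -1 or size >= len(rest):
            groups.append(rest)
            break
        groups.append(rest[-size:])  # groups are counted from the right
        rest = rest[:-size]
        index += 1
    return sep.join(reversed(groups))
```

For the digit string 123456789, this sketch gives 123456'789 for the
operand 3;-1, 12'34'56'789 for 3;2, and the unchanged string for -1.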
Lines 2584-2587:

If the last integer is not -1, then the size of the previous group
(if any) shall be repeatedly used for the remainder of the digits. If
the last integer is -1, then no further grouping is performed.

Line 2594:

decimal_point ""

Change line 2797:

NUMBER    A decimal number, represented by one or more decimal
          digits.

Change line 2831:

%token CHAR
%token NUMBER

Change lines 2915-2919 to:

opt_statements      : opt_statement
                    | opt_statements opt_statement
                    ;

opt_statement       : /* empty */
                    | collating_symbols
                    | collating_elements
                    | substitutes
                    ;

Change 2922-2923 to

collating_elements  : 'collating-element' COLLELEMENT
                      'from' '"' char_list '"' EOL

Change line 3012:

mon_keyword_char NUMBER EOL
mon_keyword_char '-1' EOL

Delete lines 3027-3030.

Change lines 3033-3035:

mon_group_list      : NUMBER
                    | mon_group_list ';' NUMBER
                    ;

Change lines 3060-3062:

num_group_list      : NUMBER
                    | num_group_list ';' NUMBER
                    ;

Delete lines 3063-3066

Change line 3067:

LC_TIME -> LC_NUMERIC

In 4.35 (localedef) change on line 7786 the words "charmap file" to
"character mapping".