From donn@hpfcrn.fc.hp.com Thu Jul 18 23:48:49 1991
Received: from relay.hp.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8) id AA26195; Thu, 18 Jul 91 23:48:49 +0200
Received: from hpfcrn.fc.hp.com by relay.hp.com with SMTP (16.6/15.5+IOS 3.13) id AA20486; Thu, 18 Jul 91 14:48:27 -0700
Message-Id: <9107182148.AA20486@relay.hp.com>
Received: from hpfcdonn.fc.hp.com by hpfcrn.fc.hp.com with SMTP (16.7/15.5+IOS 3.22) id AA10796; Thu, 18 Jul 91 15:47:50 -0600
To: xojig%xopen.co.uk@hplb.hpl.hp.com, wg15rin@dkuug.dk
Subject: Comments from Karels
Date: Thu, 18 Jul 91 15:47:49 MDT
From: Donn Terry
X-Charset: ASCII
X-Char-Esc: 29

(Forwarded with permission)

I thought you might find this message interesting. If you have
comments or responses please respond directly to Mike, or to the
original list or this list, as appropriate. I'm not able to forward
or condense any responses.

(Being sent by Fax to Toronto, as well.)

Donn

------- Forwarded Message

Replied: Thu, 18 Jul 91 14:14:34 MDT
Replied: karels@okeeffe.Berkeley.EDU (Mike Karels)
Replied: ballot2@okeeffe.Berkeley.EDU
Replied: pc@hillside.co.uk
Replied: posix.2@mks.com
Replied: rabin@osf.org
Replied: kuro@corp.sun.com
Replied: seth@attunix.att.com
Received: from hpfcla.fc.hp.com by hpfcrn.fc.hp.com with SMTP (16.7/15.5+IOS 3.22) id AA08430; Thu, 18 Jul 91 13:59:28 -0600
Return-Path:
Received: from okeeffe.Berkeley.EDU by hpfcla.fc.hp.com with SMTP (15.11.1.6/15.5+IOS 3.20) id AA02333; Thu, 18 Jul 91 13:57:36 -0600
Received: from rip.CS.Berkeley.EDU by okeeffe.Berkeley.EDU (5.66/1.41) id AA00373; Thu, 18 Jul 91 12:52:08 -0700
Received: by rip.CS.Berkeley.EDU (5.67/1.42) id AA04590; Thu, 18 Jul 91 12:52:01 -0700
From: karels@okeeffe.Berkeley.EDU (Mike Karels)
Message-Id: <9107181952.AA04590@rip.CS.Berkeley.EDU>
To: ballot2@okeeffe.Berkeley.EDU
Cc: pc@hillside.co.uk, posix.2@mks.com, rabin@osf.org, kuro@corp.sun.com, seth@attunix.att.com
Subject: 1003.2 D11.1 resolution on internationalization
Date: Thu, 18 Jul 91 12:51:59 PDT

Hi, folks. I had meant to get this out earlier, but events have
conspired against me. I'm sending this message to various people that
have been involved in past correspondence on 1003.2 issues, especially
issues relating to internationalization.

I have a number of unresolved objections to 1003.2 Draft 11 in the
area of internationalization (locale, collation, and regular
expressions) that I think point out serious technical flaws in the
draft. These have not been resolved, although most of them have been
outstanding for several drafts. The response now is that specific
things are required by various international groups, or that it would
reduce consensus. (That, or admonishment that application developers
who find the existing rules to violate the rule of least astonishment
should limit their applications to local use, or supply a localedef
definition along with their applications! Objection 068-20)

Because of the newness and complexity of the locale issues, I strongly
suspect that most of the balloting group is not looking at these
sections very closely. I find it hard to believe that most
reasonable, technically knowledgeable people accept some of the things
that remain in the draft. Therefore, I'm asking the addressees and
anyone else in the 1003.2 balloting group to look at certain of my
unresolved objections. I would very much appreciate responses to me
and to Hal (at least) indicating whether people agree or disagree with
my objections, or whether they haven't looked at this in sufficient
detail to know whether my requests are correct (i.e., wouldn't object
either way). I would especially appreciate any support of my
unresolved objections in people's ballots on D11.1.

The major issues that remain are:

1. The presence of full regular-expression-based substitution within
the collation rules has not been justified by anything other than
claims that various groups consider it a requirement.
The only examples that have been provided (mapping Mc to Mac) don't
work correctly, and no technical requirement based on
internationalization has been given. (Objection 068-12)

2. The regular expression matching and non-matching lists (bracket
expressions) can match multicharacter collating elements as well as
(single- or multi-byte) characters. The examples all list character
combinations which should be treated as a single character such as or .
I claim that these examples are all solved more correctly by treating
those two-byte sequences as multibyte characters rather than
two-character collating elements, and then everything works as
expected without this modification to bracket expressions. Otherwise,
this will lead to surprises for all; even the Germans won't expect
[p-t] to include , just because is defined as a collating element in
an equivalence class with . Worse yet, the regular expression [^s]
would match ! (Objection 068-19)

3. Although ranges in matching lists (such as [p-t]) are noted as
inherently non-portable and are prohibited to Strictly Conforming
applications, they have been extended to allow equivalence classes as
endpoints. This isn't even well-defined if the members of the
equivalence class are not adjacent in the collation sequence. (There
is no requirement in this draft that members of an equivalence class
be contiguous in the collation sequence; a locale collation could
contain, for example, ; ; ... ; ; ; ... ; in which case [[=a=]-d]
isn't well-defined.) (Objection 068-21)

There are other specific flaws from past drafts that have not been
fixed yet; if this piques your curiosity, take a look at my other
objections to sections 2.5 and 2.8 in the unresolved objections list.
For example, I'm not fully convinced that the addition of the reverse
and position collation keywords solves the problems that they are
intended to solve; they should be used a word at a time, not a string
at a time, and nothing in the standard is capable of sorting based on
words (including the sort utility).

Please feel free to pass this message on if you know of other
balloters who might be interested.

Thanks for your time,

Mike

------- End of Forwarded Message
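
The range problem raised in item 3 (Objection 068-21) can be
illustrated with a minimal Python sketch. Everything in it is an
assumption made up for the illustration: the collation sequence
(lowercase letters followed by uppercase letters), the membership of
the equivalence class [=a=], and the helper names expand_range and
candidate_expansions. None of it is taken from the draft or from any
real locale. It only shows that when the members of an equivalence
class are not adjacent in the collation sequence, each member used as
the start point gives a different reading of [[=a=]-d].

```python
import string

# Hypothetical collation sequence: all lowercase letters, then all
# uppercase letters. This ordering is assumed for illustration only.
COLLATION = list(string.ascii_lowercase) + list(string.ascii_uppercase)
POSITION = {elem: i for i, elem in enumerate(COLLATION)}

# Hypothetical equivalence class [=a=]. Its two members are far apart
# in the sequence above, which the draft (per the message) permits.
EQUIV_A = ["a", "A"]


def expand_range(start, end):
    """Return the collating elements from start to end, inclusive.

    An empty list is returned when start collates after end.
    """
    lo, hi = POSITION[start], POSITION[end]
    return COLLATION[lo:hi + 1] if lo <= hi else []


def candidate_expansions(equiv_class, end):
    """Map each class member to the range it yields as a start point."""
    return {member: expand_range(member, end) for member in equiv_class}


if __name__ == "__main__":
    # Each member of the class gives a different meaning to [[=a=]-d].
    for member, elems in candidate_expansions(EQUIV_A, "d").items():
        print(member, "->", "".join(elems) or "(empty)")
```

Under these assumptions, starting from "a" covers a through d, while
starting from "A" yields nothing, because "A" collates after "d" in
the assumed sequence. A specification would have to say which start
point is meant, or require the class members to be contiguous.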