From D.Cannon@exeter.ac.uk Fri Dec 10 00:22:01 1993
Date: Thu, 9 Dec 1993 23:22:01 +0100
From: D.Cannon@exeter.ac.uk
Message-Id: <22751.9312092222@cen>
To: sc22wg15@dkuug.dk
Subject: Report of SC21/WG9 Guide 25 meeting

Just received this from Brian Meek re the SC21 Guide 25 meeting, held in London, 30 November to 2 December.  I wasn't aware that the meeting was coming up so soon after the WG15 meeting, and it doesn't read as though Brian had much input from any of us in the Posix world prior to the meeting - apologies are due here for the lack of early action, on my part at least.

Brian is looking for Posix feedback for the second meeting, tentatively scheduled for September 1994, as he's aware that Posix test methodology is affected by the work.  What he would most like is a person familiar with Posix TM work at the next meeting - I can try to line up a UK body, but the May WG15 meeting might reasonably decide that a (more influential?) representative of WG15/P2003.3 would be more appropriate.  I'd suggest that WG15 discuss this in May and ensure that the nominee, whoever is chosen, is well briefed for the SC21/WG9 meeting.

Best wishes to each and all for Christmas and the New Year!

Cheers, Dave Cannon.


From udaa000@hazel.cc.kcl.ac.uk Thu Dec 9 12:02:47 1993
Date: Thu, 09 Dec 1993 12:01:44 EST
From: Brian
To: "cbs%nsfnet-relay::dk.dkuug::sc22"@hazel.cc.kcl.ac.uk
Cc: dr@osg.national-physical-lab.co.uk, udaa000@hazel.cc.kcl.ac.uk, d.cannon@exeter.ac.uk
Subject: SC21/WG9 - informal report of SC22 attendee

This is an advance copy of my report on the Guide 25 meeting last week.  A better-looking but still flawed copy has been posted to Joe Cote for distribution.  It has things in it like significant italics, which you'll have to guess at here.  HOWEVER, please note that in my haste I mistyped my postcode in that document - please note the correct one in the body of the report.

BLM


Report to SC22 of the SC21/WG9 meeting,
Interpretation of ISO/IEC Guide 25, London, 1993-11-30/12-02

Brian Meek, UK

This report does not claim to be a faithful record of the meeting, which will be in the minutes.  It reports my personal views, highlighting matters that seem of interest to SC22.

I attended this meeting as SC22 observer, though I found on arrival on the first day that IST/21, the UK equivalent of SC21, had added me to the UK delegation without my knowledge or consent.  Their authority for doing this is questionable, since it is supposed to be a joint working group, but at least it removed any doubt there may have been as to whether I could take part in the discussions.
In fact, as a result of my reading the draft (JTC1/N2527, SC22/N1369), the first day's discussions and talking with the convenor, I produced overnight some personal comments, from the point of view of a "languages person" but with the necessary disclaimers about their status.  These comments, annotated with some updating from the meeting, are reproduced as annex 1 to this report.

As it turned out, the whole time of the meeting was taken up with resolving the national body comments already received, so the WG decided to circulate my comments (within SC21/WG9 only) for review by others.  However, I was asked to relay the request that SC22 itself review my comments and send in its reactions to them, plus any other comments of its own.  At present this does not need to amount to a formal "SC22 position", which is not expected; it is simply considered that wider review and comment from within the language standards community would be desirable.  Hence this is not a matter of letter balloting and deadlines, merely a request to make your views known.  If these could be sent to me, I will pass them on, collating them and producing summaries if that appears useful.

Participation

As well as the convenor (Dave Rayner from NPL UK, a colleague of Roger Scowen and Brian Wichmann) and myself, there were two delegates from Canada, two from Germany (from the German information security agency, confusingly - to me - designated BSI!), one from Italy (from CIMECO, an Italian - but private - accreditation agency) and one further UK delegate (from the UK testing agency NAMAS, also based at NPL).  There were apologies from France and Sweden, while SC29 had appointed a liaison representative, albeit on a papers-only basis.

Written comments had come only from Canada and Italy, and only the Italian ones seemed to call for major changes.  Canada, France, Sweden and the UK had similar positions, namely that non-editorial changes should be as few as possible (in Sweden's case, none at all).  Germany's position was that they wanted significant extensions (the security area being one) but not in this version, extensions anyway being precluded by the WG's terms of reference.  So it mostly came down to resolving the Italian comments.

To help the WG, the convenor summarised some work on extensions going on in Europe (in which wider participation had been invited) and the way SWG-CA (of which he is also convenor) was looking into it from the ISO side.  It was agreed that the most that could come from this meeting on extensions would be a statement of what, from the technical viewpoint of the completeness of the document, is most needed in the way of extensions.

Major points

The meeting first considered points regarded as major by Canada and Italy.  Both had said that the document should be better aligned with the base document, Guide 25.  The solution eventually agreed was to expand the existing summaries at the start of each section to cover all subsections of the Guide itself, and then to follow with the relevant interpretations, redesignated A, B, C etc to avoid having (say) two 7.2s under discussion, one in the Guide and the other in the interpretation.

Canada also questioned the use of the term "test tool", preferring "means of testing".  It seems that "test tool" is used differently in different contexts, in some cases including the "test suite" (the set of all the test cases, e.g. in language processor validation suites, the test programs), in others (as in TR9547, as I pointed out) the tools are things additional to the suite.
The Canadian term implies the first, more inclusive case.  To avoid any ambiguity, this suggestion was accepted.  Personally I am not very keen on the substitute, but apparently it is used in some areas, and it is better than ambiguity.

There were three further major comments from Italy.  One was concerned that the interpretation should not constrain or discourage testing laboratories from offering their own services separate from those specified in test methods standards, developing new testing methods, and entering into "Agreement Groups" with others, providing their own accreditation.  This was accepted in the sense that laboratories could not be prevented from developing and marketing services, and if new testing methods were better than those prescribed, their use should not be inhibited.  All that mattered was that testing reports and accreditation for something other than what the standards prescribed should be clearly identified as such.

The next major Italian point concerned the concept of a "reference implementation", which they felt was inappropriate in many cases.  It was pointed out that it was a widely-used technique which had proved its value and therefore the interpretation needed to address it, though detailed wording changes could be considered if needed to avoid false impressions.  It was stressed that a reference implementation was not necessarily a "golden" implementation (i.e. perfect) and indeed having one with flaws had itself proved advantageous in some ways.  (I have some comments of my own on reference implementations, but as explained above these were not addressed at the meeting.)

The last major Italian point concerned the question of "expert analysis" after collection of test results being permitted to "modify" the verdict.  It was pointed out that "modify" did not here mean "change", the idea being that a suite of test cases might produce a verdict, but subsequent further tests might cause the overall verdict to be different, e.g. "fail" instead of "pass", or "pass" instead of "inconclusive".  Essentially this was because the original suite did not cover every eventuality.  As far as its verdict was concerned, no "modification" was involved; it was just that it was incomplete.  Furthermore, there were situations in which groups of reports needed to be interpreted, by "expert analysis", to arrive at a verdict.  What mattered was that there were prescribed objective procedures to reach the interpretation, and that if there were any subjective elements to an assessment these should be clearly distinguished.  Again, this point could be covered by detailed wording changes where appropriate.

Detailed proposals

The rest of the meeting was taken up with discussion of the detailed proposals for change in the Canadian and Italian comments.  Since the detailed Canadian comments were almost all editorial, this was done by considering the Italian ones first and then applying the Canadian ones if still applicable after wording changes.  The detailed Italian proposals in many cases turned out to be for deletions, especially where they related to the Italian point about testing laboratories and Agreement Groups, but were resolved mainly by rephrasing, regrouping and other wordsmithing.  Much of this was done directly into the text using the convenor's portable machine, which helped to account for the time consumed, though the most complex changes were left as editing instructions to be implemented later.
SC22 will see these detailed changes in due course, so there is no point in recounting them in this report, and in any case I was not able to be present for the whole time.  Some changes of interest to SC22 that affect my own comments are, as stated earlier, referred to in the updated annotations in annex 1.  This leaves just two points of interest that I noted.

The first relates to 7.2.1 in JTC1/N2527, where the Italian delegate, in presenting their comment, pointed out that a C compiler exhibited different behaviour on different instances of the same model PC from the same manufacturer, which turned out to be because a different processor chip was used.  (Yet another argument for LIA?)  So a sentence was added to the guidance column to say that in some specific cases it is known that even components need to be identified.  Perhaps SC22 people with similar experiences would like to comment, and suggest anything more that might need to be said, remembering that these interpretations are very general.  (A small illustration of the kind of thing that can happen is appended at the end of this report.)

The second relates to 10.2 in Guide 25, 10.5 in JTC1/N2527, where the UK NAMAS delegate pointed out that it applied to cases involving floating point arithmetic and similar approximate processes, and so this needed to be mentioned.  It is a point which I ought to have picked up myself, especially as a WG11 member, though in self-defence I point out that I have never had a copy of Guide 25 itself and took the statement in JTC1/N2527 at face value without realising its implications.

Wrap-up

At the end of the meeting it was agreed that a draft of the revised text would be circulated quickly, before Christmas, to those present and to other members of SC21/WG9, to check that the editing had been correctly implemented and to correct any residual obscurities and ambiguities.  The final revision would then go for letter ballot, which is likely to be handled by the SC21 secretariat in consultation with the SC22 secretariat and others.  It is likely that this ballot period will not expire before the end of May 1994.  Given that the responses will then have to be forwarded to SC21/WG9 and circulated, and that the SC21 plenary is in July followed by the holiday period, as well as other commitments, it was decided that the next meeting could not be before September 1994, while avoiding the week of the SC22 plenary.  Assuming this analysis is correct, 1994-09-06/09 would be the preferred dates, with 1994-09-26/28 as the fallback.  The default location is London again unless an invitation comes from another host.

Recommendations to SC22

As well as asking for comments as soon as possible on the points I raise in annex 1, I recommend that attempts be renewed to find a full liaison representative of SC22 for SC21/WG9.  In particular, there are numerous references to Posix testing in the document and I was not sufficiently familiar with that work to offer helpful comment.  The ideal kind of person would therefore be a member of WG15 especially interested in the Posix test methods standards but also knowledgeable about compiler testing, and preferably with interests and experience in languages in addition to C, for example also in the other language bindings to Posix.  This is a lot to ask of one individual, but SC21/WG9 would benefit from such expertise from SC22.  I will of course be available to provide detailed background briefing to the liaison representative, and should the meeting be in London I would try to be available during the meeting for back-up and consultation - though the dates for my 1994 summer holiday are not yet fixed.
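As a small illustration of the 7.2.1 point above (a made-up fragment of my own, not the actual case the Italian delegate reported): the following C program can, as I understand the C standard, legitimately print different results on machines that differ only in processor chip, depending on whether intermediate floating point results are held in extended precision registers or rounded to double at each step.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative only.  1.0e16 + 1.0 cannot be represented exactly
         * in IEEE double precision, so if the sum is rounded to double
         * the value printed below is 0; if the intermediate is kept in
         * an extended precision register it is 1.  Both behaviours are
         * permitted, so the "same" compiler on the "same" model of PC
         * can give different answers when only the chip differs. */
        double x = 1.0e16;
        double y = 1.0;
        double z = (x + y) - x;

        printf("%g\n", z);
        return 0;
    }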
Brian Meek, 5 December 1993

Please send comments by electronic mail if possible.  The address is b.meek@hazel.cc.kcl.ac.uk

If you cannot send comments electronically, please post them as soon as possible to B.L. Meek, Computing Centre, King's College London, Strand, London WC2R 2LS, UK


[Annex 1]

Comments on JTC1/N2527, interpretation of ISO/IEC Guide 25

Brian Meek, UK, JTC1/SC22 observer

Note that these are simply the comments of an individual.  They are not endorsed by the UK or by SC22 and are advanced only in the spirit of attempting to aid SC21/WG9 in its work.  They are all essentially editorial in nature.

[Annotations updating these comments following the SC21/WG9 meeting in November-December 1993 appear in square brackets and italicised.  Otherwise the text is unaltered apart from right-aligning and correction of minor errors.]

General

1.  The term conformance is used throughout, whereas conformity (mentioned only in the glossary) is surely the normal ISO term?  [I have since noted that the term "comply" is also used, as well as "conform".]

2.  The document refers to compilers, whereas in SC22 the usual term is (programming) language processors.  The rationale for this is that compiler is a rather restrictive term implying a conventional language implemented as software in a traditional batch-oriented environment.  Language processors include interpretive implementations that are used interactively, hardwired implementations and so on.  (There can be more connotations, to be mentioned later.)  Admittedly by far the greatest experience is with testing and validation of compilers rather than other kinds of implementation.  Interpretive, interactive implementations (and other interactive software) can be and have been tested, and can be tested to some extent automatically through use of pre-prepared scripts (a sketch of the idea follows below).  However, it may be difficult to predict in advance the path that the interaction may take, the specific responses that will come from the processor which the automated script should recognise, and so on.  At best, analysis (and probably human analysis) of the post-test record of the interaction will be needed to determine the result of the test, and at worst the testing will need to be conducted manually, with all the attendant costs, risk of human error, subjective reactions, and so on.  It is not suggested that the interpretation document should be written so as to cover such situations, only that it should not assume that the usual style of testing compilers is adequate for everything.
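To make the point concrete, here is a minimal sketch of my own (not part of N2527) of driving an interactive processor from a pre-prepared script; the processor name "interp", the script and the transcript file are purely illustrative, and the POSIX popen routine is used.  Note that such a driver can only replay fixed input - it cannot react to unexpected responses - which is exactly why post-test analysis of the transcript is still needed.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal sketch: feed a pre-prepared script to an interactive
     * language processor and keep a transcript of its output for
     * post-test analysis.  Names are illustrative only. */
    int main(void)
    {
        FILE *script = fopen("test.script", "r");
        FILE *proc = popen("interp > transcript.log 2>&1", "w");
        char line[1024];

        if (script == NULL || proc == NULL) {
            perror("setting up scripted test run");
            return EXIT_FAILURE;
        }

        /* Replay the scripted dialogue, line by line, as the processor's
         * standard input; the processor's responses go to the transcript. */
        while (fgets(line, sizeof line, script) != NULL)
            fputs(line, proc);

        pclose(proc);
        fclose(script);

        /* The verdict is decided later, by examining transcript.log. */
        return EXIT_SUCCESS;
    }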
Particular

7.1  Separation of data partitions

3.  TR9547 (the SC22 DR on test methods for language processors) makes it clear that testing of a processor takes place in combination with a configuration, and that testing cannot be meaningfully discussed without relation to the configuration ("host and target computers, any operating system(s) and software used to operate a processor") on which the processor is running.  That is, what is under test is not the processor alone but the processor-configuration combination.  Probably this is not in conflict with the concepts in this section, but the reference to compiler testing may not make it clear.  TR9547 recognises that you cannot just test the "Bloggsware compiler" that runs on Bloggsware machines with the Bloggsware operating system, without specifying which particular machine model, which OS release, which settings of system options, etc.  There is mention elsewhere of hardware etc dependence of compilers, but it perhaps needs to be made clear that most language standards leave many aspects implementation-dependent - which comes down to configuration-dependent when testing.  It happens far too much in my personal opinion, but it is a fact, however undesirable.

9.2.1  Use of reference implementations

4.  While it is accepted that a reference implementation is not a model (ideal) but a (probably imperfect) sample, some consideration ought to be given to the possibility of using formal specification languages to define the standard and then to generate automatically a reference implementation (in a given environment, e.g. platform or "configuration" as above) which would consequently embody the standard.  Such techniques are used for "fast prototyping" of applications packages and could be used for language processors and other software - provided the standards used formal specification languages throughout.  This is (very regrettably in my personal view) very much the exception rather than the rule in SC22 standards and, I believe, elsewhere, though some standards are coming up (at or about to move to DIS stage) which do use formal specification languages comprehensively.  This is a rather different point from that of potentially unrepresentative "model" implementations, which is the subject of the second paragraph of the guidance to 9.2.1.

5.  I also have a slight concern about the assumption of a single reference implementation, in the case of language processor testing.  There is mention in 9.2.1.1 of "reasonable coverage" of options, but it may not be fully appreciated how extensive these can be in language standards, which are among the most complex standards around.  I have already mentioned the problem of implementation-dependence, but added to that there commonly are explicit options or "levels", where there is in fact a "core" language that must be provided and a number of optional modules (sometimes a profusion of them).  (Graphics standards I believe exhibit similar characteristics.)  Exactly what constitutes "reasonable coverage" in such circumstances is by no means obvious, because a reference implementation which omits particular modules will often be near-worthless for reference purposes when dealing with other implementations.  It goes far beyond the use of a particular combination of parameters, for example.  It could in fact be argued that every possible combination of optional modules or levels constitutes a different language with different (if related) standards - with only six optional modules there are already 64 possible combinations - and there is certainly a strong case for ensuring that enough reference implementations (plural) are available to cover all the various modules or levels at least once.  Most of the instances cited in the text of compiler validation refer to Ada (in which levels are banned) and Pascal (where there are only two, in the cases cited - the different standard for Extended Pascal is a separate issue).  It might be worth canvassing opinion from test centres who have performed Cobol or GKS validations over a lengthy period.  I am uneasy at the document leaving this issue unaddressed.  You can very often target test cases in the test suites to particular optional modules, but with only one reference implementation allowed, those modules not present in it would not have the same benefits as those that are.  If reference implementations are really so valuable, why deprive some parts of their benefits, for the sake of the principle of a single reference?
Is it really so important to have a single reference?  Cannot the comparisons between a number of them be themselves of value?  [This point is partly addressed in 4.A of the revised document, replacing 4.1 of JTC1/N2527, but I still think it needs more emphasis in 9.2, in relation to the concept of levels of conformity discussed above.]

6.  Further to the "reasonable coverage" concept, quite apart from modules or levels defining further facilities which need to be tested if present, more generally it is surely the case that what constitutes "reasonable coverage" will depend very much on the nature of the options themselves, how they interact with what isn't optional and among themselves, and above all what the conformity rules say in relation to them.  The former SC22 WG12, Conformity and Validation, which produced TR9547, also produced DR *** giving guidelines for preparing conformity clauses, since it was recognised from the first that the two go together, indeed are inextricably linked.  Another SC22 WG, WG10, was at the same time producing TR10176, which gives guidelines for standards generally.  This DR tackles the options and levels problem too, and advises standards committees to reduce their number to a minimum, and if possible eliminate them altogether.  It adds some guidance about how to achieve (in effect) damage limitation when options are unavoidable.  The document as it is seems rather to underplay the importance of getting the conformity requirements specified properly, even before you start testing.  In that there sometimes seems to be a tendency to gloss over this in standards development, or even to try to offload it by saying "that's a matter for validation services", I would rather it were given more attention - somewhere, not necessarily here.

7.  Concerning the second guidance paragraph accompanying 9.2.1.1, is the term "instrumented" reference implementation defined anywhere?  Just reading this section, it is hard to tell what is meant.

9.3  Validation of test software in the system under test

8.  The first paragraph of guidance says that for compiler testing "the whole test suite" needs to be mounted before being submitted to the compiler under test.  Why?  If the suite consists of a set of cases (programs), why must they all be present before any are run?  Experts may know without being told, but if they know already they don't need this guidance here!  It may well be convenient to have them all in place, but the implication is that the testing is invalidated if you run the cases one at a time.

10.2  Repeatability and reproducibility

9.  My main point here is that there are circumstances in which randomly generated tests are valuable.  You can uncover faults that it never crossed anyone's mind might be there.  I assume that this is not precluded provided that what is randomly generated is recorded or is otherwise reproducible, but it might be worth making this explicit; a small sketch of what I mean by "reproducible" follows.  (A small piece of lateral thinking: at least one language standard has two different random number generator functions, one of which is reproducible - and explicitly for (program) testing purposes - while the other is unreproducibly random.)
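A minimal sketch of the idea (my own illustration, not from the draft): if the seed used to drive the random generation is recorded in the test report, the "random" test cases can be regenerated exactly, so repeatability is not lost.  The generator and the shape of the generated cases below are made up purely for illustration.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Illustrative only: stands in for whatever produces one randomly
     * generated test case (e.g. a small test program). */
    static void emit_case(int n)
    {
        printf("test case %d: randomly chosen operand %d\n", n, rand() % 1000);
    }

    int main(int argc, char *argv[])
    {
        /* If a previously recorded seed is supplied, the earlier run is
         * reproduced exactly; otherwise a fresh seed is chosen and
         * logged so that it can be quoted in the test report. */
        unsigned seed;
        int n;

        if (argc > 1)
            seed = (unsigned)strtoul(argv[1], NULL, 10);
        else
            seed = (unsigned)time(NULL);

        printf("seed recorded for the test report: %u\n", seed);
        srand(seed);

        for (n = 0; n < 5; n++)
            emit_case(n);

        return 0;
    }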
10.  Under 10.2.2, I strongly support the concept of evaluation services such as that for Ada, which "reaches the parts that validation cannot" (because of conformity requirements or limitations thereof, and because they can address subjective issues which standards quite properly cannot address).  However, I do not see the relevance of mentioning them in this document, where it seems to introduce irrelevant considerations (albeit important ones).  Yes, reproducibility is then difficult - but this document is not about such matters.  A point may still need to be made, but using evaluation services as an example is misleading, and possibly even counter-productive.  I suggest removing the second sentence of the guidance, and rephrasing to repair the hole that is left.  [An Italian comment also expressed dislike of the reference to the Ada evaluation service, so that has now been removed.  It is replaced by stating as an example that some standards specify a requirement to provide documentation of certain things, and some judgement may be involved in deciding whether what is provided meets the intent of the standard, though reproducibility should still be sought by giving objective criteria for reaching that judgement.]

11.  In relation to 10.2.3, the discussion above about options and implementation-dependence is relevant again here.  TR10176, incidentally, advises explicitly mentioning in a standard what is implementation-dependent (instead of leaving it unmentioned, to be discovered through the omission), and where possible making things implementation-defined, so that what the implementation does must be documented (for the benefit of users, and of testers).

10.3  Internationally agreed test methods

12.  Please add a reference to TR9547, for language processors (and in Annex 2).  [This has already been done.]

13.1  Objectivity

13.  I do not find 13.1.8 very clearly expressed.  If something fails (or doesn't pass) it is the client who is presumably dissatisfied, so the onus is on the client to check why.  It may be a fault in the product, but it could be a fault or limitation in the testing - e.g. not allowing for the "expedited data" possibility given as an example.

Annex 1, Glossary

14.  The word "system" is too vague and is best eliminated entirely from the document if possible.  For example, English being the language it is, "test system" could mean either the system being tested or the system doing the testing.  Could you use "test assembly" for what does the testing?  That seems to me likely to be less ambiguous.  "Test tool" is all right, but "test aid" might distinguish it more from the test suite or a test case; and surely under "test system" as currently defined it must mean the test suite plus the test tools (plural) used to run it.

--
_________________________________________________________________________
Dave Cannon                           University of Exeter, Computer Unit
Systems/Network Programmer            Laver Building, North Park Road
                                      Exeter, EX4 4QE, Devon, UK
Phone: +44 (0)392 263956 (Changing soon!)        Fax: +44 (0)392 211630
_________________________________________________________________________