From donn@hpfcrn.fc.hp.com Tue Jun 4 16:09:41 1991
Received: from hpfcla.fc.hp.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8) id AA10101; Tue, 4 Jun 91 16:09:41 +0200
Received: from hpfcrn.fc.hp.com by hpfcla.fc.hp.com with SMTP (15.11.1.6/15.5+IOS 3.20) id AA26413; Tue, 4 Jun 91 08:07:56 -0600
Received: from hpfcdonn by hpfcrn.HP.COM; Tue, 4 Jun 91 08:09:23 -0600
Message-Id: <9106041409.AA04234@hpfcrn.HP.COM>
To: wg15rin@dkuug.dk
Subject: Interchange format
Date: Tue, 04 Jun 91 08:09:21 MDT
From: Donn Terry
X-Charset: ASCII
X-Char-Esc: 29

I was requested to make available a version of the proposed interchange format. Here's an nroff of the current version.

Donn

-------------------------------------------------------------------------------

(I have inserted the general IEEE copyright notice according to WG15RIN action, 1991-11-24 Keld Simonsen)

Copyright (c) 1991 by the Institute of Electrical and Electronics Engineers, Inc.
345 East 47th Street
New York, NY 10017, USA
All rights reserved as an unpublished work.

This is an unapproved and unpublished IEEE Standards Draft, subject to change. The publication, distribution, or copying of this draft, as well as all derivative works based on this draft, is expressly prohibited except as set forth below.

Permission is hereby granted for IEEE Standards Committee participants to reproduce this document for purposes of IEEE standardization activities only, and subject to the restrictions contained herein.

Permission is hereby also granted for member bodies and technical committees of ISO and IEC to reproduce this document for purposes of developing a national position, subject to the restrictions contained herein.

Permission is hereby also granted to the preceding entities to make limited copies of this document in an electronic form only for the stated activities.
The following restrictions apply to reproducing or transmitting the document in any form: 1) all copies or portions thereof must identify the document's IEEE project number and draft number, and must be accompanied by this entire notice in a prominent location; 2) no portion of this document may be redistributed in any modified or abridged form without the prior approval of the IEEE Standards Department.

Other entities seeking permission to reproduce this document, or any portion thereof, for standardization or other activities, must contact the IEEE Standards Department for the appropriate license.

Use of information contained in this unapproved draft is at your own risk.

IEEE Standards Department
Copyright and Permissions
445 Hoes Lane, P.O. Box 1331
Piscataway, NJ 08855-1331, USA
+1 (908) 562-3800
+1 (908) 562-1571 [FAX]

Section 10: Revisions to Data Interchange Format

=> Replace all of Section 10 with the following.

Editor's Note: This is a rather extensive rewrite of the previous version of the proposed new interchange format. The key change is caused by the fact that Access Control Lists (ACLs) as required by the 1003.6 Security standard could not be reasonably included in the previous format. Given the extensive nature of these changes, the rest of this chapter does not have change bars.

This version reflects a complete draft, except in two areas: information about security is solicited from P1003.6, and information about high-performance files (and other file types) is solicited from P1003.4.

The rationale here will ultimately be moved to Annex B. Most of the current Annex B will be discarded at that time, although some of the rationale from the ``Tar Wars'' should be brought forward to motivate this new format.

The material below is under discussion between the VC and TE on where it should be located. In the meantime, so the text is not lost ...

1003.1-1990 (Draft 5), page 167, lines 11-14, Replace the sentence which begins "A process with ..."
with the following: The mechanism shall provide an interface whereby a process with appropriate privileges is allowed to restore the ownership and permissions exactly as recorded on the medium, except that the symbolic user and group IDs are used for the tar format, as described in 10.1.1. This issue will be resolved in the near future. Each implementation shall be able to read and write an interchange format, as specified in this section. BEGIN_RATIONALE Rationale: This is a data interchange format for POSIX that meets the requirements of interchange that were discussed in the debates on whether the USTAR or CPIO formats were the appropriate vehicles for such interchange. It is also intended to meet the known additional requirements developed since those debates, and be extensible, in an upward and downward compatible way, to allow future file types. Additionally, it meets the requirements of ISO 1001 {2}, which is equivalent to ANSI X3.27-1987, when certain additional constraints are met. (X3.27 is an old standard, having been around in some form since at least the late 1960's. The 1987 revision appears to be the fourth revision.) This allows a minimal, but useful, level of interchange to systems conforming to ISO 1001 but not to this standard. The terminology and concepts taken from ISO 1001 that are needed for understanding this proposal are defined informally here, where the formal definition should be taken from the standard itself. Such terms are flagged the first time they are used with ``(1001),'' and excerpts from that standard are quoted. In a few instances, definitions taken directly from ISO 1001 {2} are included in this rationale. END_RATIONALE 10.1 Archive/Interchange File Format A conforming system shall provide a mechanism to copy files from a file or other recording medium (an archive) to the file hierarchy and copy files from the file hierarchy to a file or other recording medium, recorded in the interchange format described here. 
This standard does not define this mechanism, but does put some constraints on it. A format-creating utility is used to record an archive, in an implementation-defined manner, and a format-reading utility is used to restore files from an archive. It is based upon ISO 1001 {2}; a magnetic tape conforming to this standard also conforms to ISO 1001. The headers of these formats are defined (by ISO 1001) to use a subset of the invariant part of the characters and codings of ISO/IEC 646 {1}. There is an exception for certain names that would reasonably be in a different or larger character set. BEGIN_RATIONALE Rationale: This format is intended for interchange, not for backup on a single (family of) systems. It is not as densely packed as might be obtained for backup, and contains information as coded characters that for backup could be coded in binary. The whole issue of identification of character sets is probably unnecessary for backup. END_RATIONALE 10.2 Definitions The following terms are used in this section. 10.2.1 alternate character set: A character set specified in the label records in which certain data fields may be encoded. 10.2.2 al-characters: The a-characters (ISO 1001 {2}) plus a-z. 10.2.3 archive: A file or medium recorded in the format specified in this section. 10.2.4 block: When the medium is magnetic tape or another device that records information in discrete units that have a variable length: the meaning defined in ISO 1001 {2}. Otherwise, it is a quantity of data operated on as a single unit. 10.2.5 folded to a-characters: A string of characters when converted to contain only a-characters, as defined in 10.3. 10.2.6 log: Either display as output of the format-reading utility, or write to an implementation-defined file, or both. Information on the archive which cannot be acted upon due to system constraints is logged. 10.2.7 octet: Byte, as defined in ISO 1001 {2}. 
10.2.8 record (verb): When discussing the content of an archive, to transfer data from the file hierarchy to the archive.

10.2.9 restore: Transfer data from the archive to the file hierarchy.

10.2.10 volume: When referring to magnetic tape, this shall have the same meaning as in ISO 1001 {2}. When referring to other media, it shall refer to dismountable media. It shall refer to the content of a regular file when this format is recorded in a regular file.

The following definitions are taken from the ISO 1001 {2} standard: block, label, magnetic tape, record (noun), tape mark. The notation used in that standard is also used in this standard.

The following notation is used in tables describing record formats:

BP        Byte position within the label
L         Length of the field in number of octet positions
digit(s)  Any digit from 0 to 9.
P         When this field in a table is marked, it indicates that a more specific requirement than that of ISO 1001 {2} is made.

BEGIN_RATIONALE

Rationale: The following definitions are repeated for convenience from ISO 1001 {2}.

10.2.11 a-character: The 57 characters: space ! " % & ' ( ) * + , - . / : ; < = > ? _ 0-9 A-Z as defined by ISO/IEC 646 {1}. (The ISO 1001 definition is more elaborate, but equivalent.)

10.2.12 block: A group of bytes [octets] recorded consecutively in accordance with the relevant standard for recorded magnetic tape.

10.2.13 byte: A string of eight binary digits operated upon as a unit.

10.2.14 file section: That part of a file that is recorded on any one volume.

10.2.15 label: A record that identifies and characterizes a volume, or a file section on a volume.

10.2.16 magnetic tape: Media as defined by ISO 1863 {3}, ISO 3788 {4}, and ISO 5652 {5}.

10.2.17 record (noun): Related data treated as a unit of information.

10.2.18 tape mark: A control block used as a delimiter. NOTE: The structure of tape marks is specified by the relevant standards for recorded magnetic tape.

10.2.19 volume: A dismountable reel of magnetic tape.
Note that there is no requirement on the format of the log, although constraints on the content are mentioned. The log is intended for human use to recover from problems found during restoring the archive. ``Octet'' is chosen both to emphasize the 8-bit nature of this interface, and to avoid confusion with the meaning of ``byte'' as used in the rest of this standard. END_RATIONALE 10.3 General Requirements This section of this standard is an extension to ISO 1001 {2}; conflicts (as opposed to further specification) with ISO 1001 shall be resolved in favor of ISO 1001. The concept of appropriate privilege may include options or flags provided to the utility reading the archive. When restoring from an archive, the process doing so might not have the appropriate privilege to create files with all the characteristics described in the archive. In such a case, the file shall be created with as many of the characteristics that it had in the archive as the privileges of the process and the system implementation permit. In the case of a process that has no additional privileges, the protection information (ownership and access permissions) shall be set in the same fashion that open() (see 5.3.1) would when given the mode argument matching the file permissions supplied by the mode field supplied in the File Information File. A process with permissions to do so shall be capable of restoring the ownership, permissions, and other file characteristics exactly as recorded on the medium. The format-creating and format-reading utilities may have options that systematically change the names of files on the archive or as restored to the file hierarchy. For simplicity, the rest of this standard is written as if such name changes are not made; the utilities shall operate in such a fashion that the relationship between files specified by links and symbolic links is retained in the presence of such changes, to the extent permitted by the organization of the file hierarchy. 
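The rule above for a process with no additional privileges can be illustrated with a minimal sketch. It assumes the mode field of the File Information File has already been decoded into an integer; the function names are hypothetical, and masking to the nine rwx bits is an assumption made here because open() only defines the effect of the file permission bits.

```python
import os


def unprivileged_restore_mode(recorded_mode: int, umask: int) -> int:
    """Return the permission bits an unprivileged restore would end up with.

    open() with O_CREAT sets the file permission bits from `mode`, minus any
    bits set in the file mode creation mask. Only the nine rwx bits are kept
    (an assumption of this sketch; set-user-ID and similar bits are dropped).
    """
    return recorded_mode & 0o777 & ~umask


def restore_unprivileged(path: str, recorded_mode: int) -> int:
    """Create `path` as an unprivileged format-reading utility might.

    The kernel applies the umask itself, so the recorded mode is passed
    straight to open(). Ownership becomes that of the calling process.
    Returns the open file descriptor.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    return os.open(path, flags, recorded_mode & 0o777)


if __name__ == "__main__":
    # A file recorded as set-user-ID 0755, restored under umask 022.
    print(oct(unprivileged_restore_mode(0o4755, 0o022)))  # 0o755
```

A process with appropriate privileges would instead go on to apply the recorded owner, group, and full mode after creating the file.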
It is implementation defined what protections are applied to an archive based upon protection fields recorded in it. Interpretation of the protection fields need not be applied until the information has been restored.

When a device that records data in discrete units of variable length is used, a block may be longer than the record or records it contains. The format-creating and format-reading utilities shall deal with the extra information in such a way that the extra octets participate in no way in the data recorded on the archive. When a device that does not support discrete units of variable length is used, a block shall be exactly the specified length.

When logging of information is required by this standard, the name of the file (as it appears in the FIF) shall be included in the log in such a way that it can be associated with the other logged information for that file.

Throughout this section, it is stated that certain fields, records, and labels ``shall be ignored.'' Such items (if present) may be recorded with any value consistent with ISO 1001 {2}. Nothing in this standard shall be construed to require that these fields shall be ignored in the presence of options to the format-reading utility enabling interpretation of some or all of these fields. However, such fields shall be ignored when such options are not used. The existence and content of such items may be logged. Logging of ignored information may be suppressed by an option to the format-reading utility.

A string shall be folded to a-characters to fit in a specific field in a label subject to the following constraints:

(1) The character set is converted to ISO/IEC 646 IRV {1}; characters not in ISO/IEC 646 {1} IRV shall be converted in an implementation-defined manner.

(2) The characters a-z are converted to A-Z.

(3) The string is truncated to the field width, or padded with spaces on the right, as needed to make it exactly the width of the label field.
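The three folding steps can be sketched as follows. This is an illustration under stated assumptions, not text from ISO 1001: the name `fold_to_a_characters` is hypothetical, characters outside ISO/IEC 646 IRV are replaced with an underscore as one possible implementation-defined conversion, remaining characters that are not a-characters are replaced the same way (the draft does not say what happens to them), and padding uses the space character.

```python
import string

# The 57 a-characters: space, 20 punctuation marks, the digits, and A-Z.
A_CHARACTERS = frozenset(
    ' !"%&\'()*+,-./:;<=>?_' + string.digits + string.ascii_uppercase
)


def fold_to_a_characters(text: str, width: int, substitute: str = "_") -> str:
    """Fold `text` into a label field of exactly `width` a-characters."""
    folded = []
    for ch in text:
        # Step 1: anything outside the 7-bit set gets the substitute
        # character (the implementation-defined choice of this sketch).
        if ord(ch) > 0x7F:
            ch = substitute
        # Step 2: a-z become A-Z.
        elif "a" <= ch <= "z":
            ch = ch.upper()
        # Assumption: 7-bit characters that are still not a-characters
        # (for example '#' or '@') are also replaced.
        if ch not in A_CHARACTERS:
            ch = substitute
        folded.append(ch)
    # Step 3: truncate to the field width, or pad with spaces on the right.
    return "".join(folded)[:width].ljust(width)


if __name__ == "__main__":
    print(repr(fold_to_a_characters("INFO-readme.txt", 17)))
```

Because folding both truncates and substitutes, it is lossy; two different names can fold to the same field value, which is why the folded fields are not relied upon to recreate the POSIX file.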
BEGIN_RATIONALE

Rationale: The requirements on restoring from an archive are slightly different from the historical wording, allowing for nonmonolithic privilege to bring forward as much as possible. In particular, attributes such as ``high performance file'' might be broadly but not universally granted while set-user-ID or chown() might be much more restricted.

There is no implication in this standard that the security information be honored after it is restored to the file hierarchy, in spite of what might be improperly inferred by the silence on that topic. That is a topic for another standard.

If the implementation can make use of further extensions in label records (or if they are automatically generated by some systems) it is permissible to use them with an option. This may prove particularly useful for tapes generated on ``label-smart'' systems.

The character set folding is not ideal, but to be consistent with ISO 1001 {2}, it is required for fields appearing in the header. Note that such fields are only meaningful when the format is not read on a POSIX system or in the case of error, so no crucial data is lost.

END_RATIONALE

10.4 Media

The originator and recipient of the volume used for interchange shall agree on the media upon which this format is to be recorded. If the agreed medium is magnetic tape, a tape mark shall be recorded and expected on the medium in the locations defined below. Where an analogous concept exists for other media, it should be used. Where a tape mark or other out-of-band end of file marker is not defined for the agreed-to media, it shall be omitted from the media when the interchange format is recorded.

BEGIN_RATIONALE

Rationale: Tape marks are, first of all, required by ISO 1001 {2} for the tape formats above. They also make recovering a damaged tape much easier.
Because media such as regular files do not have the ability to record such things, they are not strictly required for other media, but they are recommended where it is possible. Note also that this implies that for tape and tape-like devices, a label is identically a block (other than possible padding in the block). This is also intended here. For devices that do not have a block structure like this, then a stream model applies. END_RATIONALE 10.5 File Representation An instance of this archive format shall consist of one or more volumes. Each volume shall contain zero or more file sections. A file shall be recorded in one or more file sections consistent with the rules of ISO 1001 {2}. Where tape marks are available, multiple File Sections and the appropriate End of File or Section and End of Volume Label Groups shall be used. An archive written on magnetic tape shall use tape marks. On media where both a tape mark or equivalent is not used, and where records are not delimited in hardware or where this fact is hidden by the underlying system, only a single logical volume shall be written, which may span several physical volumes. Both the labels and the file data shall be combined as a single stream, and the labels shall appear immediately adjacent to each other. If the archive must span more than one physical volume, the last octet from the archive that is recorded on the first physical volume shall be the one preceding the first octet recorded on the second physical volume. On media where a tape mark or equivalent is not used, but where records are delimited in hardware, only a single logical volume may be written, which may span several physical volumes. Each label shall appear as a separate physical record. File data shall be treated as a single stream. 
If the archive must span more than one physical volume, the last octet from the archive that is recorded on the first physical volume shall be the one preceding the first octet recorded on the second physical volume.

Media recorded in conformance with this standard shall have volume header and trailer labels as described below. Such media shall represent each POSIX file in either one or two ISO 1001 {2} format files. The first such ISO 1001 file shall consist of labeling information as described for the File Information File (FIF). The second, the File Data File (FDF), is optional: when the FIF describes a file that does not contain data, the FDF shall not be recorded.

BEGIN_RATIONALE

Rationale: ISO 1001 {2} says that there is either only one file section, or if the file will not fit on the current volume (either because it ran out of space along the way, or because the file is too big for one volume) that the second section follow the first on the next volume, and that no other files intervene until the whole file is recorded. (If the file is huge, there may be many sections, but each must use the whole volume until the last.) Because tape marks are ``out of band,'' it is safe to split data across several volumes like this.

The first form of recording on media other than magnetic tape is specifically addressing either stream devices or recording on an ``ordinary file.'' The second addresses devices such as sectored disks.

Note that without a tape mark or equivalent out-of-band indicator, it is impossible to tell where the data ends and any label records begin when dealing with files that span multiple physical volumes. When tape marks are present, this is feasible, and labels are required.

Implementors should note that implementations that refuse to allow writing at least label records after the end of tape reflector spot is detected may not be able to conform to this standard (or to ISO 1001 {2}).
Historical UNIX systems have not handled the detection of the reflector spot consistent with historical usage in other ``tape-smart'' systems. Implementations must be able to read data after the end of tape reflector spot because other implementations may have written past that point. An implementation is free to back up and write the end of tape information after detecting the reflector spot, while holding the previously written information in memory for the next volume. However, given the requirement to read past the reflector spot, that may not be worthwhile to implement.

Though it is not required by POSIX.1, implementations of the format-reading and -creating utility, upon reading logical end-of-file, may check to see if an error channel is open to a controlling terminal. The utility then produces a message requesting a new medium to be made available. The utility waits for a new medium to be made available by attempting to read a message to restart from the controlling terminal. In all cases, the communication with the controlling terminal is performed in an implementation-defined manner.

The handling of multivolume archives historically has been inconsistent. The 1988 and 1990 versions of POSIX.1 attempted to clarify this in rationale similar to that below. For this new format, the issues of multivolume archives are addressed more precisely. However, for magnetic tape, this needs further clarification because ISO 1001 presumes behavior that many historical systems similar to POSIX have not performed properly in the past.

POSIX.1 is intended to be interpreted such that each byte of the format is represented on the media exactly once. In some current implementations, it is not deterministic whether encountering the end-of-medium reflector foil on magnetic tape during a write will yield an error during a subsequent read() of that record, or even if that record is actually recorded on the tape.
It is also possible that read() will encounter the end-of-medium when end-of-medium was not encountered when the data was written. This has to do with conditions where the end of (magnetic) record is in such a position that the reflector foil is on the verge of being detected by the sensor and is detected (nondeterministically) during one operation and not on a later one.

An implementation of the format-creating utility must assure when it writes a record that the data appears on the tape exactly once, and is properly followed by the trailer labels required by ISO 1001. This implies that the program and the tape driver work in concert. An implementation of the format-reading utility must assure that an error in a boundary condition described above will not cause loss of data, and that trailer labels will be properly interpreted.

The general consensus was that the following would be considered as correct operation of a tape driver when end-of-medium is detected:

(1) During writing, either:

    (a) The record where the reflector spot was detected is backspaced over by the driver so that the trailing tape mark and labels that must be written will overwrite the old record. (It should be noted that in most cases, the trailing labels will be much longer than the original record due to the number of interrecord gaps required.) Writing the labels and tape marks should not yield an end-of-medium condition.

    (b) The condition is reported as an error on the write() following the one where the end-of-medium is detected (the one where the end-of-medium is actually detected completing successfully). No data will be actually transferred on the write() reporting the error, but label records can be written. Writing the tape mark, and writing any subsequent records, should not yield any end-of-medium conditions.
(2) During reading, the end-of-medium indicator is simply ignored, presuming that (end-of-file) labels will be recorded on the magnetic medium and that the reflector foil was advisory only to the write().

Systems where these conditions are not met by the tape driver should assure that the format-creating and -reading utilities assure proper representation and interpretations of the files on the media in a way consistent with the above recommendations.

The typical failures on systems that do not meet the above conditions are either:

(1) To leave the record written when the end-of-medium is encountered on the tape, but to report that it was not written, and possibly not write the trailing labels. The format-creating utility would then rewrite the failed record on the next volume. The format-reading utility could see the record twice if the end-of-medium is not sensed during the read operations.

(2) The write() occurs uneventfully, but the read() senses the error and does not actually see the data, causing a record to be omitted, and trailing labels are not seen.

Nothing in POSIX.1 requires that end-of-medium be determined by anything on the medium itself (for example, a predetermined maximum size would be an acceptable solution for the format-creating utility). The format-reading utility must be able to read() tapes written by machines that do use the whole medium, however.

On media where end-of-medium and end-of-file are reliably coincident, such as disks, end-of-medium and end-of-file can be treated as synonyms.

Note that partial physical records [corresponding to a single write()] can be written on some media, but that only full physical records will actually be written to magnetic tape, given the manner in which the tape operates.

There is wording specifically to disallow using the ``no tape mark'' option on magnetic tape, even by agreement, and have it considered as conforming to this standard.
A tape that is a single volume looks (roughly) like this in ISO 1001 {2}:

    Beginning of Volume Label Group. (VOL? and UVL?)
    (
        Beginning of File Label Group. (HDR? and UHL?)
        tape mark
        file data
        tape mark
        End of File Label Group. (EOF? and UTL?)
        tape mark
    )*
    tape mark

NOTE: (`?' and `*' are used in the sense of regular expressions.)

When the file data spans the end of volume, the End of File Label Group becomes an ``End of Section Label Group'' (EOV? and UTL?), and no further data can be recorded on the volume. The next segment of the file must be at the beginning of the next volume, and there are requirements about the consistency of many of the fields between the File Header Label Group for each Section.

ISO 1001 {2}: A sequence of one or more labels of the same type, recorded in consecutive blocks shall be a label set of that type. All labels in a set shall be numbered consecutively starting from 1, except those labels in User File Header and User File Trailer Label Sets.

ISO 1001 {2}: The label in User File Header and User File Trailer Label Sets may be identified in any order and may contain duplicate identifiers within a set.

Note that this format yields a double tape mark at the end of the tape.

Obviously, in the case of other media, the single volume requirement assures that an End of Section Label Group would never be used.

END_RATIONALE

10.6 Volume Label Records

This section specifies how each volume label field is defined in the context of this standard.
10.6.1 The VOL1 Header Label

BP     Field Name                             L   P  Content
_____  _____________________________________  __  _  ____________
1-3    Label Identifier                        3     VOL
4      Label Number                            1     1
5-10   Volume identifier                       6     a-characters
11     Volume accessibility                    1     a-character
12-24  (Reserved for future standardization)  13     spaces
25-29  POSIX Identification                    5  *  POSIX
30     POSIX Version                           1  *  1
31-37  Implementation Identifier               7  *  a-characters
38-51  Owner Identifier                       14  *  a-characters
52-79  (Reserved for future standardization)  28     spaces
80     Label Standard Version                  1     4

10.6.1.1 Fields Reserved for Future Standardization

These fields shall be reserved for future standardization for ISO 1001 {2} and its successors. The characters in these fields shall be spaces.

10.6.1.2 Volume Identifier

This field shall specify an identification of the volume. An implementation-defined mechanism shall be provided for setting this field when a volume is created, and displaying it when a volume is interpreted. If this field is not set explicitly it shall contain spaces.

BEGIN_RATIONALE

Rationale: It is possible that some implementations will provide a mechanism for setting this field independent from the archiver, possibly as part of a more general handling of labeled tapes. Note that the format-creating utility may have to deal with an interface that assures that the volume identification is retained on the tape when it is rewritten.

The POSIX Identifier and version fields are not intended to be required for general labeled tapes.

END_RATIONALE

10.6.1.3 Volume Accessibility

Editor's Note: Help from 1003.6 is requested in this area.

It is unspecified whether any mechanism is present to control access to files as recorded in this format based upon any identifying information or security information recorded with the file.

10.6.1.4 Implementation Identifier

This is a constant for the archive format, as opposed to implementation defined for ISO 1001 {2}.
It is implementation defined whether a format-reading utility will attempt to read a file or other media in ISO 1001 format that does not contain the values specified for these fields.

BEGIN_RATIONALE

Rationale: This immediately identifies a file as an archive conforming to this standard. Further system identification is possible elsewhere (as an extension).

END_RATIONALE

10.6.1.5 Owner Identifier

This field shall contain a representation of the user name associated with the uid of the process creating the archive file, as found in the user database. The name shall be folded to a-characters.

10.6.2 The VOL2 Header Label

This header label is optional; if omitted, the fields described in the record shall all take their default values.

BP     Field Name              L   P    Content
_____  ______________________  __  ___  ____________
1-3    Label Identifier         3       VOL
4      Label Number             1       2
5-9    Standards Body           5  *    a-characters
10-14  Number                   5  *    a-characters
15-19  Variant                  5  *    a-characters
20-23  Revision                 4  *    a-characters
24-80  Reserved for POSIX use  57       spaces

BEGIN_RATIONALE

Rationale: VOL2 is not specified at all beyond its possible existence in ISO 1001 {2}. Additional labeling information is stored here.

END_RATIONALE

10.6.2.1 Character Set Identification

The four fields Standards Body, Number, Variant, and Revision are used to identify the alternate character set used in data files, file names, and user and group names.

The following are defined to refer to known standards. Additional names may be agreed on between the originator and the recipient. If a name is not recognized, the format-reading utility may take implementation-defined actions, but there shall always be a way for the data to be transferred to the receiving system, possibly without translation.
Body      Number    Variant   Revision  Formal Standard
________  ________  ________  ________  ___________________
spaces    spaces    spaces    spaces    ISO/IEC 646 IRV {1}
ISO       646       IRV       1990      Note 1
ISO       8859      1         1987      ISO 8859-1 {6}
ISO       8859      2         1987      ISO 8859-2 {7}
ISO/IEC   10646     Note 2    19xx      ISO/IEC 10646 {9}

NOTES:

(1) In this form of entry, when the value for variant is IRV then ISO/IEC 646 IRV {1} is specified. Other values specify National Usage variants of this character set, and shall be agreed upon by sender and recipient.

(2) This value specifies the compaction method used.

The format-reading utility shall provide a means to control whether data files have character set translation applied. This may be done either on a per-file basis, or selecting the option for the whole archive. A utility that performs the same translation should be provided.

BEGIN_RATIONALE

Rationale: Only a limited number of character set standards can actually be permitted for maximal interchange. Any character set is of course possible by prior agreement.

It has been suggested that EBCDIC be listed. It is not clear whether it is a formal standard; it is omitted. Formal standards, and then only those with reasonably large followings, can be included here, simply as a matter of practicality.

The notation used for an ISO/IEC 646 {1} National Usage variant is appropriate material for a National Profile.

The requirement that translation be able to be suppressed permits mixing binary and character data in a single archive, extracting both, and then translating only the character files. (The character set ``BINARY'' in the FDF is similar; this permits such data to be extracted in the absence of the preplanning needed to use BINARY.) (The tr command will not do because not all the translations are 1:1.) The utility could be popen()-ed to simplify the utility implementation.

END_RATIONALE

10.6.3 The VOL3-9 and UVL1-9 Header Labels

These labels are free to be used for any implementation-defined purpose; they shall be ignored.
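As an illustration of the VOL1 and VOL2 layouts in 10.6.1 and 10.6.2, the following minimal sketch assembles the two 80-octet labels from their BP ranges. All function names are hypothetical. The sketch assumes that field values have already been folded to a-characters, that reserved and unset fields hold spaces, and that a space is an acceptable Volume Accessibility value; the constant for the Implementation Identifier is not given in this draft, so it is left as a caller-supplied value.

```python
def field(value: str, width: int) -> str:
    """Left-justify `value` in a fixed-width field, refusing overflow."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in {width} octets")
    return value.ljust(width)


def build_vol1(volume_id: str = "", owner: str = "",
               implementation_id: str = "", accessibility: str = " ") -> bytes:
    """Assemble a VOL1 label (BP 1-80)."""
    label = "".join([
        "VOL", "1",                   # BP 1-3, 4
        field(volume_id, 6),          # BP 5-10
        field(accessibility, 1),      # BP 11 (space assumed as default)
        " " * 13,                     # BP 12-24, reserved
        "POSIX", "1",                 # BP 25-29, 30
        field(implementation_id, 7),  # BP 31-37 (value not given in draft)
        field(owner, 14),             # BP 38-51
        " " * 28,                     # BP 52-79, reserved
        "4",                          # BP 80
    ])
    if len(label) != 80:
        raise ValueError("VOL1 label must be exactly 80 octets")
    return label.encode("ascii")


def build_vol2(body: str = "", number: str = "",
               variant: str = "", revision: str = "") -> bytes:
    """Assemble a VOL2 label naming the alternate character set."""
    label = "".join([
        "VOL", "2",          # BP 1-3, 4
        field(body, 5),      # BP 5-9
        field(number, 5),    # BP 10-14
        field(variant, 5),   # BP 15-19
        field(revision, 4),  # BP 20-23
        " " * 57,            # BP 24-80 inclusive spans 57 octets
    ])
    if len(label) != 80:
        raise ValueError("VOL2 label must be exactly 80 octets")
    return label.encode("ascii")


def parse_vol2_charset(label: bytes) -> tuple:
    """Return (body, number, variant, revision) from a VOL2 label."""
    text = label.decode("ascii")
    if len(text) != 80 or text[0:4] != "VOL2":
        raise ValueError("not a VOL2 label")
    # BP ranges are 1-based and inclusive, so BP a-b is the slice [a-1:b].
    ranges = ((5, 9), (10, 14), (15, 19), (20, 23))
    return tuple(text[a - 1:b].rstrip() for a, b in ranges)


if __name__ == "__main__":
    vol2 = build_vol2("ISO", "8859", "1", "1987")
    print(parse_vol2_charset(vol2))
```

An all-spaces result from `parse_vol2_charset` corresponds to the first table row, ISO/IEC 646 IRV, which is also the natural reading of the default when VOL2 is omitted.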
10.7 File Information File This section specifies how the File Information File (FIF) is recorded; both its content and the header labels for it are defined. A FIF consists of ISO 1001 header labels that define the content of the FIF; the FIF itself contains the information necessary to recreate the POSIX file (except the data). 10.7.1 First FIF Header Label (HDR1) This record shall be recorded for all files in the archive. BP Field Name L P Content _____ ___________________________________________ __ _ ____________ 1-3 Label Identifier 3 HDR 4 Label Number 1 1 5-21 File Identifier 17 * a-characters 22-27 File Set Identifier 6 a-characters 28-31 File Section Number 4 digits 32-35 File Sequence Number 4 digits 36-39 Generation Number 4 digits 40-41 Generation Version Number 2 digits 42-47 Creation Date 6 * digits 48-53 Expiration Date 6 * 00000 54 File Accessibility 1 a-characters 55-60 Block Count 6 00000 61-64 POSIX Identification 5 * POSIX 65 POSIX Version 1 * 1 66 FIF Identifier 1 * H 67-73 Implementation Identification 7 * s 74-80 (Reserved for future POSIX standardization) 7 s 10.7.1.1 File Identifier This shall be the name of the FIF file. It shall be derived from the name of the POSIX file being recorded by prefixing the string INFO- to the filename (not pathname) of the file being recorded, with the result folded to a-characters. BEGIN_RATIONALE Rationale: ISO 1001 {2} contains the following note: The File Identifier field of a file within a file set is permitted to be the same as that of other files in the file set. Note that this is not the pathname of the file, but rather again exists for the purposes of non-POSIX systems. END_RATIONALE 10.7.1.2 File Set Identifier This field shall be ignored. 10.7.1.3 File Section Number This field shall be as specified in ISO 1001 {2}. BEGIN_RATIONALE ISO 1001 {2}: This field shall specify the ordinal number of the file section as a four-digit decimal number. The characters in this field shall be digits. 
END_RATIONALE

10.7.1.4 File Sequence Number

This field shall be ignored.

10.7.1.5 Generation Number

This field shall be ignored.

10.7.1.6 Generation Version Number

This field shall be ignored.

10.7.1.7 Creation Date

This shall be the representation of the st_ctime field of the file, converted to a Julian day, according to the time zone active at the time the format-creating utility was run.

10.7.1.8 Expiration Date

This shall be the constant 00000.

BEGIN_RATIONALE
Rationale: The format of dates in the Creation and Expiration Date fields means nothing to POSIX, but might be useful to other systems. In ISO 1001 {2} they are Julian dates, accurate only to the day. The coding of the Expiration Date is taken to mean obsolete data by ISO 1001 {2}. That seems better than a label that may prevent overwriting the file without a lot of trouble, but is certainly subject to negotiation.
END_RATIONALE

10.7.1.9 File Accessibility

Editor's Note: Help from 1003.6 is solicited.

The content of this field may be any permissible value; the format-reading utility shall ignore this field.

10.7.1.10 Block Count

This field shall be as specified in ISO 1001 {2}.

BEGIN_RATIONALE
ISO 1001 {2}: This field shall specify a constant value. The characters in this field shall be zeroes. (Block count is used on the EOF1 record, which is the same format as HDR1.)
END_RATIONALE 10.7.2 Second File Header Label (HDR2) BP Field Name L P Content _____ _____________________________ __ _ _____________ 1-3 Label Identifier 3 HDR 4 Label Number 1 2 5 Record Format 1 * D 6-10 Block Length 5 * 20000 11-15 Record Length 5 * 09999 16-20 Standards Body 5 * a-characters 21-25 Number 5 * a-characters 26-30 Variant 5 * a-characters 31-34 Revision 4 * a-characters 35-50 (Reserved for Implementation) 15 not specified 51-52 Offset Length 2 * 00 53-80 (Reserved for Standard) 32 s 10.7.2.1 Standards Body, Number, Variant, Revision These fields record the same information, and are encoded in the same way, as the corresponding fields in the VOL2 Header Label. They specify the alternate character set to be used in the FIF file for those fields permitted to be in the alternate character set. If all these fields are spaces, instead of indicating ISO/IEC 646 {1} IRV, the alternate character set specified in the VOL2 label shall be used. 10.7.2.2 Record Format Data is recorded as variable length records, blocked between 20 and 20K octets per block. BEGIN_RATIONALE Rationale: This format allows for several long records (such as the pathname) without introducing any complexity beyond that already in ISO 1001. Note that the four-octet length field is explicitly required even when the record is put on nontape media. In addition to specifying length, they also serve to separate header labels from the content. END_RATIONALE 10.7.2.3 Block Length This is the maximum length of a block, which is 20000 octets. 10.7.2.4 Record Length This is the maximum length of a record, which is limited to 9999 octets by the structure of variable-length records. 10.7.2.5 Offset Length This shall be zeroes. No offset is permitted. 10.7.3 Additional File Header Labels (HDR3-9, UHL3-9) For a FIF, these labels are free to be used for any implementation purpose, and shall be ignored. On media that supports tape marks, the last header shall be followed by a tape mark. 
On other media the tape mark shall be omitted.

BEGIN_RATIONALE
Rationale: Headers can be distinguished from the data because the first record of the actual file will be a variable-length record, which must begin with four digits.
END_RATIONALE

10.7.4 FIF Contents

The FIF shall consist of several fixed record types, in the order specified below. Each logical record shall be recorded in ISO/IEC 646 {1} IRV, except that certain fields may be recorded in the alternate character set. Additional optional record types are permitted in the format specified below. In the record descriptions below, the content is described excluding the record length as specified in ISO 1001 {2}. Records present in a FIF, but which are not recognized by the format-reading program, shall be ignored, except that they shall be logged.

10.7.4.1 FIF Filename Record

BP     Field Name         L   Content
_____  _________________  __  ________________________
1      File Type           1  al-characters
2-10   File Mode           9  al-characters
11     Link Status         1  al-characters
12-21  Modification time  10  digits
22-31  Creation time      10  digits
32-41  Access time        10  digits
42-51  File Size          10  digits
52-    File Name              Alternate character set.

10.7.4.2 File Type

This is the single character as follows:

-      Regular file
c      Character special file
b      Block special file
d      Directory
p      FIFO special file
h      High-performance file
l      Symbolic link
A-Z    Reserved for implementations
other  Reserved for future standardization.

The interpretation of the FIF and FDF for each of these file types is as specified below.

10.7.4.3 Link Status

This field indicates whether the file is a link or not. An ordinary file entry is indicated for the first (or only) entry in the archive for that file. Subsequent entries for additional links to that file shall indicate whether a link with or without data is recorded. The format-creating utility shall have an option to control whether data is recorded or not.
When a link, with or without data, is recorded, a record containing the target of the link, recorded in the alternate character set, shall immediately follow the FIF File Group record and precede any information specific to the file type. For full portability, the length of the target shall not exceed {_POSIX_PATH_MAX} octets.

If the target of the link was restored by the format-reading utility during the same execution of the utility, then a link to the target shall be created and the FDF associated with the link entry shall be ignored. If the link cannot be created, an error shall be reported and the link ignored.

If the file was not restored during the same run, the following actions shall be taken:

              File Type Does Not      File Type Requires
              Require Data            Data
                                      (Including Zero Length)
              ______________________  __________________
Data present  see 10.8                restore
Data absent   restore                 ignore entry
              log                     log error

A link with data shall be recorded with the File Size found at the time that specific link was recorded. If the length is found to be different from the length as recorded for a previous link, that fact shall be logged.

A link without data shall have the File Size as found at the time the specific link was recorded, but no FDF shall be recorded for that file.

The link type information is recorded as a single character, as follows:

       Ordinary file entry
L      Link (without data)
D      Link (with data)
A-Z    Reserved for implementations
other  Reserved for future standardization.

BEGIN_RATIONALE
Rationale: Links are recorded in this fashion because a link can be to any file type.

It is desirable in general to be able to selectively restore part of an archive, and restore all the files completely. If the data is not associated with each link, it is not possible to do this. However, the data associated with a file can be large, and when selective restoration is not needed, this can be a significant burden.
The archive is structured so that files that have no associated data can always be restored under the name of any link, and the user may choose whether data is recorded with each instance of a file that contains data. Although not required of an implementation of the format-creating utility, the format permits mixing of both types of links in a single archive; this can be done for special needs, and format-reading utilities are expected to properly interpret such archives.

Note that device/i-number labeling of files is not carried forward from USTAR and cpio; rather, it works strictly on a symbolic name basis. This is intentional because device/i-number links could break if files span file systems that they did not previously span, and because to repair problems it is more useful to have both names.
END_RATIONALE

10.7.4.4 File Mode

The file protections shall consist of nine characters, presented as three groups of three characters each:

    Permissions for the file owner class (see 2.2.2.35).
    Permissions for the file group class.
    Permissions for the file other class.

Each field shall have three character positions:

(1) If r, the file is readable; if -, it is not readable.

(2) If w, the file is writable; if -, it is not writable.

(3) The first of the following that applies:

    S  If in the owner permissions, the file is not executable and set-user-ID mode is set. If in the group permissions, the file is not executable and set-group-ID mode is set.

    s  If in the owner permissions, the file is executable and set-user-ID mode is set. If in the group permissions, the file is executable and set-group-ID mode is set.

    T  If in the other permissions, the file is executable and an unspecified concept of shared-executable is applied to the file. It is implementation defined whether such a mode is recorded. It is implementation defined what action, if any, beyond setting the other-execute mode, is performed when T is found in this position in an archive.

    x  The file is executable or the directory is searchable.

    -  None of the attributes of S, s, T, or x applies.
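The encoding rules for the nine File Mode characters can be sketched in Python as shown below. This is only an illustration under stated assumptions: the function name `fif_file_mode` is hypothetical, S and s are taken to apply to the owner and group triplets and T to the other triplet, and the "shared-executable" concept is mapped onto the `S_ISVTX` bit, which this draft does not require.

```python
import stat


def fif_file_mode(st_mode: int) -> str:
    """Render st_mode as the nine-character File Mode field (sketch)."""

    def rw(rbit: int, wbit: int) -> str:
        return ("r" if st_mode & rbit else "-") + ("w" if st_mode & wbit else "-")

    def third(xbit: int, special_bit: int, exec_char: str, noexec_char) -> str:
        executable = bool(st_mode & xbit)
        special = bool(st_mode & special_bit)
        # "The first of the following that applies": S, then s (or T), then x, then -.
        if special and not executable and noexec_char:
            return noexec_char
        if special and executable:
            return exec_char
        return "x" if executable else "-"

    owner = rw(stat.S_IRUSR, stat.S_IWUSR) + third(stat.S_IXUSR, stat.S_ISUID, "s", "S")
    group = rw(stat.S_IRGRP, stat.S_IWGRP) + third(stat.S_IXGRP, stat.S_ISGID, "s", "S")
    # Assumption: S_ISVTX stands in for "shared-executable"; only the
    # executable case (T) is defined, so no character is used otherwise.
    other = rw(stat.S_IROTH, stat.S_IWOTH) + third(stat.S_IXOTH, stat.S_ISVTX, "T", None)
    return owner + group + other
```

Note that T here marks an executable file, which differs from the lowercase t that some ls implementations print for the same combination.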
Implementations may add other characters to this list for the third character position. Such additions shall, however, be recorded in lowercase if the file is executable or searchable, and in uppercase if it is not. It is implementation defined whether, when such an extension is found when restoring an archive, any action beyond setting the group-execute mode according to the case of the character occurs. BEGIN_RATIONALE Rationale: The modes are those conventionally used by the ls utility. This is extended beyond the usage in 1003.2 to support the ``shared text'' or ``sticky'' bit. It is intended that the conformance document should not document anything beyond the existence of and support of such a mode. Further extensions are expected to these bits, particularly with overloading the set-user-ID and set-group-ID flags. Editor's Note: This is a fairly conservative approach to all this; given that this is new work, should the suid, sgid, and sticky bits be expressed separately? END_RATIONALE 10.7.4.5 File Modification, Access, and Creation Times These are the decimal representation of these times, as Seconds since the Epoch. The access and modification times shall be restored if the process has the appropriate privilege required to do so. BEGIN_RATIONALE Rationale: Creation time (actually i-node modification time) is for information only, as it is not possible to effectively change it portably. Nothing is intended to prevent a nonportable implementation of the utility from restoring the value. END_RATIONALE 10.7.4.6 File Size For all file types except those noted below, this is the size of the file, in octets, that is recorded on the media as the FDF; if the size of the file on the originating system should change during recording of the file, the file on the archive shall be truncated or padding of octets containing zeroes added as required to assure that the exact number of octets specified is actually recorded. 
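To make the fixed-width part of the FIF Filename Record (10.7.4.1) concrete, here is a minimal Python sketch that packs the fields described so far and appends an already-encoded name. The function name and parameters are hypothetical. Two points are assumptions rather than statements of this draft: the four-digit record length prefix is taken to count its own four octets, and an ordinary (non-link) entry is passed a single space as its Link Status.

```python
def fif_filename_record(file_type: str, mode: str, link_status: str,
                        mtime: int, ctime: int, atime: int,
                        size: int, name: bytes) -> bytes:
    """Build one variable-length FIF Filename Record (sketch)."""
    if len(file_type) != 1 or len(mode) != 9 or len(link_status) != 1:
        raise ValueError("File Type, File Mode, Link Status must be 1, 9, 1 characters")
    for value in (mtime, ctime, atime, size):
        if not 0 <= value < 10 ** 10:
            raise ValueError("numeric fields must fit in ten decimal digits")
    # BP 1, 2-10, 11, 12-21, 22-31, 32-41, 42-51: 51 octets in total.
    fixed = (f"{file_type}{mode}{link_status}"
             f"{mtime:010d}{ctime:010d}{atime:010d}{size:010d}")
    body = fixed.encode("ascii") + name  # the name is not padded
    total = len(body) + 4  # assumption: the length field counts itself
    if total > 9999:
        raise ValueError("record exceeds the 9999-octet record length limit")
    return f"{total:04d}".encode("ascii") + body
```

Because the name is unpadded, a reader can recover its length by subtracting the fixed portion from the record length.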
10.7.4.7 Name

The rest of the record shall consist of the Name, which shall be recorded in the alternate character set. No padding of Name shall occur, so that the length of Name can be calculated from the record length.

The format-reading utility shall translate the Name in an implementation-defined manner to be the name of the file on the receiving system. In no case shall a file name that is not supported on the receiving system, or that cannot be subsequently accessed, be created. The length shall not exceed {_POSIX_PATH_MAX} without agreement between the originator and the recipient that this can occur.

BEGIN_RATIONALE
Rationale: Note that this format allows for slightly less than 10000 octets of pathname for a file. This is well above any current practical limit known.
END_RATIONALE

10.7.5 FIF File Owner

This shall be the name of the owner of the file as found in the user database on the originating system, represented in the alternate character set. The (variable-length) record shall consist solely of the file owner name, with no padding. A zero-length record shall be recorded if the implementation creating the format does not support ownership on this file type. If a zero-length owner is encountered, it shall be treated as if the entry could not be found in the database.

After an implementation-defined translation to the character set of the receiving system, the File Owner name shall be used as the name of the owner of the file. If the same name appears in the user database of the receiving system (without truncation of either name), and if the format-interpreting utility has the appropriate permission to do so, it shall set the owner-ID of the file it is restoring to be the corresponding user-ID. If the owner name does not appear in the user database of the receiving system, the file shall be created with the ownership of the user running the format-reading utility, and the name that was not found shall be logged.
If the implementation does not support ownership on the file type being restored, the owner information may be ignored. 10.7.6 FIF File Group This shall be treated in the same way as the file owner entry except that the group database shall be consulted, and the file group owner set where appropriate. 10.7.7 File-type Dependent Information The next several records in the FIF vary depending on the File Type recorded in the first record. The following clauses describe these additional records and the interpretation of certain of the fields in the preceding records. 10.7.8 Regular Files No File-type Dependent Information is recorded. The File Size is interpreted as the number of bytes found in the FDF, except that the FDF shall be omitted if the file size is zero. 10.7.9 Block or Character Special Files The file size in the first FIF record shall be zero, and no FDF shall be recorded. BP Field Name L Content _____ ____________________ __ _____________ 1-15 System Identifier 15 al-characters 16- Device Specification -- al-characters 10.7.9.1 System Identifier When written by the format-creating utility, this field shall contain an implementation-defined value representing the name of the implementation, the model, and sub-model numbers sufficient to distinguish it from similar implementations that use a different representation of the Device Specification field. If the format-interpreting utility recognizes this field, it shall restore the special file according to the information appearing in the Device Specification. If the format-reading utility does not recognize the System Identifier the content of the FIF, including the Device Specification and System Identifier, shall be logged, and the format-reading utility may otherwise ignore the archive entry. BEGIN_RATIONALE Rationale: There is a potential issue of name registration for systems, but it is unlikely to be a problem as vendors do try to keep their systems clearly identifiable. 
Note that this permits someone to write a special file on one system and restore it on an incompatible system by interpreting the meaning in some way. END_RATIONALE 10.7.9.2 Device Specification This field (of variable length) is unspecified by this standard, except that it shall contain information sufficient to create a file identical in function to the one written by the format-creating utility, when restored on the same implementation on which it was written. 10.7.10 Directories The information in a directory FIF shall be used to restore file ownership and permissions only after the format-reading utility has completed reading the archive. The meaning of the file size field is implementation defined. If the implementation has no special definition for this field, it shall be ignored. No FDF shall be recorded for this entry type. BEGIN_RATIONALE Rationale: Delaying restoration of the file ownership and permissions for a directory assures that the format-reading utility will be able to continue to write entries in the directory. The behavior of the file size field is implementation defined to permit but not endorse a current USTAR behavior. END_RATIONALE 10.7.11 FIFOs No additional records shall be written. The file size shall be ignored, and no FDF shall be recorded. 10.7.12 Symbolic Links A single record consisting solely of the pathname of the target of the symbolic link, in the alternate character set, shall be recorded. For full portability, the length of the name shall not exceed {_POSIX_PATH_MAX} octets. The link shall be created whether or not the target of the link exists. The ownership and protection information recorded for a symbolic link may be ignored. If a symbolic link entry is encountered on a system that does not support symbolic links, it is implementation-defined whether, and under what conditions, the link is translated to a hard link. This translation shall be logged giving both names. If the link is not restored, that fact shall be logged. 
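The deferred handling of directory permissions described in 10.7.10 can be sketched as follows. The class `DeferredDirs` and its methods are hypothetical names chosen for this illustration. Ownership changes are omitted, and the sketch assumes that a newly created directory is writable by the restoring process under the current umask.

```python
import os


class DeferredDirs:
    """Create directories at once, apply their recorded modes at the end."""

    def __init__(self) -> None:
        self._pending = []  # list of (path, mode) pairs awaiting chmod

    def make_dir(self, path: str, mode: int) -> None:
        # Create with default permissions so later entries can be written.
        os.makedirs(path, exist_ok=True)
        self._pending.append((path, mode))

    def finish(self) -> None:
        # Deepest directories first, so that parents stay searchable
        # while their children are being adjusted.
        ordered = sorted(self._pending,
                         key=lambda item: item[0].count(os.sep),
                         reverse=True)
        for path, mode in ordered:
            os.chmod(path, mode)
        self._pending.clear()
```

A format-reading utility built this way would call `finish()` once, after the last archive entry has been processed, which matches the rationale that the utility must remain able to write entries into a directory whose recorded mode is read-only.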
10.7.13 High Performance Files Editor's Note: Help is solicited from P1003.4. High-performance files shall be restored as regular files if the implementation does not support the file type. In such a case, the information in the records which follow shall be logged. Zero or more of the following records shall be recorded to reflect the information required to represent the high-performance file's characteristics. BP Field Name L Content _____ _______________________ __ _____________ 1-10 Flag 10 HIGH-PERF: 16- Parameter Specification -- al-characters 10.7.14 Other File Types For other, implementation-defined, file types, additional information records may be provided as specified below. If the File Size in the first FIF record is nonzero, and if the implementation does not recognize the file type, the subsequent FDF shall be restored as an ordinary file. NOTE: This implies that the File Size field of the FDF must be accurate for all file types provided as an extension. BP Field Name L Content _____ _______________________ __ ____________ 1-10 Flag 10 a-characters 16- Parameter Specification -- unspecified The Flag may be any combination of a-characters except those that are specified by this standard to appear as the first ten octets of a FIF record. The remaining characters shall either be al-characters or in the alternate character set. If the record type is not recognized by the implementation, it shall be logged; translation from the alternate character set shall not occur when a record is logged. 10.7.15 Security Following the last file-type specific record, additional records of the following format may be recorded. Editor's Note: Help from 1003.6 is solicited. BP Field Name L Content _____ ______________________ __ ___________ 1-10 Flag 10 SECURITY: 16- Security Specification -- unspecified On systems which do not implement security, these records may be ignored or logged. 
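The flagged records of 10.7.13 through 10.7.15 share one shape, so a reader can dispatch on the first ten octets. The Python sketch below illustrates this; the function names and the `handlers` mapping are hypothetical. It assumes that a Flag shorter than ten characters is padded with trailing spaces and that BP 11-15 carry no information, since the tables start the specification at BP 16. Neither point is stated by this draft.

```python
from typing import Callable, Dict, List, Tuple


def split_extension_record(record: bytes) -> Tuple[str, bytes]:
    """Return (flag, specification) for a flagged FIF record (sketch)."""
    flag = record[:10].decode("ascii").rstrip()  # BP 1-10
    spec = record[15:]                           # BP 16 onward, left untranslated
    return flag, spec


def handle_extension(record: bytes,
                     handlers: Dict[str, Callable[[bytes], None]],
                     log: List[Tuple[str, bytes]]) -> bool:
    """Dispatch a record to its handler, or log it if it is unrecognized."""
    flag, spec = split_extension_record(record)
    handler = handlers.get(flag)
    if handler is None:
        # Unrecognized records are logged without character set translation.
        log.append((flag, spec))
        return False
    handler(spec)
    return True
```

A system without security support could simply register no handler for `SECURITY:`, in which case such records end up in the log.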
10.7.16 Other Additional Information For all file types, additional information records may be provided as specified below. BP Field Name L Content _____ _______________________ __ ____________ 1-10 Flag 10 a-characters 16- Parameter Specification -- unspecified The Flag may be any combination of a-characters except those that are specified by this standard to appear as the first ten octets of a FIF record. The remaining characters shall either be al-characters or in the alternate character set. If the record type is not recognized by the implementation, it shall be logged; translation from the alternate character set shall not occur when a record is logged. BEGIN_RATIONALE Rationale: This explicit permission for extension is included because historically archive formats have not been easily extended to allow new information. It is expected that both implementation specific extensions and future standards will use this mechanism. The Flag field was chosen to be sufficiently long to allow such extensions without fear of name conflict. Because it is a known extension currently under development, SECURITY: is explicitly reserved. END_RATIONALE 10.7.17 The FIF EOF1 Label The EOF1 Record shall be identical to the HDR1 record recorded for the FIF, except in the following fields. BP Field Name L P Content _____ ________________ _ _ _______________ 1-3 Label Identifier 3 EOF 55-60 Block Count 6 , digits 10.7.17.1 Block Count This field shall be as specified in ISO 1001 {2}. If the actual block count exceeds the maximum value of a six-digit decimal number, it shall be recorded modulo 10000000. BEGIN_RATIONALE ISO 1001 {2}: Within a Labeled-Sequence the content of the fields of this label shall be identical with the contents of the corresponding fields in the First File Header Label, except for the following fields. 
ISO 1001 {2}: ``[The Block Count] field shall specify, as a six-digit decimal number, the number of blocks in which the file section is recorded.'' END_RATIONALE 10.7.18 The FIF EOF2-9 and UTL1-9 Labels These records are optional, but if present shall be recorded in accordance with ISO 1001. 10.8 The File Data File The file data file is optional, and when present shall contain the data contained in the file. If this file is omitted when it is expected, an empty file of the specified type shall be created, and an error logged. If a file of this type is found when it is not expected, the name from the HDR1 record shall be used as the name of a file containing the data found in the FDF, in an implementation-defined directory. An error shall be logged. BEGIN_RATIONALE Rationale: If an unexpected FDF is found, it is intended that the data is put where it can be examined later; the only candidate name (other than a synthetic one) is the one from the HDR1. The directory should either be the current working directory, or a specified ``errors'' directory if the utility provides one. END_RATIONALE 10.8.1 The FDF HDR1 Label BP Field Name L P Content _____ ___________________________________________ __ _ _______________ 1-3 Label Identifier 3 HDR 4 Label Number 1 1 5-21 File Identifier 17 * a-characters 22-27 File Set Identifier 6 a-characters 28-31 File Section Number 4 digits 32-35 File Sequence Number 4 digits 36-39 Generation Number 4 digits 40-41 Generation Version Number 2 digits 42-47 Creation Date 6 * digits 48-53 Expiration Date 6 * 00000 54 File Accessibility 1 a-characters 55-60 Block Count 6 , digits 61-64 POSIX Identification 5 * POSIX 65 POSIX Version 1 * 1 66 FDF Identifier 1 * D 67-73 Implementation Identification 7 * s 74-80 (Reserved for future POSIX standardization) 7 s 10.8.1.1 File Identifier This shall be the name of the FDF file. 
It shall be derived from the name of the POSIX file being recorded as the filename (not pathname) of the file being recorded, folded to a-characters.

10.8.1.2 File Set Identifier

This field shall be ignored.

10.8.1.3 File Section Number

This field shall be as specified in ISO 1001 {2}.

10.8.1.4 File Sequence Number

This field shall be ignored.

10.8.1.5 Generation Number

This field shall be ignored.

10.8.1.6 Generation Version Number

This field shall be ignored.

10.8.1.7 Creation Date

This shall be the representation of the st_ctime field of the file converted to a Julian day, according to the time zone active at the time the format-creating utility was run.

10.8.1.8 Expiration Date

This shall be the constant 00000.

10.8.1.9 File Accessibility

Editor's Note: Help from 1003.6 is solicited.

The content of this field may be any permissible value; the format-reading utility shall ignore this field.

10.8.1.10 Block Count

This field shall be as specified in ISO 1001 {2}.

10.8.2 Second FDF File Header Label (HDR2)

BP     Field Name                     L   P  Content
_____  _____________________________  __  _  _____________
1-3    Label Identifier                3     HDR
4      Label Number                    1     2
5      Record Format                   1  *  F
6-10   Block Length                    5  *  16535
11-15  Record Length                   5  *  00000
16-20  Standards Body                  5  *  a-characters
21-25  Number                          5  *  a-characters
26-30  Variant                         5  *  a-characters
31-34  Revision                        4  *  a-characters
35-50  (Reserved for Implementation)  15     not specified
51-52  Offset Length                   2     00
53-80  (Reserved for Standard)        32     s

10.8.2.1 Standards Body, Number, Variant, Revision

These fields record the same information, and are encoded in the same way, as the corresponding fields in the VOL2 Header Label. They specify the alternate character set to be used in the FDF file for those fields permitted to be in the alternate character set. If all these fields are spaces, instead of indicating ISO/IEC 646 {1} IRV, the alternate character set specified in the VOL2 label shall be used.
If the Standards Body field is ``BINARY'', the data recorded shall not be translated from the alternate character set.

10.8.2.2 Record Format

Data is always recorded as one record per block. The data from the file is recorded exactly as it comes from the file.

BEGIN_RATIONALE
Rationale: This seems the best approximation to POSIX's unstructured files. No attempt to organize this for non-POSIX systems is made for this archive format.
END_RATIONALE

10.8.2.3 Block Length

This is the constant 16535.

10.8.2.4 Record Length

This shall be zeroes.

BEGIN_RATIONALE
Rationale: Because of the unstructured nature of POSIX files, the maximum is not specified. In ISO 1001 {2}, this is zero, indicating unspecified length.
END_RATIONALE

10.8.2.5 Offset Length

This shall be zeroes. No offset is permitted.

10.8.3 Additional FDF File Header Labels (HDR3-9 and UHL1-9)

These records are not permitted for the FDF. On media with tape marks, if present, they shall be ignored.

BEGIN_RATIONALE
Rationale: In the absence of tape marks there is no way to tell how many of these labels there are, so it becomes impossible to detect the start of the actual data; they are therefore not permitted. If they are present on magnetic tape anyway, they are ignored simply to allow fail-soft when tapes approximately in this format are written on some systems.
END_RATIONALE

10.8.4 FDF Data

The content of the file shall be recorded in logical blocks of 16535 bytes in length. The last block may be truncated to the actual length.

BEGIN_RATIONALE
Rationale: Note the statements in General Requirements about padding.
END_RATIONALE

10.8.5 The FDF EOF1 Label

The EOF1 Record shall be identical to the HDR1 record recorded for the FDF, except in the following fields.

BP     Field Name        L  P  Content
_____  ________________  _  _  _______________
1-3    Label Identifier  3     EOF
55-60  Block Count       6  ,  digits

10.8.5.1 Block Count

This field shall be as specified in ISO 1001 {2}.
If the actual block count exceeds the maximum value of a six-digit decimal number, it shall be recorded modulo 10000000.

10.8.6 The FDF EOF2-9 and UTL1-9 Labels

These records are optional, but if present shall be recorded in accordance with ISO 1001 {2}.

10.8.7 Other labels

Other labels (including the EOV labels) shall be written in accordance with ISO 1001 {2}.

------- End of Forwarded Message