SC22/WG15 N378

Danish comments on POSIX.2b as a result of WG15 action items

u22a11 POSIX WG
Danish Standards (DS)

Prepared prior to the WG15 meeting in May-93

1. Introduction

These comments on POSIX.2b relate to some of the comments on the CD/PDAM of
POSIX-2 (POSIX.2a-D8) and to WG15 action items from the WG15 meeting in
October-92.

There are comments on

    (1) uudecode
    (2) uuencode
    (3) file

Other, newer comments will be described in another paper.

2. uudecode (section 5.33)

MIME (Multipurpose Internet Mail Extensions) is now an IETF RFC, RFC1341.

It should be possible to use "-" for stdout as the parameter to the -o
option, as mentioned in earlier comments. Or is this already so?

3. uuencode (section 5.34)

MIME (Multipurpose Internet Mail Extensions) is now specified in IETF
RFC1341 and RFC1342 (msgheader).

4. file (section 5.14)

The file utility has traditionally been used to guess file types from a
built-in repertoire and from a configuration file, often named /etc/magic.
Unfortunately, the POSIX-2 standard specifies only the built-in types, which
in our opinion reduces file to half its value. As user-defined file types
can be very useful for non-executable files such as word-processing
documents, mail applications, for example, should be able to rely on such
functionality being present in a POSIX system.

4.1 Actions

4.1.1 Add the following options to the Synopsis in 5.14.1:

    [-x] [-c] [-m mfile]

4.1.2 Add text to the Description (c) in 5.14.2 saying that file uses a
configuration file in an attempt to identify additional user-defined file
types. The name of the default configuration file is undefined. The
configuration file format is described under Input Files.

4.1.3 Add the following to the Options in 5.14.3:

The file utility shall conform to the utility argument syntax guidelines
described in 2.10.2.

    -m  Specify an alternate configuration file.
        Note: -m is used for historical reasons, as the configuration file
        has often been called a MAGIC file. -f or -C would have been more
        appropriate.
    -x  Treat the specification in the magic file as using 'reverse'
        byte-ordering (big endian). This allows direct use of a magic
        file, usually an imported one, that uses big endian for historical
        reasons.

    -c  Check the magic file for syntax errors.

4.1.4 Add this to the Operands in 5.14.4:

    mfile   A pathname of a file replacing the configuration file.

4.1.5 Add this to 5.14.5.2 Input Files:

The format of the user configuration file ("magic file") is as follows.
The file consists of four, optionally five, -separated fields:

offset
    A number specifying the offset, in bytes, into the file that is to be
    tested. This may optionally be preceded by a '>' to indicate a
    continuation line that supplies extra information in the printed
    message.

type
    The type of the data found at the specified offset. Valid types
    include:

        byte    interpret file data as "unsigned char" type.
        short   interpret file data as "unsigned short" type.
        long    interpret file data as "long" type.
        string  interpret file data as a character (byte) string.

    Some types may be followed by a mask specifier of the form &number,
    which is ANDed with the value before any comparisons are done. The
    mask specifier is octal if preceded by a 0, hexadecimal if preceded by
    a 0x, and decimal otherwise.

value
    The value to match. Numeric values may be decimal, octal, or hex.
    String values are defined as regular expressions (see 2.??) extended
    in two ways:

    1. Normally unprintable characters may be escaped with '\'. The
       special characters \n, \b, \r, and \f are allowed. An octal
       representation can also be used to insert any desired byte value
       (except 0). Normally, regular expressions cannot handle such
       character values. Because the backslash is used as an escape
       character while the regular expression is being read in, normal
       occurrences of a backslash in a regular expression must be escaped
       with a second backslash ( \( -> \\(, \. -> \\., ...)

    2. Text found in a file can also be inserted in the printed string
       with the use of the \\% delimiter.
       All text found between these delimiters is substituted into the
       print string. See the entries below for script and PostScript files
       for examples of this usage. Note that this is really just a \%
       delimiter with the backslash escaped.

    Finally, a word of caution: this regular expression search never
    terminates until a match is explicitly found or rejected (\n is a
    valid character in the patterns). Therefore the pattern ".*" should
    probably never be used here.

    If the value is numeric, it may be preceded by a character indicating
    the operation to be performed. Allowable characters include:

        =   The value from the file must equal the specified value.
        <   The value from the file must be less than the specified value.
        >   The value from the file must be greater than the specified
            value.
        &   All bits in the specified value must be set in the value from
            the file.
        ^   At least one of the bits in the specified value must not be
            set in the value from the file.
        x   Any value will match.

    The default is "=".

message
    The message to be printed if the comparison succeeds. If the string
    contains a printf() format specification, the value from the file is
    printed, using the message as the format string.

    A line that begins with the > character indicates additional tests and
    messages to be printed. If the test on the line preceding the first
    line with a > succeeds, the tests specified in all subsequent lines
    beginning with > are performed, and the messages are printed if the
    tests succeed. The next line that does not begin with a > terminates
    this action.

comment
    A # (number sign) denotes that the rest of the line is a comment.

extensions
    A * in column 1 denotes an undefined extension, and the byte offset
    should be treated as zero or in an undefined manner.

The byte-ordering used in the file is always left-to-right (big endian).

4.1.6 Add the following to the rationale.
Some example entries:

#offset type    operator+value                  string to print  # comment
#--------------------------------------------------------------------------------
0       string  070707                          ASCII cpio archive
0       string  !                               portable archive
0       long    0550                            executable       # delete ?
>12     long    >0                              not stripped
0       string  ^!\n__\\.SYMDEF                 archive random library
#
# All sorts of scripts (like /bin/sh, /bin/awk, etc.) are identified.
#
0       string  ^#![ ]*\\%[^ \n]*\\%            %s
#
# Various sorts of text and data files.
#
0       string  ^\01h[0-9][0-9][0-9][0-9][0-9]  sccsfile
0       string  ^#ifndef                        c program
0       string  ^%!PS-Adobe-\\%[.0-9]*\\%\n     PostScript (v%s) text
0       string  ^\0377\0377\0177                ddis/ddif
0       string  ^\0100\0357                     troff (CAT) output
0       long    04553207                        X image
0       string  ^begin\040[0-9]                 uuencoded data
0       long    01360403                        DCA Revisable Form Text
0       short   025521                          DCA Final Form Text
0       short   025522                          DCA Final Form Text
0       short   0x0005                          DCA
>2      short   0xe103                          RFT document
1       string  Supermax                        Supermax
>10     string  Tekst                           Text
#
# ... the next lot of lines all deal with WP 5.x files
#
# ... if the first four bytes are 0xff,'W','P','C' then it's a
#     WordPerfect Corporation datafile version 5.0 or greater...
0       long    0xff575043                      WPC
#
# ... the 8th byte tells us which product created this file
>8      byte    0x01                            WordPerfect
>8      byte    0x02                            Shell
>8      byte    0x03                            Notebook
>8      byte    0x04                            Calculator
>8      byte    0x05                            FileManager
>8      byte    0x06                            Calendar
>8      byte    0x07                            ProgramEditor
>8      byte    0x08                            MacroEditor
>8      byte    0x09                            PlanPerfect
>8      byte    0x0a                            DataPerfect
>8      byte    0x0b                            Mail
>8      byte    0x0c                            Printer(ptr.exe)
>8      byte    0x0d                            Scheduler
>8      byte    0x0e                            WPOffice
>8      byte    0x0f                            DrawPerfect(X)
#
# ... bytes 10 and 11 are Major and Minor version no.
# Values for byte 10 are as follows
#       Value   Meaning
#       0       Major version 5
# Values for byte 11 are as follows
#       Value   Meaning
#       0       Minor version 0
#       1       Minor version 1
# Does anyone know how to add a value to a magic number before
# file(1) prints it?????
>10     byte    0x00                            5.
>10     byte    0x01                            %d.
>11     byte    0x00                            0
>11     byte    0x01                            1
#
# ... the 9th byte is the file type
>9      byte    0x01                            Macro file
>9      byte    0x02                            Help file
>9      byte    0x03                            Keyboard definition file
>9      byte    0x0a                            document
>9      byte    0x0b                            dictionary file
>9      byte    0x0c                            thesaurus file
>9      byte    0x0d                            block
>9      byte    0x0e                            rectangular block
>9      byte    0x0f                            column block
>9      byte    0x10                            printer resource file (.PRS)
>9      byte    0x11                            setup file
>9      byte    0x12                            prefix information file
>9      byte    0x13                            printer resource file (.ALL)
>9      byte    0x14                            display resource file (.DRS)
>9      byte    0x15                            overlay file(WP.FIL)
>9      byte    0x16                            graphic file (.WPG)
>9      byte    0x17                            hyphenation code module
>9      byte    0x18                            hyphenation data module
>9      byte    0x19                            macro resource file (.MRS)
>9      byte    0x1a                            graphics screen driver (.WPD)
>9      byte    0x1b                            hyphenation lex module
>9      byte    0x1c                            printer Q codes(used by VAX/DG)
>9      byte    0x1d                            spell code module-wordlist
>9      byte    0x1e                            5.1 equation resource file (WP.QRS)
>9      byte    0x1f                            VAX keyboard definition
>9      byte    0x20                            VAX .SET
>9      byte    0x21                            spell code module-rules
>9      byte    0x22                            dictionary-rules
>9      byte    0x24                            .WPD files
>9      byte    0x29                            WP51.INS file (install options)
# ... End of WP 5.x info
#
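
As an illustration only, and not as part of the proposed text, the numeric
tests described in 4.1.5 could be evaluated roughly as in the Python sketch
below. The function names (parse_number, parse_entry, read_value, matches,
identify) are hypothetical. The sketch assumes that short and long are 2 and
4 bytes wide, that all values are compared as unsigned big-endian integers,
that fields are split on white space, that a trailing "# comment" is cut
from the message, and that scanning stops at the first matching entry. None
of these points is fixed by the text above. String (regular expression)
tests, the '*' extension and the -x and -c options are not handled.

```python
# Field widths in bytes. The 2- and 4-byte widths are an assumption.
SIZES = {"byte": 1, "short": 2, "long": 4}


def parse_number(text):
    """Parse decimal, octal (leading 0) or hexadecimal (leading 0x) text."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def parse_entry(line):
    """Split a magic line into (cont, offset, kind, mask, op, value, message)."""
    offset, type_field, test, *rest = line.split(None, 3)
    # Assumption: an optional fifth field starts at the first '#'.
    message = rest[0].split("#", 1)[0].strip() if rest else ""
    cont = offset.startswith(">")
    kind, _, mask_text = type_field.partition("&")
    mask = parse_number(mask_text) if mask_text else None
    if test[0] in "=<>&^x":
        op, number = test[0], test[1:]
    else:
        op, number = "=", test  # "=" is the default operation
    value = parse_number(number) if number and kind in SIZES else 0
    return cont, parse_number(offset.lstrip(">")), kind, mask, op, value, message


def read_value(data, offset, kind):
    """Read an unsigned big-endian value, or None if the data is too short."""
    chunk = data[offset:offset + SIZES[kind]]
    return int.from_bytes(chunk, "big") if len(chunk) == SIZES[kind] else None


def matches(actual, op, expected):
    """Apply one of the operations =, <, >, &, ^ or x."""
    if op == "x":
        return True
    if op == "=":
        return actual == expected
    if op == "<":
        return actual < expected
    if op == ">":
        return actual > expected
    if op == "&":
        return actual & expected == expected
    return actual & expected != expected  # op == "^"


def format_message(message, value):
    """Use the message as a printf-style format if it contains one."""
    if "%" in message:
        try:
            return message % value
        except (TypeError, ValueError):
            return message
    return message


def identify(data, magic_text):
    """Return the messages of the first matching entry and its > lines."""
    found, parent_ok = [], False
    for line in magic_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cont, offset, kind, mask, op, expected, message = parse_entry(line)
        if not cont and parent_ok:
            break  # a line without > terminates the matched entry
        if cont and not parent_ok:
            continue
        if kind not in SIZES:  # string tests are outside this sketch
            if not cont:
                parent_ok = False
            continue
        actual = read_value(data, offset, kind)
        if actual is not None and mask is not None:
            actual &= mask
        ok = actual is not None and matches(actual, op, expected)
        if not cont:
            parent_ok = ok
        if ok and message:
            found.append(format_message(message, actual))
    return found
```

Applied to the WordPerfect entries above, a file beginning with the bytes
0xff, 'W', 'P', 'C' would pass the first test, and the continuation lines
for offsets 8 through 11 would then each contribute a message. The sketch
returns the messages as a list because the text does not say how they are
to be joined on output.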