;;; -*- Mode:gate; Fonts:(HL12 HL12I HL12B CPTFONTB HL12BI HL12B HL12I ) -*-
=Node: Introduction
=Text:
3INTRODUCTION*

A Lisp Machine generally has access to many file systems. While it may have its own file system on its own disks, usually a community of Lisp Machine users want to have a shared file system accessible by any of the Lisp Machines over a network. These shared file systems can be implemented by any computer that is capable of providing file system service. A file server computer may be a special-purpose computer that does nothing but service file system requests from computers on a network, or it may be a time-sharing system.

Programs need to use names to designate files within these file systems. The main difficulty in dealing with names of files is that different file systems have different naming formats for files. For example, in the ITS file system, a typical name looks like:

3DSK: GEORGE; FOO QFASL*

with 2DSK* being a device name, 2GEORGE* being a directory name, 2FOO* being the first file name and 2QFASL* being the second file name. However, in TOPS-20, a similar file name is expressed as:

3PS:FOO.QFASL*

It would be unreasonable for each program that deals with file names to be expected to know about each different file name format that exists, or new formats that could get added in the future. However, existing programs should retain their abilities to manipulate the names.

The functions and flavors described in this chapter exist to solve this problem. They provide an interface through which a program can deal with names of files and manipulate them without depending on anything about their syntax. This lets a program deal with multiple remote file servers simultaneously, using a uniform set of conventions.

=Node: 4Pathnames*
=Text:
3PATHNAMES*

All file systems dealt with by the Lisp Machine are mapped into a common model, in which files are named by something called a 1pathname*.
A pathname always has six components, each with a standard meaning. These components are the common interface that allows programs to work the same way with different file systems; the mapping of the pathname components into the concepts peculiar to each file system is taken care of by the pathname software. Pathname components are described in the following section, and the mappings between components and user syntax are described for each file system later in this chapter.

3pathnamep* 1object*
Returns 2t* if 1object* is a pathname.

A pathname is an instance of a flavor (see 4(FLAVOR-0)Objects, Message Passing, and Flavors*); exactly which flavor depends on what the host of the pathname is, but 2pathname* is always one of its component flavors. If 1p* is a pathname, then 2(typep 1p* 'pathname)* returns 2t*. One of the messages handled by host objects is the 2:pathname-flavor* operation, which returns the name of the flavor to use for pathnames on that host. One of the differences between host flavors is how they handle this operation.

There are functions for manipulating pathnames, and there are also messages that can be sent to them. These are described later in this chapter.

Two important operations of the pathname system are 1parsing* and 1merging*. Parsing is the conversion of a string--which might be something typed in by the user when asked to supply the name of a file--into a pathname object. This involves finding out what host the pathname is for, then using the file name syntax conventions of that host to parse the string into the standard pathname components. Merging is the operation that takes a pathname with missing components and supplies values for those components from a set of defaults.

The function 2string*, applied to a pathname, converts it into a string that is in the file name syntax of its host's file system, except that the name of the host followed by a colon is inserted at the front. This is the inverse of parsing.
2princ* of a pathname also does this, then prints the contents of the string. Flavor operations such as 2:string-for-dired* exist which convert all or part of a pathname to a string in other fashions that are designed for specific applications. 2prin1* of a pathname prints the pathname using the 3#* syntax so it can be read back in to produce an equivalent pathname (or the same pathname, if read in the same session).

Since each kind of file server can have its own character string representation of names of its files, there has to be a different parser for each of these representations, capable of examining such a character string and figuring out what each component is. The parsers all work differently. How can the parsing operation know which parser to use?

The first thing that the parser does is to figure out which host this filename belongs to. A filename character string may specify a host explicitly by having the name of the host, followed by a colon, at either the beginning or the end of the string. For example, the following strings all specify hosts explicitly:

3AI: COMMON; GEE WHIZ ;2 This specifies host AI.**
3COMMON; GEE WHIZ AI: ;2 So does this.**
3AI: ARC: USERS1; FOO BAR ;2 So does this.**
3ARC: USERS1; FOO BAR AI: ;2 So does this.**
3EE:PS:GEE.WHIZ.5 ;2 This specifies host EE.**
3PS:GEE.WHIZ.5 EE: ;2 So does this.**

If the string does not specify a host explicitly, the parser chooses a host by default and uses the syntax for that host. The optional arguments passed to the parsing function (2fs:parse-pathname*) tell it which host to assume. Note: the parser is not confused by strings starting with 2DSK:* or 2PS:* because it knows that neither of those is a valid host name.
But if the default host has a device whose name happens to match the name of some host, you can prevent the device name from being misinterpreted as a host name by writing an extra colon at the beginning of the string. For example, 2:EE:FOO.BAR* refers to the device 2EE* on the default host (assumed to use TOPS-20 syntax) rather than to the host named 2EE*.

Pathnames are kept unique, like symbols, so that there is only one object with a given set of components. This is useful because a pathname object has a property list (see 4(MANLISTSTR-2)Property Lists*) on which you can store properties describing the file or family of files that the pathname represents. The uniqueness implies that each time the same components are typed in, the program gets the same pathname object and finds there the properties it ought to find.

Note that a pathname is not necessarily the name of a specific file. Rather, it is a way to get to a file; a pathname need not correspond to any file that actually exists, and more than one pathname can refer to the same file. For example, the pathname with 2:newest* as its version refers to the same file as a pathname which has the appropriate number as the version. In systems with links, multiple file names, logical devices, etc., two pathnames that look quite different may really turn out to address the same file. To get from a pathname to a file requires doing a file system operation such as 2open*.

When you want to store properties describing an individual file, use the pathname you get by sending 2:truename* to a stream rather than the pathname you open. This avoids problems with different pathnames that refer to the same file. To get a unique pathname object representing a family of files, send the message 2:generic-pathname* to a pathname for any file in the family (see 4(FILENAMES-2)Generic Pathnames*).

=Node: 4Pathname Components*
=Text:
3PATHNAME COMPONENTS*

These are the components of a pathname. They are clarified by an example below.

1host*
An object that represents the file system machine on which the file resides. A host object is an instance of a flavor one of whose components is 2si:basic-host*. The precise flavor varies depending on the type of file system and how the files are to be accessed.

1device*
Corresponds to the ``device'' or ``file structure'' concept in many host file systems.

1directory*
The name of a group of related files belonging to a single user or project. Corresponds to the ``directory'' concept in many host file systems.

1name*
The name of a group of files that can be thought of as conceptually the ``same'' file. Many host file systems have a concept of ``name'' which maps directly into this component.

1type*
Corresponds to the ``filetype'' or ``extension'' concept in many host file systems. This says what kind of file this is; such as, a Lisp source file, a QFASL file, etc.

1version*
Corresponds to the ``version number'' concept in many host file systems. This is a number that increments every time the file is modified. Some host systems do not support version numbers.

As an example, consider a Lisp program named 2CONCH*. If it belongs to 2GEORGE*, who uses the 2FISH* machine, the host would be the host-object for the machine 2FISH*, the device would probably be the default and the directory would be 2GEORGE*. On this directory would be a number of files related to the 2CONCH* program. The source code for this program would live in a set of files with name 2CONCH*, type 2LISP*, and versions 21*, 22*, 23*, etc. The compiled form of the program would live in files named 2CONCH* with type 2QFASL*; each would have the same version number as the source file that it came from. If the program had a documentation file, it would have type 2INFO*.

Not all of the components of a pathname need to be specified. If a component of a pathname is missing, its value is 2nil*.
Before a file server can do anything interesting with a file, such as opening the file, all the missing components of a pathname must be filled in from defaults. But pathnames with missing components are often handed around inside the machine, since almost all pathnames typed by users do not specify all the components explicitly.

The host is not allowed to be missing from any pathname; since the behavior of a pathname is host-dependent to some extent, it has to know what its host is. All pathnames have host attributes, even if the string being parsed does not specify one explicitly.

A component of a pathname can also be the special symbol 2:unspecific*. 2:unspecific* means, explicitly, ``this component has been specified as missing'', whereas 2nil* means that the component was not specified and should default. In merging, 2:unspecific* counts as a specified component and is not replaced by a default. 2:unspecific* does 1not* mean ``unspecified''; it is unfortunate that those two words are similar.

2:unspecific* is used in 1generic* pathnames, which refer not to a file but to a whole family of files. The version, and usually the type, of a generic pathname are 2:unspecific*. Another way 2:unspecific* is used has to do with mapping of pathnames into file systems such as ITS that do not have all six components. A component that is really ``not there'' is 2:unspecific* in the pathname. When a pathname is converted to a string, 2nil* and 2:unspecific* both cause the component not to appear in the string.

A component of a pathname can also be the special symbol 2:wild*. This is useful only when the pathname is being used with a directory primitive such as 2fs:directory-list* (see 4(FILEACCESS-3)Accessing Directories*), where it means that this pathname component matches anything. The printed representation of a pathname usually designates 2:wild* with an asterisk; however, this is host-dependent.
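
The difference between 2nil* and 2:unspecific* during merging can be illustrated with a small model. The following is a Python sketch, not the Lisp Machine's merging code: the dictionary representation, the names 2merge*, 2COMPONENTS* and 2UNSPECIFIC*, and the sample default values are all assumptions made for illustration, and the real merging operation applies further rules that are not modeled here.

```python
# Hypothetical model: a pathname is a dict holding its six components.
COMPONENTS = ("host", "device", "directory", "name", "type", "version")
UNSPECIFIC = ":unspecific"  # stands in for the Lisp keyword symbol


def merge(pathname, defaults):
    """Fill in missing components of `pathname` from `defaults`.

    None models Lisp nil: the component was not specified and should default.
    UNSPECIFIC counts as a specified value, so it is never replaced.
    """
    merged = {}
    for key in COMPONENTS:
        value = pathname.get(key)
        merged[key] = defaults.get(key) if value is None else value
    return merged


# Illustrative values only; device, type and version defaults are made up.
typed = {"host": "FISH", "name": "CONCH", "type": None, "version": UNSPECIFIC}
defaults = {"host": "FISH", "device": "DSK", "directory": "GEORGE",
            "name": "FOO", "type": "LISP", "version": ":newest"}
result = merge(typed, defaults)
```

In this model the missing type is taken from the defaults, while the version stays 2:unspecific* because it counts as specified.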

What values are allowed for components of a pathname depends, in general, on the pathname's host. However, in order for pathnames to be usable in a system-independent way, certain global conventions are adhered to. These conventions are stronger for the type and version than for the other components, since the type and version are actually understood by many programs, while the other components are usually just treated as something supplied by the user that only needs to be remembered. In general, programs can interpret the components of a pathname independently of the file system, and a certain minimum set of possible values of each component is supported on all file systems.

The same pathname component value may have very different representations when the pathname is made into a string, depending on the file system. This does not affect programs that operate on the components. The user, when asked to type a pathname, always uses the system-dependent string representation. This is convenient for the user who moves between using the Lisp Machine on files stored on another host and making direct use of that host. However, when the mapping between string form and components is complicated, the components may not be obvious from what you type.

The type is always a string, or one of the special symbols 2nil*, 2:unspecific*, and 2:wild*. Certain hosts impose a limit on the size of string allowed, often very small. Many programs that deal with files have an idea of what type they want to use. For example, Lisp source programs are usually 2"LISP"*, compiled Lisp programs are 2"QFASL"*, etc. However, these file type conventions are host-specific, for the important reason that some hosts do not allow a string five characters long to be used as the type. Therefore, programs should use a 1canonical type* rather than an actual string to specify their conventional default file types. Canonical types are described below.

For the version, it is always legitimate to use a positive fixnum, or certain special symbols. 2nil*, 2:unspecific*, and 2:wild* have been explained above. The other standardly allowed symbols are 2:newest* and 2:oldest*. 2:newest* refers to the largest version number that exists when reading a file, or that number plus one when writing a new file. 2:oldest* refers to the smallest version number that exists. Some file systems may define other special version symbols, such as 2:installed* for example, or may allow negative numbers. Some do not support versions at all. Then a pathname may still contain any of the standard version components, but it does not matter what the value is.

The device, directory, and name are more system-dependent. These can be strings (with host-dependent rules on allowed characters and length) or they can be 1structured*. A structured component is a list of strings. This is used for file system features such as hierarchical directories. The system is arranged so that programs do not need to know about structured components unless they do host-dependent operations. Giving a string as a pathname component to a host that wants a structured value converts the string to the appropriate form. Giving a structured component to a host that does not understand them converts it to a string by taking the first element and ignoring the rest.

Some host file systems have features that do not fit into this pathname model. For instance, directories might be accessible as files, there might be complicated structure in the directories or names, or there might be relative directories, such as `<' in Multics. These features appear in the parsing of strings into pathnames, which is one reason why the strings are written in host-dependent syntax. Pathnames for hosts with these features are also likely to handle additional messages besides the common ones documented in this chapter, for the benefit of host-dependent programs that want to access those features.
However, once your program depends on any such features, it will work only for certain file servers and not others; in general, it is a good idea to make your program work just as well no matter what file server is being used.

=Node: 4Raw Components and Interchange Components*
=Text:
3RAW COMPONENTS AND INTERCHANGE COMPONENTS*

On some host file systems it is conventional to use lower-case letters in file names, while in others upper case is customary, or possibly required. When pathname components are moved from pathnames of one file system to pathnames of another file system, it is useful to convert the case if necessary so that you get the right case convention for the latter file system as a default. This is especially useful when copying files from one file system to another.

The Lisp Machine system defines two representations for each of several pathname components (the device, directory, name and type). There is the 1raw* form, which is what actually appears in the filename on the host file system, and there is the 1interchange* form, which may differ in alphabetic case from the raw form. The raw form is what is stored inside the pathname object itself, but programs nearly always operate on the interchange form. The 2:name*, 2:type*, etc., operations return the interchange form, and the 2:new-name*, etc., operations expect the interchange form. Additional operations 2:raw-name*, etc., are provided for working with the raw components, but these are rarely needed.

The interchange form is defined so that it is always customarily in upper case. If upper case is customary on the host file system, then the interchange form of a component is the same as the raw form. If lower case is customary on the host file system, as on Unix, then the interchange form has case inverted. More precisely, an all-upper-case component is changed to all-lower-case, an all-lower-case component is changed to all-upper-case, and a mixed-case component is not changed.
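
The case rule just stated can be sketched in a few lines. This is a Python model for illustration only, not the Lisp Machine implementation; the function names 2invert_case* and 2to_interchange* and the boolean flag describing the host's customary case are assumptions, and the sketch considers only ASCII letters in string components.

```python
def invert_case(component):
    """Swap an all-upper-case string to lower case and vice versa.

    Mixed-case strings, and strings with no letters at all, are returned
    unchanged, as the rule requires.
    """
    if component.isupper():
        return component.lower()
    if component.islower():
        return component.upper()
    return component


def to_interchange(raw, lower_case_is_customary):
    """Model of raw-to-interchange conversion for one string component.

    `lower_case_is_customary` is a hypothetical flag: True for a host such
    as Unix, False for a host where upper case is customary.
    """
    return invert_case(raw) if lower_case_is_customary else raw


unix_name = to_interchange("foo", True)        # lower-case raw name on Unix
mixed_name = to_interchange("Foo", True)       # mixed case is left alone
upper_host_name = to_interchange("FOO", False) # raw and interchange agree
```

Because the inversion undoes itself, the same function converts interchange form back to raw form in this model.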
(This is a one-to-one mapping). Thus, a Unix pathname with a name component of 2"foo"* has an interchange-format name of 2"FOO"*, and vice versa.

For host file systems which record case when files are created but ignore case when comparing filenames, the interchange form is always upper case.

The host component is not really a name, and case is always ignored in host names, so there is no need for two forms of host component. The version component does not need them either, because it is never a string.

=Node: 4Pathname Component Operations*
=Text:
3PATHNAME COMPONENT OPERATIONS*

3:host* 1Operation on 2pathname**
3:device* 1Operation on 2pathname**
3:directory* 1Operation on 2pathname**
3:name* 1Operation on 2pathname**
3:type* 1Operation on 2pathname**
3:version* 1Operation on 2pathname**
These return the components of the pathname, in interchange form. The returned values can be strings, special symbols, or lists of strings in the case of structured components. The type is always a string or a symbol. The version is always a number or a symbol.

3:raw-device* 1Operation on 2pathname**
3:raw-directory* 1Operation on 2pathname**
3:raw-name* 1Operation on 2pathname**
3:raw-type* 1Operation on 2pathname**
These return the components of the pathname, in raw form.

3:new-device* 1dev* 1Operation on 2pathname**
3:new-directory* 1dir* 1Operation on 2pathname**
3:new-name* 1name* 1Operation on 2pathname**
3:new-type* 1type* 1Operation on 2pathname**
3:new-version* 1version* 1Operation on 2pathname**
These return a new pathname that is the same as the pathname they are sent to except that the value of one of the components has been changed. The specified component value is interpreted as being in interchange form, which means its case may be converted. The 2:new-device*, 2:new-directory* and 2:new-name* operations accept a string (or a special symbol) or a list that is a structured name.
If the host does not define structured components, and you specify a list, its first element is used.

3:new-raw-device* 1dev* 1Operation on 2pathname**
3:new-raw-directory* 1dir* 1Operation on 2pathname**
3:new-raw-name* 1name* 1Operation on 2pathname**
3:new-raw-type* 1type* 1Operation on 2pathname**
These return a new pathname that is the same as the pathname they are sent to except that the value of one of the components has been changed. The specified component value is interpreted as raw.

3:new-suggested-name* 1name* 1Operation on 2pathname**
3:new-suggested-directory* 1dir* 1Operation on 2pathname**
These differ from the 2:new-name* and 2:new-directory* operations in that the new pathname constructed has a name or directory based on the suggestion, but not necessarily identical to it. It tries, in a system-dependent manner, to adapt the suggested name or directory to the usual customs of the file system in use. For example, on a TOPS-20 system, these operations would convert 1name* or 1dir* to upper case, because while lower-case letters 1may* appear in TOPS-20 pathnames, it is not customary to generate such pathnames by default.

3:new-pathname* &rest 1options* 1Operation on 2pathname**
This returns a new pathname that is the same as the pathname it is sent to except that the values of some of the components have been changed. 1options* is a list of alternating keywords and values. The keywords all specify values of pathname components; they are 2:host*, 2:device*, 2:directory*, 2:name*, 2:type*, and 2:version*. Alternatively, the keywords 2:raw-device*, 2:raw-directory*, 2:raw-name* and 2:raw-type* may be used to specify a component in raw form.

Two additional keywords, 2:canonical-type* and 2:original-type*, allow the type field to be specified as a canonical type. See the following section for a description of canonical types. Also, the value specified for the keyword 2:type* may be a canonical type symbol.

If an invalid component is specified, it is replaced by some valid component so that a valid pathname can be returned. You can tell whether a component is valid by specifying it in 2:new-pathname* and seeing whether that component of the resulting pathname matches what you specified.

The operations 2:new-name*, etc., are equivalent to 2:new-pathname* specifying only one component to be changed; in fact, that is how those operations are implemented.

=Node: 4Canonical Types*
=Text:
3CANONICAL TYPES*

1Canonical types* are a way of specifying a pathname type component using host-dependent conventions without making the program itself explicitly host dependent. For example, the function 2compile-file* normally provides a default type of 2"LISP"*, but on VMS systems the default must be 2"LSP"* instead, and on Unix systems it is 2"l"*. What 2compile-file* actually does is to use a canonical type, the keyword 2:lisp*, as the default. This keyword is given a definition as a canonical type, which specifies what it maps into on various file systems.

A single canonical type may have more than one mapping on a particular file system. For example, on TOPS-20 systems the canonical type 2:LISP* maps into either 2"LISP"* or 2"LSP"*. One of the possibilities is marked as ``preferred''; in this case, it is 2"LISP"*. The effect of this is that either 2FOO.LISP* or 2FOO.LSP* would be acceptable as having canonical type 2:lisp*, but merging yields 2"LISP"* as the type when defaulting from 2:lisp*.

Note that the canonical type of a pathname is not a distinct component. It is another way of describing or specifying the type component.

A canonical type must be defined before it is used.

3fs:define-canonical-type* 1symbol* 1standard-mapping* 1system-dependent-mappings...* 1Macro*
Defines 1symbol* as a canonical type. 1standard-mapping* is the actual type component that it maps into (a string), with exceptions as specified by 1system-dependent-mappings*.
Each element of 1system-dependent-mappings* (that is, each additional argument) is a list of the form

3(1system-type* 1preferred-mapping* 1other-mappings*...)*

1system-type* is one of the system-type keywords the 2:system-type* operation on a host object can return, such as 2:unix*, 2:tops20*, and 2:lispm* (see 4(FILENAMES-4)Parsing Hostnames*). The argument describes how to map this canonical type on that type of file system. 1preferred-mapping* (a string) is the preferred mapping of the canonical type, and 1other-mappings* are additional strings that are accepted as matching the canonical type.

1system-type* may also be a list of system types. Then the argument applies to all of those types of file systems.

All of the mapping strings are in interchange form.

For example, the canonical type 2:lisp* is defined as follows:

3(fs:define-canonical-type :lisp "LISP"*
3  (:unix "L" "LISP")*
3  (:vms "LSP")*
3  ((:tops20 :tenex) "LISP" "LSP"))*

Other canonical types defined by the system include 2:qfasl*, 2:text*, 2:press*, 2:qwabl*, 2:babyl*, 2:mail*, 2:xmail*, 2:init*, 2:patch-directory*, 2:midas*, 2:palx*, 2:unfasl*, 2:widths*, 2:output*, 2:mac*, 2:tasm*, 2:doc*, 2:mss*, 2:tex*, 2:pl1* and 2:clu*. The standard mapping for each is the symbol's pname.

To match a pathname against a canonical type, use the 2:canonical-type* operation.

3:canonical-type* 1Operation on 2pathname**
Returns two values which describe whether and how this pathname's type component matches any canonical type. If the type component is one of the possible mappings of some canonical type, the first value is that canonical type (the symbol). The second value is 2nil* if the type component is the preferred mapping of the canonical type; otherwise it is the actual type component, in interchange form. The second value is called the 1original type* of the pathname. If the type component does not match a canonical type, the first value is the type component in interchange form (a string), and the second value is 2nil*.
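
The two values returned by 2:canonical-type* can be illustrated with a small lookup model. This is a Python sketch written for this purpose, not the system's implementation: the table layout and the names 2CANONICAL_TYPES*, 2mappings_for* and 2canonical_type* are assumptions. The table holds only the 2:lisp* definition given above, with the preferred mapping first in each list, and symbols are represented as strings.

```python
# Transcribed from the :lisp definition above; preferred mapping comes first.
CANONICAL_TYPES = {
    ":lisp": {
        "standard": ["LISP"],
        ":unix": ["L", "LISP"],
        ":vms": ["LSP"],
        ":tops20": ["LISP", "LSP"],
        ":tenex": ["LISP", "LSP"],
    },
}


def mappings_for(canonical, system_type):
    """Accepted type strings on `system_type`, falling back to the standard mapping."""
    entry = CANONICAL_TYPES[canonical]
    return entry.get(system_type, entry["standard"])


def canonical_type(type_component, system_type):
    """Return (first value, second value) as described for :canonical-type.

    `type_component` is assumed to be a string in interchange form.
    """
    for canonical in CANONICAL_TYPES:
        accepted = mappings_for(canonical, system_type)
        if type_component in accepted:
            preferred = accepted[0]
            original = None if type_component == preferred else type_component
            return canonical, original
    # No canonical type matches: the type string itself, and no original type.
    return type_component, None


preferred_match = canonical_type("LISP", ":tops20")
alternate_match = canonical_type("LSP", ":tops20")
no_match = canonical_type("LISP", ":vms")
```

Here None plays the role of 2nil*. A system type with no entry of its own, such as 2:lispm* in this table, falls back to the standard mapping.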

This operation is useful in matching a pathname against a canonical type; the first value is 2eq* to the canonical type if the pathname matches it. The operation is also useful for transferring a type field from one file system to another while preserving canonical type; this is described below.

A new pathname may also be constructed by specifying a canonical type.

3:new-canonical-type* 1canonical-type* &optional 1original-type* 1Operation on 2pathname**
Returns a pathname different from this one in having a type component that matches 1canonical-type*. If 1original-type* is a possible mapping for 1canonical-type* on this pathname's host, then it is used as the type component. Otherwise, the preferred mapping for 1canonical-type* is used. If 1original-type* is not specified, it defaults to this pathname's type component. If it is specified as 2nil*, the preferred mapping of the canonical type is always used. If 1canonical-type* is a string rather than an actual canonical type, it is used directly as the type component, and the 1original-type* does not matter.

The 2:new-pathname* operation accepts the keywords 2:canonical-type* and 2:original-type*. The 2:new-canonical-type* operation is equivalent to 2:new-pathname* with those keywords.

Suppose you wish to copy the file named 1old-pathname* to a directory named 1target-directory-pathname*, possibly on another host, while preserving the name, version and canonical type. That is, if the original file has a name acceptable for a QFASL file, the new file should also. Here is how to compute the new pathname:

3(multiple-value-bind (canonical original)*
3    (send old-pathname :canonical-type)*
3  (send target-directory-pathname :new-pathname*
3        :name (send old-pathname :name)*
3        :version (send old-pathname :version)*
3        :canonical-type canonical*
3        :original-type original))*

Suppose that 1old-pathname* is 2OZ:A.LISP.5*, where 2OZ* is a TOPS-20, and the target directory is on a VMS host.
Then 2canonical* is 2:lisp* and 2original* is 2"LISP"*. Since 2"LISP"* is not an acceptable mapping for 2:lisp* on a VMS system, the resulting pathname has as its type component the preferred mapping for 2:lisp* on VMS, namely, 2"LSP"*. But if the target host is a Unix host, the new file's type is 2"LISP"*, since that is an acceptable (though not preferred) mapping for 2:lisp* on Unix hosts. If you would rather that the preferred mapping always be used for the new file's type, omit the 2:original-type* argument to the 2:new-pathname* operation. This would result in a type component of 2"L"* in interchange form, or 2"l"* in raw form, in the new file's pathname.

The function 2compile-file* actually does something cleverer than using the canonical type as a default. Doing that, and opening the resulting pathname, would look only for the preferred mapping of the canonical type. 2compile-file* actually tries to open 1each* possible mapping, trying the preferred mapping first. Here is how it does so:

3:open-canonical-default-type* 1canonical-type* &rest 1options* 1Operation on 2pathname**
If this pathname's type component is non-2nil*, the pathname is simply opened, passing the 1options* to the 2:open* operation. If the type component is 2nil*, each mapping of 1canonical-type* is tried as a type component, in the order the mappings appear in the canonical type definition. If an open succeeds, a stream is returned. The possibilities continue to be tried as long as 2fs:file-not-found* errors happen; other errors are not handled. If all the possibilities fail, a 2fs:file-not-found* error is signaled for the caller, with a pathname that contains the preferred mapping as its type component.
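
The retry behavior of 2:open-canonical-default-type* can be modeled as a loop over the candidate types. The following Python sketch is an illustration under stated assumptions, not the actual implementation: the 2FileNotFound* class stands in for 2fs:file-not-found*, 2opener* is a hypothetical callable that either returns a stream or raises that error, the list of mappings is assumed to be non-empty with the preferred mapping first, and the in-memory set of files exists only to exercise the loop.

```python
class FileNotFound(Exception):
    """Stands in for the fs:file-not-found error in this model."""


def open_canonical_default_type(name, type_component, mappings, opener):
    """Open `name`, trying each mapping in order when the type is missing.

    Only FileNotFound is handled; any other exception from `opener`
    propagates to the caller unchanged.
    """
    if type_component is not None:
        return opener(name, type_component)
    for candidate in mappings:
        try:
            return opener(name, candidate)
        except FileNotFound:
            continue
    # Every candidate failed: report the preferred (first) mapping.
    raise FileNotFound(f"{name}.{mappings[0]}")


# Hypothetical in-memory file system used to exercise the loop.
existing = {("FOO", "LSP")}
attempts = []


def fake_open(name, type_component):
    attempts.append(type_component)
    if (name, type_component) not in existing:
        raise FileNotFound(f"{name}.{type_component}")
    return f"stream:{name}.{type_component}"


stream = open_canonical_default_type("FOO", None, ["LISP", "LSP"], fake_open)
```

In this run the preferred type is tried first and fails, and the second mapping succeeds. When no candidate exists, the error names the preferred mapping, matching the description above.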