;;; -*- Mode:gate; Fonts:(HL12 HL12I HL12B CPTFONTB HL12BI HL12B HL12I ) -*- =Node: Introduction to Characters and Strings =Text: 3INTRODUCTION TO CHARACTERS AND STRINGS* A string is a one-dimensional array representing a sequence of characters. The printed representation of a string is its characters enclosed in quotation marks, for example 2"foo bar"*. Strings are constants, that is, evaluating a string returns that string. Strings are the right data type to use for text-processing. Individual characters can be represented by 1character objects* or by fixnums. A character object is actually the same as a fixnum except that it has a recognizably different data type and prints differently. Without escaping, a character object is printed by outputting the character it represents. With escaping, a character object prints as 2#\1char** in Common Lisp syntax or as 2#/1char** in traditional syntax; see 4(CHARSTR-1)Components of a Character* and 4(READPRINT-2)Sharp-sign Constructs*. By contrast, a fixnum would in all cases print as a sequence of digits. Character objects are accepted by most numeric functions in place of fixnums, and may be used as array indices. When evaluated, they are constants. The character object data type was introduced recently for Common Lisp support. Traditionally characters were always represented as fixnums, and nearly all system and user code still does so. Character objects are interchangeable with fixnums in most contexts, but not in 2eq*, which is often used to compare the result of the stream input operations such as 2:tyi*, since that might be 2nil*. Therefore, the stream input operations still return fixnums that represent characters. Aside from this, Common Lisp functions that return a character return a character object, while traditional functions return a fixnum. The fixnum which is the character code representing 1char* can be written as 2#/1char** in traditional syntax. 
This is equivalent to writing the fixnum using digits, but does not require you to know the character code. Most strings are arrays of type 2art-string*, where each element is stored in eight bits. Only characters with character code less than 256 can be stored in an ordinary string; these characters form the type 2string-char*. A string can also be an array of type 2art-fat-string*, where each element holds a sixteen-bit unsigned fixnum. The extra bits allow for multiple fonts or an expanded character set. Since strings are arrays, the usual array-referencing function 2aref* is used to extract characters from strings. For example, 2(aref "frob" 1)* returns the representation of lower case 2r*. The first character is at index zero. Conceptually, the elements of a string are character objects. This is what Common Lisp programs expect to see when they do 2aref* (or 2char*, which on the Lisp Machine is synonymous with 2aref*) on a string. But nearly all Lisp Machine programs are traditional, and expect elements of strings to be fixnums. Therefore, 2aref* of a string actually returns a fixnum. A distinct version of 2aref* exists for Common Lisp programs. It is 2cli:aref* and it does return character objects if given a string. For all other kinds of arrays, 2aref* and 2cli:aref* are equivalent. 3(aref "Foo" 1) => #o157* 3(cli:aref "Foo" 1) => #/o* It is also legal to store into strings, for example using 2setf* of 2aref*. As with 2rplaca* on lists, this changes the actual object; you must be careful to understand where side-effects will propagate. It makes no difference whether a character object or a fixnum is stored. When you are making strings that you intend to change later, you probably want to create an array with a fill-pointer (see 4(ARRAYS-1)Extra Features of Arrays*) so that you can change the length of the string as well as the contents. 
The length of a string is always computed using 2array-active-length*, so that if a string has a fill-pointer, its value is used as the length.

The functions described in this section provide a variety of useful operations on strings. In place of a string, most of these functions accept a symbol or a fixnum as an argument, coercing it into a string. Given a symbol, its print name, which is a string, is used. Given a fixnum, a one-character string containing the character designated by that fixnum is used.

Several of the functions actually work on any type of one-dimensional array and may be useful for purposes other than string processing; these are the functions such as 2substring* and 2string-length* which do not depend on the elements of the string being characters.

The generic sequence functions in chapter 4(GENERIC-0)Generic Sequence Functions* may also be used on strings.

=Node: Characters
=Text:
3CHARACTERS*

The Lisp Machine data type for character objects is a recent addition to the system. Most programs still use fixnums to represent characters; Common Lisp programs typically work with actual character objects. The new Common Lisp functions for operating with characters have been implemented to accept fixnums as well, so that they can be used equally well from traditional programs.

3characterp* 1object*
2t* if 1object* is a character object; 2nil* otherwise. In particular, it is 2nil* if 1object* is a fixnum such as traditional programs use to represent characters.

3character* 1object*
Coerces 1object* to a single character, represented as a fixnum. If 1object* is a number, it is returned. If 1object* is a string or an array, its first element is returned. If 1object* is a symbol, the first character of its pname is returned. Otherwise an error occurs. The way characters are represented as fixnums is explained in 4(CHARSTR-1)Components of a Character*.
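The coercion rules just given for 2character* can be sketched in portable Common Lisp. This is only an illustration: the name 2coerce-to-char-code* is hypothetical and not a system function, the integer code is obtained with 2char-code*, and the clause that accepts a character object directly is an assumption added for convenience.

```lisp
;; Hypothetical helper (not a system function): mimics the coercion rules
;; described for CHARACTER and returns an integer character code.
(defun coerce-to-char-code (object)
  (flet ((code-of (x)
           ;; Assumption: X is either a character object or a number.
           (if (characterp x) (char-code x) x)))
    (typecase object
      ;; A number is returned unchanged.
      (number object)
      ;; Assumption: a character object is reduced to its code.
      (character (char-code object))
      ;; A symbol yields the first character of its print name.
      (symbol (code-of (char (symbol-name object) 0)))
      ;; A string or other array yields its first element.
      (array (code-of (row-major-aref object 0)))
      ;; Anything else is an error.
      (t (error "Cannot coerce ~S to a character." object)))))
```

For example, 2(coerce-to-char-code "Foo")* and 2(coerce-to-char-code 'foo)* both give the code of upper-case 2F*, the latter because the reader normally converts symbol names to upper case.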
3cli:character* 1object* Coerces 1object* into a character and returns the character as a character object for Common Lisp programs. 3int-char* 1fixnum* Converts 1fixnum*, regarded as representing a character, to a character object. This is a special case of 2cli:character*. 2(int-char #o101)* is the character object for 2A*. If a character object is given as an argument, it is returned unchanged. 3char-int* 1char* Converts 1char*, a character object, to the fixnum which represents the same character. This is the inverse of 2int-char*. It may also be given a fixnum as argument, in which case the value is the same fixnum. =Node: Components of a Character =Text: 3COMPONENTS OF A CHARACTER* A character object, or a fixnum which is interpreted as a character, contains three separate pieces of information: the 1character code*, the 1font number*, and the 1modifier bits*. Each of these things is an integer from a fixed range. The character code ranges from 0 to 377 (octal), the font number from 0 to 377 (octal), and the modifier bits from 0 to 17 (octal). These numeric constants should not appear in programs; instead, use the constant symbols 2char-code-limit*, and so on, described below. Ordinary strings can hold only characters whose font number and modifier bits are zero. Fat strings can hold characters with any font number, but the modifier bits must still be zero. Character codes less than 200 octal are printing graphics; when output to a device they are assumed to print a character and move the cursor one character position to the right. (All software provides for variable-width fonts, so the term ``character position'' shouldn't be taken too literally.) Character codes 200 through 236 octal are used for special characters. Character 200 is a ``null character'', which does not correspond to any key on the keyboard. The null character is not used for anything much; 2fasload* uses it internally. 
Characters 201 through 236 correspond to the special function keys on the keyboard such as 2Return* and 2Call*. The remaining character codes 237 through 377 octal are reserved for future expansion. Most of the special characters do not normally appear in files (although it is not forbidden for files to contain them). These characters exist mainly to be used as ``commands'' from the keyboard. A few special characters, however, are ``format effectors'' which are just as legitimate as printing characters in text files. The names and meanings of these characters are: 2Return* The ``newline'' character, which separates lines of text. We do not use the PDP-10 convention which separates lines by a pair of characters, a ``carriage return'' and a ``linefeed''. 2Page* The ``page separator'' character, which separates pages of text. 2Tab* The ``tabulation'' character, which spaces to the right until the next ``tab stop''. Tab stops are normally every 8 character positions. The space character is considered to be a printing character whose printed image happens to be blank, rather than a format effector. When a letter is typed with any of the modifier bit keys (2Control*, 2Meta*, 2Super*, or 2Hyper*), the letter is normally upper-case. If the 2Shift* key is pressed as well, then the letter becomes lower-case. This is exactly the reverse of what the 2Shift* key does to letters without control bits. (The 2Shift-lock* key has no effect on letters with control bits.) 3char-code* 1char* 3char-font* 1char* 3char-bits* 1char* Return the character code of 1char*, the font number of 1char*, and the modifier bits value of 1char*. 1char* may be a fixnum or a character object; the value is always a fixnum. These used to be written as 3(ldb %%ch-char 1char*)* 3(ldb %%ch-font 1char*)* 3(ldb %%ch-control-meta 1char*)* Such use of 2ldb* is frequent but obsolete. 3char-code-limit* 1Constant* A constant whose value is a bound on the maximum code of any character. 
In the Lisp Machine, currently, it is 400 (octal).

3char-font-limit* 1Constant*
A constant whose value is a bound on the maximum font number value of any character. In the Lisp Machine, currently, it is 400 (octal).

3char-bits-limit* 1Constant*
A constant whose value is a bound on the maximum modifier bits value of any character. In the Lisp Machine, currently, it is 20 (octal). Thus, there are four modifier bits. These are just the familiar Control, Meta, Super and Hyper bits.

3char-control-bit* 1Constant*
3char-meta-bit* 1Constant*
3char-super-bit* 1Constant*
3char-hyper-bit* 1Constant*
Constants with values 1, 2, 4 and 8. These give the meanings of the bits within the bits-field of a character object. Thus, 2(bit-test char-meta-bit (char-bits 1char*))* would be non-2nil* if 1char* is a meta-character. (This can also be tested with 2char-bit*.)

3char-bit* 1char* 1name*
2t* if 1char* has the modifier bit named by 1name*. 1name* is one of the following four symbols: 2:control*, 2:meta*, 2:super*, and 2:hyper*.

3(char-bit #\meta-x :meta) => t*

3set-char-bit* 1char* 1name* 1newvalue*
Returns a character like 1char* except that the bit specified by 1name* is present if 1newvalue* is non-2nil*, absent otherwise. Thus,

3(set-char-bit #\x :meta t) => #\meta-x*

The value is a fixnum if 1char* is a fixnum, and a character object if 1char* is a character object.

Until recently the only way to access the character code, font and modifier bits was with 2ldb*, using the byte field names listed below. Most code still uses that method, but it is obsolete; 2char-code*, 2char-font*, 2char-bits* and 2char-bit* should be used instead.

2%%kbd-char*
2%%ch-char*
Specifies the byte containing the character code.

2%%ch-font*
Specifies the byte containing the font number.

2%%kbd-control*
Specifies the byte containing the Control bit.

2%%kbd-meta*
Specifies the byte containing the Meta bit.

2%%kbd-super*
Specifies the byte containing the Super bit.

2%%kbd-hyper*
Specifies the byte containing the Hyper bit.
2%%kbd-control-meta*
Specifies the byte containing all the modifier bits.

Characters are sometimes used to represent mouse clicks. The character says which button was pressed and how many times. Refer to the Window System manual for an explanation of how these characters are generated.

3tv:kbd-mouse-p* 1char*
2t* if 1char* is a character used to represent a mouse click. Such characters are always distinguishable from characters that represent keyboard input.

3%%kbd-mouse-button* 1Constant*
The value of 2%%kbd-mouse-button* is a byte specifier for the field in a mouse signal that says which button was clicked. The byte contains 20*, 21*, or 22* for the left, middle, or right button, respectively.

3%%kbd-mouse-n-clicks* 1Constant*
The value of 2%%kbd-mouse-n-clicks* is a byte specifier for the field in a mouse signal that says how many times the button was clicked. The byte contains one less than the number of times the button was clicked.

=Node: Constructing Character Objects
=Text:
3CONSTRUCTING CHARACTER OBJECTS*

3code-char* 1code* &optional 1(bits* 10)* 1(font* 10)*
3make-char* 1code* &optional 1(bits* 10)* 1(font* 10)*
Returns a character object made from 1code*, 1bits* and 1font*. Common Lisp says that not all combinations may be valid, and that 2nil* is returned for an invalid combination. On the Lisp Machine, any combination is valid if the arguments are valid individually.

According to Common Lisp, 2code-char* requires a number as a first argument, whereas 2make-char* requires a character object, whose character code is used. On the Lisp Machine, either function may be used in either way.

3digit-char* 1weight* &optional 1(radix* 110.)* 1(font* 10)*
Returns a character object which is the digit with the specified weight, and with font as specified. However, if there is no suitable character which has weight 1weight* in the specified radix, the value is 2nil*.
If the ``digit'' is a letter (which happens if 1weight* is greater than 9), it is returned in upper case.

3tv:make-mouse-char* 1button* 1n-clicks*
Returns the fixnum character code that represents a mouse click in the standard way. 2tv:mouse-char-p* of this value is 2t*. 1button* is 0 for the left button, 1 for the middle button, or 2 for the right button. 1n-clicks* is one less than the number of clicks (1 for a double click, 0 normally).

=Node: The Character Set
=Text:
3THE CHARACTER SET*

Here are the numerical values of the characters in the Zetalisp character set. It should never be necessary for a user or a source program to know these values. Indeed, they are likely to be changed in the future. There are symbolic names for all characters; see the section on character names, below.

It is worth pointing out that the Zetalisp character set is different from the ASCII character set. File servers operating on hosts that use ASCII for storing text files automatically perform character set conversion when text files are read or written. The details of the mapping are explained in 4(FILEACCESS-4)File Servers*.

3 The Lisp Machine Character Set*
3 (all numbers in octal)*

3000 center-dot*
3001 down arrow                 041 !   101 A   141 a*
3002 alpha                      042 "   102 B   142 b*
3003 beta                       043 #   103 C   143 c*
3004 and-sign                   044 $   104 D   144 d*
3005 not-sign                   045 %   105 E   145 e*
3006 epsilon                    046 &   106 F   146 f*
3007 pi                         047 '   107 G   147 g*
3010 lambda                     050 (   110 H   150 h*
3011 gamma                      051 )   111 I   151 i*
3012 delta                      052 *   112 J   152 j*
3013 uparrow                    053 +   113 K   153 k*
3014 plus-minus                 054 ,   114 L   154 l*
3015 circle-plus                055 -   115 M   155 m*
3016 infinity                   056 .   116 N   156 n*
3017 partial delta              057 /   117 O   157 o*
3020 left horseshoe             060 0   120 P   160 p*
3021 right horseshoe            061 1   121 Q   161 q*
3022 up horseshoe               062 2   122 R   162 r*
3023 down horseshoe             063 3   123 S   163 s*
3024 universal quantifier       064 4   124 T   164 t*
3025 existential quantifier     065 5   125 U   165 u*
3026 circle-X                   066 6   126 V   166 v*
3027 double-arrow               067 7   127 W   167 w*
3030 left arrow                 070 8   130 X   170 x*
3031 right arrow                071 9   131 Y   171 y*
3032 not-equals                 072 :   132 Z   172 z*
3033 diamond (altmode)          073 ;   133 [   173 {*
3034 less-or-equal              074 <   134 \   174 |*
3035 greater-or-equal           075 =   135 ]   175 }*
3036 equivalence                076 >   136 ^   176 ~*
3037 or                         077 ?   137 _   177*

3200 Null character     210 Overstrike    220 Stop-output   230 Roman-iv*
3201 Break              211 Tab           221 Abort         231 Hand-up*
3202 Clear              212 Line          222 Resume        232 Hand-down*
3203 Call               213 Delete        223 Status        233 Hand-left*
3204 Terminal escape    214 Page          224 End           234 Hand-right*
3205 Macro/backnext     215 Return        225 Roman-i       235 System*
3206 Help               216 Quote         226 Roman-ii      236 Network*
3207 Rubout             217 Hold-output   227 Roman-iii*
3237-377 reserved for the future*

=Node: Classifying Characters
=Text:
3CLASSIFYING CHARACTERS*

3string-char-p* 1char*
2t* if 1char* is a character that can be stored in a string. On the Lisp Machine, this is true if the font and modifier bits of 1char* are zero.

3standard-char-p* 1char*
2t* if 1char* is a standard Common Lisp character: any of the 95 ASCII printing characters (including 2Space*), and the 2Return* character. Thus 2(standard-char-p #\end)* is 2nil*.

3graphic-char-p* 1char*
2t* if 1char* is a graphic character, that is, one which has a printed shape. 2A*, 2-* and 2Space* are all graphic characters; 2Return*, 2End* and 2Abort* are not. A character whose modifier bits are nonzero is never graphic.

Ordinary output to windows prints graphic characters using the current font.
Nongraphic characters are printed using lozenges unless they have special formatting meanings (as 2Return* does).

3alpha-char-p* 1char*
2t* if 1char* is a letter with zero modifier bits.

3digit-char-p* 1char* &optional 1(radix* 110.)*
If 1char* is a digit available in the specified radix, returns the 1weight* of that digit. Otherwise, it returns 2nil*. If the modifier bits of 1char* are nonzero, the value is always 2nil*. (It would be more useful to ignore the modifier bits, but this decision provides Common Lisp with a foolish consistency.)

Examples:

3(digit-char-p #\8 8) => nil*
3(digit-char-p #\8 9) => 8*
3(digit-char-p #\F 16.) => 15.*
3(digit-char-p #\c-8 1anything*) => nil*

3alphanumericp* 1char*
2t* if 1char* is a letter or a digit 0 through 9, with zero modifier bits.

=Node: Comparing Characters
=Text:
3COMPARING CHARACTERS*

3char-equal* &rest 1chars*
This is the primitive for comparing characters for equality; many of the string functions call it. The arguments may be fixnums or character objects indiscriminately. The result is 2t* if the characters are equal ignoring case, font and modifier bits, otherwise 2nil*.

3char-not-equal* &rest 1chars*
2t* if the arguments are all different as characters, ignoring case, font and modifier bits.

3char-lessp* &rest 1chars*
3char-greaterp* &rest 1chars*
3char-not-lessp* &rest 1chars*
3char-not-greaterp* &rest 1chars*
Ordered comparison of characters, ignoring case, font and modifier bits. These are the primitives for comparing characters for order; many of the string functions call them. The arguments may be fixnums or character objects. For 2char-lessp*, the result is 2t* if the arguments are in strictly increasing order; 2char-greaterp*, 2char-not-lessp* and 2char-not-greaterp* test for strictly decreasing, nonincreasing and nondecreasing order, respectively. Details of the ordering of characters are in 4(CHARSTR-1)Components of a Character*.
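The case-insensitive comparison functions described above can be tried out with a few calls. The following is a minimal sketch in Common Lisp syntax using character objects; the function name 2demo-char-comparisons* is hypothetical and exists only to collect the results in one list.

```lisp
;; Hypothetical demonstration function: collects the results of the
;; case-insensitive comparison functions on a few sample characters.
(defun demo-char-comparisons ()
  (list (char-equal #\a #\A)              ; same letter, case ignored
        (char-not-equal #\a #\b #\A)      ; #\a and #\A count as equal
        (char-lessp #\a #\B #\c)          ; strictly increasing
        (char-greaterp #\c #\B #\a)       ; strictly decreasing
        (char-not-lessp #\c #\C #\b)      ; nonincreasing, equal neighbors allowed
        (char-not-greaterp #\a #\A #\b))) ; nondecreasing, equal neighbors allowed
```

Every element of the resulting list is true except the second, which is 2nil* because 2#\a* and 2#\A* are the same character once case is ignored.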
3char=* 1char1* &rest 1chars*
3char//=* 1char1* &rest 1chars*
3char>* 1char1* &rest 1chars*
3char<* 1char1* &rest 1chars*
3char>=* 1char1* &rest 1chars*
3char<=* 1char1* &rest 1chars*
These are the Common Lisp functions for comparing characters; they include the case, font and modifier bits in the comparison. On the Lisp Machine they are synonyms for the numeric comparison functions 2=*, 2>*, etc. Note that in Common Lisp syntax you would write 2char/=*, not 2char//=*.

=Node: Character Names
=Text:
3CHARACTER NAMES*

Characters can sometimes be referred to by long names, as for example in the 2#\* construct in Lisp programs. Every basic character (zero modifier bits) which is not a graphic character has one or more standard names. Some graphic characters have standard names too. When a non-graphic character is output to a window, it appears as a lozenge containing the character's standard name.

3char-name* 1char*
Returns the standard name (or one of the standard names) of 1char*, or 2nil* if there is none. The name is returned as a string. 2(char-name #\space)* is the string 2"SPACE"*. If 1char* has nonzero modifier bits, the value is 2nil*. Compound names such as 2Control-X* are not constructed by this function.

3name-char* 1name*
Returns (as a character object) the character for which 1name* is a name, or returns 2nil* if 1name* is not a recognized character name. 1name* may be a symbol or a string. Compound names such as 2Control-X* are not recognized. 2read* uses this function to process the 2#\* construct when a character name is encountered.

The following are the recognized special character names, in alphabetical order except that synonyms are listed together. Character names are encoded and decoded by the functions 2char-name* and 2name-char* (4(CHARSTR-1)Character Names*).

First, a list of the special function keys.
2abort*
2break*
2call*
2clear-input, clear*
2delete*
2end*
2hand-down*
2hand-left*
2hand-right*
2hand-up*
2help*
2hold-output*
2line, lf*
2macro, back-next*
2network*
2overstrike, backspace, bs*
2page, form, clear-screen*
2quote*
2resume*
2return, cr*
2roman-i*
2roman-ii*
2roman-iii*
2roman-iv*
2rubout*
2space, sp*
2status*
2stop-output*
2system*
2tab*
2terminal, esc*

These are printing characters that also have special names because they may be hard to type on the hosts that are used as file servers.

2altmode* 2circle-plus* 2delta* 2gamma* 2integral* 2lambda* 2plus-minus* 2uparrow*

2center-dot* 2down-arrow* 2alpha* 2beta*
2and-sign* 2not-sign* 2epsilon* 2pi*
2lambda* 2gamma* 2delta* 2up-arrow*
2plus-minus* 2circle-plus* 2infinity* 2partial-delta*
2left-horseshoe* 2right-horseshoe* 2up-horseshoe* 2down-horseshoe*
2universal-quantifier* 2existential-quantifier* 2circle-x* 2double-arrow*
2left-arrow* 2right-arrow* 2not-equal* 2altmode*
2less-or-equal* 2greater-or-equal* 2equivalence* 2or-sign*

The following names are for special characters sometimes used to represent single and double mouse clicks. The buttons can be called either 2l*, 2m*, 2r* or 21*, 22*, 23* depending on stylistic preference.

2mouse-l-1 or mouse-1-1*
2mouse-l-2 or mouse-1-2*
2mouse-m-1 or mouse-2-1*
2mouse-m-2 or mouse-2-2*
2mouse-r-1 or mouse-3-1*
2mouse-r-2 or mouse-3-2*
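As a small illustration of how 2char-name* and 2name-char* relate to the names listed above, the following sketch checks that a character's standard name maps back to the same character. The function name 2name-round-trip-p* is hypothetical, and the example assumes Common Lisp syntax with character objects.

```lisp
;; Hypothetical helper: true if CHAR has a standard name and NAME-CHAR
;; maps that name back to CHAR.
(defun name-round-trip-p (char)
  (let ((name (char-name char)))   ; NIL if CHAR has no standard name
    (and name
         (eql (name-char name) char))))
```

For instance, 2(name-round-trip-p #\space)* is true, while 2name-char* applied to a string that is not a recognized character name returns 2nil*.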