
								01/02/91

			   BIT - Test Descriptions


   This document briefly describes the major test scripts and processes
which make up BIT.  For more details, see the respective tests themselves,
most of which are Bourne Shell scripts.

   BIT is MIPS Manufacturing's Online Burn-In Test Program.  It is useful
for verifying that system hardware remains functional over a period of
time.  MIPS uses this program to monitor, exercise, and verify hardware
operation before, during, and after system burn-in.  

   The BIT test suite consists of Bourne shell scripts, data files, and
programs written in C and Fortran.  BIT runs on top of UNIX and uses
standard UNIX utilities.


BIT:

The BIT script is used to start the BIT program.  It begins by querying the
user for the tests to run (CONFIG.BIT).  It then initializes and starts
each of the tests selected (*.BIT).  Finally, it starts the monitor program
(watch.BIT) which displays test results until the user aborts the program.


CONFIG.BIT, DATA.BIT:

BIT begins by verifying that the user is logged in as 'bit'.  It then tests
for the existence of the file DATA.BIT which is the saved configuration
information.  If the file does not exist, or if the user decides to alter
the configuration information, the shell script CONFIG.BIT is run.
CONFIG.BIT displays the current configuration information and allows the
user to either run the same configuration or enter a new one.  If the user
decides to modify the configuration, the user is prompted for each of the
various test parameters.

Once the user approves the system test configuration, the BIT script saves
the information in the file DATA.BIT.  This file is then parsed for the
relevant test information.  The BIT script then creates processes to
execute each of the user-selected tests.  Where required, it performs
initialization (e.g., creating files for the disk tests).
Finally, BIT executes the watch.BIT script.
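
The parse-and-start step can be sketched as follows.  This is only an
illustration: the layout of DATA.BIT is not described in this document, so
the sketch assumes a hypothetical layout of one "name=yes|no" line per
test.  The function names selected_tests and start_tests are also
hypothetical.

```shell
# Hypothetical DATA.BIT layout: one "name=yes" or "name=no" line per test.
# The real file format may be entirely different.

selected_tests() {
    # $1 = configuration file; prints the names of the tests marked "yes"
    while IFS='=' read -r name value; do
        [ "$value" = "yes" ] && echo "$name"
    done < "$1"
    return 0
}

start_tests() {
    # $1 = configuration file; starts <name>.BIT in the background
    # for each selected test
    for t in $(selected_tests "$1"); do
        ./"$t".BIT &
    done
}
```

With a file containing "cart=yes", "disk=no", and "mem=yes",
selected_tests prints cart and mem, one per line.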


watch.BIT:

The watch.BIT script endlessly monitors the test results of each of
the active tests.  On each pass, this script checks for errors by
calling check.BIT on each active test.  A countdown is displayed on
the monitor as each test is checked.  When all tests have been
checked, a summary of the test results is displayed on the monitor.
When the user enters "CTRL-C", this script calls clean.BIT to kill all
active tests, and then exits.
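
A minimal sketch of such a monitor loop, written for a POSIX shell, is
given here.  The functions check_one and clean_up are hypothetical
stand-ins for check.BIT and clean.BIT, and the WATCH_DELAY variable is an
assumption; the real script's display format and timing are not described
in this document.

```shell
# Hypothetical stand-ins for check.BIT and clean.BIT.
check_one() { echo "$1: ok"; }
clean_up()  { echo "cleaning up"; }

watch_pass() {
    # $@ = names of the active tests.  Prints a countdown line as each
    # test is checked, then a summary with one status line per test.
    remaining=$#
    summary=""
    for t in "$@"; do
        printf 'checking %s (%d left)\n' "$t" "$remaining"
        remaining=$((remaining - 1))
        summary="$summary$(check_one "$t")
"
    done
    printf '%s' "$summary"
}

watch_loop() {
    # CTRL-C (SIGINT) triggers the cleanup and ends the monitor.
    trap 'clean_up; exit 0' INT
    while :; do
        watch_pass "$@"
        sleep "${WATCH_DELAY:-10}"
    done
}
```

Calling 'watch_loop cart disk mem' would repeat the countdown and summary
until interrupted.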


check.BIT, grab.BIT:

These scripts monitor the PID.ERR* and PID.PASS* files and generate
the status information displayed by watch.BIT.  If there are errors,
the most recent error is displayed.  If there are no errors, the
number of successful passes is displayed.  If an error file becomes
excessively large, the corresponding test process is killed and the
kill is recorded.
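
The status logic can be sketched as follows.  The sketch assumes that an
error file holds one error per line and that a pass file holds the pass
count as a single number; the actual file layouts are not given here.  The
names test_status, err_too_large, and MAX_ERR_BYTES are hypothetical.

```shell
# Assumed layouts: PID.ERR<test> has one error per line; PID.PASS<test>
# holds the pass count as a single number.  Both are assumptions.
MAX_ERR_BYTES=${MAX_ERR_BYTES:-100000}   # hypothetical size limit

test_status() {
    # $1 = file prefix (e.g. "1234"), $2 = test suffix (e.g. "cart")
    err="$1.ERR$2"
    pass="$1.PASS$2"
    if [ -s "$err" ]; then
        echo "ERROR: $(tail -n 1 "$err")"    # most recent error
    elif [ -f "$pass" ]; then
        echo "PASS $(cat "$pass")"
    else
        echo "ON PASS 0"
    fi
}

err_too_large() {
    # $1 = error file; succeeds when the file exceeds the size limit
    [ -f "$1" ] && [ $(wc -c < "$1") -gt "$MAX_ERR_BYTES" ]
}
```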


clean.BIT:

This script is called from watch.BIT when the user enters CTRL-C.  It
kills all tests that are running.  It appends all ERR and SAV files
onto the file REPORT.BIT using the shell script STATUS.  Then it
removes all temporary files and directories that were created during
initialization in BIT.  Only one clean.BIT process is allowed to run
at one time; multiple invocations are prevented by the lock file,
LOCK.BIT.clean.
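
One common way to build such a lock in a POSIX shell is the noclobber
option, sketched here.  Whether clean.BIT uses this technique is not
stated; the function name acquire_lock is hypothetical, and how the lock
is later released is not covered.

```shell
LOCK=${LOCK:-LOCK.BIT.clean}

acquire_lock() {
    # With noclobber (set -C), ">" fails if the file already exists,
    # so only the first caller can create the lock file.
    ( set -C; echo "$$" > "$LOCK" ) 2>/dev/null
}
```

A caller would run 'acquire_lock || exit 1' before doing any cleanup work.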

When this script finishes, the machine should be back in the same
state as immediately after the BIT tests were first loaded, except for
the files DATA.BIT, REPORT.BIT, LOG.BIT, and LOCK.BIT.clean.  The BIT
test can be restarted at this point.

The tests available are as follows:


cart.BIT:

This is a shell script that exercises the cartridge tape drive.  It
runs in an endless loop until killed by clean.BIT.  The test is comprised
of nine phases: initialize, make directories, copy files, backup files
to tape, re-initialize, restore files, compare files, generate
checksums, and compare checksums.  The cartridge tape is accessed through
the ctape0 and ctape4 device drivers which normally control scsi device
number 6.

Prior to entering the test loop, an error file, PID.ERRcart, is created,
where PID is the process id of this test.  Each loop pass begins by
recursively removing all previously created temporary files and
directories.  It next copies an arbitrary file, '/bin/find' or '/bin/who',
to PID.data.  Seven successive mkdir's are performed to create seven
successively lower directory levels.  After two sync's, the PID.data file
is successively copied to each of the seven directories.

A 'tar cfb' command is then run to recursively copy all directories
and data files to the streamer tape.  If any errors are generated,
they are appended to the PID.ERRcart file along with the current time
and pass count, and the test loop is restarted.

The temporary file, PID.data, and all temporary directories and data
files are recursively removed.  After syncing the disk, the file
PID.data is again created.  Then, a 'tar xfb' command is run to extract
the files and directories previously written to tape.  The PID.data
file is compared with each of the seven copies extracted from the tape
using both 'diff' and 'sum'.  Any differences are recorded in
PID.ERRcart.  The test then sleeps for a number of seconds specified
in the CONFIGURATION file.
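
One pass of this sequence can be sketched in a POSIX shell as follows.
The sketch makes several assumptions: an ordinary archive file stands in
for the tape device, 'cmp' is used in place of 'diff' and 'sum', and the
directory names and the function names build_tree, verify_tree, and
cart_pass are hypothetical.

```shell
build_tree() {
    # $1 = data file, $2 = top directory.
    # Creates seven nested directory levels with one copy in each.
    dir="$2"
    level=1
    while [ "$level" -le 7 ]; do
        dir="$dir/d$level"
        mkdir -p "$dir" || return 1
        cp "$1" "$dir/data" || return 1
        level=$((level + 1))
    done
}

verify_tree() {
    # $1 = data file, $2 = top directory, $3 = error file.
    # Appends a line to the error file for every copy that differs.
    find "$2" -type f -name data | while read -r copy; do
        cmp -s "$1" "$copy" || echo "$(date): mismatch in $copy" >> "$3"
    done
}

cart_pass() {
    # $1 = data file, $2 = work directory,
    # $3 = archive (absolute path; stands in for the tape), $4 = error file
    rm -rf "$2/tree"
    build_tree "$1" "$2/tree" || return 1
    ( cd "$2" && tar cf "$3" tree ) 2>> "$4" || return 1   # backup
    rm -rf "$2/tree"
    ( cd "$2" && tar xf "$3" ) 2>> "$4" || return 1        # restore
    verify_tree "$1" "$2/tree" "$4"
}
```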

The cart.BIT script can be run by itself by entering 'cart.BIT &'.
Results are monitored by reading PID.ERRcart and PID.PASScart.


comp.BIT:

This script compiles an arbitrarily chosen C source file, test.c, found in
the src.d directory, and saves the result.  It then goes into a loop of
compiling test.c and comparing the image against the saved original.
Multiple comp.BIT scripts can be run at once, each using its own copy of
the source program; the BIT script can create these multiple invocations.
If the object code produced doesn't match the reference object code file,
an error is appended to the PID.ERRcomp error file.

The comp.BIT script can be run by itself by entering 'comp.BIT
<source> p &' where <source> is the name of a 'C' source file without
a '.c' appended.  Results can be monitored by reading PID.ERRcomp and
PID.PASScomp.


disk.BIT:

This script runs a disk test against the file specified in CONFIGURATION.
This file can be a raw disk partition, a regular file, or a remotely
mounted (NFS) file.  For a raw partition, the entire partition is tested.
For a regular file, 90% of the available disk space on the partition on
which the file resides is tested.  For a remote file,
DEFAULT_DISK_SIZE is tested if NFS_DISK_SIZING is FIXED; otherwise, 90%
of the available space is tested, as for a regular file.  A regular file
is deleted before testing starts and again afterwards.
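
The sizing rules for regular and remote files can be sketched in a POSIX
shell as follows.  The function names are hypothetical, and the sketch
assumes that DEFAULT_DISK_SIZE is expressed in kilobytes; the units used
by the real script are not stated here.

```shell
usable_kb() {
    # $1 = available space in KB; prints 90% of it (integer arithmetic)
    echo $(( $1 * 90 / 100 ))
}

avail_kb() {
    # $1 = path; prints the available KB on its file system.
    # "df -P" selects the POSIX output layout, in which the fourth
    # column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

test_size_kb() {
    # $1 = file kind ("regular" or "nfs"), $2 = available KB
    if [ "$1" = "nfs" ] && [ "$NFS_DISK_SIZING" = "FIXED" ]; then
        echo "$DEFAULT_DISK_SIZE"      # assumed to be in KB
    else
        usable_kb "$2"
    fi
}
```

For example, 'test_size_kb regular "$(avail_kb /tmp)"' prints the number
of kilobytes that would be tested for a regular file in /tmp.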

This script is used during the burn-in test to test the specified disk in a
multi-disk system running UNIX.

If this test detects an error, then there is something seriously wrong
with the disk.  A much more common result will be soft errors logged
to the system console_log.  These soft errors should be evaluated to
determine whether they indicate bad blocks that need to be mapped out.

Each pass of this test constitutes one full pass across the tested
portion of the disk.  The first 8 passes are all different, then the
sequence is repeated.  So, one complete pass indicates that the 
entire tested portion of the disk has been accessed, while 8 passes 
means the disk has been hit with everything we've got (but takes a while).

If a test fails, that entire test is repeated once on the next pass.

The 8 different tests are as follows:

 Pass 1:  Test 1: Sequential Write               of DISK-SPECIFIC data.
 Pass 2:  Test 2: Sequential Read                of DISK-SPECIFIC data.
 Pass 3:  Test 3: Sequential Read/Random Seek    of DISK-SPECIFIC data.
 Pass 4:  Test 4: Random Seek                    
 Pass 5:  Test 5: Butterfly Write/Random Seek    of DISK-SPECIFIC data.
 Pass 6:  Test 6: Butterfly Read/Butterfly Seek  of DISK-SPECIFIC data.
 Pass 7:  Test 7: Random Write/Butterfly Seek    of RANDOM data.
 Pass 8:  Test 8: Random Read/Random Seek        of RANDOM data.
 Pass 9:  Test 1  ...
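
When no test fails, the test run on a given pass follows directly from
the pass number, as this small POSIX shell sketch shows.  The function
name test_for_pass is hypothetical, and the repeat-on-failure rule is not
modeled.

```shell
test_for_pass() {
    # $1 = pass number (1, 2, 3, ...); prints the test number, 1 to 8.
    # Assumes no failures; a repeated test would shift the sequence.
    echo $(( ($1 - 1) % 8 + 1 ))
}
```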

The first pass of sequential writes creates the entire file.  Under the
UNIX file system, the entire file won't be created until it is all accessed.
Writes and reads are split up so that reads actually come from the disk
rather than from buffers.  The goal was the most comprehensive mixture in
the minimum passes.


exabyte.BIT:

This script exercises the Exabyte tape drive.  It runs in an endless loop
until killed by clean.BIT.  The test is comprised of nine phases:
initialize, make directories, copy files, backup files to tape,
re-initialize, restore files, compare files, generate checksums, and
compare checksums.  The Exabyte tape drive is accessed through the hc0 and
hc4 device drivers which normally control scsi device number 4.  

This script differs from cart.BIT and nine.BIT in that the order is changed
to prevent pauses in the middle of the tape, which tend to wear it out.
It also transfers more data to better test the Exabyte.  As a result,
this test alone requires 10MB to run.


fp.BIT:

This script runs various Fortran programs to test floating point arithmetic.
The test suite contains the following tests: 
	whetstone
	alogt asint atant dasint datant dexpt dlogt dpowert dsinht dsint
	dsqrtt dtanht dtant expt powert sinht sint sqrtt tanht tant parc
	pi1 pi2 pi3
	spice-benchmark spice-bipole spice-digsr spice-toronto
	doducd-small doducd-big
	linpackd linpacks
There are two versions of the test, SHORT and LONG.  The LONG test includes
all tests while SHORT omits the following:
	pi2 pi3 spice-bipole spice-digsr spice-toronto doducd-small doducd-big

The tests consist of running each benchmark and comparing the results against
a reference file containing the correct results.  On failure, the differences
are displayed.  The failure results are limited to 30 lines per failure.
In addition, the first test, whet, is timed.
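
The compare-and-truncate step can be sketched as follows.  The function
name compare_result is hypothetical, and the use of 'diff' piped through
'head' is an assumption about how the 30-line limit might be applied.

```shell
compare_result() {
    # $1 = reference file, $2 = actual benchmark output.
    # Prints at most 30 lines of differences; returns 1 on a mismatch.
    if cmp -s "$1" "$2"; then
        return 0
    fi
    diff "$1" "$2" | head -n 30
    return 1
}
```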

Some versions of BIT include a second set of fp tests which test the MIPS2
instruction set.  For each of the regular FP tests, there is a second
version whose name ends in '-mips2'.  Only systems using the R6010 FP chip
require these tests.  On smaller systems, these images may be omitted in
order to conserve space.

This test can be run individually by entering 'fp.BIT {short|long} [mips2]' 
and monitoring the *fp files.


floppy.BIT:

This script exercises the floppy disk drive.  It runs in an endless loop
until killed by clean.BIT.  The test is comprised of six phases:
format, setup (make filesystem and mount), copy and compare files to
the block device, backup files to the raw device, restore backup files,
and compare backup files.  The floppy disk driver is accessed through
the fd0t01c block and raw device drivers.


graphic.BIT:

This script uses Xlib routines to exercise the memory in a color graphics
display.  It runs in an endless loop, changing the color of the test on
each pass, until killed by clean.BIT.
After creating and mapping a simple window, the test fills the window
and then tests each pixel in the bitmap for the proper color.  The
display is closed after each pass.

This test can be run individually by entering
'graphic.BIT [-root] [-server <name>] [-delay <seconds>] &'.


mport.BIT:

This C program opens the tty ports on a system and then generates
random strings which it writes to and reads from the ports through
jumpers installed by the operator.


net.BIT:

This script finds another host on the local network with a specified
prefix and generates ethernet traffic to it using the 'ping' command.
The systems in burn-in are assumed to have a common prefix such as
'mcs'.  To distribute the network load, the systems access other hosts
in a round-robin fashion.  If all else fails, a centrally designated
host (as defined in CONFIGURATION) is accessed.
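
The round-robin selection with a fallback can be sketched in a POSIX
shell as follows.  The function name pick_host and the FALLBACK_HOST
variable are hypothetical, and how the real script discovers hosts with
the common prefix is not shown.

```shell
pick_host() {
    # $1 = pass number (0, 1, 2, ...); remaining arguments = candidate
    # hosts.  Prints the host to use on this pass, or FALLBACK_HOST
    # (the centrally designated host) when no candidates were found.
    pass=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "$FALLBACK_HOST"
        return
    fi
    shift $(( pass % $# ))     # rotate to this pass's host
    echo "$1"
}
```

For example, 'pick_host 4 mcs1 mcs2 mcs3' prints mcs2.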


nine.BIT:

This script exercises the nine-track tape drive.  It runs in an endless
loop until killed by clean.BIT.  The test is comprised of nine phases:
initialize, make directories, copy files, backup files to tape,
re-initialize, restore files, compare files, generate checksums, and
compare checksums.  The nine-track tape is accessed through the h0 and h4
device drivers which normally control scsi device number 5.  This test is
similar to cart.BIT except for the device driver being used.


port.BIT:

The serial port exerciser is compiled from 'C' source code.  It
requires loop-back connectors from port 0 to 1, port 2 to 3, etc.  For
a given 'loop-backed' port pair, the exerciser sends random
characters from one port to the other and vice-versa.  Baud rates used
are 1200, 2400, 4800, 9600, and 19200.  The test runs in an endless
loop sequencing through the port pairs until killed.  Any errors are
appended to the file, PID.ERRport.
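
The pairing and sequencing can be illustrated with a short POSIX shell
sketch.  The real exerciser is a C program; the order in which it steps
through pairs, directions, and baud rates is an assumption, and the
function names port_pairs and schedule are hypothetical.

```shell
BAUD_RATES="1200 2400 4800 9600 19200"

port_pairs() {
    # $1 = number of ports to test; prints "a b" for each loop-back
    # pair: 0 1, 2 3, and so on.
    port=0
    while [ $((port + 1)) -lt "$1" ]; do
        echo "$port $((port + 1))"
        port=$((port + 2))
    done
}

schedule() {
    # $1 = number of ports; prints one "tx rx baud" line for each
    # pair, direction, and baud rate.
    port_pairs "$1" | while read -r a b; do
        for baud in $BAUD_RATES; do
            echo "$a $b $baud"
            echo "$b $a $baud"
        done
    done
}
```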

This test can be run by itself by entering 'port.BIT 16 16 &'.  The
first argument is the number of ports to test and the second argument
is the number of ports that the ISI/CP board being tested has (either
8 or 16).  Debug information will be displayed continuously on the
screen, indicating the current transmit port, receive port, baud rate,
and number of characters transmitted and received.


purge.BIT:

This script runs in parallel with the disk.BIT test.  It creates a 
large file (8 copies of /unix) which it then continuously reads.
This prevents the files in disk.BIT from becoming resident in
memory and making the test useless.


sysmsg.BIT:

This script monitors the system error log for messages.  It runs in
an endless loop until killed by clean.BIT.  The files to monitor
are passed to the script as parameters.  On the first pass, the script
makes a copy of each specified file.  On each successive pass, the files
are compared against the copies and any differences are reported.
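
The snapshot-and-compare cycle can be sketched as follows.  The function
name check_logs and the snapshot directory are hypothetical.  The sketch
leaves the snapshot untouched after reporting a difference; whether the
real script refreshes its copies is not stated.

```shell
check_logs() {
    # $1 = snapshot directory; remaining arguments = log files.
    # Prints any differences found; returns 1 if a file has changed.
    snap=$1
    shift
    changed=0
    for log in "$@"; do
        copy="$snap/$(basename "$log")"
        if [ ! -f "$copy" ]; then
            cp "$log" "$copy"          # first pass: take the snapshot
        elif ! cmp -s "$copy" "$log"; then
            diff "$copy" "$log"        # later passes: report changes
            changed=1
        fi
    done
    return "$changed"
}
```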

This test also monitors the kernel variable, cpe_count.  This variable
gives the count of cache parity errors since UNIX was booted.  sysmsg.BIT
checks this count every 5 to 10 minutes and verifies that the rate of
cache parity errors does not exceed a prescribed limit.  Cache parity
errors are soft errors and have an insignificant effect on performance;
however, they may be a warning sign of other problems.

This test can be run individually by entering 
'sysmsg.BIT <delay> <logfile1> [<logfile2> ...] &'.


mem.BIT: (previously named trap.BIT)

This test is a compiled 'C' program.  It mallocs a chunk of memory and
performs a moving inversions memory test on it.  This test is usually
started last so that other processes can grab whatever memory they need.

If the number of bytes is specified on the command line, then mem.BIT
attempts to malloc that amount.  Otherwise, the current available free 
memory will be malloced.  If the malloc call fails, often because of
allocation limits within the operating system, then the test retries
for less memory (about 1MB less each time) until the malloc succeeds.
The minimum amount of memory that will be tested is about 1MB.
During this initialization phase, watch.BIT will indicate that the mem
test is "ON PASS 0".  The amount of memory malloc'd is output in the file
PID.OPTmem and displayed by watch.BIT.

The test first clears the memory, then performs a 
forward moving inversions memory test followed by a reverse moving 
inversions test.  On an error, the address, expected, and actual data values
are appended to the file PID.ERRmem.  Finally, the malloc'ed memory is free'd
and the loop is repeated.  After each complete pass, the pass count
is incremented by one.  If the amount of memory under test is large
(>= 10MB), then on passes 1 through 10, a letter is appended to the 
pass count to reflect each complete pass through memory.  This allows
the user to see that the memory test is progressing without having
to wait for a full pass.

Systems with large memories are tested by running several mem.BIT
tests.  The amount of free memory is determined by calling 'mem.BIT FREE'.
Based on this amount, one mem.BIT process is started for each 32MB of 
memory available, plus one extra test to cover the remainder.
This allocation algorithm is somewhat non-deterministic due to variations
in "free memory".  For instance, if a large application is run on a system
prior to running BIT, it may remain in memory long enough to fool BIT
into testing less memory.  However, these allocation variations only
tend to occur on large memory configurations where a 10MB-20MB variation
is not a problem.
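
The process count implied by this rule can be worked out with a short
POSIX shell sketch.  The function name mem_process_count is hypothetical,
and the sketch assumes that no extra process is started when free memory
is an exact multiple of 32MB.

```shell
mem_process_count() {
    # $1 = free memory in MB; prints the number of mem.BIT processes.
    full=$(( $1 / 32 ))        # one process per full 32MB
    rest=$(( $1 % 32 ))        # leftover covered by one extra process
    if [ "$rest" -gt 0 ]; then
        echo $(( full + 1 ))
    else
        echo "$full"
    fi
}
```

With 100MB free, this gives three 32MB processes plus one for the
remaining 4MB, four in total.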

This test can be run individually by entering 'mem.BIT &' or
'mem.BIT <bytes> &'.  The current amount of free memory can be determined 
by entering the command 'mem.BIT FREE'.
