At this point an extended sample shell script is supplied: see ``Writing a readability analysis program: an example''. This example explains how to do the following:
The syntax described in this chapter is common to the Bourne shell
and the Korn shell; for C shell syntax, refer to
csh(C)
in the Operating System User's Reference.
Creating a shell script
A shell script is a text file containing a sequence of shell
commands. The commands are normally entered on separate lines, for
readability, but can be separated by semicolons (;).
To create a shell script, create a new file with a text editor (for
example vi) and type SCO OpenServer commands into it. Save
the file, and use chmod to set the executable
permission bit on the file so the shell can run it. For example:
vi is.logged.in
Enter the following text:
who | grep fred
Save the file, and issue the following command:
chmod +x is.logged.in
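This adds execute permission to is.logged.in. chmod +x sets the execute bit for the owner, and also for the group and others unless your umask masks it out. As the creator of the file, you already have permission to read and write it.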
If the current working directory is included in your search path ($PATH), you can execute the file simply by typing its name:
$ is.logged.in
fred console Aug 13 11:28
If the file is not held in a PATH directory, an alternate
notation is required:
$ ./is.logged.in
fred console Aug 13 11:28
The notation ``./'' has the same effect as typing the directory's
absolute pathname.
The program runs the command who, which lists the users currently logged in to the system. It then uses grep to search that list for a line containing fred. If grep finds such a line, fred is logged in.
Suppose you want to use the script to see if people other than fred are logged in. You can modify the script as follows:
who | grep $1
The positional parameter $1 (used in place of fred) refers to the first word on the command line after the name of the script. Wherever $1 appears in the script, the shell substitutes the first argument entered on the command line. You use it like this:
$ ./is.logged.in fred
fred console Aug 13 11:28
$ ./is.logged.in mary
$
fred is logged in; mary is not logged in. This script gives no output if it cannot find the name you supply in the output from who.
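You can make a script run under a particular shell, whatever your login shell is. To do this, put a line of the form #!shellpath at the start of the script, where shellpath is the full pathname of the program that should interpret it. For example, if your login shell is the C shell but you want to write scripts for the Korn shell, the first line of your script should be as follows: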
#!/bin/ksh
This is a general mechanism: that is, shellpath does not
have to be a shell, but can be any binary program. For example,
awk(C)
is a small programming language used for textual analysis tasks (see
Chapter 13, ``Using awk''
for an introduction to using awk). awk scripts
could start with the following:
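#!/usr/bin/awk -f
(The system runs the program named on the #! line and appends the name of the script as its last argument. The -f flag therefore tells awk to read its program from the script file. If the -f flag is omitted, awk will exit with an error. See exec(M) for details of this mechanism.)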
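The following Bourne shell fragment is a sketch of this mechanism. It writes a two-line awk program whose first line names awk as its interpreter, makes the program executable, and runs it directly. The file name /tmp/countlines.awk is hypothetical. The fragment looks up awk's location with command -v (a POSIX feature that older shells may lack) instead of assuming /usr/bin/awk.

```shell
# Find where awk lives on this system instead of assuming /usr/bin/awk.
awkpath=`command -v awk`

# Write a tiny awk program: the #! line, then a rule that prints the line count.
printf '#!%s -f\n%s\n' "$awkpath" 'END { print NR }' > /tmp/countlines.awk
chmod +x /tmp/countlines.awk

# The system runs "awk -f /tmp/countlines.awk" on our behalf; this prints 3.
printf 'one\ntwo\nthree\n' | /tmp/countlines.awk
```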
When a file has several links, you must remove all of them before you can delete the file itself. It is therefore useful to be able to trace all the links to a given file. You can do this by finding the file's inode number and then searching for all the filenames that have that inode number.
To list the inode numbers of files, use the following command:
ls -i
This lists the inode number of each file in the current directory, followed by the filename associated with it:
$ ls -i
1125 0.order.number
2852 0.parts.index
5315 0.order.index
770 00.partno.err
4225 00.partno.out
$
For example, the inode number of 0.parts.index is 2852.
To find the inode of a given file, you could type something like the
following:
ls -i 0.parts.index | awk '{print $1}'
The first part of the pipeline lists the inode of the file called 0.parts.index. The output from this command is fed into awk, which prints the first field, that is, the inode number:
2852
However, printing the inode number of a file is not the same as using that number to do something useful. We need some way to capture the output of a command.
This can be done by storing the output in a variable, using the backquote notation
(recognized in the Bourne and Korn shells):
variable=`command`
command is executed, then its output is stored in variable. For example:
$ myinum=`ls -i 0.parts.index | awk '{print $1}'`
$ echo $myinum
2852
$
The C shell recognizes the corresponding notation:
set variable = `command`
Having obtained the inode number and stored it in a shell
variable, we can then use it in a find command. For
example:
find / -inum $myinum -print
The find option -inum tells find to look for files matching the inode number stored in the variable myinum. The option -print tells find to print the names of any files it matches. (find also prints an error message for each directory it cannot access.)
Now we can write a shell script that, given a filename, searches for all links that point to the same file:
myinum=`ls -i $1 | awk '{ print $1 }'`
find / -inum $myinum -print 2> /dev/null
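The same two lines can be wrapped in a function so that the search can be tried on a small directory tree first. This is a sketch only. The function name find_links and its optional second argument (the directory to search, defaulting to /) are hypothetical additions, not part of the original script.

```shell
# find_links FILE [DIR]
# Print every pathname under DIR whose inode number matches that of FILE.
# DIR defaults to /, which reproduces the behavior of the script above.
find_links()
{
    myinum=`ls -i "$1" | awk '{ print $1 }'`
    find "${2:-/}" -inum "$myinum" -print 2> /dev/null
}
```

For example, find_links 0.parts.index . searches only the current directory and its subdirectories, which is much quicker than searching from the root.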
In summary, the first line assigns the inode of the specified file
(here represented by the positional parameter $1) to the
variable myinum. Note that the second ``$1''
notation in this line is internal to awk, and refers to
the first field of the output piped from ls -i, and not
back to the specified filename.
The second line of the script invokes find, telling it to start at the root directory (/) and search through all the mounted filesystems for files matching the inode number found in the variable myinum, and print their names.
Note that an inode number is only unique within a given filesystem. It is possible for two files with the same inode number to exist on different filesystems, and not be linked together. It is therefore worth checking the output to make sure that all the files output by this script reside on the same filesystem.
find prints a message to the standard error if it cannot
look inside a directory. We do not want to see these error messages,
so the standard error output from find (output
stream 2) is redirected to the device /dev/null; an output
stream sent to this device is silently ignored. Consequently, the
error messages are discarded and a clear, uncluttered output is
produced. (Non-spurious errors are also indiscriminately
discarded. However, in this example all errors are probably
spurious, so discarding all messages is acceptable.)
Passing arguments to a shell script
Any shell script you run has access to (inherits) the environment
variables accessible to its parent shell. In addition, any arguments
you type after the script name on the shell command line are passed
to the script as a series of variables.
The following parameters are recognized:
For example, create the following shell script called mytest:
echo There are $# arguments to $0: $*
echo first argument: $1
echo second argument: $2
echo third argument: $3
echo here they are again: $@
When the file is executed, you will see something like the following:
$ mytest foo bar quux
There are 3 arguments to mytest: foo bar quux
first argument: foo
second argument: bar
third argument: quux
here they are again: foo bar quux
$# is expanded to the number of arguments to the script,
while $* and $@ contain the entire argument
list. Individual parameters are accessed via $0, which
contains the name of the script, and the positional parameters $1,
$2, $3, and so on, which contain the arguments to the script (from
left to right along the command line).
Although the output from $@ and $* appears to be the same, the two behave differently when enclosed in double quotes. "$*" expands to a single string containing all the arguments, whereas "$@" expands to each positional parameter as a separate word. To see this, add the following Korn shell function and commands to the end of mytest:
function how_many {
print "$# arguments were supplied."
}
how_many "$*"
how_many "$@"
The following appears when you run mytest:
$ mytest foo bar quux
There are 3 arguments to mytest: foo bar quux
first argument: foo
second argument: bar
third argument: quux
here they are again: foo bar quux
1 arguments were supplied.
3 arguments were supplied.
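The difference matters most when an argument itself contains a space. The following sketch uses Bourne-compatible function syntax instead of the Korn shell's function keyword. It uses set -- to stand in for real command-line arguments. The name count_args is hypothetical.

```shell
# count_args: print how many separate arguments were passed in.
count_args()
{
    echo $#
}

# Pretend the script was invoked as: mytest "two words" three
set -- "two words" three

count_args "$*"    # 1: every argument joined into a single word
count_args "$@"    # 2: each argument preserved as a separate word
count_args $*      # 3: unquoted, so "two words" is split again
```

For this reason, "$@" is usually the right choice when passing a script's arguments on to another command.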
expr evaluates an expression and prints the result, which can then be captured with backquotes. For example:
$ var=65
$ result=`expr $var \* 5`
$ echo $result
325
$
Note the backslash in front of the ``*'' symbol. In expr (and many other programs), * is the multiplication operator. The shell, however, treats it as a filename wildcard character and replaces it with a list of matching files unless it is escaped (see Chapter 12, ``Regular expressions'').
expr can also be used to manipulate variables containing text (strings). A portion of a text string can be extracted; for example:
$ expr substr bobsleigh 4 6
sleigh
$
The substr expression returns a substring of its first
parameter (``bobsleigh''). The substring starts at the character position
given by its second parameter (4, the letter ``s'') and has the
length given by its third parameter (6 characters).
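Not every version of expr supports substr. A more portable alternative, sketched below, is expr's : operator. This operator matches a string against a basic regular expression anchored at the start of the string. It prints the number of characters matched, or the text captured between \( and \) if the expression contains them. The examples reuse the string from above.

```shell
expr bobsleigh : '.*'             # whole string matched: prints its length, 9
expr bobsleigh : 'bob'            # prints 3, the number of characters matched
expr bobsleigh : 'bob\(.*\)'      # prints sleigh, the text captured by \( \)
expr bobsleigh : '...\(.\{6\}\)'  # skip 3 characters, capture the next 6: sleigh
```

The last line produces the same result as expr substr bobsleigh 4 6.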
There are many additional options to expr. In general, you
can use expr to search a string for a substring, extract
substrings, compare strings, and provide information about a
string. It can also perform basic arithmetic on integer numbers,
but not on real numbers. For calculations that require decimals or
fractions, you should use a calculator, like bc. (See
``Putting everything together''
for an example of using bc within a shell script.)
Performing arithmetic on variables in the Korn shell
The Korn shell can be told to perform arithmetic using
variables. Because this facility is built into the shell,
calculations run faster than they do with expr,
which is a separate program that must be forked and exec'ed (see
fork(S)
and
exec(S)).
Although variables are normally treated as strings of characters, the command typeset -i can be used to specify that a variable must be treated as an integer. For example, typeset -i MYVAR specifies that the variable MYVAR is an integer rather than a string. After the typeset command, attempts to assign a non-integer value to the variable will fail:
$ typeset -i MYVAR
$ MYVAR=56
$ echo $MYVAR
56
$ MYVAR=fred
ksh: fred: bad number
$
To carry out arithmetic operations on variables or within a shell script, use the let command. let evaluates its arguments as simple arithmetic expressions. For example:
$ let ans=$MYVAR+45
$ echo $ans
101
$
The expression above could also be written as follows:
$ echo $(($MYVAR+45))
101
$
Anything enclosed within $(( and )) is
interpreted by the Korn shell as being an arithmetic expression. It
is possible to include variables within such arithmetic expressions;
it is not necessary to prefix them with the usual dollar sign
although no error condition is caused if the dollar sign is used.
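POSIX also specifies $(( )) arithmetic expansion for sh, so the following sketch runs under the Korn shell and other POSIX shells. It deliberately avoids let and typeset -i, which are Korn shell features. The helper count_to is hypothetical. It shows the main benefit of built-in arithmetic: a loop counter that does not start an expr process on every pass.

```shell
# count_to N: print the integers from 1 to N, one per line.
# Each $(( )) is evaluated by the shell itself, so no expr process is started.
count_to()
{
    i=1
    while [ "$i" -le "$1" ]
    do
        echo $i
        i=$((i + 1))
    done
}

MYVAR=56
echo $((MYVAR + 45))    # prints 101; the $ before MYVAR is optional here
count_to 3
```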
If you need to carry out calculations on floating point numbers, it
is necessary to use the binary calculator, bc.
Sending a message to a terminal
There are several methods of producing output in a shell script. The
first, and simplest, is the echo command used in the last
example (see
``Passing arguments to a shell script'').
Note that the echo command exists in four separate forms. Originally, echo was a separate program, /bin/echo, but a version of it is now built into each of the three shells. There are subtle differences between them. The core functionality is the same (the command echo hello always prints the word ``hello''), but you should check any special options you use against the relevant shell manual pages. In addition, the Korn shell provides the print command, which is more versatile than echo but cannot be used under the Bourne shell.
Finally, a more sophisticated output mechanism is the printf command. This is similar to the printf statement built into awk and the printf library function used in C programs. See printf(C) for details.
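POSIX defines how printf interprets its format string, including escapes such as \n. This makes printf behave more consistently across shells and systems than echo. The following sketch prints aligned columns. The helper report_line is hypothetical.

```shell
# report_line NAME COUNT: print NAME left-justified in 10 columns, a space,
# then COUNT right-justified in 5 columns, followed by a newline.
report_line()
{
    printf '%-10s %5d\n' "$1" "$2"
}

report_line apples 3
report_line pears 12

# If there are more arguments than conversions, printf reuses the format.
printf '%s=%s\n' a 1 b 2
```

Unlike echo, printf never adds a newline of its own, so the \n in the format string is required.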
As far as the system is concerned, terminals are just a special type of
file. You send data to a terminal or read data from it just like any
other file.
The echo command
The echo command prints its argument list, separating each
argument with a space, and following the last argument with a
newline. For example:
$ echo Hi there!
Hi there!
$
Variables and file specifications are expanded by the shell before
being passed to echo. Consider the following command:
$ echo The available files are *
This prints the specified text string followed by a listing of all the files in the current working directory, across the screen.
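echo recognizes a number of escape sequences, which it expands internally. An escape sequence is a backslash followed by a character, and the pair stands for some other character. The escape sequences recognized by the System V echo are as follows:
\b    backspace
\c    print the line without a terminating newline
\f    form feed
\n    newline
\r    carriage return
\t    tab
\v    vertical tab
\\    backslash
\0n   the character whose octal value is n (one to three digits)
Because the shell itself treats the backslash as a quoting character, the escape sequence must be quoted so that echo receives the backslash: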
$ echo The available files are "\n" *
The available files are
aaaa bbbb cccc dddd eeee
Here, only the escape sequence is quoted. Alternatively, the whole
string can be quoted:
$ foo="a\ty"
$ echo $foo
a       y
$
Here is another example, using several escape sequences in one quoted string:
$ echo "Mary had a little lamb \n \t Its fleece was white as snow"
Mary had a little lamb
         Its fleece was white as snow
The \n escape causes echo to emit a newline, and the \t
escape causes echo to emit a tab.
You can redirect the output from echo. For example, the who and w commands list the users on your system and the terminals they are logged in on. To send a message to a terminal being used by someone else, you can use a command like the following, if /dev/tty015 is the name of the terminal you want to print a message on:
$ echo Hi there! > /dev/tty015
(Note that this is not the best way to send messages between
terminals;
write(C)
and
talk(TC)
are commands intended for this purpose, and allow two-way
conversation.)
The Korn shell print command's -u option (for example, print -u3 message writes message to file descriptor 3) is equivalent to redirecting the standard output, but it does not open or close the destination file. This is particularly useful if you have opened some files in ksh and want to write data to them (for later reading with the read command); see ``More about redirecting input and output''.
The basic shell syntax for redirecting input and output is as follows:
$ cat thing
cat: cannot open thing: No such file or directory
$ cat thing 2> /dev/null
$
The 2> /dev/null redirection is particularly useful when appended to a command that generates copious but unwanted error messages. It sends the output from file descriptor 2 (the standard error) to /dev/null, the ``bit bucket'' or ``black hole'' device. (/dev/null is also known as the null device. Data sent to it is silently discarded, and a read from it immediately returns end-of-file.)
Other useful fragments are:
To append output to the end of an existing file, use the ``>>'' notation instead.
To prevent the Korn shell from overwriting an existing file when you use the ``>'' redirection operator, set the shell option noclobber with the command set -o noclobber. If the shell then finds that a file it is writing to already exists, it will issue an error message and refuse to overwrite it, as follows:
$ cat aaa > bbb
ksh: bbb: file already exists
Once noclobber is set, you must use the
override operator >| (instead of >) if you want
to overwrite an existing file.
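As a sketch in a POSIX-style shell, the following fragment shows noclobber refusing a plain > and the override operator succeeding. The noclobber option can also be set with set -C. In the Korn shell and POSIX sh the override is >|, whereas the C shell uses >!. mktemp is used here only to create a scratch file that already exists, and it is assumed to be available.

```shell
f=`mktemp`           # a scratch file that already exists
set -o noclobber

# With noclobber set, > refuses to overwrite the existing file.
if { echo new > "$f"; } 2> /dev/null
then
    echo "overwrote $f"
else
    echo "refused to overwrite $f"
fi

# >| overrides noclobber for this one redirection.
echo new >| "$f"
set +o noclobber     # restore the default behavior
cat "$f"             # prints: new
rm -f "$f"
```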
The << operator has a special meaning: it tells the shell to take a command's standard input from the lines of the script that follow it. For example, suppose a shell script contains the line:
command <<terminating_string
:
:
Everything after that line, up to a line containing only
``terminating_string'', is taken as a here
document: text that is supplied to command as its standard input. So, to
send a multiline message to the screen, instead of using
print or echo you could embed a help message in
your script:
_help()
{
cat <<%%
Readability Analysis Program
A shell/awk demo to determine the readability grade of texts
Either run rap with no options for full menu-driven
activity, or use the following flags:
-[h|H] prints this help
-l cause output to be logged to a file
-f file enter the name of the file to check
-b run in batch mode (no menus)
%%
exit 1
}
This defines a function called _help within a shell
script. When the script subsequently encounters the command
_help it will cat the text between two sets of
``%%'' symbols to the standard output, then exit.
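The shell performs parameter and command substitution inside a here document unless the terminating string is quoted. The following sketch shows both behaviors. The function names greet and show_literal are hypothetical.

```shell
# greet NAME: the terminating string EOF is unquoted, so $1 is expanded.
greet()
{
    cat <<EOF
Hello, $1
EOF
}

# show_literal: quoting the terminating string passes the text through unchanged.
show_literal()
{
    cat <<'EOF'
Hello, $1
EOF
}

greet fred        # prints: Hello, fred
show_literal      # prints: Hello, $1
```

The terminating string must appear at the start of its own line, with nothing else on that line.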
Scripts running under the shell may have many file descriptors in use simultaneously. Some programs may not be able to deal with reading and writing lots of redirected file descriptors: other programs expect to read a filename on their command line, rather than look for redirected input.
To get around this, you can use the special files /dev/stdin, /dev/stdout, and /dev/stderr; see ``Forcing a program to read standard input and output'' for an example of this.
The following example shows an instance of extracting streams of
information from one file and placing them in two different output
files using only one pipeline, as follows:
cat myfile | awk '{ print $2 > "/dev/stderr"; print $1 }' 2>second_field | sort
The input file myfile is piped into an awk program. The
2>second_field redirection attaches the standard error of the awk
program to a file (in this instance second_field). The
awk program prints the second field of every line to
/dev/stderr, the standard error, and prints the first
field of every line to the standard output. Because the standard
error has been redirected, the second field of each line ends up in
second_field, while the first fields are sorted and
presented on the standard output.
Getting input from a file or a terminal
In addition to printing information on the screen and redirecting
the output from commands, you will almost certainly want to let your
scripts prompt you for information, and make use of that
information. The Bourne and Korn shells both provide the
read command, which is the inverse of print or
echo; it reads a line from a file (using the standard
input as a default) and stores the successive words in the line in a
series of shell variables which you specify on the command line. If
you don't specify enough variables to hold all the words on the line
it reads, all the remaining words will be stored in
the last variable you name.
For example, suppose we use the following script to get a line of input from the terminal:
print Hi there! Please type a line of text.
read foo
print $foo
When you run the script, it prompts for a line of text and reads all of it into the variable foo. The next line then prints the contents of foo. (Remember, to the shell, $foo means ``the contents associated with the variable named foo'', but foo on its own is simply a name. The command print foo will therefore output the word ``foo'', rather than the contents of the variable foo. This is a common pitfall when you start programming the shell.) For example, if the script above is called getline:
$ ./getline
Hi there! Please type a line of text.
This is a test.
This is a test.
$
The Korn shell provides a shorthand notation for this, as follows:
read 'foo?Hi there! Please type a line of text. '
This is equivalent to the following:
print Hi there! Please type a line of text.
read foo
The text before the question mark is interpreted as the name of a variable in which the input is stored; the text after the question mark is used as a prompt.
To read two words into different variables, you might use a script like the following:
print Hi there! Type two words then press enter.
read foo bar
print The first word I read was $foo
print and the second was $bar
If you type three words instead of two when you run this script, the last two words will appear in the second variable. For example, if the script is called getwords:
$ ./getwords
Hi there! Type two words then press enter.
hello yourself, program!
The first word I read was hello
and the second was yourself, program!
$
When you use the Korn shell (but not the Bourne shell), read takes a number of options. These are as follows:
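The word-splitting rule can be tried without a terminal by piping a line into read, as in the following sketch. It uses only Bourne-compatible syntax. The function split_two and the sample colon-separated record are hypothetical.

```shell
# split_two: read one line into two variables; every word after the first
# ends up in the second (last) variable.
split_two()
{
    read first rest
    echo "first: $first"
    echo "rest: $rest"
}

echo 'hello yourself, program!' | split_two

# Assigning IFS on the same line as read changes the field separator for
# that read only, which is handy for colon-separated records.
echo "fred:x:201:50" | { IFS=: read name pw uid gid; echo "$name has uid $uid"; }
```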
Here is a simple function to obtain a keystroke:
getc ()
{
stty raw
tmp=`dd bs=1 count=1 2>/dev/null`
eval $1='$tmp'
stty cooked
}
To use it, insert it at the top of your shell script, then invoke it
lower down the shell script:
echo "Enter a character: \c"
getc char
echo
echo "You entered $char"
getc puts the terminal into raw mode. Instead of passing your input through to the system a line at a time, the terminal now passes each keystroke you type straight through, unmodified.
The dd command reads a single character from the standard input and writes it to the standard output, where it is captured in the variable tmp. The next line assigns the literal contents of tmp to the variable named by $1. The eval command in front of this line forces the shell to scan the line twice: once to expand $1 into the name of a variable, and again to carry out the actual assignment. The single quotes around $tmp stop it from being expanded on the first scan and are removed before eval runs. eval therefore receives an assignment such as char=$tmp and expands tmp itself. If you omit the quotes and your character is a whitespace character, it will be lost.
Afterwards, getc puts the terminal back into normal operating mode with the command stty cooked (or stty -raw, or stty sane).
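The eval technique used in getc is also useful on its own, whenever a function must store its result in a variable chosen by the caller. The following is a minimal sketch. set_var is a hypothetical name, and its first argument must be a valid shell variable name, because eval executes whatever it is given.

```shell
# set_var NAME VALUE: assign VALUE to the variable whose name is NAME,
# using the same two-pass eval technique as getc.
set_var()
{
    value=$2
    # On the first scan the single quotes protect $value, so eval receives
    # NAME=$value and performs the assignment itself, keeping any spaces.
    eval $1='$value'
}

set_var answer "  two spaces"
echo "[$answer]"      # prints [  two spaces]
```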
We can write getc more succinctly like this:
getc ()
{
stty raw
eval $1='`dd bs=1 count=1 2>/dev/null`'
stty cooked
}
Because getc returns a single character in whatever
variable you specify, you can use it flexibly. For example, the
following function can be used to make a program pause until you are
ready for it to continue:
press_any_key()
{
echo "Strike any key to continue ...\c"
getc anychar
}
Combine the two functions in a script called char_handler,
as follows:
getc ()
{
stty raw
eval $1='`dd bs=1 count=1 2>/dev/null`'
stty cooked
}
press_any_key()
{
echo "Strike any key to continue ...\c"
getc anychar
}
echo "Enter a character: \c"
getc char
echo
echo "You entered $char"
press_any_key char
echo "\r"
Execute char_handler as follows:
$ ./char_handler
Enter a character: x
You entered x
Strike any key to continue ...y
$
To open files for reading, use the exec command. exec causes the commands following it on the line to be executed immediately without invoking a sub-shell. The command to be execed overlays the shell process, and when it terminates control returns to the parent of the process that carried out the exec.
You can use exec to attach new files to the input and
output file descriptors of the current shell process. For example,
to open a file called newscript as standard input to the
current shell, use the following command:
exec <newscript
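newscript need not be executable, because the shell reads it as input rather than running it. It should contain the following line:
echo "Hello world!"
In this case, exec forces newscript to be opened as standard input, and the shell then reads and executes its contents. When the shell reaches the end of newscript, it runs out of input and exits.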
To open file1, file2 and file3
for input as file descriptors 3, 4 and 5 respectively, use the
following:
exec 3< file1 4< file2 5< file3
Note that there is an anomaly in the Korn shell when opening file
descriptors using exec. The Bourne and Korn
shells both allow you to open any recognized file descriptor for input or
output. However, the Korn shell does not pass descriptors opened in this way
on to the programs it runs (with the exception of file descriptors 0, 1 and 2:
standard input, standard output, and standard error), although they remain
available to built-in commands such as read and print. The C shell
does not allow you to redirect or attach arbitrary file descriptors: this is
one of its major shortcomings.
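Opening a file on a spare descriptor lets a script read it a line at a time while the standard input stays attached to the terminal. The following sketch uses Bourne-compatible syntax. In the Korn shell, read -u3 is an alternative to read <&3. The function first_two_lines is hypothetical.

```shell
# first_two_lines FILE: print the first two lines of FILE, reading them
# through file descriptor 3, which is opened and then closed with exec.
first_two_lines()
{
    exec 3< "$1"        # attach FILE to descriptor 3 in the current shell
    read line1 <&3      # each read continues where the last one stopped
    read line2 <&3
    exec 3<&-           # close descriptor 3 again
    echo "$line1"
    echo "$line2"
}
```

Closing the descriptor with exec 3<&- when you have finished with it frees the descriptor for other uses.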
What to do if something goes wrong
If your shell script stubbornly refuses to work, there are two
possibilities:
Another common error is to give your file the same name as an existing command. If the current directory (.) precedes the directory in which the synonymous command exists in your PATH, your script will be used instead of the command whenever you call it; on the other hand, if the directory in which the command exists is before ``.'' in your PATH, the command will be executed instead of your script.
Consider the following search path:
/bin:/usr/bin:/u/charles/bin::/usr/sco/bin:/u/bin:
For example, suppose you create a script called test in the current directory and attempt to execute it by typing the command test. The shell will search along your path and execute /bin/test instead of ./test (which is reached through the fourth, null, field in the path). In the Korn shell, test is also a built-in command, so the built-in version runs in either case.
Try to avoid giving your scripts a name already used by a SCO OpenServer utility. A quick way to test a proposed name is to invoke man on it; if man provides a manual reference, it is a bad idea to use the name. It is also worth checking the relevant manual page for the shell you are using, in case your script shares a name with a built in shell command.
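man only finds names that have manual pages, so it is also worth asking the shell itself. The following sketch uses command -v, which is specified by POSIX but may be missing from older shells (in the Korn shell, whence name gives similar information). name_in_use is a hypothetical helper.

```shell
# name_in_use NAME: exit with status 0 if NAME already refers to a command,
# built-in, function or alias in the current shell, and 1 otherwise.
name_in_use()
{
    command -v "$1" > /dev/null 2>&1
}

for candidate in test is.logged.in
do
    if name_in_use "$candidate"
    then
        echo "$candidate: already taken"
    else
        echo "$candidate: appears to be free"
    fi
done
```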
Another common problem is to invoke a script under the wrong
shell. To ensure that the script is always run by the correct shell,
use the hash-bang notation (#!) on the first line of the
script to specify which shell to use (See
``Running a script under any shell'').
Solving problems with your script
Even if your environment is set up correctly, any long script that
you write will almost certainly fail to work correctly under some
circumstances. This may be due to a failure to consider all the
conditions under which the script may be run, or due to an oversight
or syntax error in the script. The best way to get used to creating
small to medium sized shell scripts is to do the following:
The Bourne shell's equivalent of the Korn shell command set -o xtrace is as follows:
set -x
The xtrace option causes the Korn shell to list each command after it has been expanded, but before it has been executed. This enables you to catch any errors due to alias substitution, wildcard expansion, or quote stripping.
(The set -o command can be used to reset the Korn shell's startup options from within a running shell; type set -o for a listing of the current option states, then use set -o option to switch option on, or set +o option to switch option off.) The set - command will also turn off the xtrace facility.
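Tracing can be confined to the part of a script you are investigating, which keeps the trace output manageable. The following is a sketch. set -x and set +x work in both the Bourne and Korn shells; in the Korn shell, set -o xtrace and set +o xtrace are equivalent. trace_demo is a hypothetical function.

```shell
# trace_demo: turn on execution tracing for a few commands, then off again.
# Each traced command is written to the standard error, prefixed by $PS4
# (by default "+ "), after expansion but before it runs.
trace_demo()
{
    set -x
    name=world
    echo "hello $name"
    set +x
}

trace_demo
```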
Another useful technique is to use print as frequently as possible, to let you know what your script thinks it is meant to be doing. Print the contents of variables before and after you change them, along with a message to explain what kind of operation you are carrying out. Better still, make print send this output to a log file. The file provides you with a permanent record of what happened during a test run of the script.
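The following is a minimal sketch of the log-file approach. It uses echo so that it also works under the Bourne shell; under the Korn shell, print could be used instead. LOGFILE and log are hypothetical names, and the default location /tmp/myscript.log is only an example.

```shell
# Choose a log location; the default here is only an example.
LOGFILE=${LOGFILE:-/tmp/myscript.log}

# log MESSAGE...: append a time-stamped message to $LOGFILE.
log()
{
    echo "`date '+%H:%M:%S'` $*" >> "$LOGFILE"
}

count=5
log "before increment: count=$count"
count=`expr $count + 1`
log "after increment: count=$count"
```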
An important rule to bear in mind if your script fails is not to change more than one thing at a time between test runs. Errors are eliminated by making a single change to a script, running it, and seeing how it behaves, then trying to deduce where the error is coming from. Randomly changing your script will make it much harder to pinpoint the source of errors and is unlikely to eliminate them.
Here is an extended example that demonstrates these techniques and
introduces some new concepts.
Writing a readability analysis program: an example
For the rest of this chapter, and at intervals in the following
chapters, we will refer to a single recurring example: a program to
analyze the readability of text files. Such a program needs to
identify the files it is to work on. It must open them, use several
other programs to obtain information about the files, then print the
results. It also serves as a demonstration of several useful
techniques: notably, how to build a simple menu driven program, how
to build up complex regular expressions, and how to integrate
awk scripts and other programming languages into shell
programs.
The objective of a readability analysis program is to scan a file or files of text, and report various statistics about their internal complexity. There is more to this than just running wc; we want to generate a report on such things as the number of sentences in a file, the average length of each sentence, the average number of syllables per word, and the readability grade of the file. It would be useful to be able to invoke the program from the shell prompt with a variety of options: it would also be useful to provide the program with a menu driven front end. All these tasks, and more, will be explained as we encounter them in building up our example.
The first step in writing a large program is to analyze what it is
intended to do: what its inputs are, and what its outputs are
expected to be. We can then write a ``skeleton'' for it: a script
that does not actually do anything to the data, but ensures that all
the pieces are in place. (The actual task of analyzing a file for
readability can be farmed out to a function that we will fill in
later.) This is described below.
How to structure a program
In general, there are two types of program: batch programs, and
interactive programs. The internal structures of batch and
interactive programs differ considerably.
A batch program is a typical SCO OpenServer filter. You run it by specifying a target file (and optional flags) at the shell prompt: it runs, possibly prints messages to the standard output, and exits.
An interactive program prints a menu. You select options from the menu: the program then changes its internal state, and prints another menu, until it has assembled all the data it needs to select and execute a routine that carries out some task. It does not exit until you select a quit option from some menu.
Interactive programs are harder to write, so we will start by looking at a short batch program. An explanation of the program follows the code:
1 : #!/bin/ksh
2 : #-----------------------------------------------------
3 : #
4 : # rap -- Readability Analysis Program
5 : #
6 : # Purpose: skeleton for readability analysis of texts.
7 : #
8 : #------------- define program constants here ----------
9 : #
10 : CLS=`tput clear`
11 : HILITE=`tput smso`
12 : NORMAL=`tput rmso`
13 : #
14 : #---------- initialize some local variables -----------
15 : #
16 : SCRIPT=$0
17 : help='no'; verbose=' ' ; record=' '
18 : log=' ' ; next_log_state=' ' ; batch=' '
19 : file=' '; fname=' '
20 : #
21 : #----------------- useful subroutines -----------------
22 :
23 : do_something()
24 : {
25 : # This is a stub function; it does not do anything, yet,
26 : # but shows where a real function should go.
27 : # It contains a dummy routine to get some input and exit.
28 : echo
29 : print "Type something (exit to quit):"
30 : read temp
31 : if [ "$temp" = "exit" ]
32 : then
33 : exit 0
34 : fi
35 : }
36 :
37 :
38 : _help()
39 : {
40 : echo "
41 :
42 : ${HILITE}Readability Analysis Program${NORMAL}
43 :
44 : A shell/awk demo to determine the readability grade of texts
45 :
46 : Usage: $SCRIPT -hHlb -f <file>
47 :
48 : Either invoke with no options for full menu-driven
49 : activity, or use the following flags:
50 :
51 : -h or -H prints this help
52 : -l log output to file
53 : -f file name of file to check
54 : -b run in batch mode (no menus)
55 :
56 : "
57 : }
58 : #
59 : #
60 : TrapSig()
61 : {
62 : echo ""
63 : echo "Trapped signal $1...\c"
64 : }
65 : #
66 : #========== START OF MAIN BODY OF PROGRAM ============
67 : #
68 : #------------ define program traps -------------------
69 : #
70 : for foo in 1 2 3 15
71 : do
72 : trap "TrapSig $foo" $foo
73 : done
74 : #
75 : #---------- parse the command line---------------------
76 : #
77 : mainline=$*
78 : echo ""
79 : while getopts "hHvlbf:" result
80 : do
81 : case $result in
82 : h|H) help=yes ;;
83 : v) verbose=yes ;;
84 : l) record=yes
85 : next_log_state=off
86 : log=ON ;;
87 : b) batch=yes ;;
88 : f) file=yes
89 : fname=$OPTARG ;;
90 : *) help=yes ;;
91 : esac
92 : done
93 : shift `expr ${OPTIND} - 1`
94 : if [ $help = 'yes' ]
95 : then
96 : _help
97 : exit 1
98 : fi
99 : #
100 : #---------- enter the main program ---------------------
101 : #
102 : while :
103 : do
104 : do_something
105 : done
(Line numbers are provided for reference only, and are not part of the program.)
At first sight this appears to be quite a complicated program, but most of it is used to set up some facilities which will be useful later. The real start of the program is line 10:
8 : #------------- define program constants here ----------
9 : #
10 : CLS=`tput clear`
11 : HILITE=`tput smso`
12 : NORMAL=`tput rmso`
13 : #
14 : #---------- initialize some local variables -----------
15 : #
16 : SCRIPT=$0
17 : help='no'; verbose=' ' ; record=' '
18 : log=' ' ; next_log_state=' ' ; batch=' '
19 : file=' '; fname=' '
20 : #
Text following a ``#'' is ignored by the shell. This comes in useful when you want to leave comments in your program for other users.
Lines 10 to 19 set a number of variables. These variables are only used while the program runs; when the script ends, they are not made available to its parent shell. The first set, CLS, HILITE, and NORMAL, are constants: they are not changed during the execution of the program. The second set are variables that the program may use. We initialize them (mostly to a string containing a single <Space> character) in case variables with the same names have been passed to the script in the environment of the parent shell from which it is executed.
It is worth considering lines 10-12 in more detail. Lines of the form variable=`tput mode` use the command tput(C) to obtain the codes necessary to put the terminal into some special mode, for example reverse video mode, or to restore it to normal.
All terminals have the capability to carry out some basic actions when they receive a corresponding control code: for example, positioning the cursor, switching to reverse video, and clearing the screen. Because different terminals use different control codes, the system terminfo database maintains a table of the codes to use for a given capability on any specified terminal. These capabilities are assigned symbolic names, and the terminfo database matches the name to the escape code for each terminal.
tput takes a terminal ``capability'' name and returns the escape sequence to use for the current terminal. In this program, we capture the output from the tput command in a variable for later use. Once you have the control code for a given capability, you can echo the code to your terminal and it will enter whatever mode you specified.
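A minimal sketch of capturing the codes once and echoing them around some text, assuming a terminfo-aware tput; the function name highlight is hypothetical. If tput cannot determine the terminal type it fails and writes nothing to its standard output, so its error messages are discarded and the variables are simply left empty, which leaves the text unhighlighted.

```shell
# Capture the escape sequences once; each variable is left empty if
# tput cannot find the capability for the current terminal.
HILITE=`tput smso 2>/dev/null`
NORMAL=`tput rmso 2>/dev/null`

# highlight (hypothetical): print its arguments in standout mode
highlight()
{
    echo "${HILITE}$*${NORMAL}"
}
```

For example, highlight "Warning: disk nearly full" prints the message in reverse video on most terminals.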
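We are using three special terminal-dependent capabilities here:
clear: clear the screen (saved in CLS)
smso: start ``standout'' mode, usually reverse video (saved in HILITE)
rmso: end standout mode and return to normal video (saved in NORMAL)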
Lines 23 to 56 define two functions: a stub (which does nothing useful yet) and a help routine. The stub simply shows where a more complex function will go when we have written it. (At present, it prompts for an input string; if you type exit, the script terminates.) The help routine is similar to the one we looked at in ``More about redirecting input and output''. If it is called later in the script, it prints a usage message and the script exits. Note the use of the variable $SCRIPT in the help function. SCRIPT is set on line 16 to $0, the name under which the script was invoked. (It is used here in case someone renames the script, so that the usage message always reflects the current name of the program.)
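The help routine itself is not reproduced in this section. The following is a hypothetical minimal version, assuming the option letters accepted by getopts on line 79; it shows basename (an external command) stripping the directory from $SCRIPT so that the message uses only the script's current name.

```shell
SCRIPT=$0    # the name under which this script was invoked

# _help (hypothetical minimal version): print a usage message
_help()
{
    name=`basename "$SCRIPT"`
    echo "usage: $name [-hHvlb] [-f filename]"
}
```

If the script is installed as /usr/local/bin/rap, _help prints usage: rap [-hHvlb] [-f filename].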
Note that before you can call a function, it must have been defined
and the shell must have read the definition. Therefore, functions
are defined at the top of a shell script and the actual program
(that calls them) is right at the bottom.
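A quick way to see this is to run a small command string in a fresh shell; the function names order_demo and greet are hypothetical. The first call to greet fails because the shell has not yet read its definition; the second call, made after the definition, succeeds.

```shell
# order_demo (hypothetical): calls greet before and after defining it
order_demo()
{
    sh -c '
        greet 2>/dev/null || echo "greet is not defined yet"
        greet() { echo "hello from greet"; }
        greet
    '
}
```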
Making a command repeat: the for loop
Lines 70 to 73 allow our script to survive if it receives a
signal. Interactive scripts frequently do this, but batch scripts
rarely do so. First, we provide a function, TrapSig, to handle any
signals that are received. It expects a parameter, $1, that tells it
the number of the signal. At present all it does is echo the number
of the signal and exit, but later on we will show how it can be used
to resume control of the program if something goes wrong. The signals
are caught by the traps set up in lines 70 to 73:
for foo in 1 2 3 15
do
trap "TrapSig $foo" $foo
done
This is an example of a for loop.
A for loop is a mechanism for repeating an operation for
every item in a set. The general structure of a for loop
is as follows:
for variable in list
do
command
command
.
.
.
done
In the example, variable is set in turn to each value in the list (the items following the keyword in, up to the end of the line). All the commands between do and done are carried out for each successive value of variable. So in the example, the variable foo is set to 1 and the trap command is carried out; then foo is set to 2, then 3, and finally 15.
You can assign strings such as filenames to variables in a for loop. This enables you to use for loops to apply several commands in order to every file in a directory, or to iteratively work through a list of words (for example, invoking mail to send a personalized message to each of a list of recipients).
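For instance, the following sketch works through a list of names and builds a personalized message for each. To keep it runnable anywhere it only prints each message; in a real script you might pipe each line to mail instead. The function name send_notes and the names in its list are hypothetical.

```shell
# send_notes (hypothetical): prints one personalized line per recipient
send_notes()
{
    for person in fred mary jo
    do
        # In a real script this line might be piped to: mail $person
        echo "Dear $person, the meeting has moved to Friday."
    done
}
```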
The loop in the example script from line 70 to line 73 is equivalent to writing the following:
foo=1
trap "TrapSig $foo" $foo
foo=2
trap "TrapSig $foo" $foo
foo=3
trap "TrapSig $foo" $foo
foo=15
trap "TrapSig $foo" $foo
Each time the body of the loop (the part from do to done) is executed, it sets a trap for a signal (the number of which is set by the for statement).
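The TrapSig handler itself is not shown in this section. The following sketch assumes a minimal hypothetical version that reports the signal and exits with status 128 plus the signal number (a common convention, not something the example script requires). It runs in a child shell, which sends itself signal 15 (SIGTERM) with kill so that you can watch the trap fire.

```shell
# trap_demo (hypothetical): sets the same traps as lines 70 to 73 in a
# child shell, then signals that shell to show the handler running.
trap_demo()
{
    sh -c '
        TrapSig()
        {
            echo "caught signal $1"
            exit `expr 128 + $1`
        }
        for foo in 1 2 3 15
        do
            trap "TrapSig $foo" $foo
        done
        kill -15 $$
        echo "not reached"
    '
}
```

Because TrapSig exits, the final echo never runs and trap_demo prints only caught signal 15.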
You can use for loops with wildcards to select files. For example:
for target in *
do
cp "$target" "../$target"
echo "Copied $target"
done
When the shell reads the first line, it expands the ``*''
into a list of all files in the current directory. Then, for each
named file, the commands in the body of the loop are executed.
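A slightly more defensive sketch of the same idea follows; the function name copy_all and its destination argument are hypothetical. It skips anything that is not a regular file (cp would complain about directories) and quotes each name, so that filenames containing spaces are copied intact. If the directory is empty, the unexpanded ``*'' is passed through literally, and the -f test skips it as well.

```shell
# copy_all (hypothetical): copy every regular file in the current
# directory into the directory named by $1
copy_all()
{
    dest=$1
    for target in *
    do
        [ -f "$target" ] || continue    # skip directories and an unmatched "*"
        cp "$target" "$dest/$target"
        echo "Copied $target"
    done
}
```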
A script often needs to accept options on its command line. For
example, we might want our program to respond to any of the
following:
prog -h
prog -H
prog -v
prog -f filename
To handle command line options, we need a means of distinguishing between parameters that are flags and parameters that are filenames. The getopts command provides this.
To use getopts, first establish the various flags the program is to understand. For example, for the syntax above, the option string is hHvf:. The colon after the ``f'' indicates that the -f flag must be followed by an additional parameter (such as a filename).
For example:
79 : while getopts "hHvlbf:" result
80 : do
81 : case $result in
. . .
Each time the while loop runs, getopts is invoked; it scans the parameters to the script and places the next option it finds in the variable named on its command line (here, result). The index of the next argument to be processed is placed in the special variable OPTIND, and if the option takes an argument (like the f: option above), that argument is placed in the special variable OPTARG. An unrecognized option sets result to ``?'', which the *) branch of the case statement treats as a request for help. When getopts runs out of options, it returns a non-zero (failure) exit value.
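getopts returns only one option each time it is called, so it is up
to the shell script to call it repeatedly until all the options in
the parameter list have been retrieved. For this reason getopts is
usually used in a structure called a while loop, explained below.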
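The following sketch wraps a cut-down version of the option loop in a function so that it can be tried on its own. The function name parse_opts and the one-line report it prints are hypothetical; the option letters are a subset of those on line 79. Because getopts records its progress in OPTIND, the function resets OPTIND to 1 before starting, so that it can be called more than once.

```shell
# parse_opts (hypothetical): parse -h, -H, -v and -f file, then report
# the resulting settings and whatever arguments remain
parse_opts()
{
    help=no; verbose=no; fname=
    OPTIND=1                      # restart getopts from the first argument
    while getopts "hHvf:" result
    do
        case $result in
        h|H) help=yes ;;
        v)   verbose=yes ;;
        f)   fname=$OPTARG ;;
        *)   help=yes ;;          # unknown option: getopts sets result to ?
        esac
    done
    shift `expr $OPTIND - 1`      # discard the options already processed
    echo "help=$help verbose=$verbose fname=$fname remaining=$*"
}
```

For example, parse_opts -v -f notes.txt chapter1 reports verbose=yes, fname=notes.txt, and leaves chapter1 as the remaining argument.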
Repeating commands zero or more times: the while loop
A while loop differs from a for loop in that a
for loop is executed a set number of times (once for each item
in its list), whereas a while loop is repeated for as long as
some condition remains true, which may be indefinitely. The general
format of a while loop is as follows:
while condition
do
command
.
.
.
done
The condition is a command or test of some kind. (For an explanation of tests, see ``Different kinds of test''.) If it exits with an exit value of 0, implying success, the commands in the body of the loop (between do and done) are carried out, and the condition is then tested again. As soon as it fails (has a non-zero exit value), the loop ends and the script continues with the line after done.
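As a small worked example, this hypothetical count_to function prints the numbers from 1 up to its argument. The test is made before every pass, so count_to 0 prints nothing at all. It uses expr for the arithmetic because that works in the Bourne shell as well as the Korn shell.

```shell
# count_to (hypothetical): print 1, 2, ... up to $1, one per line
count_to()
{
    i=1
    while [ "$i" -le "$1" ]
    do
        echo $i
        i=`expr $i + 1`
    done
}
```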
Note that there is no guarantee that the commands in the body of the
loop will ever be carried out. For example:
while [ "yes" = "no" ]
do
some_command
.
.
.
done
some_command will never be carried out, because the test
[ "yes" = "no" ] always fails. On the other hand, the
opposite effect can occur:
while [ "yes" ]
do
some_command
.
.
.
done
Because the literal string ``yes'' is not empty, test returns
true every time, so the loop repeats endlessly.
Repeating commands one or more times: the until loop
It is sometimes necessary to execute the body of a loop at least
once. Although the while loop provides the basic looping
capability, it does not guarantee that the body of the loop will
ever be executed, because the initial test may fail. For example, the
body of the first while loop above will never be executed, because
its test condition is always false.
We could make sure that the body of the loop was executed at least
once by duplicating it before the while statement, like
this:
some_command
while [ "red" = "blue" ]
do
some_command
done
However, this is prone to error when the loop body contains a lot of
commands. Luckily the shell gives us a different type of looping
construct: the until loop. An until loop looks
very similar to a while loop; the difference is that the
loop body is repeated until the test condition becomes
true (that is, for as long as it remains false), rather than for as
long as it remains true.
For example, this loop will repeat infinitely, because the test
always returns a non-zero (false) value:
until [ "red" = "blue" ]
do
some_command
done
Like while, until checks its test before each pass through the loop,
so the body of an until loop is executed at least once only if the
test is false when the loop is first reached. By choosing our test
carefully, we can make sure that this is the case. For example:
leave_loop="NO"
until [ "$leave_loop" = "YES" ]
do
some_command
.
.
.
leave_loop="YES"
done
The body of this loop will be executed at least once. If we change
the until on the second line to a while, the
loop will never be entered.
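A typical use of this pattern is to keep asking a question until the answer is acceptable. In the following hypothetical sketch, answer starts out as no, so the until test is false on entry and the prompt is always shown at least once; the loop also gives up if read reaches the end of its input.

```shell
# ask_until_yes (hypothetical): prompt until the user types "yes"
ask_until_yes()
{
    answer=no
    until [ "$answer" = "yes" ]
    do
        echo "Continue? (yes/no)"
        read answer || return 1    # end of input: give up
    done
    echo "confirmed"
}
```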
Making choices and testing input
To handle the command line options to our script, lines 79 to 92 run
getopts in a while loop. As long as
getopts continues to return an option, the body of the
loop is executed; when getopts can no longer find any
options, it returns a non-zero value and the while loop ends. On
line 93, shift is then used to discard the options, so that $1
refers to the first argument after them.
Embedded in the loop to get options, on lines 81 to 91, we see another kind of statement: a case statement. Immediately after the loop, on lines 94 to 98, we see an if statement. These are both mechanisms for choosing between two or more alternatives: if depends on the exit value of a test condition; case operates by matching patterns.
When we need to repeat an operation a variable number of
times, we must check after each repetition
to determine whether it has produced the desired
result. If not, we may need to repeat the task; otherwise, we
may want to do something else. The if statement allows us
to choose between alternative courses of actions; the test
or [ ... ] command allows us to check whether a condition
holds true. (The case statement can be used as a
generalized form of if statement, for choosing between
many options. We will deal with it later.)
Choosing one of two options: the if statement
The simplest form of if statement is illustrated on lines
94 to 98:
94 : if [ $help = 'yes' ]
95 : then
96 : _help
97 : exit 1
98 : fi
The command following if is executed. If it succeeds (that is, if it returns an exit value of 0), the body of the if statement (from then to fi) is carried out. If it returns a nonzero value, the body of the if statement is skipped.
if has the following structure:
if condition
then
commands executed if condition succeeds
fi
An alternative structure is the following:
if condition
then
commands executed if condition succeeds
else
commands executed if condition does
not succeed
fi
The following structure is also valid:
if condition1
then
commands executed if condition1 succeeds
elif condition2
then
commands executed if condition2 succeeds
fi
condition is a command that returns an exit value: zero if it succeeded, or some other value if it failed. The if command runs condition, then executes the series of commands from then to else (or fi) if and only if condition returned a value of ``0'' (TRUE). (fi is the keyword marking the end of an if construct.)
If the if command contains an else portion, the commands between else and fi are only carried out if the test returns a result other than TRUE; that is, if the test statement fails.
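If the if statement includes an elif, the condition following elif is tested only if the condition tested by the preceding if (or elif) fails; if it succeeds, the commands after the following then are carried out. An elif is otherwise identical with an if, except that it needs no fi of its own: the whole construct is closed by a single fi.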
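The following hypothetical function combines all three forms in a single if ... elif ... else ... fi construct. It uses the numeric comparisons -lt (less than) and -eq (equal) provided by test.

```shell
# sign_of (hypothetical): describe the sign of an integer argument
sign_of()
{
    if [ "$1" -lt 0 ]
    then
        echo negative
    elif [ "$1" -eq 0 ]
    then
        echo zero
    else
        echo positive
    fi
}
```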
The following two lines of code have the same effect:
if [ "$answer" = 'y' ]
if test "$answer" = 'y'
If the test succeeds, indicating that the value of answer
is ``y'', the commands following then are carried out. Otherwise,
the else ... fi section of the script (if there is one) is executed.
(The double quotes around $answer prevent test from reporting a
syntax error if answer happens to be empty.)
Different kinds of test
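In general, tests are carried out either by enclosing them in square
brackets (as above) or by using the command
test(C).
The two forms are equivalent, but the brackets must be separated from
the test inside them by spaces.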
The most useful tests are as follows:
if who | grep -e "$1" > /dev/null
then
print -- $1 is logged in
fi
In this example, the output from who is piped to
grep. The if statement tests the exit value of the
pipeline, which is the exit value of its last command, grep.
grep returns 0 if it finds the target string, or a non-zero value
if it fails. (Its output is discarded by redirecting it to
/dev/null.)
This example is therefore equivalent to a test that returns
TRUE if a string is present in the output of a command.
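Besides using the exit value of a command such as grep, test itself provides many file and string tests. As an illustration (not a complete list), the following hypothetical function uses -z (the string is empty), -d (a directory), -f (a regular file), and -s (a file that exists and is not empty).

```shell
# describe (hypothetical): report what kind of thing $1 names
describe()
{
    if [ -z "$1" ]
    then
        echo "empty argument"
    elif [ -d "$1" ]
    then
        echo "$1 is a directory"
    elif [ -f "$1" ] && [ -s "$1" ]
    then
        echo "$1 is a non-empty regular file"
    elif [ -f "$1" ]
    then
        echo "$1 is an empty regular file"
    else
        echo "$1 does not exist or is not a regular file or directory"
    fi
}
```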
The && and || operators
There are two compact versions of the if test which you
may see from time to time; these tests operate on a single statement
and determine whether a subsequent command is to be executed. They
are && (AND IF) and || (OR
IF). These operators evaluate $? for the previous
command. && executes the following command if the previous
command succeeded; || executes the following command if
the previous command failed. Note that the execution of the second
command depends entirely on the result of executing the first
command. Thus, in a line joining several commands with &&, each
command is executed in turn until one of them fails; in a line
joining them with ||, each is executed in turn until one of them
succeeds.
For example, the test to see if a given user is logged on could be
written as follows:
who | grep -e "$1" || echo "$1 is not logged on"
A who listing is piped to grep, which searches for the subject (whose name is the first argument to the script). The OR IF test examines the returned value from grep. If grep failed (that is, if the user is not logged on), a message is printed. If grep succeeded and returned ``0'', no message is needed because grep printed the line from the who listing.
In general, you can use || to execute a command when the previous command has failed, and you can use && to execute a command if the previous command has succeeded.
For example, take the command:
compress $1 || print "Something went wrong compressing $1"
The program compress is executed. When it finishes, its exit value $? is tested by ||. If it is non-zero, the error message is printed.
This compares with the other command:
compress $1 && print "Finished compressing $1"
If the exit value of compress is 0, the message is printed.
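Both operators work with test as well as with ordinary commands. In the following hypothetical sketch, the message after && appears only if mkdir succeeds, and the message after || appears only if, afterwards, the directory still does not exist. If the directory already exists, mkdir fails but the -d test succeeds, so neither message is printed.

```shell
# make_dir (hypothetical): create directory $1 and report the outcome
make_dir()
{
    mkdir "$1" 2>/dev/null && echo "created $1"
    [ -d "$1" ] || echo "could not create $1"
}
```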
A common problem when using the && and || operators is to assume that they are equivalent to the logical operators provided by other programming languages. In fact, these operators are conditional constructs that evaluate strictly from left to right. Consequently it is hazardous to use them for evaluating logically true or false values (like the && or || operators in C). These operators are not strictly equivalent to if ... else ... fi either. For example, the following short script determines if someone is logged in:
if who | grep $1 >/dev/null
then
echo $1 is logged in
else
echo $1 is not logged in
fi
Using the && and || operators, we might be
tempted to rewrite this script more succinctly as follows:
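who | grep $1 >/dev/null && echo $1 is logged in || echo $1 is not logged in
However, this version does not behave exactly like the if ... else ... fi version. The || applies to everything before it on the line, so the second echo runs if either the pipe (who | grep $1) fails, or the pipe succeeds but the first echo fails. The if ... else ... fi version, in contrast, chooses between its two echo commands on the result of the pipe alone, and never runs both (despite the two versions looking superficially similar in logical terms).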
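The difference is easy to demonstrate with the commands true and false, which simply succeed and fail. In the following hypothetical sketch, false stands in for a first echo that fails (for example, because its output could not be written): the test succeeds, yet the ``otherwise'' branch still runs in the chained version.

```shell
# chained (hypothetical): the || branch runs because the command
# after && failed, even though the test itself succeeded
chained()
{
    true && false || echo "otherwise branch ran"
}

# with_if (hypothetical): the else branch depends only on the test
with_if()
{
    if true
    then
        false
    else
        echo "otherwise branch ran"
    fi
}
```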
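The case statement on lines 81 to 91 of the example script chooses an action for each option returned by getopts:
81 : case $result in
82 : h|H) help=yes ;;
83 : v) verbose=yes ;;
84 : l) record=yes
85 : next_log_state=off
86 : log=ON ;;
87 : b) batch=yes ;;
88 : f) file=yes
89 : fname=$OPTARG ;;
90 : *) help=yes ;;
91 : esac
The case command is followed by a word, here the value of the variable result. This is compared with each of the patterns in turn, from the top, until one matches; the commands for that pattern are carried out, and execution then continues after the esac statement (signifying end of case).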
In addition to setting variables, you can use branches of a case construct to call functions or exit. (An exit statement is used to exit from the current script.)
case statements are not essential to writing scripts that can handle multiway choices, but they make things easier. Consider the following alternative:
if [ "${result}" = "h" ]
then
help=yes
else
if [ "${result}" = "H" ]
then
help=yes
else
if [ "${result}" = "v" ]
then
verbose=yes
else
if [ "${result}" = "l" ]
then
record=yes
next_log_state=off
log=ON
else
if [ "${result}" = "b" ]
then
batch=yes
else
if [ "${result}" = "f" ]
then
file=yes
fname=$OPTARG
else
help=yes
fi
fi
fi
fi
fi
fi
This compound if statement does exactly the same thing as
the earlier case statement, but is much harder to read and debug.
The general format of a case construct is as follows:
case $choice in
1) # carry out action associated with selection 1
. . .
;;
2) # carry out action associated with selection 2
. . .
;;
3) # carry out action associated with selection 3
. . .
;;
4) # carry out action associated with selection 4
. . .
;;
*) # carry out action associated with any other selection
. . .
;;
esac
The case command evaluates its argument, then selects the first matching option from the list and executes the commands between the closing parenthesis following the option and the next double semicolon. In this way, only one out of several possible courses of action can be taken. case tests the argument against its options in order, from top to bottom, and once it has executed the commands associated with an option it skips all the subsequent possibilities, and the script continues running on the line after the esac command.
To trap any possible selection use an option like:
*) # match any possible argument to case
. . .
;;
The * option matches any possible argument to the case construct; if no prior option has matched the argument, the commands associated with the * option are carried out. For this reason, the *) option should be placed at the bottom of the case construct; if you place it at the top, it will match every argument and none of the options below it will ever be reached.
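The options in a case construct are shell patterns, not just literal strings, so the filename wildcards (*, ?, and [...]) work here too, and | separates alternatives. A small hypothetical sketch, with the catch-all *) placed last; note that readme.sh is reported as a shell script because *.sh is the first pattern it matches.

```shell
# file_kind (hypothetical): classify a filename by pattern matching
file_kind()
{
    case $1 in
    *.c|*.h)    echo "C source" ;;
    *.sh)       echo "shell script" ;;
    [Rr]eadme*) echo "documentation" ;;
    *)          echo "unknown" ;;
    esac
}
```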
There is no effective size limit to a case construct, and
unlike the nested if ... then ... else cascade shown above, the
construct is ``flat''; that is, it is a single structure, so there
is no difficulty in working out which part of the construct is being
evaluated.
Generating a simple menu: the select statement
Although not used in the readability analysis sample program, the
select statement provides a simple way to generate menus. It
is restricted to the Korn shell, and has no equivalent in the Bourne
and C shells. It has the following syntax:
select name [in list]
do
statements # statements use $name
done
The in list construct can be omitted, in which case, list defaults to $@ (see ``Passing arguments to a shell script'').
The select statement displays the entries in list as a menu, one per line, each preceded by a number. It then displays a prompt, taken from the variable PS3 (by default a hash sign followed by a question mark, #?). If the user types the number of one of the entries, the variable name is set to that entry; otherwise name is set to null. The user's actual response is stored in the variable REPLY. On the basis of the value of $name, the appropriate statement is executed. select then prompts for another choice, unless an explicit break command causes the loop to terminate.
The following trivial sample code illustrates select in use:
print "Choose a dinosaur:"
select dino in allosaurus tyrannosaurus brontosaurus triceratops
do
case $dino in
allosaurus ) print "Jurassic carnosaur" ;;
tyrannosaurus ) print "Cretaceous carnosaur" ;;
brontosaurus ) print "Jurassic herbivore" ;;
triceratops ) print "Cretaceous herbivore" ;;
*) print "invalid choice" ;;
esac
break
done
The following shows the code in use (the program is called dino_db):
Choose a dinosaur:
1) allosaurus
2) tyrannosaurus
3) brontosaurus
4) triceratops
#?
Given the size of the skeleton structure we have already created, it might look as if it will take a lot of work to make it do anything useful. However, surprisingly little additional programming is needed.
As a first step towards writing a style analysis program, it would be useful to know how many words, characters and lines there are in the target file. We can use wc to obtain this information for any given file; we can also use backquotes to capture the output and process it.
To add word counting to our program, all we need to do is change the following lines:
23 :
24 : do_something()
25 : {
26 : wordcount=`wc -w ${fname} | awk '{ print $1 }'`
27 : lines=`wc -l ${fname} | awk '{ print $1 }'`
28 : chars=`wc -c ${fname} | awk '{ print $1 }'`
29 : echo "File ${fname} contains:
30 : ${wordcount}\t\twords
31 : ${lines}\t\tlines
32 : ${chars}\t\tcharacters "
33 : }
The main task of the program is to call the function
do_something. This function runs wc three times, pipes each
output through a short awk command, and captures the result in a
variable using backquotes; then it prints a formatted report.
For example:
$ rap -f rap
File rap contains:
243             words
95              lines
1768            characters
$
The awk program '{ print $1 }' prints the first field on every line awk reads from the standard input. This is a typical awk program: short, integrated into a shell script, and used to carry out a transformation on a stream of text. For more information on using awk, see Chapter 13, ``Using awk''.
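As an alternative sketch (not the approach the example program takes), wc can be run once instead of three times. Given no options, wc prints the line, word, and character (strictly, byte) counts in that order; reading the file from standard input keeps the filename out of its output, so set can split the three numbers into $1, $2, and $3 with no need for awk. The function name count_file is hypothetical.

```shell
# count_file (hypothetical): one wc run instead of three wc/awk pairs
count_file()
{
    file=$1
    # wc with no options prints: lines words characters
    set -- `wc < "$file"`
    echo "File $file contains:"
    echo "$2 words"
    echo "$1 lines"
    echo "$3 characters"
}
```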
The important point to note here is that by encapsulating the functionality of the program in a subroutine (the function do_something) we have made it a lot easier to change the program. (Ideally do_something would be written as three separate functions, to count words, lines, and characters. However, because it is comparatively short it is presented here as a single unit.)
We can make our program do something else entirely, simply by
modifying do_something and changing the help text in
_help. Most of the program is actually a skeleton that we
can use to hang useful subroutines off: you can reuse it as a
starting point for your own batch mode shell scripts.
Making menus
Starting from our current example, it is not difficult to turn the
script into a fully interactive program with menus. We have already
seen most of the structures we need: all that is necessary is to put
them together in a different order.
The general structure of a batch mode script is as follows:
Define constants (variables that will not change)
Define functions (routines to handle specific jobs)
Set traps
get command line options with getopts
use options to set control variables
for all in $*
do
some_function()
done
The only element of repetition is the loop at the end, which repeats
for each file passed to the script as an argument.
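The outline uses $*, but in a real script the loop is usually written with "$@", which passes each argument through as a separate word even if it contains spaces. A minimal sketch, in which analyze_one is a hypothetical stand-in for some_function:

```shell
# analyze_one (hypothetical): the per-file work would go here
analyze_one()
{
    echo "analyzing $1"
}

# process_all (hypothetical): call analyze_one once for each argument
process_all()
{
    for file in "$@"
    do
        analyze_one "$file"
    done
}
```

With this version, process_all "my notes" draft calls analyze_one twice, once with my notes and once with draft.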
A menu driven script behaves differently:
Initialize variables and define functions
Repeat (until some ``exit'' state is reached)
{
Display a menu
Get the user's choice
Do something with the choice (change state or call function)
}
On ``exit'' close files and quit
This process, an endless loop, is called a mainloop. The menu is
displayed, then a function like getc (described in
``Reading a single character from a file or a terminal'')
is used to retrieve a single keystroke. Such a function may either
grab the first key the user presses, or let them correct the entry
and press <Enter> before accepting input. (There are arguments for and
against both strategies. In general, you should always give your
users an opportunity to check their input, and correct any mistakes
they may have made.)
Depending on the value of the key, an option is selected from a case statement. Each option either sets a variable, or calls a function (called a callback) which does something in the background, ``behind'' the menu. Finally, if the option to quit is selected, the break statement is executed to quit the loop.
Here is part of a menu based script, containing the mainloop:
282 : done
283 : if [ $help = "yes" ]
284 : then
285 : _help
286 : exit 1
287 : fi
288 : if [ $batch = "yes" ]
289 : then
290 : analyze
291 : exit 0
292 : fi
293 : #
294 : #---------- enter the mainloop ------------------------
295 : #
296 : while :
297 : do
298 : echo $CLS
299 : echo "
300 :
301 : ${HILITE}Readability Analysis Program${NORMAL}
302 :
303 : Type the letter corresponding to your current task:
304 :
305 : f Select files to analyze [now ${HILITE}$fname${NORMAL} ]
306 : p Perform analyses
307 : l switch ${next_log_state} report logging [now ${HILITE}$log${NORMAL}]
308 : q quit program
309 :
310 :
311 : =======>"
312 : getc char
313 : case $char in
314 : 'f') getloop=1
315 : get_file ;;
316 : 'p') analyze
317 : strike_any_key ;;
318 : 'l') toggle_logging ;;
319 : 'q') break ;;
320 : *) continue ;;
321 : esac
322 : done
323 : clear
324 : exit 0
The first part of this extract, lines 283 to 292, checks whether
help is to be printed, or the script is to be run in batch
mode: if the answer to the latter question is yes, a function called
analyze is called and the script exits without presenting
a menu. Then we see the mainloop, from line 296 to line 322. Its test
is the null command :, which always succeeds, so the body of the
loop is repeated until a break statement (here, the one on line 319,
selected by typing q) terminates it.
Within the loop, a menu is printed and then the script waits for the user to press a key. The character that is read is used to select a branch of a case statement (lines 313 to 321) that either modifies the state of some variables, or calls a function (like analyze, which does the analysis work; get_file, which displays a submenu for selecting a file to work on; or strike_any_key, which prints a message like ``Press any key to continue'').
Note the use of reverse video in the menu to emphasize important information. In general, you should try to make menu driven interfaces guide the user through to the next step in an intuitive and natural manner. One way of doing this is to highlight the important default information (like the file to be processed), in close proximity to the option that changes it (like the option to select a file to analyze).
Also worth noting is the use of ``toggle'' variables, that switch an additional feature on or off. The variables $log and $next_log_state perform this function for logging. They are switched within a separate function, toggle_logging:
83 : toggle_logging ()
84 : {
85 : log=$next_log_state
86 : case $log in
87 : ON|on) next_log_state=OFF ;;
88 : OFF|off) next_log_state=ON ;;
89 : esac
90 : }
log indicates whether output is to be logged to a file;
next_log_state is used in a message display that tells the
user whether they can switch logging on or off. (By definition,
next_log_state and log must be in opposite
states at all times.)
It is very easy for a mainloop to become too big to read. For this reason, any task that has more than one step is farmed out to another function. This includes the display of submenus. For example, get_file uses a menu to select a file to check:
145 : get_file()
146 : {
147 : while :
148 : do
149 : echo $CLS
150 : echo "
151 :
152 : ${HILITE}Select a file${NORMAL}
153 :
154 : Current file is: [${HILITE} $fname ${NORMAL}]
155 :
156 : Type the letter corresponding to your current task:
157 :
158 : [space] Enter a filename or pattern to use
159 : l List the current directory
160 : c Change current directory
161 : q quit back to main menu
162 :
163 :
164 : =======>"
165 : getc char
166 : case $char in
167 : ' ') get_fname ;;
168 : 'l') ls | ${PAGER:-more} ;;
169 : 'c') change_dir ;;
170 : 'q') break ;;
171 : *) ;;
172 : esac
173 : strike_any_key
174 : done
175 : }
This function contains a couple of features that do not appear in
the mainloop. Notably, it calls a routine for changing directory, a
routine for getting a filename, and lists the contents of a
directory (using the pager indicated by the environment variable
PAGER, or more if PAGER is not set).
An assignment of the form value=$newvalue assigns the value of newvalue to value. But there are times when we want to provide a default, in case $newvalue is bogus (for example, if the user accidentally pressed <Enter> instead of entering a name). An assignment of the form variable=${value:-default} assigns the value of value to variable if value is set and not empty; otherwise it assigns default to variable. In the example above, ${PAGER:-more} is expanded to the value of $PAGER or, if PAGER is unset or empty, to more.
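The colon matters. Without it (${variable-default}), the default is used only if the variable is unset; with it, the default is also used if the variable is set but empty, which is what happens when the user just presses <Enter>. A small sketch, in which the function and variable names are hypothetical:

```shell
# show_defaults (hypothetical): compare ${var:-word} with ${var-word}
show_defaults()
{
    unset unset_var
    empty_var=""
    set_var=less
    echo "1: ${unset_var:-more}"    # unset: default used
    echo "2: ${empty_var:-more}"    # empty: default used with :-
    echo "3: [${empty_var-more}]"   # empty: kept as empty without the colon
    echo "4: ${set_var:-more}"      # set and non-empty: value used
}
```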
For example, here is get_fname:
94 : get_fname ()
95 : {
96 : echo "Enter a filename: \c"
97 : read newfname
98 : fname=${newfname:-${fname}}
99 : }
At the beginning of the script (we have not yet looked at this in
detail) fname is set to `` '' (a space character). If the user just
presses <Enter>, newfname is empty, so fname keeps its previous
value: initially, `` ''.
There are other uses for this mechanism. For example:
117 : newdir=${newdir:-`pwd`}
This line sets newdir (the directory to change to) to the
newly entered directory, or (if nothing is specified) to the current
working directory.
Variations exist on the default behavior for a variable assignment. Some of the most common variable substitutions you can use are as follows:
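Alongside ${variable:-default}, the Bourne and Korn shells provide three related forms: ${variable:=default} also assigns the default to the variable; ${variable:+alternate} substitutes alternate only if the variable is set and not empty; and ${variable:?message} prints message and, in a non-interactive shell such as a script, exits if the variable is unset or empty. The following hypothetical sketch exercises each, running the :? case in a subshell so that only the subshell exits.

```shell
# subst_demo (hypothetical): the :=, :+ and :? substitutions
subst_demo()
{
    unset editor
    echo "a: ${editor:=vi}"         # editor was unset: vi is assigned
    echo "b: $editor"               # ... and the assignment persists
    echo "c: ${editor:+set}"        # editor is non-empty: "set" is used
    unset editor
    echo "d: [${editor:+set}]"      # editor is unset: nothing is used
    ( : ${editor:?"editor is not set"} ) 2>/dev/null ||
        echo "e: the subshell stopped"
}
```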
Good shell programming technique relies on an understanding of the desired goal and the ability to write clear, easily debugged scripts, but you can also add efficiency through awareness of a few simple rules of thumb.
An effective redesign of an
existing procedure improves its efficiency by reducing its size, and
often increases its comprehensibility.
In any case, you should not
worry about optimizing shell procedures unless they are intolerably
slow or are known to consume an inordinate amount of a system's
resources. Your time, as the programmer, is almost certainly more
expensive than the computer's.
How programs perform
A general rule of thumb, borne out by long experience, is that in a
typical program the computer spends about 90% of its time processing
about 10% of the code. A second is that as programs age
and are maintained, the changes introduced to them tend to add
complexity to the original structure and reduce their efficiency. In
this section, we'll look at program performance and means of
improving it.
The flow of control within a program is determined by two types of construct: the loop construct and the branch construct. In batch programs such as filters, these are used together, so that the program does something like this:
# generic filter program
#
read command line arguments
using getopts, for each flag {
set a variable
}
open input and output files
while (input != FALSE) {
read in some data
do something with it
write it to the output file
if an error occurred, exit with a message
}
close input and output files
exit
The first action taken by this generic program is to check its
command line for flags. Using a loop, it reads through each argument
in turn and sets up any internal variables it needs. This loop is
only used by the program when it starts up; for this reason it is
called initialization code.
Having ``parsed'' its arguments, the program now opens its data files. An input and an output file are the lowest common denominator; some programs open several files each for input and output, but this is a simple, generic example. Again, opening the files is only carried out once. Note that in a real program each attempt to open a file will be enclosed in an if construct that checks for errors; if the attempt fails, the else part of the if construct usually causes the program to exit with an error message.
The program now enters a loop, reading data from the input file, doing something to it, and writing it to the output file, while the input is available. (By convention, if an operation succeeds it usually returns a value of 0.) This is the meat of the program; it is where the activity for which the program was written takes place, and it is repeated a number of times proportional to the amount of data in the input files.
When the program can no longer read any more input, it exits the main loop and executes the termination code of the program. Termination code is used to tidy up after the main loop; to close open files and write a final message to the output. (The command wc, which counts words, uses its termination code to print out a final sum of all the words it counted in its main loop.) This section of the program, like the initialization code, is only executed once.
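The same three-part structure can be seen in even a very small shell filter. In this hypothetical sketch, the initialization code sets a counter, the main loop runs once per input line, and the termination code prints a total. (read strips leading blanks and interprets backslashes; the sketch ignores those details.)

```shell
# number_lines (hypothetical): copy standard input to standard output,
# numbering each line, then report how many lines were read
number_lines()
{
    count=0                        # initialization: runs once
    while read line                # main loop: once per input line
    do
        count=`expr $count + 1`
        echo "$count: $line"
    done
    echo "total: $count lines"     # termination: runs once
}
```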
This program structure is not universal, but it is sufficiently
common to be worth using as a model to demonstrate how to tune your
programs, and it accounts for the vast majority of shell scripts and
non-interactive filters. While shell scripts rarely open data files
and process them directly, they frequently invoke other programs
which do just that; consequently, the same general techniques for
improving performance are applicable to them.
How to control program performance
As mentioned earlier, in a typical shell script about 90% of the
computational load is imposed by about 10% of the script. The
bottlenecks to look out for are as follows:
The standard development cycle, which should be applied to shell procedures as to other programs, is to write code, get it working, thoroughly test it, measure it, and optimize the important parts (outlined above), looping back to earlier stages wherever necessary. The time(C) command is a useful tool for optimizing shell scripts. time is used to establish how long a command took to execute:
$ time ls
real 0m0.06s
user 0m0.03s
sys 0m0.03s
The values reported by time are the elapsed time during
the command (the real time); the time the system took to execute the
system calls within the command (the ``sys'' time); and the time
spent processing the command itself (the user time). In practice,
only the first value, the real time, is relevant at this
level. Note that this is the output from the Korn shell's built-in
time command; the Bourne shell output may vary. (If you
have the Development System, the
timex(ADM)
command offers additional facilities.)
Because the SCO OpenServer system is multi-tasking, it is impossible to accurately judge how long a program is taking to run by any other means; a seemingly slow process may be the result of an unusually heavy load being placed on the computer by some other user or process. Each timing test should be run several times, because the results are easily disturbed by variations in system load.
A useful technique is to encapsulate the body of a loop within a
function, so that the sole activity within the loop is to call
that function; you can then time the function, and time
the loop as a whole. Alternatively, you can time individual steps
in the process to see which of them are taking longest.
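The following is a minimal sketch of the first technique, assuming a loop that reads a file line by line. The function names count_words and count_file, and the per-line work they do (counting words), are hypothetical; only the structure matters.

```shell
# count_words: the body of the loop, wrapped in a function so that
# it can be timed on its own (hypothetical per-line work: print the
# number of words in the line passed as $1)
count_words ()
{
    set -- $1      # split the line into words (note: also expands wildcards)
    echo $#
}

# count_file: the loop itself; its only activity is to call count_words
count_file ()
{
    while read line
    do
        count_words "$line"
    done < "$1"
}
```

In the Korn shell you could then time a single iteration with time count_words "one two three", and the whole loop with time count_file /etc/passwd >/dev/null; comparing the two shows how much of the cost lies in the loop body.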
Number of processes generated
When you execute large numbers of short commands, the actual
execution time of the commands might be dominated by the overhead of
creating processes. The procedures that incur significant amounts of
such overhead are those that perform much looping, and those that
generate command sequences to be interpreted by another shell.
If you are worried about efficiency, it is important to know which commands are currently built into the shell, and which are not. Here is an alphabetical list of those that are built in to the Korn shell and Bourne shell (select is Korn shell only):
break case cd continue echo
eval exec exit export for
if read readonly return select
set shift test times trap
umask until wait while .
: {}
Note that echo and test also exist as external
programs. Some commands that are normally external programs have
also been built into these shells, but they are nonstandard: a
script that relies on them will run more slowly on systems whose
shells do not build them in.
Parentheses, (), are built into the shell, but commands enclosed within them are executed as a child process; that is, the shell does a fork, but no exec. Any command not in the above list requires both fork and exec. The disadvantage of this is that when another process is execed it is necessary to perform a disk I/O request to load the new program. Even if the program is already in the buffer cache (an area of memory used by the system to store frequently accessed parts of the filesystem for rapid retrieval) this will increase the overhead of the shell script.
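One visible consequence of the child process created by parentheses is that a cd inside them does not affect the calling shell, whereas braces, {}, group commands in the current shell without a fork. A small sketch; the function names are hypothetical:

```shell
# ( ... ) runs its commands in a forked child shell: the cd affects
# only the child, so the caller's working directory is unchanged.
# ls is external, so it is forked and execed in both functions;
# the parentheses add one extra fork for the child shell.
list_in_subshell ()
{
    ( cd "$1" && ls )
}

# { ...; } runs its commands in the current shell: no extra fork,
# but the cd changes the caller's working directory as well
list_in_group ()
{
    { cd "$1" && ls; }
}
```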
You should always have at least a vague idea of the number of
processes generated by a shell procedure. In the bulk of observed
procedures, the number of processes created (not necessarily
simultaneously) can be described by the following:
processes = (k*n) + c
where k and c are constants for any given script, and n can be the number of procedure arguments, the number of lines in some input file, the number of entries in some directory, or some other obvious quantity. Efficiency improvements are most commonly gained by reducing the value of k, sometimes to zero. Any procedure whose complexity measure includes n squared terms or higher powers of n is likely to be intolerably expensive.
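As an illustration of reducing k, the two hypothetical functions below both add up their arguments. The first runs expr once per number, so it creates at least one process for each of the n arguments (k is at least 1); the second passes all the numbers to a single awk process, so the number of processes does not grow with n (k = 0).

```shell
# k >= 1: a command substitution and an expr process for every number
sum_slow ()
{
    total=0
    for n in "$@"
    do
        total=`expr $total + $n`
    done
    echo $total
}

# k = 0: one awk process handles all n numbers
sum_fast ()
{
    echo "$@" | awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s + 0 }'
}
```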
As an example, here is an analysis of a procedure named file2lower, whose text is as follows:
#!/bin/ksh
#
# file2lower -- renames files in parameter list to
# all-lowercase names if appropriate
#
PATH=/bin:/usr/bin
for oldname in "$@"
do
newname=`echo "$oldname" | tr "[A-Z]" "[a-z]"`
if [ "$newname" != "$oldname" ]
then
if [ ! -d "$oldname" ]
then
mv "$oldname" "$newname"
print "Renamed $oldname to $newname"
else
print "Error: $oldname is a directory" >&2
fi
fi
done
This shell script checks all the names in its parameter list; if a
name contains uppercase letters and does not refer to a directory,
the file is renamed to its lowercase equivalent. This is useful when
copying files from a DOS filesystem, because files imported from DOS
have all-uppercase names.
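For each iteration of the main do loop, there is at least one if statement. In the worst case, there are two ifs, an mv and a print. The ifs and print are built into the shell, but mv is not, and neither is tr, which runs in a pipeline inside the command substitution that computes newname; the command substitution itself is also run by a forked copy of the shell. Depending on how the shell implements that pipeline, each iteration therefore creates up to about four processes, so if n is the number of files named by the parameter list, the number of processes tends towards (4*n)+0. (The c term of the equation given above is applicable to commands executed once before and after the loop.)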
Some types of procedures should not be written using the
shell. For example, if one or more processes are generated for each
character in some file, it is a good indication that the procedure
should be rewritten in C or awk. Shell
procedures should not be used to scan or build files a character at
a time.
Number of data bytes accessed
It is worth considering any action that reduces the number of bytes
read or written. This might be important for those procedures whose
time is spent passing data around among a few processes, rather than
in creating large numbers of short processes. Some filters shrink
their output, others usually increase it. It always pays to put the
shrinkers first when the order is irrelevant. For
instance, the second of the following examples is likely to be
faster because the input to sort will be much smaller:
sort file | grep pattern
grep pattern file | sort
In addition, the performance of some programs degrades significantly
as their input files increase in size. Any complex sorting or
comparison operation (using sort or diff)
usually takes significantly longer to perform on a single large file
than on two smaller files containing the same amount of
information. This degradation is an unavoidable consequence of the
nature of the problem these programs are dealing with and can rarely
be worked around, although it is not significant when working with
short files.
Shortening directory searches
Directory searching consumes a lot of time, especially in
those applications that utilize deep directory structures and long
pathnames. Judicious use of cd, the change
directory command, can help shorten long pathnames and thus
reduce the number of directory searches needed. For example, try
the following commands:
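time ls -l /usr/bin/* >/dev/null
cd /usr/bin; time ls -l * >/dev/null
The second ls runs faster because it needs fewer directory searches:
each filename is looked up directly in the current directory instead
of being resolved through /, /usr, and /usr/bin in turn. (Note that
time applies only to the command that follows it, so typing
time cd /usr/bin; ls -l * would time only the cd.)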
Directory-search order and the PATH variable
The PATH variable is a convenient mechanism for allowing
organization and sharing of procedures. However, it must be used in
a sensible fashion, or the result might be a great increase in
system overhead.
The process of finding a command involves reading every directory included in every pathname that precedes the needed pathname in the current PATH variable. As an example, consider the effect of invoking nroff (that is, /usr/bin/nroff) when the value of PATH is :/bin:/usr/bin. The sequence of directories read is as follows:
.
/
/bin
/
/usr
/usr/bin
A long path list assigned to PATH can increase this number significantly.
The vast majority of command executions are of commands found in /bin and in /usr/bin. Careless PATH setup can lead to unnecessary searching. The following three examples are ordered from worst to best with respect to the efficiency of command searches:
:/usr/john/bin:/usr/local/bin:/bin:/usr/bin
:/bin:/usr/john/bin:/usr/local/bin:/usr/bin
:/bin:/usr/bin:/usr/john/bin:/usr/local/bin
The first one above should be avoided. The others are acceptable and the choice among them is dictated by the rate of change in the set of commands kept in /bin and /usr/bin.
A procedure that is expensive because it invokes many short-lived
commands can often be speeded up by setting the PATH
variable inside the procedure so that the fewest possible
directories are searched in an optimum order.
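To see which PATH entries are tried before a command is found, you might use a function like the following. It is a sketch with a hypothetical name, path_search. It prints the PATH entries in the order the shell tries them (not the intermediate directories, such as / and /usr, that are also read) and stops at the first entry containing an executable file of the given name. An optional second argument supplies a path list to test in place of $PATH.

```shell
# path_search: print, in order, each PATH entry tried when looking
# for command $1, stopping at the directory where it is found.
# $2 optionally supplies a path list to use instead of $PATH.
path_search ()
{
    cmd=$1
    list=${2-$PATH}
    oldifs=$IFS
    IFS=:
    set -- $list            # split the list at each colon
    IFS=$oldifs
    for dir in "$@"
    do
        dir=${dir:-.}       # an empty entry means the current directory
        echo "$dir"         # (a trailing empty entry is not detected here)
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]
        then
            return 0        # found: later entries are never searched
        fi
    done
    return 1                # not found anywhere in the list
}
```

For example, with PATH set to :/bin:/usr/bin, path_search nroff prints . and /bin before reaching /usr/bin, assuming nroff is in neither of the first two.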
Recommended ways to set up directories
It is wise to avoid directories that are larger than necessary, for
the same reason that you should avoid large files; directories are a
special type of file, and when a directory grows too large any
process that searches it becomes slower.
You should be aware of several special sizes. A directory that contains entries for up to 62 files (plus the required . and ..) fits in a single disk block and can be searched very efficiently. A directory can have up to 638 entries and still be viable, as long as it is used only for data storage; anything larger is usually a disaster when used as a working directory. The figures 62 and 638 apply to filenames of 14 characters or less. As filename lengths increase, up to a maximum of 255 characters, the number of files that fit on a single disk block decreases, thus reducing the optimum number of files in a directory.
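The two figures are consistent with the traditional System V filesystem layout. Assuming 1024-byte blocks, 16-byte directory entries (a 2-byte inode number followed by a 14-character name) and ten direct block addresses in each inode (assumptions, not figures given in this guide), the arithmetic is:

```latex
\begin{aligned}
\text{entries per block} &= \frac{1024\ \text{bytes}}{16\ \text{bytes}} = 64\\
\text{one block, less . and ..:}\quad 64 - 2 &= 62\\
\text{ten direct blocks, less . and ..:}\quad 10 \times 64 - 2 &= 638
\end{aligned}
```

Under the same assumptions, a directory with more than 638 entries needs an indirect block, so reading it costs at least one extra disk access.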
It is especially important to keep login directories small,
preferably one block at most. Note that, as a rule, directories
never shrink. This is very important to understand, because if your
directory ever exceeds either the 62 or 638 thresholds, searches
will be inefficient; furthermore, even if you delete files so that
the number of files is less than either threshold, the directory
keeps its larger size and searches remain inefficient.
Putting everything together
We have covered most of the shell-specific elements of a style
analysis program, except for two components: the global constants
set up at the top of the file, and the function analyze,
which reports on the readability indices of a file. Here is a
complete listing of the program. (See below for a commentary on the
features that have not yet been covered.)
1 : #-----------------------------------------------------
2 : #
3 : # rap -- Readability Analysis Program
4 : #
5 : # Purpose: provide readability analysis of texts to:
6 : # Kincaid formula, ARI, Coleman-Liau Formula, Flesch
7 : # Reading Ease Score. Also word count, sentence length,
8 : # word length.
9 : #
10 : # Note that rap is _not_ as functional as style(CT),
11 : # which is dictionary-driven; this is the outcome of
12 : # a deliberate attempt to keep everything in a single
13 : # shell script.
14 : #
15 : #------------- define program constants here ----------
16 : #
17 : DEBUG=${DEBUG:-true}
18 : CLS=`tput clear`
19 : HILITE=`tput smso`
20 : NORMAL=`tput rmso`
21 : #
22 : #----- define the lexical structure of a sentence -----
23 : #
24 : # a `word' primitive is any sequence of letters and digits.
25 : #
26 : WORD='[A-Za-z0-9]+'
27 : #
28 : # whitespace is what goes between real words in a sentence;
29 : # it includes carriage returns so sentences can cross line
30 : # boundaries.
31 : #
32 : WHITESPACE="[[:space:]]"
33 : #
34 : # an initial -- one or two letters followed by a period --
35 : # is defined so we can tell that it is not a short sentence.
36 : # (Otherwise Ph.D. would be counted as two sentences.)
37 : #
38 : INITIAL="($WHITESPACE|.)(([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9]).)"
39 : #
40 : # syllabic consonants; consonants including letter pairs:
41 : #
42 : CONS="[bcdfghjklmnpqrstvwxyz]|ll|ght|qu|([wstgpc]h)|sch"
43 : #
44 : # syllabic vowels; include the ly suffix
45 : #
46 : VOWL="[aeiou]+|ly"
47 : #
48 : # definition of a syllable (after Webster's Collegiate Dictionary)
49 : #
50 : SYL="(${CONS})*\
51 : ((${CONS})|((${VOWL})+))\
52 : (${CONS})*"
53 : #
54 : # Finally, a sentence consists of (optionally) repeated
55 : # sequences of one word followed by zero or more
56 : # whitespaces, terminated by a period.
57 : #
58 : SENT="($WORD($WHITESPACE)*)+."
59 : #
60 : #---------- initialize some local variables -----------
61 : #
62 : SCRIPT=$0
63 : help='no' ; verbose=' ' ; record=' '
64 : next_log_state='ON'; log='OFF' ; batch=' '
65 : file=' ' ; fname=' ' ; LOGFILE=$$.log
66 : #
67 : #--------------- define program traps here ------------
68 : #
69 : trap "strike_any_key" 1 2 3 15
70 : #
71 : #----------------- useful subroutines -----------------
72 : #
73 : getc ()
74 : {
75 : stty raw
76 : tmp=`dd bs=1 count=1 2>/dev/null`
77 : eval $1='$tmp'
78 : stty cooked
79 : }
80 : #
81 : #-----------------------------------------------------
82 : #
83 : toggle_logging ()
84 : {
85 : log=$next_log_state
86 : case $log in
87 : ON) next_log_state=OFF ;;
88 : OFF) next_log_state=ON ;;
89 : esac
90 : }
91 : #
92 : #-----------------------------------------------------
93 : #
94 : get_fname ()
95 : {
96 : echo "Enter a filename: \c"
97 : read newfname
98 : fname=${newfname:-${fname}}
99 : }
100 : #
101 : #------------------------------------------------------
102 : #
103 : strike_any_key()
104 : {
105 : echo '
106 : strike any key to continue ...\c'
107 : getc junk
108 : echo $CLS
109 : }
110 : #
111 : #-----------------------------------------------------
112 : #
113 : change_dir ()
114 : {
115 : echo "Enter a directory: \c"
116 : read newdir
117 : newdir=${newdir:-`pwd`}
118 : cd $newdir
119 : echo "Directory set to: $newdir"
120 : }
121 : #
122 : #-----------------------------------------------------
123 : #
124 : _help()
125 : {
126 : echo "
127 :
128 : Readability Analysis Program
129 :
130 : A shell/awk demo to determine the readability grade of texts
131 :
132 : Usage:
133 :
134 : Either invoke with no options for full menu-driven
135 : activity, or use the following flags:
136 :
137 : -[h|H] prints this help
138 : -l cause output to be logged to a file
139 : -f file enter the name of the file to check
140 : -b run in batch mode (no menus)
141 : "
142 : }
143 : #
144 : #---------- define the menu handler functions here ----
145 : get_file()
146 : {
147 : while :
148 : do
149 : echo $CLS
150 : echo "
151 :
152 : ${HILITE}Select a file${NORMAL}
153 :
154 : Current file is: [${HILITE} $fname ${NORMAL}]
155 :
156 : Type the letter corresponding to your current task:
157 :
158 : [space] Enter a filename or pattern to use
159 : l List the current directory
160 : c Change current directory
161 : q quit back to main menu
162 :
163 :
164 : =======>\c"
165 : getc char
166 : case $char in
167 : ' ') get_fname ;;
168 : 'l') ls | ${PAGER:-more} ;;
169 : 'c') change_dir ;;
170 : 'q') break ;;
172 : esac
173 : strike_any_key
174 : done
175 : }
176 : #
177 : #------------------------------------------------------
178 : #
179 : analyze()
180 : {
181 : if [ "$fname" = " " ]
182 : then
183 : echo "
184 :
185 : You must specify a filename first
186 : "
187 : strike_any_key
188 : return 1
189 : fi
190 : wordcount=`wc -w < $fname`
191 : lines=`wc -l < $fname`
192 : nonwhitespace=`sed -e "/${WHITESPACE}/s///g" < $fname | wc -c`
193 : sentences=`awk -e ' BEGIN { sentences = 0
194 : target = ""
195 : marker = "+X+"
196 : }
197 : { target = target " " $0
198 : initials = gsub(init, "", target)
199 : hit = gsub(sent, marker, target)
200 : sentences += hit
201 : if (hit != 0) {
202 : for (i= 0; i < hit; i++) {
203 : found = index(target, marker)
204 : target = substr(target, found+3)
205 : } # end for
206 : } # end if
207 : hit = 0
208 : }
209 : END { print sentences }
210 : ' sent="$SENT" init="$INITIAL" < $fname`
211 : letters=`expr $nonwhitespace - $lines`
212 : sylcount=`awk -e ' BEGIN { sylcount = 0 }
213 : { target = $0
214 : sylcount += gsub(syllable, "*", target)
215 : }
216 : END { print sylcount }
217 : ' syllable="$SYL" < $fname`
218 : echo "
219 :
220 : Number of words: $wordcount
221 : Number of syllables: $sylcount
222 : Number of sentences: $sentences
223 :
224 : "
225 : export letters wordcount sentences sylcount
226 : ARI=`bc << %%
227 : scale = 2; l = ($letters / $wordcount)
228 : w = ($wordcount / $sentences)
229 : 4.71 * l +0.5 * w -21.43
230 : %%
231 : `
232 : Kincaid=`bc << %%
233 : scale = 2; w = ($wordcount / $sentences)
234 : s = ($sylcount / $wordcount)
235 : 11.8 * s + 0.39 * w - 15.59
236 : %%
237 : `
238 : CLF=`bc << %%
239 : scale = 2; l = ($letters / $wordcount)
240 : s = ($sentences / ($wordcount / 100))
241 : 5.89 * l - 0.3 * s - 15.8
242 : %%
243 : `
244 : Flesch=`bc << %%
245 : scale = 2; w = ($wordcount / $sentences)
246 : s = ($sylcount / $wordcount)
247 : 206.835 - 84.6 * s - 1.015 * w
248 : %%
249 : `
250 : if [ "$log" = "ON" ]
251 : then
252 : echo "
253 : ARI = $ARI
254 : Kincaid= $Kincaid
255 : Coleman-Liau = $CLF
256 : Flesch Reading Ease = $Flesch" > $LOGFILE
257 : fi
258 : echo "ARI = $ARI
259 : Kincaid= $Kincaid
260 : Coleman-Liau = $CLF
261 : Flesch Reading Ease = $Flesch" > /dev/tty
262 : }
263 : #
264 : #=========== THIS IS WHERE THE PROGRAM BEGINS =========
265 : #
266 : #
267 : #---------- parse the command line---------------------
268 : #
269 : while getopts hHvlbf: result
270 : do
271 : case $result in
272 : h|H) help="yes" ;;
273 : v) verbose="yes" ;;
274 : l) record="yes"
275 : next_log_state=OFF
276 : log=ON ;;
277 : b) batch="yes" ;;
278 : f) file="yes"
279 : fname=${OPTARG:-" "} ;;
280 : *) help="yes" ;;
281 : esac
282 : done
283 : if [ "$help" = "yes" ]
284 : then
285 : _help
286 : exit 1
287 : fi
288 : if [ "$batch" = "yes" ]
289 : then
290 : analyze
291 : exit 0
292 : fi
293 : #
294 : #---------- enter the mainloop ------------------------
295 : #
296 : while :
297 : do
298 : echo $CLS
299 : echo "
300 :
301 : ${HILITE}Readability Analysis Program${NORMAL}
302 :
303 : Type the letter corresponding to your current task:
304 :
305 : f Select files to analyze [now ${HILITE}$fname${NORMAL} ]
306 : p Perform analyses
307 : l switch ${next_log_state} report logging [now ${HILITE}$log${NORMAL}]
308 : q quit program
309 :
310 :
311 : =======>\c"
312 : getc char
313 : case $char in
314 : 'f') getloop=1
315 : get_file ;;
316 : 'p') analyze
317 : strike_any_key ;;
318 : 'l') toggle_logging ;;
319 : 'q') break ;;
320 : *) continue ;;
321 : esac
322 : done
323 : clear
324 : exit 0
The variable definitions from lines 17 to 65 set up some constants
for screen clearing and highlighting, initialize variables for use
in the script, and define some extended regular expressions, as
explained in
Chapter 12, ``Regular expressions'',
that are used later to scan the target file for initials, sentences,
and syllables. The mechanism used to conduct the scan is a pair of
scripts written in the awk programming language (explained
in
Chapter 13, ``Using awk'')
that identify the number of sentences in a file, and the number of
syllables in the file. These scripts lie between lines 190 and 217;
they are explained in detail in
``Spanning multiple lines''.
Readability analysis
Four different readability statistics are calculated within
analyze. Readability statistics assess variables including
the average number of words per sentence, average length of
words, number of syllables per word, and so on, to derive a
formulaic estimate of the ``readability'' of the text. They do not
take into account less quantifiable elements such as semantic
content, grammatical correctness, or meaning. Thus, there is no
guarantee that a text that a readability test identifies as easy to
understand actually is readable. However, in practice it has been
found that real documents that the tests identify as ``easy to
read'' are likely to be easier to comprehend at a structural level.
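The four test formulae used in the analyze function are as follows, where l is the average number of letters per word, w is the average number of words per sentence, and s is the average number of syllables per word (except in the Coleman-Liau formula, where s is the number of sentences per hundred words):
ARI = 4.71 * l + 0.5 * w - 21.43
Kincaid = 11.8 * s + 0.39 * w - 15.59
Coleman-Liau = 5.89 * l - 0.3 * s - 15.8
Flesch Reading Ease = 206.835 - 84.6 * s - 1.015 * w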
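The numbers of words and lines in the target file are counted using wc, which can also count characters; for example:
File rap-bat.wc contains: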
243 words
95 lines
1768 characters
Sentences are counted using a custom awk script, explained
in
``Spanning multiple lines''.
Then the number of letters is established (by subtracting the white
space from the file and counting the number of characters), and the
number of syllables is estimated using another awk
script. Finally, these values are fed into four calculations that
make use of bc, the SCO OpenServer binary calculator.
bc is a simple programming language for calculations; it recognizes a syntax similar to C or awk, and can use variables and functions. It is fully described in bc(C), and is used here because, unlike expr or the Korn shell's built-in arithmetic, it can handle numbers with a fractional part. Note that the number of decimal places bc keeps in the result of a division is set by its scale variable, which is 0 by default; unless scale is set before dividing, every quotient is truncated to an integer. Because bc reads commands from its standard input, the basic readability variables are substituted into a here-document which is fed to bc, and the output is captured in a shell variable. For example:
244 : Flesch=`bc << %%
245 : scale = 2; w = ($wordcount / $sentences)
246 : s = ($sylcount / $wordcount)
247 : 206.835 - 84.6 * s - 1.015 * w
248 : %%
249 : `
analyze also prints the output from the tests, in the following format:
ARI = -10.43
Kincaid= -7.01
Coleman-Liau = -17.00
Flesch Reading Ease = 184.505
Depending on the setting of $log (the variable that controls file logging) the output is printed to the terminal, or to both the terminal and a logfile (the name of which is set by the variable $LOGFILE).
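To check the arithmetic performed by the bc here-documents, the Flesch formula can be evaluated directly. This sketch swaps in awk, which always uses floating point and so needs no scale setting; the function name flesch is hypothetical, and its three arguments are the word, sentence, and syllable counts.

```shell
# flesch: print the Flesch Reading Ease score for the given counts,
# using the same formula as the analyze function:
#   206.835 - 84.6 * (syllables/word) - 1.015 * (words/sentence)
flesch ()
{
    awk -v words="$1" -v sentences="$2" -v syllables="$3" 'BEGIN {
        w = words / sentences       # average words per sentence
        s = syllables / words       # average syllables per word
        printf "%.2f\n", 206.835 - 84.6 * s - 1.015 * w
    }'
}
```

For example, flesch 100 4 130 (25 words per sentence, 1.3 syllables per word) prints 71.48.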
If you want to customize the script for your own purposes, the place
to start is in the callback functions. Strip out the existing
functions, and replace them with your own: then change the
here-document that displays the opening menu. If you change the keys
that trigger the callback functions, remember to modify the
case statement below the menu. You can add as many extra
callbacks as you like to the menu, but it is a good idea not to
provide too many options on any one screen: remember that your users
can become confused if confronted with too many choices or too much
information.
Other useful examples
This section gives examples of some other useful procedures for
automating tasks. All the scripts and sections listed below are
intended to run under the Korn shell; you may have to modify them if
you want to use the Bourne shell.
Mail tools
The following tools are used for manipulating mail folders and
sending large files through mail.
Consider the following script:
cnt=`grep '^A^A^A^A' $1 |wc -l`
print $(( cntot = cnt / 2 ))
(In the script, each ^A stands for a literal <Ctrl>A character.) MMDF stores messages in a folder as continuous ASCII text, delimited at top and bottom by a line containing four <Ctrl>A characters. This script searches for the message delimiters and sets cnt to the number of lines containing delimiters. It then uses the Korn shell arithmetic facility to divide this total by two (because there are twice as many delimiters as messages). Thus, this script prints the number of messages in an MMDF mail folder.
It is not appropriate to use this script on a XENIX-format mail folder.
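A variant of the message-counting script is sketched below (the function name count_mmdf is hypothetical). It generates the four <Ctrl>A characters with printf instead of typing them literally, and uses grep -c to count the matching lines, which saves the wc process; like the original, it assumes an MMDF-format folder.

```shell
# count_mmdf: print the number of messages in the MMDF folder $1.
# Each message is delimited at top and bottom by a line of four
# <Ctrl>A characters, so the count of delimiter lines is halved.
count_mmdf ()
{
    delim=$(printf '\001\001\001\001')   # four literal <Ctrl>A characters
    cnt=$(grep -c "^$delim" "$1")        # -c counts matching lines itself
    echo $((cnt / 2))
}
```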
The following short script searches the files named by its positional parameters for lines beginning with the string ``Subject:''.
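grep "^Subject:" "$@" | cut -c9-71
Mail headers consist of a series of lines beginning with keywords, like this:
From:
To:
Subject:
Date:
Organization:
Sender:
Reply-To:
Message-Id:
X-Mailer:
Status:
The subject lines are printed through a pipe to cut, which chops out and prints only character positions 9 through 71 on each line (thus removing the string ``Subject:'' and truncating long lines). If more than one file is named, grep prefixes each matching line with the name of the file it came from, which shifts the columns that cut selects; piping the files to grep through cat (cat "$@" | grep "^Subject:" | cut -c9-71) avoids this.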
Note that this script makes no allowances for mail messages that contain other (quoted) messages without indentation. To do this, it would be necessary to write a longer script. (Hint: The end of a mail message is indicated by two lines containing four <Ctrl>A characters each. Valid mail messages can have only one ``Subject:'' line. A better script would search for the first occurrence of a ``Subject'' line following a sequence of ``^A^A^A^A''.) Note also that the ``Subject:'' line is not mandatory, so this script will miss messages that lack a subject line altogether.
The following script, chunk, sends large files through mail. Note that the line numbers in this example are not part of the script, but are provided for clarity:
1 : #! /bin/ksh
2 : #
3 : #----- blocksize*80 is the maximum size of each chunk created
4 : #
5 : blocksize=512
6 : #
7 : #----- perform sanity checks on input
8 : #
9 : case $# in
10 : 2) : break
11 : ;;
12 : *) echo "
13 :
14 : $0 <user> <file>
15 :
16 : compress, uuencode, split into $blocksize line chunks and mail
17 : <file> to <user>.
18 :
19 : This script is used to send large files (greater than
20 : 32KB) via email. <user> must be a valid mail address.
21 : On completion, chunk sends a status report to <user>
22 : via email.
23 : "
24 :
25 : exit 2
26 : ;;
27 : esac
28 : #
29 : #--------- test for a valid file -----------
30 : #
31 : target=$2
32 : user=$1
33 :
34 : [ -s "$target" -a -r "$target" ] || {
35 : print -- Missing, empty or not readable: $target >&2
36 : exit 1
37 : }
38 : #
39 : # -------- end of sanity checks ------------
40 : #
41 : tmpdir=${TMPDIR:-/u/tmp}/$$
42 :
43 : mkdir $tmpdir || exit 1
44 : compress < $target | uuencode $target.Z | (cd $tmpdir; split -$blocksize)
45 : cd $tmpdir
46 : for chunk in *
47 : do
48 : mail -s "section $chunk of $target" $user < $chunk &&
49 : print "Sent section $chunk at $(date)"
50 : done 2>&1 | mail -s "Result of sending $target" $user
51 : cd
52 : rm -rf $tmpdir
This script (called chunk) takes two arguments: a valid mail address and a filename. Because the consequences of proceeding on the basis of a bad argument list could be messy, some checks are carried out (from lines 9 to 37). The case statement on line 9 checks that there are exactly two arguments, and aborts with a usage message if there are not; lines 34 to 37 then abort if the file is missing, empty, or unreadable.
The real work of the script is carried out from lines 41 to 52: target has previously been assigned the name of the file to transmit. The file is compressed and uuencoded, then piped through split, which divides it into sequentially named chunks of blocksize lines that are stored in $tmpdir.
Some mail gateways will not handle messages which are more than some arbitrary size; therefore the exact size of the chunks created by this mailer is defined in a single variable which can be adjusted easily.
A for loop now iterates over each chunk and invokes mail. Because the chunks contain no human readable information, it is vital to incorporate the name of each chunk in the message header.
Finally, a record of the transmission is mailed to the recipient, so that they know what to do with the pieces.
To reassemble a file from its component pieces,
save the pieces (in order) to a file,
edit the file to remove mail headers and blank lines, uudecode the
file, and uncompress it. This method can be used to send
large files through size-restricted mail gateways.
File tools
The following scripts are used for manipulating and returning
information on files.
The following is a script called filesize:
l "$@" | awk ' { s += $5
f = f" "$NF
}
END { print s, "bytes in files:", f} '
The l command (equivalent to ls -l) returns a
long listing, the fifth field of which contains the size of a file
in bytes. This script obtains a long listing of each file in its
argument list, and pipes it through a short awk
script. For each line in its standard input, the script adds the
fifth field of the line to the variable s and appends the
last field (the filename) to the variable f; on reaching
the end of the standard input, it prints s followed by a
brief message and f.
The compress(C) command can compress a batch of files listed as arguments; however, if you run compress in this way only one process is created, and it compresses each file consecutively.
The following code is a script called squeeze:
((jobcount=0)) ; rm -f squish.log
for target in "$@"
do
if (( (jobcount += 1) > 18 ))
then ((niceness = 18 ))
else
((niceness = jobcount ))
fi
((jobcount % 18 != 0)) || sleep 60
nice -${niceness} compress ${target} && print "Finished compressing " \
${target}>> squish.log &
print "Started compressing "${target} "at niceness " \
${niceness} >> squish.log
done
print "finished launching jobs" >> squish.log
A separate compress process is started for each file, and the
processes run concurrently. However, if run on a large directory,
this could overload the system: therefore, squeeze uses
nice(C)
to decrease the priority of processes as the number increases.
The first section of this script keeps track of the niceness (decrement in scheduling priority) with which each compress job is to be started:
if (( (jobcount += 1) > 18 ))
then ((niceness = 18 ))
else
((niceness = jobcount ))
fi
The value of jobcount is incremented every time a new file
compression job is started. If it exceeds 18, then the niceness
value is pegged to 18; otherwise, the niceness is equal to the
number of files processed so far. (nice accepts a maximum
value of 18; this construct places a bounds check on the argument
passed to it.) The parentheses around the assignment are
necessary: assignment has the lowest precedence of the arithmetic
operators, so jobcount+=1 > 18 would add the value of 1 > 18 (that
is, 0) to jobcount and never increment it.
The following line is a special test:
((jobcount % 18 != 0)) || sleep 60
If jobcount is not a multiple of 18 (that is, if there is a nonzero remainder when jobcount is divided by 18) then the first statement evaluates to TRUE and the second statement (separated by the logical OR) is not executed. Conversely, when jobcount is an exact multiple of 18, the first statement is evaluated to ``0 != 0'', which is false. When the first statement fails, the second statement (sleep 60) is executed. Thus, on reaching every eighteenth file, the script sleeps for one minute to allow the earlier compression processes to complete.
The real action of the script is as follows:
nice -${niceness} compress ${target} && print "Finished compressing " \
${target}>> squish.log &
print "Started compressing "${target} "at niceness " \
${niceness} >> squish.log
nice is used to start a compress process for
each target file with the niceness level predetermined by the
counter in the if statement at the top of the program. A
logical AND connective is used to print a message to the
file squish.log when the compression job terminates; the
whole command line is executed as a background job. The shell then
executes the next line, which prints a start message to the logfile,
almost certainly executing it before the compression process has
begun. (This illustrates the asynchronous execution of processes.)
It is well worth examining the logfile left after running squeeze on the contents of a directory. This illustrates how concurrent execution of processes can provide a significant performance improvement over sequential execution, despite the apparent complexity of ensuring that a rapid proliferation of tasks does not bring the system to its knees.
You can adapt squeeze to run just about any simple filter
job in parallel; simply define a function to do the operation you
want, then use it to replace compress.
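As a sketch of that adaptation, the hypothetical function run_batches below runs a job on each of its file arguments in the background. It differs from squeeze in two respects: it uses the shell's wait command, which pauses until all background jobs have finished, in place of sleep 60, and it does not use nice, because nice can only run an external command, not a shell function. to_upper is a hypothetical job that stands in for compress.

```shell
# run_batches: run the function or command named in $1 on each of
# the remaining arguments as a background job, starting at most
# $2 jobs before waiting for the whole batch to finish
run_batches ()
{
    job=$1
    max=$2
    shift 2
    count=0
    for target in "$@"
    do
        $job "$target" &            # start the job asynchronously
        count=$((count + 1))
        if [ $((count % max)) -eq 0 ]
        then
            wait                    # let this batch finish first
        fi
    done
    wait                            # wait for the final, partial batch
}

# hypothetical filter job: write an uppercase copy of file $1 to $1.up
to_upper ()
{
    tr '[:lower:]' '[:upper:]' < "$1" > "$1.up"
}
```

For example, run_batches to_upper 18 *.txt would process the .txt files in the current directory, 18 at a time.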
Useful routines
The following routines are not entire scripts, but may be useful in
context.
It is sometimes necessary to use a shell script that controls access to a shared resource; for example, a file which should only be written by one person at a time. The following skeleton code shows an appropriate wrapper for such a script:
trap "exit 1" 1 2 3 15
#
# trap is vital, otherwise we may loop infinitely
#
LOCKFILE=/tmp/resource.LCK   # hypothetical name; must be the same in every process
OMASK=$(umask)
umask 777
until > $LOCKFILE
do
sleep 1
done 2> /dev/null
umask $OMASK
# now we can write critical data safely, unless root
.
.
.
# finished critical section
rm -f $LOCKFILE
The user's old umask value is saved in OMASK,
and their umask is reset to 777; this means that any files
the user creates will have no read, write or execute permissions.
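LOCKFILE is the name of the lock file; every process that shares the resource must use the same name (a name built from $$, the process ID, would be different in every process and so would lock nothing). While a lock file exists, only the owner of the file should be allowed to operate on the shared data. This is ensured by the until loop: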
until > ${LOCKFILE}
do
sleep 1
done 2> /dev/null
The until loop's test succeeds only when the shell can create the
lock file, which can only happen when no other user of the script
holds the lock: because the lock file is created with no
permissions, nobody else (except root) can open it for writing. If
the test succeeds, the script has created the empty ${LOCKFILE} and
continues; if it fails, the script sleeps for a second and tries
again. Having acquired the lock file, the script resets umask to the
user's original file creation permissions.
Having acquired a lock file, it is now certain that anyone else trying to run the script at the same time will get as far as the loop but no further; it is therefore safe to work on the shared resource, knowing that nobody else is simultaneously using it and might accidentally overwrite the user's changes. After using the shared resource, it is important to delete the lockfile; if the lock file is left behind, nobody will be able to access the shared resource.
This kind of access locking is typically used to control databases
or critical applications where it is unsafe to risk a race condition
(where two processes try to update a shared resource concurrently,
overwriting each other's changes).
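The umask-based lock depends on the lock file being unwritable, which, as the comment in the wrapper notes, does not hold for root. A common alternative technique, swapped in here rather than taken from the wrapper above, is to use a directory as the lock: mkdir fails if the directory already exists, and the test and the creation happen as a single operation, so it works for root as well. The names LOCKDIR, acquire_lock and release_lock are hypothetical.

```shell
LOCKDIR=${LOCKDIR:-/tmp/shared_resource.lock}   # hypothetical name

# acquire_lock: create the lock directory, retrying once a second.
# mkdir fails if the directory already exists, so only one process
# can succeed. $1 is the number of attempts (0 or absent: forever).
acquire_lock ()
{
    tries=${1:-0}
    n=0
    until mkdir "$LOCKDIR" 2>/dev/null
    do
        n=$((n + 1))
        if [ "$tries" -ne 0 ] && [ "$n" -ge "$tries" ]
        then
            return 1                # gave up: someone else holds the lock
        fi
        sleep 1
    done
    return 0
}

# release_lock: remove the lock so that others can acquire it
release_lock ()
{
    rmdir "$LOCKDIR"
}
```

A script would call acquire_lock before its critical section and release_lock after it; acquire_lock 5 gives up after five failed attempts instead of waiting indefinitely.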
Context sensitive scripts
Some programs, for example ls, have many options. Rather
than require users to always specify the commonest options,
ls has a number of links (alternative names). When you run
ls it examines the parameter $0, which contains
the name under which it was invoked, and uses the appropriate
options. For example, l is equivalent to ls -l;
lc is equivalent to ls -C, and so on.
Your scripts can behave the same way. For example:
# should check number and type of args here
case `basename $0` in
add) expr $1 + $2
;;
subtract) expr $1 - $2
;;
multiply) expr $1 \* $2
;;
divide) expr $1 / $2
;;
*)
echo "Unknown operation: $0" >&2
exit 1
;;
esac
exit
This short script has four names; it can be invoked as
add, subtract, multiply and
divide. It takes two arguments, and evaluates them
according to the name under which it was invoked. basename
is used to remove any preceding path (which might prevent the
case statement from matching anything). For example:
$ add 5 4
9
$ subtract 4 5
-1
$
The variable $0 contains the name under which the script was invoked. By using links to the script (rather than four separate script files) we reduce the number of files needed. In addition, if it is necessary to alter the behavior of all the programs, you can alter just the core file and the change will be recognized by all the links to it.
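The links themselves can be made with ln. A minimal sketch, assuming the script is stored in a file whose path is passed as the first argument; the function name install_calc is hypothetical:

```shell
# install_calc: link the calculator script $1 into directory $2
# under each of the names it recognizes; all the names share one file
install_calc ()
{
    src=$1
    dir=$2
    for name in add subtract multiply divide
    do
        # hard link; -f replaces any existing file of that name
        ln -f "$src" "$dir/$name" || return 1
    done
}
```

For example, install_calc $HOME/bin/calc $HOME/bin would create add, subtract, multiply and divide in $HOME/bin as hard links to the one file (calc is a hypothetical name for the script).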
As an alternative, we could write an application that used several command line tools to update a database, all of which were links to a single tool that behaved differently depending on the context in which it was invoked.