One of the things professors of computer science like to believe is
that it doesn't matter which programming language you're going to use,
that the important thing is to learn to program.
This writeup is a comparison of bash to Python, in the framework
that Jongleur set up in his Learn to Program writeup. I don't
actually think you should learn to program bash as your first language.
It is, however, a very useful language to know if you spend a lot of
time logged into a terminal, and it's also nice to be exposed to
different languages while you're still learning. One of the key things
about programming in real life is that you often don't get to choose
which language you'll be working in. So it's best to learn early on
how to learn a new language: what sorts of things to look for as
differences and similarities, and how to work around the limitations of
a language.
It's possible to write real programs in bash, and if you can manage to
do that, you can probably learn any other language too. bash has one
huge advantage over Python: it's already installed on every Linux
machine, and most UNIX machines as well. These days, it's even
installed on Macs, although the majority of users like to remain
blissfully unaware of the powerful command line interface that sits
just a click away... (If you use Windows, you'll have to
install it, but nobody actually writes scripts for Windows,
right?)
It's worth learning to program in a nice programming language
first. But it's also worth learning that you can do everything in that
language in another, more clunky language. And who knows? Someday you
might need to write a huge program in a clunky language, so it'll be
nice to be able to think in terms of nice programming principles while
you're doing it.
I'm not going to analyze any of the programs in depth. They're
invariably translations of programs from Jongleur's writeups. Each
heading below is a link to Jongleur's writeups on the same topic.
The instructions for writing a program in bash are amazingly similar
to those for Python. One of the primary reasons for this is that both
Python and bash are interpreted languages. The first step is to put
your program in a file.
So, let's call the first program first.sh. Type or paste
the following:
echo 'My name is Grae'
echo
echo 'Oh my god.' 'This actually worked!'
In order to run this program, you'll go to the directory you saved it
in, and type bash first.sh.
If you'd like to do things with escape characters (like \n), you'll have
to tell echo that your string contains these characters, by
passing it a -e. You can still use the same escape
characters available in Python (or C, C++, Java...). An example:
To print:
Hello!
This is on another line.
You could run a single echo command:
echo -e 'Hello!\nThis is on another line.'
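Because the behavior of echo -e varies between shells, printf is a commonly used alternative for the same job. A minimal sketch; the variable name greeting is made up for this illustration:

```shell
# printf interprets escapes like \n in its format string without needing a flag.
printf 'Hello!\nThis is on another line.\n'

# Capture the same two-line text in a variable and print it again.
greeting=$(printf 'Hello!\nThis is on another line.')
echo "$greeting"
```

Unlike echo, printf does not add a newline on its own, which is why the first call ends in \n.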
In bash, it's also possible to use double quotes to encode strings.
However, they act a little bit differently: bash still interprets a few
characters inside them ($, `, \ and "), so you'll need to escape those
with a backslash to get them to show up. The ! character is also
special at an interactive prompt (it's the history expansion
character), but history expansion is turned off while bash runs a
script, so the program above works unchanged with double quotes:
echo "My name is Grae"
echo
echo "Oh my god." "This actually worked!"
In bash, expressions are assumed to be strings unless otherwise
indicated. What I mean by this is that if you put 3
somewhere in your bash program, bash will treat it like a string, not a
number.
In order to get it to act like a number, you need to surround it by
$[ and ] characters. Inside these
brackets, you're allowed to do any sort of math that bash recognizes.
Example:
echo 3
echo $[3+3]
echo $[3*3]
echo $[3/3]
echo $[3-3]
echo $[((3+3)*(3*3)/(3/3))-(3-3)+3]
echo $[3+3*3*3/3/3-3-3+3]
echo $[3**3]
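The $[ ] form is an older spelling; the same integer arithmetic is more often written with $(( )) these days. The sketch below uses that form and also shows that division throws away the remainder. The variable names are only illustrative.

```shell
# $(( )) evaluates integer arithmetic, just like $[ ] in the examples above.
sum=$((3 + 3))
product=$((3 * 3))

# Division keeps only the whole part; % gives the remainder back.
quotient=$((7 / 2))
remainder=$((7 % 2))

echo "3+3=$sum 3*3=$product 7/2=$quotient remainder $remainder"
```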
Unfortunately, bash doesn't understand floating point math. So if you
try to do something like echo $[3.0+2.0],
you'll get an error message:
bash: 3.0+2.0: syntax error in expression (error token is ".0+2.0")
There are various tricks you can use to get a decimal point into
integer math, but they're sort of annoying to use, and they're a bit
of a hack anyways. The answer that most bash programmers use is to
just call a program that knows how to do floating point math.
The simplest way to do this is to use bc.
echo "(98.6-32)*5/9" | bc -l
By using another program, you get to avoid
reinventing the wheel.
This is an important technique in bash programming: there are a lot of
programs out there that already do the things you might want to do.
The lack of built-in floating point usually isn't a major limitation in programs. You'll want to be
careful of using floating point numbers to store things that you
actually care about getting accurate numbers for, due to round-off
error. If you ever want to write something that's going to keep track
of money (or anything else that needs to be accurate), look into using
fixed point arithmetic. Oh, and you probably don't want to be using
bash for it either.
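One of the integer "tricks" mentioned above, and a very simple form of fixed point arithmetic, is to scale every value by a power of ten and stay in integers. A rough sketch, assuming one decimal place is enough and the values are not negative; the function name c_to_f_tenths is invented for this example:

```shell
# Work in tenths of a degree, so 22.5 degrees is stored as the integer 225.
# The +32 of the usual formula becomes +320 in tenths.
c_to_f_tenths() {
    echo $(( $1 * 9 / 5 + 320 ))
}

tenths=$(c_to_f_tenths 225)    # 22.5 degrees C
whole=$(( tenths / 10 ))       # digits before the decimal point
frac=$(( tenths % 10 ))        # digit after the decimal point
echo "22.5 degrees C is $whole.$frac degrees F"
```

The decimal point only appears when the number is printed; all the math is still done on integers.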
bash also has both variables and functions. It's rather difficult to
program without these. I think the best way to think of both of these
concepts is as a name for some other thing. Variables encapsulate a
piece of data; functions encapsulate a way to do something. This
distinction is a little artificial, and most languages have some way to
treat functions as just another form of data.
Variables
Variables in bash work much the way they do in Python, with a couple
of differences:
- you can't put spaces next to the equals sign of an assignment
operation. (And people complain that Python is dependent on
whitespace...)
- when you reference a variable after it has been set, you prefix it
with a $ to let bash know it's not a string.
An example:
majorstring="I am the very model of a modern major-general"
echo $majorstring
echo $majorstring
echo $majorstring
echo $majorstring
echo $majorstring
The temperature conversion program would look like:
celsius=22
echo "The temperature is $celsius degrees C, or $[$celsius*9/5+32] degrees F"
(Unlike Python (but like
Perl), it's typical to embed variable
substitutions in the middle of strings in bash programming.)
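Substitution like this only happens inside double quotes; single quotes keep the text exactly as typed. A small illustration (the variable names expanded, literal and glued are made up):

```shell
celsius=22

# Double quotes: $celsius is replaced by its value.
expanded="It is $celsius degrees"

# Single quotes: nothing is substituted.
literal='It is $celsius degrees'

# Braces mark where the variable name ends when letters follow it.
glued="${celsius}C"

echo "$expanded"
echo "$literal"
echo "$glued"
```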
The next example: changing a variable:
row=1
echo "$[row*0] $[row*1] $[row*2] $[row*3] $[row*4] $[row*5] $[row*6]"
row=$[$row+1]
echo "$[row*0] $[row*1] $[row*2] $[row*3] $[row*4] $[row*5] $[row*6]"
Functions
In bash, calling a function and running a program look exactly the same.
When writing a bash program, it's useful to think of every other program
that's installed on your computer as just another function.
In order to read input in bash, just use read name.
This would set the variable $name to whatever the user types. Example:
echo "What is your name? "
read name
echo "What is your age? "
read age
year=$(date +"%Y")
echo "Your name is $name"
echo "You are $age years old"
echo "You were born in the year $[$year-$age]"
I snuck in another function call that didn't exist in the
Python version of this tutorial.
In the process, I show how to grab the output of a function (just
surround it by $(...)). One reason I do this is because I'm lazy; I
don't want to have to change this write-up whenever the year changes.
And an important thing to remember when programming: a good programmer
is a lazy programmer. At any rate, the date command tells
you the date, and the string after the + tells the format you want.
"%Y" turns into the year.
The temperature conversion program would look like:
echo "Enter degrees Celsius"
read celsius
fahrenheit=$[(9*$celsius)/5 + 32]
echo "$celsius degrees C is $fahrenheit degrees F"
(The astute reader might notice that I haven't actually introduced any functions yet... read is actually a special statement in bash, and date is a separate program. Still, the syntax you use to call them is the same, so I'm going to ignore that for now. bash doesn't really have any built-in functions (they're all special statements or add-on programs), so until you write your own functions, you can't really call one.)
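The read and $(...) pieces can also be tried without typing anything. In this sketch the answers come from a here-document instead of the keyboard, so it runs unattended; Grae and 30 are sample input, and born is a name made up for the example.

```shell
# Feed two lines of sample input to the two read commands.
{
    read -r name
    read -r age
} <<EOF
Grae
30
EOF

# $(...) runs a command and substitutes whatever it printed.
year=$(date +%Y)
born=$((year - age))

echo "$name was born around $born"
```

The -r flag tells read not to treat backslashes in the input specially, which is usually what you want.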
Types are an important concept in programming. The comparison between Python and bash will illustrate some of the major differences between different programming languages.
The type system of bash is simple. Everything's a string. It turns
out you can interpret strings as lists or numbers, but deep down,
everything is considered a string. In some ways, this is a bad thing:
if we have to convert from a string to a number and back again every
time we do some math, it's going to take a long
time to do simple arithmetic. This is one of the major reasons you'd
never want to do molecular modeling in a bash script: it doesn't make
that sort of thing efficient.
It's not entirely true that the type system of bash is this
simple. As of version 2.0, bash supports arrays. However, I think
it's a bit more useful to learn a dialect of bash that will work on more
computers, so I'm going to ignore their existence.
Another difference between bash and Python: functions aren't a data
type in bash. Because bash is an interpreted language, though, you can
get a lot of the benefits of having functions as data types by writing
programs which generate their own code dynamically. At this point,
it's unlikely that you'll want to do anything like that.
bash supports the concept of code modules. In order to import a module
in bash, you say source module. Alternatively,
you can use . as an abbreviation for source
(and most programmers do).
Programming languages are pretty useless if you can't do something
different based on different values of a variable. The basic format of
a bash if statement follows:
if condition_to_test; then
actions_to_run_if_condition_passed
else
actions_to_run_if_condition_failed
fi
The condition_to_test is just a call to a
command or
function. Every function or program returns a code telling whether it
succeeded or failed (this is stored in the special variable
$?).
An example:
echo 'What is your name?'
read name
if [ "$name" = 'grae' ]; then
echo 'you wrote this program'
else
echo "Your name is $name"
fi
You can even make a pretty dumb game:
number=7
echo 'What number am I thinking of?'
read guess
if [ "$guess" = "$number" ]; then
echo 'You got it!'
else
echo "Sorry, I was thinking of $number"
fi
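The success-or-failure code that if looks at can be inspected directly through $?. A short sketch using the standard true, false and grep commands; the variable names are illustrative:

```shell
# true always succeeds and false always fails; $? holds the last result.
true
status_true=$?
false
status_false=$?
echo "true gave $status_true, false gave $status_false"

# Any command can be the condition; grep -q succeeds if it finds a match.
if echo "hello world" | grep -q world; then
    found=yes
else
    found=no
fi
echo "found=$found"
```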
So up above, I mentioned that the condition of the if statement was
a command. The most common command to use for this is the test
command, and [ is an abbreviation for test. The [ command takes all
sorts of options to do comparisons. It's generally considered good
practice to quote strings that you use inside comparisons, to protect
against bad data that can get put into the string. In the above
example, if the user simply hit enter, guess would be an empty string,
and without the quotes [ would barf when it was used to run
[ = 7 ].
One
interesting note: if you use =, you're doing a string comparison. If
you want to compare if two numbers are equal, use -eq. (The numeric
comparisons consist of two-letter abbreviations). An example:
if [ \( \( 1 -lt 2 \) -a \( 4 -lt 3 \) \) -o ! \( 5 -lt 4 \) ]; then
echo it was true
else
echo it was false
fi
As you'll see from the above example, using bash for numeric comparisons
is a bit annoying.
Parentheses are special characters, and must be
escaped with backslashes. Also, you need to know that
! means
"not", -a means "and", and -o means "or". These last two are only true
inside a [; outside of a test, you can use && and ||. An
equivalent version of the previous program:
if ( [ 1 -lt 2 ] && [ 4 -lt 3 ] ) || ! [ 5 -lt 4 ]; then
echo it was true
else
echo it was false
fi
In some environments, [ is a built-in command, and in others it's actually a program that gets called. This is true of a few of the "standard" bash functions.
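To make the difference between = and -eq concrete, here is a small sketch comparing the same two values both ways, followed by a way to ask your shell what [ actually is. The variable names are made up.

```shell
# As strings, "07" and "7" differ; as numbers they are equal.
if [ "07" = "7" ]; then string_equal=yes; else string_equal=no; fi
if [ "07" -eq "7" ]; then number_equal=yes; else number_equal=no; fi
echo "string comparison: $string_equal, numeric comparison: $number_equal"

# type reports whether [ is built in or a separate program on this system.
type [
```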
Like comparisons, loops are essential to good programming. They're the
way to run a statement multiple times without writing the same thing
over and over.
for loops
The most common sort of loop is the for loop. The basic syntax of the
for loop follows:
for name in list; do
instructions
done
This executes
instructions once for each item in
list, setting
name to each
item in turn. In bash, a
list is just a
string separated by
whitespace. Example:
words="what is going on?"
for i in $words; do
echo $i
done
This program will echo each word in $words on a separate line.
A fairly common thing to do is loop a certain number of times. The
easiest way to do this is to loop over a list with the right number of
elements. bash doesn't have any nice built-in functions for
constructing lists, but once again, a stand-alone program comes to the
rescue. The seq program, given two numbers, constructs a list
counting from the first number to the second number. So if I wanted to
print my name 12 times, I could write:
for i in $(seq 1 12); do
echo "Grae"
done
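On a system where seq isn't installed, bash's own arithmetic for loop can do the counting instead. This form is specific to bash (and a few similar shells); count is only there to show how many times the body ran.

```shell
# Count from 1 to 12 without calling seq.
count=0
for ((i = 1; i <= 12; i++)); do
    echo "Grae"
    count=$((count + 1))
done
echo "printed $count times"
```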
while loops
While loops are the most general loops in programming, and can be used
to emulate each of the other sorts of loops that exist. It's often a
bit annoying to do this, though, so you probably want to use whichever
specialized loop is most appropriate for your application.
In a while loop, the body of the loop is executed as long as the
specified condition is true. An example:
answer=7
echo "What number am I thinking of?"
read guess
while [ "$guess" != "$answer" ]; do
if [ "$guess" -lt "$answer" ]; then
echo "Nope, too low"
elif [ "$guess" -gt "$answer" ]; then
echo "Nope, too high"
fi
echo "What number am I thinking of?"
read guess
done
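As a sketch of the claim that a while loop can stand in for other loops, here is a counting loop written with while and an explicit counter. The names i and total are arbitrary.

```shell
# Emulate "for i in 1 2 3 4 5" with a while loop and a counter.
i=1
total=0
while [ "$i" -le 5 ]; do
    total=$((total + i))
    i=$((i + 1))      # forgetting this line would loop forever
done
echo "1+2+3+4+5 = $total"
```

The extra bookkeeping (setting up i, testing it, incrementing it) is exactly the annoyance that the for loop hides.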
One of the most important things to figure out when programming is how
to write the code so that it's not overly repetitious. There are
various reasons that this is a good thing. One of the best is that if
a piece of code turns out to be buggy and it only exists in one place,
you only have to fix it in one place.
The typical way of accomplishing this modularity is by writing
functions. So, how could we write a function that tests if a string
is two words long, has a first word that starts with 'a', and a second
word that starts with 'b'?
function conformsToSpec {
if [ -n "$3" ]; then
return 1
fi
[ "${1#a}" != "$1" ] && [ "${2#b}" != "$2" ]
}
This snippet of code is full of new concepts. The first thing to notice
is that a function stops running when it hits a return statement. The
number that gets passed to the return statement is the return value of
the function, and that's what we can use in an if statement. 0
means true, and any other number means false (this is completely
backward from most other languages).
The first line says that we're defining a function, and it's called
conformsToSpec. The curly braces surround the body of
the function.
The arguments to the function are referred to as $1, $2, $3, etc.
So the first test (the next three lines of the function) is doing
something with the third argument. Above, I didn't mention the -n
flag to the [ command; it tests to see if a string is a non-empty
string. So we're testing to see if there's a third word, and if so, we
return a non-zero value.
There's a second way a function can return a value. If no other return
value is specified, a function returns the same as the last command it
called. So in this case, we're returning the results of the last test.
bash isn't very good at string processing, but there are a couple of
things you can do without resorting to calling other programs. One of
them is to cut off a given prefix of a string. This is done by saying
${variablename#prefix}. If
$variablename starts with prefix, this
expression turns into the variable with the prefix cut off. So, for
example, if $a is "foo", ${a#f} is "oo".
In our example above, ${1#a} is the same as $1
only if $1 doesn't start with the letter 'a'. Likewise,
${2#b} is the same as $2 only if
$2 doesn't start with the letter 'b'. So if both strings
compare differently, we know that the first word starts with 'a' and the
second starts with 'b'.
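Prefix removal has a mirror image, ${variablename%suffix}, which cuts off the end of a string instead. A quick sketch of both, using a made-up file name:

```shell
file="report.txt"

# ${var#prefix} removes a matching prefix; ${var%suffix} removes a suffix.
no_prefix=${file#report}     # ".txt"
no_suffix=${file%.txt}       # "report"

# If the pattern doesn't match, the value comes back unchanged.
unchanged=${file#xyz}        # "report.txt"

echo "$no_prefix $no_suffix $unchanged"
```

The "comes back unchanged" case is the one conformsToSpec relies on to detect a word that doesn't start with the right letter.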
So now that we have a function, how do we go about using it? We could
do something like the following:
for str in "a bouncer" "Apple bat" "apple" "a b c d" "ab" "booger alpha"; do
if conformsToSpec $str; then
echo "$str is a conformant string."
else
echo "$str doesn't conform."
fi
done
This particular example wasn't all that useful, so let's write a
function that does something a bit more interesting, converting
temperatures from Fahrenheit to Celsius.
function convertTemperature {
celsius=$[($1-32)*5/9]
echo $celsius
}
This function is a little bit different than the last one, in that we're
not returning the interesting bit in the return value (which will always
be 0, because
echo always returns 0, and it's the last thing we call
in our function.) Instead, we're returning it by printing it out, and
we can use $(...) to capture it in a variable we specify later. For
example, if we wanted to convert the boiling point of water, we could
say boilingPoint=$(convertTemperature 212).
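Here is a sketch of that pattern end to end. The function to_celsius is a hypothetical variant of the one above that prints its result directly rather than storing it in a variable first, so the value only travels through its output.

```shell
# Print the Celsius equivalent of a Fahrenheit temperature (integer math).
to_celsius() {
    echo $(( ($1 - 32) * 5 / 9 ))
}

# $(...) captures what the function printed.
boilingPoint=$(to_celsius 212)
freezingPoint=$(to_celsius 32)
echo "Water boils at $boilingPoint C and freezes at $freezingPoint C"
```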
It's hard to write an interesting program that's also short. And once
programs get to a certain length, you're probably going to want to
separate bits of your code out into other files.
Unlike most other languages, bash doesn't have a concept of separate
namespaces. So once you include a file that has all your functions in
it by saying . filename, you can call any function that was
defined in that file. (Incidentally, it will also run any code that
wasn't inside a function definition, so you'll have to be a little bit
careful about what sorts of files you include in this way.)
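A self-contained sketch of including a file: it writes a tiny "module" to a temporary file, loads it with the . command, and calls the function it defines. The function greet and the use of mktemp for the file name are assumptions made for the demonstration.

```shell
# Write a small module to a temporary file (mktemp picks the name).
module=$(mktemp)
cat > "$module" <<'EOF'
greet() {
    echo "Hello, $1"
}
echo "module loaded"   # top-level code runs as soon as the file is included
EOF

# Load it into the current shell, then call the function it defined.
. "$module"
message=$(greet Grae)
echo "$message"

rm -f "$module"
```

Note that "module loaded" is printed by the . line itself, which is the side effect the paragraph above warns about.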
Conclusion: Why would I want to learn bash?
Like I mentioned up in my introduction, if you spend all your time in
front of a terminal window, it's a great language to know.
Essentially what you're doing whenever you're running in a terminal
window is writing a script that bash is running.
I'll give a short example from my work. I happen to work at a
popular search engine, and sometimes it's handy to come up with
lists of the top 100 queries in each individual language. So, if I
happen to have a directory that has a whole bunch of files in it that
contain lists of queries, and I want to count up all the queries and
tabulate them into top 100 lists, I could write the following:
cd query_lists
for file in *; do
sort "$file" | uniq -c | sort -rn | head -100 > "$file.top100"
done
Since I've written so much bash, I don't even have to think to write
that. But it saves me a lot of time over typing:
sort english | uniq -c | sort -rn | head -100 > english.top100
sort japanese | uniq -c | sort -rn | head -100 > japanese.top100
sort german | uniq -c | sort -rn | head -100 > german.top100
sort esperanto | uniq -c | sort -rn | head -100 > esperanto.top100
sort klingon | uniq -c | sort -rn | head -100 > klingon.top100
sort french | uniq -c | sort -rn | head -100 > french.top100
sort portuguese | uniq -c | sort -rn | head -100 > portuguese.top100
sort russian | uniq -c | sort -rn | head -100 > russian.top100
sort dutch | uniq -c | sort -rn | head -100 > dutch.top100
Not only that, but I don't accidentally forget some important language
(oops... I left out
Chinese...)
Another reason bash is a good language to learn is that it's a good
glue language. It's easy to do things like I did up above and pipe
the output of one program into another. So you can easily do something
like sort a file, count unique occurrences that are next to each other,
sort that list in reverse numerical order, take just the first 100
lines, and put the results in a new file. Try doing that in Python...
It's not hard, but it takes a lot more code (and therefore more
opportunities for bugs).
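To watch the pipeline work without a directory full of query logs, here it is on a handful of made-up queries, keeping the top 2 instead of the top 100. The variables top and winner are just for the demonstration.

```shell
# Count repeated lines and list the most frequent first.
top=$(printf '%s\n' cats dogs cats fish cats dogs |
      sort | uniq -c | sort -rn | head -2)
echo "$top"

# Pull out just the most common query (second column of the first line).
winner=$(echo "$top" | head -1 | awk '{print $2}')
echo "most common: $winner"
```

uniq only merges identical lines that are adjacent, which is why the first sort has to come before it.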
So have a good time with your new bash skills...
Acknowledgments: I'd like to thank Jongleur, fuzzy and blue, Delta-sys, Simpleton, and ariels for their suggestions and feedback.