StriNg-Oriented symBOlic Language: An obsolete computer language designed as a quick and dirty string manipulation/pattern matching language. It slithered out of Bell Labs sometime in the 1960s, peaking in 1967 with SNOBOL4. It was one of the first languages to use a virtual machine, thus anticipating Java and the like. It has since been out-quicked and out-dirtied by newer languages and tools such as Perl.

A sample:

 OUTPUT = 'Hello World!'
END

This is the genealogy of the programming language Snobol:

Snobol is a child of Comit.
Snobol was born in year 1962.
It became Snobol 2 in year 1964.
It became Snobol 3 in year 1965.
It became Snobol 4 in year 1967.
Then it begat Icon in year 1977.

This genealogy is brought to you by the Programming Languages Genealogy Project. Please send comments to thbz.

My first exposure to Snobol 4 was in the spring of 1977 when a professor at the University of Alberta hired me for a summer job and put me to work writing a program to keep track of student grades (quick reminder: a "grade" is what you get when the course is over whereas "marks" are what you get for the individual assignments and exams during the course). He gave me a Snobol 4 manual and off I went. I guess that you could say that my first Snobol 4 program was 13,000 lines long (it wasn't a pretty sight and it was easy to see how I'd grasped more and more of the essence of the language as it grew). Nothing ever came of the grades program. I got it working but I don't think it was ever used for anything meaningful. Such is the fate of most student summer job projects.

I went on to use Snobol 4 as my tool for writing GLPs (Grungy Little Programs) and some larger projects as well. My personal favourite was YAMP which stood for Yet Another Marks Program. YAMP came into existence very quickly for a simple reason: I was the head lecturer for a 300 student course and when I tried to enter the marks for the first two or so assignments into the "approved" marks program, the "approved" program collapsed. It was simply not up to the task.

So here it was, about five weeks into a twelve week term and I had three hundred students and the instructors in the other sections asking me when I was going to make any interim marks available (spreadsheets hadn't been invented yet). Well . . . to make a long story short, two weeks later I had a marks program called YAMP that did the key parts of what I wanted it to do and by the end of the term, it did everything that I needed it to do. It wasn't a pretty sight (sadly, few Snobol programs are) but it did what it needed to do well and it was customizable in all sorts of ways (it even had a fairly elaborate report-writer). Truth be told, YAMP was a 7,000 line GLP (see above).

Word got around and it wasn't too long before it seemed like most of the department was using YAMP (this is when I learned the simple yet sobering fact that what is important is how a system behaves as practically nobody looks at it closely enough to ever understand or appreciate how it works or what sorts of risks they're taking relying on the system - sigh!). YAMP's origins, carefully kept secret by me for the last twenty years and being revealed here in public for the first time, started to cause me a lot of grief. There were subtle bugs in YAMP which were hard to find (YAMP probably used every Snobol 4 feature ever documented and accidentally used a few which weren't documented). In addition, it needed new features and some of these features really required pretty major structural changes to the program.

In the end, I managed to keep YAMP running for a few years. It finally died a gentle death after I left the University (there was nobody in the department who knew Snobol well enough to maintain it, a fact which is more of a condemnation of me and YAMP than of anyone in the department!).

Snobol 4 (and its cousin Spitbol) and I parted company in the mid-1980s when I left the University. By this time, I'd written around 200,000 lines of Snobol and/or Spitbol and can claim to have become fairly proficient in it. There may be better languages available today for the tasks that I used Snobol for back then but the delta between them and Snobol 4 isn't as large as might appear at first glance! In fact, some of the new kids on the block seem to be actually starting to catch up! (grin)

The standard reference for Snobol 4 is

The Snobol 4 Programming Language, second edition; by Griswold, R. E., Poage, J. F., and Polonsky, I. P.; Copyright © 1971, 1968 Bell Telephone Laboratories, Incorporated; published by Prentice-Hall Inc.
Unfortunately, it has been out-of-print for at least twenty years and there is zero chance that you'll ever talk anyone into lending you their copy!

P.S. I've still got one tool that I use at least a few times a day which is partially written in Snobol 4.


Snobol 4B

I once ran into a variant of Snobol 4 called Snobol 4B. The B stood for blocks, a concept which was an elegant if somewhat twisted combination of two-dimensional character strings and the anchored connectors supported by some modern drawing packages.

I can distinctly remember sitting down with the Snobol 4B manual and working out in my head how to implement the features as I read about them. By about half way through the description of the blocks feature, my mental model of how to implement it had collapsed. I continued reading until the end of the section and then I sat there contemplating what I'd read. It sure would have been fun to spend a few weeks playing with the language (I had a compiler and interpreter available) but I just never got the chance.

If I can find my Snobol 4B manual, I'll produce a writeup on the language. Unfortunately, I only wrote a couple of tiny programs in it and just don't remember enough about the blocks feature to be able to write something coherent without the manual.


Sample program

Here's a sample Snobol 4 program. Sample data follows the program, along with the output that the program produces from it.

Enjoy!

*
* Sample Snobol 4 program
*
* This program builds a linked list of each of the
* unique words in the text read from stdin.
* The words are also maintained in a table to make it
* easy to spot duplicates.
* When the end of the input is reached, the words are
* printed out in order of appearance along with a count
* of how many were found.
*
* This program is intended to demonstrate Snobol 4
* features.  It is not intended to be particularly
* useful.  It also deliberately violates a number of
* fairly standard Snobol 4 style rules in order to
* illustrate certain language features (the violations
* are noted in the program's comments).
*
* A few points before we begin:
*
*    1. keep an open mind as Snobol 4 is almost certainly
*       quite unlike any other programming language
*       that you've ever seen.
*
*    2. remember that Snobol 4 is also a language from
*       a different era.  Many of the things which we
*       take for granted today, like the "knowledge"
*       that gotos are bad, weren't taken for granted
*       in the 1960s when Snobol was "growing up".
*       Edsger W. Dijkstra wrote his famous letter,
*       GO TO Statement Considered Harmful, to the
*       ACM in 1968 (one year after Snobol 4 was
*       finalized).
*
*       N.B. I'm not suggesting that gotos are good.
*       Just keep in mind that it really was a different
*       era.
*
*    3. Snobol 4 is quite a powerful language with
*       many aspects which can't be illustrated in
*       a program as short as this one.  Deciding what
*       you think of Snobol 4 and its abilities based
*       on this sample is like reading a page or two
*       at random from a novel.  You don't get much
*       context and you've no idea how the novel's
*       story plays out.
*
*    4. Snobol 4 is a dead language in the same sense
*       that Latin is a dead language.  i.e. there
*       are still folks out there who use the language
*       in various ways.  Check out www.snobol4.com for
*       more information (I'm not affiliated with
*       www.snobol4.com in any way).
*
*    5. Snobol 4 programs run in an interpretive
*       environment.  This tends to make them a fair
*       bit slower than compiled languages like C and
*       FORTRAN.
*
*       The only meaningful benchmark is the time which
*       elapses between when a question is asked and
*       when an answer is obtained.  The actual time
*       that it takes to run the final program is often
*       a rather small portion of the total time between
*       the question and the answer.
*
*    6. Snobol 4 comes from the era of punched cards.
*       One consequence of this is that Snobol 4 ignores
*       the case of everything except the contents of
*       quoted strings.  I've written this sample in
*       lower case.  It would have been more traditional
*       if I'd written it in UPPER CASE but people
*       today find that rather hard to read.
*
*    7. Snobol 4 was a testbed for new ideas.  Some of
*       these ideas, like the near total lack of control
*       structures, didn't work out.  Others, like the
*       approach to pattern matching, were incredibly
*       powerful.
*
* Bias alert:  I've written somewhere on the order of 200,000
* lines of Snobol in my career.  I wouldn't have used it
* this much if it didn't have certain redeeming features!
* Snobol and I enjoyed the time that we spent together (well,
* at least I did!).
*
* Three major Snobol 4 language characteristics to keep in mind:
*
*    1. Snobol 4's strengths are in the areas of string
*       manipulation and data structures.  Only a small
*       part of these two aspects will be apparent in this
*       sample.
*
*    2. Snobol has exactly two control structures - an
*       unconditional goto and a conditional goto.
*       This results in, shall we say, some pretty unstructured
*       code sometimes.  I've tried to keep the code
*       in this sample reasonably well structured
*       although the lack of control structures does make that
*       rather difficult at times.
*
*    3. Snobol has no declarations.  Consequently, everything
*       that the user's program needs to define gets defined
*       at runtime.

* Shall we begin?

*
* Execution begins with the very first line of the Snobol 4
* source file and continues from there.

*
* Set the statement execution limit to a really big number.
* By default, a Snobol program is terminated if it executes
* more than 50,000 statements.  &stlimit is an example
* of a Snobol keyword.  The various keywords are used to
* manage certain runtime parameters.
* There's nothing special about 999999999 other than that
* it is pretty big.

        &stlimit = 999999999

*
* Let's create a datatype to hold the words and their
* usage counts.  This uses the built-in "data()" function
* to create the new datatype on-the-fly.  Since Snobol 4
* has no declarations, this is the only way to create a
* new datatype.
*
* The new datatype will be called "wordnode" and will
* have a "word" field, a "count" field and a "next" field.
* Since Snobol is a weakly typed language, we don't need
* to specify a type for these fields.

        data("wordnode(word,count,next)")

*
* Now we need a table to hold the words in.  We'll use the
* built-in TABLE() function to create a new table which we'll
* assign to the "wordtable" variable.

        wordtable = table()

*
* Define a function to process a single word from the input.
* This uses the built-in "define()" function which defines
* a new function.  The function will be called "doword".
* It will take the "word" as a parameter.  The function will
* need a local variable called "tmp".
*
* Once we've defined the function, we need to skip around the
* body of the function as we certainly don't want to execute it
* right now.  The ":(skip.doword)" at the end of the "define()"
* line is an unconditional goto to the "skip.doword" label.
*
* The body of "doword" starts on the label "doword" (i.e. the same
* label as the name of the function).
*

        define("doword(word)tmp")                    :(skip.doword)

doword
*
* Map the word to lower case.
* Note the use of a + in column 1 to continue the previous statement
* onto a new line.

        word = replace( word, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
+                             "abcdefghijklmnopqrstuvwxyz" )

*
* Get the word's wordnode out of the wordtable.
* tmp will be NULL if we've never seen this word before.

        tmp = wordtable<word>

*
* Create a new wordnode and insert it at the end of the
* linked list if one doesn't already exist.
*
* There's a LOT going on here.  The first key point is the call to
* "ident()".  The "ident()" built-in function checks if the TWO
* parameters that it is given are identical.  Since we've only
* provided one parameter, Snobol supplies a NULL pointer as the
* second parameter which means that we're checking if "tmp" is
* NULL.
* If the two parameters are identical then "ident()" returns a
* NULL pointer which gets concatenated with our word.
* Concatenating NULL with any string yields the string (i.e.
* this concatenation doesn't do anything).
*
* Things get MUCH more interesting if the two parameters to "ident()"
* are NOT identical.  In this case, the "ident()" function call
* fails.  A failed function call is NOT like a runtime error.
* It is just an indication that what you tried to do either didn't
* work or was incorrect or false.  If any part of a Snobol statement
* fails then the entire statement fails which, in this case, means
* that the call to "wordnode()" never happens and "tmp" is not
* assigned to.
*
* Now, if the "ident()" call worked then we create a new "wordnode"
* by calling the "wordnode()" function (which was implicitly defined
* when we created the "wordnode" datatype above).  If we created a
* new "wordnode" then we need to add it to the end of the linked list
* of "wordnode"s.  What we do is conditionally go to donode.not.new.word
* if the statement failed.  Remember, if the "ident()" call fails
* then the entire statement fails.  There's nothing else in this
* statement which can fail so checking if the statement failed is
* the same as checking if the "wordnode" already existed.
*
* As an aside, Snobol has runtime errors and runtime errors are
* fatal (i.e. if your program causes a runtime error then it dies).
* I'll point out a potential runtime error shortly.
*
* The parameters to "wordnode" are the initial values to assign to
* each of the "wordnode" fields.  We'll be lazy and not bother to
* even specify values for the "count" and "next" fields (Snobol
* will provide NULL values for these fields since we left them out).

        tmp = wordnode( ident(tmp) word )       :f(donode.not.new.word)

*
* We just created a new "wordnode" - add it to the "wordtable" table
* and put it at the end of the linked list.  Adding it to the
* "wordtable" table is easy.  The linked list is a bit more work.
* We have two global variables, "wordlist" and "wordlist_last".
* We maintain "wordlist_last" to point at the most recently added
* "wordnode" in the list (i.e. the last one in the list).  If this
* is the first "wordnode" that we've ever seen then both "wordlist"
* and "wordlist_last" will be NULL (because they've never been
* assigned a value and every identifier starts out life NULL).
*
* Failing to initialize the "wordlist" and "wordlist_last" global
* variables to NULL as a reminder that they exist was a dirty move.
* I did it to emphasize that initializing them isn't necessary.
* That said, initializing them is very important from a style
* perspective.
*

* Add the new "wordnode" to the table.
*
* Note that if wordtable isn't actually associated with a table or
* an array then trying to treat it like a table or an array by using
* the <...> construct would result in a runtime error
* (i.e. the program would die at this point).

        wordtable<word> = tmp

* Use the "ident()" trick again to see if we've ever seen
* a word before.

        ident(wordlist)                         :f(donode.not.first.word)
*
* This is the first word ever - make "wordlist" and "wordlist_last"
* point at it and we're done.
*
* Note that except for calls to "define()" (see above), I always put
* unconditional gotos on their own line.  This makes it easier to add
* new statements immediately before the unconditional goto (there is
* never a need to put anything between a "define()" call and the
* subsequent unconditional goto so I put them together on the same line).

        wordlist = tmp
        wordlist_last = tmp
                                                :(donode.done.first.word)

*
* This isn't the first word ever - add it to the end of the list.

donode.not.first.word
        next(wordlist_last) = tmp
        wordlist_last = tmp

* We jump to here when we're done handling the very first word.

donode.done.first.word

* We jump to here when we're dealing with a word that we've seen before.
* Even though both of these 'heres' are essentially the same place, I use
* separate labels since it maintains a vague structure (i.e. I try to structure
* my Snobol code to at least resemble the if-then-else-fi structure found
* in programming languages that have real control structures). 

donode.not.new.word

*
* Done dealing with new "wordnode"s.
*
* tmp now references the "wordnode" for the current word.
* Increment the word count.
*
* There's a tiny bit of magic (and stupidity) going on here.  We
* didn't provide initial values for "count" or "next" when we created
* the "wordnode" above.  If the increment below is being done on a
* virgin "wordnode" then the call to "count(tmp)" to the
* immediate left of the "+" operator will return NULL.  If one of the
* operands of an arithmetic operation is NULL then Snobol provides
* a value of 0 which is exactly what you want.  I said that this is
* both "magic" and "stupid".  The magic is that this whole approach
* to handling NULLs can come in handy from time to time.  The "stupid"
* is that we should have explicitly initialized "count" to 0 when we
* created the "wordnode".

        count(tmp) = count(tmp) + 1

*
* We're done.  Return to the caller.  The value of this function will
* be the last value assigned to the variable whose name is the same
* as the function's name.  Since we never assigned anything to "doword",
* the return value will be NULL. In "real life", it might make sense
* to set "doword" to "tmp" or just use "doword" instead of "tmp".
* This would make the return value of "doword" be the "wordnode"
* for the word which was passed in as a parameter.
*
* (seems reasonable to me but we're not going to bother).
*
* This function call worked so we'll do a normal return by branching
* to the special "return" label.  If we wanted the function call to
* fail (see discussion of what failure means above), we'd return by
* branching to the "freturn" label.  You can bail out of a function
* at any time by branching to "return" or "freturn".

                                                            :(return)

* Now we need the label that we jumped to when we defined the "doword"
* function.  Here it is.
*
* IMPORTANT: we get here during program startup immediately
* after executing the "define()" call that "defined" the "doword" function.

skip.doword

*
* Here's the main loop.
*
* We read input lines by referencing the special "input" variable.
* The reference will fail when the end of the input is reached.

mainloop
        line = input                                    :f(done)

*
* We've got an input line.
* Strip off words and pass each one to "doword".
* Jump back up to mainloop when there are no more words.
*
* A bit of pre-processing of "line" will help things along here.
* If we put a blank at the front and at the end then the pattern
* to identify words is much simpler.
*
* The next line concatenates a blank with line and then concatenates
* the result with another blank.

        line = ' ' line ' '

* "letters" is just all the letters in the alphabet in both lower
* and upper case.  Defining it across two lines makes it really
* obvious if we missed a letter by accident in one of the lines.
        letters = 'abcdefghijklmnopqrstuvwxyz'
+                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

*
* Now remove everything from line that is neither a letter nor a blank.
* There are better ways to do this but this will do (and the better
* ways open up issues that I'd rather leave closed).
*
* This is our first pattern matching statement.
* It looks through "line" for the first character which isn't a letter
* or a blank (because we concatenated "letters" with ' ' to construct
* the argument).  If it finds one then it replaces it with a blank.
* If the pattern match succeeds then the statement succeeds and we
* spin back to rmpunc to hit the next one.  If the pattern match
* fails then we just continue.

rmpunc
        line notany(letters ' ') = ' '          :s(rmpunc)

* We need a pattern that matches a sequence of letters.
* We could do this on the fly but style says we should do it here
*
* (in an optimized version of Snobol called Spitbol, pre-defining
* patterns can make a dramatic difference to how fast the program
* runs).
*
*
* This pattern matches a blank followed by one or more letters followed
* by a blank.  "span()" is aggressive in the sense that it always
* matches as many characters as possible.  If the "span()" part of
* the pattern works then what it matches will be saved in the
* "newword" variable.
*
* In "real life", this pattern would probably be defined before "mainloop"
* to avoid any cost inside the "mainloop" loop (it isn't very expensive
* so we're not going to worry about it).

        word_pattern = ' ' ( span(letters) . newword )  ' '

*
* We're ready to roll.
*
* Spin through the line matching words.  If we find a word then
* it will be remembered in "newword" so we replace it with a blank.
* If the match failed then there are no more words so we jump back to
* "mainloop" to get the next input line.  If it worked, hand the word
* to "doword()" and spin back for the next word in the input line.

wordloop
        line word_pattern = ' '                         :f(mainloop)
        doword(newword)                                 :(wordloop)

*
* We'll get here when there's no more input (see mainloop label above).

done

*
* Run down the linked list and print out all the words along with a
* count of how often they appeared.
* The "lpad()" function pads the first parameter on the left with
* spaces to make it at least as long as the length specified by
* the second parameter.  It returns the first parameter unchanged
* if it is already long enough or too long.

        tmp = wordlist
doneloop
        ident(tmp)                                      :s(alldone)
        output = lpad(count(tmp),10) ' ' word(tmp)
        tmp = next(tmp)                                 :(doneloop)

*
* We are (almost) all done.
alldone

*
* Let's say goodbye the hard way, shall we?
* Pay attention, I'm only going to say this once . . .
*
*   1. set "mysource" to a string containing some Snobol code
*      which will print out "Goodbye!" and then jump to the
*      end label (where all good Snobol programs go to terminate
*      normally).
*
        mysource = "doit output = 'Goodbye!' :(end)"
*
*   2. invoke the compiler (at runtime here folks) to compile
*      the code segment we just wrote.  Put the compiled code
*      into "mycode".  If "code()" fails then there's a syntax
*      error (finding it is left as an exercise for the writer).
*
*      If the "code()" function worked then immediately jump
*      to the "doit" label that we defined in the source code
*      string (see the assignment to "mysource" above).
*
*      This particular Snobol 4 feature definitely takes the
*      notion of self-modifying code to new heights!

        mycode = code(mysource)                         :s(doit)
*
* If we get here then the compilation attempt failed.
* Mumble something and terminate anyways by walking into the
* "end" label.
        output = "Hmph!"
*
* The "end" label marks the end of the program.
end
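
For readers more at home in a newer language, the success/failure behaviour described in the comments around the "ident()" call can be loosely modelled in Python. This is only a sketch: Python has no direct equivalent of a failing statement, so an exception stands in for failure here, and the names Failure, NULL, WordNode and lookup_or_create are made up for the illustration rather than taken from Snobol 4.

```python
class Failure(Exception):
    """A Snobol-style failure: an ordinary outcome, not a runtime error."""


NULL = ""  # assumption: the empty string stands in for Snobol 4's null value


class WordNode:
    """Rough counterpart of the wordnode(word,count,next) datatype."""

    def __init__(self, word, count=NULL, next_node=NULL):
        self.word = word
        self.count = count
        self.next = next_node


def ident(a, b=NULL):
    """Succeed (returning null) when a and b are identical; fail otherwise."""
    if a != b:
        raise Failure()
    return NULL


def lookup_or_create(tmp, word):
    """Mimic: tmp = wordnode( ident(tmp) word )   :f(donode.not.new.word)"""
    try:
        # If ident() fails, the concatenation, the WordNode call and the
        # assignment are all skipped: the whole statement fails.
        tmp = WordNode(ident(tmp) + word)
    except Failure:
        return tmp, "donode.not.new.word"  # the :f(...) branch is taken
    return tmp, "fall through"  # success: carry on with the next statement
```

Calling lookup_or_create(NULL, "words") builds a new node and falls through; calling it again with that node hands the same node back along with the name of the failure branch.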

Sample data

Here are some words.
Even more words.
But not very many words.
After all, we don't want to run out of words.
Do we?
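
How these lines get split into words is decided by the sample program's rmpunc and wordloop statements. The following is a rough Python sketch of those two steps, included mainly to show why "don't" ends up as two separate words. It is an approximation, not a translation: split_words and the other names are invented for the illustration, and Python regular expressions stand in for the Snobol 4 patterns.

```python
import re

LETTERS = "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def split_words(line):
    """Mimic the sample's rmpunc and wordloop steps on one input line."""
    # Pad with a blank at each end, as the sample program does.
    line = " " + line + " "

    # rmpunc: replace one offending character per pass until none is left.
    not_letter_or_blank = re.compile("[^" + LETTERS + " ]")
    while True:
        line, replaced = not_letter_or_blank.subn(" ", line, count=1)
        if not replaced:
            break

    # wordloop: peel off the first blank-letters-blank match each time round,
    # leaving a single blank in its place.
    word_pattern = re.compile(" ([" + LETTERS + "]+) ")
    words = []
    while True:
        match = word_pattern.search(line)
        if match is None:
            return words
        words.append(match.group(1))
        line = line[:match.start()] + " " + line[match.end():]
```

Because the apostrophe is turned into a blank before any words are matched, split_words("we don't") gives "we", "don" and "t". Mapping to lower case happens later, in the counterpart of "doword".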

Output for the sample data

Here's the output which is produced if this sample program is run using the sample data provided above. This output was produced on 2002/10/04 using Snobol 4 running under Linux on an Athlon system. See www.snobol4.com if you're interested in running Snobol 4 on your system.
         1 here
         1 are
         1 some
         4 words
         1 even
         1 more
         1 but
         1 not
         1 very
         1 many
         1 after
         1 all
         2 we
         1 don
         1 t
         1 want
         1 to
         1 run
         1 out
         1 of
         1 do
Goodbye!
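
As a cross-check on the output above, here is a rough Python sketch of the same algorithm: a table for spotting duplicates plus a linked list that preserves the order of first appearance. It was written for comparison only and is not a translation of the Snobol 4 source; WordNode, count_words and report are names invented here, and a single regular expression stands in for the pattern-matching statements.

```python
import re


class WordNode:
    """Rough counterpart of the wordnode(word,count,next) datatype."""

    def __init__(self, word):
        self.word = word
        self.count = 0  # set explicitly; the Snobol version relies on null + 1
        self.next = None


def count_words(lines):
    """Return (word, count) pairs in order of first appearance."""
    table = {}  # plays the role of wordtable
    head = tail = None  # play the roles of wordlist and wordlist_last
    for line in lines:
        # Runs of letters are words and everything else is a separator,
        # which has the same effect as the rmpunc loop plus word_pattern.
        for word in re.findall(r"[A-Za-z]+", line):
            word = word.lower()
            node = table.get(word)
            if node is None:
                node = table[word] = WordNode(word)
                if head is None:
                    head = node
                else:
                    tail.next = node
                tail = node
            node.count += 1

    # Walk the linked list from the front, as the doneloop statements do.
    result = []
    node = head
    while node is not None:
        result.append((node.word, node.count))
        node = node.next
    return result


def report(lines):
    """Format each count right-justified in 10 columns, like lpad(count,10)."""
    return [f"{count:>10} {word}" for word, count in count_words(lines)]
```

Feeding the five lines of sample data to report() yields the same 21 word lines shown above, with "don" and "t" counted separately. The closing "Goodbye!" is not reproduced.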
