This is the genealogy of the programming language awk:

awk is a child of C.
awk was first known as awk in year 1978.
It became nawk in year 1985.
Then it begat Perl in year 1987.

This genealogy is brought to you by the Programming Languages Genealogy Project. Please send comments to thbz.

To expand a bit on the Jargon File definition, awk is a language based on the pattern-action concept.

An awk program is a sequence of (pattern, action) pairs. The pattern is typically an egrep-like regular expression; the action is a block of code.

The pattern is matched against the input stream; when the pattern matches, the corresponding action is executed. Awk, in fact, is based on the assumption that you are reading in some sort of input file that you want to maul in a creative form.
You are also given the two magic patterns BEGIN and END, whose actions are executed, respectively, before awk attempts to open the input stream and after the EOF comes in.
Awk by default breaks the input line into variables corresponding to fields. The variables are named $1, $2, $3 ... $NF. The variable $0 (like $_ in perl) contains the whole input line.
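To make the pattern-action model and the field variables concrete, here is a small sketch run on three made-up input lines. The sample data and the shell variable name `out` are invented for this illustration; any POSIX awk should behave the same way.

```shell
# Sample input is made up for illustration.
out=$(printf 'alice 10\nbob 20\ncarol 30\n' | awk '
BEGIN    { print "start" }             # runs before any input is read
/^b/     { print "matched:", $0 }      # runs only for lines matching the regex
         { print NF, $1, $NF }         # no pattern: runs for every line
END      { print "lines read:", NR }   # runs after the last line
')
printf '%s\n' "$out"
```

Each input line is offered to every pattern in turn, so the line starting with "b" triggers both the regex action and the unconditional one.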

To give you an idea of a typical awk program, I will include a short fragment of code for generating HTML formatted Editor Log reports:

	# formatreport.awk
	# tested only with GNU awk

BEGIN	{"date" | getline; print "<P ALIGN=RIGHT><I>" $0 "</I></P>";

	# the command date is executed and read into the input stream
	# and subsequently printed using $0 between P tags.

	print "<UL>";
	FS = "\t";}

	# The field separator is specified as the Tab character
	# awk by default separates fields on whitespace

/^$/	{next;}
	# skip completely empty lines

/^[^#]/	{print "<LI><B>" $1 "</B> by " $2 " because: " $3;
	if ($4) print "(NB: " $4 ")";}

	# format as list items input lines, decorated with formatting
	# if there is a fourth field, print it out as a pedantic NB

/^#/	{notes = notes "<BR>" substr($0,2);}
	# if a line starts with a #, accumulate it into a variable
	# like in perl, no need to declare or initialize a variable;
	# an unset variable simply starts out as the empty string.

END	{print "</UL>";
	print "<P>" notes "</P>"}

	# No more input lines: close the UL, spew out the notes

This would be invoked like this:

gawk -f formatreport.awk <todaysmoans.txt >pastemenow.html

This other awk fragment will print out all the words that end in "ough" (taken from the word list found on many Unix systems), ready for pasting in the chatterbox. Notice that the concatenation operator is, quite amusingly, the space character: strings are joined simply by writing them next to each other.

awk '/ough$/ {t = t "[" $0 "] "} END {print t;}' </usr/dict/words
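Since the word list is not present on every system, here is the same idea as a sketch on four made-up words fed in through printf. It shows the concatenation by juxtaposition, and that the never-initialized variable t starts out empty.

```shell
# The four input words are made up; only those ending in "ough" are kept.
out=$(printf 'cough\nought\nthrough\nbough\n' |
  awk '/ough$/ { t = t "[" $0 "] " } END { print t }')
printf '%s\n' "$out"
```

"ought" is skipped because the $ anchors the match at the end of the line.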

Weaknesses of awk

... especially when compared with Perl.

  • Weak code modularity: no way of having nice libraries like you can do in Perl.
  • Forcibly high level: scarcity of bit-level operators.
  • Weak handling of binary data.
  • A certain lack of standardization: there is more than one awk implementation, and they differ in crucial points.

Strengths of awk

awk is a rather small and efficient language. In fact, I have seen one distribution of Linux designed for fitting on a floppy where many of the typical UNIX utilities had been rewritten as short awk scripts.
awk was designed to write small programs, typically filters, that do their job well (you can find some examples in the excellent awk writeup by vyrus). If you find yourself going beyond the 100 line limit, maybe you should have written the program in another language.

thanks to sakke and ponder for useful observations


awk /awk/

1. n. [Unix techspeak] An interpreted language for massaging text data developed by Alfred Aho, Peter Weinberger, and Brian Kernighan (the name derives from their initials). It is characterized by C-like syntax, a declaration-free approach to variable typing and declarations, associative arrays, and field-oriented text processing. See also Perl. 2. n. Editing term for an expression awkward to manipulate through normal regexp facilities (for example, one containing a newline). 3. vt. To process data using awk(1).

--The Jargon File version 4.3.1, ed. ESR, autonoded by rescdsk.

The thing I most frequently see awk used for is in pipelines -- one-liners, rather than true scripts. One great way to use awk is to split up variable-length fields delimited by some character, or by whitespace (the default). For example, the following:

who | awk '{print $1}'

prints a list of users currently logged on the system. In a longer pipeline, it can be more useful:

who | awk '{print $1}' | sort | uniq

This sorts the list of users alphabetically, then removes the duplicates. (The sort is a prerequisite for the uniq.)
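Because the output of who differs from machine to machine, here is a sketch of the same pipeline on made-up who-style lines. The second command is an alternative that is not part of the pipeline above: it does the de-duplication inside awk with an associative array (the array name `seen` is arbitrary) and keeps first-seen order instead of sorting.

```shell
# Made-up sample standing in for the output of who.
sample='root tty1
alice pts/0
root pts/1
alice pts/2'

# Same shape as the pipeline above: extract field 1, sort, drop duplicates.
sorted=$(printf '%s\n' "$sample" | awk '{print $1}' | sort | uniq)

# Alternative sketch: print a name only the first time it is seen.
first_seen=$(printf '%s\n' "$sample" | awk '!seen[$1]++ {print $1}')

printf '%s\n' "$sorted" "$first_seen"
```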

The single-quotes around the awk statement are necessary to escape it from the shell. The curly braces with nothing before them indicate an awk statement which is to be executed for every line of the input file (stdin, in this case); the print $1 prints the first field on each line.

Fields are not always delimited by whitespace, however; the password file, for example, uses colons. Not to worry; awk -F changes the delimiter.

cat /etc/passwd | awk -F: '{print $1}'

This will print a list of all users who exist on the system; it simply prints the first field of each line of the password file, where fields are delimited by colons, not whitespace.
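As a sketch that does not depend on the real password file, the following runs the same kind of command on two made-up lines in passwd format (the accounts are invented), printing the first and last colon-separated fields.

```shell
# Two made-up lines in /etc/passwd format (not real accounts).
sample='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# -F: makes the colon the field separator; $NF is the last field (the shell).
out=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $NF }')
printf '%s\n' "$out"
```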

Awk is very convenient just for all the programs of the form

something | awk '{something else}'

which perform some operation unconditionally on every line of the first program's output (or a file). This power is increased a hundredfold by another simple addition: regular expressions. I will not get into the gory details of constructing one here, but it is extremely powerful to write a program of the form

something | awk '/expression/ {something 1}; {something 2}'

which will do "something 1" to matching lines and "something 2" to all lines. For example, to comment out any line in a perl program containing the word 'excrement', the following awk one-liner suffices:

cat program.old | awk '/excrement/ {$0 = "#" $0}; {print $0}' | cat >

OR, more succinctly,

awk '/excrement/ {$0 = "#" $0}; {print $0}' < program.old >

(NB: $0 represents the whole line.)

This has the effect of going, line-by-line, through program.old and printing out each line, but for matching lines, first prepending a "#". It looks funny, but try it -- it works.
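To watch the two actions cooperate without touching any files, here is a sketch that feeds three made-up lines of Perl-looking text through the same awk program. The input lines are invented for this example.

```shell
# Three made-up input lines; only the middle one contains the target word.
out=$(printf '%s\n' 'print "hello";' 'print "excrement";' 'exit;' |
  awk '/excrement/ {$0 = "#" $0}; {print $0}')
printf '%s\n' "$out"
```

The first action rewrites $0 only on the matching line; the second, pattern-less action then prints every line, modified or not.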

Getting into the realm of Programs That Shouldn't Really Be One-Liners, we find many uses of awk in pipelines which are occasionally useful, but more often just fun to write. Most of these involve BEGIN and END expressions. Without wasting too much more of your precious time, a BEGIN expression is written 'BEGIN {something}' and an END expression is written 'END {something}'. Note the similarity to regular expression lines -- BEGIN matches before the program starts, and END matches after it is done. This allows things like:

cat file | awk 'BEGIN {lines = 0}; {lines++}; END {print lines}'

which is a fancy line-counter, not useful as such, but unique in that it is the first program we have seen thus far which keeps internal state in the form of the lines variable. The BEGIN is not strictly necessary, which is often the case; END, OTOH, is very useful for summarizing results, such as line counts, word counts, and the like. Now go write the hardest awk one-liners you can think of! It's a great mental exercise, and if you like a challenge, you'll enjoy it. If you are really interested, read the awk manpage for more ways to match lines (before the {}) and more ways to manipulate them (inside the {}). I have only scratched the surface.
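As a sketch of the kind of END summary mentioned here, the following counts both lines and words on made-up input. It leaves out the BEGIN block, relying on unset variables counting as zero; the variable names `lines` and `words` are arbitrary.

```shell
# Four made-up input lines, one of them empty (NF is 0 there).
out=$(printf 'one two three\nfour five\n\nsix\n' |
  awk '{ lines++; words += NF } END { print lines, words }')
printf '%s\n' "$out"
```

Adding NF on every line gives a word count in the whitespace-separated sense, much like wc -w.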

Awk (?), a. [OE. auk, awk (properly) turned away; (hence) contrary, wrong, from Icel. ofigr, ofugr, afigr, turning the wrong way, fr. af off, away; cf. OHG. abuh, Skr. apac turned away, fr. apa off, away + a root ak, ak, to bend, from which come also E. angle, anchor.]


Odd; out of order; perverse.



Wrong, or not commonly used; clumsy; sinister; as, the awk end of a rod (the but end).




Clumsy in performance or manners; unhandy; not dexterous; awkward.

[Obs. or Prov. Eng.]


© Webster 1913.

Awk, adv.

Perversely; in the wrong way.



© Webster 1913.
