To expand a bit on the Jargon File definition above, awk is a language built around the pattern-action concept.

An awk program is a sequence of (pattern, action) pairs. The pattern is usually an egrep-like regular expression (though it can be any expression, such as NR > 10 or $3 == "foo"), and the action is a block of code in curly braces.

Each line of the input stream is matched against the patterns; when a pattern matches, the corresponding action is executed. (A rule with no pattern runs on every line, and a pattern with no action just prints the matching line.) Awk, in fact, is based on the assumption that you are reading in some sort of input file that you want to maul in some creative way.
You also get the two magic patterns BEGIN and END, whose actions are executed, respectively, before the first input line is read and after the end of the input (EOF) is reached.
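
A minimal sketch of the pattern-action idea, with BEGIN and END thrown in; the input lines and the word "error" are made up for illustration:

```shell
# BEGIN runs before any input is read, /error/ runs on every matching line,
# END runs after the last line. "n + 0" makes END print 0 if nothing matched.
printf 'ok\nerror: disk full\nok\nerror: no route\n' |
awk 'BEGIN   { print "scanning..." }
     /error/ { n++; print "found:", $0 }
     END     { print n + 0, "error lines" }'
```

This prints "scanning...", the two matching lines prefixed by "found:", and finally "2 error lines".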
By default, awk splits each input line on whitespace into fields, available as the variables $1, $2, $3 ... $NF, where NF holds the number of fields in the current line. The variable $0 (like $_ in Perl) contains the whole input line.
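
A quick way to watch the field splitting at work (the sample sentence is arbitrary); note the double space, which the default splitting treats as a single separator while $0 keeps it intact:

```shell
# NF is the number of fields, so $NF is always the last one;
# $0 is the untouched input line.
echo 'the quick  brown fox' | awk '{ print NF; print $1; print $NF; print $0 }'
```

This prints 4, then "the", then "fox", then the original line with its double space.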

To give you an idea of a typical awk program, here is a short fragment of code for generating HTML-formatted Editor Log reports:

	# formatreport.awk
	# tested only with GNU awk

BEGIN	{"date" | getline; print "<P ALIGN=RIGHT><I>" $0 "</I></P>";

	# the command date is executed, its output line is read into $0
	# and subsequently printed between P and I tags.

	print "<UL>";

	# The field separator is specified as the Tab character
	# (awk by default separates fields on whitespace)
	FS = "\t";}

/^$/	{next;}
	# skip completely empty lines

/^[^#]/	{print "<LI><B>" $1 "</B> by " $2 " because: " $3;
	if ($4) print "(NB: " $4 ")";}

	# format every input line that does not start with # as a list item,
	# decorated with formatting; if there is a fourth field, print it out as a pedantic NB

/^#/	{notes = notes "<BR>" substr($0,2);}
	# if a line starts with a #, accumulate the rest of it into a variable.
	# like in Perl, no need to declare or initialize a variable:
	# an unset variable acts as "" or 0, depending on how it is used.

END	{print "</UL>";
	print "<P>" notes "</P>"}

	# No more input lines: close the UL, spew out the notes

This would be invoked like this:

gawk -f formatreport.awk <todaysmoans.txt >pastemenow.html
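
The writeup doesn't show what todaysmoans.txt looks like. Judging from the script, each line presumably holds tab-separated fields (node, editor, reason, optional note), and lines starting with # are notes. The sketch below pipes two made-up lines in that assumed format through the same rules, leaving out the date header so that the output is predictable:

```shell
# Hypothetical input: real tab characters (\t) between fields,
# plus one "#" line that ends up in the notes paragraph.
printf 'Foo node\tsomeeditor\tduplicate\n#remember to check the logs\nBar node\tother\tnodeshell\tmsg sent\n' |
awk 'BEGIN   { FS = "\t"; print "<UL>" }
     /^$/    { next }
     /^[^#]/ { print "<LI><B>" $1 "</B> by " $2 " because: " $3
               if ($4) print "(NB: " $4 ")" }
     /^#/    { notes = notes "<BR>" substr($0, 2) }
     END     { print "</UL>"; print "<P>" notes "</P>" }'
```

The result is a <UL> with two <LI> items, an "(NB: msg sent)" line after the second one, and the note collected into the final <P>.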

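This other awk fragment will print out all the words that end in "ough" (the word list lives in /usr/dict/words on many Unix systems; newer ones keep it in /usr/share/dict/words), ready for pasting in the chatterbox. Notice that the concatenation operator is, quite amusingly, the space character (strictly speaking, strings are concatenated just by writing them next to each other).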

awk '/ough$/ {t = t "[" $0 "] "} END {print t;}' </usr/dict/words
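
A small sketch of concatenation by juxtaposition, next to the comma in print, which is easy to confuse with it:

```shell
awk 'BEGIN {
    a = "foo"; b = "bar"
    print a b        # foobar: juxtaposition concatenates
    print a, b       # foo bar: a comma inserts OFS, a space by default
    s = a "-" b
    print s          # foo-bar
    print 1 2        # 12: numbers are converted to strings first
}'
```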

Weaknesses of awk

... especially when compared with Perl.

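  • Weak code modularity: no real module system, so nothing like the nice libraries you can write in Perl (the best you get is passing several -f files, or gawk's @include).
  • Forcibly high level: POSIX awk has no bit-level operators at all (gawk adds functions such as and(), or() and xor()).
  • Weak handling of binary data: awk thinks in lines and fields of text.
  • A certain lack of standardization: there is more than one awk implementation (the original "one true awk", gawk, mawk, busybox awk...), and beyond the common core specified by POSIX they differ in crucial points.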
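
To give an idea of what the missing bit-level operators mean in practice, here is a portable workaround that uses plain arithmetic to print the low four bits of a number. The function name bit() is my own invention; gawk users could call its built-in and() instead.

```shell
# Bit k of a non-negative integer n is int(n / 2^k) % 2.
# ^ is the POSIX awk exponentiation operator, not XOR.
awk 'function bit(n, k) { return int(n / 2^k) % 2 }
     BEGIN { n = 10
             for (k = 3; k >= 0; k--) printf "%d", bit(n, k)
             print "" }'
```

For n = 10 this prints 1010.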

Strengths of awk

awk is a rather small and efficient language. In fact, I have seen a Linux distribution designed to fit on a floppy in which many of the typical UNIX utilities had been rewritten as short awk scripts.
awk was designed for writing small programs, typically filters, that do their job well (you can find some examples in the excellent awk writeup by vyrus). If you find yourself going beyond 100 lines or so, maybe you should have written the program in another language.
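
To see why such rewrites can be so short, here are rough awk stand-ins for three classic filters. They are my own sketches, not the scripts from that floppy distribution, and they ignore all the options of the real utilities. Remember that a pattern with no action simply prints the matching lines:

```shell
# like head -n 2 (but awk keeps reading until EOF)
printf 'a\na\nb\nc\n' | awk 'NR <= 2'
# like wc -l: NR still holds the last record number in END
printf 'a\na\nb\nc\n' | awk 'END { print NR }'
# like uniq: print a line unless it repeats the previous one
printf 'a\na\nb\nc\n' | awk 'NR == 1 || $0 != prev { print } { prev = $0 }'
```

These print "a a", then 4, then "a b c", one item per line.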

thanks to sakke and ponder for useful observations