flex is the "fast lexical analyzer generator". It's based upon lex, but can generate much faster lexical analyzers. That's great, except that it doesn't actually explain anything. First, what is a lexical analyzer?
A lexical analyzer (also called a lexer, tokenizer, or scanner) takes an input stream and breaks it up into smaller parts called tokens. It does this by doing pattern matching on the stream. Each pattern that the scanner looks for has a rule to go along with it. When the scanner finds a certain pattern, it does whatever the rule says.
flex has its own language for defining a scanner. You write the scanner definition in this language, which is sort of like a bastardized version of C. Then you run flex on the file, and it generates a C source file (named lex.yy.c by default) that you can compile. After those steps, you've got a program that takes an input stream and looks for patterns in it.
Now you're wondering what good that is. Well, breaking a text stream up into parts is very useful for a stack calculator, the front end of a compiler, or creating your own specialty languages. flex is a good thing, because it allows you to create a scanner very rapidly.
Check out the flex man page for lots of good examples. Here's a quick example of writing the Unix program wc in flex:
        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%

int main(void)
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}

If this lives in a file called wc.l, build it with flex wc.l and then cc lex.yy.c -o wc -lfl; the -lfl library supplies the default yywrap() that the generated scanner calls at end of input.
Everything before the first %% is regular C code. Here, we just declare a couple of variables. This section can also contain #include directives, struct definitions, or whatever else you want. Everything between the first %% and the second %% is a list of pattern/rule pairs. Each line is a regular expression, then whitespace for separation, then some C code for the rule. More than one line of C means you have to put braces around all the code. Everything after the second %% is more C. You can have functions that are called from the rules, code that gets executed before and after the lexer is run, and more. Use your imagination!
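For instance, here's a sketch of a rules-section fragment with a braced, multi-line action (the num_words counter is made up for illustration and would have to be declared before the first %%; yytext is real flex, though — it holds the text that matched the pattern):

```lex
[A-Za-z]+   {
                /* two statements, so the action needs braces */
                ++num_words;
                printf( "matched: %s\n", yytext );
            }
```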