Common problem encountered when using
yacc(1) (or its
GNU-decendant,
Bison) to generate a
parser for a given computer
language.
To understand what is going on here, it is necessary to have some understanding
of how yacc-generated parsers work. The yyparse() function
generated by yacc attempts to reduce a string of "tokens" read from a
source file down to one single token, according to a set of "rules" supplied by
the user in a manner not dissimilar to Backus-Naur form. To do this, it
creates a "stack" onto which tokens can be placed, and proceeds according to
the following algorithm:
- Read a token from the file and store it in the variable yychar
(this is called the "look-ahead" token)
- If there is a rule by which the top n tokens on the stack can be
reduced down to a single token, and there is neither a rule which requires the
look-ahead token nor the possibility of a syntax error should the reduction
occur, perform the reduction, and any semantic actions associated with it.
- Repeat step 2 until no further reductions can be performed.
- Shift the look-ahead token onto the top of the stack.
- Return to step 1, and repeat until end of file.
(actually, that's a bit of an oversimplification, but it'll do for the purposes
of this writeup.)
A shift/reduce conflict occurs at step 2 of the above process, when it is
ambiguous whether or not a reduction is required. For instance, consider the
following hypothetical fragment from a grammar for the C language:
if_statement : IF '(' expression ')' statement_list
| IF '(' expression ')' statement_list ELSE statement_list
;
In other words, an
if_statement consists of the keyword "
if", an
expession in brackets, and a list of statements, followed optionally by an
"
else" keyword and another list of statements (assuming that
statement_list has been defined elsewhere as either a single statement
or several statements between braces). Then consider the following ambiguous,
but
ANSI-legal, code fragment:
if (x>y) if (x<z) foo(); else bar();
The parser will read "if (x>y)" and expect a statement as the next token. It
finds another "if" and reads this in, followed by the rest of the statement up
to the "else". It then faces the problem of deciding whether to reduce "if (x<z)
foo();" to an if_statement and then shifting the "else", or of the "else" should
be shifted first: both would be equally valid under the grammar as stated, but
reduction would result in this:
if (x>y) {
if (x<z) {
foo();
}
} else {
bar();
}
associating the "else" with the first "if", whereas shifting the "else" produces
if (x>y) {
if (x<z) {
foo();
} else {
bar();
}
}
with the "else" associated with the second "if".
Sadly such problems are
nigh-on impossible to prevent totally (the c-parse.y file found in the
GCC source produces 51 such conflicts when processed with Bison 1.28), but the
default behaviour of both yacc and bison is to shift, as this
produces the correct result in C and similar languages, and print a warning to
stderr that a possible conflict exists. Examination of the
y.output file produced should then help locate the conflict and
enable easy altering of the source file if this is the wrong behaiviour.