LR parser

HomePage | Recent changes | View source | Discuss this page | Page history | Log in |

Printable version | Disclaimers | Privacy policy

A method of parsing certain context-free grammars very commonly used by computer programming language compilers (and other associated tools). LR parsers read their input from Left to right and produce a Rightmost derivation (hence, LR). The term "LR(k) parser" is also used; here the k refers to the number of unconsumed "look ahead" input symbols that are used in making parsing decisions. Usually k is 1 and is often omitted. A context-free grammar is called LR(k) if there exists a LR(k) parser for it.

In typical use when we refer to an LR parser we mean a particular parser capable of recognising a particular language specified by a context free grammar. It is not uncommon, however, to refer to an LR parser meaning a driver program that can be supplied with a suitable table to produce a wide number of different particular LR parsers.

LR parsing has many benefits:

  • virtually all programming languages can be parsed using an LR parser (and if not usually some small variation will suffice).
  • LR parsers can be implemented very efficiently.
  • Of all parsers that scan their input left to right, LR parsers detect syntactic errors (that is, when their input does not conform to the grammar) as soon as it possible to do.

LR parsers are difficult to produce by hand; they are usually constructed by a parser generator tool. The most common algorithms for producing LR parsers are: Simple LR (SLR for short), Lookahead LR (LALR), and Canonical LR. Those three algorithms work successfully on an increasingly large set of grammars; LALR produces an LR parser for more grammars than SLR, Canonical LR works on more grammars than LALR. The program Yacc produces LR parsers using the LALR algorithm and is fairly popular.

The LR Parsing Algorithm

An LR(k) parser is a bottom-up shift-reduce table driven parser. The parser has a stack (on which it will store states) which starts with an initial state s0 on it. The parser proceeds by using the next k symbols of input and the state currently stored on the top of the stack to determine an action. Usually these actions are stored in a table, called the action table; naturally the table is indexed by state and symbol. The possible actions are:

shift sn - a new state is pushed onto the stack and the input symbol is consumed.

reduce A -> β - (A -> β will be one of the rules in the grammar specifying the language to be parsed) a number of states equal to the number of grammar symbols in β are popped off the stack revealing a new top state sp. A goto table is consulted and the state goto[sp, A] is pushed onto the stack. Any semantic actions associated with the rule A -> β are executed.

accept - the entire input has been consumed and found to be a member of the language specified by the grammar. Stop parsing.

error - the combination of state and input symbols is invalid, the input string cannot be reduced using the grammar.

Most of the time the parser is executing either a shift or a reduce which is why it is known as a shift-reduce parser.

The problem then is how to produce the two tables, the action and the goto table. That is essentially the concern of the three parser generator algorithms mentioned above.