Context-free grammar

From Wikipedia

A context-free grammar is a formal grammar in which every production rule is of the form

V -> w

where V is a nonterminal symbol and w is a (possibly empty) string of terminals and/or nonterminals. The term "context-free" reflects the fact that the nonterminal V can always be replaced by w, regardless of the context in which it occurs. A formal language is context-free if there is a context-free grammar that generates it.
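The definition can be made concrete with a short Python sketch. It assumes a toy encoding that is not part of the article: a grammar is a dict mapping each nonterminal (a single uppercase letter) to the list of its right-hand sides, with "" standing for ε. The hypothetical helper `one_step` rewrites one occurrence of a nonterminal V by some w without ever looking at the neighboring symbols, which is exactly the freedom the word "context-free" refers to.

```python
# Toy encoding (an assumption for illustration): single-character symbols,
# nonterminals are the dict keys, "" stands for the empty string ε.
Grammar = dict[str, list[str]]


def one_step(form: str, grammar: Grammar) -> set[str]:
    """Return every string obtainable from `form` by one rule application V -> w."""
    results: set[str] = set()
    for i, symbol in enumerate(form):
        # Only the symbol itself is inspected; form[:i] and form[i + 1:]
        # (its context) never affect which rules may be applied.
        for w in grammar.get(symbol, []):
            results.add(form[:i] + w + form[i + 1:])
    return results


if __name__ == "__main__":
    g: Grammar = {"S": ["aSb", ""]}
    print(sorted(one_step("aSb", g)))  # ['aaSbb', 'ab']
```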

Context-free grammars are important because they are powerful enough to describe the syntax of programming languages; in fact, the syntax of most programming languages is specified, at least in large part, by a context-free grammar. At the same time, context-free grammars are simple enough to allow efficient parsing algorithms that determine, for a given string, whether and how it can be generated from the grammar. See LR parser and LL parser for examples.


A simple context-free grammar is

S -> aSb
S -> ε

(where ε stands for the empty string). This grammar generates the language $\{a^n b^n : n \ge 0\}$, which is not regular.
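As a minimal Python sketch (the function names are hypothetical), the recognizer below mirrors the two rules directly: a string is derivable from S if it is empty (S -> ε) or has the form a, then a string derivable from S, then b (S -> aSb). The helper `derivation` lists the sentential forms in the derivation of $a^n b^n$.

```python
def derives_from_S(w: str) -> bool:
    """True iff S =>* w in the grammar S -> aSb | ε."""
    if w == "":
        return True  # rule S -> ε
    # rule S -> aSb: strip the outer a...b and recurse on the middle part
    return len(w) >= 2 and w[0] == "a" and w[-1] == "b" and derives_from_S(w[1:-1])


def derivation(n: int) -> list[str]:
    """Sentential forms S => aSb => ... => a^n S b^n => a^n b^n."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", ""))  # finish with S -> ε
    return forms


if __name__ == "__main__":
    print(" => ".join(derivation(2)))  # S => aSb => aaSbb => aabb
    print(derives_from_S("aabb"), derives_from_S("abab"))  # True False
```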

Here is a context-free grammar for syntactically correct infix algebraic expressions in the variables x, y and z:

S -> T + S
S -> T - S
S -> T
T -> T * T
T -> T / T
T -> (S)
T -> x
T -> y
T -> z

For example, this grammar can generate the string (x+y)*x-z*y/(x+x).
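The grammar above is left-recursive and ambiguous (x*y*z, for instance, has two parse trees through T -> T * T), so it cannot be used directly by a recursive-descent (LL-style) parser. The Python sketch below assumes a hand-rewritten equivalent grammar, S -> T + S | T - S | T, T -> F followed by zero or more "* F" or "/ F", F -> (S) | x | y | z, which generates the same strings. The name `recognize` is hypothetical, and the input is assumed to contain no whitespace.

```python
def recognize(expr: str) -> bool:
    """Return True if `expr` (no whitespace) is generated by the expression grammar."""
    pos = 0  # index of the next unread character

    def peek() -> str:
        return expr[pos] if pos < len(expr) else ""  # "" marks end of input

    def parse_s() -> bool:
        # S -> T + S | T - S | T
        nonlocal pos
        if not parse_t():
            return False
        if peek() in ("+", "-"):
            pos += 1
            return parse_s()
        return True

    def parse_t() -> bool:
        # T -> F { (* | /) F }, equivalent to T -> T * T | T / T | F
        nonlocal pos
        if not parse_f():
            return False
        while peek() in ("*", "/"):
            pos += 1
            if not parse_f():
                return False
        return True

    def parse_f() -> bool:
        # F -> ( S ) | x | y | z
        nonlocal pos
        if peek() in ("x", "y", "z"):
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not parse_s() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    # The whole input must be consumed, not just a prefix of it.
    return parse_s() and pos == len(expr)


if __name__ == "__main__":
    print(recognize("(x+y)*x-z*y/(x+x)"), recognize("x+*y"))
```

Rewriting the rules this way makes every choice depend only on the next character, which is what lets the parser run without backtracking.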

A context-free grammar for the language of all strings over {a,b} that contain different numbers of a's and b's is

S -> U
S -> V
U -> TaU
U -> TaT
V -> TbV
V -> TbT
T -> aTbT
T -> bTaT
T -> ε

Here, T generates all strings with the same number of a's as b's, U generates all strings with more a's than b's, and V generates all strings with fewer a's than b's.
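These claims can be spot-checked by brute force. The Python sketch below (hypothetical helpers `generate` and `all_strings`; single-letter symbols with uppercase nonterminals as dict keys and "" for ε, an assumed encoding) enumerates every string of length at most `max_len` derivable from a start symbol by breadth-first search over leftmost derivations. Pruning forms with more than `max_len` terminals is safe because terminals are never erased; the search also terminates for this grammar because every rule that adds a nonterminal also emits a terminal, an assumption that does not hold for every grammar. A check up to a fixed length is evidence, not a proof.

```python
from collections import deque
from itertools import product

GRAMMAR = {
    "S": ["U", "V"],
    "U": ["TaU", "TaT"],
    "V": ["TbV", "TbT"],
    "T": ["aTbT", "bTaT", ""],  # "" is ε
}


def generate(grammar: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from `start`."""
    results: set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:
            results.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            # Terminals are never erased, so forms with too many can be dropped.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


def all_strings(max_len: int) -> set[str]:
    """Every string over {a, b} of length <= max_len."""
    return {"".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n)}


if __name__ == "__main__":
    words = all_strings(6)
    unequal = {w for w in words if w.count("a") != w.count("b")}
    print(generate(GRAMMAR, "S", 6) == unequal)
```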

Chomsky Normal Form

Every context-free grammar can be transformed into an equivalent one in Chomsky Normal Form, in which every production rule has the form A -> BC or A -> a (A, B, C nonterminals, a a terminal), plus possibly S -> ε for the start symbol S. "Equivalent" here means that the two grammars generate the same language. Because production rules in Chomsky Normal Form are so simple, this normal form has both theoretical and practical implications. For instance, it underlies the CYK algorithm, which decides for any fixed context-free language, in polynomial time (cubic in the length of the input), whether a given string belongs to the language.
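A compact Python sketch of the CYK algorithm follows. The grammar is a hand-made CNF example (an assumption, not taken from the article) for $\{a^n b^n : n \ge 1\}$: S -> AB | AX, X -> SB, A -> a, B -> b; the empty string is rejected because this grammar has no S -> ε rule. `table[i][length]` holds the nonterminals that derive the substring of that length starting at position i, and the nested loops over length, start position and split point give a running time cubic in the input length (times the grammar size).

```python
# Hand-written CNF grammar (an illustrative assumption) for {a^n b^n : n >= 1}:
#   S -> A B | A X,   X -> S B,   A -> a,   B -> b
BINARY: dict[str, list[tuple[str, str]]] = {
    "S": [("A", "B"), ("A", "X")],
    "X": [("S", "B")],
}
TERMINAL: dict[str, str] = {"A": "a", "B": "b"}  # here each rule A -> a has one terminal


def cyk(word: str, binary=BINARY, terminal=TERMINAL, start: str = "S") -> bool:
    """Decide membership of `word` in the language of a CNF grammar."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no S -> ε rule
    # table[i][length]: nonterminals deriving word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {lhs for lhs, t in terminal.items() if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left, right = table[i][split], table[i + split][length - split]
                for lhs, pairs in binary.items():
                    # lhs -> B C applies if B derives the left part and C the right part
                    if any(b in left and c in right for b, c in pairs):
                        table[i][length].add(lhs)
    return start in table[0][n]


if __name__ == "__main__":
    for w in ["ab", "aabb", "aab", "abab", "ba"]:
        print(w, cyk(w))
```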

Decision problems

It is not possible to construct a general algorithm that takes two context-free grammars as input and decides whether they generate the same language; nor is it decidable whether their languages have at least one string in common. It is, however, possible to decide whether a given context-free grammar generates a non-empty language, and also whether it generates an infinite language.
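Both decidable questions have short algorithms. The Python sketch below uses hypothetical helper names and an assumed encoding: a grammar is a dict whose keys are the nonterminals (single characters), right-hand sides are strings, and "" is ε. `generating` computes, by fixpoint iteration, the nonterminals that derive at least one terminal string; the language is non-empty exactly when the start symbol is among them. `is_infinite_cnf` assumes the grammar is in Chomsky Normal Form: after discarding rules that use non-generating symbols, the language is infinite exactly when some nonterminal reachable from the start symbol can derive a string containing itself, i.e. when the dependency graph has a cycle.

```python
Grammar = dict[str, list[str]]  # keys are nonterminals; other characters are terminals


def generating(g: Grammar) -> set[str]:
    """Nonterminals that derive at least one string of terminals (fixpoint iteration)."""
    gen: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in g.items():
            # lhs is generating if some right-hand side uses only terminals and
            # already-known generating nonterminals ("" = ε qualifies trivially).
            if lhs not in gen and any(all(s in gen or s not in g for s in rhs) for rhs in rhss):
                gen.add(lhs)
                changed = True
    return gen


def is_empty(g: Grammar, start: str = "S") -> bool:
    return start not in generating(g)


def is_infinite_cnf(g: Grammar, start: str = "S") -> bool:
    """Assumes g is in Chomsky Normal Form."""
    gen = generating(g)
    if start not in gen:
        return False  # empty language
    # Dependency graph over generating nonterminals, using only rules whose
    # symbols are all generating (other rules can never complete a derivation).
    edges: dict[str, set[str]] = {a: set() for a in gen}
    for lhs in gen:
        for rhs in g[lhs]:
            if all(s in gen or s not in g for s in rhs):
                edges[lhs].update(s for s in rhs if s in g)
    state: dict[str, int] = {}  # 1 = on the DFS stack, 2 = finished

    def has_cycle(a: str) -> bool:
        state[a] = 1
        for b in edges[a]:
            if state.get(b) == 1 or (b not in state and has_cycle(b)):
                return True
        state[a] = 2
        return False

    return has_cycle(start)  # DFS from start visits only reachable nonterminals


if __name__ == "__main__":
    anbn = {"S": ["AB", "AX"], "X": ["SB"], "A": ["a"], "B": ["b"]}  # {a^n b^n : n >= 1}
    single = {"S": ["AB"], "A": ["a"], "B": ["b"]}  # {ab}
    stuck = {"S": ["AS"], "A": ["a"]}  # S can never finish a derivation
    for name, g in [("anbn", anbn), ("single", single), ("stuck", stuck)]:
        print(name, is_empty(g), is_infinite_cnf(g))
```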

Properties of context-free languages

The union and concatenation of two context-free languages are context-free; the intersection need not be. For example, $\{a^n b^n c^m : n, m \ge 0\}$ and $\{a^m b^n c^n : n, m \ge 0\}$ are both context-free, but their intersection $\{a^n b^n c^n : n \ge 0\}$ is not. The reverse of a context-free language is context-free, but the complement need not be. Every regular language is context-free, because it can be described by a regular grammar, which is a special case of a context-free grammar. There exist context-sensitive languages that are not context-free. To prove that a given language is not context-free, one typically employs the pumping lemma for context-free languages. See the Chomsky hierarchy for the position of context-free languages in the hierarchy of all formal languages.
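Closure under union, concatenation and reversal follows from direct grammar constructions, sketched below in Python under assumed conventions: a grammar is a dict from nonterminal names to lists of right-hand sides, each a tuple of symbols (the empty tuple is ε), and the nonterminals are exactly the dict keys. The two grammars are first renamed apart with the hypothetical prefixes "1." and "2."; union then adds a new start rule S -> S1 | S2, concatenation adds S -> S1 S2, and reversal reverses every right-hand side. A bounded breadth-first generator (hypothetical helper `strings`) checks the results on short strings. No comparable construction exists for intersection or complement.

```python
from collections import deque

Grammar = dict[str, list[tuple[str, ...]]]


def rename(g: Grammar, tag: str) -> Grammar:
    """Prefix every nonterminal with `tag` so that two grammars share none."""
    def r(s: str) -> str:
        return tag + s if s in g else s
    return {tag + lhs: [tuple(r(s) for s in rhs) for rhs in rhss] for lhs, rhss in g.items()}


def union(g1: Grammar, s1: str, g2: Grammar, s2: str) -> Grammar:
    # New start rule S -> S1 | S2
    return {"S": [("1." + s1,), ("2." + s2,)], **rename(g1, "1."), **rename(g2, "2.")}


def concat(g1: Grammar, s1: str, g2: Grammar, s2: str) -> Grammar:
    # New start rule S -> S1 S2
    return {"S": [("1." + s1, "2." + s2)], **rename(g1, "1."), **rename(g2, "2.")}


def reverse(g: Grammar) -> Grammar:
    # Reversing every right-hand side reverses every derivable string.
    return {lhs: [tuple(reversed(rhs)) for rhs in rhss] for lhs, rhss in g.items()}


def strings(g: Grammar, start: str, max_len: int) -> set[str]:
    """Terminal strings of length <= max_len derivable from start (leftmost BFS)."""
    out: set[str] = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((k for k, s in enumerate(form) if s in g), None)
        if i is None:
            out.add("".join(form))
            continue
        for rhs in g[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # Terminals are never erased, so forms with too many can be dropped.
            if sum(s not in g for s in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return out


if __name__ == "__main__":
    ab: Grammar = {"S": [("a", "S", "b"), ()]}  # a^n b^n
    cd: Grammar = {"S": [("c", "S", "d"), ()]}  # c^n d^n
    print(sorted(strings(union(ab, "S", cd, "S"), "S", 4)))
    print(sorted(strings(concat(ab, "S", cd, "S"), "S", 4)))
    print(sorted(strings(reverse(ab), "S", 4)))
```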