Formal grammar

HomePage | Recent changes | View source | Discuss this page | Page history | Log in |

Printable version | Privacy policy

In computer science a formal grammar is a way to describe a formal language, i.e., a set of finite-length strings over a certain finite alphabet. The basic idea behind these grammars is that we generate strings by beginning with a special start symbol and then apply rules that indicate how certain combinations of symbols may be replaced with other combinations of symbols. This is repeated until the result contains only symbols from the alphabet. The language of the grammar then consists of all the strings that can be generated that way.

One common classification system for grammars is the Chomsky hierarchy, a set of four types of grammars developed by Noam Chomsky in the 1950s. Every possible grammar is a member of at least one of these four classes. The most important result of that work was the proof that every language that can be accepted by a Turing machine can also be generated by some grammar, and vice versa. In this sense, grammars are universal.

Something about transformational grammars and generative grammars should be written here.

A formal grammar G consists of the following components:

  • A finite set N of nonterminal symbols.
  • A finite set Σ of terminal symbols that is disjoint from N.
  • A finite set P of production rules where a rule is of the form
(Σ U N)* -> (Σ U N)* where * is the Kleene star
with the restriction that the left-hand side of a rule, i.e., the part left of the ->, must contain at least one nonterminal symbol.
  • A symbol S in N that is indicated as the start symbol.

Usually such a formal grammar G is simply summarized as (N, Σ, P, S).

The language of a formal grammar G = (N, Σ, P, S), denoted as L(G), is defined as all those strings over Σ that can be generated by starting with the start symbol S and then applying the production rules in P until no more nonterminal symbols are present.

Consider, for example, the grammar G with N = {S, B}, Σ = {a, b, c}, P consisting of the following production rules

1. S -> aBSc
2. S -> abc
3. S -> ε
4. Ba -> aB
5. Bb -> bb

and the nonterminal symbol S as the start symbol. Some examples of the derivation of strings in L(G) are: (The used production rules are indicated in brackets and replaced part is each time indicated in bold.)

  • S -> (2) abc
  • S -> (1) aBSc -> (2) aBabcc -> (4) aaBbcc -> (5) aabbcc
  • S -> (1) aBSc -> (1) aBaBScc -> (2) aBaBabccc -> (4) aaBBabccc -> (4) aaBaBbccc -> (4) aaaBBbccc -> (5) aaaBbbccc -> (5) aaabbbccc

It will be clear that this grammar defines the language { anbncn | n > 1 } where an denotes a string of n a's.

See also: transformational grammar -- generative grammar -- Chomsky hierarchy -- regular grammar -- context-sensitive grammar -- context-free grammar