context-free grammar

created by xdc
(idea) by xdc (8.3 y) (print)   (I like it!) Sun Nov 14 1999 at 9:05:44
A type of grammar first described by linguist Noam Chomsky that is used for describing programming languages.

The grammar consists of rules such as "a sentence is a noun phrase followed by a verb phrase" rendered as S -> NP VP. Another rule might be "a noun phrase consists of an article followed by zero or more adjectives followed by a noun."

source: Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists, by Dennis Shasha and Cathy Lazere

(thing) by rp (1.7 hr) (print)   (I like it!) 1 C! Thu Jun 22 2000 at 16:49:16
Formally, a context-free grammar is a finite set of rules. Each rule consists of a symbol (called the left hand side), an arrow (for cosmetic purposes only), and a sequence of symbols (called the right hand side). There are two kinds of symbols: terminals and nonterminals. The nonterminals are those symbols that occur at the left hand side of rules.

In general discussions about context-free grammars, it is customary to use lowercase letters to denote terminals and uppercase letters to denote nonterminals. For example,


  A -> aAb
  A ->

This is a context-free grammar with one nonterminal (A), and two rules, with 3 and 0 symbols on the right hand side, respectively.

The purpose of grammars is to describe languages, that is, sets of possible sequences of symbols. Context-free grammars describe languages in a generative way: pick a nonterminal (in this case, A) and apply the rules until there are no nonterminals left, and you have a sequence of (terminal) symbols; do this in every possible way, and you have a set of such sequences. This set can be infinitely large.

The example grammar, for instance, describes the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's".

A context-free language is a language that can be described by a context-free grammar. Not all languages are context-free: the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's followed by an equal number of 'c's" isn't.

What context-free grammars are basically capable of doing is expressing languages with 'nested' structures, where items on the left hand must be paired off with items on the right in a systematically nested way. This is considered to be a basic feature of natural languages.

The formalism was invented in the late 50s as a way to express the structure of natural language, and a way to define the structure of computer programming languages. But it is not powerful enough to describe either of them in full.

(thing) by dmagoo (4.9 y) (print)   (I like it!) 1 C! Wed Oct 24 2001 at 23:39:24

Context Free Grammars:
A Context Free Grammar is a 4-tuple.
( V , SIGMA , R , S )
V = Set of Variables or Symbols
SIGMA = The Alphabet (terminals)
R = Set of Rules
S = Start Rule

A Rule is of the form:
Symbol -> String
Where 'Symbol' is a member of V and 'String' is a string over the set V U SIGMA. An example rule would be:
S -> aBa
This can be read as 'S may be replaced with aBa'
By convention Symbols are capitalized and terminals are not.
It is possible to have two rules that share a Symbol on the left. For instance, consider the two rules:
S->bSb
S->aBa
In a case such as this, S can be replaced with either one of {bSb,aBa}
There is a shorthand for this.
S->bSb | aBa

If a Language L is a Context Free Language iff there is some Context Free Grammar, G, that for every string w in L, there is a derivation of w in G. A derivation is a sequence of substitutions (as permitted by the set of rules) to obtain a string.

Example(pg.92):
A CFG Gis defined by:

V = {A,B}
SIGMA = {0,1,#}
R = {
      A -> 0A1 ,
      A -> B ,
      B -> #
     }
S = A


0#1 is a string in the Language of G
This is because, starting with A, there is a sequence of substitutions that will end with the string 0#1
A derivation is of 0#1 is described as follows:
  • Start with A
  • Look in the set of rules,
    Find an A on the left-hand side of a rule. There are two possible substitutions
  • We will replace A with the right-hand side of one of the rule s.
    Our A is now 0A1
  • It is time for another replacement. 0 and 1 are terminals. They are nonreplacable. Remember SIGMA and V are disjoint set s.
    The only thing to replace is the A.
  • This time we replace A with B by the second rule from our set.
    This leaves us with the string 0B1.
  • B is the only symbol (or variable) and there is only one rule for replacing B.
    This leaves us with the string 0#1


We can write the above sequence of substitiutions as a flat derivation, in the following way:

A => 0A1 => 0B1 => 0#1

This derivation is read as:
A yeilds 0A1, which yeilds 0B1, which yeilds 0#1

A flat derivation for the string 000#111 is explained on page 92 of Michael Sipsers An Introduction to the Theory of Computation and is written as follows:

A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111

(idea) by Cogito (2.8 y) (print)   (I like it!) Mon Nov 12 2001 at 20:11:27

The formal definition of a context-free grammar (CFG) is a 4-tuple (V, Σ, R, S), where

  1. V is a finite set of variables
  2. Σ is a finite set of terminals
  3. R is a finite set of rules of the form {A s | A V, s (V Σ)*}
    (variable string of variables and terminals)
  4. S V is the start variable

Let u, v and w be elements of (V Σ)* (strings of variables and terminals). If A w is a rule of the grammar (A R), then uAv yields uwv. This is generally written uAv uwv. Furthermore, one can write u ⇒* v if u = v or there is a sequence u1, u2,,uk such that k ≥ 0 and u1 u2 uk v. One says that v is derivable from u.

The language of a grammar is {w Σ* | S ⇒* w}


Introduction to the Theory of Computation (ISBN 0-534-94728-X) was consulted in the creation of this writeup.

Y'know, if you log in, you can write something here, or contact authors directly on the site. Create a New User if you don't already have an account.