Part -1:
1) Give a CFG for the following language. - TOUGH ONE!!!! Just do your best and try to get as close to a working grammar as you can.
L = { w1cw2 | w1,w2 , ∑ == {a,b}*, w1 != w2 }
2) Give a CFG for the following language.
L = {w | w is of the form 0M1N0M+N }
3) Give a CFG for the following language. Hint: You will need to build the string from the outside in....
L = { ambncpdq | m+n >= p+q}
4) Draw a PDA (pushdown automata) using JFLAP for the language defined in question #2.
5) Use JFLAP to draw a PDA for the language defined in question #3.
6) Give a RIGHTMOST derivation of the string 000111000 using the following CFG. Use the ==> notation rather than a parse tree. A rightmost derivation means that at each/any step you replace only the rightmost Variable.
S → A | BZ
A → 0A0 | Y
B → 0B1 | l
Y → Y1 | 1
Z → 0Z | 0
7) Clearly define the language of the grammar given in question #6.
Part -2:
CHOMSKY NORMAL FORM GRAMMARS - CSC 485/585
The idea behind converting an arbitrary CFG G=(V,T,P,S) to a Chomsky Normal form Grammar is to eliminate issues/problems in the grammar that can prevent us from forcing all productions to be of the form
V → AB
Or
V → t ∈ T
A. The first step is to eliminate λ productions. The algorithm to do this is as follows:
0. While there remains a production of the form V → λ in P do the following:
1. Remove V → λ
2. Foreach occurrence of V on the right-hand side of a production add a new production with V deleted. For example, if we had A → aVb we would add A → ab or if we had A → aaVbVa we would add A → aabVa, A → aaVba and A → aaba. Note that if we had A → V we would add A → λ unless that production had previously been eliminated.
B. The second step is to eliminate any UNIT productions of the form V1 → V2.
0. While there remains a production of the form A → B, A,B ∈ V do the following:
1. Remove A → B.
2. For productions of the form B → x ∈ ( V U T ) * add A → x, unless it is a unit production previously removed.
At this point we have no empty or unit productions. The next step is to eliminate useless symbols. Useless symbols are defined as those that are non-terminating or unreachable. We begin by removing non-terminating (sometimes called non-generating symbols).
C. Elminating non-terminating symbols.
- TERM = ∅
- NEW = { A ∈ V | A → w ∈ ∑* }
- While ( TERM ≠ NEW ) do
3. TERM = NEW
4. NEW = TERM U { A | A→ w ∈ ( ∑ U TERM)* }
6. P = { V → x | V ∈ TERM, x ∈ ( ∑ U TERM)* }
D. Elminating unreachable symbols.
- REACH = ∅
- NEW = { S }
- while NEW ≠ ∅ do
3. ADDED = ∅
4. foreach V ∈ NEW do
5. foreach production V → yXz , X ∈ V AND X ∉ REACH
6. REACH = REACH U { X }
7. ADDED = ADDED U { X }
8. NEW = ADDED
9. P = { V → x | V ∈ REACH, x ∈ (REACH U ∑ ) * }
Finally, we have a grammar with no empty productions, no unit productions and no useless symbols. PLEASE NOTE - the order you perform these operations is important. Always follow step A, then B, and then C to prepare the CFG.
The first step in producing a Chomsky Normal Form (CNF) grammar from a grammar prepared as above is to add productions
Ca → a for all a ∈ T.
NOTE these are NOT unit productions!!!! Unit productions map a single VARIABLE to a VARIABLE not a terminal!!!.
Next, we replace all occurrences of terminal a on the right-hand side of a production with the new variable Ca (excluding of course the a in the newly added production).
For example, if our grammar was
S → AabB | aA
A → aBB | aa
B → bb | bAb
We would add productions Ca → a and Cb → b and transform the grammar into
S → ACaCbB | CaA
A → CaBB | CaCa
B → CbCb | CbACb
Ca → a
Cb → b
Now if you stop and think about it, you should realize that our grammar has only two types of productions. V → t ∈ T or V → X ∈ V*. RIGHT??!!??
The final step in producing the CNF grammar is to reduce right-hand sides of the productions with more than 2 variables to multiple productions with only 2.
For example, if we had the production A → BAB we would add a new production of the form V1 → BA and alter the previous production to A → V1B
Thus, our previous grammar would become
S → V2B | CaA
A → V3B | CaCa
B → CbCb | V4Cb
Ca → a
Cb → b
V1 → ACa
V2 → V1Cb
V3 → CaB
V4 → CbA
NOTE - the order that we attack the productions and the direction (left to right or right to left) can obviously impact the final look of the grammar but let's just agree that we will always start reducing with the left-most production of S and work our way down and that we will always replace pairs of variables from the left to the right - PLEASE????
OK - so finally we have a grammar whose productions are only of the appropriate form. So what? Well, this will be useful to use when we try to find something similar in nature to the pumping lemma we saw for regular languages.