Question 1 (10 points) Consider a three-symbol alphabet with the probability assignment shown below:
X_i | P(X_i)
a | 0.70
b | 0.25
c | 0.05
Listed with the input alphabet are six binary code assignments.
Symbol | Code 1 | Code 2 | Code 3 | Code 4 | Code 5 | Code 6
a | 00 | 00 | 0 | 1 | 1 | 1
b | 11 | 01 | 1 | 10 | 01 | 00
c | 11 | 10 | 11 | 100 | 11 | 01
(a) Scan these codes and determine which of them are practical (i.e., can be used in data compression applications). Justify your answers.
(b) Design a Huffman code for the three-symbol source alphabet above and find its code efficiency (i.e., compression efficiency).
(c) Design a Shannon-Fano code for the same three-symbol source alphabet and find its code efficiency. Compare it with your answer in part (b).
(d) Can you suggest a technique to improve the code efficiency to achieve a greater compression ratio? Determine the code efficiency for your improved source coding method.
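The quantities used throughout Question 1 can be checked numerically. The following is a minimal sketch, assuming that code efficiency means the ratio of the source entropy H(X) to the average codeword length; the function names are illustrative and not part of the assignment. It also tests the prefix condition, which is sufficient (though not necessary) for unique decodability. Code 2 from the table is used only as an example input.

```python
import math

# Source probabilities from the table in Question 1.
PROBS = {"a": 0.70, "b": 0.25, "c": 0.05}


def entropy(probs):
    """Source entropy H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def average_length(probs, code):
    """Average codeword length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in probs)


def efficiency(probs, code):
    """Assumed definition: entropy divided by average codeword length."""
    return entropy(probs) / average_length(probs, code)


def is_prefix_free(code):
    """True if all codewords are distinct and none is a prefix of another."""
    words = list(code.values())
    if len(set(words)) != len(words):
        return False
    return not any(u != v and v.startswith(u) for u in words for v in words)


# Code 2 from the table, used purely as an example input.
code2 = {"a": "00", "b": "01", "c": "10"}
print(f"H(X)           = {entropy(PROBS):.4f} bits/symbol")
print(f"average length = {average_length(PROBS, code2):.2f} bits/symbol")
print(f"efficiency     = {efficiency(PROBS, code2):.4f}")
print("prefix-free    :", is_prefix_free(code2))
```

The same helpers can be applied to any candidate code from parts (b) through (d) by passing a different symbol-to-codeword dictionary.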
Question 2: Huffman Coding
In this problem, you will study the efficacy of Huffman source coding (a data compression algorithm) on two different data sources. Generate two test data files, data1.txt and data2.txt. Create the first test data file, data1.txt, such that it contains at least 20 characters (including spaces). Next, create a second test data file, data2.txt, consisting of about 20 binary digits (i.e., digits 0 and 1).
(a) Suppose you wish to compress data1.txt and data2.txt using the Huffman source coding method. Find the compression efficiency and the average codeword length for each file. Clearly show your final code design (i.e., the codeword for each source symbol).
(b) Repeat part (a) but now consider at least 5000 characters and 5000 digits for data1.txt and data2.txt, respectively.
(c) Validate that your design and your program work correctly (i.e., correct data encoding and decoding) for both of your test data files in parts (a) and (b). Compare your results with the compression efficiency attainable with the "compress" routine in UNIX or with other data compression utilities (e.g., zip). Explain any interesting observations and/or trends in your results.
(d) Explain why Huffman coding is NOT used for data compression of practical data sources. In other words, why do practical source compression utilities (e.g., zip, gnuzip, etc.) not use the Huffman coding method despite its optimality?
(e) Explain why it might be more advantageous to employ the Lempel-Ziv algorithm rather than the Huffman coding method for digital multimedia (image/audio/video) data compression applications.
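One possible starting point for the program in Question 2 is sketched below. It assumes symbol probabilities are estimated from the character counts of the file itself, and it uses Python's standard zlib module as a stand-in for the utilities named in part (c); both choices, along with the function names and the sample string, are assumptions made for illustration rather than requirements of the assignment.

```python
import heapq
import zlib
from collections import Counter


def huffman_code(text):
    """Build a Huffman code {symbol: bitstring} from symbol counts in text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate source: a single symbol still needs one bit per symbol.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing 0 and 1.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of every symbol in text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Decode a bitstring produced by encode() with the same prefix code."""
    inverse = {b: s for s, b in code.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)


# Placeholder text; in practice, read the contents of data1.txt or data2.txt.
sample = "this is a small test string"
code = huffman_code(sample)
bits = encode(sample, code)
assert decode(bits, code) == sample  # round-trip check for part (c)

print("codewords:", code)
print(f"average codeword length = {len(bits) / len(sample):.3f} bits/symbol")
print("zlib output size (bytes):", len(zlib.compress(sample.encode("utf-8"))))
```

For very short inputs, the zlib output includes fixed header and checksum bytes, so its size can exceed that of the input; the comparison becomes more meaningful for the 5000-symbol files of part (b). The bit count above also ignores the cost of storing the code table alongside the encoded data.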