Saturday 12 March 2022

Data Compression KCS064


Huffman Coding Notes


SECTION A

Attempt ALL parts. All parts carry equal marks. Write the answer to each part in short. [2 X 10 = 20]


  1. What is data compression and why do we need it?

  2. Explain compression and reconstruction with the help of a block diagram.

  3. State early examples of data compression.

  4. Differentiate between static length and variable length coding schemes. Explain with the help of examples.

  5. Based upon the requirements of reconstruction, how are data compression techniques broadly classified? Explain these classifications in brief.

  6. What are the measures of performance of data compression algorithms?

  7. What is average information? What are the properties used in the measure of average information?

  8. How are modeling and coding related? Explain with the help of examples.

  9. What do you understand by a Markov model? Discuss the role of Markov models in text compression.

  10. Describe the steps for a test for unique decodability.


SECTION B

Attempt any FIVE questions from this section. [10 X 5 = 50]

  1. Write the Huffman coding algorithm. How is this algorithm used to design a Huffman code for a source that emits letters from an alphabet?

  2. Explain the minimum variance Huffman code and its encoding procedure, taking a suitable example.

  3. What do you understand by the length of a Huffman code, and how is it defined?

  4. Explain the update procedure for adaptive Huffman coding and the encoding algorithm/flowchart.

  5. How can the Rice code be viewed? Explain the implementation of the Rice code in the recommendation for lossless compression from the Consultative Committee for Space Data Systems (CCSDS).

  6. Explain Tunstall codes with the help of an example.

  7. Discuss the application of Huffman coding in different areas.

  8. How is a tag generated in arithmetic coding?


SECTION C

Note: Attempt any TWO questions from this section. [15 X 2 = 30]


  1. Compare Huffman and Arithmetic coding.

  2. Explain the effect of dictionary size and the size of the text file being encoded on the amount of compression.

  3. Discuss a generic compression scheme with the help of a block diagram. What are the distortion criteria for lossy coding?


Data Compression Techniques


Why do data compression? What is it? Trying to reduce the amount of data sent. What is the tradeoff?


Suppose that we wanted to transfer a 20 Mb file to another machine. Would we really need to send 20 Mb of data? If the file consisted entirely of the letter “A”, we could send the letter “A”, followed by a count of the number of times it appears in the file, and have the receiver regenerate the file.


There are three general approaches to data compression. Each approach assumes that the data stream can be transformed into a more compact representation, which the receiver reconstructs back into the original data.



Approach 1: Finite Set of Symbols


Consider a library with many branch offices in which the previous day's transactions are sent to every other branch after closing. Transactions consist of checked-out and returned books. We could exchange information in the following ways:


  1. We could send the name of the book, its author, the copy number, etc. together with the type of transaction.

  2. Alternatively, the library could maintain a sitewide table assigning a unique ID number to every book in every branch. Transactions could then refer to the book’s ID number, rather than its title. Because book IDs are small (e.g. a few bytes), less data will be transmitted.


Note: The above technique is used throughout programming. We frequently exchange pointers and array subscripts to avoid the cost of transferring large amounts of data between subroutines.
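The ID-table idea can be sketched minimally in Python; the titles, IDs, and transaction format here are illustrative assumptions, not part of any real library protocol:

```python
# A sitewide table assigns each book a small unique ID (illustrative titles).
book_ids = {"A Tale of Two Cities": 0, "Moby-Dick": 1, "Middlemarch": 2}

def encode_transaction(title, kind):
    """Send the small ID instead of the full title; kind is 'out' or 'ret'."""
    return (book_ids[title], kind)

tx = encode_transaction("Moby-Dick", "out")
# tx == (1, "out") -- a couple of bytes instead of the full record
```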


The previous approach assumes that all objects occur with equal frequency, and that the set of objects (e.g., books) is finite. If we examine text, however, we immediately notice that some words appear more often than others. We could reduce the number of bits needed to represent a document by using a coding scheme that employs short code words to represent common words and longer code words to represent words that appear infrequently.





Approach 2: Huffman Encoding


Huffman encoding is a technique used to encode symbols according to the frequency of their use. The algorithm is as follows:


  1. Create a set of nodes, one node per symbol, with a node’s value given by the probability of its occurrence in the data.

  2. Find the two nodes having the smallest value, remove them from the set, and create a new node having the two removed nodes as children, and assign the new node a value that is the sum of its children’s values. Add the new node back to the set of nodes.

  3. Repeat step 2 until only one node remains. We now have a tree, whose probability value is one.

  4. The encoding for each symbol is the path from the root to the symbol. Using a code of 0 for a left child, 1 for a right child, the length of each symbol’s encoding is inversely related to the probability of its occurrence: frequent symbols get short codes.
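The four steps above can be sketched in Python; the symbol probabilities in the example are illustrative:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman codes from a {symbol: probability} map."""
    tick = count()  # tie-breaker so heapq never has to compare subtrees
    # Step 1: one node per symbol, valued by its probability.
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    # Steps 2-3: repeatedly merge the two smallest nodes into a parent
    # whose value is the sum of its children's values.
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
    # Step 4: a symbol's code is its root-to-leaf path (0 = left, 1 = right).
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path or "0"  # single-symbol alphabet edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10})
# The most probable symbol "a" gets a 1-bit code; "c" and "d" get 3 bits.
```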


One drawback of Huffman encoding, however, is that symbols have differing lengths, making it relatively expensive (computationally) to decode. Also, a single-bit error can garble the remainder of the message.




Approach 3: Context Dependent Encoding


The last technique, context dependent encoding, recognizes that the probability of a particular symbol occurring next depends on the previous symbol. For instance, the probability that a “T” directly follows a “Q” is about 4 times less than the probability of a “U” following a “Q”.


The main disadvantage of conditional probability methods is the increase in table space. Each symbol has its own table that gives the codes for those symbols immediately following it. For K symbols, the tables will contain K² entries: all K symbols have K entries for the symbols that follow them.

One variation to this technique is as follows:


  1. Use a 5-bit code to represent symbols.

  2. Have four different “modes”, where the current mode determines the meaning of symbols. Four symbols are reserved to denote a switch to another mode. For instance, different modes could represent upper-case characters, lower-case characters, numeric and special characters, and control characters.

  3. The underlying assumption is that lower-case letters are likely to follow lower-case letters, numbers are likely to occur in groups, etc.

  4. The occurrence of a mode symbol signifies a mode change.

  5. Our 5-bit code now represents 4 × 28 = 112 different values (each mode keeps 28 data symbols after 4 of the 32 codes are reserved for mode switches).

One advantage that the above technique has over Huffman encoding is that symbols are all fixed length, making encoding and decoding using table lookups very efficient. In addition, it is more immune to transmission errors.
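The mode-switching scheme above can be sketched in Python. The four 28-symbol alphabets and the choice of codes 28–31 as the reserved mode-switch symbols are illustrative assumptions, not a fixed standard:

```python
import string

# Illustrative 28-symbol alphabets for the four modes (assumed, not standard).
MODES = [
    string.ascii_uppercase + " .",          # mode 0: upper-case (26 + 2 = 28)
    string.ascii_lowercase + " .",          # mode 1: lower-case (26 + 2 = 28)
    string.digits + " .,;:+-*/=()<>#%&$",   # mode 2: numeric/special (10 + 18 = 28)
    "".join(chr(c) for c in range(28)),     # mode 3: control characters
]
SWITCH = {m: 28 + m for m in range(4)}      # codes 28-31 mean "switch to mode m"

def encode(text, start_mode=1):
    mode, out = start_mode, []
    for ch in text:
        if ch not in MODES[mode]:           # stay in the current mode if possible
            mode = next(m for m, a in enumerate(MODES) if ch in a)
            out.append(SWITCH[mode])        # one reserved code signals the change
        out.append(MODES[mode].index(ch))   # every output value fits in 5 bits
    return out

codes5 = encode("Hi there")
# codes5 == [28, 7, 29, 8, 26, 19, 7, 4, 17, 4]: switch to upper, 'H',
# switch to lower, then 'i', ' ', 't', 'h', 'e', 'r', 'e'.
```

Because every value is a fixed 5 bits, decoding is a pure table lookup, matching the efficiency claim above.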

Run Length Encoding


Another alternative, run length encoding, is used to encode data containing repetitive symbols. Consider binary strings of 0s and 1s. One way to handle long runs of 0s is to use a k-bit symbol that tells how many 0 bits occurred between consecutive 1s. A code word of all 1s means that the true run length is 2^k − 1 plus the value of the following symbol(s). For example:



00010000101001100000000000000000000000100000001 (47 bits)

consists of runs of length 3, 4, 1, 2, 0, 23, and 7. Using 4-bit symbols, it would be encoded as:

0011 0100 0001 0010 0000 1111 1000 0111, for 32 bits and a savings of 15/47 ≈ 32%.


Using 3-bit symbols, it would be encoded as:

011 100 001 010 000 111 111 111 010 111 000 for 33 bits.
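The k-bit scheme above can be sketched as follows (a minimal sketch; a convention for a trailing run of 0s with no closing 1 is left out for simplicity):

```python
def rle_encode(bits, k):
    """Encode the runs of 0s between consecutive 1s as k-bit counts.
    An all-1s symbol (2**k - 1) means the run continues into the next symbol."""
    maxval = 2 ** k - 1
    runs, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:                      # a 1 closes the current run of 0s
            runs.append(run)
            run = 0
    out = []
    for r in runs:
        while r >= maxval:         # run too long for one symbol:
            out.append(maxval)     # emit all-1s and carry the remainder
            r -= maxval
        out.append(r)
    return " ".join(format(s, "0{}b".format(k)) for s in out)

# The 47-bit example from the text, built from its runs 3, 4, 1, 2, 0, 23, 7.
data = "0001" + "00001" + "01" + "001" + "1" + "0" * 23 + "1" + "0" * 7 + "1"
encoded3 = rle_encode(data, 3)     # 11 symbols, 33 bits
encoded4 = rle_encode(data, 4)     # 8 symbols, 32 bits
```

Note how the run of 23 becomes 111 111 111 010 with 3-bit symbols (7 + 7 + 7 + 2) but just 1111 1000 with 4-bit symbols (15 + 8).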



Yet another variation is to encode the difference between the current value and the previous value, as was used to reduce the number of bits needed in pulse-code modulation (PCM).
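This difference (delta) coding variation can be sketched minimally; the sample values are illustrative:

```python
def delta_encode(samples):
    """Send the first sample, then successive differences (typically small)."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(deltas):
    """Rebuild the original samples by running the sum of the differences."""
    prev, out = 0, []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 101, 105, 104]
deltas = delta_encode(samples)     # [100, 2, -1, 4, -1]
assert delta_decode(deltas) == samples
```

The differences cluster near zero for slowly varying signals, so they can then be coded in fewer bits than the raw samples.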


(1) Attempt all questions.

(2) All questions carry equal marks.

 1. Attempt any four of the following :

(a) What is data compression and why do we need it? Explain compression and reconstruction with the help of a block diagram.

(b) State early examples of data compression.

(c) Differentiate between static length and variable length coding schemes. Explain with the help of examples.

(d) Based upon the requirements of reconstruction, how are data compression techniques broadly classified? Explain these classifications in brief.

(e) What are the measures of performance of data compression algorithms?

(f)  What is average information? What are the properties used in the measure of average information?

 

2. Attempt any four of the following :

(a) How are modeling and coding related? Explain with the help of examples.

(b) What do you understand by a Markov model? Discuss the role of Markov models in text compression.

(c) Describe the steps for a test for unique decodability.

(d) Write the Huffman coding algorithm. How is this algorithm used to design a Huffman code for a source that emits letters from an alphabet set A = {a1, a2, a3, a4, a5}?

(e) Explain the minimum variance Huffman code and its encoding procedure, taking a suitable example.

(f)  What do you understand by the length of a Huffman code, and how is it defined?

 

3. Attempt any four of the following :

(a) Explain the update procedure for adaptive Huffman coding and the encoding algorithm/flowchart.

(b) How can the Rice code be viewed? Explain the implementation of the Rice code in the recommendation for lossless compression from the Consultative Committee for Space Data Systems (CCSDS).

(c) Explain Tunstall codes with the help of an example.

(d) Discuss the application of Huffman coding in different areas.

(e) How is a tag generated in arithmetic coding?

(f)  Compare Huffman and Arithmetic coding.

 

4. Attempt any two of the following :

(a) Design and implement a digram coder for text files of interest to you:

(i)  Explain the effect of dictionary size and the size of the text file being encoded on the amount of compression.

(ii) Use the digram coder on files that are not similar to the ones you used to design the digram coder. How much does this affect your compression?

(b) Discuss the steps involved in the basic algorithm for Prediction with Partial Match (PPM).

(c) Discuss a generic compression scheme with the help of a block diagram. What are the distortion criteria for lossy coding?

 

5. Attempt any two of the following :

(a) What is quantization? Explain the additive noise model of a quantizer. Differentiate between scalar quantization and vector quantization. What are the advantages of vector quantization over scalar quantization?

(b) Explain uniform and non-uniform quantization with further classifications.

(c) Explain the steps of the Linde-Buzo-Gray algorithm.

 

 

1. a) Write down the steps involved in Huffman coding, and also write down its merits and demerits. [6]

 b) Apply the Shannon–Fano algorithm to the seven source symbols S0, S1, S2, S3, S4, S5, S6 with probabilities 0.05, 0.10, 0.10, 0.15, 0.15, 0.20, 0.25 respectively. Find out the average code word length and the entropy of the source. [7]

OR

2. a) Explain the principle of arithmetic coding with the help of a suitable example. [7]

 b) Decode the sequence 257 259 258 256 70 67 by the LZW decoding algorithm. [6]

3. a) Define companding and explain μ-law compression with an example. [8]

 b) Explain backward ADPCM with a neat block diagram. [6]

OR

4. a) Write a short note on frequency domain coding. [7]

 b) Explain MPEG encoder and decoder in detail. [7]

5. a) Give a comparison between JPEG and JPEG-2000. [6]

 b) Write a short note on video compression.

OR



Attempt ALL.

1.      For which types of data is adaptive Huffman coding preferred over Huffman coding? Design a Huffman code using a Huffman tree for the following symbols.

Symbol: A  B  C  D  E  F
Count:  30 15 10  8  5  2

2.      Explain LZ77 with a suitable example. Also discuss the modifications carried out in LZSS compared to LZ77.

3.      What is data compression? Define lossless and lossy data compression and differentiate between the two. Explain modeling and coding with suitable examples.

4.      What is quantization? Explain vector and companded quantization with respect to it. Also state two types of quantization error. Explain arithmetic coding with a suitable example.

5.      What are delta modulation and differential PCM? Explain in depth.

Digital and Social Media Marketing KOE094


Assignment-I: Last submission date: March 04, 2022 

1.      Name various elements of Digitalization.

2.      Why is digitization important in Marketing?

3.      How can strategic advantages be gained through digitalization?

4.      Name various online portals.

5.      Explain conventional distribution domains.

6.      Differentiate between traditional & online marketing systems.

7.      Explain digital marketing optimization.

8.      Why is there a need for digital engagement?

9.      Name various channel structure strategies.

10.  What is generation Y?






