To study compression, let us take an example. In English language, we write everything with 26 alphabets, some punctuation and spaces. In computer everything is represented by 0 and 1 called bits.
We can code A as 0001, B as 0010, C as 0011 and so on. But that is not elegant space saving and time saving method. Hence, we will adopt the coding devised by 'Huffman'.
Suppose the following is our message.
BCAADDDCCACACAC
1. LET US CALCULATE THE FREQUENCY OF EACH CHARACTER
B=1; C=6; A=5, D=3
2. SORT THE CHARACTERS IN INCREASING ORDER OF THE FREQUENCY.
B = 1, D = 3, A = 5, C = 6
3. We are going to create tree. First create an empty node. Assign lowest frequency character as left child of the node and next lowest frequency letter as right child of the node. Total the two frequencies and put it in the empty node.
4. Insert another top node into the tree. Here, one child is empty. Insert the next lower frequency in that space. Again, total the frequencies. Put it in the node.
5. Repeat the step for remaining letter C. Then, Assign 0 to the left edge and 1 to the right edge.
To find the code for any character travel from top; collect the bits, till you reach the character.
Character code
C 0
D 101
B 100
A 11
This is the brief, and minimum code to stand for our string. High frequency characters get shorter codes. Rarer character gets longer code.
To decode the bits, we must travel from bottom to the top through the tree. This is the one of the many methods available to compress the text. Once that is encoded in bits, volumes of books can be put in thumb nail sized silicon chips.
Comments
Post a Comment