Discete cosine transform (DCT)
What is DCT?
The DCT is a mathematical operation that transform a set of data, which is sampled at a given simpling rate, to it's frequency components. The number of samples should be finite, and power of two for optimal computation time.
One-Dimensional DCT
A one-dimensional DCT converts an array of numbers, which represend signal amplitudes at various points in time, into another array of numbers, each of which represents the amplitude of a certain frequency componey from the orginal array. The resulting array of number contains the same number of values a sthe orignal array. The first element in the result array is a simple average of all the samples in the input array and is refered to as DC coefficient. The remainint elements in the result array each indicate the amplitude of a specif frequency component of the input arrya, and are known as AC coefficents. The frequency content of the sample set at each frequency is calculated by taking a weighted average of the entire set. These weight coefficients is like a cosine wave, whose frequency is proportional to the result array index.

|
Result\Sample |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
|
0 |
+0.707 |
+0.707 |
+0.707 |
+0.707 |
+0.707 |
+0.707 |
+0.707 |
+0.707 |
|
1 |
+0.981 |
+0.831 |
+0.556 |
+0.195 |
-0.195 |
-0.556 |
-0.831 |
-0.981 |
|
2 |
+0.924 |
-0.383 |
-0.383 |
-0.924 |
-0.924 |
-0.383 |
+0.383 |
+0.924 |
|
3 |
+0.831 |
-0.195 |
-0.981 |
-0.556 |
+0.556 |
+0.981 |
+0.195 |
-0.831 |
|
4 |
+0.707 |
-0.707 |
-0.707 |
+0.707 |
+0.707 |
-0.707 |
-0.707 |
+0.707 |
|
5 |
+0.556 |
-0.981 |
+0.195 |
+0.831 |
-0.831 |
-0.195 |
+0.981 |
-0.556 |
|
6 |
+0.383 |
-0.924 |
+0.924 |
-0.383 |
-0.383 |
+0.924 |
-0.924 |
+0.383 |
|
7 |
+0.195 |
-0.556 |
+0.831 |
-0.981 |
+0.981 |
-0.831 |
+0.556 |
-0.195 |
The following is the procedure of transforming spatial domain of pixel to DCT domain:

The steps of transformation likes the above figure, the DC coefficient is calculated by summing up all the weighted value in the sample array, wherethe weighting coefficient is chosen depend on what frequency coefficient is calculating.
Here is the equation for one-dimensional DCT:
coming soon !!
Here is the equaltion for one-dimensional weighting coefficient:
coming soon !!
Two-Dimensional DCT
The two-dimensional DCT converts the two-dimensional spatial domain to two-dimensional DCT domain.

The two-dimensional DCT uses the fundamental operation of one-dimensional DCT, it assumes 8x8 array of pixels is a eight rows of eight pixels. thus one-dimensional DCT is applied separately to each row of eight pixels, the result will be eight rows of frequency coefficients. These eight coefficients are then taken as eight column, the first column will contain all DC coefficients, the second column will contain the first AC coefficient from each row, and so one. The important thing to note about this arrangement is that even though the array represents frequency imformation in horizontal direction it still represents spatial information in the vertical direction. Therefore, the one-dimensional DCT can again be applied to the columns individually. The frequency distribution is shown as below.

Finally, each element of the resulting two-dimensional array of frequency components will then represent a two dimensional frequency component. The element in the upper-left corner is the DC coefficient for the entire two-dimensional array, and all the remaining coefficients contain frequency information.
Here is the two-dimensional DCT equation:
coming soon !!
After converting the input block of 8x8 pixels into an 8x8 block of DCT coefficients, the quantization is applied to the DCT block. The lower frequency region takes minimum quantization, the higher frequency region takes maximum quantization. Since the accuracy of DCT coefficient will be reduced by quantization, the degree and the shape of quantization is design for different picture type (Intra or NonIntra) due to not cause visible damage to the reconstructed video.

The MPEG syntax allow using separate weighting matrix, one for intra and one for NonIntra, to suit their frequency characteristic in picture type.
|
Intra |
NonIntra |
|||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 16 | 19 | 22 | 26 | 27 | 29 | 34 | 16 | 17 | 18 | 19 | 21 | 23 | 25 | 27 | |
| 16 | 16 | 22 | 24 | 27 | 29 | 34 | 37 | 17 | 18 | 18 | 21 | 23 | 25 | 27 | 29 | |
| 19 | 22 | 26 | 27 | 29 | 34 | 34 | 38 | 18 | 19 | 20 | 22 | 24 | 26 | 28 | 31 | |
| 22 | 22 | 26 | 27 | 29 | 34 | 37 | 40 | 19 | 20 | 22 | 24 | 26 | 28 | 30 | 33 | |
| 22 | 26 | 27 | 29 | 32 | 35 | 40 | 48 | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 35 | |
| 26 | 27 | 29 | 32 | 35 | 40 | 48 | 58 | 21 | 23 | 25 | 27 | 29 | 32 | 35 | 38 | |
| 26 | 27 | 29 | 34 | 38 | 46 | 56 | 69 | 23 | 25 | 27 | 29 | 31 | 34 | 38 | 42 | |
| 27 | 29 | 35 | 38 | 46 | 56 | 69 | 83 | 25 | 27 | 29 | 31 | 34 | 38 | 42 | 47 | |

Zigzag ordering is to convert a two-dimensional array to a one-dimensional sequence run-length coding. The MPEG-2 stardard introduced a new run-length entropy scanning pattern (on the right hand side), it is more efficient for the interlaced video signal.
|
-22 |
17 |
1 |
1 |
0 |
0 |
0 |
0 |
|
-21 |
-15 |
1 |
0 |
0 |
0 |
0 |
0 |
|
11 |
3 |
-4 |
0 |
0 |
0 |
0 |
0 |
|
-3 |
2 |
2 |
0 |
0 |
0 |
0 |
0 |
|
0 |
-1 |
-1 |
0 |
0 |
0 |
0 |
0 |
|
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
|
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
Run-Amplitude pairs of above DCT block is as following:
|
Zigzap scanning |
Run |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
7 |
0 |
0 |
0 |
||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Amplitude |
-22 |
-21 |
11 |
-3 |
17 |
-15 |
1 |
1 |
3 |
2 |
-1 |
2 |
-4 |
1 |
EOF |
||
|
Alternative scanning |
Run |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
1 |
0 |
5 |
0 |
3 |
|
|
Amplitude |
-22 |
17 |
-21 |
11 |
-15 |
1 |
1 |
1 |
3 |
-3 |
2 |
-4 |
2 |
-1 |
-1 |
EOF |
Huffman coding is a common technque to compress a set of data with a statistical methods. The technique is to use a short length code for representing high probability codes, and use short length code for lower probability one.
The Huffman encoding has following property:
Building a Huffman Code Table
To generate the Huffman code table, the procedures are as following:
Step 1 :
For Example:
|
Symbol |
Probability |
|---|---|
|
A |
0.39 |
|
B |
0.24 |
|
C |
0.18 |
|
D |
0.09 |
|
E |
0.07 |
|
F |
0.03 |
Step 2 : Combine two of the lowest probability symbol into an other symbol. The lowest probability in the list is assigned a bit value of one, the higher one is assigned a bit value of zero.
i.e. Combine E and F into M.
|
. |
Symbol | Probability |
|
. |
A |
0.39 |
|
. |
B |
0.24 |
|
. |
C |
0.18 |
|
(0)E 0.07 \ |
M |
0.10 |
| . |
D |
0.09 |
Step 3 : Repeat the step 2 recursively, until a tree form.
Combine M and D to form N.
|
. |
. |
Symbol |
Probability |
|
. |
. |
A |
0.39 |
|
. |
. |
B |
0.24 |
|
(0) E 0.07 \ |
(0) M 0.10 \ |
N |
0.19 |
|
. |
(1) D 0.09 / |
||
|
. |
. |
C |
0.18 |
Combine N and C to form O.
|
. |
. |
. |
Symbol |
Probability |
|
. |
. |
. |
A |
0.39 |
|
(0) E 0.07 \ |
(0) M 0.10 \ |
(0) N 0.19 |
O |
0.27 |
|
. |
(1) D 0.09 / |
|||
|
. |
. |
(1) C 0.18 |
||
|
. |
. |
. |
B |
0.24 |
Combine O and B to form P.
|
. |
. |
. |
. |
Symbol |
Probability |
|
(0) E 0.07 \ |
(0) M 0.10 \ |
(0) N 0.19 \ |
(0) O 0.27 \ |
|
|
|
. |
(1) D 0.09 / |
||||
|
. |
. |
(1) C 0.18 / |
|||
|
. |
. |
. |
(1) B 0.24 / |
||
|
. |
. |
. |
. |
(1) A |
0.39 |
Step 4: Example Sample Set with Huffman Codes
|
Symbol |
Probability |
Code |
|
A |
0.39 |
1 |
|
B |
0.24 |
01 |
|
C |
0.18 |
001 |
|
D |
0.09 |
0001 |
|
E |
0.07 |
00000 |
|
F |
0.03 |
00001 |
The compression ratio of Huffman encoding is totally depend on the source of data. If there are 10000 symbol , the orginal size is 10000bytes, but the encoded data size is 10000 * ( 0.39*1/8 + 0.24*2/8 + 0.18*3/8 + 0.09*4/8 + 0.07*5/8 + 0.03*5/8) = 2837.5. i,.e. the compression ratio of orginal to encoding is 1000 : 283.75.