Lossless Compression of JPEG and GIF Files through Lexical Permutation Sorting with Greedy Sequential Grammar Transform based Compression Sajib Kumar Saha, Mrinal Kanti Baowaly†, Md. Rafiqul IslamҒ, Md. Masudur Rahaman* Khulna University/CSE, Khulna, Bangladesh, e-mail:
[email protected] /
[email protected] † Khulna University/CSE, Khulna, Bangladesh, e-mail:
[email protected] Ғ Khulna University/CSE, Khulna, Bangladesh, e-mail:
[email protected] * Khulna University/CSE, Khulna, Bangladesh, e-mail:
[email protected] Abstract— This paper provides a way for lossless Compression of Color Images through lexical permutation sorting (LPS) and Greedy Sequential Grammar Transform based compression. The proposed model adopts the advantages of lexical permutation sorting for Color Images to produce a permuted data. Greedy Sequential Grammar Transform based compression, which is basically a text compression technique can now be applied easily on that permuted data. For comparison, we have taken Inversion Coding of Burrows Wheeler Compression (BWIC), and Burrows Wheeler Compression (BWC) and the model proposed in ICCIT 2006( on a paper named ‘A New Approach for Lossless Compression of JPEG and GIF Files Using Bit Reduction and Greedy Sequential Grammar Transform’ ).
Keywords— LPS, BWC, BWIC. I. INTRODUCTION High quality color images with higher resolution and huge memory spaces are the favorite of modern people. Color image compression is an important technique to reduce the image space and retain high image quality. In this paper a compression technique for Color images is proposed based on LPS and greedy sequential grammar transform. When the underlying data to be transmitted is a permutation Π, LPS algorithms generates a cyclic group of order n with Ф(n) generators [3], where Ф(n) is Euler’s Ф function on n. Among the Ф(n) possibilities, one or more choices may be cheaper to communicate than the original permutation (where Burrows Wheeler Transform (BWT) produces only one permutation). When greedy sequential grammar transform algorithm works on that permuted data, the produced grammar becomes sort. II. LITERATURE SURVAY A. Lexical Permutation Sorting Before discussing the theoretical basis of LPS, we begin this section by giving an example Let p = [3,1,5,4,2] be a given permutation. Construct the matrix
N=
3 1 5 4 2
1 5 4 2 3
5 4 2 3 4
4 2 3 1 1
2 3 1 5 5
By forming successive rows of N which are consecutive cycle left-shifts of the sequence p. Let F be the first, S the second, and L the last column of N. By sorting the rows of N lexically, we transform it to
N′ =
1 2 3 4 5
5 3 1 2 4
4 1 5 3 2
2 5 4 1 3
3 4 2 5 1
This amounts to string N respect to the first column, i.e. applying a row permutation to N so that its first column becomes (1,2,3,4,5)T. The original sequence p appears in the 3rd row of N′. If the transmitter transmits the pair (i, S′) or (i, L′), then the receiver can reconstruct the original sequence p uniquely. For example, if (i, S′) is transmitted, the receiver constructs the original sequence p by using the following procedure as described in [3] Procedure p[1] = i; for (j = 2; j ≤ n; j++) p[j] = S′[p[j-1]]; Or, if (i, L′) is transmitted, the receiver constructs the original sequence p by using the following procedure [3] Procedure p[n] = L′[j]; for (j = 1; j < n-1; j++) p[n-j] = L′[p[n-j+1]];
Of course once we realize that L′ is the inverse of S′ as a permutation, the second procedure is seen to be equivalent to the one using S′. More generally, suppose that A is an alphabet of n symbols with a linear ordering. If Y is a data string with elements in A we denote by N(Y) the n × n matrix whose ith row is Y(i) , and by N′(Y) the matrix obtained by lexically ordering the rows of N(Y). Now according to lemma 3.1 in [3], let p be a permutation of degree n given in Cartesian form. Construct an n × n matrix N whose first row is p and whose each row is a left cyclic shift of the previous row. If Πj is the jth column of N, so that N = [Π1, Π2, …,Πn], then the result of lexically ordering the rows of N is the matrix N′ = [Π1-1Π1, Π1-1Π2, …, Π1-1Πn], where Π1-1Πj is the jth column of N′ in Cartesian form. When we need to emphasize the dependency of N and N′ on the input data permutation p, we write N(p) and N′(p) for N and N′ respectively. We continue to assume that p is a given permutation as in lemma 3.1 in [6]. Now according to theorem 3.1 we have if l = Π1-1Πn and Π = l -1 = Πn -1 Π1 , then p(i+1) = Π (p(i)) Knowledge of l = Π1-1Πn and p(1) allows us to recover p completely. Let the matrix N′ = N′(p) = (ti,j ). Note that if we interpret the second column of N′ as a permutation, we get
Θ=
1 t 1, 2 2 t 2, 2 . . . . . . n t n, 2
1
=
t 1, 2
2 …
n
t 2, 2
t n, 2
Procedure C0 = Cost (Θ); k0 = 1; for (k = 2; k < n; k ++) { ¥ = Θk ; if (Cost (Θ) < C0) k0 = k; } ∂t = Θko; t = Inverse (k0 , Θ(n)); Again according to theorem 3.2 of [3] if Y be a data string of length n, from a linearly ordered alphabet A of g distinct symbols, with lexical index permutation λ = λY . And If N′ = N′ (λ), then M′ i,j = Y′ [N′ i,j]; where M = N(Y) and M′ = N′(Y). Theorem 3.2 described in [3] establishes the connection between LPS and BWT. When the data to be transmitted is a permutation, then, in general LPS Algorithm will give better results than BWT, because we are able to select the least expensive generator ∂ Є ∆ , with an additional overhead of transmitting a single integer x, 1 ≤ x ≤ n, such that ∂x = Θ. This amount to an overhead bounded by (log n)/n. B. Grammar Based Compression Let x be a sequence from A which is to be compressed. A grammar transform converts it into an admissible grammar [1, 2] that represents x, then encoder works to encode that grammar as shown in fig.1. In this paper, we are interested particularly in a grammar transform that starts from the grammar G consisting of only one production rule s0 → x, and applies repeatedly reduction rules 1–5 proposed in [1] in some order to reduce into an irreducible grammar G’. Such a grammar transform is called an irreducible grammar transform. To compress x, the corresponding grammar-based code then uses a zeroorder arithmetic code to compress the irreducible grammar G’. After receiving the codeword of G’, one can fully recover G’ from which x can be obtained via parallel replacement. Different orders via which the reduction rules are applied give rise to different irreducible grammar transforms, resulting in different grammar-based codes.
But the image under Θ Є S0 of any index j can be found by taking any row, and looking at the next element in that row. Since rows are cyclic shifts of each other, it does not matter at which row we look. In particular, we could just use the first row, i.e. we have that Θ = (1, t 1, 2 , t 1, 3 ,… t 1,n). Hence Θ is a cycle of length n. Moreover, it is clear that the third column, interpreted as a permutation is simply Θ2 … , and in general the kth column is Θi-1. According to proposition 3.1 in [3] the columns of N′(p) form a cyclic group G of order n generated by Θ. Let ∆ be the set of symbols of N′ which are generators of the cyclic group < Θ >, then Θ as well as l = Θ-1 can be completely specified as an integer power of any one Input data Binary Grammar Context-free Arithmetic element of ∆. There are | ∆ | = Ф(n), generators of the transform coder grammar G X codeword cyclic group < Θ >, where Ф(n) is Euler’s Ф function of n. If n is prime, there are Ф(n) = n-1 generators of the cyclic group < Θ >. It is straightforward to obtain all elements of Fig. 1: Structure of a grammar based compression. ' ∆, since Θk Є ∆ if and only if the integer k, 0 < k < n, is relatively prime to n. Hence ∆ = { Θk | gcd(k, n) = 1, 0 < k < n}. III. NEW MODEL FOR LOSSLESS IMAGE COMPRESSION In particular the following procedure as defined in [3] The basic idea in this proposed method is to sort the allows us to determine a generator of least entropy ∂ for G, input data through first applying LPS and then apply t as well as the integer t such that ∂ = Θ. By a cost function greedy sequential grammar transform based compression in that procedure mean a function that can measure the to compress image file. entropy of the resulting data, after decomposition.
Source image
Compressed image
Lexical permutation sorting
Arithmetic decoder
Greedy sequential grammar transform
Reverse greedy sequential grammar transform
Arithmetic coder
Reverse lexical permutation sorting
Compressed image (a)
Source image (b)
Fig. 2: Proposed model (a) Compression (b) Decompression. A. Proposed Compression Technique The proposed compression algorithm consists of two phases: 1. Lexical Permutation Sorting. 2. Greedy sequential grammar transform based compression. Greedy Grammar Transform [1] Let x = x1x2 … xn be a sequence from A which is to be compressed. It parses the sequence x sequentially into non-overlapping substrings {x1, x2…xn2 , xn t-1 +1… xnt } and builds sequentially an irreducible grammar for each x1,…xn i where 1≤ i≤ t, n1 = 1 , and nt = n. The first substring is x1 and the corresponding irreducible grammar G1 consists of only one production rule s0 → x. Suppose that x1, x2…xn , xn i-1 +1… xn i have been parsed off and the 2 corresponding irreducible grammar Gi for x1, … xni has been built. Suppose that the variable set of Gi is equal to S(ji) ={s0,s1,…, sj -1} where j1 = 1. The next substring xn i+1…. xni+1 is the longest prefix of xni +1 …. xn i that can be represented by sj for some 0 <j < ji if such a prefix exists. i ni+1 = ni+1.If ni+1- ni Otherwise, xn i+1…. xn i+1 = xn i +1 with >1 and xn +1 … xn i is represented by sj, then append sj to i+1 the right end of Gi(s0); otherwise, append the symbol xni+1 to the right end of Gi(s0). The resulting grammar is admissible, but not necessarily irreducible. Apply reduction rules 1–5 proposed in [1] to reduce the grammar to an irreducible grammar Gi+1. Then Gi+1 represents x1 … xn i +1 Repeat this procedure until the whole sequence is processed. Then the final irreducible grammar Gt represents x. Since only one symbol from S(ji) U A is appended to the end of Gi(s0), not all reduction rules can be applied to get Gi+1.Furthermore, the order via which reduction rules are applied is unique. Encoding Algorithm In the sequential algorithm [1], we encode the data sequence x sequentially by using a zero-order arithmetic
code with a dynamic alphabet to encode the sequence of x2 … . parsed phrases x1, Specifically, we associate each symbol β € S U A with a counter c(β). Initially c(β), is set to 1 if β € A and 0 otherwise. At the beginning, the alphabet used by the arithmetic code is A. The first parsed phrase x1 is encoded by using the probability c(x1) / ∑ β € A c(β). Then the counter c(x1) increases by 1.Suppose that x1, x2…xni have been parsed off and encoded and that all corresponding counters have been updated. Let Gi be the corresponding irreducible grammar for x1,… xni . Assume that the variable set of Gi is equal to S(ji)={s0, s1, … ,sji -1} where
j1 = 1.
Let xni +1,…, xni-1 be parsed off as in our irreducible grammar transform and represented by β € {s1, …,sji -1} U A. Encode β and update the relevant counters according to the following steps: Step 1: The alphabet used at this point by the arithmetic code is {s1, …, sji-1}. Encode xn i+1,…, xn i+1 by using the probability c(β) / ∑ α € s(ji) U A c(α). (1) Step 2: Increase c(β) by 1. Step 3: Get Gi+1 from the appended Gi as in our irreducible grammar transform. Step 4: If ji+1 > ji , i.e., Gi+1 includes the new variable sj i, increase the counter c(s) by 1. Repeat this procedure until the whole sequence is processed and encoded. Note that c(s0) is always 0. Thus the summation over S(ji) U A in (1) is equivalent to the summation over {s0, …, sji-1}U A. From Step 4, it follows that each time when a new variable sji is introduced, its counter increases from 0 to 1. Therefore, in the entire encoding process, there is no zero-frequency problem. Also, in the sequential algorithm [1], the parsing of phrases, encoding of phrases, and updating of irreducible grammars are all done in one pass. Clearly, after receiving enough co-debits to recover the symbol β, the decoder can perform the update operation in the exact same way as does the encoder. B. Proposed Decompression Technique The proposed compression algorithm consists of two phases: 1. Greedy sequential grammar based decompression. 2. Reverse Lexical Permutation Sorting. IV. EXPERIMENTAL RESULTS We have taken some JPEG and GIF images as sample input to our proposed model. The images used in the experiment are shown in Fig. 3 and Fig. 5. It has been found that the quality of the final decompressed image is exactly the same as that of the original image as shown in Fig. 4 and Fig. 6.
(a) House
(b) Man 1
(c) Man 2
(d) Tree
Fig. 3: Some sample JPEG input files. Table I Comparison with BWIC, BWC, the model proposed in [5], and the proposed model for JPEG files Using the Using the Using Using File Name Original Proposed Model BWC BWIC Size Model Proposed in [5] House
(bytes) 51, 202
(bytes) 51, 214
(bytes) 51, 195
(bytes) 51,038
(bytes) 49,031
Man 1
39, 821
39, 339
39, 012
38,723
36,172
Man 2
2, 149
2, 201
2, 157
2,159
2,141
Tree
7, 494
7, 588
7, 498
7,452
7,141
Total
100, 666
100, 342
99, 862
99,372
94,485
(a) House
(a) Texture
(b) Man 1
(c) Man 2 Fig. 4: Decompressed JPEG Images.
(b) Advertisement (c) Coin Fig. 5: Some sample GIF input files.
(d) Tree
(d) Woman
Table II Comparison with BWIC, BWC, the model proposed in [5],and the proposed model for GIF files Using the Using the Using BWIC Using BWC Original Size File Name Proposed Model Model Proposed in [5] (bytes) Texture Advertisement Coin Woman Total
8, 714 65, 178 137, 566 164, 597 376, 055
(bytes) 8, 364 64, 911 132, 699 160, 482 366, 456
(bytes) 8, 566 64, 860 135, 129 162, 979 371, 534
(bytes) 8,579 64,545 132,186 159,063 364,373
(bytes) 8,157 58,154 119,116 133,061 318,488
(a) Texture
(b) Advertisement
(c) Coin
(d) Woman
Fig. 6: Decompressed GIF input files. Table III Comparison of time taken by the proposed model and the model proposed in [5] (Processor: Intel-Celeron, 1.6GHz; RAM: 256; Operating system: Windows XP) Compress time Decompress time File Name Original Size Using the Using the Using the Using the Model Proposed Model Proposed Proposed in [5] Model Proposed in Model [5] (ms) (bytes) (ms) (ms) (ms) House 51, 202 874 354 585 299 Man 1 39, 821 547 241 421 223 Texture 8, 714 102 61 101 61 Advertisement 65, 178 769 463 566 399 Coin 137, 566 1912 701 1511 688 Woman 164, 597 2550 1550 1952 1463
V. CONCLUSION This paper has aimed at providing a novel for applying LPS algorithm on JPEG and GIF files. Block sorting can also be used on the place of LPS as was used in the model proposed in [6], but LPS is suitable if the underlying data to be transmitted is a permutation[3] and here for JPEG and GIF files LPS works better that Block sorting. The results have shown that the proposed method achieves better compression ratio and takes reduced compression time.
[3]
Ziya Arnavut, Spyros S. Magliveras, Lexical Permutation Sorting Algorithm, The Computer Journal, Vol. 40, no. 5, October 1997.
[4]
Burrows, M. and Wheeler, D.J (1994), A block sorting Lossless Data Compression Algorithm, SRC Research Report 124, Digital System Research center, Palo Alto,CA,gatekeeper.doc.com, /pub/DEC/SRC/research-reports/SRC-124.ps.Z. Mr. Rafiqul Islam, Sajib Kumar Saha, Mrinal Kanti Baowlay, A New Approach for Lossless Compression of JPEG and GIF Files Using Bit Reduction and Greedy Sequential Grammar Transform, ICCIT 2006. Mr. Rafiqul Islam, Sajib Kumar Saha, Mrinal Kanti Baowlay, A Modification of Greedy Sequential Grammar Transform based Universal Lossless Data Compression, ICCIT 2006.
[5]
REFERENCES [1]
[2]
En-hui Yang and Da-Ke He , Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform—Part two: With context models, IEEE Trans. Inform. Theory, vol. 49, no. 11, November 2003. E.-h. Yang and J. C. Kieffer, Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform— Part one: Without context models, IEEE Trans. Inform. Theory, vol. 46, pp. 755–788, May 2000.
[6]