The International Congress for global Science and Technology (ICGST)
ICGST International Journal on Graphics, Vision and Image Processing (GVIP)
GVIP Special Issue on Image Compression, 2007
www.icgst.com, © ICGST 2007
ISSN Print 1687-398X; ISSN Online 1687-3998; ISSN CD-ROM 1687-4005
Dr. Ashraf Aboshosha, [email protected]

GVIP Committee:
L. Alvarez Leon, Departamento de Informatica y Sistemas, Universidad de Las Palmas de Gran Canaria
M. Bennamoun, School of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
A. Farag, Director, Computer Vision & Image Processing Laboratory (CVIP Lab), University of Louisville, USA
N. Alberto Borghese, Department of Information Science, University of Milan, Italy
C. Grecos, Department of Electronic and Electrical Engineering, Loughborough University, UK
G. H. Granlund, Head of the Division of Computer Vision, Department of Electrical Engineering, University of Linkoeping
L. Itti, iLab, University of Southern California
S. Mohammed, Department of Computer Science, Advanced Technology and Academic Centre, Lakehead University, Canada
P. Jonker, Quantitative Imaging Group, Department of Imaging Science and Technology, Faculty of Applied Sciences, Delft University of Technology, The Netherlands
S.-W. Lee, Center for Artificial Vision Research, Department of Computer Science and Engineering, Korea University
S. Loncaric, Head of Image Processing Group, Faculty of Electrical Engineering and Computing, University of Zagreb
V. Roberto, Department of Mathematics and Computer Science, University of Udine
M. Sarfraz, Information & Computer Science Department, King Fahd University of Petroleum and Minerals, Saudi Arabia
A. von Wangenheim, Department of Computer Sciences, Universidade Federal de Santa Catarina, Brazil
M. Yeasin, Computer Science Department, SUNY Institute of Technology (SUNYIT)
I. El Emary, Faculty of Engineering, Amman Al Ahliyya University, Amman, Jordan
www.icgst.com

Table of Contents

P1150511002  R. Sudhakar, R. Karthiga and S. Jayaraman, Image Compression using Coding of Wavelet Coefficients – A Survey, pages 1--13
P1150513003  C. Ben Amar and O. Jemai, Wavelet Networks Approach for Image Compression, pages 15--23
P1150527001  G. K. Kharate, A. A. Ghatol and P. P. Rege, Image Compression Using Wavelet Packet Tree, pages 25--28
P1150528103  Bushra K. Al-Abudi and Loay A. George, Color Image Compression Using Wavelet Transform, pages 29--35
P1150535154  Noha A. Hikal and Roumen Kountchev, A Method for Digital Image Compression with IDP Decomposition Based on 2D SOFM VQ, pages 37--42
P1150619003  M. S. Anish Kumar, Rajesh Cherian Roy and R. Gopikakumari, A New Image Compression and Decompression Technique Based on 8x8 MRT, pages 43--45
P1150627001  S. Esakkirajan, T. Veerakumar, V. Senthil Murugan and R. Sudhakar, Image Compression using Contourlet Transform and Multistage Vector Quantization, pages 47--56
P1150630002  S. Anna Durai and E. Anna Saro, An Improved Image Compression Approach with Self-Organizing Feature Maps using Cumulative Distribution Function, pages 57--65
P1150636005  K. Somasundaram and I. Kaspar Raj, An Image Compression Scheme Based on Predictive and Interpolative Absolute Moment Block Truncation Coding, pages 67--71
P1150651001  A. Vasuki and P. T. Vanathi, Image Compression using Lifting and Vector Quantization, pages 73--81
P1150712003  K. Veeraswamy, S. Srinivaskumar and B. N. Chatterji, Designing Quantization Table for Hadamard Transform Based on Human Visual System for Image Compression, pages 83--90
Image Compression using Coding of Wavelet Coefficients – A Survey

R. Sudhakar, R. Karthiga, S. Jayaraman
Department of Electronics and Communication Engineering, PSG College of Technology, Peelamedu, Coimbatore-641 004, Tamil Nadu, India
[email protected], [email protected], [email protected]
Abstract
Due to the increasing traffic caused by multimedia information and the digitized representation of images, image compression has become a necessity. New algorithms for image compression based on wavelets have been developed. These methods have resulted in practical advances such as superior low-bit-rate performance, continuous-tone and bit-level compression, lossless and lossy compression, progressive transmission by pixel accuracy and by resolution, region-of-interest coding, and others. In this paper we concentrate on the following methods of coding wavelet coefficients: the EZW (embedded zerotree wavelet) algorithm, the SPIHT (set partitioning in hierarchical trees) algorithm, SPECK (Set Partitioned Embedded bloCK coder), the WDR (wavelet difference reduction) algorithm, and the ASWDR (adaptively scanned wavelet difference reduction) algorithm. These are relatively recent algorithms which achieve some of the lowest errors per compression rate and the highest perceptual quality.

Keywords: Image compression, JPEG, Wavelets, EZW, SPIHT, SPECK, EBCOT, WDR, ASWDR, SFQ, CREW, EPWIC, SR

1. Introduction
Image compression is now essential for applications such as transmission and storage in databases. In many different fields, digitized images are replacing conventional analog images such as photographs or X-rays. The volume of data required to describe such images greatly slows transmission and makes storage prohibitively costly. The information contained in images must, therefore, be compressed by extracting only the visible elements, which are then encoded. The quantity of data involved is thus reduced substantially. The fundamental goal of image compression is to reduce the bit rate for transmission or storage while maintaining an acceptable fidelity or image quality.
One of the most successful applications of wavelet methods is transform-based image compression (also called coding). The overlapping nature of the wavelet transform alleviates blocking artifacts, while the multiresolution character of the wavelet decomposition leads to superior energy compaction and perceptual quality of the decompressed image. Furthermore, the multiresolution transform domain means that wavelet compression methods degrade much more gracefully than block-DCT methods as the compression ratio increases. Since a wavelet basis consists of functions with both short support (for high frequencies) and long support (for low frequencies), large smooth areas of an image may be represented with very few bits, and detail added where it is needed [27]. Wavelet-based coding [27] provides substantial improvements in picture quality at higher compression ratios. Over the past few years, a variety of powerful and sophisticated wavelet-based schemes for image compression, as discussed later, have been developed and implemented. Because of these many advantages, wavelet-based compression algorithms became the suitable candidates for the JPEG-2000 standard [34].
Such a coder operates by transforming the data to remove redundancy, then quantizing the transform coefficients (a lossy step), and finally entropy coding the quantizer output. The loss of information is introduced by the quantization stage, which intentionally rejects less relevant parts of the image information. Because of their superior energy compaction properties and correspondence with the human visual system, wavelet compression methods have produced superior objective and subjective results [4]. With wavelets, a compression ratio of up to 300:1 is achievable [22]. Wavelet compression allows the integration of various compression techniques into one algorithm. With lossless compression, the original image is recovered exactly after decompression. Unfortunately, with images of natural scenes it is rarely possible to obtain error-free compression at a rate beyond 2:1 [22]. Much higher compression ratios can be obtained if some error, which is usually difficult to perceive, is allowed between the decompressed image and the original image. This is lossy compression. In many cases, it is not necessary or even desirable that there be error-free reproduction of the original image. In such cases, the small amount of error introduced by lossy compression
may be acceptable. Lossy compression is also acceptable for fast transmission of still images over the Internet [22]. Over the past few years, a variety of novel and sophisticated wavelet-based image coding schemes have been developed. These include Embedded Zerotree Wavelet (EZW) [13], Set Partitioning in Hierarchical Trees (SPIHT) [1], the Set Partitioned Embedded bloCK coder (SPECK) [2], Wavelet Difference Reduction (WDR) [28], Adaptively Scanned Wavelet Difference Reduction (ASWDR) [29], Space-Frequency Quantization (SFQ) [42], Compression with Reversible Embedded Wavelets (CREW) [3], the Embedded Predictive Wavelet Image Coder (EPWIC) [5], Embedded Block Coding with Optimized Truncation (EBCOT) [25], and Stack-Run (SR) coding [26]. This list is by no means exhaustive, and many more such innovative techniques are being developed. A few of these algorithms are discussed briefly here. Section 2 covers the preliminaries of the wavelet transform of an image, along with the performance metrics. Section 3 outlines the common features of the various wavelet-based coding schemes. Section 4 describes the individual coding methods. Section 5 discusses the unique features and the demerits of the various coding schemes, and Section 6 concludes the paper.
2. Preliminaries
Most natural images have smooth colour variations, with the fine details represented as sharp edges in between the smooth variations. Technically, the smooth variations in colour can be termed low-frequency variations and the sharp variations high-frequency variations. The low-frequency components (smooth variations) constitute the base of an image, and the high-frequency components (the edges which give the detail) add upon them to refine the image, thereby giving a detailed image. Hence, the smooth variations demand more importance than the details.
Separating the smooth variations and details of the image can be done in many ways. One such way is the decomposition of the image using a Discrete Wavelet Transform (DWT) [15], [23], [27]. Wavelets [23], [27] are used in a number of different applications. The practical implementation of wavelet compression schemes is very similar to that of subband coding schemes [22], [21], [8]. As in the case of subband coding, the signal is decomposed using filter banks. In a discrete wavelet transform, an image can be analyzed by passing it through an analysis filter bank followed by a decimation operation. This analysis filter bank, which consists of a low pass and a high pass filter at each decomposition stage, is commonly used in image compression. When a signal passes through these filters, it is split into two bands. The low pass filter, which corresponds to an averaging operation, extracts the coarse information of the signal. The high pass filter, which corresponds to a differencing operation, extracts the detail information of the signal. The output of each filtering operation is then decimated by two.
A two-dimensional transform can be accomplished by performing two separate one-dimensional transforms. First, the image is filtered along the x-dimension using low pass and high pass analysis filters and decimated by two. Low pass filtered coefficients are stored on the left part of the matrix and high pass filtered coefficients on the right. Because of the decimation, the total size of the transformed image is the same as that of the original image. The image is then filtered along the y-dimension and again decimated by two. Finally, after one level of decomposition, the image has been split into four bands denoted LL, HL, LH, and HH. The LL band is again subjected to the same procedure. This process of filtering the image is called pyramidal decomposition of the image [22]; it is depicted in Fig. 1. The reconstruction of the image is carried out by reversing the above procedure, repeated until the image is fully reconstructed.

Fig. 1. (a) Three-level octave-band decomposition of the Saturn image, and (b) spectral decomposition and ordering.
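To make the procedure concrete, the sketch below reproduces this octave-band decomposition with the PyWavelets package in Python. The Haar filter and the three-level setting are our illustrative choices, not taken from the paper.

```python
# A minimal sketch of the pyramidal (octave-band) decomposition described
# above, using PyWavelets. Wavelet and level count are illustrative.
import numpy as np
import pywt

img = np.random.rand(256, 256)        # stand-in for a grayscale image

# One decomposition level: low pass/high pass filtering along both
# dimensions plus decimation by two yields the four bands.
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')
print(cA.shape)                       # (128, 128); total size is preserved

# Pyramidal decomposition: the LL band is re-decomposed at every level.
coeffs = pywt.wavedec2(img, 'haar', level=3)

# Reconstruction reverses the procedure level by level.
rec = pywt.waverec2(coeffs, 'haar')
assert np.allclose(rec, img)          # perfect reconstruction
```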
Quantization
Quantization [22], [21] refers to the process of approximating the continuous set of values in the image data with a finite, preferably small, set of values. The input to a quantizer is the original data, and the output is always one among a finite number of levels. The quantizer is a function whose set of output values is discrete and usually finite. Obviously, this is a process of approximation, and a good quantizer is one which represents the original signal with minimum loss or distortion. There are two types of quantization: scalar quantization and vector quantization. In scalar quantization, each input symbol is treated individually in producing the output, while in vector quantization the input symbols are clubbed together in groups called vectors and processed to give the output. This clubbing of data and treating it as a single unit increases the optimality of the vector quantizer, but at the cost of increased computational complexity [21].

Error Metrics
Two of the error metrics used to compare the various image compression techniques are the Mean Square Error (MSE) and the Peak Signal to Noise Ratio (PSNR). The MSE is the cumulative squared error between the compressed and the original image, whereas PSNR is a measure of the peak error. The mathematical formulae for the two are

E = (Original image) - (Reconstructed image)
MSE = sum of E^2 / (size of image)
PSNR = 20 log10( 255 / sqrt(MSE) )

A lower value of MSE means less error, and, as seen from the inverse relation between the MSE and PSNR, this translates to a high value of PSNR. Logically, a higher value of PSNR is good because it means that the ratio of signal to noise is higher. Here, the 'signal' is the original image, and the 'noise' is the error in reconstruction. So, a compression scheme having a lower MSE (and a high PSNR) can be recognized as a better one. Wavelet-based coding is more robust under transmission and decoding errors, and also facilitates progressive transmission of images [35]. In addition, wavelet coders are better matched to the Human Visual System (HVS) characteristics. Because of their inherent multiresolution nature [15], wavelet coding schemes are especially suitable for applications where scalability and tolerable degradation are important.
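In code, the two metrics follow directly from the formulae above. This straightforward sketch assumes 8-bit images with a peak value of 255.

```python
# MSE and PSNR exactly as defined above, assuming 8-bit images (peak 255).
import numpy as np

def mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    # Cumulative squared error, averaged over the size of the image.
    err = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(err ** 2))

def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    m = mse(original, reconstructed)
    if m == 0.0:
        return float('inf')           # identical images: no error
    return 20.0 * np.log10(255.0 / np.sqrt(m))
```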
3. Coding Schemes for Wavelet Coefficients
Image coding utilizing scalar quantization on hierarchical structures of transformed images has been a very effective and computationally simple technique. Shapiro was the first to introduce such a technique with his EZW [13] algorithm. Different variants of this technique have appeared in the literature, providing improvements over the initial work. Said and Pearlman [1] successively improved the EZW algorithm by extending this coding scheme, and succeeded in presenting a different implementation based on a set-partitioning sorting algorithm. This new coding scheme, called SPIHT [1], provided an even better performance than the improved version of EZW.
The common features of the wavelet-based schemes are briefly discussed here. All of these scalar quantized schemes employ some kind of significance testing of sets or groups of pixels, in which the set is tested to determine whether the maximum magnitude in it is above a certain threshold. The results of these significance tests determine the path taken by the coder to code the source samples. These significance testing schemes are based on some very simple principles which allow them to exhibit excellent performance. Among these principles are the partial ordering of magnitude coefficients with a set-partitioning sorting algorithm, bit plane transmission in decreasing bit plane order, and exploitation of self-similarity across different scales of an image wavelet transform.

Fig. 2. Quad tree structure (decomposition levels 1-3).

After wavelet transforming an image, we can represent it using trees because of the sub-sampling that is performed in the transform. A coefficient in a low subband can be thought of as having four descendants in the next higher subband. The four descendants each also have four descendants in the next higher subband, and we see a quad-tree [13] emerge: every root has four leaves. Fig. 2 shows the quad tree structure. A very direct approach is to simply transmit the values of the coefficients in decreasing order, but this is not very efficient, since a lot of bits are spent on the coefficient values. A better approach is to use a threshold and only signal to the decoder whether the values are larger or smaller than the threshold. If we adopt bit-plane coding [13], [2], then the initial threshold t0 will be

t0 = 2^floor(log2(MAX(|c(x, y)|)))

Here MAX(.) means the maximum coefficient value in the image and c(x, y) denotes the coefficient. If the threshold is also transmitted to the decoder, it can already reconstruct quite a lot.
An interesting thing to note in these schemes is that all of them have relatively low computational complexity, considering the fact that their performance is comparable to the best-known image coding algorithms [1]. This feature seems in conflict with the well-known tenets of information theory, which hold that the computational complexity of coding a stationary source (i.e., source sample aggregates) increases as the coding efficiency of the source increases [1]. These coding schemes seem to have provided a breathing space in the world of simultaneously increasing efficiency and computational complexity.
An important characteristic that this class of coders possesses is the property of progressive transmission and embeddedness. Progressive transmission [1] refers to the transmission of information in decreasing order of its information content; in other words, the coefficients with the highest magnitudes are transmitted first. Since all of these coding schemes transmit bits in decreasing bit plane order, the transmission is progressive. Such a transmission scheme makes it possible for the bit stream to be embedded, i.e., a single coded file can be used to decode the image at various rates less than or equal to the coded rate, giving the best reconstruction possible with the particular coding scheme. With these desirable features of excellent performance and low complexity, along with others such as embedded coding and progressive transmission, these scalar quantized significance testing schemes have recently become very popular in the search for practical, fast and efficient image coders, and in fact have become the basis for serious consideration in future image compression standards.
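The initial threshold introduced above can be computed directly. A minimal sketch, with variable names of our choosing:

```python
# Initial bit-plane threshold t0 = 2**floor(log2(MAX |c(x, y)|)), as used by
# the significance-testing coders discussed below.
import numpy as np

def initial_threshold(coeffs: np.ndarray) -> int:
    n0 = int(np.floor(np.log2(np.max(np.abs(coeffs)))))
    return 2 ** n0                    # halved after every coding pass
```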
4. Various Coding Methods of Wavelet Coefficients
Embedded Zerotree Wavelet (EZW) Coding [13]
The EZW algorithm [13], [12] was one of the first algorithms to show the full power of wavelet-based image compression. It was introduced in the groundbreaking paper of Shapiro [13]. An EZW encoder [13] is an encoder specially designed for use with wavelet transforms. It was originally designed to operate on images (2-D signals), but it can also be used on signals of other dimensions. The EZW encoder is based on progressive encoding to compress an image into a bit stream with increasing accuracy: as more bits are added to the stream, the decoded image contains more detail. Progressive encoding is also known as embedded encoding, which explains the E in EZW. The EZW encoder is based on two important observations:
1. Natural images in general have a low pass spectrum. When an image is wavelet transformed, the energy in the sub bands decreases as the scale decreases (low scale means high resolution), so the wavelet coefficients will, on average, be smaller in the higher sub bands than in the lower sub bands. This shows that progressive encoding is a very natural choice for compressing wavelet transformed images, since the higher sub bands only add detail.
2. Large wavelet coefficients are more important than small wavelet coefficients.
These two observations are exploited by encoding the wavelet coefficients in decreasing order, in several passes. For every pass a threshold is chosen against which all the wavelet coefficients are measured. If a wavelet coefficient is larger than the threshold, it is encoded and removed from the image; if it is smaller, it is left for the next pass. When all the wavelet coefficients have been visited, the threshold is lowered and the image is scanned again to add more detail to the already encoded image. This process is repeated until all the wavelet coefficients have been encoded completely or another criterion has been satisfied (a maximum bit rate, for instance). A simplified sketch of this pass structure is given after Fig. 3 below.
We shall describe EZW in some detail, because a solid understanding of it will make it much easier to comprehend the other algorithms we shall be discussing; those algorithms build upon the fundamental concepts first introduced with EZW. A zerotree is a quad-tree in which all nodes are equal to or smaller than the root; the tree is coded with a single symbol and reconstructed by the decoder as a quad-tree filled with zeroes. The root has to be smaller than the threshold against which the wavelet coefficients are currently being measured. The EZW encoder exploits the zerotree based on the observation that wavelet coefficients decrease with scale. The zerotree is based on the hypothesis that if a wavelet coefficient at a coarse scale is insignificant with respect to a given threshold t0, then all wavelet coefficients of the same orientation in the same spatial location at finer scales are likely to be insignificant with respect to t0. The idea is to define a tree of zero symbols which starts at a root which is also zero and is labeled end-of-block. The EZW algorithm encodes the tree structure so obtained. This results in bits that are generated in order of importance, yielding a fully embedded code. The main advantage of this encoding is that the encoder can terminate the encoding at any point, thereby allowing a target bit rate to be met exactly. To arrive at a perfect reconstruction, the process is repeated after lowering the threshold, until the threshold has become smaller than the smallest coefficient to be transmitted. Similarly, the decoder can stop decoding at any point, resulting in the image that would have been produced at the rate of the truncated bit stream. One important thing is, however, still missing: the transmission of the coefficient positions. Indeed, without this information the decoder will not be able to reconstruct the encoded signal (although it can perfectly reconstruct the transmitted bit stream). It is in the encoding of the positions that the efficient encoders are separated from the inefficient ones.
EZW encoding uses a predefined scan order to encode the positions of the wavelet coefficients. Through the use of zerotrees, many positions are encoded implicitly. Several scan orders are possible, as long as the lower sub bands are completely scanned before going on to the higher sub bands.

Fig. 3. Scanning order
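The pass structure described above can be sketched as a simplified loop. This is not Shapiro's full algorithm: zerotree symbol generation, the subordinate (refinement) pass, and the Fig. 3 scan order are omitted; only the threshold-halving significance mechanism is shown.

```python
# Simplified EZW-style coding loop: in each pass, coefficients at or above
# the current threshold are marked significant and removed from further
# testing, then the threshold is halved.
import numpy as np

def ezw_passes(coeffs: np.ndarray, num_passes: int):
    remaining = np.abs(coeffs).astype(np.float64)
    t = 2 ** int(np.floor(np.log2(remaining.max())))   # initial threshold
    for _ in range(num_passes):
        if t == 0:
            break
        significant = remaining >= t                   # significance pass
        yield t, np.argwhere(significant)
        remaining[significant] = 0.0                   # encoded and removed
        t //= 2                                        # lower the threshold
```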
The scan order, as shown in Fig. 3, appears to have some influence on the final compression result. The algorithm produces excellent results without any pre-stored tables or codebooks, training, or prior knowledge of the image source. Table 1 lists compression ratio and PSNR results for EZW.

Table 1. Compression ratio and PSNR results for EZW

For 512 x 512 Lena:
Rate      Compression Ratio   MSE      PSNR
1.0       8                   7.21     39.55
0.5       16                  15.32    36.28
0.25      32                  31.33    33.17
0.125     64                  61.67    30.23
0.0625    128                 114.5    27.54
0.03125   256                 188.27   25.38

For 512 x 512 Barbara:
Rate      Compression Ratio   MSE      PSNR
1.0       8                   19.92    35.14
0.5       16                  57.57    30.53
0.25      32                  136.8    26.77
0.125     64                  257.1    24.03
0.0625    128                 318.5    23.10
0.03125   256                 416.2    21.94
Fig. 4. EZW coding of Lena 512x512: (a) original image, (b) 1 bpp, (c) 0.5 bpp, (d) 0.25 bpp.

The unavoidable artifacts produced at low bit rates by this method are typical of wavelet coding schemes coded to the same PSNRs. Subjectively, however, they are not as objectionable as the blocking effects typical of block transform coding schemes. EZW encoding does not really compress anything; it only reorders the wavelet coefficients in such a way that they can be compressed very efficiently. An EZW encoder should therefore always be followed by a symbol encoder, for instance an arithmetic encoder. The next scheme, called SPIHT, is an improved form of EZW which achieves better compression and performance than EZW.

Set Partitioning in Hierarchical Trees (SPIHT) Coding
The SPIHT coder [1], [2] is a highly refined version of the EZW algorithm. It is a powerful image compression algorithm that produces an embedded bit stream from which the best reconstructed images in the mean square error sense can be extracted at various bit rates. Some of the best results (highest PSNR values for given compression ratios) for a wide variety of images have been obtained with SPIHT. Hence, it has become the benchmark state-of-the-art algorithm for image compression [22].

Set partitioning sorting algorithm
One of the main features of the SPIHT algorithm is that the ordering data is not explicitly transmitted. Instead, the method relies on the fact that the execution path of any algorithm is defined by the results of the comparisons at its branching points. So, if the encoder and decoder have the same sorting algorithm, the decoder can duplicate the encoder's execution path if it receives the results of the magnitude comparisons, and the ordering information can be recovered from the execution path. One important fact in the design of the sorting algorithm is that there is no need to sort all coefficients. What is actually needed is an algorithm which simply selects the coefficients such that 2^n <= |c(i, j)| < 2^(n+1), with n decremented in each pass. The sorting algorithm divides the set of pixels into partitioning subsets Tm and performs the magnitude test

max over (i, j) in Tm of |c(i, j)| >= 2^n.

Given n, if the test is positive, the subset is said to be significant; otherwise it is called insignificant. If the decoder receives a "no" as the answer (the subset is insignificant), then it knows that all coefficients in Tm are insignificant. If the answer is "yes" (the subset is significant), then a certain rule shared by the decoder and encoder is used to partition Tm into new subsets, and the significance test is then applied to the new subsets. This set division process continues until the magnitude test has been applied to all single-coordinate significant subsets, so that each significant coefficient is identified. To reduce the number of magnitude comparisons, a set partitioning rule that uses an expected ordering in the hierarchy defined by the subband pyramid is used. The objective is to create new partitions such that subsets expected to be insignificant contain a large number of elements, while subsets expected to be significant contain only one element. The relationship between magnitude comparisons and message bits is given by the significance function

Sn(T) = 1 if max over (i, j) in T of |c(i, j)| >= 2^n, and 0 otherwise.

Spatial orientation trees
Normally, most of the image's energy is concentrated in the low frequency components. Consequently, the variance decreases as one moves from the highest to the lowest levels of the subband pyramid. There is a spatial self-similarity between subbands, and the coefficients are expected to be better magnitude-ordered as one moves downward in the pyramid following the same spatial orientation. A tree structure, called a spatial orientation tree, naturally defines this spatial relationship on the hierarchical pyramid. Fig. 5 shows how the spatial orientation tree is defined in a pyramid constructed with recursive four-band splitting. Each node of the tree corresponds to a pixel and is identified by the pixel coordinate. Its direct descendants (offspring) correspond to the pixels of the same spatial orientation in the next finer level of the pyramid. The tree is defined in such a way that each node has either no offspring or four offspring, which always form a group of 2x2 adjacent pixels. The pixels in the highest level of the pyramid are the tree roots and are also grouped in 2x2 adjacent pixels; however, their offspring branching is different, and in each group one of them (indicated by the star in Fig. 5) has no descendants. Parts of the spatial orientation trees are used as the partitioning subsets in the sorting.

Fig. 5. Parent-offspring dependencies in the spatial orientation tree.

With this algorithm the rate can be precisely controlled, because the transmitted information is formed of single bits. The encoder can estimate the progressive distortion reduction and stop at a desired distortion value. In the algorithm, all branching conditions based on the significance data Sn, which can only be calculated with
the knowledge of c(i, j), are output by the encoder. Thus, to obtain the desired decoder's algorithm, which duplicates the encoder's execution path as it sorts the significant coefficients, the words "output" in the algorithm simply need to be replaced by "input". The ordering information is recovered when the coordinates of the significant coefficients are added to the end of the LSP; that is, the coefficients pointed to by the coordinates in the LSP are sorted. Whenever the decoder inputs data, its three control lists (LIS, LIP, and LSP) are identical to the ones used by the encoder at the moment it outputs that data, which means that the decoder indeed recovers the ordering from the execution path. It is easy to see that with this scheme, coding and decoding have the same computational complexity. An additional task done by the decoder is to update the reconstructed image. For the value of n at which a coordinate is moved to the LSP, it is known that 2^n <= |c(i, j)| < 2^(n+1). So, the decoder uses that information, plus the sign bit that is input just after the insertion in the LSP, to set the reconstructed coefficient to +/-1.5 x 2^n. Similarly, during the refinement pass the decoder adds or subtracts 2^(n-1) to the reconstructed coefficient when it inputs the bits of the binary representation of |c(i, j)|. In this manner the distortion gradually decreases during both the sorting and refinement passes.
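The reconstruction rule is easy to state in code. A minimal sketch with function names of our choosing; it operates on coefficient magnitudes, with the sign bit applied separately at the end.

```python
# Decoder-side reconstruction values described above: on insertion into the
# LSP at bit plane n, the magnitude estimate is set to 1.5 * 2**n (the
# midpoint of [2**n, 2**(n+1))); each refinement bit then moves the
# estimate by +/- 2**(n-1).
def initial_magnitude(n: int) -> float:
    return 1.5 * (2 ** n)

def refine_magnitude(magnitude: float, bit: int, n: int) -> float:
    # One refinement-pass update at bit plane n: add the step if the input
    # bit is 1, subtract it if the bit is 0.
    step = 2 ** (n - 1)
    return magnitude + step if bit else magnitude - step
```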
Features of SPIHT
The SPIHT method is not a simple extension of traditional methods for image compression; it represents an important advance in the field. The method provides:
- good image quality and high PSNR, especially for color images;
- optimization for progressive image transmission;
- a fully embedded coded file;
- a simple quantization algorithm;
- fast coding/decoding (nearly symmetric);
- wide applicability and complete adaptivity;
- support for lossless compression;
- coding to an exact bit rate or distortion;
- efficient combination with error protection.
What makes SPIHT really outstanding is that it yields all those qualities simultaneously. Table 2 gives the compression ratio and PSNR results for the SPIHT algorithm. It can be seen that the compression ratio increases when the number of decomposition levels is increased. This is because, when the number of decomposition levels is increased, the coefficients with higher magnitude concentrate mostly at the root levels, and most of the coefficients have low magnitudes; these coefficients require fewer bits to be transmitted. Hence the compression ratio increases when the decomposition level is increased, but the resolution of the reconstructed image is reduced at higher decomposition levels. The perceptual image quality, however, is not guaranteed to be optimal, as seen from Fig. 6, since the coder is not designed to explicitly consider the human visual system (HVS) characteristics. Extensive HVS research has shown that there are three perceptually significant activity regions in an image: smooth, edge, and textured or detailed regions [20]. By incorporating the differing sensitivity of the HVS to these regions in image compression schemes such as SPIHT, the perceptual quality of the images can be improved at all bit rates. The efficiency of the algorithm can also be improved by entropy-coding its output, but at the expense of a larger coding/decoding time: the significance values are not equally probable, and there is a statistical dependence between Sn(i, j) and Sn(D(i, j)), and also between the significance of adjacent pixels.

Table 2. Compression ratio and PSNR results for SPIHT

Lena 256 x 256:
Level   Bitplanes discarded   CR      PSNR
3       3                     6.57    31.28
3       5                     13.03   26.81
4       3                     9.44    29.00
4       5                     28.38   25.87
5       3                     10.38   26.76
5       5                     38.94   24.66

Barbara 256 x 256:
Level   Bitplanes discarded   CR      PSNR
3       3                     4.92    28.01
3       5                     12.83   23.76
4       3                     6.32    26.67
4       5                     28.07   23.11
5       3                     6.73    25.68
5       5                     38.77   22.58

Camera man 256 x 256:
Level   Bitplanes discarded   CR        PSNR
3       3                     5.3053    29.66
3       5                     11.5411   25.81
4       3                     6.3262    26.67
4       5                     22.3606   25.17
5       3                     7.9141    27.36
5       5                     29.3538   24.70

Fig. 6. SPIHT results: (a) original image, (b) levels of decomposition = 3, 0.7 bpp, (c) levels of decomposition = 9, 0.1 bpp.
The need to reduce the number of lists used in this scheme led to the next algorithm, called SPECK.

Set Partitioned Embedded bloCK coder (SPECK)
The SPECK image coding scheme [2] differs from some of the above-mentioned schemes in that it does not use trees which span, and exploit the similarity across, different subbands; rather, it makes use of sets in the form of blocks. The main idea is to exploit the clustering of energy in frequency and space in hierarchical structures of transformed images. The SPECK algorithm [2] can be said to belong to the class of scalar quantized significance testing schemes. It has its roots primarily in the ideas developed in SPIHT and in a few block-based image coding algorithms.

Features of the coder
The SPECK image coding scheme has all the properties characteristic of scalar quantized significance testing schemes. In particular, it exhibits the following properties:
- It is completely embedded: a single coded bit stream can be used to decode the image at any rate less than or equal to the coded rate, giving the best reconstruction of the image possible with the particular coding scheme.
- It employs progressive transmission: source samples are coded in decreasing order of their information content.
- It has low computational complexity: the algorithm is very simple, consisting mainly of comparisons, and does not require any complex computation.
- It has low dynamic memory requirements: at any given time during the coding process, only one connected region (lying completely within a subband) is processed. Once this region is processed, the next region is considered.
- It has fast encoding/decoding, due to the low-complexity nature of the algorithm.
- It has efficient performance, comparable to the other low-complexity algorithms available today.

Coding method
Consider an image X which has been adequately transformed using an appropriate subband transformation (most commonly, the discrete wavelet transform). The transformed image is said to exhibit a hierarchical pyramidal structure defined by the levels of decomposition, with the topmost level being the root. The finest pixels lie at the bottom level of the pyramid while the coarsest pixels lie at the top (root) level. The SPECK algorithm makes use of rectangular regions of the image. These regions or sets, henceforth referred to as sets of type S, can be of varying dimensions. The dimension of a set S depends on the dimension of the original image and the subband level of the pyramidal structure at which the set lies. During the course of the algorithm, sets of various sizes will be formed, depending on the characteristics of pixels in the original set. A set of size 1 consists of just one pixel. The other type of set used in the SPECK algorithm is referred to as type I. These sets are obtained by chopping off a small square region from the top left portion of a larger square region; a typical set I is illustrated in Fig. 7. Two linked lists are maintained: LIS, the List of Insignificant Sets, and LSP, the List of Significant Pixels. The former contains sets of type S of varying sizes which have not yet been found significant against a threshold n, while the latter contains those pixels which have tested significant against n. Two types of set partitioning are used in SPECK: quadtree partitioning and octave band partitioning; both are sketched in code below. The motivation for quadtree partitioning of sets, as shown in Fig. 8, is to zoom in quickly to areas of high energy in the set S and code them first. The idea behind the octave band partitioning scheme, shown in Fig. 9, is to exploit the hierarchical pyramidal structure of the subband decomposition, where it is more likely that energy is concentrated at the topmost levels of the pyramid and, as one goes down the pyramid, the energy content decreases gradually.

Fig. 7. Partitioning of image X into sets S and I.

Fig. 8. Quadtree partitioning of set S.

Fig. 9. Octave band partitioning of set I.

If a set I is significant against some threshold n, it is more likely that the pixels which cause I to be significant lie in the top left regions of I. These regions are decomposed into sets of type S and are put next in line for processing. In this way, regions that are likely to contain significant pixels are grouped into relatively smaller sets and processed first, while regions that are
likely to contain insignificant pixels are grouped into a large set. Partitioning a set S into four offspring O(S) (i.e., forming sets S of a new, reduced size) is equivalent to going down the pyramid one level at the corresponding finer resolution. Hence, the size of a set S for an arbitrary image corresponds to a particular level of the pyramid. The decoder uses the same mechanism as the encoder: it receives significance test results from the coded bit stream and builds up the same list structure during the execution of the algorithm. Hence, it is able to follow the same execution paths for the significance tests of the different sets, and reconstructs the image progressively as the algorithm proceeds.
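Both partitioning rules reduce to simple index arithmetic. A sketch under the assumption that a set is represented as a (row, col, height, width) rectangle; this representation is ours, not the paper's.

```python
# Sketch of SPECK's two partitioning rules.

def quadtree_split(s):
    # Quadtree partitioning (Fig. 8): split a set S into its four
    # offspring O(S), zooming in quickly on areas of high energy.
    r, c, h, w = s
    h2, w2 = h // 2, w // 2
    return [(r,      c,      h2,     w2),
            (r,      c + w2, h2,     w - w2),
            (r + h2, c,      h - h2, w2),
            (r + h2, c + w2, h - h2, w - w2)]

def octave_split(r, c):
    # Octave band partitioning (Fig. 9): set I is the image minus its
    # top-left r x c corner. Splitting I yields three r x c sets of type S
    # and a new, smaller set I (the image minus its top-left 2r x 2c
    # corner), matching the pyramid's next level.
    new_s = [(0, c, r, c), (r, 0, r, c), (r, c, r, c)]
    return new_s, (2 * r, 2 * c)
```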
Table 3. Compression ratio and PSNR results for SPECK

Lena 256 x 256:
Level   Bitplanes discarded   CR      PSNR
3       3                     11.23   31.59
4       3                     11.76   31.56
5       3                     11.86   31.53
3       5                     41.25   23.93
4       5                     52.05   24.08
5       5                     54.60   24.16

Camera man 256 x 256:
Level   Bitplanes discarded   CR      PSNR
3       3                     7.87    31.78
4       3                     8.24    31.78
5       3                     8.31    31.76
3       5                     31.14   23.54
4       5                     38.47   23.59
5       5                     40.22   23.68

Lena 512 x 512:
Level   Bitplanes discarded   CR      PSNR
3       3                     13.35   32.34
4       3                     14.39   32.25
5       3                     15.35   32.12
3       5                     49.82   25.31
4       5                     71.94   25.66
5       5                     78.50   25.51

Barbara 512 x 512:
Level   Bitplanes discarded   CR      PSNR
3       3                     7.55    26.83
4       3                     7.73    26.88
5       3                     8.13    26.89
3       5                     46.22   19.85
4       5                     58.63   19.84
5       5                     61.24   19.62

It can be seen that SPECK gives higher compression ratios, which shows the advantage of processing sets in the form of blocks rather than in the form of spatial orientation trees. The reconstructed images at these compression ratios show appreciable resolution for the same decomposition levels and bitplanes discarded, as compared with SPIHT. Fig. 10 shows reconstructed images obtained with the SPECK algorithm for various values of decomposition levels and bitplanes discarded.

Fig. 10. Reconstructed images: (a) original image, (b) bitplanes discarded = 3, levels of decomposition = 3, (c) bitplanes discarded = 3, levels of decomposition = 5, (d) bitplanes discarded = 5, levels of decomposition = 4, (e) bitplanes discarded = 5, levels of decomposition = 5.

We can see that images encoded with low decomposition levels are reconstructed with good resolution, while for high decomposition levels the resolution is reduced slightly. For high decomposition levels, most of the coefficients are of low magnitude, so for a certain number of bit planes discarded the percentage error for low-magnitude coefficients is high, while for high-magnitude coefficients it is negligible. Therefore, with fewer decomposition levels the resolution of the reconstructed image is better than in SPIHT, as shown in Fig. 11.

Fig. 11. Comparison between (a) SPIHT and (b) SPECK using Lena, for 3 levels of decomposition and 3 bitplanes discarded.

Embedded Block Coding with Optimized Truncation (EBCOT)
The EBCOT algorithm [24] uses a wavelet transform to generate the subband coefficients, which are then quantized and coded. Although the usual dyadic wavelet decomposition is typical, other "packet" decompositions are also supported and occasionally preferable. The original image is represented in terms of a collection of subbands, which may be organized into increasing resolution levels. The lowest resolution level consists of the single LL subband. Each successive resolution level contains the additional subbands which are required to reconstruct the image with twice the horizontal and vertical resolution. The EBCOT algorithm is related in various degrees to much earlier work on scalable image compression [24].
A key advantage of scalable compression is that the target bit-rate or reconstruction resolution need not be known at the time of compression. Another advantage of practical significance is that the image need not be compressed multiple times in order to achieve a target bit-rate, as is common with the existing JPEG compression standard. EBCOT partitions each subband into relatively small blocks of samples and generates a separate, highly scalable bit-stream to represent each so-called code-block. The algorithm exhibits state-of-the-art compression performance while producing a bit-stream with an unprecedented feature set, including resolution and SNR scalability together with a random access property. The algorithm has modest complexity and is extremely well suited to applications involving remote browsing of large compressed images.

Table 4. PSNR results for Barbara (512 x 512)

Bit-Rate   SPIHT   EBCOT     EBCOT     EBCOT     EBCOT
                   1 layer   5 layer   Generic   Spacl
0.0625     28.38   28.30     28.30     28.10     28.27
0.125      31.10   31.22     31.20     31.05     31.22
0.25       34.11   34.28     34.29     34.16     34.40
0.5        37.21   37.43     37.41     37.29     37.49
1.0        40.41   40.61     40.57     40.48     40.49

The first column of PSNR results in Table 4 corresponds to the well-known SPIHT [1] algorithm with arithmetic coding. The remaining columns are obtained with the EBCOT algorithm; in all cases, the popular Daubechies 9/7 bi-orthogonal wavelet filters with a five-level transform are used. For EBCOT, code-blocks of size 64 x 64 with sub-blocks of size 16 x 16 are used. The EBCOT bit-stream is composed of a collection of quality layers, and SNR scalability is obtained by discarding unwanted layers. The second column in the table corresponds to a bit-stream with only one layer, so that the overall bit-stream is not SNR scalable; results in this case are obtained by generating a separate compressed bit-stream for each of the relevant bit-rates. Each of the remaining columns is obtained by truncating a single bit-stream to the relevant bit-rates. The third column corresponds to a limited form of SNR scalability in which there are only five quality layers, optimized for each of the target bit-rates in the table; this may be sufficient for some applications. The fourth column corresponds to the extreme case in which 50 separate layers are included in the bit-stream, spanning bit-rates ranging from approximately 0.05 bpp to 2.0 bpp. As might be expected, performance decreases as more layers are added to the bit-stream, because the overhead associated with identifying the contributions of each code-block to each layer grows. Nevertheless, performance continues to be competitive with respect to state-of-the-art compression algorithms, significantly outperforming the common reference, SPIHT. The EBCOT images in Fig. 12 exhibit substantially less ringing around edges and superior rendition of texture; some details preserved in the EBCOT images are completely lost by the SPIHT algorithm. In fact, for this image we find that the image quality obtained using EBCOT at 0.2 bpp is comparable to that obtained using SPIHT at 0.4 bpp.

Fig. 12. Comparison between (a) SPIHT and (b) EBCOT at 0.15 bpp.

Wavelet Difference Reduction (WDR)
One of the defects of SPIHT is that it only implicitly locates the positions of significant coefficients. This makes it difficult to perform operations which depend on the positions of significant transform values, such as region selection on compressed data. Region selection, also known as region of interest (ROI), means a portion of a compressed image that requires increased resolution. This can occur, for example, with a portion of a low resolution medical image that has been sent at a low bpp rate in order to arrive quickly. Such compressed data operations are possible with the WDR algorithm of Tian and Wells [27]-[30]. The term difference reduction refers to the way in which WDR encodes the locations of significant wavelet transform values; a sketch is given below. Although WDR will not produce higher PSNR values than SPIHT, as observed from Table 5, it can produce perceptually superior images, especially at high compression rates. The only difference between WDR and bit-plane encoding is the significance pass: in WDR, the output from the significance pass consists of the signs of significant values along with sequences of bits which concisely describe the precise locations of significant values.

Adaptively Scanned Wavelet Difference Reduction (ASWDR)
Though WDR produces better perceptual results than SPIHT, there is still room for improvement. One such algorithm is the ASWDR algorithm of Walker [28]. The adjective "adaptively scanned" refers to the fact that this algorithm modifies the scanning order used by WDR in order to achieve better performance. ASWDR adapts the scanning order so as to predict the locations of new significant values. If a prediction is correct, then the output specifying that location will just be the sign of the new significant value; the reduced binary expansion of the number of steps will be empty. Therefore, a good prediction scheme will significantly reduce the coding output of WDR. The prediction method used by ASWDR is: if w(m) is significant for threshold T, then the values of the children of m are predicted to be significant for the half threshold T/2. For many natural images, this prediction method is reasonably good. High compression ratio images are used in reconnaissance and in medical applications, where fast transmission and ROI are employed, as well as multiresolution detection. Table 5 shows the improved PSNR values of ASWDR compared to WDR. The WDR and ASWDR algorithms do allow for ROI, while SPIHT does not. Furthermore, their superior performance in displaying edge details at low bit rates facilitates multiresolution detection. Fig. 13 shows magnifications of 128:1 and 64:1 compressions of the Lena image. At 0.0625 bpp, the WDR compression does a better job of preserving the shape of Lena's nose and of retaining some of the striping in the band around her hat. Similar remarks apply to the 0.125 bpp compressions. SPIHT, however, does a better job of preserving parts of Lena's eyes.
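The difference-reduction idea referred to above can be sketched as follows. This is a simplified illustration with names of our choosing, not the full WDR coder: scan positions of significant values are sent as gaps between successive significant indices, each written as a reduced binary expansion with the leading 1 dropped (the sign symbol acts as the separator between expansions).

```python
# Sketch of WDR-style difference reduction of significant positions.
def reduced_binary(gap: int) -> str:
    return bin(gap)[3:]               # strip the '0b' prefix and leading 1

def encode_positions(significant_indices):
    prev, out = -1, []
    for idx in sorted(significant_indices):
        out.append(reduced_binary(idx - prev))   # steps since last position
        prev = idx
    return out

# Example: indices 3, 4, 10 give gaps 4, 1, 6 and expansions '00', '', '10'.
# The empty expansion (gap of one step) is exactly the case that ASWDR's
# prediction scheme aims to produce as often as possible.
```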
Table 5. PSNR results

Image / Method         SPIHT   WDR     ASWDR
Lena, 0.5 bpp          37.09   36.45   36.67
Lena, 0.25 bpp         33.85   33.39   33.64
Lena, 0.125 bpp        30.85   30.42   30.61
Goldhill, 0.5 bpp      33.10   32.70   32.85
Goldhill, 0.25 bpp     30.49   30.33   30.34
Goldhill, 0.125 bpp    28.39   28.25   28.23
Barbara, 0.5 bpp       31.29   30.68   30.87
Barbara, 0.25 bpp      27.47   26.87   27.03
Barbara, 0.125 bpp     24.77   24.30   24.52
Airfield, 0.5 bpp      28.57   28.12   28.36
Airfield, 0.25 bpp     25.90   25.49   25.64
Airfield, 0.125 bpp    23.68   23.32   23.50

Fig. 13. SPIHT, WDR and ASWDR compressions: (a)-(c) 0.0625 bpp, 128:1; (d)-(f) 0.125 bpp, 64:1.

The ASWDR compressions better preserve the shape of Lena's nose and the details of her hat, and show less distortion along the side of her left cheek (especially in the 0.125 bpp case). Some other recently developed wavelet-based image coding schemes are described briefly below.

Space-Frequency Quantization (SFQ)
SFQ [31] for wavelet image coding belongs to a new class of image coding algorithms coupling standard scalar quantization of frequency coefficients with tree-structured quantization. Its good performance appears to confirm the promised efficiencies of hierarchical representation. This technique exploits both the spatial and the frequency compaction properties of the wavelet transform through the use of two simple quantization modes. To exploit the spatial compaction property, a symbol is defined that indicates that a spatial region of high frequency coefficients has zero value. Application of this symbol is referred to as zerotree quantization, because it involves setting to zero a tree-structured set of wavelet coefficients. This is done in the first phase, called the tree pruning algorithm. In the next phase, called predicting the tree, the relation between a spatial region in the image and the tree-structured set of coefficients is exploited. Zerotree quantization can be viewed as a mechanism for pointing to the locations where high frequency coefficients are clustered; thus, this quantization mode directly exploits the predicted spatial clustering of high frequency coefficients. For coefficients that are not set to zero by zerotree quantization, a common uniform scalar quantization, independent of the coefficient frequency band, is applied. Uniform quantization followed by entropy coding provides nearly optimal coding efficiency.

Embedded Predictive Wavelet Image Coder (EPWIC)
EPWIC [5] is an embedded image coder based on a statistical characterization of natural images in the wavelet transform domain. The joint distributions between pairs of coefficients at adjacent spatial locations, orientations, and scales are modeled. Although the raw coefficients are nearly uncorrelated, their magnitudes are highly correlated. A linear magnitude predictor, coupled with both multiplicative and additive uncertainties, provides a reasonable description of the conditional probability densities. In EPWIC, subband coefficients are encoded one bit-plane at a time using a non-adaptive arithmetic encoder. Bit-planes are ordered using an algorithm that considers the MSE reduction per encoded bit; the overall ordering of bit-planes is determined by the ratio of their encoded variance to compressed size. The coder is inherently embedded, and should prove useful in applications requiring progressive transmission.

Compression with Reversible Embedded Wavelets (CREW)
CREW [3] is a new form of still image compression developed at the RICOH California Research Center in Menlo Park, California. It is lossy and lossless, bi-level and continuous-tone, progressive by resolution and pixel depth, and can preserve the source image at encode and quantize for the target device at decode or transmission. It uses a new form of wavelet transform technology, and it is pyramidal (similar to hierarchical) and progressive by nature. CREW was the stimulus for the new JPEG 2000 standard. CREW offers a number of features that should be expected of the compression standards of the next century. These features make CREW an ideal choice for applications that require high quality and flexibility for multiple input and output environments, such as medical imagery, fixed-rate and fixed-size applications (ATM, frame store, etc.), pre-press images, continuous-tone
facsimile documents, image archival, World Wide Web images or graphics, and satellite images. Many of these applications have never used compression because the quality could not be assured, the compression rate was not high enough, or the data rate was not controllable. Three new technologies combine to make CREW possible:
- the reversible wavelet transform: non-linear filters that have exact reconstruction implemented in minimal integer arithmetic;
- the embedded code stream: a method of implying quantization in the code stream;
- a high-speed, high-compression binary entropy coder.
The same CREW code stream can be used for both lossless and lossy applications due to the embedded quantization. The wavelet transform produces pyramidally ordered data and a natural means for interpolation. The bit-significance coding allows for bit-plane progressive transmission. Furthermore, CREW compression is idempotent, and CREW encoding and decoding can be simply and efficiently implemented in either hardware or software. All of these features combine to make a flexible, "device-independent" compression system.

Stack-Run (SR)
SR [25] image coding is a new approach in which a 4-ary arithmetic coder is used to represent significant coefficient values and the lengths of zero runs between coefficients. This algorithm works by raster scanning within subbands, and therefore involves much lower addressing complexity than other algorithms, such as zerotree coding, which require the creation and maintenance of lists of dependencies across different decomposition levels. Despite its simplicity, and the fact that these dependencies are not explicitly used, this algorithm is competitive with the best enhancements of zerotree coding.

5. Discussions
In this section, the various features of the main coding schemes are summarized. The performance of the various coding techniques and their demerits are tabulated in Table 6. The latest techniques, such as EBCOT and ASWDR, perform better than their predecessors such as EZW and WDR. Each technique can be well suited to different images, based upon the user requirements. The table gives a complete investigation of all the coding techniques for wavelet coefficients.

Table 6. Summary of various coding techniques

EZW
Features: employs progressive and embedded transmission; uses the zerotree concept; tree coded with a single symbol; uses a predefined scanning order; good results without pre-stored tables, codebooks, or training.
Demerits: transmission of coefficient positions is missing; no real compression; must be followed by an arithmetic encoder.

SPIHT
Features: widely used, with high PSNR values at given CRs for a variety of images; quad-trees or hierarchical trees are set-partitioned; employs a spatial orientation tree structure; keeps track of the states of sets of indices by means of 3 lists (LSP, LIS, LIP); employs progressive and embedded transmission; superior to JPEG in perceptual image quality and PSNR.
Demerits: only implicitly locates the positions of significant coefficients; more memory requirements due to the 3 lists; transmitted information is formed of single bits; suits a variety of natural images; perceptual quality not optimal.

SPECK
Features: does not use trees; uses blocks (rectangular regions); exploits the clustering of energy in frequency and space; employs progressive and embedded transmission; low computational complexity; employs quadtree and octave band partitioning; low memory requirements due to 2 lists; better PSNR than SPIHT.

EBCOT
Features: supports packet decompositions; block-based scheme; modest complexity; bit stream composed of a collection of quality layers; SNR scalability can be obtained; less ringing around edges; superior rendition of textures; preserves edges lost by SPIHT.
Demerits: as layers increase, performance decreases; suits applications involving remote browsing of large compressed images.

WDR
Features: uses the ROI concept; introduced by Tian and Wells; encodes the locations of significant wavelet transform values; better perceptual image quality than SPIHT; no searching through quad trees as in SPIHT; suits low resolution medical images at low bpp rates; less complex than SPIHT; higher edge correlations than SPIHT; better preservation of edges than SPIHT.
Demerits: PSNR not higher than SPIHT.

ASWDR
Features: modified scanning order compared to WDR; prediction of the locations of new significant values; dynamically adapts to the locations of edge details; encodes more significant values than WDR; PSNR better than SPIHT and WDR; perceptual image quality better than SPIHT and slightly better than WDR; slightly higher edge correlation values than WDR; preserves more of the fine details; suits high-CR images as used in reconnaissance and medical applications.
6. Conclusions
The various wavelet based image coding schemes are discussed in this paper. Each of these schemes finds use in different applications owing to its unique characteristics. Though a number of coding schemes are available, the need for improved performance and wide commercial usage demands that newer and better techniques be developed.
7. References
[1] Amir Said and W. A. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees", IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.
[2] Asad Islam and W. A. Pearlman, "An embedded and efficient low-complexity, hierarchical image coder", Visual Communication and Image Processing '99, Proceedings of SPIE, vol. 3653, pp. 294-305, Jan. 1999.
[3] Boliek, M., Gormish, M. J., Schwartz, E. L., and Keith, A., "Next Generation Image Compression and Manipulation Using CREW", Proc. IEEE ICIP, 1997, http://www.crc.ricoh.com/CREW.
[4] Brower, B. V., "Low-bit-rate image compression evaluations", Proc. of SPIE, Orlando, FL, April 4-9, 1994.
[5] Buccigrossi, R., and Simoncelli, E. P., "EPWIC: Embedded Predictive Wavelet Image Coder", GRASP Laboratory, TR #414, http://www.cis.upenn.edu/~butch/EPWIC/index.html.
[6] Calderbank, R. C., Daubechies, I., Sweldens, W., and Yeo, B. L., "Wavelet Transforms that Map Integers to Integers", Applied and Computational Harmonic Analysis (ACHA), vol. 5, no. 3, pp. 332-369, 1998, http://cm.bell-labs.com/who/wim/papers/integer.pdf.
[7] Coifman, R. R. and Wickerhauser, M. V., "Entropy Based Algorithms for Best Basis Selection", IEEE Trans. Information Theory, vol. 38, no. 2, pp. 713-718, Mar. 1992.
[8] N. J. Fliege, "Multirate Digital Signal Processing: Multirate Systems - Filter Banks - Wavelets", Wiley & Sons, Jan. 2000.
[9] Froment, J. and Mallat, S., "Second Generation Compact Image Coding with Wavelets", in C. K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, vol. 2, Academic Press, NY, 1992.
[10] Gersho, A. and Gray, R. M., Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1991.
[11] Hiroshi Kondo and Yuriko Oishi, "Digital Image Compression using directional sub-block DCT".
[12] James S. Walker, "Wavelet-Based Image Compression", in The Transform and Data Compression Handbook, Ed. K. R. Rao et al., Boca Raton, CRC Press LLC, 2001.
[13] Jerome M. Shapiro, "Embedded Image Coding using zerotrees of wavelet coefficients", IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3445-3462, Dec. 1993.
[14] Malvar, H. S., Signal Processing with Lapped Transforms, Norwood, MA, Artech House, 1992.
[15] Mallat, S. G., "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. PAMI, vol. 11, no. 7, pp. 674-693, July 1989.
[16] Marc Antonini, Michel Barlaud, Pierre Mathieu and Ingrid Daubechies, "Image coding using wavelet transform", IEEE Transactions on Image Processing, vol. 1, no. 2, April 1992.
[17] Pennebaker, W. B. and Mitchell, J. L., JPEG - Still Image Data Compression Standard, Van Nostrand Reinhold, 1993.
[18] Rao, K. R. and Yip, P., Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, 1990.
[19] Raghuveer Rao and Ajit S. Bopardikar, "Wavelet Transforms - Introduction to Theory and Applications", Pearson Education Asia, New Delhi, 2004.
[20] K. Sayood, "Introduction to Data Compression", 2nd Ed., Academic Press, Morgan Kaufmann Publishers, 2000.
[21] K. P. Soman and K. I. Ramachandran, "Insight into Wavelets from Theory to Practice", Prentice Hall India, New Delhi, 2002.
[22] Strang, G. and Nguyen, T., Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996, http://www-math.mit.edu/~gs/books/wfb.html.
[23] Subhasis Saha, "Image compression - from DCT to Wavelets - A Review", http://www.acm.org/crossroads/xrds6-3/sahaimgcoding.html.
[24] Taubman, D., "High Performance Scalable Image Compression with EBCOT", submitted to IEEE Transactions on Image Processing, Mar. 1999, http://maestro.ee.unsw.edu.au/~taubman/activities/preprints/ebcot.zip.
[25] Tsai, M. J., Villasenor, J. D., and Chen, F., "Stack-Run Image Coding", IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 5, pp. 519-521, Oct. 1996.
[26] Vetterli, M. and Kovacevic, J., Wavelets and Subband Coding, Englewood Cliffs, NJ, Prentice Hall, 1995, http://cm.bell-labs.com/who/jelena/Book/home.html.
[27] Volkmer, H., "On the regularity of wavelets", IEEE Trans. on Information Theory, 38, 872-876, 1992.
[28] Walker, J. S., "A lossy image codec based on adaptively scanned wavelet difference reduction", Optical Engineering, in press.
[29] Wallace, G. K., "The JPEG still picture compression standard", Comm. of the ACM, 34(4), 30-44, 1991.
[30] Witten, I., Neal, R., and Cleary, J., "Arithmetic coding for compression", Comm. of the ACM, 30(6), 1278-1288, 1986.
[31] Xiong, Z., Ramchandran, K. and Orchard, M. T., "Space-Frequency Quantization for Wavelet Image Coding", IEEE Trans. on Image Processing, vol. 6, no. 5, pp. 677-693, May 1997.
[32] http://www.jpeg.org/public/welcome.html
[33] ISO/IEC JTC1/SC29/WG1 N390R, JPEG 2000 Image Coding System, Mar. 1997, http://www.jpeg.org/public/wg1n505.pdf.
[34] http://www.cipr.rpi.edu/research/SPIHT
[35] http://perso.wanadoo.fr/polyvalens/clemens/ezw/ezw.html
[36] http://www.scit.wlv.ac.uk/~c9581158/index.html
[37] http://ict.ewi.tudelft.nl/old/itcolleges/et1038/hc/hc6/sld001.html
[38] http://www.ee.bgu.ac.il/~greg/graphics/compress.html#lossless
[39] http://www.datacompression.info/imageCompression.shtml
[40] http://www.acm.org/crossroads/xrds6-3/sahaimgcoding.html
[41] http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.2/Chap4.2.html
Wavelet Networks Approach for Image Compression C. Ben Amar and O. Jemai Research Group on Intelligent Machines (REGIM) University of Sfax, National Engineering School of Sfax, B.P. W, 3038, Sfax, Tunisia
[email protected] and [email protected]
Abstract
In this paper, we present a direct solution method based on wavelet networks for image compression. Wavelet networks are a combination of radial basis function (RBF) networks and wavelet decomposition, where the radial basis functions are replaced by wavelets. The results show that the wavelet networks approach succeeds in achieving high performance in terms of compression ratio and reconstruction quality.

Keywords: image compression, wavelets, neural networks, wavelet networks.

1. Introduction
The fast development of computer applications has come with an enormous increase in the use of digital images, mainly in the domains of multimedia, games, satellite transmission and medical imagery. The digital form of the information secures the transmission and facilitates its manipulation. The constant increase of digital information quantities requires more storage space and wider transmission lines, which motivates further research on effective compression algorithms. The basic idea of image compression is to reduce the mean number of bits per pixel (bpp) necessary for image representation. Image compression approaches can be divided into two classes: lossless and lossy. Different compression approaches have been studied in the literature, for example the approaches by transformation. Compression by transformation consists in decomposing the image on a basis of orthogonal functions and quantizing the spectral coefficients, for instance with a scalar quantization method. The quantization of the coefficients leads to a loss of information and thus makes the compression irreversible. A binary coding is applied thereafter to convert the information into binary form. This coding step of the quantized data is important because it permits an increase in the compression rate. Wavelet networks (WNs) were introduced by Zhang and Benveniste [1,2] in 1992 as a combination of artificial neural networks and wavelet decomposition. Since then, however, WNs have received only little attention. In wavelet networks, the radial basis functions of RBF networks are replaced by wavelets.
Szu et al. [20, 21] have shown the use of WNs for signal representation and classification. They explained how a set of WNs, a "super wavelet", can be produced, and how the ideas presented can be used for model selection. Moreover, they pointed out the large data compression achieved by such a WN representation. Zhang [1] has proved that WNs can handle non-linear regression for moderately large input dimensions with training data. This paper is organized as follows: Section 2 focuses on the theoretical concept of wavelet networks. Section 3 is devoted to the approach of image compression using MLP neural networks. Section 4 presents another approach to image compression based on neural networks using wavelet coefficient decomposition. Section 5 gives an overview of the approach of image compression using wavelet networks. In the last section, we present some results and tables related to the performances of the neural and wavelet network approaches.
2. Theoretical background
2.1 Wavelets
Analyzing a signal from its graph is far from sufficient to extract all the information it contains. It is often necessary to transform it, that is, to give another representation which clearly shows its features. The baron Jean Baptiste Joseph Fourier suggested that all functions can be expressed in a simple way as a sum of sines. In "The Analytic Theory of Heat", Fourier derived the partial differential equations describing the transfer of heat and solved them by developing the solutions in infinite sums of trigonometric functions. The Fourier analysis decomposes functions as sums of elementary functions, in this case periodic functions such as sines and cosines. Given a periodic function f(t), with f(t + T) = f(t), we have:

f(t) = a_0 + sum_{n=1}^{inf} [ a_n cos(2 pi n t / T) + b_n sin(2 pi n t / T) ]    (1)

However, this approach presents the following limitations [3]:
- The Fourier analysis does not reveal all the information of the temporal domain: the beginning and the end of the signal are not localized;
- The frequency associated with a signal is inversely proportional to its period. Therefore, to get information on a low frequency signal, we have to observe it over a long time interval. Inversely, a high frequency signal can be observed over a short time interval. Consequently, it would be interesting to have a method of analysis that can take into account the frequency of the signal to analyze.
Another interesting method is based on a time-frequency representation (the wavelet representation), which leads to an approximation of a signal by superposition of functions. The wavelet decomposition of a function consists of a sum of functions obtained from simple operations (translations and dilations) applied to a main function named the "mother wavelet". This "mother wavelet" presents the following properties:
o Admissibility
Given a function psi belonging to L2(IR) and its Fourier transform TF(psi), the admissibility condition is satisfied if:

int_{-inf}^{+inf} |TF(psi(w))|^2 / w dw < +inf    (2)
o Localization
A wavelet is a function that must decrease quickly on both sides of its domain of definition; better still, it should have compact support.
o Oscillation
Oscillation is obtained when the zero order moment, i.e. the average of the function, is null:

int psi(t) dt = 0    (3)

psi(t) must have an undulatory character; it changes sign at least once.
o Translation and dilation
The mother wavelet must satisfy the properties of translation and dilation, which can generate other wavelets.

2.2 Neural Networks
For some years the scientific community has been interested in the concept of neural networks, and the number of studies is continuously increasing [13, 16, 18]. The first modelling of a neuron occurred in the forties. It was achieved by McCulloch and Pitts [15]. Inspired by biological neurons, they proposed the following model: a formal neuron receives a number of inputs (x1, x2, ..., xn); a weight w representative of the strength of the connection is associated with each of these inputs. The neuron computes the weighted sum of its inputs and calculates its output by a non-linear transformation of this sum [4].

Figure 1. McCulloch and Pitts Model

A neural network is a set of connected neurons forming an oriented graph and permitting the exchange of information through the connections [19]. There are different types of neural networks, among which we can mention two categories:
- The Multi-Layer Perceptron (MLP) model, known as a supervised network because it requires training to produce the desired output (target). The input data are repeatedly presented to the neural network. For every presentation, the output of the network is compared to the target and an error is calculated. This error is then used to adjust the weights so that the error decreases with every iteration and the model comes closer and closer to reproducing the target [5].

Figure 2. MLP Model

- The radial basis function networks possess three layers and form a particular class of the multi-layer networks. Every neuron of the hidden layer uses a kernel function (for example, the Gaussian) as activation function. This function is centred around the point specified by the weight vector associated with the neuron. The position and the "width" of these curves are learned from the patterns. In general, there are fewer kernel functions in the RBF network than inputs. Every output neuron implements a linear combination of these functions; the idea is to approximate a function by a set of other ones. From that point of view the hidden neurons provide a set of functions that form a basis representing the inputs in the space "covered" by the hidden neurons [5].

Figure 3. RBF Neural Network
2.3 Wavelet Networks
The wavelet networks result from the combination of wavelets and neural networks [20, 21]. First, the continuous wavelet transform of a function f is defined as the scalar product of f and the mother wavelet psi [17]:

W(a, b) = (1/sqrt(a)) int f(x) psi((x - b)/a) dx    (4)

The reconstruction of the function f from its transform is given by the following expression:

f(x) = (1/c_psi) int_R int_{R+} W(a, b) (1/sqrt(a)) psi((x - b)/a) da db    (5)
This equation expresses a function in terms of a sum of all dilations and all translations of the mother wavelet. If we have a finite number Nw of wavelets psi_j obtained from the mother wavelet psi, equation (6) can be considered as an approximation of the inverse transform:

f(x) ~ sum_{j=1}^{Nw} c_j psi_j(x)    (6)

This can also be considered as the decomposition of a function into a weighted sum of wavelets, where each weight c_j is proportional to W(a_j, b_j). This establishes the idea of wavelet networks [6,8]. Such a network can be considered as composed of three layers: a layer with Ni inputs, a hidden layer with Nw wavelets and an output linear neuron receiving the weighted outputs of the wavelets. Both input and output layers are fully connected to the hidden layer. From input to output neurons, a feed-forward propagation algorithm is used [7].
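As an illustration of equation (6), the following sketch evaluates such a single-output wavelet network; the function names and the choice of the Mexican hat as mother wavelet are ours, chosen only because that wavelet also appears in the experiments below.

import numpy as np

def mexican_hat(t):
    # Mexican hat (second derivative of a Gaussian), a common mother wavelet.
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_network_output(x, weights, dilations, translations):
    # Evaluate f(x) ~ sum_j c_j * psi((x - b_j) / a_j), i.e. equation (6):
    # a hidden layer of Nw dilated/translated wavelets feeding one linear
    # output neuron.
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for c, a, b in zip(weights, dilations, translations):
        out += c * mexican_hat((x - b) / a)
    return out

# Tiny usage example with three hidden wavelets.
x = np.linspace(-5, 5, 11)
y = wavelet_network_output(x, weights=[0.5, -1.2, 0.8],
                           dilations=[1.0, 2.0, 0.5],
                           translations=[-1.0, 0.0, 2.0])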
Figure 4. Graphic representation of a wavelet network
As mentioned before, the wavelet networks present a certain architectural proximity with the neural networks. The main similarity is that both networks calculate a linear combination of non-linear functions with adjustable parameters. These functions depend on adjustable parameters (dilations and translations). But the essential difference between them results from the nature of the transfer functions used by the hidden cells. This can be developed as follows:
- First, contrary to the functions used in the neural networks, wavelet networks use functions that decrease quickly and stretch toward zero in all directions of the space.
- Second, contrary to the functions used in the neural networks, every mono-dimensional wavelet is determined by two adjustable parameters (translation and dilation) that are wavelet structural parameters.
- Finally, every mono-dimensional wavelet possesses two structural parameters and every multidimensional wavelet possesses two structural parameters for each variable.
3. Image compression using MLP neural networks
As mentioned before, the wavelet networks present an architectural proximity with MLP networks having only one hidden layer, but the essential difference between the two results from the nature of the transfer functions used by the hidden neurons [12,13]. First, we developed an MLP neural network with three layers: input layer, hidden layer and output layer. This network uses the back-propagation training algorithm to correct the connection weights by minimizing the propagation error. For this purpose we use the following steps (introduced in [8][9][10]); a sketch of the first two steps is given after the list:
1- dividing the original image into m blocks of l by l pixels and reshaping each one into a column vector;
2- arranging all column vectors in a matrix;
3- choosing a suitable training algorithm and defining the training parameters: the number of iterations, the number of hidden neurons and the initial conditions;
4- simulating the network using the input data, the result matrices and an initial error value;
5- rebuilding the compressed image;
6- finally, terminating the calculation if the error is smaller than a threshold.
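A minimal sketch of steps 1-2 (block extraction and matrix assembly) and of the inverse operation used in step 5, assuming a grayscale image whose dimensions are multiples of l; the function names are ours.

import numpy as np

def image_to_block_matrix(img, l):
    # Steps 1-2: cut a (H x W) grayscale image into non-overlapping l x l
    # blocks, reshape each block into a column vector of length l*l, and
    # stack the columns into one matrix. The network is then trained to
    # reproduce these columns at its output.
    h, w = img.shape
    assert h % l == 0 and w % l == 0, "image dimensions must be multiples of l"
    cols = []
    for r in range(0, h, l):
        for c in range(0, w, l):
            cols.append(img[r:r+l, c:c+l].reshape(l * l))
    return np.stack(cols, axis=1)        # shape: (l*l, number_of_blocks)

def block_matrix_to_image(mat, h, w, l):
    # Inverse of image_to_block_matrix, used to rebuild the image (step 5).
    img = np.empty((h, w), dtype=mat.dtype)
    idx = 0
    for r in range(0, h, l):
        for c in range(0, w, l):
            img[r:r+l, c:c+l] = mat[:, idx].reshape(l, l)
            idx += 1
    return img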
4. Image compression using MLP neural networks with wavelet coefficients
The previous approach led to the problem of block effect in the reconstructed image, especially when increasing the compression rate. This gave us the idea of exploiting the interesting results of the wavelet transform in the imagery field. For this purpose, the wavelet coefficients obtained by the image decomposition are taken as the inputs of the neural network instead of the image grey levels.
We used the same approach described before with the neural networks, except that we first applied the wavelet transform decomposition on the original image. Finally, following the training step, we applied the inverse wavelet transform to generate the reconstructed compressed image. An improvement was observed in the quality of the reconstructed image.
We applied these two approaches to the Barbara image (256x256 pixels). With a compression rate of 75% (block size 8x8, 16 hidden neurons, 20,000 iterations), the MLP with wavelet coefficients reaches PSNR = 39.769 (EQM = 6.857), against PSNR = 18.264 (EQM = 969.789) for the plain MLP.

Figure 5. Reconstructed images with the two approaches

5. Wavelet networks for image compression
We would like to construct a system taking as input any image represented in the spatial domain and providing as output the reconstructed image after its compression. Our purpose is to use an artificial neural network, and more especially a wavelet network, by describing a network architecture specialized for the problem of image compression. This architecture includes a layer of input neurons, a hidden neuron layer and a layer of output neurons. Both input and output layers are fully connected to the hidden layer. The feed-forward propagation algorithm is used to adjust the weights of this network.
Figure 6. Wavelet network architecture
5.1 Method principle
In order to compress the image, it is first necessary to segment it into a set of blocks of l by l pixels. These blocks are used as inputs for our wavelet network. A three-layer feed-forward network is used: an input layer with m = l^2 neurons (one per block pixel), an output layer with m neurons, and a hidden layer with a number of neurons smaller than m, based on wavelet functions. Our network is trained to reproduce at the output the information given at the input. We denote the input block by X = (x1, ..., xm) and the output of the network by Y = (y1, ..., ym). At the end of the training we aim at having Y = X for every block presented to the network.
5.2 Network Training
5.2.1 Training algorithm
During the training stage the weights and the dilation and translation parameters are iteratively adjusted to minimize the network error. We used a quadratic cost function to measure this error. Training aims at minimizing the cost function given by:

E = (1/2) sum_{t=1}^{T} ( y_d(t) - y(t) )^2    (7)

where y(t) is the output given by the network and y_d(t) the desired output. The network output expression is:

y(t) = sum_{k=1}^{K} w_k psi( (t - b_k) / a_k )    (8)

In the basic back-propagation training algorithm the weights are moved in the direction of the negative gradient, which is the direction of the most rapid performance decrease. An iteration of this algorithm can be written as [7]:

V_{t+1} = V_t - eps(t) dE/dV    (9)

where V_t is the vector of current weights, dilations and translations, and eps(t) is the gradient step for iteration t. Putting e(t) = y_d(t) - y(t), we have the following derivatives [11]:

dE/dw_ij = sum_{t=1}^{T} e(t) psi(tau)    (10)

dE/da_i = sum_{t=1}^{T} e(t) w_ij dpsi(tau)/da_i    (11)

dE/db_i = sum_{t=1}^{T} e(t) w_ij dpsi(tau)/db_i    (12)

with tau = (t - b_i)/a_i.

These formulas permit us to use the gradient descent algorithm. For the results presented in this work, the parameters of the wavelet network have been initialized randomly. Finally, the different parameters are updated in accordance with the following rules [11]:

dw = -dE/dw,    w(t+1) = w(t) + mu_w dw    (13)
da = -dE/da,    a(t+1) = a(t) + mu_a da    (14)
db = -dE/db,    b(t+1) = b(t) + mu_b db    (15)

where mu_w, mu_a, mu_b are the training rates of the network parameters.
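A minimal sketch of one update step following equations (13)-(15). To keep the example self-contained, the gradients are estimated by central finite differences rather than the paper's analytic derivatives (10)-(12); the helper names and that shortcut are ours.

import numpy as np

def cost(params, t, y_d, psi):
    # Quadratic cost of equation (7) for a single-output wavelet network.
    w, a, b = params
    y = sum(w[k] * psi((t - b[k]) / a[k]) for k in range(len(w)))
    return 0.5 * np.sum((y_d - y) ** 2)

def update_step(params, t, y_d, psi, mu=(0.01, 0.01, 0.01), h=1e-6):
    # One gradient-descent step: delta = -dE/dtheta, theta <- theta + mu*delta,
    # applied to weights, dilations and translations (equations (13)-(15)).
    new = [p.copy() for p in params]
    for i, p in enumerate(params):
        for k in range(len(p)):
            plus = [q.copy() for q in params]; plus[i][k] += h
            minus = [q.copy() for q in params]; minus[i][k] -= h
            grad = (cost(plus, t, y_d, psi) - cost(minus, t, y_d, psi)) / (2 * h)
            new[i][k] = p[k] + mu[i] * (-grad)
    return new

# Usage example with a Mexican hat wavelet and random initialization.
psi = lambda u: (1.0 - u**2) * np.exp(-u**2 / 2.0)
w, a, b = np.random.randn(4), np.ones(4), np.zeros(4)
t = np.linspace(-3, 3, 64)
y_d = np.sin(t)
w, a, b = update_step((w, a, b), t, y_d, psi)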
5.2.2 Application of the algorithm
The compression-enhancement approach in the setting of this paper is based on wavelet networks. The back-propagation algorithm has been employed for the training process. The compression starts with the segmentation of the image into blocks of fixed size (whose value is chosen by the user). The effect of this operation is shown in Figure 7.

Figure 7. Segmentation effect

The training of our network is self-adjusted by applying the back-propagation algorithm presented in the previous paragraph. Our training data contain the vectors representing one block of the image. We start by initializing the parameters of the network randomly. Thereafter, we start the training process. It requires a set of prototypes and targets to learn the network behaviour. During the training stage, the parameters of the network are iteratively adjusted. The stages of training are schematized in Figure 8.

Figure 8. Representation of the training stage

We repeat the training task until the verification criterion is satisfied. The number of iterations is fixed by the user.
6. Implementation and results
The presence of distortion in the reconstructed image is to be minimized, but some information is lost, as in any lossy compression process. Two criteria are used to evaluate this distortion as a function of the compression rate: the mean square error (EQM) and the peak signal-to-noise ratio (PSNR). The Barbara image was used for the three network models with similar training conditions.
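For reference, the two criteria can be computed as in the following sketch (EQM is the mean square error, from the French "erreur quadratique moyenne"; the peak value 255 assumes 8-bit images, and the function names are ours).

import numpy as np

def eqm(original, reconstructed):
    # Mean square error between two images of the same shape.
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB.
    mse = eqm(original, reconstructed)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)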
6.1 Comparison between neural and wavelet networks
First, the results related to the MLP model and the wavelet one are compared. The obtained results are shown in Table 1.

Compression rate | MLP neural network (PSNR / EQM) | MLP with wavelet coefficients (PSNR / EQM) | Wavelet network, Mexican hat (PSNR / EQM)
25%    | 19.023 / 814.121  | 40.568 / 5.704   | 42.500 / 3.656
50%    | 18.809 / 855.235  | 40.402 / 5.926   | 40.454 / 5.855
75%    | 18.264 / 969.789  | 39.769 / 6.857   | 38.382 / 9.437
87.5%  | 17.659 / 1114.714 | 37.917 / 10.502  | 23.010 / 325.101
93.75% | 17.095 / 1269.08  | 23.117 / 317.179 | 18.022 / 1025.359

Table 1. Performances MLP/Wavelet Network

The performances of the MLP model using wavelet coefficients are close to, or intermediate between, those of the wavelet model and those of the classic MLP. These results show a good behaviour of the MLP model using the wavelet coefficients compared to the wavelet network. The following plots present the evolution of the performances in terms of EQM and PSNR according to the compression rate for the three analysed models.

Figure 9. Evolution of the PSNR according to the compression rate for the three models of networks
Figure 10. Evolution of the EQM according to the compression rate for the three models of networks

We can deduce that the wavelet networks are more effective when the compression rate is lower than 75%, but less effective when the rate is beyond this limit; there, the results turn in favour of the MLP networks using the wavelet coefficients. We can notice the superiority of the models using the wavelet analysis: the wavelet network model and the MLP using the wavelet coefficients.

6.2 Results and discussion
The tests are made on three images from the Matlab library, namely Barbara, Lena and Belmont1. The performance of image compression is essentially assessed with two criteria: the compression rate and the quality of the reconstructed image. These performances depend on several parameters:
- the type of wavelet used in the hidden layer,
- the number of these wavelets, and
- the number of iterations.
The following tests consist in modifying the values of these different parameters and observing their effects on the quality of the reconstructed image. The different results obtained are described below. The evolution of the PSNR and the EQM is presented for each case.

6.2.1 Performances according to the number of wavelets
All three tests are made for the same number of iterations, which is equal to 20.

- For Barbara image

Wavelet     | PSNR   | EQM      | Compression rate
Morlet      | 40.117 | 6.329    | 46.777%
Morlet      | 37.135 | 12.576   | 73.388%
Morlet      | 33.706 | 27.697   | 86.694%
Morlet      | 32.324 | 38.075   | 90.020%
Morlet      | 21.586 | 451.235  | 93.347%
Mexican Hat | 40.454 | 5.855    | 46.777%
Mexican Hat | 38.382 | 9.437    | 73.388%
Mexican Hat | 23.010 | 325.101  | 86.694%
Mexican Hat | 19.089 | 801.911  | 90.020%
Mexican Hat | 18.022 | 1025.359 | 93.347%
Slog1       | 30.596 | 56.684   | 46.777%
Slog1       | 26.951 | 131.213  | 73.388%
Slog1       | 21.102 | 504.432  | 86.694%
Slog1       | 17.554 | 1141.862 | 90.020%
Slog1       | 16.474 | 1464.341 | 93.347%
Rasp3       | 44.325 | 2.401    | 46.777%
Rasp3       | 43.206 | 3.107    | 73.388%
Rasp3       | 30.019 | 64.734   | 86.694%
Rasp3       | 18.916 | 834.460  | 90.020%
Rasp3       | 6.257  | 15393.7  | 93.347%

Table 2. Variation of the number of wavelets and effects on the Barbara image
Figure 11. Evolution of the PSNR according to the compression rate for Barbara image
Figure 12. Evolution of the EQM according to the compression rate for Barbara image

- For Lena image

Wavelet     | PSNR   | EQM      | Compression rate
Morlet      | 48.634 | 0.890    | 50.153%
Morlet      | 35.403 | 18.738   | 58.461%
Morlet      | 35.049 | 20.331   | 70.923%
Morlet      | 32.655 | 35.283   | 79.230%
Morlet      | 10.734 | 5490.723 | 87.538%
Morlet      | 8.467  | 9253.837 | 91.692%
Mexican Hat | 34.153 | 24.986   | 50.153%
Mexican Hat | 33.347 | 30.085   | 58.461%
Mexican Hat | 31.774 | 43.218   | 70.923%
Mexican Hat | 27.199 | 123.929  | 79.230%
Mexican Hat | 17.984 | 1034.367 | 87.538%
Mexican Hat | 15.925 | 1661.584 | 91.692%
Slog1       | 45.393 | 1.878    | 50.153%
Slog1       | 43.259 | 3.069    | 58.461%
Slog1       | 39.761 | 6.870    | 70.923%
Slog1       | 28.931 | 83.165   | 79.230%
Slog1       | 15.475 | 1843.150 | 87.538%
Slog1       | 14.739 | 2183.398 | 91.692%
Rasp3       | 44.925 | 2.091    | 50.153%
Rasp3       | 43.185 | 3.122    | 58.461%
Rasp3       | 39.997 | 6.506    | 70.923%
Rasp3       | 36.707 | 13.882   | 79.230%
Rasp3       | 31.870 | 42.271   | 87.538%
Rasp3       | 30.481 | 58.202   | 91.692%

Table 3. Variation of the number of wavelets and effects on the Lena image

Figure 13. Evolution of the PSNR according to the compression rate for Lena image
Figure 14. Evolution of the EQM according to the compression rate for Lena image

- For Belmont1 image

Wavelet     | PSNR   | EQM      | Compression rate
Morlet      | 35.437 | 18.591   | 47.25%
Morlet      | 26.759 | 137.140  | 73.625%
Morlet      | 23.235 | 308.719  | 86.812%
Morlet      | 9.088  | 8020.503 | 90.109%
Morlet      | 8.819  | 8533.450 | 93.406%
Mexican Hat | 45.279 | 1.928    | 47.25%
Mexican Hat | 38.853 | 8.502    | 73.625%
Mexican Hat | 38.590 | 8.995    | 86.812%
Mexican Hat | 14.302 | 2414.687 | 90.109%
Mexican Hat | 13.758 | 2736.836 | 93.406%
Slog1       | 52.654 | 0.352    | 47.25%
Slog1       | 50.662 | 0.558    | 73.625%
Slog1       | 48.976 | 0.823    | 86.812%
Slog1       | 25.937 | 165.696  | 90.109%
Slog1       | 15.497 | 1833.657 | 93.406%
Rasp3       | 44.168 | 2.490    | 47.25%
Rasp3       | 38.462 | 9.263    | 73.625%
Rasp3       | 31.598 | 45.006   | 86.812%
Rasp3       | 26.461 | 146.857  | 90.109%
Rasp3       | 12.920 | 3319.512 | 93.406%

Table 4. Variation of the number of wavelets and effects on the Belmont1 image

Figure 15. Evolution of the PSNR according to the compression rate for Belmont1 image
Figure 16. Evolution of the EQM according to the compression rate for Belmont1 image

6.2.2 Performances according to the number of iterations
The behaviour of our wavelet network is now presented depending on the number of iterations. All three following tests are made for the same compression rate, which is equal to 73%.

- For Barbara image

Wavelet     | PSNR   | EQM      | Number of iterations
Morlet      | 25.537 | 181.679  | 3
Morlet      | 27.161 | 124.993  | 5
Morlet      | 33.141 | 31.546   | 10
Morlet      | 35.269 | 19.324   | 15
Morlet      | 37.135 | 12.576   | 20
Mexican Hat | 14.486 | 2314.257 | 3
Mexican Hat | 17.752 | 1091.081 | 5
Mexican Hat | 19.894 | 666.262  | 10
Mexican Hat | 21.255 | 487.019  | 15
Mexican Hat | 38.382 | 9.437    | 20
Slog1       | 19.459 | 736.453  | 3
Slog1       | 20.976 | 599.644  | 5
Slog1       | 21.912 | 418.654  | 10
Slog1       | 23.787 | 271.869  | 15
Slog1       | 43.361 | 2.672    | 20
Rasp3       | 27.095 | 126.926  | 3
Rasp3       | 31.925 | 41.738   | 5
Rasp3       | 38.856 | 8.460    | 10
Rasp3       | 41.774 | 4.321    | 15
Rasp3       | 43.206 | 3.107    | 20

Table 5. Variation of the number of iterations and effects on the Barbara image

Figure 17. Evolution of the PSNR according to the number of iterations for Barbara image
Figure 18. Evolution of the EQM according to the number of iterations for Barbara image

- For Lena image

Wavelet     | PSNR   | EQM      | Number of iterations
Morlet      | 22.127 | 398.418  | 3
Morlet      | 29.613 | 71.073   | 5
Morlet      | 31.972 | 41.287   | 10
Morlet      | 34.239 | 27.497   | 15
Morlet      | 35.049 | 20.331   | 20
Mexican Hat | 17.236 | 1228.569 | 3
Mexican Hat | 17.542 | 1145.138 | 5
Mexican Hat | 20.541 | 541.000  | 10
Mexican Hat | 26.247 | 154.282  | 15
Mexican Hat | 31.774 | 43.218   | 20
Slog1       | 17.714 | 1100.518 | 3
Slog1       | 21.993 | 410.863  | 5
Slog1       | 30.985 | 51.822   | 10
Slog1       | 33.969 | 26.067   | 15
Slog1       | 39.761 | 6.870    | 20
Rasp3       | 27.587 | 113.325  | 3
Rasp3       | 31.822 | 42.743   | 5
Rasp3       | 35.723 | 17.406   | 10
Rasp3       | 36.101 | 15.955   | 15
Rasp3       | 39.997 | 6.506    | 20

Table 6. Variation of the number of iterations and effects on Lena image

Figure 19. Evolution of the PSNR according to the number of iterations for Lena image
Figure 20. Evolution of the EQM according to the number of iterations for Lena image
- For Belmont1 image

Wavelet     | PSNR   | EQM      | Number of iterations
Morlet      | 22.127 | 398.418  | 3
Morlet      | 29.613 | 71.073   | 5
Morlet      | 31.972 | 41.287   | 10
Morlet      | 34.239 | 27.497   | 15
Morlet      | 35.049 | 20.331   | 20
Mexican Hat | 17.236 | 1228.569 | 3
Mexican Hat | 17.542 | 1145.138 | 5
Mexican Hat | 20.541 | 541.000  | 10
Mexican Hat | 26.247 | 154.282  | 15
Mexican Hat | 31.774 | 43.218   | 20
Slog1       | 17.714 | 1100.518 | 3
Slog1       | 21.993 | 410.863  | 5
Slog1       | 30.985 | 51.822   | 10
Slog1       | 33.969 | 26.067   | 15
Slog1       | 39.761 | 6.870    | 20
Rasp3       | 27.587 | 113.325  | 3
Rasp3       | 31.822 | 42.743   | 5
Rasp3       | 35.723 | 17.406   | 10
Rasp3       | 36.101 | 15.955   | 15
Rasp3       | 39.997 | 6.506    | 20

Table 7. Variation of the number of iterations and effects on Belmont1 image

Figure 21. Evolution of the PSNR according to the number of iterations for Belmont1 image
Figure 22. Evolution of the EQM according to the number of iterations for Belmont1 image
7. Conclusion
The major contributions of this article arise from the formulation of a new approach to image compression, wavelet networks, which provides improved efficiency compared to the classical MLP neural networks. By combining wavelet theory and the capability of neural networks, we have shown that significant improvements in the performance of the compression algorithm can be realised. Both the type and the number of wavelets used in the network are studied. To test the robustness of our approach, we have implemented and compared the results with other approaches based on neural networks (MLP). The results show that the wavelet networks approach succeeds in achieving high performance and efficiency, especially for compression rates lower than 75%.
8. Acknowledgements The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research and Technological Renovation (DGRSRT), Tunisia, under the ARUB program 01/UR/11/02.
9. References
[1] Q. Zhang, "Wavelet Network in Nonparametric Estimation", IEEE Trans. Neural Networks, 8(2):227-236, 1997.
[2] Q. Zhang and A. Benveniste, "Wavelet networks", IEEE Trans. Neural Networks, vol. 3, pp. 889-898, 1992.
[3] A. Grossmann and B. Torrésani, Les ondelettes, Encyclopedia Universalis, 1998.
[4] R. Ben Abdennour, M. Ltaïef and M. Ksouri, "Un coefficient d'apprentissage flou pour les réseaux de neurones artificiels", Journal Européen des Systèmes Automatisés, January 2002.
[5] M. Chtourou, Les réseaux de neurones, lecture notes, DEA A-II, academic year 2002/2003.
[6] Y. Oussar, Réseaux d'ondelettes et réseaux de neurones pour la modélisation statique et dynamique de processus, PhD thesis, Université Pierre et Marie Curie, July 1998.
[7] R. Baron, Contribution à l'étude des réseaux d'ondelettes, PhD thesis, Ecole Normale Supérieure de Lyon, February 1997.
[8] C. Foucher and G. Vaucher, "Compression d'images et réseaux de neurones", revue Valgo n°01-02, 17-19 October 2001, Ardèche.
[9] J. Jiang, "Image compression with neural networks - A survey", Signal Processing: Image Communication, Elsevier, vol. 14, no. 9, 1999, pp. 737-760.
[10] S. Kulkarni, B. Verma and M. Blumenstein, "Image Compression Using a Direct Solution Method Based Neural Network", The Tenth Australian Joint Conference on Artificial Intelligence, Perth, Australia, 1997, pp. 114-119.
[11] G. Lekutai, Adaptive Self-tuning Neuro Wavelet Network Controllers, PhD thesis, Blacksburg, Virginia, March 1997.
[12] R. D. Dony and S. Haykin, "Neural network approaches to image compression", Proceedings of the IEEE, vol. 83, no. 2, February 1995, pp. 288-303.
[13] A. D'souza Winston and Tim Spracklen, "Application of Artificial Neural Networks for real time Data Compression", 8th International Conference on Neural Information Processing, Shanghai, China, 14-18 November 2001.
[14] Ch. Bernard, S. Mallat and J.-J. Slotine, "Wavelet Interpolation Networks", International Workshop on CAGD and Wavelet Methods for Reconstructing Functions, Montecatini, 15-17 June 1998.
[15] D. Charalampidis, "Novel Adaptive Image Compression", Workshop on Information and Systems Technology, University of New Orleans, 16 May 2003.
[16] M. Cao, Image Compression Using Neural Networks, thesis, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, September 1996.
[17] S. S. Iyengar, E. C. Cho and Vir V. Phoha, Foundations of Wavelet Networks and Applications, Chapman & Hall/CRC, 2000.
[18] Y. M. Masalmah and J. Ortiz, "Image Compression Using Neural Networks", CRC Conference, Poster Presentation, 16 March 2002.
[19] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, 5:115-133, 1943. Reprinted in Anderson and Rosenfeld, 1988.
[20] H. Szu, B. Telfer, and S. Kadambe, "Neural network adaptive wavelets for signal representation and classification", Optical Engineering, 31:1907-1961, 1992.
[21] H. Szu, B. Telfer, and J. Garcia, "Wavelet transforms and neural networks for compression and recognition", Neural Networks, 9:695-708, 1996.
Image Compression Using Wavelet Packet Tree G. K. Kharate, A. A. Ghatol and P.P. Rege K.K.Wagh Institute of Engineering Education and Research,
Nashik – 422003, Maharashtra, India. Ph. No. (+91-0253) 2515236 Mobile: +919890192008 E-mail:
[email protected]
Abstract
The necessity for image compression has grown continuously during the last decade; one of the most powerful and promising approaches in this area is image compression using the discrete wavelet transform. Image compression includes the transformation of the image, quantization and encoding. This paper mainly describes the transformation of the image and the quantization technique. The idea of the wavelet packet tree is used to transform still and colour images. This paper describes a new approach to constructing the best tree on the basis of Shannon entropy. The algorithm checks the entropy of the decomposed nodes (child nodes) against the entropy of the node which has been decomposed (parent node) and takes the decision of decomposing a node. In addition, we have proposed an adaptive thresholding for quantization, which is based on the type of wavelet used and the nature of the image. The proposed algorithm provides good compression performance. Results are compared in terms of percentage of zeros, percentage of energy retained and signal-to-noise ratio.

Keywords: Compression, Wavelet, Wavelet Packet Tree, Shannon Entropy, and Adaptive Threshold.

1. Introduction
Visual communication is becoming increasingly important, with applications in several areas such as multimedia, communication, transmission and storage of remote sensing images, education and business documents, and medical images. Since digital images are inherently voluminous, efficient data compression techniques are essential for their archival and transmission. The International Standards Organization (ISO) has proposed the JPEG standard [15] for image compression. These standards employ the discrete cosine transform (DCT) to reduce the spatial redundancy present in images. It is noted that DCT has the drawbacks of blocking artifacts, mosquito noise and aliasing distortions at high compression ratios [16, 9].
The Discrete Wavelet Transform (DWT) has emerged as a popular technique for image coding applications [7]; DWT has high decorrelation and energy compaction efficiency. Blocking artifacts and mosquito noise are absent in a wavelet-based coder due to the overlapping basis functions [7]. The JPEG 2000 standard employs a discrete wavelet transform for image compression due to its merits in terms of scalability, localization and energy concentration [6]. JPEG 2000 suffers from blurring artifacts and ringing artifacts [9]. The performance of discrete wavelet transform based coding also depends on the wavelet decomposition structure. In wavelet decomposition, only the approximation component of the image is further decomposed, but in wavelet packet decomposition the approximation as well as the detail components are decomposed.
This paper describes a novel technique, in which the approximation and detail components are further decomposed if the sum of the information content of the decomposed detail and approximation components is less than the information content of the component which has been decomposed. The information content is calculated on the basis of Shannon entropy. This paper is organized as follows: Section 2 gives a brief review of image compression, Section 3 describes the discrete wavelet transform, Section 4 describes the wavelet packet and the wavelet packet tree, Section 5 presents the proposed algorithm, Section 6 presents the experimental results, and Section 7 gives the conclusions of the paper.
2. Image Compression: Brief Review
A common characteristic of most images is that neighboring pixels are correlated. Therefore the most important task is to find a less correlated representation of the image. The fundamental components of compression are the reduction of redundancy and irrelevancy. Redundancy reduction aims at removing duplication from the image. Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the human visual system (HVS). Three types of redundancy can be identified: spatial redundancy, spectral redundancy and temporal redundancy. In still images, compression is achieved by removing spatial redundancy and spectral redundancy.

3. Discrete Wavelet Transform
Wavelets are functions that satisfy certain mathematical requirements and are used in representing data or other functions. The basic idea of the wavelet transform is to represent any arbitrary signal X as a superposition of a set of such wavelets or basis functions. These basis functions are obtained from a single prototype wavelet called the mother wavelet by dilation (scaling) and translation (shifts).
The discrete wavelet transform for a two-dimensional signal can be defined as follows:

w(a1, a2, b1, b2) = (1/sqrt(a1*a2)) Psi( (X - b1)/a1 , (Y - b2)/a2 )    (1)

w(a1, a2, b1, b2) are called the wavelet coefficients of the signal X; a1, a2 are the dilations, b1, b2 are the translations, and Psi is the transforming function, known as the mother wavelet. Low frequencies are examined with low temporal resolution, while high frequencies are examined with more temporal resolution. A wavelet transform combines both low-pass and high-pass filtering in the spectral decomposition of signals.
In the case of the discrete wavelet, the image is decomposed into a discrete set of wavelet coefficients using an orthogonal set of basis functions. These sets are divided into four parts: approximation, horizontal details, vertical details and diagonal details. Further decomposition of the approximation takes place, again giving four components (LL2, HL2, LH2, HH2 inside LL1, together with HL1, LH1 and HH1), as shown in Figure 1.

Figure 1: Two level wavelet decomposition

4. Wavelet Packet and Wavelet Packet Tree
The idea of the wavelet packet is the same as that of the wavelet; the only difference is that the wavelet packet offers a more complex and flexible analysis, because in wavelet packet analysis the details as well as the approximation are split. Wavelet packet decomposition gives a large set of bases from which one can look for the best representation with respect to the design objectives. The wavelet packet tree for 3-level decomposition is shown in Figure 2.

Figure 2: Wavelet Packet Tree Decomposition

In this paper the Shannon entropy criterion is used to construct the best tree. The Shannon entropy criterion finds the information content of a signal S:

Entropy(S) = - sum_i S_i^2 log(S_i^2)    (2)

The information content of the decomposed components (approximation and details) may be greater than the information content of the component which has been decomposed. In this paper the sum of the information of the decomposed components (child nodes) is compared with the information of the component which has been decomposed (parent node); only if the sum of the information of the child nodes is less than that of the parent node is the parent node decomposed and the child nodes considered in the tree; otherwise the parent node is not decomposed.
The thresholding of the wavelet packet coefficients is then performed: if the value of a coefficient of a leaf node (except for the approximation) is less than the threshold, it is set to zero. The value of the threshold is calculated based on the nature of the image and the type of wavelet used for decomposition:

Threshold = K - sqrt(mean(wenergy(T) x 100))    (3)

The value of K is decided by taking the results on standard 256x256 colour images (wbarb, woman, Leena, wmandril) and on horizontal, vertical and diagonal line images. If the entropy of the decomposed components is low, then the threshold value is high; if the entropy of the decomposed components is high, then the threshold value is low.

5. The Proposed Algorithm
The proposed algorithm is described as follows (a code sketch follows the list):
1. Let the level counter = 1.
2. Let the current node = input image.
3. Decompose the current node using the wavelet packet tree.
4. Find the entropy of the current node.
5. Find the entropy of the decomposed components, viz. CA1, CH1, CV1, CD1.
6. Compare the entropy of the parent node with the sum of the entropies of the child nodes. If the sum of the entropies of the child nodes is less than that of the parent node, then the child nodes are kept in the tree and steps 3, 4, 5, 6 are repeated for each child node, considering it as the current node. Otherwise the parent node acts as a leaf node of the tree.
7. Stop.
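A minimal sketch of this best-tree construction, assuming the PyWavelets package for the 2-D decomposition and the squared-coefficient form of Shannon entropy from equation (2); the function names and the explicit recursion-depth limit are ours.

import numpy as np
import pywt

def shannon_entropy(s):
    # Entropy(S) = -sum(S_i^2 * log(S_i^2)), skipping zero coefficients.
    s2 = np.square(s.ravel())
    s2 = s2[s2 > 0]
    return float(-np.sum(s2 * np.log(s2)))

def best_tree(node, wavelet='haar', max_level=3):
    # Decompose a node only if the summed entropy of its four children
    # (CA, CH, CV, CD) is below the entropy of the node itself (step 6).
    if max_level == 0:
        return node                                   # forced leaf
    ca, (ch, cv, cd) = pywt.dwt2(node, wavelet)       # step 3
    parent_e = shannon_entropy(node)                  # step 4
    child_e = sum(map(shannon_entropy, (ca, ch, cv, cd)))  # step 5
    if child_e < parent_e:                            # step 6: keep children
        return [best_tree(c, wavelet, max_level - 1) for c in (ca, ch, cv, cd)]
    return node                                       # parent stays a leaf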
6. Experimental Results
We have implemented the proposed algorithm, the wavelet packet best tree using Shannon entropy. It is tested on standard 256x256 colour images (wbarb, woman, wmandrila, leena) and on horizontal, vertical and diagonal line constructed images. Results are observed in terms of percentage of zeros, percentage of energy retained and signal-to-noise ratio. The best results are presented in the paper. Figure 3 shows the original Lena image and the result of the proposed method; Figure 4 the original Chitah image and its result; Figure 5 the original Woman image and its result; and Figure 6 the original vertical-line image and its result.

Figure 3: Original Lena image and its result
Figure 4: Original Chitah image and its result
Figure 5: Original Woman image and its result
Figure 6: Original vertical line image and its result

Table 1 shows the results of the discrete wavelet transform and Table 2 shows the results of the proposed wavelet packet tree for different images.

Image      | Percentage of zeros | Percentage of energy retained | SNR (dB)
woman      | 91.90 | 99.04 | 43.88
wbarb      | 95.38 | 98.98 | 43.60
mandrilla  | 85.07 | 98.03 | 37.54
lena       | 97.32 | 99.86 | 65.32
chitta     | 96.16 | 99.70 | 57.40
vertical   | 91.58 | 99.34 | 48.53
horizontal | 95.85 | 99.89 | 68.46
diagonal   | 88.19 | 99.88 | 65.45

Table 1: Discrete Wavelet Transform
Image      | Percentage of zeros | Percentage of energy retained | SNR (dB)
woman      | 94.06 | 98.92 | 44.77
wbarb      | 96.25 | 99.02 | 45.62
mandrilla  | 87.86 | 96.95 | 36.68
lena       | 97.65 | 99.87 | 66.18
chitta     | 96.49 | 99.72 | 58.74
vertical   | 92.17 | 99.74 | 59.51
horizontal | 96.48 | 99.87 | 68.76
diagonal   | 87.36 | 99.76 | 59.55

Table 2: Results of Wavelet Packet Best Tree

7. Conclusion
In this paper, the wavelet packet best tree using Shannon entropy has been presented. Extensive results have been taken on different images. The results of the discrete wavelet transform and of the wavelet packet best tree are compared. The results demonstrate a significant gain in the percentage of zeros with the wavelet packet best tree, with a good signal-to-noise ratio for standard as well as line based images.

References
1. B. Chanda and D. Dutta Majumder, "Digital Image Processing and Analysis", Prentice-Hall of India, ISBN-81-203-1618-5, 2002.
2. Rafael C. Gonzalez and Richard E. Woods, "Digital Image Processing", 2nd Edition, Pearson Education, ISBN-81-7808-629-8, 2002.
3. C. Wayne Brown and Barry J. Shepherd, "Graphics File Formats: Reference and Guide", Prentice Hall, ISBN-1-884777-00-7, 1995.
4. Andrew S. Glassner, "Principles of Digital Image Synthesis", Vol. 2, Morgan Kaufmann, ISBN-1-55860-276-3, 1995.
5. Agostino Abbate, Casimer M. DeCusatis, Pankaj K. Das, "Wavelets and Subbands: Fundamentals and Applications", Birkhauser, ISBN-0-8176-4136-X, 2002.
6. Lokenath Debnath, "Wavelet Transforms and Time-Frequency Signal Analysis", Birkhauser, ISBN-0-8176-4104-1, 2001.
7. Rudra Pratap, "Getting Started with MATLAB: A Quick Introduction for Scientists and Engineers", Oxford, ISBN-0-19-515014-7, 2003.
8. Howard L. Resnikoff, Raymond O. Wells, Jr., "Wavelet Analysis: The Scalable Structure of Information", Springer, ISBN-0-387-98383-X, 1998.
9. L. Prasad and S. S. Iyengar, "Wavelet Analysis with Applications to Image Processing", CRC Press, ISBN-0-8493-3169-2, 1997.
10. Anil K. Jain, "Fundamentals of Digital Image Processing", Prentice-Hall, ISBN-81-203-0929-4, 2000.
11. David F. Walnut, "Wavelet Analysis", Birkhauser, ISBN-0-8176-3962-4, 2002.
12. Richard G. Baraniuk, "Nonlinear Wavelet Transforms for Image Coding via Lifting", IEEE Transactions on Image Processing, August 25, 1999.
13. Miroslav Jalabow, "Fractal Image Compression", ACM Library, pp. 320-326, 2003.
14. Subhasis Saha, "Image Compression - from DCT to Wavelets: A Review", ACM Crossroads Students Magazine, Vol. 6, No. 3, Spring 2000.
15. Andrew B. Watson, "Image Compression Using the Discrete Cosine Transform", NASA Ames Research Center, Mathematica Journal, Vol. 4, No. 1, pp. 81-88, 1994.
16. Wayne E. Bretl, Mark Fimoff, "MPEG2 Tutorial: Introduction & Contents, Video Compression Basics - Lossy Coding", 1999.
17. Sarah Betz, Nirav Bhagat, Paul Murphy and Maureen Stengler, "Wavelet Based Image Compression - Analysis of Results", ELEC 301 Final Project.
18. Aleks Jakulin, "Baseline JPEG and JPEG2000 Artifacts Illustrated", Visicron, 2002.
19. Paul Watta, Brijesh Desaie, Norman Dannug, Mohamad Hassoun, "Image Compression using Backprop", IEEE, 1996.
20. Rob Koenen, "Overview of the MPEG-4 Version 1 Standard", ISO/IEC JTC1/SC29/WG11 N4668, MPEG-4 Articles, March 2002.
21. Sarah Betz, Nirav Bhagat, Paul Murphy and Maureen Stengler, "Wavelet Based Image Compression: Image Compression Theory", ELEC 301 Final Project.
22. I. Andreopoulos, Y. A. Karayiannis, T. Stouraitis, "A Hybrid Image Compression Algorithm Based on Fractal Coding and Wavelet Transform", IEEE International Symposium on Circuits and Systems, May 2000.
23. Hiroshi Kondo and Yuriko Oishi, "Digital Image Compression using Directional Sub-block DCT", International Conference on Communication Technology, Vol. 1, pp. 985-992, 2000.
24. A. P. Beegan, L. R. Iyer, A. E. Bell, "Design and Evaluation of Perceptual Masks for Wavelet Image Compression", DSP Workshop Program, Session 3: Image Processing, October 2002.
25. S. Rout and A. E. Bell, "Color Image Compression: Multiwavelets vs. Scalar Wavelets", IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 4, pp. 3501-3504, May 2002.
26. Jayshree Karlekar, P. G. Poonacha and U. B. Desai, "Image Compression using Zerotree and Multistage Vector Quantization", ICIP, Vol. 2, No. 2, pp. 610, 1997.
Color Image Compression Using Wavelet Transform Bushra K. Al-Abudi and Loay A. George College of Science/University of Baghdad Baghdad-Iraq. E-mail:
[email protected]
Abstract
This paper describes a color image compression scheme based on the Haar wavelet transform. The vertical and horizontal Haar filters are composed to construct four 2-dimensional filters; these filters are applied directly to the image to speed up the implementation of the Haar wavelet transform. The Haar wavelet transform is used to map the image to frequency bands, and each band is assigned a weighting factor according to its subjective significance. These weighting factors are included in the computation of the number of bits required to represent the quantized indices of the wavelet coefficients. Zero tree coding and a progressive transmission mechanism were designed and implemented in the encoding and decoding process. The analysis results indicate that the performance of the suggested method is much better: the reconstructed images are less distorted and compressed with a higher factor.
Keywords: Color image compression, wavelet transform, zero tree coding, progressive transmission mechanism, YIQ color model

1. Introduction
In digital true-color images, each color component is quantized with 8 bits, so a color is specified with 24 bits. As a result, there are 2^24 possible colors in the image. Furthermore, a color image usually contains a lot of data redundancy and requires a large amount of storage space. In order to lower the transmission and storage cost, image compression is desired [1]. Most color images are recorded in the RGB model, which is the best-known color model. However, the RGB model is not suited for image processing purposes. For compression, a luminance-chrominance representation is considered superior to the RGB representation. Therefore, RGB images are transformed to one of the luminance-chrominance models, the compression process is performed, and the result is then transformed back to the RGB model, because displays most often require the output image in the direct RGB model. The luminance component represents the intensity of the image and looks like a gray scale version; the chrominance components represent the color information in the image [2;3]. In this paper, we propose a compression scheme based on the wavelet transform, which involves a discrete wavelet transform (DWT) of the image followed by a quantization process for the wavelet coefficients using a suitable bit allocation scheme [4;5]. The wavelet transform is one of the most exciting developments in the signal-processing field of the past decades. This is especially true when it is utilized in compressing images [6;7]. It is a well-established technique to compress independently the components of colored images, and it has attracted great interest in the area of image compression because of its excellent localization in both the spatial and frequency domains [8]. In the wavelet transform, the decomposition (analysis) of the image is produced by an analysis filter bank followed by down-sampling [9]. The image is decomposed into sub-bands: lower decomposition levels correspond to higher image frequencies and higher levels correspond to lower image frequencies, where most of the image energy is concentrated. This is why we can expect the detail coefficients to get smaller as we move from high to low levels [10]. Only the lowest frequency sub-band is fed to the next level of decomposition. The goal of the sub-band analysis is to transform the source image into an alternative representation so that most of the energy is concentrated in the lowest frequency sub-band and in a few coefficients, to reduce the correlation and provide a useful data structure. The sub-band structure is the basis of all the image compression methods that use the wavelet transform. The reconstruction is obtained by applying an inverse operation to that of the decomposition. Since most of the image energy is concentrated in the lowest frequency sub-band, the quality of the reconstruction of this sub-band has a great influence on the quality of the fully reconstructed image; for this reason this sub-band has to be coded with relatively high fidelity [10;11]. To improve compression performance we use zero tree coding and a progressive transmission mechanism for the detail bands [12]. The remainder of this paper is organized as follows: the Haar wavelet transform is described in Section 2. In Section 3, the suggested compression scheme is presented. Finally, the results and conclusions are given in Section 4.
2. Haar Wavelet Transform
The Haar wavelet transform (HWT) consists of a low-pass and a high-pass filter; it is the preferred wavelet here because it can be readily implemented in hardware [7]. The approximation band (LL) is the result of applying the low-pass filter in the vertical and horizontal directions; the (LH) band is the result of applying the horizontal low-pass filter and the vertical high-pass filter; the (HL) band is the result of the horizontal high-pass filter and the vertical low-pass filter; and finally, the (HH) band is the result of the horizontal and vertical high-pass filters. In this transform, each group of 2x2 adjacent pixels is picked and passed simultaneously through the four filters (LL, HL, LH, and HH) to obtain the four wavelet coefficients. The bases of these four filters can be derived as follows. The low and high filters are

L = \frac{1}{\sqrt{2}}\,(1 \;\; 1), \qquad H = \frac{1}{\sqrt{2}}\,(1 \;\; {-1})

Thus the horizontal low-pass filter followed by the vertical low-pass filter is equivalent to

LL = \frac{1}{\sqrt{2}}\binom{1}{1}\cdot\frac{1}{\sqrt{2}}(1 \;\; 1) = \frac{1}{2}\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}

The horizontal high-pass filter followed by the vertical low-pass filter is

HL = \frac{1}{\sqrt{2}}\binom{1}{-1}\cdot\frac{1}{\sqrt{2}}(1 \;\; 1) = \frac{1}{2}\begin{pmatrix}1 & 1\\ -1 & -1\end{pmatrix}

while the horizontal low-pass filter followed by the vertical high-pass filter is equivalent to

LH = \frac{1}{\sqrt{2}}\binom{1}{1}\cdot\frac{1}{\sqrt{2}}(1 \;\; {-1}) = \frac{1}{2}\begin{pmatrix}1 & -1\\ 1 & -1\end{pmatrix}

and finally, the horizontal high-pass filter followed by the vertical high-pass filter is

HH = \frac{1}{\sqrt{2}}\binom{1}{-1}\cdot\frac{1}{\sqrt{2}}(1 \;\; {-1}) = \frac{1}{2}\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}

If we apply the forward HWT to a test color image (e.g., its luminance component), at the beginning of the decomposition the image is divided into four sub-bands LL, HL, LH, and HH; this procedure defines the first-level sub-band coding. Dividing sub-band LL of the first level into four sub-bands defines the second level, and dividing sub-band LL of the second level into four sub-bands defines the third level [10]. To reconstruct the image, the same four two-dimensional filters are used.
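Since the four composed filters act on non-overlapping 2x2 blocks, one decomposition level can be computed in a single pass. The following is a minimal sketch of Section 2 under that reading; the function and variable names are our own.

```python
import numpy as np

# The four 2-D Haar kernels derived above, applied to every non-overlapping
# 2x2 block of the image in one step (one decomposition level).
F_LL = 0.5 * np.array([[1,  1], [ 1,  1]])
F_HL = 0.5 * np.array([[1,  1], [-1, -1]])
F_LH = 0.5 * np.array([[1, -1], [ 1, -1]])
F_HH = 0.5 * np.array([[1, -1], [-1,  1]])

def haar_level(img):
    """Return the LL, HL, LH and HH sub-bands of one HWT level."""
    h, w = img.shape
    # Group the pixels into (h/2, w/2) blocks of shape 2x2.
    blocks = img.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    # Weight every block by each kernel and sum: one coefficient per block.
    return tuple(np.einsum('ijkl,kl->ij', blocks, f)
                 for f in (F_LL, F_HL, F_LH, F_HH))

ll, hl, lh, hh = haar_level(np.arange(16.0).reshape(4, 4))
# Only the LL band would be fed to the next decomposition level.
```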
3. The Suggested Compression Scheme
The suggested scheme can be divided into two processes.

A. Encoding process. The encoding process implies the following steps:
1- Apply the forward HWT to produce L layers. The total number of sub-bands will be 3L+1.
2- Determine the required total number of bits of the compressed file:

N_b = \frac{8 \times Wid \times Hgt}{C_r}

where Wid is the image width, Hgt is the image height, and C_r is the attended compression ratio.
3- Determine the minimum (Mn) and maximum (Mx) values of the approximation sub-band (LL band) and then determine their dynamic range, R = Mx - Mn. The sub-band number of the approximation band will be 3L+1.
4- Determine the standard deviation \sigma(k) of all detailed sub-bands, whose sub-band numbers lie in [1, 3L], where k is the sub-band number.
5- Determine the size S(k) of each sub-band (i.e., its number of coefficients).
6- Determine the weighting factor for each sub-band using one of the following suggested formulas:

W_0(k) = 1, \quad W_1(k) = \frac{1}{L - I + 1}, \quad W_2(k) = \frac{1}{2L - 2I + 1}, \quad W_3(k) = \frac{1}{3L - 3I + 1}

where the subscripts 0, 1, 2, and 3 are indices indicating the weight type, and I is the sub-band layer number, determined from the sub-band number k by

I = \begin{cases}\left\lfloor \frac{k-1}{3} \right\rfloor + 1 & \text{if } k \neq 3L+1 \\ L & \text{if } k = 3L+1\end{cases}

where \lfloor x \rfloor denotes the largest integer value not exceeding x.
7- Determine the number of bits B(k) allocated to each sub-band by using the following equations:

B(k) = \begin{cases}\log_2(R) & \text{if } k = 3L+1 \\ \log_2\!\big(I_f\,\sigma(k)\big) & \text{if } 1 \le k \le 3L\end{cases}

where I_f is the inclusion factor and \sigma(k) is the standard deviation of the k-th band.
8- Compute the total number of bits T_b by using the following expression:

T_b = T + \sum_{k=1}^{3L} S(k)\,W_f(k)\,B(k), \qquad T = S(3L+1)\,W_f(3L+1)\,B(3L+1)

where T is the total number of bits for the sub-band LL and f is the index of the weighting type.
9- Determine the scaling factor \alpha = N_b / T_b.
10- Redetermine the number of bits required to encode each sub-band by using the expression

B(k) = \lfloor \alpha\,W_f(k)\,B(k) + 0.5 \rfloor

where k takes values within the interval [1, 3L+1].
11- Since the wavelet coefficients of the detailed bands are semi-symmetric and highly peaked around a zero mean, the number of quantization levels should be odd and greater than 2. In this case the number of required bits for the detailed sub-bands must be either more than one bit or equal to zero; therefore the following condition is applied: if B(k) < 2 then B(k) = 0, for 1 \le k \le 3L.
12- Determine the quantization steps by applying the following expressions:

Q(k) = \begin{cases}\dfrac{R}{2^{B(k)} - 1} & \text{if } k = 3L+1 \\[2mm] \dfrac{2\,I_f\,\sigma(k)}{2^{B(k)} - 1} & \text{if } 1 \le k \le 3L\end{cases}

13- Determine the quantization index QI for each coefficient by applying uniform quantization, as follows:

QI(x,y,k) = \begin{cases}\left\lfloor \dfrac{w(x,y,k) - Mn}{Q(k)} + 0.5 \right\rfloor & \text{if } k = 3L+1 \\[2mm] \left\lfloor \dfrac{|w(x,y,k)| - Q(k)/2}{Q(k)} + 0.5 \right\rfloor + 1 & \text{if } 1 \le k \le 3L \text{ and } |w| > Q(k)/2 \\[2mm] 0 & \text{if } 1 \le k \le 3L \text{ and } |w| \le Q(k)/2\end{cases}

where w(x,y,k) is the wavelet coefficient at position (x,y) in sub-band k and QI(x,y,k) is the corresponding quantization index (carrying the sign of w for the detailed bands).
14- For the detailed bands the quantization indices QI(x,y,k) take positive and negative values. Since the encoding process is much easier when the range of the coded parameters is positive, the quantization indices are mapped to the positive range by using the following mapping formula:

QI'(x,y,k) = \begin{cases}2\,QI(x,y,k) - 1 & \text{if } QI(x,y,k) > 0 \\ -2\,QI(x,y,k) & \text{otherwise}\end{cases}

In this case all positive index values are mapped to odd indices and all negative index values correspond to even indices.
15- Apply the zero-tree mechanism and progressive transmission to all detailed bands; finally, save the codewords (quantization indices), whose length (in bits) is not fixed but depends on the bit allocation and the weighting factor of each band.
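The bit-allocation chain of steps 6-11 can be summarized in a few lines. The sketch below is our reading of the formulas above, with hypothetical names: sigma holds the detail-band standard deviations and S the band sizes.

```python
import numpy as np

# Sketch of steps 6-11: weight each band, allocate bits, scale the
# allocation to the bit budget Nb, and suppress detail bands with B(k) < 2.
def allocate_bits(sigma, S, R, L, If, Cr, wid, hgt, f=1):
    nb = 3 * L + 1                                   # number of sub-bands
    Nb = 8.0 * wid * hgt / Cr                        # step 2: bit budget
    layer = lambda k: L if k == nb else (k - 1) // 3 + 1
    wf = {0: lambda I: 1.0,
          1: lambda I: 1.0 / (L - I + 1),
          2: lambda I: 1.0 / (2 * L - 2 * I + 1),
          3: lambda I: 1.0 / (3 * L - 3 * I + 1)}[f]
    W = np.array([wf(layer(k)) for k in range(1, nb + 1)])        # step 6
    B = np.array([np.log2(R) if k == nb else np.log2(If * sigma[k - 1])
                  for k in range(1, nb + 1)])                     # step 7
    alpha = Nb / np.sum(S * W * B)                   # steps 8-9
    B = np.floor(alpha * W * B + 0.5)                # step 10
    B[:-1][B[:-1] < 2] = 0                           # step 11 (detail bands)
    return B
```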
B. Decoding process. The decoding process implies the following steps:
1- Perform the bit-decoding process, which is the opposite of the bit-encoding process.
2- For every detailed sub-band, if the quantization index QI'(x,y,k) of a coefficient in the considered sub-band equals zero, then all corresponding coefficients in the next sub-band are set to zero; otherwise, recall the quantization index of the coefficient.
3- For each quantization index QI'(x,y,k) belonging to band k, use the following inverse mapping equation:

QI(x,y,k) = \begin{cases}\dfrac{QI'(x,y,k) + 1}{2} & \text{if } QI'(x,y,k) \text{ is odd} \\[2mm] -\dfrac{QI'(x,y,k)}{2} & \text{if } QI'(x,y,k) \text{ is even}\end{cases}

4- For each quantization index QI(x,y,k), apply the dequantization process as follows:

w'(x,y,k) = \begin{cases}QI(x,y,k)\,Q(k) - \dfrac{Q(k) - 1}{2} & \text{if } 1 \le k \le 3L \\[2mm] QI(x,y,k)\,Q(k) + Mn & \text{if } k = 3L+1\end{cases}

where w'(x,y,k) is the reconstructed wavelet coefficient.
5- Apply the inverse wavelet transform to the reconstructed wavelet coefficients to reconstruct the image.
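The odd/even index mapping of encoding step 14 and its inverse in decoding step 3 are easily verified; the following small check is ours.

```python
# Encoding step 14 maps signed indices to a positive range (positive -> odd,
# negative -> even); decoding step 3 inverts the mapping exactly.
def to_positive(qi):
    return 2 * qi - 1 if qi > 0 else -2 * qi

def to_signed(qp):
    return (qp + 1) // 2 if qp % 2 == 1 else -(qp // 2)

for qi in (-3, -1, 0, 1, 4):
    assert to_signed(to_positive(qi)) == qi
```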
4. Results and Conclusions
The proposed method was applied separately to the luminance component and both chrominance components of a satellite color image with 24 b/p and size 256x256 pixels (see Figure 1). All parameters implied within this scheme were utilized as control parameters to investigate the compression performance of the proposed method. These parameters are the index of weighting type, the inclusion factor, and the number of layers. The effects of these parameters were investigated by considering several cases within the allowable range of their values, as shown in Tables 1 and 2, while Figure 2 illustrates the effect of the control parameters on the compression performance (including the quality of the produced luminance component and the produced compression ratio). Figure 3 presents the reconstructed RGB image for the YIQ color model. From the above results, some remarks related to the behavior and performance of the proposed compression method can be made:
A- The suitable value of the inclusion factor is 1.6 to attain a high compression ratio with acceptable image quality.
B- The index of weighting type 1 offers better compression performance in comparison with the other types, because with the other types the scaling factor increases, which in turn increases the bits allocated to each band and thus decreases the attended compression ratio.
C- The proposed method offers a compression performance of up to 20:1 with little noticeable effect on the image quality.
D- The experimental results show that the three-level decomposition scheme (i.e., 10 sub-bands) is a good practical choice.
E- An increase in the number of sub-band layers is necessary to attain a high compression ratio.
Table (1) The effects of the control parameters on the compression performance of the proposed method applied on the luminance component; number of sub-band layers = 3. Test color image: Satellite.

Index of weighting type   If    C.R.     PSNR (dB)
0                         1.0   16.118   19.029
1                         1.0   11.515   19.627
2                         1.0   10.735   19.335
0                         1.6   23.857   17.659
1                         1.6   14.758   19.408
2                         1.6   13.218   19.208
0                         2.0   28.349   17.476
1                         2.0   15.963   19.244
2                         2.0   14.288   19.101
Table (2) The effects of the control parameters on the compression performance of the proposed method applied on the luminance component; number of sub-band layers = 2. Test color image: Satellite.

Index of weighting type   If    C.R.     PSNR (dB)
0                         1.0   4.137    23.658
1                         1.0   10.684   19.670
2                         1.0   10.684   19.670
0                         1.6   14.595   17.272
1                         1.6   11.757   19.157
2                         1.6   11.757   19.157
0                         2.0   15.194   17.182
1                         2.0   12.206   19.078
2                         2.0   12.206   19.078
Fig. (1) The original colored image.
Fig. (2) The results of applying the suggested scheme on the luminance component:
(a) Number of layers = 3, index of weighting type = 1, inclusion factor = 1, C.R. = 11.515, PSNR = 19.627 dB.
(b) Number of layers = 2, index of weighting type = 1, inclusion factor = 1, C.R. = 10.684, PSNR = 19.670 dB.
(c) Number of layers = 3, index of weighting type = 1, inclusion factor = 1.6, C.R. = 14.758, PSNR = 19.408 dB.
(d) Number of layers = 2, index of weighting type = 1, inclusion factor = 1.6, C.R. = 11.757, PSNR = 19.157 dB.
References
[1] C. Yang, J. Lin, and W. Tsai, "Color Image Compression by Moment-preserving and Block Truncation Coding Techniques", IEEE Trans. Commun., vol. 45, no. 12, pp. 1513-1516, 1997.
[2] S. J. Sangwine and R. E. Horne, "The Colour Image Processing Handbook", Chapman & Hall, 1st Ed., 1998.
[3] M. Sonka, V. Hlavac, and R. Boyle, "Image Processing, Analysis and Machine Vision", Brooks/Cole Publishing Company, 2nd Ed., 1999.
[4] B. Olson and B. Pain, "Wavelet Transform Image Compression", 1998, http://www.comppub.com/publications/
[5] A. P. Beegan, L. R. Iyer, and A. E. Bell, "Wavelet-Based Color and Grayscale Image Compression Using Visual System Models", 2000, http://www.ecpe.vt.edu/fac…
[6] M. El-Sakka, "Advanced Topics in Image Compression", 2002, http://www.csd.uwo.ca/faculty/elsakka/…
[7] P. V. Sander, "Context-Based Color Image Compression", 2000, http://www.cs.harvard.edu/~pvs/chaanel/channel.pdf
[8] C. Paul, "A Fifteen Minutes Introduction of Wavelet Transform and Applications", 1998, http://www.glerl.noaa.gov/pers/liu/…
[9] W. Ren and L. Bo, "An Image Compression Algorithm Based on Hybrid Coding", Proceedings of ICCT, August 21-25, 2000, Beijing, China.
[10] D. Salomon, "Data Compression", 2nd Ed., Springer, New York, 2000.
[11] S. K. Alexander, "Two and Three Dimensional Coding Schemes for Wavelet and Fractal-Wavelet Image Compression", 2001, http://links.uwaterloo.ca/~simon
[12] J. M. Shapiro, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients", IEEE Trans. on Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.
Fig. (3) The reconstructed RGB image. (Number of layers = 3; index of weighting type of the luminance component = 1; index of weighting type of the chrominance components = 0; inclusion factor of the luminance component = 1.6; inclusion factor of the chrominance components = 2; C.R. = 24.728; PSNR = 20.585 dB.)
A Method for Digital Image Compression with IDP Decomposition Based on 2D-SOFM VQ
¹Noha A. Hikal, ²Roumen Kountchev
¹ Special Studies Academy, 6 Elnasr St., Abas Elaakad, Madent Nasr, Cairo
² FCTT, Radiocomm. Dept., Technical University of Sofia, Kl. Ohridsky 8, Sofia, Bulgaria
E-mails: [email protected]; [email protected]
Abstract
Neural networks are good alternatives for solving many complex problems. In this paper a newly developed algorithm for still image compression, based on 2D-SOFM neural networks in correspondence with the method of Inverse Difference Pyramid (IDP) decomposition, is presented. The new algorithm is well suited for Progressive Image Transmission (PIT). The method's advantages rely on the learning process and on the adaptation capability of the neural networks: they reduce the matrix computation complexity found in other methods, reduce the total number of pyramid levels required for PIT, and maximize the PSNR. In addition, no interpolation is needed for image reconstruction, which improves the quality of the reconstructed image.

Keywords: Inverse Difference Pyramidal Decomposition, SOFM Neural Network, Vector Quantization.

1. Introduction
Image compression is an important tool for storing and transmitting the visual information used in several applications. Image compression refers to a process in which the amount of data used to represent an image is reduced to meet a bit-rate requirement (below or at most equal to the maximum available bit rate), while the quality of the reconstructed image satisfies the requirement of the application and the complexity of the computation involved is affordable for the application. The Progressive Image Transmission (PIT) [1] concept is of particular importance when browsing large image files. The progressive transmission of an image permits an initial reconstruction of an approximation, followed by a gradual improvement of the quality of the reconstructed image. A coarse copy of the image is sent first to give the receiver an early impression of the image contents, and subsequent transmissions provide image details of progressively finer resolution. The observer may terminate the transmission of an image as soon as its contents are recognized. In order to send the image data progressively, it should be organized hierarchically in the order of importance, from the global image characteristics to the local details. There are two types of data structures for progressive transmission, depending on the encoding method employed [2]: (i) transform-based encoding and (ii) spatial encoding. In transform-based encoding, the image is first divided into a set of contiguous non-overlapping blocks, each block is transformed into a set of transform coefficients (e.g., by the Discrete Cosine Transform (DCT) [3]), and the coefficients are quantized before transmission is initiated. On the other hand, the spatial approach, like pyramidal encoding, generates a set of image frames at different resolutions; the image is successively reduced in spatial resolution and size by subsampling or averaging. An approximation of an image can be obtained using a single frame or a combination of frames of the image; therefore, sending a set of image frames in pyramid form from the top level to the bottom level naturally constitutes a progressive transmission.

2. Pyramidal Image Representation
The first pyramidal data structure is the Gaussian-Laplacian pyramid [4]. The Gaussian Pyramid (GP) can be viewed as a set of low-pass filtered copies of the original image. The Laplacian Pyramid (LP) is a sequence of error images, each being the difference between two successive levels of the Gaussian pyramid. Various pyramid data structures for progressive image transmission have been proposed [5]: the Mean Pyramid (MP), the Reduced Sum Pyramid (RSP), and the Reduced Difference Pyramid (RDP). The Least Squared Laplacian Pyramid (LSLP) was developed in [6] as a further refinement of the Laplacian pyramid. The LSLP is generated by adding an extra filter with an auxiliary coefficient sequence after the downsampling process; this filter works on minimizing the energy of the Laplacian coefficients. The Reduced Laplacian Pyramid (RLP) [7] is designed by discarding the reduction filter and adopting a half-band interpolator. In [8] a contrast pyramid coding technique, which differs in the way the difference image is computed, was discussed. Instead of using the difference image to
represent the information difference between two successive levels, the coding scheme consists of the generation of a contrast image with a simple nonlinear algorithm, and a simple compandor model is used. The efficiency of the contrast pyramid method results from the contrast image coding. The Centered pyramid [9] differs from the other pyramid methods in that each coarser-level node sits exactly in the center of its finer-level predecessors, which offers an accurate way of up-projection and makes it helpful in contour detection, multiscale detection, and object recognition. The Hierarchy Embedded Differential Image (HEDI) [10] is a technique similar to RDP but expanded and generalized to improve the speed of the compression and decompression processes. Many pyramidal decomposition techniques have shown improvements over the standard Joint Photographers Experts Group (JPEG) coding [11]. The Inverse Pyramidal Decomposition based on multiple Discrete Cosine Transforms (IDP/DCT) was proposed in [12]. The IDP decomposition [13] differs in the way the pyramid levels are obtained, and the word "inverse" refers to the requirement to compute the pyramid levels from the pyramid top (level zero) to the bottom. The novelty lies in the modeling performed at each pyramid level, which relies on the DCT of the input subimage. This pyramid can be compared with the subband DCT [14]: the IDP/DCT offers better performance in terms of compression ratio for a fixed PSNR at each pyramid level. The coding and decoding processes are simple and flexible; therefore, the method is well adapted to the needs of users, and it is relatively simple for real-time processing. Since the pyramid top consists of the low-frequency coefficients of the DCT, it is in correspondence with the requirements of PIT. Although the DCT is image independent, which may provide more simplicity than other methods, the number of coefficients obtained using the DCT is large compared with other transforms, which yields a smaller compression ratio than other methods. NNs are interesting alternatives to classical image processing and compression techniques due to their lower matrix computational complexity and adaptation, in addition to NN structural features such as a massively parallel structure, a high degree of interconnection, and the capability of learning and self-organization. In [15] an algorithm for image compression based on IDP using back-propagation NNs was introduced. The algorithm showed superiority over IDP/DCT in terms of PSNR, compression ratio, and the total number of levels required for reconstructing the image at the receiver side. In this work, a new technique for image compression based on merging IDP and 2-dimensional Self-Organizing Feature Map Vector Quantization [16] is considered. This technique combines the advantages of both IDP and NNs. The adaptation and learning capabilities of NNs improve the performance of IDP, decrease the number of pyramid levels, increase the quality of the reconstructed image, maximize the PSNR, and minimize the mean square error. This paper is organized as follows: Sections 3 and 4 introduce a definition of the vector quantization technique and a method for designing a VQ using a 2D-SOFM NN, respectively. Section 5 introduces the newly developed algorithm for image coding/decoding using the 2D-SOFM NN based on IDP. Sections 6 and 7 present the performance measurement criteria and the simulation results, respectively. Finally, Section 8 gives conclusions on the results of the newly developed algorithm.
3. Vector Quantizer with 2D-SOFM NN
Vector quantization is a lossy data compression method based on the principle of block coding. It can be viewed as a mapping Q from a K-dimensional vector space R^K onto a finite subset C of R^K:

Q: R^K \to C

where C = \{c_i \mid i = 1, 2, ..., M\} is the set of reproduction vectors and M is the number of vectors in C. Each c_i in C is called a codeword, and C is called the codebook of the VQ. As the average number of bits required to represent the codewords (the rate of the quantizer) increases, the quality of the reconstructed data increases as well. The average quantization error between the input source signals and their reproduction codewords defines the distortion of the vector quantizer. Increasing the number of codewords in the codebook can decrease the distortion of the vector quantizer, but normally it also increases the rate. The major concern in vector quantizer design is the trade-off between the distortion and the rate of the quantizer. The SOFM has been extensively applied to VQ to solve the main problem associated with classical VQ techniques, which are rather sensitive to codeword errors. Due to the SOFM's capability to form ordered topological feature maps (i.e., after training, the SOFM weight vectors are spatially ordered in an array, and neighboring vectors in the map are more similar than more remote ones), it results in an optimal codebook and partition design. The SOFM NN [16] is characterized as an unsupervised learning algorithm in which the output network neurons compete among themselves to be activated, so that only one output neuron, or one neuron per layer, wins the competition. The output neurons that win the competition are called "winner-take-all" neurons. Lateral inhibitory connections between neurons are used for inducing the winner neurons. In a 2D-SOFM, the neurons are placed at the nodes of a lattice; the lattice may take different shapes: a rectangular grid, hexagonal, or even a random topology. The neurons become selectively tuned to various input patterns in the course of the competitive learning process. The locations of the neurons so tuned (i.e., the winning neurons) tend to become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice.
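As a minimal illustration of the mapping Q (the function and variable names below are ours), a full-search quantizer simply returns the index of the nearest codeword:

```python
import numpy as np

# Full-search VQ: map vector x to the index of its nearest codeword; only
# the index is transmitted, and the decoder looks it up in the codebook C.
def quantize(x, codebook):
    distortion = np.sum((codebook - x) ** 2, axis=1)  # squared distances
    return int(np.argmin(distortion))

C = np.random.rand(32, 16)      # M = 32 codewords of dimension K = 16
x = np.random.rand(16)
i = quantize(x, C)              # transmitted index
x_hat = C[i]                    # reproduction vector c_i at the receiver
```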
4. Procedure for Designing a Vector Quantizer based on a 2D-SOFM NN
Given a 2D input image pattern to be mapped onto a 2D spatial organization of neurons located at different positions (i,j) on a hexagonal lattice of size n x n, there are n^2 neurons N_ij, 1 \le i, j \le n, and for each neuron N_ij there is an associated weight vector denoted W_ij. In the SOFM, the neuron whose weight vector W_ij has minimum distance to the input vector X is the winner neuron, and it is identified using the following criterion. Find the winner neuron (k,l), where

\| X - W_{kl} \| = \min_{1 \le i \le n} \min_{1 \le j \le n} \| X - W_{ij} \|   (1)

After the position of the winner neuron is located in the 2D plane, the winner neuron and its neighborhood neurons are adjusted using the SOM learning rule:

W_{ij}(t+1) = W_{ij}(t) + \alpha \,[X - W_{ij}(t)]   (2)

where \alpha is Kohonen's learning rate, which controls the stability and the rate of convergence. The winner weight vector reaches equilibrium when W_ij(t+1) = W_ij(t). The neighborhood of neuron N_ij is chosen arbitrarily; it can be a square or a circular zone around N_ij of arbitrarily chosen radius. Algorithm 1 summarizes the training procedure [17]. After repetition of the selection of winner neurons followed by weight adaptation of the winner neuron and its neighborhood, the system reaches equilibrium with all weight vectors constant. Now a 2D plane representing a spatial mapping of the neurons (corresponding to the input pattern) is obtained. All input patterns are mapped onto the 2D plane, each having its own weight vector W_ij. The indices of the winner neurons are then transmitted. At the receiver side, the codebook contains all the indices of the existing neurons. Once a search for the winner-neuron indices is done, the receiver is able to reconstruct the input pattern.

    Begin
      Repeat
        For i = 1:n
          For j = 1:n
            Begin
              Dist = min_{1<=i<=n} min_{1<=j<=n} ||X - W_ij||
              if ||X - W_kl|| = Dist
              Then adjust the weights of the neighborhood neurons of N_kl as
                For i' = k-δ : k+δ, j' = l-δ : l+δ
                  W_i'j'(t+1) = W_i'j'(t) + α [X - W_i'j'(t)]
                End for
                δ = δ * H   /* δ is the neighborhood radius, H a shrinkage factor */
            End
          End for
        End for
      until δ <= a pre-assigned quantity
    End

Algorithm 1: 2D-SOFM training loop

Figure 1. Architecture of the SOM network (input vector, weights, bias, and competitive layer of neurons N_kl).

5. Algorithm for 2D-SOFM VQ Image Compression based on IDP
The basic steps involved in image data compression/decompression in accordance with IDP 2D-SOFM VQ are introduced in the following two subsections.

A. Coding process
Step 1: For the first pyramid level (p=0), the whole image B(i,j) of size 2^N x 2^N is divided into blocks, each of size 2^n x 2^n pixels, n < N.
Step 2: Each image block is applied to the network as an input pattern; with each neuron N_i there is an associated weight vector W_i = [w_{i1} w_{i2} ... w_{i2^{2n}}] (Figure 1).
Step 3: The weight vectors of all the neurons in the lattice are initialized with small random values.
Step 4: The learning input patterns are applied to the network, considering the image blocks themselves. Kohonen's competitive learning process (Algorithm 1) tends to identify the winning neurons that best match the input blocks. The best-matching criterion described here is equivalent to the minimum Euclidean distance between the vectors. Hence, the mapping process Q that identifies the neuron best matching the input block X is determined by applying the following condition:

Q(X) = \arg\min_i \| X - W_i \|, \quad i = 1, 2, ..., M   (3)

Step 5: At equilibrium, there are m winner neurons per block, or m codewords per block. Hence, there are (m x number of blocks) winner neurons (codewords) for the whole image, together forming the coefficients of the first pyramid level (p=0).
Step 6: The indices of the obtained codewords are then quantized, encoded by RLE and adaptive Huffman encoding methods [18], and the obtained symbols are transmitted.
Step 7: At the transmitter side, the reconstructed image blocks, of the same size as the original ones, are restored from the indices of the codewords (after the decoding process) using the codebook generated before by the 2D-SOFM learning process. The pixel-by-pixel difference is calculated between the original image and the restored image blocks to obtain a difference image E1 of the same size as the original one, 2^N x 2^N.
Step 8: For the second pyramid level (p=1), starting with the obtained difference image E1, each block is divided into four blocks, each of size (m/2) x (m/2), and the procedure starts again from Step 2 with the new dimensions. The obtained (m/2) codewords per block, multiplied by the number of blocks in the second pyramid level, constitute the total number of coefficients for the second pyramid level.
Step 9: The procedure continues in the same manner for each new level, starting with the computation of the difference image E_p followed by dividing each block into four sub-blocks and processing each individually by the 2D-SOM NN (Figure 2). The stopping criterion here is when the obtained mean square error between two subsequent levels becomes as small as required.
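Before turning to decoding, the following compact sketch shows Algorithm 1 in Python; the lattice size, learning rate, and shrinkage factor are assumptions of ours, and a square neighbourhood on a square lattice is used for brevity instead of the paper's hexagonal one.

```python
import numpy as np

# A minimal sketch of Algorithm 1: Kohonen training with a shrinking
# square neighbourhood around the winner neuron N_kl.
def train_sofm(patterns, n=6, alpha=0.3, H=0.9, min_delta=0.5):
    dim = patterns.shape[1]
    W = np.random.rand(n, n, dim)             # weight vector per lattice node
    delta = n // 2                            # initial neighbourhood radius
    while delta >= min_delta:                 # repeat until radius is small
        for x in patterns:
            d = np.sum((W - x) ** 2, axis=2)  # distances to all neurons
            k, l = np.unravel_index(np.argmin(d), d.shape)  # winner N_kl
            i0, i1 = max(0, k - int(delta)), min(n, k + int(delta) + 1)
            j0, j1 = max(0, l - int(delta)), min(n, l + int(delta) + 1)
            W[i0:i1, j0:j1] += alpha * (x - W[i0:i1, j0:j1])  # Eq. (2)
        delta *= H                            # shrink the neighbourhood
    return W.reshape(n * n, dim)              # ordered codebook

blocks = np.random.rand(200, 16)              # e.g. 4x4 image blocks
codebook = train_sofm(blocks)
```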
B. Decoding process
Step 1: The received symbols are decoded and de-quantized in order to obtain the transmitted codeword index values.
Step 2: The original image is reconstructed at the receiver side in an accumulative way, as illustrated in Figure 2, based on two-dimensional vector quantization with a self-organizing neural network (2D-VQ SOFM NN). The final reconstructed image [\hat{B}(2^n)] from all (n+1) levels is defined by the relation

[\hat{B}(2^n)] = [\tilde{B}_0(2^n)] + \sum_{p=1}^{n-1} [\tilde{E}_{p-1}(2^n)] + [E_{n-1}(2^n)]   (4)

The decomposition components in Eq. (4) are matrices, defined as follows:

- For the level p = 0 the matrix [\tilde{B}_0(2^n)] is a zero approximation of [B(2^n)], defined by

[\tilde{B}_0(2^n)] = VQ_{2D}^{-1}\{VQ_{2D}\{[B(2^n)]\}\}   (5)

where VQ_{2D}\{\cdot\} and VQ_{2D}^{-1}\{\cdot\} are, respectively, the operators for direct and inverse two-dimensional vector quantization (VQ2D) based on the SOFM NN. The latter is trained using a sequence of blocks [B_q(2^n)] for q = 1, 2, ..., Q, where Q = 4^{N-n} is the number of blocks in the image for pyramid level p = 0.

- For the levels p = 1, 2, ..., n-1 the matrix [\tilde{E}_{p-1}(2^n)] is an approximation of [E_{p-1}(2^n)], as follows:

[\tilde{E}_{p-1}(2^n)] =
\begin{bmatrix}
[\tilde{E}_{p-1}^{1}(2^{n-p})] & \cdots & [\tilde{E}_{p-1}^{2^p}(2^{n-p})] \\
\vdots & \ddots & \vdots \\
[\tilde{E}_{p-1}^{4^p-2^p+1}(2^{n-p})] & \cdots & [\tilde{E}_{p-1}^{4^p}(2^{n-p})]
\end{bmatrix}   (6)

Here [\tilde{E}_{p-1}^{k_p}(2^{n-p})] for k_p = 1, 2, ..., 4^p are the sub-matrices of the matrix [\tilde{E}_{p-1}(2^n)], obtained as a result of its quad-tree division into 4^p sub-blocks. Each sub-matrix [\tilde{E}_{p-1}^{k_p}(2^{n-p})] is of size 2^{n-p} x 2^{n-p} and is defined in a similar way to Eq. (5):

[\tilde{E}_{p-1}^{k_p}(2^{n-p})] = VQ_{2D}^{-1}\{VQ_{2D}\{[E_{p-1}^{k_p}(2^{n-p})]\}\}   (7)

where [E_{p-1}^{k_p}(2^{n-p})] is a difference sub-matrix, defined by the relation

[E_{p-1}^{k_p}(2^{n-p})] =
\begin{cases}
[B(2^n)] - [\tilde{B}_0(2^n)] & \text{for } p = 1, \\
[E_{p-2}^{k_{p-1}}(2^{n-p})] - [\tilde{E}_{p-2}^{k_{p-1}}(2^{n-p})] & \text{for } p = 2, 3, ..., n-1.
\end{cases}   (8)

The codebooks for the VQ of the sub-matrices [E_{p-1}^{k_p}(2^{n-p})] are obtained by training the corresponding SOFM networks with the sequences [E_{p-1,q}^{k_p}(2^{n-p})] for q = 1, 2, ..., Q_p, where Q_p = 4^{N-n+p} is the number of sub-matrices in pyramid level p.

- For the level p = n the matrix [E_{n-1}(2^n)] of the residual image is defined by

[E_{n-1}(2^n)] = [E_{n-2}(2^n)] - [\tilde{E}_{n-2}(2^n)]   (9)

For the case when the decomposition in Eq. (4) is truncated at level r, the reconstructed image is obtained as

[\hat{B}_r(2^n)] = [\tilde{B}_0(2^n)] + \sum_{p=1}^{r} [\tilde{E}_{p-1}(2^n)] \quad \text{for } r = 1, 2, ..., n-2,   (10)

where [\hat{B}_r(2^n)] is a matrix which is a block of the corresponding restored image [\hat{B}_r(2^N)].

Fig. 2. Scheme of the 3-level IDP-SOFM NN decomposition (each level p quantizes and encodes its input B(i,j) or E_p(i,j) with a SOFM NN, transmits the result, and passes the difference image to the next level).

Fig. 3. Generalized scheme of the IDP-SOFM NN: encoding and decoding.

6. Evaluation of the Method Efficiency
The quality of the restored image [\hat{B}_r(2^N)] for an r-level decomposition can be evaluated using the peak signal-to-noise ratio (PSNR) in dB, as follows:

PSNR(r) = 10 \lg [B_{max}^2 / \varepsilon^2(r)], \quad r = 0, 1, ..., n-1,   (11)

where B_{max} is the maximum level of every pixel of the original image [B(2^N)] and \varepsilon^2(r) is the mean square error of the restored image [\hat{B}_r(2^N)]:

\varepsilon^2(r) = \frac{1}{4^n Q}\sum_{q=1}^{Q}\sum_{i=1}^{2^n}\sum_{j=1}^{2^n} \varepsilon_{r,q}^2(i,j) = \frac{1}{4^n Q}\sum_{q=1}^{Q}\sum_{i=1}^{2^n}\sum_{j=1}^{2^n} [B_q(i,j) - \hat{B}_{r,q}(i,j)]^2   (12)

Here \varepsilon_{r,q}(i,j) is the (i,j)-th element of the error matrix for the block q = 1, 2, ..., Q:

[\varepsilon_{r,q}(2^n)] = [B_q(2^n)] - [\hat{B}_{r,q}(2^n)] = \sum_{p=r+1}^{n-1} [E_{p-1,q}(2^n)] + [E_{n-1,q}(2^n)]   (13)

The compression ratio at a certain level r is defined by the relation

CR(r) = 4^N b / N_c(r)   (14)

where b is the number of bits used to represent every pixel of the original image matrix [B(2^N)] and N_c(r) is the total number of transmitted bits at level r, computed as

N_c(r) = Q_r \log_2 N_{cb} + N_{cb}\, 4^{n-r}\, b,   (15)

where Q_r = 4^{N-n+r} is the number of blocks in level r, N_{cb} is the number of codevectors in the codebook for each level, 4^{n-r} is the number of components in one codevector for level r, and b is the number of bits for one component of a codevector. The total compression ratio over the whole number of levels, following Eq. (15), is

CR(r) = 4^N b \Big/ \sum_{i=0}^{r} N_c(i)   (16)
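A short sketch of the efficiency measures follows; it is our reading of Eqs. (11)-(16), with B_max taken as 2^b - 1 and the per-level codebook bits of Eq. (15) included in the transmitted total.

```python
import numpy as np

# Quality and rate measures of Eqs. (11)-(16). b is bits per pixel of the
# original, N and n define the image and block sizes (2^N and 2^n), and
# Ncb is the number of codevectors per level.
def psnr(original, restored, b=8):
    eps2 = np.mean((original.astype(float) - restored.astype(float)) ** 2)
    return 10 * np.log10((2 ** b - 1) ** 2 / eps2)       # Eqs. (11)-(12)

def bits_at_level(i, N, n, b, Ncb):
    Qi = 4 ** (N - n + i)                                # blocks in level i
    return Qi * np.log2(Ncb) + Ncb * 4 ** (n - i) * b    # Eq. (15)

def compression_ratio(r, N, n, b, Ncb):
    total = sum(bits_at_level(i, N, n, b, Ncb) for i in range(r + 1))
    return (4 ** N) * b / total                          # Eq. (16)
```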
7. Simulation Results
A MATLAB Version 6.5 computer simulation program was designed to simulate IDP/2D-SOFM VQ image compression. Experiments have been performed on grayscale as well as color images. For color images, the described algorithm can be applied to the matrix of every primary color component: R, G, B. In order to obtain a higher compression ratio, the R, G, B components of every pixel (i,j) were transformed into the Y, Cr, Cb format. A codebook consisting of 32 codewords arranged over a 2D hexagonal lattice was designed. The results for PSNR, compression ratio, and number of levels for a number of test images were recorded and compared. Fig. 4 shows four consecutive pyramid levels for "Lena"; Fig. 5 presents the first and last levels for different test images. The PSNR and bit rate were computed using relations (11)-(15). Comparative results between IDP/DCT, IDP/BPNN [15], and IDP/2D-SOFM NN for PSNR, number of levels, and bit rate are shown in Table 1. The superiority of the newly developed algorithm in terms of PSNR and bpp can be noticed. Table 2 shows comparative results of the developed algorithm against the standard JPEG2000 for the same compression ratio (the JPEG results were obtained with the Lura Smart Compress software); the results obtained from the developed algorithm can be further compressed by applying encoding techniques. The total running time was 02:07:84, but for the resized "Flowers" image of 128x128 pixels the total time was 00:56:22, including the training phase (the training phase has to be done the first time only; after that, the codebook for each image can be saved and used directly for compression without training).

Fig. 4. Qualitative results obtained for the original "Lena" using 2D-SOFM VQ: p=0, PSNR=23.2 dB, 0.016 bpp; p=1, PSNR=27 dB, 0.009 bpp; p=2, PSNR=31.4 dB, 0.009 bpp; p=3, PSNR=43.8 dB, 0.02 bpp.

Fig. 5. Results obtained for the original images "Peppers", "Text" and "Flowers" using 2D-SOFM VQ: "Peppers" p=0, PSNR=21.3 dB, 0.016 bpp and p=3, PSNR=38.3 dB, 0.02 bpp; "Text" p=0, PSNR=20.3 dB, 0.016 bpp and p=4, PSNR=55 dB, 0.02 bpp; "Flowers" p=0, PSNR=21.2 dB, 0.126 bpp and p=3, PSNR=34 dB, 0.2313 bpp.
Table 1: Modeling results for IDP-DCT compared with IDP-BPNN and 2D-SOFM VQ, testing the original "Pepper" 512x512 pixels.

Comparison criteria                        IDP-DCT   IDP-BPNN                          2D-SOFM NN VQ
Total number of levels required            9         2                                 4
Transformation                             DCT       BP learning                       2D-SOFM
Number of transmitted coefficients/level   4/block   square root(block size)/block     indices of the codevectors/block
PSNR at last level [dB]                    31.06     36.13                             37.3
bpp at last level                          0.52      0.125                             0.02

Table 2: Comparison between the standard JPEG2000 and 2D-SOFM VQ for the same compression ratio.

Test image   Compression ratio (K)   CR (bpp)   JPEG2000 PSNR (dB)   2D-SOFM PSNR (dB)
Lena         30.73                   0.26       32.30                25.39
Barbra       30.73                   0.26       28.9                 26.68
Peppers      30.73                   0.26       34.84                37.3
Text         30.73                   0.26       18.17                23.2
Circles      8.3                     0.96       55.33                56.3
Crosses      8.3                     0.96       50.83                55.2
Flowers      47.88                   0.50       23.17                16.2

8. Conclusion
Compared with other well-known pyramidal decompositions (for example RLP, RDP, LSLP, etc.), the basic advantage of the IDP is its lower computational complexity when the image data is transferred consecutively in correspondence with the pyramid-level processing. This approach permits obtaining a coarse image approximation first, which becomes better and better with every consecutive pyramid level. The inverse pyramid decomposition permits optimizing the basic parameters in order to find the necessary trade-off between a high compression ratio and the restored image quality. This optimization is obtained relatively easily using VQ with the 2D-SOFM NN, applied to the sub-matrices in every level of the IDP decomposition. The basic disadvantage of the IDP decomposition based on 2D-SOFM VQ is the necessity to train the VQ NN, which requires some time. This disadvantage can be overcome to a significant degree with a proper selection of the implementation of the calculation algorithm, for example with recursive data processing in the different pyramid levels, with parallel data processing, etc. It is also possible to perform the NN training once for the images of a certain class, and to retrain the NN only when the image class changes. The noise resulting from the VQ decreases the quality of the restored image, which is seen in the results obtained with the method modeling. In order to decrease this noise it is possible to apply adaptive filtration of the decoded image, which will be one of the aims of future research work. The basic parameters of the new algorithm, which define the compression ratio and the restored image quality, are the number of pyramid levels, the number of neurons in the VQ codebooks, the size of the image sub-blocks, and the error obtained as a result of the NN training. Depending on the application of the method, these parameters are selected so as to ensure the necessary quality and compression ratio.
The basic contributions of this work are the development of the new method for image compression, described by Eqs. (4)-(10), and the relations (11)-(16) used for the evaluation of the coding efficiency, together with the comparative results obtained by modeling the method. These results show the potential for future improvement and development of the method.

9. Acknowledgements
We would like to acknowledge the financial support of the NSF of Bulgaria (BU-MI-104) and the Special Studies Academy at Cairo.

10. References
[1] K. Tzou, "Progressive Image Transmission: A Review and Comparison of Techniques", Opt. Eng., vol. 26, July 1987, pp. 581-589.
[2] A. Poularikas, "The Transforms and Applications Handbook", CRC Press, 2000.
[3] K. Rao, P. Yip, "The Transform and Data Compression Handbook", CRC Press, 2000.
[4] P. Burt, E. Adelson, "The Laplacian Pyramid as a Compact Image Code", IEEE Trans. Commun., vol. 31, no. 4, April 1983, pp. 532-540.
[5] L. Wang, M. Goldberg, "Comparative Performance of Pyramid Data Structures for Progressive Image Transmission", IEEE Trans. Commun., vol. 39, no. 4, April 1991, pp. 540-548.
[6] M. Unser, "An Improved Least Squares Laplacian Pyramid for Image Compression", Signal Processing, 27, 1992, pp. 187-203.
[7] B. Aiazzi et al., "A Reduced Laplacian Pyramid for Lossless and Progressive Image Communication", IEEE Trans. Commun., vol. 44, no. 1, 1996, pp. 18-22.
[8] Tian Hu Yu, "Novel Contrast Pyramid Coding of Images", Proceedings of ICIP 1995.
[9] P. Brigger et al., "Centered Pyramids", IEEE Trans. on Image Processing, vol. 8, no. 9, Sep. 1999, pp. 1254-1264.
[10] W. Kim et al., "Hierarchy Embedded Differential Image for Progressive Transmission using Lossless Compression", IEEE Trans. Circuits and Systems for Video Technology, vol. 5, no. 1, Feb. 1995, pp. 2-13.
[11] W. Kou, "Image Compression Algorithms and Standards", Kluwer Academic Publishers, Boston, 1995.
[12] R. Kountchev et al., "Inverse Pyramidal Decomposition with Multiple DCT", Signal Processing: Image Communication, Elsevier, 17, 2002, pp. 201-218.
[13] R. Kountchev et al., "Lossless Image Compression with IDP and Adaptive RLC", Proc. of Intern. Conf. on Imaging Science, Systems and Technology, Las Vegas, USA, June 2004, pp. 608-612.
[14] Y. Wu et al., "Low Bit Rate Subband DCT Image Compression", IEEE Trans. Consumer Electronics, 43 (2), May 1997, pp. 134-140.
[15] N. A. Hikal, R. Kountchev, "BPNN for Image Compression based on IDP Decomposition", Proceedings of ICEST, June 2005, pp. 15-18.
[16] Y. H. Hu, J. N. Hwang, "Handbook of Neural Networks and Signal Processing", CRC Press, 2002.
[17] MATLAB version 7, Neural Networks Toolbox.
[18] D. Salomon, "Data Compression: The Complete Reference", 3rd Ed., Springer-Verlag, New York, 2004.
A New Image Compression and Decompression Technique based on 8x8 MRT
M. S. Anish Kumar, Rajesh Cherian Roy, R. Gopikakumari
Division of Electronics Engg, School of Engineering, Cochin University of Science and Technology, Kochi-22, Kerala, India
[anishdoe, rajeshcroy, gopika]@cusat.ac.in
http://www.cusat.ac.in/
Abstract
Data compression is important in the storage and transmission of information. Digital image compression is especially important due to the high storage and transmission requirements. Various compression techniques have been proposed in recent years to achieve good compression. This paper presents a new image compression technique based on the MRT [1].

Keywords: MRT, image compression

1. Introduction
A picture may be worth a thousand words, but it requires far more memory, bandwidth, and transmission time. Present technology demands the transmission of images with small size, reduced bandwidth, and high speed, so we need to compress the image with the maximum compression ratio and the best quality. Transform coding using the 2-D block discrete cosine transform (DCT) is a proven method for image compression and is widely used by both the academic and industrial image processing communities [2]-[3]. In transform coders [4], the image to be compressed is subjected to an invertible transform, with the aim of converting the statistically dependent image coefficients into independent coefficients by redundancy reduction. Further compression is achieved through irrelevancy reduction [5]: data that are considered irrelevant and do not convey significant information about the image are discarded. The computation time is an important factor in choosing an image compression/decompression technique, which in turn depends on the selection of the transform. In this work, redundancy reduction is achieved through the transform named MRT, which involves only real additions, and irrelevancy reduction is achieved by applying a threshold.

2. MRT
Let the elements of a data matrix be X_{n_1,n_2}, 0 \le n_1, n_2 \le N-1, and let the MRT coefficients be Y_{k_1,k_2}^{(p)}, 0 \le k_1, k_2 \le N-1 and 0 \le p \le M-1, M = N/2, where

Y_{k_1,k_2}^{(p)} = \sum_{(n_1,n_2):\, z = p} X_{n_1,n_2} \;-\; \sum_{(n_1,n_2):\, z = p+M} X_{n_1,n_2}   (1)

z = ((n_1 k_1 + n_2 k_2))_N   (2)

where ((\cdot))_N denotes the modulo-N operation. Equation (1) maps the N x N data matrix into M matrices of size N x N in the frequency domain using real additions only. The inverse transform relation is as follows:

X_{n_1,n_2} = \frac{1}{N^2} \sum_{q=0}^{N-1} X_{n_1,n_2}^{(q)}, \quad 0 \le n_1, n_2 \le N-1   (3)

where

X_{n_1,n_2}^{(q)} = \sum_{(k_1,k_2,p):\, j = q} Y_{k_1,k_2}^{(p)} \;-\; \sum_{(k_1,k_2,p):\, j = q+M} Y_{k_1,k_2}^{(p)}   (4)

j = ((\,((-((n_1 k_1 + n_2 k_2))_N))_N + p\,))_N   (5)
On analysis of the MRT matrices it is found that a majority of its coefficients are redundant. The MRT of an 8x8 data block will have 256 MRT coefficients of which only 64 are unique and have fixed position. Thus only 64 unique coefficients are required for reconstruction of the original data. This property of MRT is exploited in image compression. Both MRT and MRT based Image compression are evolving subjects.
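A direct (unoptimized) sketch of Eqs. (1)-(2) is given below; it computes all M·N^2 coefficients of a block using real additions only, and the function name is ours.

```python
import numpy as np

# Direct evaluation of Eqs. (1)-(2): each MRT coefficient is the sum of the
# pixels whose index z equals p minus the sum of those with z = p + M.
def mrt(X):
    N = X.shape[0]
    M = N // 2
    n1, n2 = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    Y = np.zeros((M, N, N))
    for p in range(M):
        for k1 in range(N):
            for k2 in range(N):
                z = (n1 * k1 + n2 * k2) % N                   # Eq. (2)
                Y[p, k1, k2] = X[z == p].sum() - X[z == p + M].sum()  # Eq. (1)
    return Y

# An 8x8 block yields M = 4 matrices of size 8x8, i.e. 256 coefficients.
Y = mrt(np.arange(64, dtype=float).reshape(8, 8))
```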
3. MRT Based Image Compression
The MRT based image compression utilizes the 64 unique MRT coefficients to represent each 8x8 block of the image for encoding. Certain coefficients among the unique coefficients are irrelevant for reproducing the original image block within some acceptable error [5]. This irrelevancy is removed by applying a threshold to the unique coefficients. The value of the threshold can be chosen as a compromise between compression and quality. The coefficients that exceed the threshold in each block, and their corresponding positions in the MRT matrices, are stored in separate linear arrays and encoded using arithmetic coding. The steps involved in the compression technique are shown in Figure 1.

Fig. 1. Block diagram of a standard transform coder.

In the decompression process, both the position vector and the coefficient vector are decoded using arithmetic decoding. The MRT matrices are formed for each block by placing the unique coefficients from the coefficient vector with reference to the positions in the position vector. The inverse MRT is applied, and the output image block is placed in the corresponding position. This process is repeated for all the blocks. A sketch of the thresholding step is given below.
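The sketch below illustrates that thresholding step under our own naming; arithmetic coding of the two arrays is omitted.

```python
import numpy as np

# Irrelevancy reduction: keep only the unique MRT coefficients whose
# magnitude exceeds the threshold T, storing values and positions in
# separate linear arrays (these would then be arithmetic coded).
def threshold_block(unique_coeffs, T):
    flat = unique_coeffs.ravel()
    positions = np.nonzero(np.abs(flat) > T)[0]   # position vector
    values = flat[positions]                      # coefficient vector
    return positions, values

coeffs = np.random.randn(64) * 200                # 64 unique coefficients
pos, val = threshold_block(coeffs, T=150)
```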
4. Simulation Results
The simulation was performed for different types of images with a fixed threshold (Table 1) and on the Lenna image with a variable threshold (Table 2 and Figures 3 to 6), to analyse their effect on bits per pixel (bpp), peak signal-to-noise ratio (PSNR), and percentage compression. Figure 2(a) shows the original Lenna image and Figure 2(b) its reconstructed image with bpp 1.07 and PSNR 27.88 dB. Table 1 and Figure 2 show that the MRT based compression technique gives good quality images with very high percentage compression for a variety of images.

Fig. 2. Lenna: (a) original and (b) reconstructed.

Table 1

Image        Threshold   bpp    PSNR    % Compression
Lenna        150         1.07   27.88   86.60
Cameraman    150         0.91   28.91   88.57
Ic           150         0.78   30.14   90.26
Eight        150         0.52   30.90   93.46
Bacteria     150         0.73   30.23   90.91
Tire         150         0.90   28.90   88.78

Table 2

Threshold   % compression   bpp    PSNR (dB)
100         79.54           1.64   30.22
150         86.60           1.07   27.88
200         89.97           0.80   26.29
250         92.39           0.61   25.14
300         93.65           0.51   24.31
350         94.77           0.42   23.59
400         95.48           0.36   22.98

Fig. 3. bpp versus threshold.
Fig. 4. Percentage compression versus threshold.
Fig. 5. PSNR (dB) versus threshold.
Table 2 and Figures 3 to 5 show that bpp, PSNR, and percentage compression depend on the threshold, which can be chosen depending on whether quality or compression is more important. As the threshold increases, PSNR and bpp decrease and compression increases. Figure 6 illustrates the variation of bits per pixel as a function of PSNR.
Fig. 6. PSNR (dB) versus bpp.

A comparison between the DCT and MRT based image compression techniques is presented in Table 3. The DCT based image compression is implemented as in [4]. The table illustrates that the MRT based image compression technique gives better compression with better PSNR for nearly equal or lower bpp than the traditional DCT based approach. The MRT based approach has the advantage over the DCT based one of allowing a compromise between quality and percentage compression by varying the threshold. The MRT based image compression technique proposed here takes more computation time than the DCT based approach. The computational overhead can be reduced by computing only the 64 unique coefficients instead of all the 256 MRT coefficients corresponding to each 8x8 data block.

Table 3

Image        Threshold   bpp (DCT / MRT)   % comp. (DCT / MRT)   PSNR dB (DCT / MRT)
Lenna        195         0.84 / 0.82       89.45 / 89.74         27.81 / 26.40
Cameraman    200         0.73 / 0.70       90.87 / 91.22         26.61 / 27.34
Ic           180         0.71 / 0.698      91.09 / 91.27         27.05 / 29.41
Eight        150         0.53 / 0.52       93.39 / 93.46         30.87 / 30.90
Bacteria     170         0.69 / 0.68       91.39 / 91.49         31.28 / 29.72
Tire         170         0.83 / 0.82       89.64 / 89.78         31.77 / 28.33

5. Conclusions
The paper demonstrates the potential of the MRT based image compression technique. The advantage of this method is the flexibility in determining the percentage compression, at the expense of image quality, by choosing an appropriate threshold. This work motivates many more investigations into image compression based on the MRT.

6. Acknowledgements
The authors wish to thank the Indian Space Research Organization (ISRO) and the Kerala State Council for Science, Technology and Environment (KSCSTE) for providing support for this work.

7. References
[1] Rajesh Cherian Roy and R. Gopikakumari, "A new transform for 2-D signal representation (MRT) and some of its properties", 2004 IEEE International Conference on Signal Processing & Communications (SPCOM), pp. 363-367, Dec. 2004, Bangalore, India.
[2] "CCITT SG XV working party XV/4, Specialist group on coding of visual telephony, Description of reference model 8 (RM8)", Doc. 525, June 1989.
[3] K. Rose, A. Heinman, and I. Dinstein, "DCT/DST alternate transform image coding", IEEE Trans. Commun., vol. 38, pp. 94-101, Jan. 1990.
[4] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Reading, MA: Addison-Wesley, 1992.
[5] H. Musmann, P. Pirsch, and H. Grallert, "Advances in picture coding", Proc. IEEE, pp. 523-542, Apr. 1982.

M S Anish Kumar received the M.Sc. degree in Electronics from the Cochin University of Science and Technology, Kochi, Kerala, India, in 2003. He currently holds a junior research fellowship of the Kerala State Council for Science, Technology and Environment (KSCSTE) and is pursuing a PhD degree at the Cochin University of Science and Technology. His research interests include image compression and image filtering.

Rajesh Cherian Roy received the M.Tech. degree in Microwave Electronics from the Cochin University of Science and Technology in 2001. He is currently working as a lecturer at the Rajagiri School of Engineering & Technology, Kochi, Kerala, India, and pursuing a PhD degree as a part-time research scholar at the Cochin University of Science and Technology. His research interests include image compression.

R. Gopikakumari received the B.Sc. (Engg.) degree from Kerala University and the M.Tech. and PhD degrees from the Cochin University of Science & Technology in 1984, 1987, and 1999, respectively. She has been working at the Cochin University of Science & Technology since 1988 and is currently the Head of the Division of Electronics Engineering. Her fields of interest are digital signal processing, image processing, and neural networks.
IMAGE COMPRESSION USING CONTOURLET TRANSFORM AND MULTISTAGE VECTOR QUANTIZATION
S. Esakkirajan¹, T. Veerakumar², V. Senthil Murugan³, R. Sudhakar⁴
¹,³ Department of Electrical and Electronics Engineering, PSG College of Technology
²,⁴ Department of Electronics and Communication Engineering, PSG College of Technology
Peelamedu, Coimbatore-641 004, Tamilnadu, India
E-mails: [email protected], [email protected], [email protected], [email protected]
Abstract
This paper presents a new coding technique based on the contourlet transform and multistage vector quantization. Wavelet-based algorithms for image compression result in high compression ratios compared to other compression techniques. Wavelets have shown their ability in representing natural images that contain smooth areas separated by edges. However, wavelets cannot efficiently take advantage of the fact that the edges usually found in natural images are smooth curves. This issue is addressed by directional transforms, known as contourlets, which have the property of preserving edges. The contourlet transform is a new extension of the wavelet transform in two dimensions using non-separable and directional filter banks. The computation and storage requirements are the major difficulty in implementing a vector quantizer. In the full-search algorithm, the computation and storage complexity is an exponential function of the number of bits used in quantizing each frame of spectral information. The storage requirement in multistage vector quantization is smaller than in full-search vector quantization. The coefficients of the contourlet transform are quantized by multistage vector quantization, and the quantized coefficients are encoded by Huffman coding to obtain better quality, i.e., a high peak signal-to-noise ratio (PSNR). The results obtained are tabulated and compared with the existing wavelet-based ones.

Keywords: Contourlet Transform, Directional Filter Bank, Laplacian Pyramid, Multistage Vector Quantization.

1. Introduction
A fundamental goal of image compression [1] is to reduce the bit rate for transmission or data storage while maintaining an acceptable fidelity or image quality. Image compression is essential for applications such as TV transmission, video conferencing, facsimile transmission of printed material, graphics images, fingerprints, and drawings. Compression can be achieved by transforming the data, projecting it on a basis of functions, and then encoding the transform. In this paper, we examine the design of an image coder integrating the contourlet transform [2] with Multistage Vector Quantization (MSVQ) [3]. Vector quantization (VQ) is a quantization technique [4] applied to an ordered set of symbols. The superiority of VQ lies in the block coding gain, the flexibility in partitioning the vector space, and the ability to exploit intra-vector correlations. Multistage VQ divides the encoding task into several stages. The first stage performs a relatively crude encoding of the input vector using a small codebook. Then, the second-stage quantizer operates on the error vector between the original vector and the quantized first-stage output. The quantized error vector provides a refinement to the first approximation. The indices obtained by the multistage vector quantizer are then encoded using Huffman coding. Contourlets have the property of preserving edges and fine details in the image, and the encoding complexity of multistage vector quantization is lower than that of tree-structured vector quantization. This motivates us to develop a new coding scheme integrating the contourlet transform with multistage vector quantization.
The remainder of the paper is organized as follows: Section 2 focuses on the contourlet transform, Section 3 describes multistage vector quantization, Section 4 deals with the proposed image compression scheme, and conclusions are drawn in Section 5.

2. Contourlet Transform
The contourlet transform is a directional transform capable of capturing contours and fine details in images. The contourlet expansion is composed of basis functions oriented at various
directions in multiple scales, with flexible aspect ratios. With this rich set of basis functions, the contourlet transform effectively captures the smooth contours that are the dominant feature in natural images. In the contourlet transform, the Laplacian pyramid performs the decomposition of the image into subbands, and directional filter banks then analyze each detail image, as illustrated in Fig. 1. The pyramidal directional filter bank (PDFB) [5] was proposed by Minh Do and Vetterli; it overcomes the block-based approach of the curvelet transform with a directional filter bank applied on the whole scale, and is also known as the contourlet transform (CT). The grouping of wavelet coefficients suggests that one can obtain a sparse image expansion by first applying a multi-scale transform and then applying a local directional transform to gather the nearby basis functions at the same scale into linear structures. In essence, first a wavelet-like transform is used for edge (point) detection, and then a local directional transform for contour-segment detection. With this insight, one can construct a double filter bank structure (Fig. 2(a)) in which first the Laplacian pyramid (LP) is used to capture the point discontinuities, followed by a directional filter bank (DFB) to link point discontinuities into linear structures [6]. The overall result is an image expansion with basis images resembling contour segments, and it is thus named the contourlet transform. The combination of this double filter bank is named the pyramidal directional filter bank (PDFB).
In general, the contourlet construction allows any number of DFB decomposition levels l_j to be applied at each LP level j. For the contourlet transform to satisfy the anisotropy scaling relation, one simply needs to impose that, in the PDFB, the number of directions is doubled at every other finer scale of the pyramid. Fig. 2(b) graphically depicts the supports of the basis functions generated by such a PDFB. As can be seen from the two pyramidal levels shown, the support size of the LP is reduced by four times while the number of directions of the DFB is doubled. Combining these two steps, the support size of the PDFB basis functions changes from one level to the next in accordance with the curve scaling relation. In this contourlet scheme, each generation doubles the spatial resolution as well as the angular resolution. The PDFB provides a frame expansion for images with frame elements like contour segments, and is thus also called the contourlet transform.
Fig. 1. A flow graph of the contourlet transform.
Fig. 2. (a) Block diagram of a PDFB, and (b) supports for contourlets.
Fig. 2(a) shows the block diagram of a PDFB. First, a standard multi-scale decomposition into octave bands is computed, where the lowpass channel is subsampled while the highpass channel is not. Then a directional decomposition with a DFB is applied to each highpass channel. Fig. 2(b) shows the support shapes for contourlets implemented by a PDFB that satisfies the anisotropy scaling relation. From the upper line to the lower line, the scale is reduced by four while the number of directions is doubled. The PDFB allows a different number of directions at each scale/resolution in order to nearly achieve critical sampling. As the DFB is designed to capture high-frequency components (representing directionality), the LP part of the PDFB permits the subband decomposition to avoid "leaking" of low frequencies into several directional subbands; thus directional information can be captured efficiently.
A. Laplacian Pyramid
One way of achieving a multiscale decomposition is to use the Laplacian pyramid (LP) introduced by Burt and Adelson [7]. The LP decomposition at each level generates a downsampled lowpass version of the original and the difference between the original and the prediction, resulting in a bandpass image, as shown in Fig. 3(a). In this figure, 'H' and 'G' are the analysis and synthesis filters and 'M' is the sampling matrix. The outputs are a coarse approximation 'a' and a difference 'b' between the original signal and the prediction; the process can be iterated by decomposing the coarse version repeatedly. The original image is convolved with a Gaussian kernel [8]. The resulting image is a lowpass filtered version of the original image, and the Laplacian is then computed as the difference between the original image and this lowpass filtered image. The process is continued to obtain a set of bandpass filtered images, since each one is the difference between two levels of the Gaussian pyramid; the Laplacian pyramid is thus a set of bandpass filters. By repeating these steps several times, a sequence of images is obtained. If these images are stacked one above another, the result is a tapering pyramid data structure, as shown in Fig. 4, hence the name. The Laplacian pyramid can thus be used to represent images as a series of bandpass filtered images, each sampled at successively sparser densities. It is frequently used in image processing and pattern recognition tasks because of its ease of computation. A drawback of the LP is its implicit oversampling. However, in contrast to the critically sampled wavelet scheme, the LP has the distinguishing feature that each pyramid level generates only one bandpass image (even in the multidimensional case), which does not have "scrambled" frequencies. Such frequency scrambling happens in the wavelet filter bank when a highpass channel, after downsampling, is folded back into the low-frequency band, and its spectrum is thereby reflected. In the LP, this effect is avoided by downsampling the lowpass channel only.

Fig 3. Laplacian pyramid scheme (a) analysis, and (b) reconstruction.
Fig 4. Laplacian pyramid structure.
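To make the analysis step concrete, the following is a minimal Python sketch of LP analysis under our own assumptions (a Gaussian lowpass filter, a sampling factor of two, and bilinear interpolation for the prediction); it is an illustration, not the particular filters used in this paper.

```python
import numpy as np
from scipy import ndimage

def lp_analysis(image, levels=3, sigma=1.0):
    """Return [b_0, ..., b_{levels-1}, a]: bandpass images plus the coarse residue."""
    pyramid = []
    current = image.astype(np.float64)
    for _ in range(levels):
        low = ndimage.gaussian_filter(current, sigma)     # lowpass analysis filter 'H'
        coarse = low[::2, ::2]                            # downsample by the matrix 'M'
        prediction = ndimage.zoom(coarse, 2.0, order=1)   # upsample + synthesis filter 'G'
        prediction = prediction[:current.shape[0], :current.shape[1]]
        pyramid.append(current - prediction)              # bandpass difference image 'b'
        current = coarse                                  # iterate on the coarse version 'a'
    pyramid.append(current)
    return pyramid
```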
B. Directional Filter Bank
In 1992, Bamberger and Smith [9] introduced a 2-D directional filter bank (DFB) that can be maximally decimated while achieving perfect reconstruction. The directional filter bank is a critically sampled filter bank that can decompose images into any power-of-two number of directions. The DFB is efficiently implemented via an l-level tree-structured decomposition that leads to 2^l subbands with wedge-shaped frequency partition, as shown in Fig. 5. The original construction of the DFB in [9] involves modulating the input signal and using diamond-shaped filters. Furthermore, to obtain the desired frequency partition, an involved tree-expanding rule has to be followed. As a result, the frequency regions of the resulting subbands do not follow a simple ordering based on the channel indices. The DFB is designed to capture the high-frequency components (representing directionality) of images [1]; low-frequency components are therefore handled poorly by the DFB. In fact, with the frequency partition shown in Fig. 5, low frequencies would leak into several directional subbands, so the DFB alone does not provide a sparse representation for images. To improve the situation, low frequencies should be removed before the DFB, which provides another reason to combine the DFB with a multiresolution scheme. The LP permits further subband decomposition to be applied on its bandpass images: those bandpass images can be fed into a DFB so that directional information is captured efficiently, and the scheme can be iterated repeatedly on the coarse image. The end result is a double iterated filter bank structure, named the pyramidal directional filter bank (PDFB), which decomposes images into directional subbands at multiple scales. The scheme is flexible, since it allows a different number of directions at each scale. Figs. 6, 7 and 8 show the contourlet transform of the images Lena, Fingerprint and Barbara respectively. For visual clarity, only two-scale decompositions are shown. Each image is decomposed into a lowpass subband and several bandpass directional subbands.
Fig 5. DFB frequency partitioning
It can be seen that only contourlets that match both the location and the direction of image contours produce significant coefficients. Thus, the contourlet transform effectively exploits the fact that the edges in images are localized in both location and direction. One can decompose each scale into any power-of-two number of directions, and different scales can be decomposed into different numbers of directions. This feature makes contourlets a unique transform that achieves a high level of flexibility in decomposition while being close to critically sampled. Other multiscale directional transforms have either a fixed number of directions or are significantly overcomplete.

Fig 6. Contourlet Transform of "Lena" image.
Fig 7. Contourlet Transform of "Fingerprint" image.
Fig 8. Contourlet Transform of "Barbara" image.

3. Multistage Vector Quantization
In vector quantization (VQ), an input vector of signal samples is quantized by selecting the best matching representation from a codebook of 2^{kr} stored code vectors of dimension k. VQ is an optimal coding technique in the sense that all other methods of coding a random vector in k dimensions with a specified number b = kr of bits are equivalent to special cases of VQ with generally suboptimal codebooks. However, optimal VQ assumes a single, and possibly very large, codebook with no constraints imposed on its structure. The resulting encoding and storage complexity, of the order of 2^{kr}, may be prohibitive for many applications. A structured VQ scheme that can achieve very low encoding and storage complexity is multistage VQ (MSVQ). In MSVQ, the kr bits are divided between L stages, with b_i bits for stage i. The storage complexity of MSVQ is Σ_{i=1}^{L} 2^{b_i} vectors, which can be much less than the 2^{kr} vectors of unstructured VQ. MSVQ [10] is a sequential quantization operation in which each stage quantizes the residual of the previous stage.

The structure of the MSVQ encoder [11] consists of a cascade of VQ stages, as shown in Fig. 9. For an L-stage MSVQ, the l-th stage quantizer Q_l, l = 0, 1, ..., L-1, is associated with a stage codebook C_l containing K_l stage code vectors. The set of stage quantizers {Q_0, Q_1, ..., Q_{L-1}} is equivalent to a single quantizer Q, which is referred to as the direct-sum vector quantizer.

Fig 9. Encoder block diagram of MSVQ

MSVQ Encoder
In the MSVQ encoder shown in Fig. 9, the input vector X is quantized with the first-stage codebook, producing the first-stage code vector Q_0(X), and a residual vector y_0 is formed by subtracting Q_0(X) from X. Then y_0 is quantized using the second-stage codebook, with exactly the same procedure as in the first stage but with y_0 instead of X as the input. Thus, in each stage except the last, a residual vector is generated and passed to the next stage to be quantized independently of the other stages. MSVQ is an error-refinement scheme: the inputs to a stage are the residual vectors of the previous stage, and they tend to become less and less correlated as the process proceeds.

MSVQ Decoder
The decoder, as shown in Fig. 10, receives for each stage an index identifying the selected stage code vector and forms the reproduction X by summing
the identified vectors. The overall quantization error is equal to the quantization residual of the last stage. Sequential searching of the stage codebooks makes the encoding complexity equal to the storage complexity of Σ_{i=1}^{L} 2^{b_i} vectors.
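The sequential encode/decode just described can be sketched in a few lines; this is our own illustration with brute-force nearest-neighbour search, assuming the stage codebooks are already trained.

```python
import numpy as np

def msvq_encode(x, codebooks):
    """x: (k,) input vector; codebooks: list of (K_l, k) stage codebooks."""
    indices = []
    residual = x.astype(np.float64)
    for C in codebooks:
        idx = int(np.argmin(((C - residual) ** 2).sum(axis=1)))  # best stage code vector
        indices.append(idx)
        residual = residual - C[idx]          # residual passed to the next stage
    return indices                            # overall error = last-stage residual

def msvq_decode(indices, codebooks):
    """Direct-sum reconstruction: the sum of the identified stage code vectors."""
    return sum(C[idx] for C, idx in zip(codebooks, indices))
```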
Fig 10. Decoder block diagram of MSVQ

4. Proposed Scheme
The proposed algorithm is summarized below.
1. To decorrelate the relationship between the pixels, the contourlet transform is first applied to all the test images. Different directional and pyramidal filter banks are considered for decomposition. This is the initialization stage of the proposed algorithm.
2. Group neighboring contourlet coefficients into one vector: 2 x 2 contourlet coefficients are grouped into a vector.
3. Take the absolute values of all vector components. Since the signs and absolute values of vector components are encoded separately in our algorithm, we consider only the magnitude of each vector component in the refinement process.
4. Find the training vectors for the first-layer codebook. This can be done in two different ways. One is to include all training vectors of the first layer, i.e., symbols with norms larger than the first threshold T1. The other is to manipulate the components of the vectors, e.g., multiply them by 2 or 4, so that all the vectors fall in the subspace of the first layer. The latter approach yields a much larger training set and richer patterns than the former; we choose the second method in our coding scheme.
5. Perform multistage codebook training (see the sketch after this list). The codebook training includes finding the centroids of the training set and the residual codewords of the first stage, second stage, and so on. The training method is Lloyd-Max iteration, often referred to as the Linde-Buzo-Gray (LBG) algorithm [12]. In this work we do not design a different codebook for each individual layer; the same codebook is applied to all layers.
6. The indices obtained from the multistage vector quantization are encoded by Huffman coding. The proposed scheme uses static Huffman coding, where the same Huffman table is used for different images; this eliminates the overhead of sending Huffman tables along with the coded data.
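For step 5, a rough sketch of LBG codebook training is given below; the splitting initialisation and the fixed iteration count are our assumptions, not details taken from the paper.

```python
import numpy as np

def lbg(training, size, iters=20, eps=1e-3):
    """training: (N, k) vectors; returns a (size, k) codebook."""
    codebook = training.mean(axis=0, keepdims=True)          # centroid of the training set
    while codebook.shape[0] < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])  # split
        for _ in range(iters):                               # Lloyd iterations
            d = ((training[:, None, :] - codebook[None]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            for j in range(codebook.shape[0]):
                members = training[nearest == j]
                if len(members):                             # keep old codeword if cell empty
                    codebook[j] = members.mean(axis=0)
    return codebook[:size]
```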
5. Results and Discussion
We present the encoding results for the 256 x 256, 8-bit 'Lena', 'Barbara' and 'Fingerprint' images. We tested our algorithm on a natural image that does not contain large amounts of high-frequency or oscillating patterns, namely the Lena image. The same algorithm was applied to a test image that exhibits large amounts of high-frequency and oscillating patterns, the Barbara image. Besides these low- and high-frequency images, the algorithm was also applied to an image with both high- and low-frequency content, the fingerprint image [13]. For simplicity, we considered only two stages in the multistage vector quantization; the same algorithm can be extended to more stages. As a trial, we incorporated three stages in MSVQ for the Lena image and found that the quality of the reconstructed image is good, but the execution time is longer than with two stages. During transmission of images, the impact of different types of noise on the test image should be taken into account; since the prime motive of our work is compression and not transmission, the impact of noise is not considered. The codes were run on a Pentium IV PC with 256 MB RAM.

Table I gives the results of the proposed scheme against wavelet-based multistage vector quantization for the Fingerprint image, and the corresponding plot is shown in Fig. 11. Tables II and III give the PSNR values for different pyramidal and directional filters applied to the fingerprint image; the corresponding plots are shown in Figs. 12 and 13 respectively.

Table-I PSNR values (dB) for Wavelet and Contourlet Transform of Fingerprint image

bpd     'Haar' Wavelet   P:'Haar' D:'9-7'   P:'Haar' D:'pkva'   P:'Haar' D:'5-3'
0.125   25.8820          27.7315            27.7584             26.4390
0.25    38.7565          40.1037            40.0999             39.3212
0.5     50.5928          51.9884            51.9807             51.3174
0.75    54.3465          55.5676            55.5614             54.8794
1.0     59.9010          62.6486            62.6456             62.1502
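The PSNR figures in Tables I-X follow the usual definition for 8-bit data; for reference, a minimal helper (our own, not part of the proposed scheme):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    mse = np.mean((original.astype(np.float64) - reconstructed) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```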
From Fig. 11 we can infer that the proposed scheme outperforms the wavelet-based multistage vector quantization. In the case of the fingerprint image, the 'Haar' and '9-7' pyramidal and directional filter combination gives a better PSNR result than the other pyramidal and directional filter combinations.
Table IV gives the results of the proposed scheme against wavelet-based multistage vector quantization for the Lena image, and the corresponding plot is shown in Fig. 14. Tables V and VI give the PSNR values for different pyramidal and directional filters applied to the Lena image; the corresponding plots are shown in Figs. 15 and 16 respectively.
Fig 11. Plot of PSNR vs. bit rate for the Fingerprint image (Wavelet = Haar; Contourlet = Haar + 9-7; Contourlet = Haar + 5-3)
Table-II PSNR values (dB) for Contourlet Transform of Fingerprint image

bpd     P:'9-7' D:'pkva'   P:'pkva' D:'9-7'   P:'5-3' D:'9-7'
0.125   21.6754            21.1154            22.8744
0.25    28.4383            27.8619            30.1356
0.5     35.1472            34.3806            31.9153
0.75    38.7381            37.8090            35.1001
1.0     41.2961            40.1303            37.3613

Fig 13. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:5-3, P:pkva/D:5-3, P:5-3/D:pkva) using the contourlet transform with multistage vector quantization for the Fingerprint image
Table-IV PSNR values (dB) for Wavelet and Contourlet Transform of Lena image

bpd     'Haar' Wavelet   P:'Haar' D:'9-7'   P:'Haar' D:'pkva'   P:'Haar' D:'5-3'
0.125   25.9767          27.4774            27.4773             26.7345
0.25    38.8644          40.3459            40.3470             39.5909
0.5     50.6572          52.3395            52.3274             51.6232
0.75    54.2564          55.7400            55.7370             55.0487
1.0     59.7543          62.3710            62.3666             61.9104

Fig 12. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:pkva, P:pkva/D:9-7, P:5-3/D:9-7) using the contourlet transform with multistage vector quantization for the Fingerprint image
Table-III PSNR values (dB) for Contourlet Transform of Fingerprint image

bpd     P:'9-7' D:'5-3'   P:'pkva' D:'5-3'   P:'5-3' D:'pkva'
0.125   21.4573           21.1001            22.8840
0.25    28.3826           27.8567            30.1386
0.5     35.1316           34.3795            31.9156
0.75    38.7230           37.8072            35.1000
1.0     41.2917           40.1301            37.3612
Fig 14. Plot of PSNR vs. bit rate for the Lena image (Wavelet = Haar; Contourlet = Haar + 9-7; Contourlet = Haar + 5-3)

Table-V PSNR values (dB) for Contourlet Transform of Lena image

bpd     P:'9-7' D:'pkva'   P:'pkva' D:'9-7'   P:'5-3' D:'9-7'
0.125   25.7677            27.5363            28.8109
0.25    38.8576            39.4803            42.0035
0.5     51.2431            51.5084            54.0942
0.75    54.7321            54.9806            57.5803
1.0     63.2737            63.5641            66.1124

Fig 15. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:pkva, P:pkva/D:9-7, P:5-3/D:9-7) using the contourlet transform with multistage vector quantization for the Lena image

Table-VI PSNR values (dB) for Contourlet Transform of Lena image

bpd     P:'9-7' D:'5-3'   P:'pkva' D:'5-3'   P:'5-3' D:'pkva'
0.125   26.3517           27.9687            28.8971
0.25    39.3317           39.7848            42.0332
0.5     51.5548           51.8074            54.1043
0.75    55.0525           55.2784            57.5952
1.0     63.5971           63.8706            66.1352

Fig 16. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:5-3, P:pkva/D:5-3, P:5-3/D:pkva) using the contourlet transform with multistage vector quantization for the Lena image

From Fig. 14 we can infer that the proposed scheme outperforms the wavelet-based multistage vector quantization. In the case of the Lena image, the '5-3' and 'pkva' pyramidal and directional filter combination gives a better PSNR result than the other pyramidal and directional filter combinations.

Table VII gives the results of the proposed scheme against wavelet-based multistage vector quantization for the Barbara image, and the corresponding plot is shown in Fig. 17. Tables VIII and IX give the PSNR values for different pyramidal and directional filters applied to the Barbara image; the corresponding plots are shown in Figs. 18 and 19 respectively. From Fig. 17 we can infer that the proposed scheme outperforms the wavelet-based multistage vector quantization. In the case of the Barbara image, the '5-3' and 'pkva' pyramidal and directional filter combination gives a better PSNR result than the other pyramidal and directional filter combinations.

Table-VII PSNR values (dB) for Wavelet and Contourlet Transform of Barbara image

bpd     'Haar' Wavelet   P:'Haar' D:'9-7'   P:'Haar' D:'pkva'   P:'Haar' D:'5-3'
0.125   25.9767          27.6510            27.6584             26.6174
0.25    38.8644          40.1235            40.1111             39.3832
0.5     50.6572          52.0361            52.0314             51.3565
0.75    54.2564          55.6162            55.6130             54.9234
1.0     59.7543          62.6668            62.6654             62.1643

Fig 17. Plot of PSNR vs. bit rate for the Barbara image (Wavelet = Haar; Contourlet = Haar + 9-7; Contourlet = Haar + 5-3)
Table-VIII PSNR values (dB) for Contourlet Transform of Barbara image

bpd     P:'9-7' D:'5-3'   P:'pkva' D:'5-3'   P:'5-3' D:'pkva'
0.125   26.6045           27.2880            29.1200
0.25    39.4620           39.4224            42.0637
0.5     51.5451           51.4194            54.1801
0.75    55.0620           54.9239            57.6369
1.0     63.6032           63.4831            66.2182
Fig 18. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:5-3, P:pkva/D:5-3, P:5-3/D:pkva) using the contourlet transform with multistage vector quantization for the Barbara image

Fig 20. Original and decoded 256 x 256 Fingerprint image: (a) Original image, (b) bpd = 0.125, (c) bpd = 0.25, (d) bpd = 1.0, using P-filter = '5-3' and D-filter = 'pkva'
Table-IX PSNR values (dB) for Contourlet Transform of Barbara image

bpd     P:'9-7' D:'pkva'   P:'pkva' D:'9-7'   P:'5-3' D:'9-7'
0.125   27.6801            27.3461            27.2880
0.25    40.2529            39.4977            40.5984
0.5     52.2609            51.4890            52.8883
0.75    55.7804            55.0027            56.3485
1.0     64.3248            63.5480            64.9423
Fig 19. Comparison of bit rate vs. PSNR between different pyramid and directional filters (P:9-7/D:pkva, P:pkva/D:9-7, P:5-3/D:9-7) using the contourlet transform with multistage vector quantization for the Barbara image

Fig 21. Original and decoded 256 x 256 Lena image: (a) Original image, (b) bpd = 0.125, (c) bpd = 0.25, (d) bpd = 1.0, using P-filter = '5-3' and D-filter = 'pkva'
Figs. 20, 21 and 22 show the original and reconstructed images of Fingerprint, Lena and Barbara at different bit rates. The pyramidal and directional filters chosen are '5-3' and 'pkva' respectively. From the figures it is obvious that as the bit rate increases, the visual quality of the reconstructed image increases, in accordance with rate-distortion theory.
Fig 22. Original and decoded 256 x 256 Barbara image: (a) Original image, (b) bpd = 0.125, (c) bpd = 0.25, (d) bpd = 1.0, using P-filter = '5-3' and D-filter = 'pkva'

Fig 24. Original and decoded 256 x 256 Lena image at bpd = 0.25: (a) Original image, (b) single-stage MSVQ, (c) two-stage MSVQ, (d) three-stage MSVQ, using P-filter = '5-3' and D-filter = 'pkva'
Fig. 24 shows the reconstructed images of ‘Lena’ for different stages of MSVQ. We have compared the execution time and the quality of the reconstructed image by incorporating three stages in multistage vector quantization. The results are shown in Table X. From the table it is clear that as the number of stages in multistage vector quantization increases, the quality of the reconstructed image also increases at the expense of execution time. This is evident from the plot, shown in Fig. 23. In Table X, ‘bpd’ stands for bits per dimension.
Fig 23. Plot of PSNR Vs Execution time for Lena Image
Table X. Contourlet transform with different stages in MSVQ for Lena image, P-filter = '5-3' and D-filter = 'pkva'

        Single Stage VQ         Two Stage VQ            Three Stage VQ
bpd     PSNR (dB)  Time (s)     PSNR (dB)  Time (s)     PSNR (dB)  Time (s)
0.125   15.6237    4.5320       28.8971    8.2180       42.0332    12.4540
0.25    22.2691    4.7810       42.0332    9.1880       60.1697    12.1870
0.5     28.8972    4.8130       54.1043    9.3750       78.0523    12.7500
0.75    32.7703    4.9060       57.5952    9.4060       81.6577    13.3900
1.0     35.4600    5.3600       66.1352    10.1560      90.7476    13.4690
6. Conclusion
In this paper, compression of images using the contourlet transform and multistage vector quantization has been presented. Extensive results have been obtained on different images. It can be seen that the PSNR obtained with the contourlet transform is higher than that of the wavelet transform; hence a better image reconstruction is possible with fewer bits by using the contourlet transform. Here, only four filter combinations are considered; we are currently investigating other filter combinations. The experimental results reveal that MSVQ is suitable for low bit-rate image coding. The proposed scheme gives output of good quality around 0.5 bits per dimension (bpd) and very good results at around 1 bpd. The scheme can easily be extended to include more stages in MSVQ to improve the output image quality.

7. Acknowledgements
The authors wish to thank their teachers Dr. S. Jayaraman and Dr. N. Malmurugan for their continued support. They also thank their present institutions for the encouragement.

8. References
[1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Processing, pp. 205-220, Apr. 1992.
[2] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation," IEEE Trans. Image Processing, vol. 14, no. 12, pp. 2091-2106, Dec. 2005.
[3] B. H. Juang and A. H. Gray, "Multiple stage vector quantization for speech coding," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (Paris, France), pp. 597-600, Apr. 1982.
[4] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston, MA: Kluwer, 1992.
[5] M. N. Do and M. Vetterli, "Pyramidal directional filter banks and curvelets," in Proc. IEEE Int. Conf. on Image Processing, vol. 3, pp. 158-161, Thessaloniki, Greece, Oct. 2001.
[6] D. D.-Y. Po and M. N. Do, "Directional multiscale modeling of images using the contourlet transform," IEEE Trans. Image Processing, to appear, Jun. 2006.
[7] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. 31, no. 4, pp. 532-540, 1983.
[8] M. N. Do, "Directional Multiresolution Image Representations," Ph.D. Thesis, EPFL, Lausanne, Switzerland, Dec. 2001.
[9] R. H. Bamberger and M. J. T. Smith, "A filter bank for the directional decomposition of images: theory and design," IEEE Trans. Signal Processing, vol. 40, no. 4, pp. 882-893, Apr. 1992.
[10] J. Karlekar, P. G. Poonacha and U. B. Desai, "Image compression using zerotree and multistage vector quantization," ICIP, vol. 2, p. 610, 1997.
[11] H. Khalil and K. Rose, "Multistage vector quantizer optimization for packet networks," IEEE Trans. Signal Processing, vol. 51, no. 7, pp. 1870-1879, July 2003.
[12] Y. Linde, A. Buzo and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. 28, pp. 84-95, Jan. 1980.
[13] R. Sudhakar, R. Karthiga and S. Jayaraman, "Fingerprint compression using contourlet transform with modified SPIHT algorithm," Iranian Journal of Electrical and Computer Engineering (IJECE), vol. 5, no. 1, pp. 3-10, Winter-Spring 2006.
An Improved Image Compression approach with Self-Organizing Feature Maps using Cumulative Distribution Function
S. Anna Durai B.E.M.E., M.E.(1) and E. Anna Saro M.C.A., M.Phil.(2)
(1) Govt. College of Engineering, Tirunelveli-627 007, Tamilnadu, India
(2) Dept. of Computer Science, Sri Ramakrishna College of Arts & Science for Women, Coimbatore-641 044, Tamil Nadu, India
Email: [email protected]
Abstract:
In general, the images used for compression are of different types, such as dark images, high-intensity images, etc. When these images are compressed using Self-Organizing Feature Maps, the network takes a long time to converge. The reason is that the given image may contain a number of distinct gray levels with only narrow differences from their neighborhood pixels. If the gray levels of the pixels in an image and of their neighbors are mapped in such a way that the difference in gray levels between each pixel and its neighbors is minimal, then both the compression ratio and the convergence of the network can be improved. To achieve this, a cumulative distribution function is estimated for the image and used to map the image pixels. When the mapped image pixels are used, the Self-Organizing Feature Map yields a high compression ratio and converges quickly.

Keywords: Self-Organizing Feature Maps, Cumulative Distribution Function, Learning Vector Quantization, Correlation, Convergence, Pixel value.

1. Introduction
Image compression research aims at reducing the number of bits needed to represent an image. In lossless compression schemes, the reconstructed image after compression is numerically identical to the original image; however, lossless compression can achieve only a modest amount of compression. Lossy schemes are capable of achieving much higher compression. Image compression algorithms take into account the psycho-visual features in both the space and frequency domains and exploit the spatial correlation along with the statistical redundancy. Almost all practically used algorithms adopt a quantization stage, which makes them lossy and thus achieves a desired compression ratio. However, the performance of these algorithms depends mostly on the information contained in the images. A practical compression algorithm for image data should preserve most of the characteristics of the data while working in a lossy manner, maximize the gain, and be of low algorithmic complexity. In general, almost all traditional approaches adopt a two-stage process: first, the data is transformed into some other domain and/or represented by the indices of a codebook, followed by an encoding of the transformed coefficients or the codebook indices. The first stage serves to minimize the spatial correlation, or to make use of that correlation, so as to reduce the data. The most commonly adopted approaches rely on transform techniques and/or the use of vector quantization, and most algorithms exploit spatial correlations. The discrete cosine transform (DCT) is used in practically all image compression techniques. The wavelet transform has proven to be very effective and has gained popularity over the DCT. The DCT followed by predictive coding is used in another approach; however, the inter-pixel relationship is highly non-linear and unpredictable in the absence of prior knowledge of the image itself, so predictive approaches do not work well with natural images. Transform-based approaches have been used by most researchers in some combination or other. Among the non-transform approaches, vector quantization based techniques encode a sequence of samples rather than a single sample and automatically exploit both linear and non-linear dependencies. It has been shown that vector quantization is optimal among block coding techniques, and that all transform coding techniques can be taken as special cases of vector quantization with some constraints. In vector quantization, encoding is performed by approximating the sequence to be coded by a vector belonging to a codebook. Creation of a straight, unconstrained codebook is computationally intensive, and the complexity grows exponentially with the block size. Artificial Neural Networks have been applied to image compression problems, due to their superiority
over traditional methods when dealing with noisy or incomplete data. Neural networks seem to be well suited to image compression, as they have the ability to preprocess input patterns to produce simpler patterns with fewer components; this compressed information preserves the full information obtained from the external environment. Not only can Artificial Neural Network based techniques provide sufficient compression rates for the data in question, but security is also easily maintained, because the compressed data sent along a communication line is encoded and does not resemble its original form. Many different training algorithms and architectures have been used. The two types of image compression approaches adopting Artificial Neural Networks are direct pixel-based encoding and decoding, and pixel-based encoding and decoding based on a modular approach. Different types of Artificial Neural Networks have been trained to perform image compression: feed-forward networks, Self-Organizing Feature Maps, Learning Vector Quantization networks, and auto-associative networks have all been applied to image compression. These networks contain at least one hidden layer with fewer units than the input and output layers, and the network is trained to recreate the input data. This bottleneck architecture forces the network to project the original data onto a lower-dimensional manifold from which the original data should be predicted. The most advanced approaches are based on specialized compression modules, which either combine different Artificial Neural Networks to obtain the best possible image compression rate or combine more traditional statistical methods with one or more Artificial Neural Networks. These approaches compete with well-established compression techniques such as JPEG. The major advantage of Artificial Neural Networks is that their parameters are adaptable, which may give better compression rates when they are trained on specific images. The Self-Organizing Feature Map is an ingenious neural network built around a one- or two-dimensional lattice of neurons for capturing the important features contained in the input. The aim of the Self-Organizing Feature Map algorithm is to store a large set of input vectors by finding a smaller set of weights that provides a good approximation to the original output space, which is the motivation for adopting the Self-Organizing Feature Map for image compression. The Kohonen model is capable of performing image compression. The use of neural networks in vector quantizer design assures good performance both in the quality of the reconstructed image and in computational effort, and the Kohonen model of the Self-Organizing Feature Map has formed a basis for the vector quantization codebook design problem. Self-Organizing Feature Maps are simple to implement and hence a well-known algorithm for compression of images. Images are of various types, such as natural images, medical images, satellite images, etc. As the size of these images increases, more vectors are needed for training the Self-Organizing Feature Maps, and consequently the network takes longer to converge. In addition, the image samples are redundant in nature. Self-Organizing Feature Maps are capable of grouping nearby similar pixels; although the network is able to detect the coding and psycho-visual redundancies in the image, it cannot directly detect the inter-pixel redundancies. To overcome these difficulties a new approach has been proposed: the image is mapped by estimating its Cumulative Distribution Function. The mapping results in a reduction of the inter-pixel redundancies and in a larger number of similar pixels, which augments the formation of clusters in the network, helping it to converge quickly and to achieve high compression ratios. The remainder of the paper is organized as follows. The background theory of Kohonen Self-Organizing Feature Maps and of vector quantization adopting Self-Organizing Feature Maps for compression of images is discussed in Section 2. The algorithm for compression of gray-scale images using Self-Organizing Feature Maps is detailed in Section 3. The mapping of image pixels by estimating the cumulative distribution function of the image, to improve the compression ratio and convergence time of the network, is explained in Section 4. The experimental results are discussed in Section 5, followed by the conclusion in Section 6.
2. Theory
Self-Organizing Feature Maps
Self-Organizing Feature Maps [10] are a special class of artificial neural networks based on competitive learning. The Self-Organizing Feature Map is characterized as an unsupervised learning algorithm in which the output network neurons compete among themselves to be activated; as a result, only one output neuron, or one neuron per layer, wins the competition. The output neurons that win the competition are called "winner-take-all" neurons, and lateral inhibitory connections between neurons are used for inducing the winner neurons. In two-dimensional Self-Organizing Feature Maps the neurons are placed at the lattice nodes; the lattice may take different shapes: a rectangular grid, hexagonal, or even a random topology. The neurons become selectively tuned to various input patterns in the course of the competitive learning process. The locations of the winning neurons so tuned tend to become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice. Kohonen [11] developed an algorithm with self-organizing properties for a network of adaptive elements. These elements receive an input signal, and the signal representations are automatically mapped onto a set of output responses so that these responses acquire the same topological order as the input signal. The algorithm proposed by Kohonen follows two basic steps: matching, i.e., finding the winner neuron determined by the minimum Euclidean distance to the input using equation (1), and updating the positions of the neurons inside the cluster using equations (2) and (3):

    ||x(t) - m_c(t)|| = min_i ||x(t) - m_i(t)||                 (1)
    m_i(t+1) = m_i(t) + α(t) [x(t) - m_i(t)]   for i ∈ N_c      (2)
    m_i(t+1) = m_i(t)                          for i ∉ N_c      (3)

where, for time t and a network with n neurons, x is the input, N_c is the neighborhood of the winner, α is the gain sequence (0 < α < 1), m_i is node i (1 ≤ i ≤ n), and m_c is the winner.

The main properties [4] of the self-organizing algorithm are that it does not need to classify the training image blocks and that it has the ability to form ordered topological feature maps. A good approximation to the original input space can be provided by finding a smaller set of prototypes from a large set of input vectors; this property is the basis for data compression. The spatial location of a neuron in the lattice corresponds to a particular domain of input patterns; this topological ordering property has motivated the adoption of Self-Organizing Feature Maps for image compression. Another interest of these topological properties is robustness against transmission errors. Quantization is performed in the gray-level pixel space, and hence the visual aspect of the image is preserved, which is very important for heavily textured images. The density-matching property implies that a region of the input space containing frequently occurring stimuli will be represented by a larger area in the feature map than a region where the stimuli occur less frequently. This property helps in reducing the reconstruction errors and preserves finer details.
Self-Organizing Feature Maps applied to Vector Quantization
Vector quantization is a clustering technique by which an input space is divided into a number of distinct regions, and for each region a reconstruction vector is defined. Self-Organizing Feature Maps have been extensively applied to vector quantization [5], [6] to solve the main problem associated with classical vector quantization techniques, which are rather sensitive to codeword errors. Due to the capability of Self-Organizing Feature Maps to form ordered topological feature maps (after training, the weight vectors are spatially ordered in an array, and neighboring vectors in the map are more similar than more remote ones), an optimal codebook and partition design results. By using an encoded version of the reproduction vector for storage or transmission, considerable savings in storage can be realized. The collection of possible reproduction vectors is called the codebook of the quantizer, and its members are called code words. Learning Vector Quantization [7] is a supervised learning technique developed by Kohonen to improve the quality of the classifier decision regions of the Self-Organizing Feature Maps. The architecture of Learning Vector Quantization is similar to that of the Kohonen Self-Organizing Feature Map but without the topological structure for the output units. The algorithm for the Learning Vector Quantization network is to find the output unit that has a matching pattern for the input vector.

3. Algorithm for Image Compression
The algorithm for compression of gray-scale images using Kohonen Self-Organizing Feature Maps [1], [3] is detailed below.

Preprocessing of the image
The input image, of size say 256 x 256, is decomposed into blocks (4 x 4, 8 x 8 or 16 x 16).

Initializing weights for the network
The architecture of the Self-Organizing Feature Map for vector quantization depends on the size of the codebook. Each code vector in the codebook is the weight matrix of a node of the competitive layer of the Self-Organizing Feature Map. The codebook size is chosen to be 256; thus 256 code vectors that represent the input image blocks are produced as output by this network. There is no separate output layer: only the weights of the neurons of the competitive layer are taken as output. Since in image compression the weight values after training have to represent the input gray levels, the weights are initialized by assigning random values ranging from 0 to 255. The architecture of the Self-Organizing Feature Map is shown in Figure 1.
Fig.1. Self Organizing Feature Map Architecture
The input layer consists of 16 nodes and the Kohonen layer consists of 256 nodes arranged in a 16 x 16 array. The input layer takes as input the gray-level values of all 16 pixels of a block. The weights assigned between node j of the Kohonen layer and the input layer form the weight matrix Wji = [wj0, wj1, wj2, ..., wj15]; similarly, for all 256 nodes we have Wji for j = 0, 1, ..., 255 and i = 0, 1, ..., 15. Once the weights are initialized randomly, the network is ready for training.

Training
The image blocks are given as input to the network. These input vectors are compared with the network weight vectors to choose a neuron in the competitive layer as the winner. The winner is the neuron whose weight vector is most similar to the input vector, in other words the neuron having the minimum Euclidean distance from the input vector. The input vector x is applied simultaneously to all nodes. The similarity between x and the weights w_i is measured within a spatial neighborhood N_m. The weights belonging to the currently winning neighborhood undergo adaptation at the current learning step; the other weights remain unaffected. The neighborhood N_m is found around the best matching node m such that

    ||x - w_m|| = min_i ||x - w_i||                     (4)

The radius of N_m decreases as the training progresses; towards the end of training the neighborhood may involve no cells other than the central winning one. The weight-updating rule for the Self-Organizing Feature Map is defined as

    Δw_i(t) = α [x(t) - w_i(t)]   for i ∈ N_m(t)        (5)

where N_m(t) denotes the current spatial neighborhood and α denotes the learning rate. After training, the weight vectors of the neurons of the Kohonen layer act as code vectors.
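A compact sketch of this competitive training loop is given below; the 16 x 16 grid matches the architecture above, while the linear decay of the gain and of the neighborhood radius is our own assumption.

```python
import numpy as np

def train_sofm(blocks, grid=16, dim=16, epochs=10, alpha0=0.35, radius0=5):
    """blocks: iterable of (dim,) training vectors; returns (grid*grid, dim) code vectors."""
    rng = np.random.default_rng(0)
    w = rng.uniform(0, 255, size=(grid, grid, dim))           # random weights in [0, 255]
    coords = np.dstack(np.meshgrid(np.arange(grid), np.arange(grid), indexing='ij'))
    for e in range(epochs):
        alpha = alpha0 * (1 - e / epochs)                     # decaying gain sequence
        radius = max(1, round(radius0 * (1 - e / epochs)))    # shrinking neighborhood N_m
        for x in blocks:
            d = ((w - x) ** 2).sum(axis=2)                    # distances to all nodes
            m = np.unravel_index(d.argmin(), d.shape)         # winner node, eq. (4)
            inside = np.abs(coords - np.array(m)).max(axis=2) <= radius
            w[inside] += alpha * (x - w[inside])              # weight update, eq. (5)
    return w.reshape(grid * grid, dim)
```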
Encoding the image using the codebook
After the codebook is generated, the same Self-Organizing Feature Map is used for encoding and decoding the input image. Encoding of the input image is done in the following steps:
i. The input image is divided into blocks, and the image blocks are given as input to the Self-Organizing Feature Map.
ii. The winner neuron is selected as the neuron having the minimum Euclidean distance from the input block.
iii. The index of the winner neuron for each input block is stored.
iv. The set of indices of all the winner neurons for the blocks, along with the codebook, gives the compressed form of the image.

Decoding the image using the codebook and indices
The codebook together with the index of the winner node for each block is the compressed form of the image. The image is decoded using the following steps:
i. For each index value, find the corresponding neuron in the Self-Organizing Feature Map.
ii. Read out the weight vector of that neuron to the input layer; the neuron weights give the gray levels for that block.
iii. Display the gray-level values thus obtained as pixels.

4. Proposed Approach - Mapping of pixels by estimating the Cumulative Distribution Function of the image
Computational complexity is involved in compressing the raw pixels of an image in the spatial domain, or the mathematically transformed coefficients in the frequency domain, using neural networks, and the efficiency of such networks is pattern dependent. An image may contain a number of distinct gray levels with only narrow differences from their neighborhood pixels. If the gray levels of the pixels in an image and of their neighbors are mapped in such a way that the difference in gray levels between each pixel and its neighbors is minimal, then both the compression ratio and the convergence of the network can be improved. To achieve this, the cumulative distribution function [8], [9] is estimated for the image and used to map the image pixels. When the mapped image pixels are used, the Self-Organizing Feature Map yields a high compression ratio and converges quickly. Consider an image as a collection of random variables, each with cumulative distribution and density

    F_x(x) = Prob{X ≤ x}                                (6)
    p_x(x) = (d/dx) F_x(x)                              (7)

Now consider forming the following random variable:

    Y = F_x(X)                                          (8)

Here Y is the result of mapping the random variable X through its own cumulative distribution function. The cumulative distribution of Y is easily computed:

    F_y(y) = Prob{F_x(X) ≤ y} = Prob{X ≤ F_x^{-1}(y)} = F_x(F_x^{-1}(y))
           = 0 for y < 0;  y for 0 ≤ y ≤ 1;  1 for y > 1   (9)

This shows that Y has a uniform distribution on the interval (0,1). Therefore, the histogram of an image can be equalized by mapping the pixels through their cumulative distribution function F_x(x). In order to implement histogram equalization, the cumulative distribution function for the image is estimated from the image histogram. Let h(i) be the histogram of the image, formed by computing the number of pixels at gray level i. Typically the pixels take on the values i = 0, ..., L-1, where L = 256 is the number of discrete levels a pixel can take. The cumulative distribution function can then be approximated by
    F_x(i) = ( Σ_{j=0}^{i} h(j) ) / ( Σ_{j=0}^{L-1} h(j) )       (10)

Here the normalization term assures that F_x(L-1) = 1. By applying the concept of equation (6), a pixel X_s is equalized at the position s ∈ S, where S is the set of positions in the image:

    Y_s = F_x(X_s)                                      (11)

However, Y_s has a range from 0 to 1 and may not extend over the maximum number of gray levels. To correct this, we first compute the minimum and maximum values of Y_s:

    Y_min = min_{s ∈ S} Y_s                             (12)
    Y_max = max_{s ∈ S} Y_s                             (13)

and then use these values to form Z_s, a renormalized version of Y_s:

    Z_s = (L-1) (F_x(X_s) - Y_min) / (Y_max - Y_min)    (14)

The transformation from X_s to Z_s is histogram equalization.
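Equations (10)-(14) amount to the following few lines; this sketch assumes an 8-bit, non-constant input image and is our own illustration of the mapping.

```python
import numpy as np

def cdf_map(image, L=256):
    """image: 2-D uint8 array; returns the equalized (mapped) image Z."""
    h = np.bincount(image.ravel(), minlength=L)   # histogram h(i)
    F = np.cumsum(h) / h.sum()                    # CDF estimate, eq. (10)
    y = F[image]                                  # Y_s = F_x(X_s), eq. (11)
    ymin, ymax = y.min(), y.max()                 # eqs. (12) and (13)
    z = (L - 1) * (y - ymin) / (ymax - ymin)      # renormalized Z_s, eq. (14)
    return np.round(z).astype(np.uint8)
```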
Fig.2 Histogram of sample image
Fig.3 Histogram of Equalized image
Histogram equalization does not introduce new intensities into the image: existing values are mapped to new values, resulting in an image with fewer than the original number of intensities. Mapping the pixels by estimating the cumulative distribution function of the image results in correlation of the pixels, and due to the presence of similar pixel values within the small blocks of the image, the radius of the neighborhood gets reduced at the start of the training stage itself, resulting in quick convergence. Further, the frequency of occurrence of gray levels in the image is made more or less equal, or rather uniform, by the mapping, as can be seen in Fig. 3. Due to this, most of the image blocks are similar, and hence the learning time is reduced. In the encoding phase, due to the correlation of the data in the image blocks, the number of indices for the winner neurons is smaller, which further increases the net compression ratio. Similarly, in the decoding phase, since the indices are fewer, the time taken for the generation of weights, and consequently the time for decompression, is also reduced; as such, the total simulation time is very much reduced. In this neural network for image compression, since both the weight matrix and the equalized input pattern match with respect to the range of gray levels, the topology of the input pattern is maintained without much reorganization during the process; hence the reconstruction errors are small and the finer details of the image are preserved. The quantization error is traditionally related to all forms of vector quantization and clustering algorithms; however, since the mapping distributes the data sparsely, the quantization error is reduced. The quality of the decompressed image and the convergence time have been experimentally shown to be better than those achieved by conventional methods and by the same algorithm without mapping the image by the cumulative distribution function. The correlation of pixel values plays a vital role in augmenting the convergence of the neural network for image compression, and this is a simple mechanism compared to other existing transformation methods, which are computationally complex and lengthy.
5. Experimental Results
The performance of the Self-Organizing Feature Map for image compression with the concept of mapping the image by the cumulative distribution function has been tested on various types of standard images as well as on medical MRI scan images and satellite images. More than 15 experiments were conducted by varying the values of the parameters of the network, namely the gain sequence and the neighborhood. The neighborhood was initially chosen to be 5 x 5 and was reduced to 4 x 4, 3 x 3, 2 x 2 and 1 x 1, and the corresponding gain sequence was noted. It was observed that a gain sequence value of 0.35 produced good results and also resulted in quick convergence of the network. The parameters that produced the optimum values in respect of convergence time, compression ratio and image quality were adopted to test the new approach on various types of images. The results achieved for some sample images are given in Table 1; the corresponding values obtained without adopting the new approach are given in Table 2. The proposed approach resulted in quick convergence of the network. The quality of the decompressed images is also high: PSNR values up to 29.79 dB have been obtained by the proposed approach for a compression ratio of 4. Further, for PSNR values of 26 dB, compression ratios ranging from 6 to 7 were obtained, which is one and a half times the compression ratio obtained for the same PSNR value without the new approach of mapping by the cumulative distribution function. To give an idea of the performance of the proposed approach, the results are compared with those obtained from a similar compression scheme based on the discrete cosine transform of the original image, vector quantization by a Kohonen map, differential coding by a first-order predictor, and entropic coding of the differences [2]. That simulation was carried out on the Lena image, and PSNR values from 24 dB to 24.7 dB were achieved by that method. The new approach using the cumulative distribution function for mapping the image achieved a PSNR value of 28.91 dB for the Lena image. Further, the proposed approach is computationally simple compared to the hybrid method used for comparison. The original image, the image mapped by the cumulative distribution function and the decompressed image for the sample images tested are illustrated below; the visual quality of all the decompressed images, irrespective of the type and nature of the images, is better than that achieved by other conventional methods. The experimental results and the test images are appended separately after the References.

6. Conclusion
Self-Organizing Feature Maps for image compression have been used to compress various types of gray-scale images. The PSNR value obtained is 26.89 dB and the time taken for convergence is 320 seconds; by adopting the proposed approach, the PSNR achieved is 29.79 dB and the time taken for convergence is 185 seconds. The time taken for simulation has thus been reduced by nearly 50%. The performance of the Self-Organizing Feature Maps has been substantially improved by the proposed approach: mapping the image by the cumulative distribution function has helped the Self-Organizing Feature Map to converge easily and to achieve high compression ratios compared to previous techniques. The method can be extended to progressive compression of images by dynamically estimating the cumulative distribution function of the image in the regions of interest, which are thereafter compressed using the Self-Organizing Feature Maps.
References
[1] C. Amerijckx, J.-D. Legat and M. Verleysen, "Image Compression Using Self-Organizing Maps," Microelectronics Laboratory, Universite catholique de Louvain, 3 place du Levant, B-1348 Louvain-la-Neuve, Belgium, March 2002.
[2] C. Amerijckx, M. Verleysen, P. Thissen, and J.-D. Legat, "Image Compression by Self-Organized Kohonen Map," IEEE Transactions on Neural Networks, vol. 9, no. 3, May 1998.
[3] O. T.-C. Chen, B. J. Sheu, and W.-C. Fang, "Image compression using self-organization networks," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 480-489, Oct. 1994.
[4] E. Erwin, K. Obermayer and K. Schulten, "Self-organizing maps: ordering, convergence properties and energy functions," Biological Cybernetics, 67 (1992), 47-55.
[5] A. Laha, N. R. Pal, and B. Chanda, "Design of Vector Quantizer for image compression using Self-Organizing Feature Map and Surface Fitting."
[6] N. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen Self-Organizing Feature Maps," in IEEE Int. Conf. Neural Networks, San Diego, CA, vol. 1, 1988, pp. 101-108.
[7] P. Somervuo and T. Kohonen, "Self-Organizing Maps and Learning Vector Quantization for Feature Sequences," Helsinki University of Technology, Neural Networks Research Center, P.O. Box 2200, FIN-02015-HUT, Finland, 1998.
[8] R. C. Gonzalez, R. E. Woods, S. L. Eddins, Digital Image Processing Using Matlab. Pearson Prentice Hall, 2004.
[9] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 2nd Ed., PHI, 2005.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, Pearson Education, 2001.
[11] T. Kohonen, Self-Organizing Maps. Springer, Berlin, Heidelberg, Third Extended Edition, 2001.
Appendix

Table 1: Experimental results of the proposed approach

S.No   Image           CR    PSNR (dB)   Time (Sec)
1      Cameraman       4:1   29.78       148
2      Lena            4:1   28.91       182
3      Pepper          4:1   29.04       188
4      Fruits          4:1   29.79       185
5      Boat            4:1   29.12       178
6      Mandrill        4:1   29.68       161
7      Abdomen (MRI)   4:1   29.42       140
8      Thorax (MRI)    4:1   29.61       132
9      Satellite       4:1   29.11       160

Table 2: Experimental results without mapping

S.No   Image           CR    PSNR (dB)   Time (Sec)
1      Cameraman       4:1   26.89       320
2      Lena            4:1   26.11       365
3      Pepper          4:1   26.52       368
4      Fruits          4:1   26.96       361
5      Boat            4:1   26.75       342
6      Mandrill        4:1   27.89       318
7      Abdomen (MRI)   4:1   26.05       290
8      Thorax (MRI)    4:1   26.08       265
9      Satellite       4:1   26.32       315

(4a) Original image of Cameraman (4b) Image mapped by CDF (4c) Decompressed image by proposed method, PSNR = 29.78 dB, Convergence Time = 148 Sec
(5a) Original image of Lena (5b) Image mapped by CDF (5c) Decompressed image by proposed method, PSNR = 28.91 dB, Convergence Time = 182 Sec
(6a) Original image of Pepper (6b) Image mapped by CDF (6c) Decompressed image by proposed method, PSNR = 29.04 dB, Convergence Time = 188 Sec
(7a) Original image of Fruits (7b) Image mapped by CDF (7c) Decompressed image by proposed method, PSNR = 29.79 dB, Convergence Time = 185 Sec
(8a) Original image of Boat (8b) Image mapped by CDF (8c) Decompressed image by proposed method, PSNR = 29.12 dB, Convergence Time = 178 Sec
(9a) Original image of Mandrill (9b) Image mapped by CDF (9c) Decompressed image by proposed method, PSNR = 29.68 dB, Convergence Time = 161 Sec
(10a) Original image of Abdomen (10b) Image mapped by CDF (10c) Decompressed image by proposed method, PSNR = 29.42 dB, Convergence Time = 140 Sec
(11a) Original image of Thorax (11b) Image mapped by CDF (11c) Decompressed image by proposed method, PSNR = 29.61 dB, Convergence Time = 132 Sec
(12a) Original image of Satellite (12b) Image mapped by CDF (12c) Decompressed image by proposed method, PSNR = 29.11 dB, Convergence Time = 160 Sec
An Image compression Scheme based on Predictive and Interpolative Absolute Moment Block Truncation Coding
K. Somasundaram and I. Kaspar Raj
Department of Computer Science and Applications, Gandhigram Rural Institute - Deemed University, Gandhigram-624 302, Tamilnadu, India
[email protected], [email protected]
Abstract
In this paper an image compression scheme based on absolute moment block truncation coding (AMBTC), combining prediction, bit plane omission and interpolative techniques, is proposed. Experimental results show that the proposed image compression scheme achieves a low bit rate with better picture quality and lower computational complexity than similar schemes proposed recently.

Keywords: Block Truncation Coding, Image compression, lossy compression, bit plane

1. Introduction
Block Truncation Coding (BTC) is a simple image compression technique introduced by Delp and Mitchell [1]. BTC is based on the conservation of statistical properties. Although it is a simple technique, BTC has played an important role in the history of image compression, and many image compression techniques have been developed based on it [2]. Block truncation coding is a lossy compression method; it achieves 2 bits per pixel (bpp) with low computational complexity and is easy to implement compared to vector quantization [3] and transform coding [4], [5]. Lema and Mitchell [6] presented a variant of BTC called Absolute Moment Block Truncation Coding (AMBTC), which preserves the higher mean and lower mean of the sub-image blocks before and after compression. However, the bit rate achieved by both BTC and AMBTC is 2 bpp. In order to reduce the bit rate, several techniques such as median filtering [7], vector quantization [8], interpolation [9] and prediction [10] have been used to code the two statistical moments and the bit plane of BTC. Yung-Gi Wu [11] proposed a probability-based block truncation image bit plane coding. Yu-Chen Hu [12] presented a low bit-rate image compression scheme based on AMBTC giving a bit rate of up to 0.40 bpp. Thus, improvements on BTC continue to reduce the bit rate and computational complexity while keeping the image quality within acceptable limits. In this paper we propose an image compression scheme based on AMBTC. The scheme is a combination of three techniques: prediction, bit plane omission, and interpolative bit plane coding. Experimental results show that the proposed scheme gives good image quality with low computational complexity and a low bit rate. In Section 2 we briefly outline the BTC and AMBTC methods. We describe the proposed image compression scheme in Section 3. The experimental results are discussed in Section 4. Finally, we conclude the paper in Section 5.

2. Block Truncation Coding and AMBTC
In the BTC method, the given image is divided into a number of non-overlapping small blocks (normally 4 x 4 pixels). For each block the statistical moments the mean x and the standard deviation V are calculated using :
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad (1)$$

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad (2)$$

where $x_i$ represents the $i$th pixel value of the image block and $n$ is the total number of pixels in that block. The two values $\bar{x}$ and $\sigma$ are termed the quantizers of BTC. Taking $\bar{x}$ as the threshold value, a two-level bit plane is obtained by comparing each pixel value $x_i$ with the threshold: if $x_i < \bar{x}$ the pixel is represented by '0', otherwise by '1'. By this process each block is reduced to a bit plane. The bit plane along with $\bar{x}$ and $\sigma$ forms the compressed data. For example, a block of 4 x 4 pixels gives 32 bits of compressed data, amounting to 2 bpp.
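To make the BTC quantizer concrete, the following Python sketch (ours, not the authors' implementation) encodes a single block according to equations (1) and (2):

```python
import numpy as np

def btc_encode_block(block):
    """Plain BTC for one block: mean, standard deviation and bit plane (eqs. (1)-(2))."""
    x_bar = block.mean()                                  # eq. (1)
    sigma = np.sqrt(((block - x_bar) ** 2).mean())        # eq. (2)
    bit_plane = (block >= x_bar).astype(np.uint8)         # '1' where pixel >= mean
    return x_bar, sigma, bit_plane                        # 8 + 8 + 16 bits per 4x4 block

# Example 4x4 block of gray levels (2 bpp after coding)
block = np.array([[120, 130, 125, 140],
                  [118, 122, 138, 142],
                  [121, 119, 141, 139],
                  [117, 123, 137, 143]], dtype=float)
x_bar, sigma, bp = btc_encode_block(block)
```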
In the decoder an image block is reconstructed by replacing the '1's in the bit plane with H and the '0's with L, which are given by:

$$H = \bar{x} + \sigma\sqrt{\frac{p}{q}}, \qquad (3)$$

$$L = \bar{x} - \sigma\sqrt{\frac{q}{p}}, \qquad (4)$$

where p and q are the number of 0's and 1's in the compressed bit plane respectively.

Lema and Mitchell presented a simple and fast variant of BTC, named Absolute Moment BTC (AMBTC), that preserves the higher mean and lower mean of a block. In the AMBTC method, an image is divided into a number of non-overlapping blocks. The average gray level $\bar{x}$ of the pixels in a block is calculated using equation (1). The pixels in the block are then classified into two ranges of values: the upper range contains the gray levels greater than or equal to the block average $\bar{x}$, and the remaining pixels form the lower range. The mean $\bar{x}_H$ of the higher range and $\bar{x}_L$ of the lower range are calculated as:

$$\bar{x}_H = \frac{1}{k}\sum_{x_i \ge \bar{x}} x_i, \qquad (5)$$

$$\bar{x}_L = \frac{1}{n-k}\sum_{x_i < \bar{x}} x_i, \qquad (6)$$

where k is the number of pixels whose gray level is greater than or equal to $\bar{x}$. A two-level quantization is then performed on all the pixels in the block to form a bit plane b, in which '1' is stored for pixel values greater than or equal to $\bar{x}$ and the remaining pixels are represented by '0'. The encoder writes $\bar{x}_H$, $\bar{x}_L$ and b to a file. If $\bar{x}_H$ and $\bar{x}_L$ are represented by 8 bits each, the total number of bits required for a block is 8 + 8 + 16 = 32 bits; thus the bit rate of the AMBTC algorithm is also 2 bpp. In the decoder, an image block is reconstructed by replacing the '1's with $\bar{x}_H$ and the '0's with $\bar{x}_L$. AMBTC needs 16 bits to code the bit plane, the same as BTC, but requires less computation.

3. Proposed Scheme
The proposed compression scheme makes use of AMBTC together with prediction, bit plane omission and interpolative bit plane coding, and is therefore named the PBI_AMBTC image compression scheme.

Prediction technique
This technique makes use of the correlation existing among neighboring image blocks. The prediction technique finds a near-similar block among the neighboring left, upper, upper-left and upper-right blocks of the current block and replaces the current block by a 3-bit code of the matched block. The positions of the neighboring blocks for a current block are shown in Fig. 1. The similarity of the current block with a neighboring block is measured by the squared Euclidean distance between the two blocks,

$$d(x, y) = \sum_{i=1}^{n} (x_i - y_i)^2, \qquad (7)$$

where $x_i$ and $y_i$ are the corresponding $i$th pixels of the respective blocks.

Upper left block | Upper block   | Upper right block
Left block       | Current block |
Fig. 1 Position of the blocks used for prediction

Bit plane omission technique
In the bit plane omission technique, the bit plane obtained by the AMBTC method is omitted in the encoding procedure whenever the absolute difference between the higher mean $\bar{x}_H$ and the lower mean $\bar{x}_L$ of the block is less than a threshold value; only the block mean is retained. At decoding time, the bit-plane-omitted blocks are replaced by a block filled with the block mean.

Interpolative technique
In this technique, half (8 bits) of the bits in the AMBTC bit plane are dropped at encoding time, following the pattern in Fig. 2. In the decoding phase the dropped bits are recovered by taking the arithmetic mean of the adjacent pixel values, as given in equation (9).

X1  X2  X3  X4
X5  X6  X7  X8
X9  X10 X11 X12
X13 X14 X15 X16
Fig. 2 The pattern of dropping bits. The bold-faced bits (X2, X4, X5, X7, X10, X12, X13 and X15) are dropped.

$$\hat{x}_i = \begin{cases} \frac{1}{3}(x_{i-1} + x_{i+1} + x_{i+4}) & \text{for } i = 2 \\ \frac{1}{2}(x_{i-1} + x_{i+4}) & \text{for } i = 4 \\ \frac{1}{3}(x_{i-4} + x_{i+1} + x_{i+4}) & \text{for } i = 5 \\ \frac{1}{4}(x_{i-4} + x_{i-1} + x_{i+1} + x_{i+4}) & \text{for } i = 7, 10 \\ \frac{1}{3}(x_{i-4} + x_{i-1} + x_{i+4}) & \text{for } i = 12 \\ \frac{1}{2}(x_{i-4} + x_{i+1}) & \text{for } i = 13 \\ \frac{1}{3}(x_{i-4} + x_{i-1} + x_{i+1}) & \text{for } i = 15 \end{cases} \qquad (9)$$

This technique requires only 8 bits to store the bit plane.
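The interpolative recovery of equation (9) can be sketched in Python as follows (an illustrative rendering of ours; the function name and array layout are assumptions). Note that every neighbour referenced in equation (9) is a retained position, so no recovered bit depends on another dropped bit:

```python
import numpy as np

def interpolate_dropped(recon):
    """Recover the dropped samples of a 4x4 block via eq. (9).
    `recon` holds the reconstructed values (x_H / x_L) at the retained
    positions; dropped positions may hold anything on entry."""
    x = recon.flatten().astype(float)
    out = x.copy()
    # 1-based dropped indices (Fig. 2) -> neighbour offsets from eq. (9)
    neighbours = {
        2:  (-1, +1, +4),
        4:  (-1, +4),
        5:  (-4, +1, +4),
        7:  (-4, -1, +1, +4),
        10: (-4, -1, +1, +4),
        12: (-4, -1, +4),
        13: (-4, +1),
        15: (-4, -1, +1),
    }
    for i, offs in neighbours.items():
        out[i - 1] = np.mean([x[i - 1 + o] for o in offs])
    return out.reshape(4, 4)
```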
The steps involved in the compression scheme are as follows:
1. Divide the given image into a set of non-overlapping blocks of size n = 4 x 4 pixels.
2. Encode the blocks in left-to-right, top-to-bottom sequence.
3. If the block is in the first row or first column, go to step 5.
4. To check the similarity of the current block (cb) with the left block (lb), upper block (ub), upper left block (ulb) and upper right block (urb), calculate the squared Euclidean distances dl = d(cb,lb), du = d(cb,ub), dur = d(cb,urb) and dul = d(cb,ulb) using equation (7). Set the threshold value Th1 and find dmin = minimum(du, dl, dur, dul).
   IF dmin < Th1 THEN
      Case dmin = dl: encode the current block as 000
      Case dmin = du: encode the current block as 001
      Case dmin = dul: encode the current block as 010
      Case dmin = dur: encode the current block as 011
      End Case
      Go to step 9.
   ELSE go to step 5.
   END IF
5. Compute the block mean $\bar{x}$, lower mean $\bar{x}_L$ and higher mean $\bar{x}_H$ for the block.
6. Fix a threshold value Th2. If $\bar{x}_H - \bar{x}_L \le$ Th2, encode the current block with the block mean $\bar{x}$, put '10' as an indicator prefix code, and go to step 9.
7. Construct the bit plane by replacing the pixels with values greater than or equal to the mean $\bar{x}$ by '1' and the rest of the pixels by '0'.
8. Drop the pattern of bits shown in Fig. 2. Encode the block by the remaining bits along with the lower mean $\bar{x}_L$ and higher mean $\bar{x}_H$, with '11' as an indicator prefix code.
9. Go to step 3 until all the blocks are processed.

The numbers of bits needed to represent a block of 4x4 pixels in each technique used in this scheme are given in Table 1. Only one technique is operative in any given block.

Table 1: Bits needed to store an image block in the different techniques

Field       | Predictive | Bit plane omission | Interpolative
Indicator   | 3          | 2                  | 2
Block mean  | -          | 8                  | -
Bit plane   | -          | -                  | 8
H & L means | -          | -                  | 6+6
Total bits  | 3          | 10                 | 22

In the bit plane coding using the interpolative technique, 6 bits are used to code each of the higher and lower means of AMBTC.
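The per-block decision logic of steps 3-9 can be summarized with the following Python sketch (ours, not the authors' code; the helper names are hypothetical, and coding the two means with their 6 most significant bits is an assumption consistent with the 6+6 bits of Table 1):

```python
import numpy as np

def encode_block(cb, neighbours, th1=800, th2=20):
    """PBI_AMBTC per-block decision (a sketch). `neighbours` maps the 3-bit
    codes '000'..'011' to the left/upper/upper-left/upper-right blocks
    (None at the image borders, where prediction is skipped)."""
    # Step 4: prediction -- 3-bit code of the best-matching causal neighbour
    dists = {code: ((cb - nb) ** 2).sum()
             for code, nb in neighbours.items() if nb is not None}
    if dists:
        code, dmin = min(dists.items(), key=lambda kv: kv[1])
        if dmin < th1:
            return ('predict', code)                       # 3 bits
    # Step 5: AMBTC moments
    x_bar = cb.mean()
    hi, lo = cb[cb >= x_bar], cb[cb < x_bar]
    x_h = hi.mean()
    x_l = lo.mean() if lo.size else x_bar
    # Step 6: bit plane omission -- indicator '10' + 8-bit block mean
    if x_h - x_l <= th2:
        return ('omit', '10', int(round(x_bar)))           # 10 bits
    # Steps 7-8: interpolative AMBTC -- '11' + 8 kept bits + 6+6-bit means
    bit_plane = (cb >= x_bar).astype(np.uint8).flatten()
    kept = np.delete(bit_plane, [1, 3, 4, 6, 9, 11, 12, 14])  # Fig. 2 pattern
    return ('interp', '11', kept, int(x_h) >> 2, int(x_l) >> 2)  # 22 bits
```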
4. Results and Discussion
To evaluate the performance of the proposed image compression scheme, we took six standard monochrome images of size 512 x 512 pixels, "Jet", "Truck", "Girl", "Lena", "Peppers" and "Zelda", which are shown in Fig. 3(a) to Fig. 3(f).

Fig. 3 Standard images used for the experiment: (a) Jet, (b) Truck, (c) Girl, (d) Lena, (e) Peppers, (f) Zelda

To implement this algorithm, suitable values for the thresholds Th1 and Th2 are to be selected. Using the results in Ref. [14], we have taken Th2 = 20 for the bit plane omission technique. To find the optimum threshold Th1 for the prediction technique, we experimented with Th2 = 20 and different values of Th1 on the six standard images.
The results obtained in terms of bpp and PSNR are given in Table 2. We found that the reconstructed image quality using a large threshold is worse than that for a small threshold. Using the results given in Table 2, we decided to take Th1 = 800 for the prediction technique and Th2 = 20 for the bit plane omission technique, so as to get a reasonably good quality picture with low bpp. For performance comparison, we also used the AMBTC method, the scheme of Yu-Chen Hu [12] (YCH[12]) and the scheme of Yu-Chen Hu [13] (YCH[13]). The computed PSNR and bpp values for the different methods are given in Table 3. We note from Table 3 that PBI_AMBTC gives better PSNR values and lower bpp than the YCH schemes. Our PBI_AMBTC at threshold values (1600, 20) gives higher PSNR values than the YCH[12] method at the same bpp. Even though the difference is small, the reconstructed images of YCH[12] have a blocky appearance because of its quad-tree segmentation technique, which starts at 16x16 pixel blocks; the reconstructed images of PBI_AMBTC have no blocky appearance, as the block size is 4x4 pixels. Bit plane coding using 64 edge patterns was used as the final-stage technique in the YCH[12] scheme, whereas our PBI_AMBTC scheme uses the interpolative technique in the final stage, which improves the quality of the reconstructed image. The PBI_AMBTC method also achieves better quality images, in terms of PSNR values, at a lower bit rate with thresholds (Th1, Th2) = (1600, 20) than the YCH[13] scheme. The PBI_AMBTC method has an average bit rate of 0.45 bpp with an average PSNR value of 31.46 dB at (Th1, Th2) = (800, 20), while YCH[13] has an average bit rate of 0.57 bpp with an average PSNR value of 29.97 dB at (Th1, Th2) = (800, 3). Hence one can obtain better quality images with better compression when compared to the YCH schemes. YCH[13] is based on BTC and uses bit map coding with 64 predefined bit plane patterns, whereas our PBI_AMBTC is based on AMBTC and uses the interpolative technique, which improves the quality at low bit rate compared to the YCH[13] scheme. We note from Table 2 that PBI_AMBTC gives a lower bit rate of 0.39 bpp and a higher PSNR value for the Girl image than the AMBTC method. Therefore, by appropriate selection of the threshold values in this scheme, compression down to 0.29 bpp is achievable, and an average PSNR value of 31.46 dB at an average bit rate of 0.45 bpp can be achieved with the PBI_AMBTC image compression scheme. The reconstructed images of the PBI_AMBTC scheme, with (Th1, Th2) = (800, 20), are given in Fig. 4(a) to Fig. 4(f).
Fig. 4 The reconstructed standard images using PBI_AMBTC: (a) Jet, (b) Truck, (c) Girl, (d) Lena, (e) Peppers, (f) Zelda
5. Conclusion
A low bit rate image compression scheme, combining prediction, bit plane omission and interpolative techniques and extending AMBTC, has been proposed. Experimental results of our scheme on standard images show that an average bit rate of 0.45 bpp with an average PSNR value of 31.46 dB is achievable. Our scheme gives lower bpp and higher PSNR values than the methods of YCH[12] and YCH[13]. Even though the reconstructed image quality of the proposed scheme is lower than that of JPEG or JPEG2000 at the same bit rate, our scheme has low computational complexity; it may therefore be useful for low-cost mobile devices which have little computational power for handling images.
Table 2: PSNR and bpp values for different threshold values (Th1, Th2)

Images  | (200,20)   | (400,20)   | (800,20)   | (1200,20)  | (1600,20)  | (2000,20)
        | bpp  PSNR  | bpp  PSNR  | bpp  PSNR  | bpp  PSNR  | bpp  PSNR  | bpp  PSNR
Jet     | 0.54 30.53 | 0.50 30.45 | 0.47 30.30 | 0.45 30.15 | 0.44 29.87 | 0.43 29.72
Truck   | 0.69 30.97 | 0.63 30.92 | 0.56 30.79 | 0.52 30.63 | 0.48 30.44 | 0.45 30.21
Girl    | 0.45 34.63 | 0.39 34.37 | 0.34 33.89 | 0.31 33.42 | 0.30 32.85 | 0.29 32.47
Lena    | 0.61 31.00 | 0.54 30.93 | 0.49 30.77 | 0.45 30.59 | 0.43 30.41 | 0.42 30.19
Peppers | 0.62 30.54 | 0.52 30.47 | 0.45 30.29 | 0.42 30.10 | 0.40 29.88 | 0.38 29.71
Zelda   | 0.57 33.31 | 0.48 33.13 | 0.40 32.72 | 0.37 32.31 | 0.35 31.94 | 0.33 31.62
Average | 0.58 31.83 | 0.51 31.71 | 0.45 31.46 | 0.42 31.20 | 0.40 30.90 | 0.38 30.65
Table 3: PSNR and bpp values for different methods on the standard images

Images  | AMBTC      | YCH[12] (Th1,Th2,Th3 = 10,15,20) | YCH[13] (Th1,Th2 = 800,3) | PBI_AMBTC (1600,20) | PBI_AMBTC (800,20)
        | BPP  PSNR  | BPP  PSNR                        | BPP  PSNR                 | BPP  PSNR           | BPP  PSNR
Jet     | 2.0  31.42 | 0.43 28.78                       | 0.51 27.69                | 0.44 29.87          | 0.47 30.30
Truck   | 2.0  34.42 | 0.51 29.82                       | 0.79 29.91                | 0.48 30.44          | 0.56 30.79
Girl    | 2.0  33.95 | 0.26 32.85                       | 0.42 33.19                | 0.30 32.85          | 0.34 33.89
Lena    | 2.0  33.25 | 0.44 29.10                       | 0.57 28.47                | 0.43 30.41          | 0.49 30.77
Peppers | 2.0  33.44 | 0.42 29.28                       | 0.61 28.54                | 0.40 29.88          | 0.45 30.29
Zelda   | 2.0  36.74 | 0.35 31.41                       | 0.54 32.03                | 0.35 31.94          | 0.40 32.72
Average | 2.0  33.87 | 0.40 30.21                       | 0.57 29.97                | 0.40 30.90          | 0.45 31.46
References
1. E.J. Delp and O.R. Mitchell, "Image Compression using Block Truncation Coding", IEEE Transactions on Communications, Vol. 27, pp. 1335-1342, September 1979.
2. P. Franti, O. Nevalainen and T. Kaukoranta, "Compression of digital images by block truncation coding: a survey", The Computer Journal, Vol. 37, No. 4, pp. 308-332, 1994.
3. N.M. Nasrabadi and R.B. King, "Image coding using vector quantization: a review", IEEE Transactions on Communications, Vol. COM-36, pp. 957-971, 1988.
4. W.B. Pennebaker and J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
5. M. Rabbani and R. Joshi, "An overview of the JPEG 2000 still image compression standard", Signal Processing: Image Communication, Vol. 17, pp. 3-48, 2002.
6. M.D. Lema and O.R. Mitchell, "Absolute Moment Block Truncation Coding and its Application to Color Images", IEEE Transactions on Communications, Vol. 32, pp. 1148-1157, 1984.
7. G.R. Arce and N.C. Gallagher, Jr., "BTC image coding using median filter roots", IEEE Transactions on Communications, Vol. 31, No. 6, pp. 784-793, 1983.
8. V. Udpikar and J. Raina, "BTC image coding using vector quantization", IEEE Transactions on Communications, Vol. 35, pp. 352-356, 1987.
9. B. Zeng and Y. Neuvo, "Interpolative BTC image coding with vector quantization", IEEE Transactions on Communications, Vol. 41, pp. 1436-1438, 1993.
10. Y.V. Ramana and C. Eswaran, "A new algorithm for BTC image bit plane coding", IEEE Transactions on Communications, Vol. 43, No. 6, pp. 2010-2011, June 1995.
11. Yung-Gi Wu, "Block Truncation image bit plane coding", Optical Engineering, SPIE, Vol. 41, No. 10, pp. 2476-2478, 2002.
12. Yu-Chen Hu, "Low complexity and low bit-rate image compression scheme based on Absolute Moment Block Truncation Coding", Optical Engineering, SPIE, Vol. 42, No. 7, pp. 1964-1975, 2003.
13. Yu-Chen Hu, "Predictive moment preserving block truncation coding for gray level image compression", Journal of Electronic Imaging, Vol. 13, No. 4, pp. 871-877, 2004.
14. K. Somasundaram and I. Kaspar Raj, "Low Computational Image compression scheme based on Absolute Moment Block Truncation Coding", Enformatika Transactions on Engineering, Computing and Technology, Vol. 13, pp. 184-190, May 2006, ISSN 1305-5313.

K. Somasundaram received his Ph.D. degree from the Indian Institute of Science, Bangalore, India, in 1984, the M.Sc. degree from Madras University, Tamilnadu, India, in 1976, and the PGDCM from Madurai Kamaraj University, Tamilnadu, India, in 1989. He is a Professor and Head of the Department of Computer Science and Applications of the Gandhigram Rural University, Gandhigram, Tamilnadu, India. His areas of research include multimedia, data compression, multimedia for teaching, PC-based instrumentation and computational plasma physics.

I. Kaspar Raj received the M.Sc. degree in Statistics from Madras University, Madras, Tamilnadu, India, in 1984 and his PGDCM from Madurai Kamaraj University, Tamilnadu, India, in 1989. He is currently doing his Ph.D. in the Department of Computer Science and Applications, Gandhigram Rural University, Gandhigram, Tamilnadu, India. His area of research is image compression.
Image Compression using Lifting and Vector Quantization A.Vasuki* and P.T.Vanathi** * Dept. of Electronics and Communication, Kumaraguru College of Technology, Coimbatore – 641006. India. ** Dept. of Electronics and Communication, PSG College of Technology, Coimbatore – 641004. India.
[email protected],
[email protected]

Abstract
Image compression is an important aspect of multimedia data transmission, especially through band-limited, time-varying channels. The cost and limited bandwidth of wireless channels have made data compression a necessity. Moreover, due to the constraints of wireless channels, progressive image transmission has gained widespread acceptance. Wavelet-based image coding lends itself naturally to progressive transmission, where the important low frequency information in the image is transmitted first, followed by the less important high frequency details. In this paper, the coding of images for progressive transmission is addressed by applying a lifting wavelet transform and vector quantization of the transform coefficients. The lifting wavelet transform has an advantage over the ordinary wavelet transform by way of a reduction in the memory required for its implementation, made possible because the lifting transform uses in-place computation: the lifting coefficients replace the image samples in their respective memory locations. The Haar wavelet has been used here because of its simplicity, but the method can be extended to other wavelets that may give better results. The performance of the system has been evaluated based on the compression ratio, bits per pixel and the peak signal-to-noise ratio with different codebook sizes for the various subbands.

Keywords: Image Compression, Progressive, Wavelets, Lifting, Haar, Vector Quantization, Compression Ratio, Bits-Per-Pixel, Peak Signal-to-Noise Ratio.

1. Introduction
Compression refers to the representation of any data in compact form. It is achieved by exploiting the redundancy present in the data. Transmission and storage of multimedia data like text, speech, images, etc. is made efficient by compression. Compression can be classified into lossless or lossy, depending on whether all the information is retained or some of it is discarded during the compression process [1]. In the case of lossless compression, the recovered data is identical to the original, whereas in the case of lossy compression, the recovered data is a close replica of the original. For data like text (e.g. bank records), even a change of a single character can be disastrous. Similarly, for medical or satellite images, any loss during compression can lead to artifacts in the reconstruction that may give a wrong interpretation; such applications therefore require lossless compression. Lossy compression can be used for signals like speech, natural images, etc., where the amount of loss in the data determines the quality of the reconstruction but does not change the information content. More compression is achieved with lossy compression than with lossless compression. Any compression algorithm is acceptable provided a corresponding decompression algorithm exists.

Images contain high redundancy due to the correlation between adjacent pixels, and this makes them suitable for compression [2]. Redundancy removal in images can be achieved by using a transform that operates on the image, resulting in a matrix of coefficients that are not correlated with each other. The image is sparsely represented by the coefficients of the transformation, which leads to compression.

The different compression algorithms can be compared based on certain performance measures. The Compression Ratio (CR) is defined as the ratio of the number of bits required to represent the data before compression to the number of bits required after compression. The rate is the average number of bits per sample, or per pixel (bpp) in the case of an image. Distortion is quantified by the Mean Square Error (MSE), the average value of the square of the error between the original signal and the reconstruction. The important parameter that indicates the quality of the reconstruction is the peak signal-to-noise ratio (PSNR), defined as the ratio of the square of the peak value of the signal to the mean square error, expressed in decibels.
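As a small self-contained illustration of these measures (our sketch, not part of the original paper):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB: squared peak value over the mean square error."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(bits_before, bits_after):
    """CR: bits before compression over bits after compression."""
    return bits_before / bits_after

def rate_bpp(bits_after, height, width):
    """Rate in bits per pixel for an image coded with `bits_after` bits."""
    return bits_after / (height * width)
```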
In this paper, images have been compressed using the lifting wavelet transform with VQ of the resulting coefficients. The idea behind this is that progressive compression of images is made possible, and more compression can be achieved by using VQ since some loss can be tolerated. The lifting wavelet transform is a subband coding technique, and it exploits the fact that not all subbands carry information of equal importance. Lifting uses in-place computation and so reduces memory requirements. It is a fast method, since all operations within a lifting step can be done in parallel, and an integer-to-integer transform is possible. It is therefore suitable for real-time applications, especially hardware implementations, as well as for lossless compression. Existing wavelet-based progressive compression algorithms like Embedded Zerotree Coding and Set Partitioning in Hierarchical Trees take more time for encoding compared to lifting with VQ; for applications that require faster encoding and decoding, this algorithm is suitable. Moreover, adaptive wavelet transforms are possible using lifting.

Wavelet-based techniques are the latest development in the field of image compression. They offer a multiresolution capability that is not available in the other methods. The wavelet transform analyzes a signal in time and scale. The low frequency components in the signal that are spread out in time, as well as the high frequency components that are localized in time, are captured by a variable-length window [3]. The window is shifted by different units of time in a discrete manner, thus covering the entire signal. Lifting is a technique for constructing second generation wavelets entirely in the spatial domain [4]. The first generation wavelets are translates and dilates of a single mother wavelet, and Fourier techniques are essential in their construction; the lifting scheme does not use the Fourier technique [5]. It is a very fast method compared to the first generation wavelets. Moreover, the inverse transform is obtained by replacing addition with subtraction and reversing the operations in the forward transform.
In the paper by M.Antonini, et al. [8], images have been coded using two-level wavelet decomposition with VQ of the resulting coefficients. A multiresolution codebook has been designed with noise-shaping bit allocation among the various subbands. The test image has been coded at a rate of 0.78bpp, achieving a PSNR of 32.1dB. In the paper by Gersho and Ramamurthy [9], images have been compressed using unstructured VQ, achieving a bitrate of 0.5 – 1.5bpp. Ho and Gersho [10] have used multistage VQ for progressive image coding, with a PSNR of 30.93dB at 0.36bpp using 4 stages. R.L.Claypoole et al. [11] have coded images using nonlinear wavelet transform via lifting and obtained 30.5dB at 0.6bpp. An adaptive lifting scheme with perfect reconstruction has been used in [12].
The goal of quantization is to encode the data from a source, with some loss, so that the best reproduction is obtained. Vector quantization (VQ) achieves more compression than scalar quantization [6], making it useful for band-limited channels. The algorithm for the design of an optimal VQ is commonly referred to as the Linde-Buzo-Gray (LBG) algorithm, and it is based on minimization of the squared-error distortion measure. The LBG algorithm starts with an initial codebook and iteratively partitions the training sequence into Voronoi regions to obtain a new codebook that produces a lower distortion. Once the final codebook is obtained, it can be used on new data outside the training sequence with the optimum nearest neighbor rule. If the training sequence is sufficiently long, it yields good performance for future data produced by the source.
In Discrete Cosine Transform (DCT) based methods, the image is transformed to pack the energy into a few coefficients, but the image has to be subdivided, which produces blocking artifacts. Principal Component Analysis (PCA) is another transformation technique that is optimal in minimizing the MSE [7], but it is image-dependent and more computationally intensive. In the proposed method, a lifting wavelet transform is used that does not subdivide the image, so no blocking artifacts appear in the reconstruction; moreover, it is computationally efficient compared to the other transform methods. The novelty of this paper is that it combines the speed and computational efficiency of lifting with the compression achieved by VQ.

This paper is organized as follows. Section 2 gives the basics of the wavelet transform. Section 3 deals with the lifting technique and Section 4 covers vector quantization. Section 5 describes the image compression methodology used in this paper. The results are given in Section 6 and the paper concludes with Section 7.
2. Wavelet Transform The wavelet transform is a powerful tool for analysis and compression of signals. It uses two functions – scaling and wavelet functions – for the transform. The wavelet function is equivalent to a high pass filter and produces the high frequency components (details) of the signal at its output [13]. The scaling function is equivalent to a low pass filter and passes the low frequency components (approximations) of the signal. The wavelet coefficients are retained and represent the details in the signal. The scaling coefficients are decomposed further using another set
of low pass and high pass filters. This sort of decomposition is called pyramidal decomposition [14]. The wavelet transform can be implemented with different types of wavelets, where the type of wavelet determines the filter design [15].

Wavelet transform coefficients represent information in specific locations of the image [16]. The building blocks (wavelet bases) must be smooth in order to code smooth regions of an image. If the wavelet contains discontinuities, coefficient quantization errors cause the discontinuities to appear as artifacts in the smooth regions of the decoded image; the proper choice of wavelet for the application can reduce these artifacts to some extent. It is possible to build wavelets that have a smooth, compact support. Edges or sharp changes in the image produce wavelet coefficients with large amplitudes, so the wavelet should have a small support and the filter length must be kept small. The choice of optimal wavelet bases depends on the smoothness, accuracy of approximation, size of support, filter frequency selectivity, etc.

3. Lifting
The simplest lifting scheme is the lazy wavelet transform, where the input signal is first split into even- and odd-indexed samples:

(Odd_{j-1}, Even_{j-1}) = Split(s_j).

When the samples are correlated, it is possible to predict one from the other. The odd samples are predicted from the even samples, which, in the case of the Haar transform, are the even values themselves. The difference between the actual odd samples and the prediction becomes the wavelet coefficients; obtaining these differences from the prediction is called a lifting step. The update step follows the prediction step: the even values are updated from the input even samples and the updated odd samples. They become the scaling coefficients, which are passed on to the next stage of the transform. This is the second lifting step:

d_{j-1} = Odd_{j-1} - P(Even_{j-1}),
s_{j-1} = Even_{j-1} + U(d_{j-1}).

The odd elements are replaced by the differences and the even elements by the averages. The computations in the lifting scheme are done in place, which saves a lot of auxiliary memory and computation time. The lifting scheme can be used to produce integer coefficients, and so it is exactly reversible. The averages are coarser representations of the signal, and the differences are the extra information required to go from the coarser representation to the original [17]. If the signal is smooth, the detail components will be small and can be ignored; the coarse representation is then an approximation of the original and also a compressed version of it with some distortion. The process can be repeated with the average component of the signal as the input to the next stage, till the number of samples reduces to one. This last sample represents the average component of the entire image. The total number of coefficients after the transform equals the number of coefficients before the transform. The forward and inverse lifting transforms are shown in Figures 1(a) and 1(b).

Figure 1(a). Forward lifting transform: Split, then Predict (giving details d_{j-1}), then Update (giving approximations s_{j-1})

The inverse transform gets back the original signal by exactly reversing the operations of the forward transform, finally with a merge in place of the split operation. The number of samples in the input signal must be a power of two, and in each succeeding step it is reduced by half, till the last step produces one sample:

Even_{j-1} = s_{j-1} - U(d_{j-1}),
Odd_{j-1} = d_{j-1} + P(Even_{j-1}),
s_j = Merge(Even_{j-1}, Odd_{j-1}).

Figure 1(b). Inverse lifting transform: Update, then Predict, then Merge reconstructing s_j

The Haar wavelet transform uses predict and update operations of order one; it exploits the correlation between adjacent samples. Using predict and update operations of order higher than one, other wavelet transforms can be built using the lifting scheme.
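A minimal Python sketch of one level of the lifted Haar transform described above (ours; for images the same steps would be applied along rows and then columns):

```python
import numpy as np

def haar_lift_forward(s):
    """One level of the lifted Haar transform on a 1-D signal of even length.
    Predict: odd samples are predicted by the even ones; Update: the evens
    become pairwise averages. Returns (approximations, details)."""
    even, odd = s[0::2].astype(float), s[1::2].astype(float)
    d = odd - even          # d_{j-1} = Odd - P(Even), with P(Even) = Even
    a = even + d / 2.0      # s_{j-1} = Even + U(d), with U(d) = d/2 -> averages
    return a, d

def haar_lift_inverse(a, d):
    """Exactly reverse the forward steps, then merge even/odd samples."""
    even = a - d / 2.0
    odd = d + even
    s = np.empty(even.size + odd.size)
    s[0::2], s[1::2] = even, odd
    return s

x = np.array([2.0, 4.0, 6.0, 8.0])
a, d = haar_lift_forward(x)     # a = [3, 7] (averages), d = [2, 2] (differences)
assert np.allclose(haar_lift_inverse(a, d), x)
```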
4. Vector Quantization In VQ, k successive source samples are quantized together, since more compression can be achieved by coding vectors. VQ is characterized by a kdimensional partition, k-dimensional codebook and an assignment of binary codewords to the cells of the partition [18]. The design and performance of the quantizer depends significantly on the method used to
extract the vectors. In images, the vector is obtained by parsing the image into rectangular blocks of size m x n, where k = m·n. For a fixed dimension k, the squared-error distortion measure is defined as

$$d(x, \hat{x}) = \sum_{i=1}^{k} |x_i - \hat{x}_i|^2,$$

where x is the source vector and $\hat{x}$ is the reproduction vector. The input to a k-dimensional VQ is x = (x_1, x_2, ..., x_k) from some alphabet A ⊂ R^k. The three components of a quantizer are: a lossy encoder α: A → I, where the index set I is a collection of consecutive integers; a reproduction decoder β: I → Â, where Â ⊂ R^k is the reproduction alphabet; and a lossless encoder γ: I → M, an invertible mapping into a collection M of binary codewords. The set M consists of 2^R binary codewords; then R = ⌈log₂ M⌉ bits. The rate of the quantizer is R bits per vector, or R/k bits per pixel. The quantization rule is the function q(x) = β(α(x)), or q(x) = β(i) where x ∈ S_i; {S_i; i ∈ I} is the partition of the source alphabet A, with S_i = {x : α(x) = i}.

The codebook design algorithm for the case where a training set is available is as follows:
(1) Start with a set of training vectors {x_n}, n = 1, ..., N, and an initial set of reconstruction values {y_i^(0)}, i = 1, ..., M. Set the iteration number q = 0 and D^(0) = 0, and select a threshold ε.
(2) The quantization regions {V_i^(q)}, i = 1, ..., M, are given by
    V_i^(q) = {x_n : d(x_n, y_i) < d(x_n, y_j) for all j ≠ i},  i = 1, 2, ..., M.
(3) Compute the average distortion D^(q) between the training vectors and the representative reconstruction values.
(4) If (D^(q) - D^(q-1)) / D^(q) < ε, stop. Otherwise, continue.
(5) Set q = q + 1. Find new reconstruction values {y_i^(q)}, i = 1, ..., M, that are the averages of the elements in each of the quantization regions V_i^(q-1). Go to step 2.

The algorithm is continued till the distortion falls below the small threshold ε, which can be selected arbitrarily; in the original paper by Linde et al. [19], the distortion threshold can be any value ≥ 0. The necessary condition [20] for a quantizer to be optimal is that it be a fixed-point quantizer, and for finite alphabet distributions the algorithm always converges to a fixed-point quantizer in a finite number of steps. In this algorithm the distortion does not increase from one iteration to the next, but it may not yield an optimum solution, and the convergence depends on the initial conditions. So a splitting technique has been proposed for initializing the design algorithm. A one-level vector quantizer is designed with a codebook of size one, which is the average of all the training vectors in the set. The initial codebook for a two-level vector quantizer is obtained by retaining the original output and adding a fixed perturbation vector ε to it. After convergence, the two-level VQ is used to yield a four-level VQ in a similar manner, and the procedure is repeated till a codebook of the required size is obtained. The codebook size increases exponentially with the rate of the quantizer. Increasing the size of the codebook increases the quality of the reconstruction, but the encoding time also increases due to the increase in the computations required to find the closest match; so there is a tradeoff between the codebook size and the PSNR obtained.
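A compact Python sketch of the LBG design with splitting initialization as described above (ours, not the authors' MATLAB code; parameter names are assumptions):

```python
import numpy as np

def lbg(training, M, eps=1e-3, perturb=1e-2):
    """LBG codebook design with splitting initialisation.
    `training` is an (N, k) array; M is the final codebook size (power of 2)."""
    codebook = training.mean(axis=0, keepdims=True)  # one-level VQ: global mean
    while codebook.shape[0] < M:
        # Splitting: each codevector yields itself +/- a fixed perturbation
        codebook = np.vstack([codebook - perturb, codebook + perturb])
        prev_dist = np.inf
        while True:
            # Nearest-neighbour partition of the training set
            d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)
            dist = d2[np.arange(len(training)), nearest].mean()
            if prev_dist < np.inf and (prev_dist - dist) / max(dist, 1e-12) < eps:
                break
            prev_dist = dist
            # Centroid update: average of each region (keep old vector if empty)
            for i in range(codebook.shape[0]):
                members = training[nearest == i]
                if members.size:
                    codebook[i] = members.mean(axis=0)
    return codebook

# Example: a 4-level codebook for 2x2 blocks (k = 4) from random training data
rng = np.random.default_rng(0)
cb = lbg(rng.normal(size=(500, 4)), M=4)
```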
5. Present Work
In this paper, the given image, whether color or monochrome, is converted into an indexed image with a linear colormap. The indexed image is transformed using the lifted Haar wavelet, up to three levels of decomposition. The resulting transform coefficients are encoded using unstructured, full-search VQ. The codebook has been designed using the iterative LBG algorithm, initialized with the splitting technique. The images 'coastline' and 'boat' have been used for testing.

The lifting coefficients at each resolution for every level of decomposition are encoded with a separate codebook. At each resolution, three subcodebooks have been designed for encoding the horizontal, vertical and diagonal details separately. The codebooks have been designed with a sample training set of ten images. The minimum size of the subcodebooks is 4 and the maximum size is 2048. The codebook size becomes optimal when there is not much increase in PSNR with a further increase in codebook size. The number of bits required for each code vector is ⌈log₂ M⌉, where M is the size of the respective subcodebook. The decoder uses the same set of codebooks as the encoder. The encoder-decoder will work for any image within the training set used to design the codebook; if the training set is extended, it can be used for other images outside the set also. Designing the same codebook at the receiver makes it unnecessary to transmit the codebook along with the image. The decoder uses table lookup to find the reproduction code vector, with the index received as
the address of the codebook. The reproduction code vectors for each of the indices transmitted are concatenated to form the reproduction matrix of the lifting coefficients of the image. The inverse lifting transform is then applied to reconstruct the image. The most important low frequency information present in the approximation subband is coded first. Then the horizontal, vertical and diagonal details are coded, in that order. This process is repeated for every resolution in the case of level 2 and level 3 decompositions. The encoding / decoding can be terminated at any time, with the best reproduction obtained up to that point. This is made possible because of the progressive nature of the coding algorithm.
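The progressive ordering just described can be made explicit with a small sketch (ours; the coarse-to-fine traversal across resolutions is an assumption consistent with coding the approximation subband first):

```python
def subband_order(levels):
    """Transmission order for progressive coding: the coarsest approximation
    first, then horizontal, vertical and diagonal details per resolution."""
    order = [('approx', levels)]
    for level in range(levels, 0, -1):
        order += [('horizontal', level), ('vertical', level), ('diagonal', level)]
    return order

# For a level-3 decomposition:
# [('approx', 3), ('horizontal', 3), ..., ('diagonal', 1)]
print(subband_order(3))
```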
The quality of the reproduced image and the performance of the system have been evaluated using the codebook size, CR, bpp and PSNR. The performance evaluation has been done for three levels of decomposition, level 1, level 2 and level 3, and the results for the three levels are compared.

6. Results
The results of coding the coastline image are given below. The size of the image taken is 256 x 256. The vector size is 2x2 for the approximation coefficients and 4x4 for the detail coefficients. The CR, bpp and PSNR (dB) of the decoded image for different codebook sizes are tabulated in Table 1, Table 2 and Table 3 for levels 1, 2 and 3 respectively. The graph of PSNR (dB) vs bpp for each of the three levels of decomposition, with an equal number of bits allotted for every subband, is shown in Figure 2(a); Figure 2(b) shows the PSNR (dB) vs bpp graph with an unequal number of bits allotted for the various subbands. The graphs have been plotted for the coastline image. The original and reconstructed coastline image for level 3 decomposition, with each subcodebook of size 2048, is shown in Figures 3(a) and (b). Similarly, Figures 4(a) and (b) show the original and reconstruction for the boat image.

Table 1: Level 1 decomposition
Codebook Size     | CR    | bpp    | PSNR (dB)
Approx.  Details  |       |        |
4        4        | 36.57 | 0.2188 | 18.61
8        8        | 24.38 | 0.3281 | 22.51
16       16       | 18.29 | 0.4375 | 23.95
32       32       | 14.63 | 0.5469 | 24.72
64       64       | 12.19 | 0.6563 | 25.40
128      128      | 10.45 | 0.7656 | 26.08
256      256      | 9.14  | 0.8750 | 26.82
512      512      | 8.13  | 0.9844 | 27.74
1024     1024     | 7.31  | 1.0938 | 29.00
1024     512      | 7.64  | 1.0469 | 28.21
1024     256      | 8.00  | 1.0000 | 27.66
1024     128      | 8.39  | 0.9531 | 27.28
512      1024     | 7.76  | 1.0313 | 28.41
256      1024     | 8.26  | 0.9688 | 27.81
128      1024     | 8.83  | 0.9063 | 27.17

Table 2: Level 2 decomposition
Codebook Size                          | CR    | bpp    | PSNR (dB)
Approx.(L2)  Details(L2)  Details(L1)  |       |        |
4     4     4     | 53.89 | 0.1484 | 18.25
8     8     8     | 35.93 | 0.2227 | 21.27
16    16    16    | 26.95 | 0.2969 | 22.42
32    32    32    | 21.56 | 0.3711 | 23.10
64    64    64    | 17.96 | 0.4453 | 23.73
128   128   128   | 15.40 | 0.5195 | 24.38
256   256   256   | 13.47 | 0.5938 | 25.16
512   512   512   | 11.98 | 0.6680 | 26.22
1024  1024  1024  | 10.78 | 0.7422 | 27.86
2048  2048  2048  | 9.80  | 0.8164 | 30.14
1024  1024  512   | 11.51 | 0.6953 | 27.26
1024  1024  128   | 13.30 | 0.6016 | 26.50
1024  1024  32    | 15.75 | 0.5078 | 25.98
1024  512   512   | 11.70 | 0.6836 | 26.60
1024  256   512   | 11.91 | 0.6719 | 26.04
1024  128   512   | 12.12 | 0.6602 | 25.67
512   1024  1024  | 11.01 | 0.7266 | 27.34
256   1024  1024  | 11.25 | 0.7109 | 26.85
128   1024  1024  | 11.51 | 0.6953 | 26.20
Table 3: Level 3 decomposition
Codebook Size (A = approximation, D = details)   | CR    | bpp    | PSNR (dB)
A(L3)  D(L3)  D(L2)  D(L1)                       |       |        |
4      4      4      4      | 61.13 | 0.1309 | 17.41
8      8      8      8      | 40.76 | 0.1963 | 20.19
16     16     16     16     | 30.57 | 0.2617 | 21.05
32     32     32     32     | 24.45 | 0.3271 | 21.85
64     64     64     64     | 20.38 | 0.3926 | 22.40
128    128    128    128    | 17.47 | 0.4580 | 23.06
256    256    256    256    | 15.28 | 0.5234 | 24.21
512    512    512    512    | 13.59 | 0.5889 | 25.71
1024   1024   1024   1024   | 12.23 | 0.6543 | 27.75
2048   2048   2048   2048   | 11.12 | 0.7197 | 29.93
1024   512    1024   1024   | 12.28 | 0.6514 | 27.22
1024   256    1024   1024   | 12.34 | 0.6484 | 26.37
1024   128    1024   1024   | 12.39 | 0.6455 | 25.64
1024   1024   512    1024   | 12.45 | 0.6426 | 27.01
1024   1024   256    1024   | 12.68 | 0.6309 | 26.37
1024   1024   128    1024   | 12.92 | 0.6191 | 25.98
1024   1024   1024   512    | 13.17 | 0.6074 | 27.14
1024   1024   1024   128    | 15.57 | 0.5137 | 26.40
1024   1024   1024   32     | 19.05 | 0.4199 | 25.88
512    1024   1024   1024   | 12.30 | 0.6504 | 27.20
256    1024   1024   1024   | 12.37 | 0.6465 | 26.49
128    1024   1024   1024   | 12.45 | 0.6426 | 25.64

The quality of the reconstructed image depends on the level of decomposition and the size of the codebook. It also depends on the code vector dimension: increasing the vector dimension for the scaling coefficients introduces much higher distortion than for the detail coefficients. The larger the codebook, the better the quality, but the rate correspondingly increases. More compression is achieved as the level of decomposition is increased, provided the number of bits distributed among the various subbands is optimal; as the level of decomposition increases, the energy of the image is compacted into fewer coefficients. It has been found that allotting more bits to the highest resolution subbands, with varying, lesser numbers of bits for the other subbands, produces the least distortion, or more compression for the same quality. When an equal number of bits is allotted for every subband, compression increases with the level of decomposition only above 0.5 bpp: as shown in Figure 2(a), the three curves cross at approximately 0.5 bpp.

The comparison of the results of DCT with VQ and lifting with VQ is presented in Table 4. The performance has been compared for the coastline image with 8x8 vectors for the DCT coefficients, and 2x2 vectors for the approximation and 4x4 vectors for the details of the level 3 lifting decomposition. DCT with VQ is not progressive, unlike lifting with VQ. The codebooks have been designed with a distortion threshold of 0.1; if the distortion threshold is reduced, the PSNR may slightly increase or decrease, since the codebook designed is nearly optimal. Table 5 shows the increment in PSNR (dB) for the coastline image for increments in bpp in the different subbands. The increment in bpp corresponds to an increase in the size of the corresponding (sub)codebook from 2^n to 2^(n+1); the codebook size increases exponentially for an increase in the index by one bit, i.e. from 'n' to 'n+1'.

The software programming environment used is MATLAB 7, Release 14, running on the Windows XP platform.
Table 4: Comparison of DCT with Lifting
Method  | Codebook Size | CR    | bpp  | PSNR (dB)
DCT     | 2048          | 46.55 | 0.17 | 26.61
Lifting | 2048          | 11.12 | 0.72 | 29.93

Table 5: Change in PSNR with bit-rate
Level of Decomposition | Subband      | Increase in bit-rate (bpp) | Increase in PSNR (dB)
Level 1                | Approx. (r1) | 0.05                       | 0.6
                       | Details (r1) | 0.05                       | 0.5
Level 2                | Approx. (r2) | 0.01                       | 0.6
                       | Detail (r2)  | 0.01                       | 0.6
                       | Details (r1) | 0.1                        | 0.8
Level 3                | Approx. (r3) | 0.003                      | 0.8
                       | Detail (r3)  | 0.002                      | 0.8
                       | Details (r2) | 0.01                       | 0.6
                       | Details (r1) | 0.09                       | 0.6

Figure 2(a). PSNR vs bpp (equal rate allocation)
Figure 2(b). PSNR vs bpp (unequal rate allocation)
Figure 3(a). Original image; Figure 3(b). Reconstructed image, bpp = 0.72, PSNR = 29.93 dB
Figure 4(a). Original image; Figure 4(b). Reconstructed image, bpp = 0.72, PSNR = 28.87 dB
7. Conclusion The test image ‘coastline’ has been progressively compressed using lifting wavelet decomposition and unstructured VQ of the coefficients. The coding has been done up to three levels of decomposition and it has been found that increasing the level of decomposition leads to better compression, with unequal rate allocation among the different subbands. The bit-rate depends on the size of the multiresolution codebook. The quality of the reconstruction depends on the rate allocation among the different subbands and also the code vector dimension. The algorithm can be applied to a standard set of images, if the same codebook is designed at the transmitter and receiver. It can also be used for other images, provided the training sequence is sufficiently long. The codebook need not be transmitted with every image, reducing the overhead. It is advantageous over other wavelet-based progressive techniques since it takes less encoding and decoding time. The encoding time is in the order of a few minutes, depending on the size of the codebook; decoding time is of the order of seconds. The algorithm is suitable for real-time applications, especially in hardware, because of the simplicity and savings in memory achieved by lifting.
The exponential growth in the size of the codebook is one of the disadvantages of VQ. The work can be further extended by increasing the dimension of the code vectors for the details at the different levels of the decomposition; this may lead to more compression at the cost of a slight increase in distortion. Also, an optimization algorithm can be used for rate allocation among the different subbands to minimize the distortion or maximize the PSNR. Using wavelets other than Haar may also lead to better image quality.
8. References
[1] Khalid Sayood, Introduction to Data Compression, Harcourt India Private Limited, New Delhi, 2nd edition, 2000. [2] Subhasis Saha, Image Compression – from DCT to Wavelets : A Review, ACM Crossroads Student Magazine, 2000. [3] C.S.Gargour and V.Ramachandran, A Scheme for Teaching Wavelets at the Introductory Level, ASEE / IEEE Frontiers in Education Conference, 1997. [4] W. Sweldens, Wavelets and the lifting scheme: A 5 minute Tour, Zeitschrift für Angewandte Mathematik und Mechanik, Vol. 76 (Suppl. 2), pp. 41-44, 1996. [5] Wavelets: Seeing the Forest and the Trees, Beyond Discovery: The Path from Research to
Human Benefit, December 2001. http://www.nationalacademies.org/publications/
[6] I. Daubechies and W. Sweldens, Factoring Wavelet Transforms into Lifting Steps, Journal of Fourier Analysis and Applications, Vol. 4, No. 3, pp. 245-267, 1998.
[7] Robert M. Gray, Vector Quantization, IEEE Acoustics, Speech and Signal Processing Magazine, Vol. 1, pp. 4-29, April 1984.
[8] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Pearson Education Pvt. Ltd., New Delhi, 2nd Edition.
[9] M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, Image Coding using Wavelet Transform, IEEE Transactions on Image Processing, Vol. 1, No. 2, pp. 205-220, April 1992.
[10] A. Gersho and B. Ramamurthy, Image Coding Using Vector Quantization, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 428-431, May 1982.
[11] Y. Ho and A. Gersho, Variable-Rate Multistage VQ for Image Coding, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1156-1159, 1988.
[12] R.L. Claypoole, Jr., G.M. Davis, W. Sweldens and R.G. Baraniuk, Nonlinear Wavelet Transforms for Image Coding via Lifting, IEEE Transactions on Image Processing, Vol. 12, No. 12, pp. 1449-1459, December 2003.
[13] G. Piella and H.J.A.M. Heijmans, An Adaptive Update Lifting Scheme with Perfect Reconstruction, IEEE Transactions on Signal Processing, Vol. 50, No. 7, pp. 1620-1630, July 2002.
[14] M.A. Cody, The Fast Wavelet Transform, Dr. Dobb's Journal, pp. 16-28, April 1992.
[15] Amara Graps, An Introduction to Wavelets, IEEE Computational Science and Engineering, Vol. 2, No. 2, pp. 50-61, June 1995.
[16] http://www.mathworks.com
[17] Geoff Davis and Aria Nosratinia, Wavelet-Based Image Coding: An Overview, Applied and Computational Control, Signals and Circuits, Vol. 1, No. 1, Spring 1998.
[18] Wim Sweldens and Peter Schröder, Building Your Own Wavelets at Home, Wavelets in Computer Graphics, ACM SIGGRAPH Course Notes, 1996.
[19] Robert M. Gray and David L. Neuhoff, Quantization, IEEE Transactions on Information Theory, Vol. 44, No. 6, pp. 2325-2384, October 1998.
[20] Yoseph Linde, Andres Buzo and Robert M. Gray, An Algorithm for Vector Quantizer Design, IEEE Transactions on Communications, Vol. COM-28, No. 1, pp. 84-95, January 1980.
[21] R.M. Gray, J.C. Kieffer and Y. Linde, Locally Optimal Block Quantization for Sources without a Statistical Model, Stanford University Information Systems Lab Technical Report No. L-904-1, Stanford, CA, May 1979.
P.T. Vanathi obtained her Bachelor's degree in Electronics and Communication Engineering and her Master's degree in Computer Science from PSG College of Technology in 1985 and 1991 respectively. She has a Ph.D. in the area of speech coding and has so far published 8 papers in journals and 45 papers in national and international conferences. Her areas of interest are VLSI design and speech recognition, and she has over 20 years of teaching experience. She is currently working as Asst. Professor in the Dept. of ECE, PSG College of Technology, Coimbatore, India.

A. Vasuki obtained her B.E. degree in Electronics and Communication Engineering from PSG College of Technology in 1989 and her Master's degree in Applied Electronics from Coimbatore Institute of Technology in 1991. She has published 13 papers in conferences, and her research interest is in the field of image compression. She is currently working as Asst. Professor in the Dept. of ECE, Kumaraguru College of Technology, Coimbatore, India.
Designing Quantization Table for Hadamard Transform based on Human Visual System for Image Compression
K. Veeraswamy*, S. Srinivaskumar#, B.N. Chatterji§
* Research scholar, ECE Dept., JNTUCE, Kakinada, A.P, India.
# Professor, ECE Dept., JNTUCE, Kakinada, A.P, India.
§ Former Professor, E&ECE Dept., IIT, Kharagpur, W.B, India.
[email protected], [email protected], [email protected]
Abstract
This paper discusses lossy image compression using the Hadamard Transform (HT). The quantization table plays a significant role in lossy image compression, improving the compression ratio without sacrificing visual quality. In this work, the human visual system (HVS) is used to derive a quantization table applicable to the Hadamard Transform: by incorporating the HVS with a uniform quantizer, a perceptual quantization table is derived. This quantization table is easy to adapt to the specified resolution for viewing. Results show that this quantization table is good in terms of improving the peak signal to noise ratio (PSNR) and Normalized Cross Correlation (NCC) and reducing blocking artifacts. The work is extended to test the robustness of watermarking against various attacks.

Keywords: Image compression, Hadamard Transform, human visual system, quantization table, watermarking.

1. Introduction
There are many applications requiring image compression, such as multimedia, internet, satellite imaging, remote sensing, and the preservation of art work. Decades of research in this area have produced a number of image compression algorithms. Most of the effort expended over the past decades on image compression has been directed towards the application and analysis of orthogonal transforms. An orthogonal transform exhibits a number of properties that make it useful. First, it generally conforms to a Parseval constraint, in that the energy present in the image is the same as that displayed in the image's transform. Second, the transform coefficients bear no resemblance to the image, and many of the coefficients are small and, if discarded, will provide image compression with nominal degradation of the quality of the reconstructed image. Third, certain sub-scenes of interest, such as some targets or particular textures, have easily recognized signatures in the transform domain [1]. The Wavelet Transform (WT) [2, 3] and the Contourlet Transform (CT) [4] are used as methods of information coding; the ISO/CCITT Joint Photographic Experts Group (JPEG-2000) has selected the WT for its baseline coding technique [5]. The Discrete Cosine Transform has attracted widespread interest as a method of information coding, and JPEG has selected the DCT for its baseline coding technique [6]. In this work, the Hadamard transform is used for image compression. The elements of the basis vectors of the Hadamard Transform take only the binary values ±1 and are therefore well suited for digital hardware implementations of image processing algorithms. The Hadamard Transform offers a significant advantage in terms of shorter processing time, as the processing involves simpler integer manipulation (compared to floating point processing with the DCT) and easier hardware implementation than many common transform techniques; it is thus computationally less expensive than many other orthogonal transforms. Integer transforms are essential in video compression, and the Hadamard Transform is used for video compression [7]. The quantization table is central to the compression. Several approaches have been tried in order to design quantization tables for particular distortion or rate specifications. The most common of these is to use a default table and scale it up or down by a scalar multiplier to vary the quality and compression. Methods for determining the quantization table are usually based on rate-distortion theory; these methods do achieve better performance than the JPEG default quantization table [8, 9], but the resulting tables are image-dependent and the complexity of the encoder is rather high. In this work, the quantization table is designed based on the human visual system (HVS) [10]. The HVS model is easy to adapt to the specified resolution for viewing. The mind does not perceive everything the eye sees, and this knowledge is used to design a new quantization table for the Hadamard Transform.
This paper is organized as follows. Human visual system is discussed in section 2. The 2D-Hadamard Transform is discussed in section 3. The proposed method is presented in section 4. Experimental results are given in section 5. Concluding remarks are given in section 6.
2. Human Visual System
The human visual system has been investigated by several researchers [11, 12]. Simplicity, and the visual sensitivity and selectivity needed to model and improve perceived image quality, are the requirements for the design of an HVS model. The HVS is based on the psychophysical process that relates psychological phenomena (contrast, brightness, etc.) to physical phenomena (light sensitivity, spatial frequency, wavelength, etc.). The HVS is complicated: it is a nonlinear and spatially varying system, and putting its multiple characteristics into a single equation, especially a linear one, is not an easy task. Mannos and Sakrison's work [13] may be the first breakthrough in incorporating the HVS in image coding. The HVS is modeled as a nonlinear point transformation followed by a modulation transfer function (MTF), given by

$$H(f) = a\,(b + cf)\,\exp(-(cf)^{d}), \qquad (1)$$

where f is the radial frequency in cycles/degree of the visual angle subtended, and a, b, c and d are constants. The HVS model proposed by Daly [14] is applied for generating the quantization table. This HVS model is a modified version of Mannos and Sakrison's work, with a = 2.2, b = 0.192, c = 0.114 and d = 1.1. The MTF of the HVS has been successfully applied to optimal image halftoning and image compression. The MTF as reported by Daly [14] is

$$H(u,v) = \begin{cases} 2.2\,(0.192 + 0.114\,f(u,v))\,\exp\big(-(0.114\,f(u,v))^{1.1}\big), & \text{if } f(u,v) > f_{\max} \\ 1.0, & \text{otherwise} \end{cases} \qquad (2)$$

where f(u,v) is the radial spatial frequency in cycles/degree and f_max is the frequency of 8 cycles/degree at which the exponential peaks. To implement this, it is necessary to convert the discrete horizontal and vertical frequencies {f_u, f_v} into radial visual frequencies. For a symmetric printing grid, the horizontal and vertical discrete frequencies are periodic and given in terms of the dot pitch Δ and the number of frequencies N by

$$f_u = \frac{u-1}{\Delta N}, \qquad f_v = \frac{v-1}{\Delta N}. \qquad (3)$$

Converting these to radial frequencies, and scaling the result to cycles/visual degree for a viewing distance dis in millimeters, gives

$$f(u,v) = \frac{\pi\sqrt{f_u^2 + f_v^2}}{180\,\arcsin\left(1/\sqrt{1 + dis^2}\right)}, \qquad u, v = 1, 2, \ldots, N. \qquad (4)$$

Finally, to account for variations in the visual MTF as a function of viewing angle θ, these frequencies are normalized by an angular-dependent function s(θ(u,v)), such that

$$\tilde{f}(u,v) = \frac{f(u,v)}{s(\theta(u,v))}, \qquad (5)$$

where s(θ(u,v)) is given by Daly as

$$s(\theta(u,v)) = \frac{1-\omega}{2}\cos(4\theta(u,v)) + \frac{1+\omega}{2}, \qquad (6)$$

with ω being a symmetry parameter, and

$$\theta(u,v) = \arctan\left(\frac{f_u}{f_v}\right). \qquad (7)$$

Equation (6) indicates that as ω decreases, s(θ(u,v)) decreases.

3. 2D-Hadamard Transform
The 2D-HT has been used in image processing and image compression [15]. Let [x] represent the original image and [T] the transformed image. The 2D-Hadamard transform is given by [16]

$$[T] = \frac{1}{N}\,H_n\,[x]\,H_n, \qquad (8)$$

where H_n represents an N x N Hadamard matrix, N = 2^n, n = 1, 2, 3, ..., with element values either +1 or -1. The Hadamard transform H_n is real, symmetric, and orthogonal [17], that is,

$$H_n = H_n^{*} = H_n^{t} = H_n^{-1}. \qquad (9)$$

The inverse 2D-Hadamard transform is given as

$$[x] = \frac{1}{N}\,H_n\,[T]\,H_n. \qquad (10)$$
GVIP Special Issue on Image Compression, 2007
The Hadamard matrix of the order n is generated in terms of Hadamard matrix of order n-1 using Kronecker product ‘
’ [18] given by
1.0000 0.6571 1.0000 0.9599 1.0000 0.7684 1.0000 0.8746 0.6571 0.1391 0.4495 0.3393 0.6306 0.1828 0.5558 0.2480
1 §1 1 · ¨¨ ¸¸ , H n 2 ©1 1¹
H1
Table 1: The human visual frequency weighting matrix for Hadamard transform
H n 1
H1
(11)
1.0000 0.4495 0.7617 0.6669 1.0000 0.5196 0.8898 0.5912 0.9599 0.3393 0.6669 0.5419 0.9283 0.3930 0.8192 0.4564
HT matrix has its AC components in a random order. The processing is performed based on the 8x 8 sub-blocks of the whole image, the third order HT matrix H3 is used. By applying (11) H3 becomes:
1.0000 0.6306 1.0000 0.9283 1.0000 0.7371 1.0000 0.8404 0.7684 0.1828 0.5196 0.3930 0.7371 0.2278 0.6471 0.2948 1.0000 0.5558 0.8898 0.8192 1.0000 0.6471 0.9571 0.7371 0.8746 0.2480 0.5912 0.4564 0.8404 0.2948 0.7371 0.3598
H3
ª1 «1 « «1 « 1 «1 8 «1 « «1 «1 « ¬«1
1 1
1 1
1
1
1
1
1
1
1 1 1 1 1 1
1 1
1 1 1 1
1 1
1 1
1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1
1º 1»» 1» » 1» 1» » 1» 1» » 1¼»
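A small sketch of equations (8)-(12), assuming NumPy. The paper states both the ±1-valued and the 1/\sqrt{2}-normalized conventions; the sketch adopts one consistent choice, building H_n from ±1 entries and folding all normalization into the division by N of equations (8) and (10), so that the forward and inverse transforms round-trip exactly. All names are ours.

import numpy as np

def hadamard(n):
    """Hadamard matrix of size N = 2^n via the Kronecker recursion of
    equation (11), here with +-1 entries (normalization applied below)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]])
    hn = h
    for _ in range(n - 1):
        hn = np.kron(h, hn)
    return hn

def ht2d(block):
    """Forward 2D-HT of equation (8): [T] = H_n [x] H_n / N."""
    n_size = block.shape[0]
    hn = hadamard(int(np.log2(n_size)))
    return hn @ block @ hn / n_size

def iht2d(coeffs):
    """Inverse 2D-HT of equation (10): [x] = H_n [T] H_n / N."""
    n_size = coeffs.shape[0]
    hn = hadamard(int(np.log2(n_size)))
    return hn @ coeffs @ hn / n_size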
4. Proposed Method
For the given Hadamard matrix, the number of sign changes in rows 1 to 8 of the Hadamard transform matrix is 0, 7, 3, 4, 1, 6, 2 and 5, respectively; the number of sign changes in columns 1 to 8 is likewise 0, 7, 3, 4, 1, 6, 2 and 5. The number of sign changes is referred to as sequency, a concept analogous to frequency for the Fourier transform. Therefore, R = [0 7 3 4 1 6 2 5] and C = [0 7 3 4 1 6 2 5]. The horizontal and vertical discrete frequencies in the Hadamard domain are given by

f_u = R(u)/(2N\Delta) \text{ for } u = 1, 2, ..., N, \quad f_v = C(v)/(2N\Delta) \text{ for } v = 1, 2, ..., N    (13)

The dot pitch (\Delta) of a high-resolution computer display is about 0.25 mm; such a display is about 128 mm high and 128 mm wide when displaying a 512×512 pixel image. The appropriate viewing distance is four times the height, hence the distance is taken as 512 mm. The constant \omega is a symmetry parameter, derived from experiments and set to 0.7 [19]. Thus, the human visual frequency weighting matrix H(u,v) of (2) is calculated for the HT using equations (4) and (5), as given in Table 1. The weighting matrix H(u,v) indicates the perceptual importance of the transform coefficients: after multiplying the 64 Hadamard coefficients by H(u,v), the weighted Hadamard coefficients contribute the same perceptual importance to human observers.

Table 1: The human visual frequency weighting matrix for the Hadamard transform
1.0000 0.6571 1.0000 0.9599 1.0000 0.7684 1.0000 0.8746
0.6571 0.1391 0.4495 0.3393 0.6306 0.1828 0.5558 0.2480
1.0000 0.4495 0.7617 0.6669 1.0000 0.5196 0.8898 0.5912
0.9599 0.3393 0.6669 0.5419 0.9283 0.3930 0.8192 0.4564
1.0000 0.6306 1.0000 0.9283 1.0000 0.7371 1.0000 0.8404
0.7684 0.1828 0.5196 0.3930 0.7371 0.2278 0.6471 0.2948
1.0000 0.5558 0.8898 0.8192 1.0000 0.6471 0.9571 0.7371
0.8746 0.2480 0.5912 0.4564 0.8404 0.2948 0.7371 0.3598

The quantization table is

Q(u,v) = q / H(u,v)    (14)

where q is the step size of the uniform quantizer. The HVS-based quantization table is then derived by rounding:

Q_{HVS}(u,v) = \mathrm{Round}[Q(u,v)]    (15)

Table 2: Proposed quantization table Q_{HVS}(u,v)
16  24  16  17  16  21  16  18
24 115  36  47  25  88  29  65
16  36  21  24  16  31  18  27
17  47  24  30  17  41  20  35
16  25  16  17  16  22  16  19
21  88  31  41  22  70  25  54
16  29  18  20  16  25  17  22
18  65  27  35  19  54  22  44
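The whole table construction of equations (13)-(15) can be sketched compactly as follows, assuming NumPy and the daly_mtf sketch from Section 2; all variable names are ours. With the parameters stated above it reproduces the Table 1 weights (up to rounding) and the Table 2 entries.

import numpy as np

N = 8
DELTA, DIS, OMEGA, Q_STEP = 0.25, 512.0, 0.7, 16.0
R = np.array([0, 7, 3, 4, 1, 6, 2, 5], dtype=float)  # row sequencies
C = R.copy()                                         # column sequencies

# Equation (13): discrete frequencies in the Hadamard (sequency) domain.
fu = R / (2 * N * DELTA)                             # cycles/mm per row
fv = C / (2 * N * DELTA)
FU, FV = np.meshgrid(fu, fv, indexing="ij")

# Equation (4): radial frequency scaled to cycles/visual degree.
f_rad = (np.pi * np.sqrt(FU**2 + FV**2)
         / (180.0 * np.arcsin(1.0 / np.sqrt(1.0 + DIS**2))))

# Equations (5)-(7): angular normalization by s(theta).
theta = np.arctan2(FU, FV)                           # arctan(f_u / f_v)
s = (1 - OMEGA) / 2 * np.cos(4 * theta) + (1 + OMEGA) / 2
f_norm = f_rad / s

# Equation (2) gives the weighting matrix H(u,v) of Table 1, then
# equations (14)-(15) give the Table 2 quantization values for q = 16.
H = daly_mtf(f_norm)                                 # Section 2 sketch
Q_HVS = np.round(Q_STEP / H)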
Table 2 shows the proposed luminance quantization table derived with q = 16. Varying levels of image compression and quality are obtainable by selecting specific quantization matrices, i.e., by varying q. This enables the user to choose quality levels ranging from 1 to 100, where 1 gives the worst quality and highest compression, while 100 gives the best quality and lowest compression. The quantization matrix above works well for quality level 50.

The compression method is as follows:
1. The input image is first divided into non-overlapping 8×8 blocks.
2. Each block is transformed into 64 Hadamard coefficients via the 2D-HT.
3. These 64 HT coefficients are uniformly quantized by the HVS-based quantization table and rounded.
The image is reconstructed through decompression, a process that uses the inverse HT; the nonzero coefficients are used to reconstruct the original image. Flowcharts for the proposed method are shown in Figures 1 and 2, respectively; a code sketch of these steps follows the figures.

Figure 1: Flowchart for image compression (image → block-wise Hadamard transform → quantization of each block using the proposed HVS-based quantization table → rounding of the quantized coefficients → image in HT domain in compressed form)

Figure 2: Flowchart for image reconstruction (image in HT domain in compressed form → multiplication of each block by the proposed quantization table to obtain coefficients → block-wise inverse HT → reconstructed image)
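As promised above, a compact sketch of the three compression steps and the reconstruction, reusing ht2d, iht2d and Q_HVS from the earlier sketches. This is an illustration under our assumed names, not the authors' code.

import numpy as np

def compress_block(block, q_table):
    """Steps 1-3: 2D-HT, uniform quantization by the table, rounding."""
    return np.round(ht2d(block) / q_table)

def reconstruct_block(coeffs, q_table):
    """Decompression as in Figure 2: rescale by the table, inverse HT."""
    return iht2d(coeffs * q_table)

def process_image(img, q_table, B=8):
    """Blockwise compress/reconstruct a grayscale image whose sides are
    multiples of B (e.g. the 512x512 test images of Section 5)."""
    out = np.empty(img.shape, dtype=float)
    for i in range(0, img.shape[0], B):
        for j in range(0, img.shape[1], B):
            coded = compress_block(img[i:i+B, j:j+B].astype(float), q_table)
            out[i:i+B, j:j+B] = reconstruct_block(coded, q_table)
    return out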
The human visual frequency weighting matrix for the Hadamard transform indicates that the middle and high frequency bands in the HT are distributed in a random order. This property increases the reliability of the watermark. The steps of the watermarking algorithm are as follows:
1. Identify two AC coefficients in each transformed (HT) block of the image.
2. Embed the watermark bit by adjusting the pair of AC coefficients so that AC_p - AC_a \geq T to embed bit '0', and AC_a - AC_p \geq T to embed bit '1', where p and a are different locations and T is a threshold.
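One way to realize step 2 in code is the following hypothetical sketch; it adopts the reading above (force AC_p - AC_a \geq T for bit '0' and AC_a - AC_p \geq T for bit '1'), modifying only the coefficient at location a, with p = (2,2), a = (2,6) and T = 8 as used later in Section 5. Function names and the indexing convention are our assumptions.

P, A = (2, 2), (2, 6)   # AC locations from Section 5 (indexing assumed)

def embed_bit(coeffs, bit, T=8.0):
    """Modify AC_a of one HT block so the pair encodes the bit:
    AC_p - AC_a >= T encodes '0', AC_a - AC_p >= T encodes '1'."""
    c = coeffs.copy()
    if bit == 0 and c[P] - c[A] < T:
        c[A] = c[P] - T
    elif bit == 1 and c[A] - c[P] < T:
        c[A] = c[P] + T
    return c

def extract_bit(coeffs):
    """Detector: compare the pair to recover the embedded bit."""
    return 0 if coeffs[P] >= coeffs[A] else 1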
5. Experimental Results
Experiments are performed on two gray-scale images, shown in Figure 3 [20], to verify the proposed compression technique.

Figure 3: Test images, (a) Lena and (b) Peppers

These two images are represented by 8 bits/pixel, and each image is of size 512×512. The entropy E [21] is calculated as

E = -\sum_{e \in s} p(e) \log_2 p(e)    (16)

where s is the set of coefficients and p(e) is the probability of coefficient e. An often used global objective quality measure is the mean square error (MSE), defined as

MSE = \frac{1}{(n-1)(m-1)} \sum_{i=0}^{n-1}\sum_{j=0}^{m-1} \left[x_{ij} - x'_{ij}\right]^2    (17)

where n and m are the image dimensions in pixels and x_{ij} and x'_{ij} are the pixel values in the original and reconstructed images. The peak signal-to-noise ratio (PSNR, in dB) is calculated as

PSNR = 10 \log_{10}\!\left(\frac{255^2}{MSE}\right)    (18)

where the usable gray level values range from 0 to 255. The other metric used to test the quality of the reconstructed image is the Normalized Cross Correlation (NCC), defined as

NCC = \frac{\sum_i \sum_j (x_{ij}-\bar{x})(x'_{ij}-\bar{x}')}{\sqrt{\left[\sum_i\sum_j (x_{ij}-\bar{x})^2\right]\left[\sum_i\sum_j (x'_{ij}-\bar{x}')^2\right]}}    (19)

where \bar{x} indicates the mean of the original image and \bar{x}' indicates the mean of the reconstructed image.
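The figures of merit of equations (16)-(19) in a short NumPy sketch of ours; note that the MSE here uses the conventional 1/(nm) averaging rather than the (n-1)(m-1) factor printed in equation (17).

import numpy as np

def entropy(coeffs):
    """Equation (16): first-order entropy of the coefficient set."""
    _, counts = np.unique(coeffs, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def psnr(orig, recon):
    """Equations (17)-(18): MSE (1/(nm) averaging) and PSNR in dB."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def ncc(orig, recon):
    """Equation (19): normalized cross correlation of two images."""
    a = orig.astype(float) - orig.mean()
    b = recon.astype(float) - recon.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))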
Comparative performance is studied in terms of PSNR considering the following methods:
1. The standard quantization matrix as used in the JPEG algorithm, applied to quantize the Hadamard coefficients, denoted Q1.
2. The HVS-based quantization matrix (as applied for the DCT), used to quantize the Hadamard coefficients, denoted Q2.
3. The proposed HVS-based quantization matrix, used to quantize the Hadamard coefficients.

The experiments are performed on the Lena and Peppers images, and the results are presented in Tables 3 and 4.

Table 3: PSNR (dB) results for the Lena image using different quantization tables
Entropy (bits/pixel) | Q1 | Q2 | Proposed method
0.3 | 27.1810 | 27.9678 | 28.0495
0.5 | 29.2018 | 30.4706 | 30.9096
0.7 | 30.7308 | 32.3234 | 32.8423
0.9 | 32.1647 | 33.7089 | 34.3915
1.0 | 32.7591 | 34.4589 | 35.2633

Table 4: PSNR (dB) results for the Peppers image using different quantization tables
Entropy (bits/pixel) | Q1 | Q2 | Proposed method
0.3 | 26.5786 | 27.5926 | 27.7241
0.5 | 28.9478 | 30.2501 | 30.5620
0.7 | 30.5664 | 32.0616 | 32.3596
0.9 | 31.8695 | 33.3461 | 33.8205
1.0 | 32.6403 | 34.2340 | 34.3731

The bit rate and the decoded quality are determined simultaneously by the quantization table; therefore, the proposed quantization table has a strong influence on the overall compression performance. Experimental results indicate that the proposed method achieves better performance in terms of PSNR at the same level of compression, and the proposed HVS-based quantization table achieves this without adding any complexity. The performance comparison of the proposed HVS-based Hadamard quantization table with the other quantization tables is shown in Figures 4 and 5.

Figure 4: Performance comparison for the Lena image

Figure 5: Performance comparison for the Peppers image

The process of quantization results in both blurring and blocking artefacts. Blocking artefacts occur due to the discontinuity at block boundaries; blockiness is estimated as the average difference across block boundaries. This feature is used to constitute a quality assessment model that calculates a score [22]. The score typically has a value between 0 and 10, where 10 and 0 represent the highest and lowest quality, respectively. The performance comparison in terms of Normalized Cross Correlation (NCC) and score is given in Table 5 for Lena and Peppers at 0.9 bits per pixel.
Table 5: NCC and score results for Lena and Peppers using different quantization tables
Image | Metric | Q1 | Q2 | Proposed
Lena | NCC | 0.9913 | 0.9940 | 0.9949
Lena | Score | 3.4139 | 3.8274 | 4.1027
Peppers | NCC | 0.9936 | 0.9954 | 0.9959
Peppers | Score | 3.5986 | 4.1087 | 4.4115

The reconstructed Lena and Peppers images obtained with the proposed HVS-based quantization table are shown in Figure 6.

Figure 6: (a) Reconstructed Lena image (0.5 bpp, PSNR 30.90 dB); (b) reconstructed Peppers image (0.5 bpp, PSNR 30.56 dB)

The results of the proposed method on other images are given in Table 6.

Table 6: PSNR (dB) results for different images using different quantization tables
Image | Entropy (bits/pixel) | Q1 | Q2 | Proposed
Mandrill | 2.17 | 29.0 | 30.4 | 31.2
Barbara | 1.47 | 31.8 | 33.4 | 34.2
Zelda | 0.89 | 35.3 | 36.5 | 37.1
Airplane | 1.11 | 34.2 | 36.0 | 36.7
Washsat | 0.92 | 34.0 | 35.0 | 35.6

The proposed method performs well in all tests. The error introduced by the quantization of a particular Hadamard coefficient is not visible as long as the quantization error is less than the just noticeable difference. With the JPEG method, the PSNR is 34.9759 dB and 36.1517 dB for the reconstructed Peppers and Lena images, respectively (1.0 bpp); experimental results thus indicate that JPEG gives somewhat better PSNR (by less than 1 dB) than the proposed method, because the DCT is a high-gain transform.

To test the reliability of the watermarking, a binary logo of size 64×64 and the Lena image of size 512×512 are considered. Watermark bits are embedded in the location AC(2,2) by modifying the contents of AC(2,6), and a threshold value T = 8 is used for the experiments. The logo and the watermarked images are shown in Figure 7.

Figure 7: (a) Logo (64×64); (b) watermarked Lena image using the DCT; (c) watermarked Lena image using the HT

The watermarked images are attacked by image cropping, histogram equalization, bit-plane removal and noise addition. The watermarks retrieved with the DCT and HT techniques under these attacks are given in Table 7. The experiments demonstrate that the HT-based watermarking scheme is more robust to these attacks than the DCT-based scheme, and the PSNR of the watermarked image is higher for HT-based watermarking.

Table 7: Retrieved watermarks and NCC results
Attack | DCT | HT
64×64 logo embedded | PSNR = 34.2 dB | PSNR = 47.1 dB
Salt and pepper noise (density 0.01) | NCC = 0.8320 | NCC = 0.8687
Bit-plane removal (4th bit plane = 0) | NCC = 0.4888 | NCC = 0.5190
Cropping (25%) | NCC = 0.9909 | NCC = 0.9932
Histogram equalization | NCC = 0.6896 | NCC = 0.9277
6. Conclusions
In this paper, a simple approach to the generation of an optimal quantization table based on an HVS model is presented. This quantization table is used to quantize the HT coefficients and yields superior image compression relative to the standard quantization tables available in the literature; the superiority of the method is observed in terms of PSNR and NCC. It is also observed that the Hadamard transform has middle and high frequency bands that are more useful for image watermarking than those of several high-gain transforms, such as the DCT, the DFT (Discrete Fourier Transform) and the DST (Discrete Sine Transform). The HT matrix has its AC components in a random order, and this property is exploited for digital image watermarking. The digital image watermarking scheme presented here with the HT is more robust than the DCT-based watermarking scheme.
Acknowledgements
The first author thanks Prof. B. Chandra Mohan and Sri Ch. Srinivasa Rao, research scholars at JNTU College of Engineering, Kakinada, for valuable discussions related to this work. The authors would also like to thank the reviewers for their review of the paper and their valuable suggestions.

References
[1] Z. Fan and R. D. Queiroz. Maximum likelihood estimation of JPEG quantization table in the identification of bitmap compression. IEEE, 948-951, 2000.
[2] R. Sudhakar and R. Karthiga. Image compression using coding of wavelet coefficients - a survey. GVIP Journal, Volume 5, Issue 6, 2005.
[3] G. K. Kharate, A. A. Ghatol and P. P. Rege. Image compression using wavelet packet tree. GVIP Journal, Volume 5, Issue 6, pp. 37-40, 2005.
[4] S. Esakkirajan, T. Veerakumar, V. Senthil Murugan and R. Sudhakar. Image compression using contourlet transform and multistage vector quantization. GVIP Journal, Volume 6, Issue 1, pp. 19-28, 2006.
[5] W. Fourati and M. S. Bouhlel. A novel approach to improve the performance of JPEG 2000. GVIP Journal, Volume 5, Issue 5, pp. 1-9, 2005.
[6] V. Ratnakar and M. Livny. RD-OPT: an efficient algorithm for optimizing DCT quantization tables. Data Compression Conference, IEEE, 332-341, March 1995.
[7] O. Tasdizen and I. Hamzaoglu. A high performance and low cost hardware architecture for H.264 transform and quantization algorithms. Sabanci University, www.sabanciuniv.edu/~hamzaoglu.
[8] D. M. Monro and B. G. Sherlock. Optimum DCT quantization. Data Compression Conference, IEEE, December 1993.
[9] D. M. Bethel, D. M. Monro and B. G. Sherlock. Optimal quantization of the discrete cosine transform for image compression. IEE Conference Publication No. 443, 69-72, 1997.
[10] L. W. Chang, C. Y. Wang and S. M. Lee. Designing JPEG quantization tables based on human visual system. Signal Processing: Image Communication, Volume 16, Issue 5, 501-506, January 2001.
[11] J. Sullivan, L. Ray and R. Miller. Design of minimum visual modulation halftone patterns. IEEE Transactions on Systems, Man and Cybernetics, Volume 21, 33-38, 1991.
[12] P. Poda and A. Tamtaoui. On the enhancement of unequal error protection performances in images transmission over time-varying channels. IJCSNS International Journal of Computer Science and Network Security, Volume 6, 168-174, September 2006.
[13] J. L. Mannos and D. J. Sakrison. The effect of a visual fidelity criterion in the encoding of images. IEEE Transactions on Information Theory, Volume 20, 525-536, July 1974.
[14] S. Daly. Subroutine for the generation of a two-dimensional human visual contrast sensitivity function. Technical Report, Eastman Kodak, 1987.
[15] A. O. Osinubi and R. A. King. One-dimensional Hadamard naturalness-preserving transform reconstruction of signals degraded by nonstationary noise processes. IEEE Transactions on Signal Processing, Volume 40, No. 3, 645-659, March 1992.
[16] J. Johnson and M. Puschel. In search of the optimal Walsh-Hadamard transform. ICASSP, 2000.
[17] Khalid Sayood. Introduction to Data Compression, second edition, Academic Press, 2000.
[18] T. S. Anthony, Jun Shen, Andrew K. K. Chow and Jerry Woon. Robust digital image-in-image watermarking algorithm using the fast Hadamard transform. Volume ASSP-33, No. 4, pp. 1006-1012, August 1985.
[19] J. P. Allebach and D. L. Neuhoff. Model based digital halftoning. IEEE Signal Processing Magazine, 14-27, July 2003.
[20] University of Waterloo, Waterloo Repertoire, http://links.uwaterloo.ca/greyset2.base.html
[21] David Salomon. Data Compression, second edition, Springer, 2000.
[22] Z. Wang, H. R. Sheikh and A. C. Bovik. No-reference perceptual quality assessment of JPEG compressed images. IEEE International Conference, 477-480, September 2002.
Biographies
B. N. Chatterji is a former Professor in the E&ECE Department, IIT Kharagpur. He received his B.Tech. and Ph.D. (Hons.) from the E&ECE Department, IIT Kharagpur, in 1965 and 1970, respectively. He has served the institute in various administrative capacities, including Head of Department and Dean (Academic). He has chaired many international and national symposia and conferences organized in India and abroad, apart from organizing 15 short-term courses for industry and engineering college teachers. He has guided 35 Ph.D. scholars and is presently active in research, guiding three research scholars. He has published more than 150 papers in reputed international and national journals, apart from authoring three scientific books. His research interests are low-level vision, computer vision, image analysis, pattern recognition and motion analysis.

K. Veeraswamy is currently working as an Associate Professor in the ECE Department, Bapatla Engineering College, Bapatla, India. He is working towards his Ph.D. at JNTU College of Engineering, Kakinada, India, from which he also received his M.Tech. He has nine years of experience teaching undergraduate and postgraduate students. His research interests are in the areas of image compression, image watermarking and networking protocols.

S. Srinivas Kumar is currently Professor and Head of the ECE Department, JNTU College of Engineering, Kakinada, India. He received his M.Tech. from Jawaharlal Nehru Technological University, Hyderabad, India, and his Ph.D. from the E&ECE Department, IIT Kharagpur. He has nineteen years of experience teaching undergraduate and postgraduate students and has guided a number of postgraduate theses. He has published 15 research papers in national and international journals. Presently he is guiding five Ph.D. students in the area of image processing. His research interests are in the areas of digital image processing, computer vision, and the application of artificial neural networks and fuzzy logic to engineering problems.