DISTRIBUTED VIDEO CODING
A PROJECT REPORT
By
Pratyush Pandab¹
Under The Guidance Of
Prof. B. Majhi Professor and Head of Dept. Computer Science and Engineering, NIT Rourkela.
Department of Computer Science and Engineering
NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA
ROURKELA-769008 (ORISSA)
¹ Pratyush Pandab is a 4th year B.Tech. student in the Dept. of Computer Science and Engineering at CET, Bhubaneswar.
National Institute of Technology Rourkela Certificate This is to certify that the work in this Project entitled "Distributed Video Coding" by Pratyush Pandab has been carried out under my supervision in partial fulfilment of the requirements for the degree of "Bachelor of Technology" in Computer Science during the session 2006-2010 in the Department of Computer Science and Engineering, College of Engineering and Technology, Bhubaneswar, and this work has not been submitted elsewhere for a degree.
Place: Rourkela Date: July 15, 2009
Prof. Banshidhar Majhi Professor and Head of Dept. of CSE. NIT, Rourkela.
Declaration by the Candidate I hereby declare that the project report entitled "Distributed Video Coding" is a record of bona fide project work carried out by me under the guidance of Prof. B. Majhi of NIT Rourkela, Orissa. I further declare that the work reported in this project has not been submitted, and will not be submitted, either in part or in full, to any other university for the award of any degree or diploma.
Pratyush Pandab Dept. of CSE, CET Bhubaneswar.
Table of Contents
Abstract
1. Introduction to DVC
2. Foundations of Distributed Coding
   2.1. Slepian-Wolf Theorem for Lossless Distributed Coding
   2.2. Wyner-Ziv Theorem for Lossy Distributed Coding
3. Distributed Video Coding Schemes
   3.1. Stanford's low complexity video coding algorithm
   3.2. Berkeley's robust video coding solution
4. Towards Practical Wyner-Ziv Coding of Video
   4.1. Wyner-Ziv Video Codec
   4.2. Flexible Decoder Side Information
   4.3. Implementation
5. Results
6. Conclusions
7. References
Abstract
Distributed Video Coding (DVC) is a new coding paradigm for video compression based mainly on the information-theoretic results of the Slepian-Wolf (SW) and Wyner-Ziv (WZ) theorems. In several major applications, such as wireless low-power surveillance, multimedia sensor networks, wireless PC cameras and mobile camera phones, it is a real challenge for the traditional video coding architecture to meet the requirements on power consumption and speed. In some cases both the encoder and the decoder must consume little power, so low-power, low-complexity encoders are essential. This project report presents a practical implementation of distributed video coding.

Slepian-Wolf coding addresses lossless distributed source coding, while Wyner-Ziv coding is lossy compression with receiver side information. In most traditional video coding algorithms, such as MPEG-2, H.263+ or H.264, the encoder is computationally far more complex than the decoder. Distributed video coding, in contrast, allows low-complexity video encoding by shifting the major part of the computational burden to the decoder. DVC is mainly applicable to two areas, namely low-complexity video coding and robust video coding. The different techniques are analysed with respect to parameters such as compression rate, decoding complexity and motion compensation.
1. Introduction to DVC

In video coding, as standardized by MPEG or the ITU-T H.26x recommendations, the encoder exploits the statistics of the source signal. This principle seems so fundamental that it is rarely questioned. However, efficient compression can also be achieved by exploiting source statistics, partially or wholly, at the decoder only. This surprising insight is the consequence of information-theoretic bounds established in the 1970s by Slepian and Wolf for distributed lossless coding, and by Wyner and Ziv for lossy coding with decoder side information. Schemes that build upon these theorems are generally referred to as distributed coding algorithms.
In short, Distributed Video Coding (DVC) is a technique that allows the encoder side of the communication channel to be less complex and thus use less power. Distributed coding is a radical departure from conventional, non-distributed coding: it exploits the source statistics in the decoder, and hence the encoder can be very simple, at the expense of a more complex decoder. The traditional balance of complex encoder and simple decoder is essentially reversed. Such algorithms hold great promise for new generations of mobile video cameras.

² MPEG stands for Moving Picture Experts Group.
The foundations of DVC go back to the 1970s, when Slepian and Wolf established the achievable rates for lossless coding of two correlated sources in different configurations. Wyner and Ziv then extended the Slepian-Wolf theorem to the lossy case. However, it is only recently that the first practical implementations of DVC have been introduced.
The theoretical foundations of DVC as well as different practical implementations have been explored recently. Unlike conventional encoders (e.g. AVC/H.264 [13]), where the source statistics are exploited at the encoder side, DVC can shift this task to the decoder side. This results in low-complexity encoders; on the other hand, the DVC decoders become highly complex. DVC is therefore suitable for emerging applications where computational power is scarce at the encoder side, such as wireless low-power video surveillance, multimedia sensor networks, wireless PC cameras and mobile camera phones. Furthermore, DVC is based on a statistical framework rather than a deterministic one, which gives it good error resilience properties. DVC can also be used to design codec-independent scalable codecs; in other words, the enhancement layer is independent of the base layer codec.
Figure 1: Distributed compression of two statistically dependent random processes X and Y. Each source is encoded separately at rates R_X and R_Y; the decoder jointly decodes X and Y (producing X' and Y') and thus exploits their mutual dependence.
2. Foundations of Distributed Coding

2.1. Slepian-Wolf Theorem for Lossless Distributed Coding
Distributed compression refers to the coding of two (or more) dependent random sequences, but with the special twist that a separate encoder is used for each (Fig. 1). Each encoder sends a separate bit stream to a single decoder, which may operate jointly on all incoming bit streams and thus exploit the statistical dependencies.
Figure 2: Slepian-Wolf theorem (1973): achievable rate region for distributed compression of two statistically dependent i.i.d. sources X and Y.
Let X and Y be two statistically dependent, independent and identically distributed (i.i.d.) sequences. These sequences are encoded independently at bit rates R_X and R_Y but decoded jointly. According to the Slepian-Wolf theorem, the achievable rate combinations are as follows:
R_X ≥ H(X|Y)          (1)
R_Y ≥ H(Y|X)          (2)
R_X + R_Y ≥ H(X, Y)   (3)
where H(X|Y) and H(Y|X) are conditional entropies. The sum rate R_X + R_Y can be as low as the joint entropy H(X, Y), which is the minimum rate a joint encoder would need. Another interesting feature of Slepian-Wolf coding is that it is closely related to channel coding. The achievable rate region is shown in Fig. 2.
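As a small illustration of the rate region in (1)-(3), the following sketch (added here, not part of the original report) checks whether a given rate pair is achievable for a binary source X observed through a binary symmetric channel as Y. The function names and the example source model are assumptions made only for illustration.

```python
import math

def hb(p):
    """Binary entropy function H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sw_achievable(rx, ry, h_x_given_y, h_y_given_x, h_xy):
    """Check conditions (1)-(3) of the Slepian-Wolf rate region."""
    return rx >= h_x_given_y and ry >= h_y_given_x and rx + ry >= h_xy

# Example: X is a uniform bit, Y = X XOR noise with crossover probability p.
p = 0.1
h_x_given_y = hb(p)          # H(X|Y) = H_b(p)
h_y_given_x = hb(p)          # H(Y|X) = H_b(p)
h_xy = 1.0 + hb(p)           # H(X,Y) = H(X) + H(Y|X)

print(sw_achievable(1.0, hb(p), h_x_given_y, h_y_given_x, h_xy))  # True (corner point)
print(sw_achievable(0.3, 0.3, h_x_given_y, h_y_given_x, h_xy))    # False (sum rate too low)
```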
2.2. Wyner-Ziv Theorem for Lossy Distributed Coding
Wyner and Ziv extended the Slepian-Wolf theorem in 1976 to establish information-theoretic bounds for lossy compression with side information at the decoder. The theorem states that if X and Y are two statistically dependent sequences and X is encoded without access to the side information Y (Fig. 3), the sequence X can still be reconstructed such that the expected distortion D = E[d(X, X̂)] remains acceptable (Fig. 4). When the encoder has no knowledge of Y, there is in general a rate loss R^WZ_X|Y(D) − R_X|Y(D) ≥ 0 compared with coding in which the side information is also available at the encoder. However, for a memoryless Gaussian source with mean-squared-error distortion, the rate loss is zero, i.e. R^WZ_X|Y(D) = R_X|Y(D). The basic architecture of Wyner-Ziv coding is shown in Fig. 5.
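As a hedged numeric illustration of the zero-rate-loss case (an example added here, not taken from the report): for jointly Gaussian X and Y with conditional variance σ²_X|Y and mean-squared-error distortion D, the conditional rate-distortion function is R_X|Y(D) = ½ log2(σ²_X|Y / D) bits per sample, and the Wyner-Ziv rate coincides with it.

```python
import math

def gaussian_wz_rate(sigma2_cond, distortion):
    """R_{X|Y}(D) = 0.5 * log2(sigma^2_{X|Y} / D) bits/sample, and 0 once D
    reaches the conditional variance. For a Gaussian source with MSE distortion
    the Wyner-Ziv rate equals this conditional rate-distortion function."""
    if distortion >= sigma2_cond:
        return 0.0
    return 0.5 * math.log2(sigma2_cond / distortion)

# Example: conditional variance 10, target distortion 1 -> about 1.66 bits/sample.
print(gaussian_wz_rate(10.0, 1.0))
```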
Figure 3: Lossless compression of a sequence of random symbols X using statistically related side information Y (achievable rate R_X ≥ H(X|Y)).
Figure 4: Lossy compression of a sequence X using statistically related side information Y, with distortion D = E[d(X, X̂)] and rate R^WZ_X|Y(D) ≥ R_X|Y(D).
Figure 5: Basic architecture of Wyner-Ziv coding: transform, quantizer and Slepian-Wolf encoder at the encoder; Slepian-Wolf decoder, reconstruction and inverse transform at the decoder, which uses the side information Y to recover X̂.
3. Distributed Video Coding Schemes

Recently, most practical DVC solutions have been proposed by two groups, namely Bernd Girod's group at Stanford University and Ramchandran's group at the University of California, Berkeley. This section briefly describes these DVC schemes and their performance.
3.1. Stanford's low complexity video coding algorithm

Pixel domain coding solution:
Among the various solutions proposed by Girod's group, the pixel domain coding solution is the simplest (Fig. 6). In this scheme the video frames are divided into key frames and Wyner-Ziv frames. The Wyner-Ziv frames are placed in between the key frames and are encoded independently but decoded jointly. The scheme is the simplest because neither a DCT nor motion estimation (and hence no inverse transform) is required at the encoder. Every pixel in a Wyner-Ziv frame S is uniformly quantized with 2^M intervals. The quantized indices q are fed to a Slepian-Wolf encoder built on a rate-compatible punctured turbo (RCPT) code. At the decoder side, the side information S' can be generated by interpolation or extrapolation from previously decoded key frames or from previously reconstructed Wyner-Ziv frames. The decoder combines the side information S' with the received parity bits to recover q, and then reconstructs the frame as Ŝ = E[S | q, S'].
Figure 6: Pixel domain coding with a low complexity encoder and an interframe decoder: Wyner-Ziv frames are uniformly quantized and passed through a turbo-code-based Slepian-Wolf coder with a parity buffer, key frames use a conventional intraframe codec, and the decoder generates side information by interpolation or extrapolation and requests parity bits as needed.
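The following is a minimal sketch (added for illustration, not from the report's implementation) of the simplest form of decoder-side side-information generation, plain averaging of the two neighbouring decoded key frames; a practical decoder would use motion-compensated interpolation or extrapolation instead. The function name and the use of NumPy are assumptions.

```python
import numpy as np

def average_interpolation(prev_key, next_key):
    """Naive side information for a Wyner-Ziv frame: average of the two
    neighbouring decoded key frames (motion-compensated interpolation
    would replace this in a practical decoder)."""
    blend = (prev_key.astype(np.float32) + next_key.astype(np.float32)) / 2.0
    return blend.round().astype(np.uint8)

# Example with two random 8-bit "frames" of QCIF size.
prev_key = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
next_key = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
side_info = average_interpolation(prev_key, next_key)
```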
Transform domain coding solution:

Like the pixel domain scheme, neither motion estimation nor motion compensation is needed at the encoder. Here, however, a blockwise DCT is applied to each Wyner-Ziv frame. The DCT coefficients are independently quantized and compressed by the turbo-code-based Slepian-Wolf coder. The side information can be generated from previously reconstructed frames, with or without motion compensation. This scheme has a higher encoder complexity than the pixel domain system.
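A minimal sketch of an encoder-side blockwise DCT followed by independent uniform quantization of the coefficients, to illustrate the transform domain idea. The block size, quantization step and helper names are assumptions for this example; the report does not specify them for this variant.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c

def blockwise_dct(frame, block=8):
    """Apply an orthonormal 2-D DCT to each block x block tile of the frame."""
    c = dct_matrix(block)
    h, w = frame.shape
    out = np.empty((h, w), dtype=np.float64)
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = frame[i:i + block, j:j + block].astype(np.float64)
            out[i:i + block, j:j + block] = c @ tile @ c.T
    return out

# Example: transform one frame and uniformly quantize the coefficients.
frame = np.random.randint(0, 256, (144, 176))
coeffs = blockwise_dct(frame)
step = 16                                            # assumed quantization step
quantized = np.round(coeffs / step).astype(np.int32)  # independent uniform quantization
```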
Joint decoding and motion estimation solution:

In order to achieve high compression efficiency, motion estimation has to be performed at the decoder side. In this scheme, the frames of a video are organised into groups of pictures (GOPs) at the encoder. Each GOP consists of key frames and Wyner-Ziv frames. A 4x4 discrete cosine transform is applied to each Wyner-Ziv frame. Each transformed coefficient band is uniformly quantized and bit-plane extraction is performed; each bit plane is then independently fed into the turbo encoder. Some robust hash codewords are also sent to the decoder to support motion estimation. The key frames are encoded by conventional coding such as MPEG or H.26x. The encoded Wyner-Ziv bits are stored in a buffer and only the minimum number of bits is sent, on request from the decoder.

The decoder receives three bit streams, namely key frame bits, hash bits and Wyner-Ziv bits. The key frames are decoded with the conventional decoder, and the hash bits are used for motion estimation and compensation. The Wyner-Ziv bit stream is decoded by the turbo decoder, followed by requantization and reconstruction of the Wyner-Ziv frame using motion compensation. The reconstructed frame is then inverse transformed to obtain W', the estimate of the Wyner-Ziv frame.
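As a small illustration of the bit-plane extraction step applied to the quantized coefficient bands (an added sketch; the array shape and values are made up for the example):

```python
import numpy as np

def extract_bitplanes(indices, num_bits):
    """Split non-negative quantization indices into num_bits bit planes,
    most significant plane first; each plane is a binary array of the same shape."""
    planes = []
    for b in range(num_bits - 1, -1, -1):
        planes.append(((indices >> b) & 1).astype(np.uint8))
    return planes

# Example: 3-bit quantization indices of a small coefficient band.
band = np.array([[5, 2, 7, 0],
                 [1, 6, 3, 4],
                 [7, 7, 0, 2],
                 [3, 1, 4, 6]])
planes = extract_bitplanes(band, num_bits=3)  # planes[0] is the MSB plane
```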
3.2. Berkeley's robust video coding solution

Power-efficient Robust high-compression Syndrome-based Multimedia coding (PRISM):

This solution was proposed by Ramchandran's group at the University of California, Berkeley, under the name of PRISM. It combines the features of intraframe coding with the compression efficiency of interframe coding, all in one. The architecture uses Wyner-Ziv coding, but the generation of side information differs from the other schemes: a new feature of PRISM is the use of multiple candidates for the side information. At the encoder, each video frame is first divided into 8x8 or 16x16 blocks. In a classification stage it is decided what kind of encoding is best suited for each block of the current frame. Three classes of coding are used at the encoder: no coding (skip class), traditional coding (intra coding class) and syndrome coding (syndrome coding class). Fig. 7 shows the PRISM encoder, in which syndrome coding is the essential part. For an input video frame, a blockwise DCT is performed, followed by a zigzag scan. The high-frequency components are quantized and coded with an entropy coder, while coarse (base) quantization followed by syndrome coding is applied to the DC component; the syndrome-encoded coefficients are further refined by an additional quantization step and sent to the decoder. A cyclic redundancy check (CRC) of the base-quantized transform coefficients is also computed and transmitted to help the decoder with motion estimation.
At the decoder end (see Fig. 8), the frame blocks in the skip class are reconstructed from the co-located blocks of the previously reconstructed frame. The frame blocks in the intra coding class are reconstructed by the traditional decoder. Syndrome-encoded blocks are decoded by performing motion estimation with the help of the CRC bits: the previously decoded frames provide multiple candidate predictors for the side information, and the CRC serves as a reliable and unique signature for each block to identify the best candidate predictor. The bit streams are then dequantized, inverse transformed and inverse scanned to reconstruct the video sequence.
Figure 7: Encoder architecture of PRISM: the video frame undergoes a blockwise DCT and zigzag scan; the top fraction of coefficients receives base quantization, syndrome coding and refinement quantization, while the bottom fraction is quantized and entropy coded to form the bit stream.
Figure 8: Decoder architecture of PRISM: the incoming bit stream is entropy decoded and syndrome decoded, with motion estimation over the previously decoded frame supplying the candidate side information; after base and refinement dequantization, the inverse blockwise DCT and zigzag scan recover the decoded video frame block.
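A minimal sketch (added here, not PRISM's actual implementation) of the decoder-side idea of using the transmitted CRC to pick the best candidate predictor among several motion-search candidates. The use of CRC-32 from zlib, the block size and the decode_with_predictor callback are illustrative assumptions.

```python
import zlib
import numpy as np

def block_crc(block):
    """CRC-32 signature of a quantized coefficient block."""
    return zlib.crc32(np.ascontiguousarray(block, dtype=np.int16).tobytes())

def select_predictor(candidates, decode_with_predictor, target_crc):
    """Try each candidate predictor (e.g. blocks at different motion offsets in the
    previous frame); return the first decoded block whose CRC matches the one sent
    by the encoder, or None if no candidate yields a matching decode."""
    for cand in candidates:
        decoded = decode_with_predictor(cand)
        if block_crc(decoded) == target_crc:
            return decoded
    return None

# Toy usage: the "decoder" simply returns the candidate itself.
true_block = np.arange(16, dtype=np.int16).reshape(4, 4)
candidates = [np.zeros((4, 4), dtype=np.int16), true_block]
print(select_predictor(candidates, lambda c: c, block_crc(true_block)) is not None)  # True
```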
4. Towards Practical Wyner-Ziv Coding of Video

In current interframe video compression systems, the encoder performs predictive coding to exploit the similarities of successive frames. The Wyner-Ziv theorem on source coding with side information available only at the decoder suggests that an asymmetric video codec, where individual frames are encoded separately but decoded conditionally (given temporally adjacent frames), could achieve similar efficiency. This project presents results for a Wyner-Ziv coding scheme for motion video that uses intraframe encoding but interframe decoding. In this system, key frames are compressed by a conventional intraframe codec and the in-between frames are encoded using a Wyner-Ziv intraframe coder. The decoder uses previously reconstructed frames to generate side information for interframe decoding of the Wyner-Ziv frames.
Current video compression standards perform interframe predictive coding to exploit the similarities among successive frames. Since predictive coding makes use of motion estimation, the video encoder is typically 5 to 10 times more complex than the decoder. This asymmetry in complexity is desirable for broadcasting or for streaming video-on-demand systems where video is compressed once and decoded many times. However, some future systems may require the dual scenario. For example, we may be interested in compression for mobile wireless cameras uploading video to a fixed base station. Compression must be implemented at the camera where memory and computation are scarce. For this type of system what we desire is a low-complexity encoder, possibly at the expense of a high complexity decoder, that nevertheless compresses efficiently.
This project applies Wyner-Ziv coding to a real-world video signal. We take X as the even frames and Y as the odd frames of the video sequence. X is compressed by an intraframe encoder that does not know Y. The compressed stream is sent to a decoder which uses Y as side information to conditionally decode X. The present work extends the Wyner-Ziv video codec to a more general and practical framework. The key frames of the video sequence are compressed using a conventional intraframe codec. The remaining frames, the Wyner-Ziv frames, are intraframe encoded using a Wyner-Ziv encoder. To decode a Wyner-Ziv frame, previously decoded frames (both key frames and Wyner-Ziv frames) are used to generate side information. Interframe decoding of the Wyner-Ziv frames is performed by exploiting the inherent similarities between the Wyner-Ziv frame and the side information.
4.1. Wyner-Ziv Video Codec

The codec is an intraframe encoder and interframe decoder system for video compression (Fig. 9). A subset of frames from the sequence is designated as key frames. The key frames, K, are encoded and decoded using a conventional intraframe codec. In between the key frames are Wyner-Ziv frames, which are intraframe encoded but interframe decoded.
Figure 9: Wyner-Ziv video codec with intraframe encoding and interframe decoding.
A Wyner-Ziv frame, S, is encoded as follows. Each pixel value of the frame is quantized using a uniform scalar quantizer with 2^M levels to form the quantized symbol stream q. These symbols are grouped into a long symbol block, which is then sent to the Slepian-Wolf encoder. The Slepian-Wolf coder is implemented using a rate-compatible punctured turbo (RCPT) code. The RCPT, combined with feedback, provides the rate flexibility that is essential for adapting to the changing statistics between the side information and the frame to be encoded. The symbol stream q is fed into the two constituent convolutional encoders of a turbo encoder; before passing the symbols to the second convolutional encoder, interleaving is performed at the symbol level. The parity bits produced by the turbo encoder are stored in a buffer, which transmits a subset of these parity bits to the decoder upon request.
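A minimal sketch of the uniform scalar quantization step that produces the symbol stream q (added for illustration; the frame size, 8-bit pixel range and function name are assumptions):

```python
import numpy as np

def quantize_frame(frame, m_bits):
    """Uniform scalar quantization of 8-bit pixel values into 2^M levels;
    returns the flattened stream of quantized symbols q."""
    levels = 2 ** m_bits
    step = 256.0 / levels
    q = np.floor(frame.astype(np.float32) / step).astype(np.int32)
    return np.clip(q, 0, levels - 1).ravel()  # symbol block for the Slepian-Wolf encoder

# Example: quantize a QCIF-sized frame with M = 2 (4 levels).
frame = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
q = quantize_frame(frame, m_bits=2)
```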
For each Wyner-Ziv frame, the decoder uses adjacent previously decoded key frames and, possibly, previously decoded Wyner-Ziv frames to form the side information S', which is an estimate of S. To be able to exploit the side information, the decoder assumes a statistical dependency model between S and S'.
The turbo decoder uses the side information S' and the received subset of parity bits to form the decoded symbol stream q. If the decoder cannot reliably decode the symbols, it requests additional parity bits from the encoder buffer through feedback. The request-and-decode process is repeated until an acceptable probability of symbol error is guaranteed. By using the side information, the decoder needs to request only k ≤ M bits to decode which of the 2^M bins a pixel belongs to, and so compression is achieved.
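The request-and-decode loop can be sketched as follows (an added illustration; turbo_decode and request_more_parity stand in for the actual turbo decoder and feedback channel, neither of which is given in code in the report):

```python
def decode_with_feedback(turbo_decode, request_more_parity, side_info,
                         parity_bits, max_requests=32):
    """Sketch of the decoder's request-and-decode loop: keep asking the encoder
    buffer for additional parity bits until the turbo decoder reports that the
    symbol error probability is acceptably low."""
    for _ in range(max_requests):
        symbols, reliable = turbo_decode(side_info, parity_bits)
        if reliable:
            return symbols
        parity_bits = parity_bits + request_more_parity()  # feedback channel
    raise RuntimeError("could not decode within the allowed number of requests")
```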
Given the decoded symbols q and the side information S', each pixel is reconstructed as Ŝ = E[S | q, S']. With this reconstruction function, if the side information lies within the decoded bin, the reconstructed pixel takes a value very close to the side information. If the side information falls outside the bin, the function clips the reconstruction towards the boundary of the bin closest to the side information. This kind of reconstruction function has the advantage of limiting the magnitude of the reconstruction distortion to a maximum value determined by the quantizer coarseness. Perceptually, this property is desirable since it eliminates the large positive or negative errors which may be very annoying to the viewer.
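A simplified sketch of this clipping behaviour for one pixel (added here; it approximates the conditional-expectation reconstruction by clamping the side information to the decoded bin, and the 8-bit range and function name are assumptions):

```python
import numpy as np

def reconstruct_pixel(q_symbol, side_info_value, m_bits):
    """Clip-style reconstruction: if the side information lies inside the decoded
    quantization bin, use it directly; otherwise clamp it to the nearest bin
    boundary. This bounds the reconstruction error by the bin width."""
    levels = 2 ** m_bits
    step = 256.0 / levels
    low = q_symbol * step          # lower boundary of the decoded bin
    high = low + step              # upper boundary of the decoded bin
    return float(np.clip(side_info_value, low, high))

# Example with M = 2 (bin width 64): decoded bin is [64, 128).
print(reconstruct_pixel(q_symbol=1, side_info_value=200.0, m_bits=2))  # 128.0 (clipped)
print(reconstruct_pixel(q_symbol=1, side_info_value=100.0, m_bits=2))  # 100.0 (inside bin)
```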
In areas where the side information is not close to the original frame (e.g. high-motion regions or occlusions), the reconstruction scheme can only rely on the quantized symbol and clips towards the bin boundary. Since the quantization is coarse, this can lead to contouring, which is visually unpleasant. To remedy this, we perform subtractive dithering by shifting the quantizer partitions for every pixel using a pseudorandom pattern. This leads to better subjective quality in the reconstruction.
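One common way to realize this partition shift is to add a pseudorandom dither signal, known to both encoder and decoder, before quantization and subtract it again after reconstruction. A minimal sketch of the encoder side (an added example; the shared-seed convention and value ranges are assumptions):

```python
import numpy as np

def dithered_quantize(frame, m_bits, seed=0):
    """Subtractive dithering: shift the quantizer partitions for every pixel by a
    pseudorandom offset, quantize, and return the symbols together with the dither
    pattern so the decoder (using the same seed) can subtract it after reconstruction."""
    levels = 2 ** m_bits
    step = 256.0 / levels
    rng = np.random.default_rng(seed)                 # same seed reproduces the pattern
    dither = rng.uniform(-step / 2, step / 2, frame.shape)
    shifted = frame.astype(np.float32) + dither
    q = np.clip(np.floor(shifted / step), 0, levels - 1).astype(np.int32)
    return q, dither
```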
4.2. Flexible Decoder Side Information

Analogous to the idea of varying the ratio of I frames, P frames and B frames in conventional video coding, we can vary the number of Wyner-Ziv frames between key frames to achieve different rate-distortion points for the proposed system. Many high-quality key frames in the sequence lead to better side information for the Wyner-Ziv frames. For example, if there is one key frame for every Wyner-Ziv frame, the decoder can perform sophisticated motion-compensated (MC) interpolation on the two adjacent key frames to generate a very good estimate of the Wyner-Ziv frame. For this case, reconstruction errors from other decoded Wyner-Ziv frames need not corrupt the side information of the current Wyner-Ziv frame. Better side information translates to improved rate-distortion performance for the Wyner-Ziv encoded frame. However, since the key frames are intraframe encoded and decoded, they require more rate than the Wyner-Ziv frames, so the overall rate of the system increases. Finding a good trade-off between the number of key frames and the degradation of the side information is a significant aspect of optimizing the compression performance. Aside from the number of key frames, the quality of their reconstruction also affects the side information and is an important consideration in the design.
The proposed Wyner-Ziv video coder employs feedback from the decoder to the encoder so that the proper number of bits is sent. This is advantageous since the required bit rate depends on the side information, which is unknown to the encoder. Because of this feedback, the decoder has great flexibility in choosing what side information to use. In fact, given the same Wyner-Ziv video encoder, there can be decoders of different sophistication and with different statistical models. For example, a "smart" decoder might use sophisticated motion-compensated interpolation and request fewer bits, while a "dumb" decoder might use no motion compensation at all (simply taking a reconstructed adjacent frame as the side information) and request more bits for successful decoding.
Fig. 10 illustrates a hierarchical frame dependency arrangement. In the diagram, the same number of Wyner-Ziv frames is placed between the key frames. The decoded previous frame (whether a key frame or a Wyner-Ziv frame) is extrapolated to generate the side information for the current Wyner-Ziv frame. This technique requires a minimum of memory at the encoder.
[Figure: key frame S1 is coded by the conventional intraframe coder; each Wyner-Ziv frame S2, S3, ... is decoded by the Wyner-Ziv decoder using side information extrapolated from the previously reconstructed frame (S1', S2', ...).]
4.3. Implementation

The key frames were encoded as I frames, with a fixed quantization parameter, using a standard H.263+ codec. The number of Wyner-Ziv frames between the key frames was varied, changing the frame dependency structure for each case. For these simulations, motion-compensated (MC) interpolation, based on the assumption of symmetric motion vectors, was used to generate the side information. Let MCI(A, B, d) be the result of MC interpolation between frames A and B at fractional distance d from A. Four frame dependency arrangements were simulated, with the side information derived as follows:
1 WZ frame: K1 - S2 - K3
  S2 = MCI(K1', K3', 1/2)

2 WZ frames: K1 - S2 - S3 - K4
  S2 = MCI(K1', K4', 1/3)
  S3 = MCI(K1', K4', 2/3)

3 WZ frames: K1 - S2 - S3 - S4 - K5
  S3 = MCI(K1', K5', 1/2)
  S2 = MCI(K1', S3', 1/2)
  S4 = MCI(S3', K5', 1/2)
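The sketch below (added for illustration) mimics MCI(A, B, d) with a simple distance-weighted blend of the two decoded reference frames; an actual implementation would estimate symmetric motion vectors and interpolate along them, which this toy stand-in omits.

```python
import numpy as np

def mci(frame_a, frame_b, d):
    """Toy stand-in for MCI(A, B, d): distance-weighted blend of the two decoded
    reference frames (no motion search is performed here)."""
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    return ((1.0 - d) * a + d * b).round().astype(np.uint8)

# Side information for the "2 WZ frames" arrangement K1 - S2 - S3 - K4.
k1 = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
k4 = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
s2_side = mci(k1, k4, 1.0 / 3.0)
s3_side = mci(k1, k4, 2.0 / 3.0)
```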
5. Results
Figure 10: The first row shows the I frames for a particular frame number. The second row shows the WZ reconstructed images with 2 bit planes, 4 bit planes and 6 bit planes, respectively.
6. Conclusions

In this project a simple DVC block layout was implemented to find the compression level for different bit rates. A practical Wyner-Ziv video compression system was simulated, and different frame dependency arrangements involving motion-compensated interpolation were investigated. Both pixel domain and transform domain experiments were carried out, and satisfactory results were obtained.
7. References

[I] Aaron, A., Setton, E., & Girod, B. (n.d.). Towards Practical Wyner-Ziv Coding of Video.
[II] Puri, R., & Ramchandran, K. (2002). PRISM: A new robust video coding architecture based on distributed compression principles. Allerton Conference on Communication, Control, and Computing.
[III] Rup, S., Dash, R., Ray, N. K., & Majhi, B. (n.d.). Advances in Distributive Video Coding.
[IV] Slepian, D., & Wolf, J. (1973). Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, 471-480.
[V] Wyner, A., & Ziv, J. (1976). The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory, 63-72.