Aspect Processing: The Shape of Things to Come
Mike Knee & Roberta Piroddi
Snell & Wilcox, UK
Abstract
The proliferation of video source formats, delivery mechanisms and display devices has significantly increased broadcasters’ needs for aspect ratio conversion. We have taken this as an opportunity to develop aspect processing, a framework for producing an arbitrary shape intelligently and flexibly. Aspect processing extends aspect ratio conversion to the definition of optimum viewing for a given screen size and shape. This paper presents new aspect processing technologies, which introduce a set of efficient criteria to automatically identify regions of interest in the picture and to process the picture accordingly. This paper also introduces a unique and novel technology – video seam carving – which works by altering the perspective of the image sequence to change between aspect ratios. These new technologies allow broadcasters to access new platforms and audiences with maximum re-use of systems and source material, resulting in a simpler and cheaper infrastructure.
Introduction
Once upon a time, television content was produced with a 4:3 aspect ratio and delivered at standard definition through a mild compression system, for example PAL, to a 4:3 CRT display that was probably between about 12 and 24 inches wide. All the broadcaster had to do was to ensure reliable, faithful transmission of the content to the display. Now, the proliferation of source formats, delivery mechanisms and screen sizes and shapes leads to a complex set of issues for the broadcaster to address. Broadcast programmes may be produced at standard or high definition, at a 4:3 or 16:9 aspect ratio, and may include content from anything from mobile phones to ultra-high-definition widescreen film scanners working with aspect ratios up to 2.35:1 or even wider. The content will generally be compressed for delivery, at bit rates ranging from many tens of Mbit/s for a premium HDTV service to a few tens of kbit/s for mobile content. The results will be viewed on anything from a 1.5-inch mobile phone screen to a 50-inch plasma display, or even bigger displays if we consider video projectors or outdoor screens.
The aspect ratio of the display may be anything from nearly square (or even a “portrait” format) to the 16:9 shape now common in home TV displays. Projection systems have greater flexibility to match very widescreen cinema film formats. The multiplicity of delivery and display platforms means that aspect ratio conversion has become an important part of the broadcast chain.

In this paper, we first review conventional approaches to aspect ratio conversion and discuss their advantages and disadvantages. To address the shortcomings, the wider concept of aspect processing is introduced, in which we attempt to optimize the viewing experience for a given display shape and size. Two sections then introduce new content-aware technologies for aspect processing: dynamic reframing and video seam carving. Dynamic reframing is a technique for content-dependent cropping and resizing, geared especially towards small displays. Seam carving is a much more radical approach, in which regions considered to be less visually important are quite literally removed from the picture.
Aspect Ratio Conversion
In this section, we shall base our examples on 16:9 to 4:3 conversion, but the comments apply equally to conversion in the other direction and between other ratios. Examples of two 16:9 frames, which we shall use throughout this paper, are shown in Figure 1.
Figure 1: Two 16:9 frames from the film “Mission Antarctique”

The first and simplest approach to handling different aspect ratios is to allow the source image to change shape to fill the display. This is known as anamorphic conversion. It involves no processing at all, as each scan line traverses the width of the screen by design. Anamorphic conversion has the advantage that no information is lost, but everything looks the wrong shape, as shown in Figure 2.
Figure 2: Anamorphic conversion
At another extreme of the set of possible techniques, we have cropping. We remove the sides (or the top and bottom) of the image to achieve the desired display aspect ratio. Cropping retains the shape of material that remains, but at the expense of losing what might be important information, as shown in Figure 3.

Figure 3: Cropping

The letterbox or, for conversion from narrower to wider pictures, pillarbox approach maintains the integrity of both shape and content of the source by resizing the source to fit within the display window and filling the remainder of the display window with black or, less commonly, with patterns or text. Letterboxing is often favoured by film makers as it preserves the artistic intent, but this is at the cost of a loss or waste of resolution. Figure 4 shows letterboxing applied to our example images.

Figure 4: Letterboxing

A compromise approach sometimes used successfully in consumer TV sets is to combine all three of the above techniques, “sharing the pain” between them. In our example, the required 25% reduction in aspect ratio could be achieved by a 9.14% reduction from each of the three techniques. This would produce the result shown in Figure 5.

Figure 5: Combination of anamorphic conversion, cropping and letterboxing

Another technique sometimes encountered is non-linear anamorphic conversion. The idea is to perform a stretch or shrink conversion with a variable degree of shape distortion, so that the centre portion of the picture undergoes no distortion, at the cost of a greater than average distortion nearer the sides of the picture. In the example shown in Figure 6, the central 60% of the output picture undergoes no distortion, but outside this region the distortion goes up progressively to a maximum value in such a way as to achieve the desired overall conversion ratio.

Figure 6: Non-linear anamorphic conversion

This produces a pleasing result if the objects of interest are near the centre of the picture, but can be disastrous if objects of interest are too close to the sides.
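To make the non-linear mapping concrete, the following sketch (our own illustration, not the converter's actual implementation) computes, for each output column, the fractional input column to sample. It assumes square pixels and a quadratically increasing squeeze outside the undistorted central zone; the paper only states that the distortion increases progressively, so the exact law here is an assumption.

```python
import numpy as np

def nonlinear_anamorphic_map(in_width, out_width, undistorted_fraction=0.6):
    """Return, for each output column, the fractional input column to sample.

    Assumes square pixels and equal heights, so a 1:1 local scale in the
    centre means one input pixel per output pixel.  Outside the undistorted
    central zone the horizontal squeeze grows quadratically (an assumed law)
    until the whole input width has been absorbed.
    """
    # Output positions, measured from the picture centre in units of the output width.
    xo = (np.arange(out_width) + 0.5) / out_width - 0.5
    half_out = 0.5
    half_in = 0.5 * in_width / out_width          # input half-width in the same units
    edge = undistorted_fraction * half_out        # end of the undistorted zone
    band = half_out - edge                        # width of the distorted side band

    # Local squeeze is 1 + a*t^2 across the band (t = 0..1); choose a so that
    # integrating it absorbs exactly the remaining input width:
    #   edge + band + a*band/3 = half_in
    a = 3.0 * (half_in - half_out) / band

    t = np.clip((np.abs(xo) - edge) / band, 0.0, 1.0)
    side = np.sign(xo) * (edge + band * (t + a * t**3 / 3.0))
    xi = np.where(np.abs(xo) <= edge, xo, side)

    # Convert back from centre-relative units to input pixel columns.
    return (xi + half_in) * out_width

# Example: squeeze a 1024-wide 16:9 source into a 768-wide 4:3 raster.
# The central 60% is untouched; the squeeze reaches about 3.5:1 at the edges.
source_columns = nonlinear_anamorphic_map(1024, 768)
```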
The techniques for aspect ratio conversion described so far have two drawbacks. They do not take into account the size of the display as distinct from its shape, and they fail to make use of any knowledge about the relative importance of different regions in the picture. Aspect processing can overcome these two drawbacks.

Aspect Processing
Aspect processing is a generalization of aspect ratio conversion. Aspect ratio conversion may be thought of as using a map function that links pixel locations in the input space to pixel locations in the output space. In the examples above, this map is more or less complicated but is fixed. In aspect processing, we allow the map function to vary smoothly from frame to frame in dependence on variations in content. Aspect processing is therefore a content-aware technology.
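One way to think of the map function in code (an illustrative representation of ours; the paper does not prescribe a data structure) is as an array giving, for every output pixel, the input coordinate to resample from. A fixed aspect ratio conversion fills this array once; aspect processing recomputes and smooths it from frame to frame.

```python
import numpy as np

def apply_horizontal_map(frame, src_x):
    """Resample each row of `frame` at the fractional input columns in `src_x`.

    frame : (height, in_width) array of one picture component.
    src_x : (out_width,) map, src_x[j] = input column for output column j.
    Linear interpolation between the two nearest input columns.
    """
    h, in_width = frame.shape
    x0 = np.clip(np.floor(src_x).astype(int), 0, in_width - 2)
    frac = src_x - x0
    return frame[:, x0] * (1.0 - frac) + frame[:, x0 + 1] * frac

# A plain anamorphic conversion is the special case of a fixed linear ramp:
# src_x = np.linspace(0, in_width - 1, out_width)
```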
One important feature of aspect processing is that the map approach provides a convenient framework for these content-aware methods to be applied together and, if desired, in conjunction with the fixed aspect ratio conversion techniques described above.
Two approaches to aspect processing will now be presented. In dynamic reframing, the map function defines a rectangular window of variable position and size that attempts to encompass the region of greatest interest in the scene. In seam carving, the map function is much more complex and attempts to remove (or to expand) areas all over the picture that have least interest while maintaining the integrity of the areas of greatest interest.

Dynamic Reframing
In order to do dynamic reframing, the first problem is to segment the image into a region of greatest interest, or foreground, and the rest of the picture, the background. We have developed three methods of generic foreground-background segmentation, which can be combined and further modified by genre-specific enhancements.

Foreground-background segmentation by clustering
In one method of segmentation, we use a clustering approach [1], based on representing the pixels as “point masses” in a multidimensional space. Each dimension represents a measurable feature of the pixel, crucially including its x and y co-ordinates in the picture, but also including some or all of the colour components, motion vectors, texture measures and any other quantity that is likely to be shared by pixels in the same segment. Each of the two segments is represented by a “centroid” in the space, which can also be thought of as an entry in a vector quantization codebook. The process of assigning pixels to segments is then equivalent to vector quantization. Each pixel is assigned to the segment to whose codebook entry it is closest, producing a partition of the space. Having assigned pixels to segments, the centroids themselves can be updated to take into account the new partition of the space. This is an iterative process that can also be carried over from one frame of a sequence to the next. Overall performance can be improved by introducing a probabilistic or “soft” notion of segment membership.

One problem with finding the “nearest” centroid is that the multidimensional space consists of quantities that are measured in different, incompatible units. We can overcome this by normalizing each dimension according to its variance. We can also take account of dependencies between dimensions by normalizing according to the covariance matrices of the pixels in each segment – this is the Mahalanobis distance.

A further refinement of the approach is to include in the distance measure a suitably normalized measure of the badness of fit of the pixel to a motion model calculated for each segment. As the segment membership is updated, the motion models can be refined using an iterative gradient approach such as that described in [2].
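The sketch below is a deliberately simplified, single-frame version of such a two-segment clustering (it is not the algorithm of [1]): pixels are feature vectors, soft memberships come from Mahalanobis distances to two centroids, and centroids and covariances are re-estimated on each pass. The initialisation and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def soft_two_segment_clustering(features, iterations=10, seed=0):
    """Soft foreground/background clustering of per-pixel feature vectors.

    features : (num_pixels, num_dims) array; a row might hold
               (x, y, Y, Cb, Cr, vx, vy, texture) for one pixel.
    Returns the probability that each pixel belongs to segment 1.
    """
    n, d = features.shape
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(n, size=2, replace=False)].astype(float)
    p = np.full(n, 0.5)                              # soft membership of segment 1

    for _ in range(iterations):
        dist = []
        for k, w in enumerate([1.0 - p, p]):
            diff = features - centroids[k]
            # Membership-weighted covariance -> Mahalanobis distance,
            # which normalizes away the incompatible units of each dimension.
            cov = np.cov(diff.T, aweights=w + 1e-6) + 1e-6 * np.eye(d)
            dist.append(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff))
        # Soft assignment: the nearer centroid gets the higher membership.
        p = 1.0 / (1.0 + np.exp(np.clip(dist[1] - dist[0], -50.0, 50.0)))
        # Update the centroids as membership-weighted means.
        centroids[0] = np.average(features, axis=0, weights=(1.0 - p) + 1e-6)
        centroids[1] = np.average(features, axis=0, weights=p + 1e-6)
    return p
```

In the real system the iteration is also carried over from frame to frame, and the distance can include the motion-model fit described above.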
Foreground-background segmentation from motion estimation
Another approach to foreground-background segmentation is based more directly on motion information. Motion estimation can be applied to many image processing tasks, for example standards conversion, image restoration or compression. In many cases, a motion estimator is therefore already available. The use of motion estimation for segmentation is based on the assumption that the background motion can be modelled by a single, global vector. The foreground is then an object or a set of objects that move in a markedly different way from the background. The larger the difference between the movement of the object and that of the background, the more likely the object is to be interesting and to be part of the foreground.

In our work, we used a motion estimator based on phase correlation [3], in which large blocks of successive pictures are compared in the two-dimensional Fourier domain, leading to a correlation surface with peaks corresponding to the different speeds and directions of motion present in the scene, thereby giving us a list of candidate motion vectors. Vectors are then assigned to individual pixels by finding which candidate vector best models the local motion. The global motion vector may be obtained by taking the sum of all the correlation surfaces generated for each block in the picture and taking the highest peak.

In order to find out whether an object is moving very differently from the background, we do not need to segment and fit a motion model. Instead, we estimate this difference on a pixel basis. The probability that a pixel is part of the background may be expressed as the inverse exponential of the Euclidean distance between the global vector and the pixel’s local assigned vector. Further information from the motion estimation process can be used in segmentation.

Additionally, for increased accuracy and quality of the segmentation results, measures of intra-frame interest may be successfully integrated into the framework. In our experience, the most useful measures of spatial interest are the ones that are perceptually motivated. We draw on the concept of pre-attentive saliency of objects, favouring the colour elements that are more likely to attract the viewer’s attention, independently of the content of the scene.
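As a sketch of the per-pixel decision just described (the phase correlation and vector assignment themselves are not reproduced here, and the `scale` parameter is our own tuning knob), the background probability can be computed directly from the assigned vector field and the global vector:

```python
import numpy as np

def background_probability(vectors, global_vector, scale=1.0):
    """Per-pixel probability of belonging to the background.

    vectors       : (height, width, 2) array of assigned motion vectors
                    (e.g. chosen from phase-correlation candidates).
    global_vector : (2,) dominant background motion.
    scale         : softness of the decision (illustrative, not from the paper).

    The probability is the inverse exponential of the Euclidean distance
    between each pixel's vector and the global vector, as described above.
    """
    distance = np.linalg.norm(vectors - np.asarray(global_vector), axis=-1)
    return np.exp(-distance / scale)

# Pixels whose motion departs strongly from the global vector get a low
# background probability and are treated as candidate foreground.
```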
Genre-specific enhancements
The segmentation can be further enhanced using a priori knowledge of the programme material. For example, in football, simple detection of green areas can indicate background, as can a more sophisticated crowd detection algorithm. In drama programmes, flesh tone can indicate foreground.

Results of a foreground-background segmentation process for our example pictures are shown in Figure 7.

Figure 7: Foreground-background segmentation

In this case, a clustering-based algorithm was used. Some compromises are evident between an “ideal” motion-based approach, which would find just the bicycles, and picking up some of the detail in the background.

Dynamic pan-scan
The segmentation results can be used to control a dynamic pan-scan algorithm. In our example, a full-height 4:3 window into the input picture is steered using the segmentation information. The results, shown first as a highlighted window into a thumbnail of the input picture, are given in Figure 8.

Figure 8: Dynamic pan-scan

Comparing with Figure 3, the dynamic pan-scan algorithm gives a slight advantage over fixed cropping. With general source material, we find that the dynamic pan-scan output nearly always gives framing that is better than fixed cropping.
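A minimal sketch of one way the segmentation could steer such a window follows; the centroid tracking and first-order smoothing are our illustrative choices, not necessarily the algorithm used in the product.

```python
import numpy as np

def pan_scan_window(foreground_mask, in_width, in_height,
                    out_aspect=4.0 / 3.0, prev_left=None, smoothing=0.9):
    """Return the left edge of a full-height crop window of aspect `out_aspect`.

    foreground_mask : (height, width) array of foreground probabilities.
    prev_left       : window position from the previous frame, used for
                      temporal smoothing so the virtual camera pans rather
                      than jumps.
    """
    window_width = int(round(in_height * out_aspect))
    # Steer the window towards the horizontal centroid of the foreground.
    columns = np.arange(in_width)
    weight = foreground_mask.sum(axis=0) + 1e-6
    centroid = (columns * weight).sum() / weight.sum()
    target_left = np.clip(centroid - window_width / 2, 0, in_width - window_width)
    if prev_left is None:
        return target_left
    # First-order recursive smoothing keeps the pan gentle.
    return smoothing * prev_left + (1.0 - smoothing) * target_left
```

The smoothing constant trades responsiveness against the risk of the window chasing every frame-to-frame change in the segmentation.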
Dynamic zoom
For a small display device such as a mobile phone, it may be advantageous to zoom dynamically into the picture so that the output is filled by the foreground. We can use the segmentation results to control an adaptive zoom process. Figure 9 shows some results, again as highlights followed by output pictures.

Figure 9: Dynamic reframing

The key improvement here for small displays is that, where there is only one object of interest, the system is able to zoom in on it, together with the most interesting area of the nearby background.

Background softening
In many delivery systems, the picture signal is compressed at a very low bit-rate. The segmentation results can be used to control an adaptive filter that softens background areas, making them easier to compress. The result is a saving of between 10% and 30% of bit-rate for a given level of perceived quality.

Dynamic pre-warping
One disadvantage of dynamic reframing is that it removes all control of the framing from the display. Some viewers may prefer the full reframing suggested by the segmentation process, while others may prefer a milder reframing or none at all. To give flexibility back to the display, it is necessary to transmit the whole picture (or at least a full-height, dynamic pan-scan picture), but if this is done at the display resolution and the display tries to do its own reframing, the result will have unacceptably low resolution.
A novel solution to this problem is to pre-warp the picture [4], assigning more pixels and hence higher resolution to the region of interest, at the expense of fewer pixels outside the region of interest. If the display chooses to zoom into the region of interest, sufficient resolution will then be available, while if it chooses to display the whole picture, the only penalty is a softening of the picture outside the region of interest. Pre-warping is an attractive idea but it does produce a non-standard transmitted picture, so the display has to include an inverse warping function which is controlled by metadata and by user input. Figure 10 shows a pre-warped transmitted picture and examples of zoomed-in and full-picture displays, both derived from the same transmitted picture.

Figure 10: Pre-warping (top left) and two resulting display options
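As a highly simplified one-dimensional illustration of the idea (the piecewise-linear allocation and the 70% share below are our assumptions; the actual scheme is the one described in [4]), the transmitted raster can give the region of interest a larger share of its columns and squeeze the remainder; the display then inverts this map using the transmitted metadata.

```python
import numpy as np

def prewarp_columns(in_width, roi_left, roi_right, roi_share=0.7):
    """Map transmitted columns to source columns for a 1-D pre-warp.

    The region of interest [roi_left, roi_right) receives `roi_share` of the
    transmitted width (so it keeps more resolution); the rest of the picture
    shares what remains.  The transmitted picture is assumed to have the same
    width as the source.
    """
    out = np.arange(in_width) / in_width                  # transmitted position, 0..1
    a, b = roi_left / in_width, roi_right / in_width      # ROI in source coordinates
    left_share = (1.0 - roi_share) * a / (1.0 - (b - a))  # transmitted width left of ROI
    right_edge = left_share + roi_share                   # transmitted width up to ROI end

    src = np.empty_like(out)
    m = out < left_share                                  # left of the ROI: squeezed
    src[m] = out[m] * a / max(left_share, 1e-9)
    m = (out >= left_share) & (out < right_edge)          # ROI: expanded share
    src[m] = a + (out[m] - left_share) * (b - a) / roi_share
    m = out >= right_edge                                 # right of the ROI: squeezed
    src[m] = b + (out[m] - right_edge) * (1.0 - b) / max(1.0 - right_edge, 1e-9)
    return src * in_width                                 # fractional source columns
```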
Seam Carving
Seam carving is a totally radical approach to aspect processing. Unlike dynamic reframing, which removes areas of least interest from the edges of the picture, seam carving removes such areas from throughout the picture.

First we give a brief, informal description of the seam carving algorithm for still pictures. Much more detailed descriptions are given in [5] and [6]. Suppose we wish to shrink a picture horizontally. Seam carving is applied repeatedly, shrinking the picture by one pixel width at a time. Each pass of the algorithm operates as follows. We calculate an energy or activity function for each pixel in the picture. We then find a seam (a set of connected pixels) of minimum energy extending from the top to the bottom of the picture.

The minimum-energy seam can be found using a recursive technique in which we calculate best partial seams leading to each pixel on successive rows of the picture until we have a minimum-energy seam leading to each pixel on the bottom row. We simply remove all pixels belonging to this seam from the picture, shifting the rest of the picture into the gap to make a new picture one pixel narrower than before. This is the process of “carving” a seam from the picture.
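A minimal sketch of this single-pass dynamic programme, for one still picture with a simple gradient-magnitude energy function (our simplification of the still-picture method of [5]; the product's energy function and optimizations are not reproduced here):

```python
import numpy as np

def remove_one_vertical_seam(luma):
    """Carve one minimum-energy vertical seam from a 2-D luma array."""
    h, w = luma.shape
    # Simple energy function: horizontal + vertical gradient magnitude.
    gy, gx = np.gradient(luma.astype(float))
    energy = np.abs(gx) + np.abs(gy)

    # Cumulative minimum energy of the best partial seam ending at each pixel.
    cost = energy.copy()
    for row in range(1, h):
        above = cost[row - 1]
        left = np.concatenate(([np.inf], above[:-1]))
        right = np.concatenate((above[1:], [np.inf]))
        cost[row] += np.minimum(np.minimum(left, above), right)

    # Trace the seam back from the cheapest pixel on the bottom row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for row in range(h - 2, -1, -1):
        prev = seam[row + 1]
        lo, hi = max(prev - 1, 0), min(prev + 2, w)
        seam[row] = lo + int(np.argmin(cost[row, lo:hi]))

    # Remove the seam pixel from every row, closing the gap.
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return luma[keep].reshape(h, w - 1)
```

Repeating the pass removes one seam per iteration; the video improvements described next modify the energy function and turn the seams into a rendering map rather than applying them one at a time.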
Video seam carving
In our application of seam carving to aspect processing of moving sequences, we have made three powerful improvements to the original algorithm. The first is that an element of motion compensated recursion is introduced, in which the energy function is weighted to favour placing a seam where we would have expected the corresponding seam in the previous picture to have moved to [6]. The second improvement relies on thinking of seam carving as a process that generates a map function linking input and output pixel locations – effectively a set of instructions for a rendering engine. We can manipulate the map function by filtering, scaling or mixing to make the seam carving process smoother and also to enable the seam carving analysis to be performed on a downsampled picture to save computation [6]. The third improvement extends the map approach to give smooth seam carving. We take note of the amount of energy removed from the picture by each seam in the original analysis phase. This then affects the width of picture material that is actually removed. The first few seams will carve through the plainest, lowest-energy areas of the picture, so we remove more picture material from these areas. Figure 11 gives an indication of the energy levels of seams in our example pictures and shows the resulting output pictures.

Figure 11: Smooth, motion compensated recursive video seam carving
If seam carving is successful, it is difficult to tell where information has been removed from the picture. The effect often seems to be equivalent to a subtle change in perspective of the scene. However, there are cases where the result can look unnatural. These problems can be addressed by combining foreground-background segmentation with seam carving. The segmentation results can be used to weight the energy levels of parts of the picture that we wish to preserve. Seams will then be less likely to cross those areas, and picture material will not be removed.

The improvements that we have made to seam carving for video sequences allow a great deal of flexibility. We can perform seam carving independently in both horizontal and vertical dimensions and combine the resulting maps as described in [6]. We can use seam carving to expand or contract the picture in either or both dimensions. For example, Figure 12 shows the result of seam carving both horizontally and vertically, the aim this time being to retain the 16:9 aspect ratio but to produce a smaller picture in which objects of interest retain their original size and shape. In this example, the smooth seam carving approach is not used, so the first pictures show the “hard” horizontal and vertical seams that are removed from the picture.

Figure 12: Motion compensated recursive video seam carving for size reduction

There are parallels here with pre-warping, except that here the “warped” picture is designed to be viewed directly and does not require an inverse warping operation in the display.

Conclusions
This paper has introduced and demonstrated the concept of aspect processing, in which new technologies of dynamic reframing and video seam carving can be combined with conventional aspect ratio conversion techniques. The result is a comprehensive toolkit for optimizing the television viewing experience over a wide range of display platforms, delivery mechanisms and source characteristics.
References
1. Knee, M.J. Image segmentation algorithms for video repurposing. Paper presented at CVMP, London, November 2006.
2. Vlachos, T. and Hill, L. Optimal search in Hough parameter hyperspace for estimation of complex motion in image sequences. IEE Proc. Vision, Image and Signal Processing, vol. 149, issue 2, April 2002, pp. 63-71.
3. Knee, M.J. International HDTV content exchange. Proc. IBC 2006.
4. Knee, M.J. Video transmission. International patent application PCT/GB2008/050158, filed 5 March 2008 – to be published.
5. Avidan, S. and Shamir, A. Seam carving for content-aware image resizing. ACM Transactions on Graphics, vol. 26, no. 3, 2007.
6. Knee, M.J. Seam carving for video. Proc. NAB 2008.
Acknowledgements
The authors would like to thank the Directors of Snell & Wilcox Ltd. for their permission to publish this paper.