JPEG2000 Image Adaptation for MPEG-21 Digital Items

Yiqun Hu, Liang-Tien Chia, and Deepu Rajan

Center for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, Singapore 639798
{p030070, asltchia, asdrajan}@ntu.edu.sg
Abstract. The MPEG-21 use cases describe a scenario of Universal Multimedia Access that is becoming reality: people use different devices such as desktop PCs, personal digital assistants and smartphones to access multimedia information. Viewing images on mobile devices is more popular than ever before. However, due to screen size limitations, the experience of viewing a large image on a small-screen device is awkward. In this paper, an enhanced JPEG2000 image adaptation system is proposed for MPEG-21 digital item adaptation. The image is adapted by considering both the visually attentive region(s) of the image and the terminal screen size. Subjective testing has shown the system to be an effective solution for displaying large images on different devices.

Keywords: MPEG-21 Digital Item Adaptation, JPEG2000, Image Adaptation
1 Introduction
The MPEG-21 multimedia framework aims to provide universal multimedia access and experience for users with different devices. The most critical limitation of a terminal device is its screen size. Viewing images, especially large images, on a small device is awkward: many inconvenient scrolling operations are required when a large image is viewed at its original size on a small screen. Conversely, if the image is directly down-scaled to the screen size, users cannot see its content clearly. The ideal solution is to make the best use of the screen by cropping only the region that attracts human visual attention and fitting it to the screen size. The new JPEG2000 image compression standard provides flexible scalability for transmission as well as adaptation. With this scalability, image-related applications such as scalable coding and progressive transmission become efficient. MPEG-21 Part 7, Digital Item Adaptation [1], describes a standardized framework for adapting format-dependent and format-independent multimedia resources according to terminal capabilities. Both of these factors motivate our work on JPEG2000 image adaptation using standard MPEG-21 digital item adaptation.

K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3331, pp. 470–477, 2004. © Springer-Verlag Berlin Heidelberg 2004

In this paper, visual attention analysis is integrated into the
MPEG-21 digital item adaptation framework for JPEG2000 image adaptation. Using our image adaptation system, different devices display different views of the same JPEG2000 image while preserving the most attentive information. The whole adaptation is performed at the JPEG2000 bitstream level and is transparent to users.

The rest of this paper is organized as follows. Section 2 briefly reviews related work on visual attention and introduces our attentive region extraction procedure. Section 3 describes the JPEG2000 adaptation engine that uses the enhanced visual attention model and MPEG-21 digital item adaptation. Experimental evaluations are given in Section 4, and we conclude in Section 5.
2 Visual Attention Model
Generally speaking, when humans view an image, some regions attract visual attention more than others. This visual attention mechanism is useful for displaying large images on devices with different screen sizes. Through cropping and down-scaling operations, a solution that both reduces the number of awkward scrolling operations and preserves important information can be achieved with the attentive region information. In this section, we discuss techniques for attentive image region detection and present our attentive region extraction method.

2.1 Review of Visual Attention Model
Selective attention mechanisms in the human visual system have been studied and applied in the literature on active vision. Designing a computational model of visual attention is the key challenge in simulating human vision, and several models have been proposed based on different assumptions about saliency. Itti and Koch [2] [3] proposed a computational model that computes a saliency map for images, using contrast as the cue for visual attention. In their method, a pyramid technique is used to compute feature maps for three low-level features: color, intensity and orientation. For each feature, saliency is measured by the cross-scale contrast between neighboring scales:

Ci,j = Fc(i, j) ⊖ Fs(i, j)    (1)

where Ci,j is the saliency at location (i, j), Fx(i, j) is the low-level feature (color, intensity or orientation) at location (i, j) of scale x, and ⊖ denotes the cross-scale (center-surround) difference. The final saliency map is generated by combining all feature maps. To shift among different salient points, an "inhibition of return" mechanism is applied iteratively.

Using the same cue of contrast, Ma et al. [4] provided another attention model which considers only color contrast. They divide the image into small perception units, and color contrast is calculated for each unit as

Ci,j = Σq∈Θ d(pi,j, q)    (2)

where Θ is the neighborhood of (i, j), whose size controls the sensitivity of the perceptive field, pi,j and q denote the color features of the perception unit at (i, j) and of a neighboring unit, respectively, and d is the Gaussian distance between colors in LUV space.

Another saliency measure, proposed by Ferraro et al. [5], is based on the information loss along a fine-to-coarse scale space. They measure saliency by the density of entropy production, i.e. the loss of information at a given pixel per unit scale:

σ = ( ‖∇f(x, y, t)‖ / f(x, y, t) )²    (3)

where ∇ is the gradient operator, (x, y) are the spatial coordinates and t is the scale parameter. Chen et al. [6] also proposed a semantic attention model combining visual attention, face attention and text attention; various applications using this attention model have been proposed, such as image navigation [7] and thumbnail cropping [8].

Our objective is to provide a general solution for efficiently displaying images on devices with different screen sizes. Hence we currently use Itti's model [2] because of its generality.

2.2 Attentive Region Extraction
It is assumed that small objects at the edges of an image are unlikely to be the main attention region, and that attention regions closer to the center of the image are perceptually more important in human vision. We therefore assign a weight to each pixel in the image. Without additional restrictions, we assume the surface of the weights satisfies a Gaussian distribution along both the horizontal and vertical directions ((4), (5)), and the total weight is the arithmetic mean of the two directions:

N(μx, σx²) = (1 / (√(2π) σx)) exp[ −½ ((x − μx) / σx)² ]    (4)

N(μy, σy²) = (1 / (√(2π) σy)) exp[ −½ ((y − μy) / σy)² ]    (5)

Both Gaussian curves are centered at the center of the image by setting μx to half the width (Width / 2) and μy to half the height (Height / 2). σx and σy are fixed at 10 so that the Gaussian curve is smooth, avoiding a sharp peak that would consider only a small center region of the image. These weights are used to modify the saliency map as

S̄x,y = Sx,y · (N(μx, σx²) + N(μy, σy²)) / 2    (6)
where S̄x,y is the weighted value of the saliency map at location (x, y). By weighting the saliency map according to position in the image, tiny attention points at the edges are skipped and the focus is kept on the most important attention region. Our experimental results show that this simple factor is effective for noise reduction. The modified saliency map thus assigns each point a value according to its positional attention.

Fig. 1. Same image displayed on Desktop PC and PDA

In our image adaptation model, a simple region growing algorithm, whose similarity threshold is defined as 30% of the gray-level range of the saliency map, is used to generate the smallest bounding rectangle that includes the identified attention area(s). First, we take the pixel(s) with the maximum value as seeds and execute the region growing algorithm. In each growing step, the 4-neighbor points are examined; if the difference between a point and the current seed is smaller than the threshold (30% of the gray-level range), the point is added to the seed queue to be grown later. The algorithm continues until the seed queue is empty. Finally, the output is one or several separate regions, and we generate the smallest rectangle that includes these regions.
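The weighting and extraction steps above can be sketched as follows. This is an illustrative NumPy implementation under our reading of the method: the function names are ours, and the input is assumed to be a dense per-pixel saliency array.

```python
import numpy as np

def center_weighted_saliency(saliency, sigma=10.0):
    """Weight a saliency map by centered 1-D Gaussians, as in Eqs. (4)-(6)."""
    h, w = saliency.shape
    x = np.arange(w)
    y = np.arange(h)
    gx = np.exp(-0.5 * ((x - w / 2.0) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    gy = np.exp(-0.5 * ((y - h / 2.0) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    # total weight at each pixel: arithmetic mean of the two directional weights
    weight = (gx[None, :] + gy[:, None]) / 2.0
    return saliency * weight

def attentive_bounding_box(saliency, threshold_ratio=0.30):
    """Grow regions from the maximum-saliency seeds over the 4-neighborhood,
    then return the smallest rectangle (top, left, bottom, right) enclosing
    all grown points."""
    s = saliency
    thresh = threshold_ratio * (s.max() - s.min())  # 30% of gray-level range
    seeds = list(zip(*np.where(s == s.max())))      # one or multiple maxima
    grown = set(seeds)
    queue = list(seeds)
    while queue:
        i, j = queue.pop()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (0 <= ni < s.shape[0] and 0 <= nj < s.shape[1]
                    and (ni, nj) not in grown
                    and abs(s[ni, nj] - s[i, j]) < thresh):
                grown.add((ni, nj))
                queue.append((ni, nj))
    rows = [p[0] for p in grown]
    cols = [p[1] for p in grown]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1
```

On a synthetic map with a single bright blob, the returned rectangle is exactly the blob's bounding box; with several maxima, all grown regions are enclosed by one rectangle, as in the paper.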
3 JPEG2000 Image Adaptation Engine
Unlike current image engines, the proposed image adaptation server provides transparent resolution adaptation for JPEG2000 images in a standard way: MPEG-21 Digital Item Adaptation [9]. The JPEG2000 image bitstream and its Bitstream Syntax Description (BSD) [10] compose the digital item. The BSD describes the high-level structure of the JPEG2000 bitstream, and adaptation is performed on the bitstream according to the BSD. The adapted image is generated directly from the JPEG2000 bitstream according to both the attentive region information and the terminal screen size. In our system, accessing an image through different devices yields different views of the original image, each of which delivers the best experience within the limited screen space. Figure 1 shows the view when accessing an image through a desktop PC as well as through a PDA: only the most attentive information is displayed on the small screen, avoiding both excessive down-scaling of the image and additional scrolling operations. Our standard image adaptation engine automatically detects the visually attentive region(s) and adapts the JPEG2000 image at the bitstream level using the standard digital item adaptation mechanism, which distinguishes it from other similar work. The advantage of
our intelligent resolution adaptation engine is that it preserves, as much as possible, the most attentive (important) information of the original image while satisfying the terminal screen constraints. The engine utilizes the Structured Scalable Meta-formats (SSM) for Fully Content Agnostic Adaptation [11], proposed as an MPEG-21 reference software module by HP Research Labs. The SSM module adapts the resolution of JPEG2000 images according to their ROIs and the terminal screen constraints of the viewers. The BSD description of the JPEG2000 image is generated by the BSDL module [10]. The attentive region is automatically detected using our enhanced visual attention model, and the adaptation operation is decided dynamically by considering both the attentive region and the terminal screen size constraint. We change the resolution of a JPEG2000 image by adapting the JPEG2000 bitstream directly in the compressed domain.

The whole adaptation procedure is as follows. The BSD description and the attentive region information are combined with the image itself into a digital item. When a user requests the image, the terminal constraints are sent to the server as a context description (XDI). Combining the XDI, the BSD description and the attentive region information, the Adaptation Decision-Taking Engine decides on the adaptation process for the image [11]. Finally, the adapted image and its corresponding BSD description are generated by the BSD Resource Adaptation Engine [10]. The description can be updated to support multi-step adaptation. A snapshot of BSD digital item adaptation is shown in Figure 2.
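At its core, BSD-driven adaptation can be pictured as a byte-range transformation: the description records where each addressable unit of the codestream sits, and the resource adaptation engine copies only the units that survive the decision. The sketch below is a deliberate simplification; the (offset, length, level) triples stand in for what a real BSD/gBSD description produced by the BSDL tools would provide, and the function name is ours.

```python
def adapt_bitstream(bitstream: bytes, segments, keep_levels):
    """Keep only the byte ranges of the codestream whose resolution level is
    wanted. `segments` is a list of (offset, length, level) triples playing
    the role of a (much richer) BSD description of the bitstream."""
    out = bytearray()
    for offset, length, level in segments:
        if level in keep_levels:
            # copy this segment of the original codestream unchanged
            out += bitstream[offset:offset + length]
    return bytes(out)
```

Because the transformation operates only on described byte ranges, the image is never decoded: this is what makes the adaptation work entirely in the compressed domain.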
Fig. 2. Example of Digital Item BSD Adaptation; (a) Adaptation Decision Description; (b) JPEG2000 BSD Adaptation (Green - Original BSD, Blue - Adapted BSD).
The intelligent attentive region adaptation is decided according to the relationship between the image size (Isize), the attentive region size (ARsize) and the terminal screen size (Csize):

– If Csize > Isize: no adaptation; the original image is sent to the user directly.
– If ARsize < Csize < Isize: crop the attentive region according to the result of the visual attention analysis, removing non-attentive areas.
– If Csize < ARsize: crop the attentive region first, then reduce the resolution of the region to the terminal screen size (a further adaptation can be performed by the adaptation engine).

Fig. 3. Example of good intelligent adaptation; (a) Original Image; (b) Saliency Map; (c) Adapted Image on PDA; (d) Directly down-scaled Image on PDA
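The three-way decision rule can be sketched as follows. This is a hypothetical helper of our own (in the actual system the decision is taken by the MPEG-21 Adaptation Decision-Taking Engine from the XDI and BSD inputs), and we interpret "fits" as both dimensions being within the screen bounds.

```python
def decide_adaptation(image_size, region_size, screen_size):
    """Choose among the three adaptation cases; all sizes are
    (width, height) pairs."""
    def fits(a, b):
        return a[0] <= b[0] and a[1] <= b[1]

    if fits(image_size, screen_size):
        return "none"            # Csize > Isize: send the original image
    if fits(region_size, screen_size):
        return "crop"            # ARsize < Csize < Isize: crop to the region
    return "crop_and_scale"      # Csize < ARsize: crop, then down-scale
```

For example, a 640x480 image with a 200x150 attentive region needs no adaptation on an 800x600 desktop, is cropped for a 320x240 PDA, and is cropped and scaled if the region itself exceeds the PDA screen.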
4 Experimental Evaluation
600 images were selected from different categories of the standard Corel Photo Library as the data set. The system was implemented as a web server application, and users can view the images through a desktop PC as well as a PDA. Several output examples of our intelligent visual attention based adaptation are shown in Figure 3 and Figure 4. Notice that the most interesting information is retained on the small screen to achieve the best possible user experience. Compared with directly down-scaling the image, this provides a better way of viewing images on small devices. Due to the subjectivity of visual attention, we applied the
Fig. 4. Example of bad and failed intelligent adaptation; (a) Original Image; (b) Saliency Map; (c) Cropped Image.

Table 1. User Study Evaluation - percentage of images in each category

Category  Failed  Bad    Acceptable  Medium  Good
Animal    0.02    0.09   0.22        0.33    0.34
People    0.01    0.11   0.22        0.30    0.36
Scenery   0.03    0.22   0.13        0.22    0.40
Others    0.01    0.10   0.26        0.41    0.22
Average   0.017   0.108  0.23        0.38    0.29
user study experiment of [4] to test the effectiveness of the proposed algorithm. Eight human subjects were invited to assign a score to each output of our adaptation across four different topics; they were asked to grade the adapted images from 1 (failed) to 5 (good). From the evaluation results shown in Table 1, we found that for the different categories of images, an average of close to 87% of the cases are acceptable, including 67% that are better than acceptable. Only 10% are bad and 1% failed. Bad results arise mainly because the whole visual object is not included in the cropped image (e.g., the legs of an animal), and the 1% failure rate is due either to a wrong visual object being identified as the attention region, or to images such as scenery shots in which there may be no specific visual object. The framework works reasonably well for a general set of natural images. All eight testers agree that visual attention based adaptation improves the experience of viewing images on small devices.
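The per-category fractions in Table 1 can be tabulated from per-image grades as sketched below. This is an illustrative take on the bookkeeping only, not the authors' exact aggregation procedure, and the names are ours.

```python
from collections import Counter

GRADES = ["Failed", "Bad", "Acceptable", "Medium", "Good"]  # scores 1..5

def grade_fractions(scores):
    """Fraction of adapted images falling into each grade category, given
    one aggregated score (1-5) per image."""
    counts = Counter(scores)
    n = len(scores)
    return {GRADES[s - 1]: counts[s] / n for s in range(1, 6)}
```

For instance, ten images scored [5, 5, 4, 3, 2, 5, 4, 1, 3, 5] yield 40% Good and 10% Failed, the same form of breakdown reported per category in Table 1.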
5 Conclusion
In this paper, we have designed a JPEG2000 image adaptation engine for efficiently displaying images on different devices. The engine intelligently analyzes the visually attentive region of an image and provides different views of the image for different devices, making the best use of the terminal screen to present the most interesting information. The advantage of this engine over others is its capability of automatic attentive region detection and, because it uses the standard MPEG-21 digital item
adaptation mechanism as well as the JPEG2000 format, it is interoperable and extensible. A larger image test set and more extensive subjective tests will be conducted in the future to validate its effectiveness.
References

1. Vetro, A., Timmerer, C.: ISO/IEC 21000-7 FCD - Part 7: Digital Item Adaptation. In: ISO/IEC JTC 1/SC 29/WG 11/N5845 (2003)
2. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence 20 (1998)
3. Itti, L., Koch, C.: A comparison of feature combination strategies for saliency-based visual attention systems. In: Proc. SPIE Human Vision and Electronic Imaging IV (HVEI'99), San Jose, CA. Volume 3644. (1999) 473–482
4. Ma, Y., Zhang, H.: Contrast-based image attention analysis by using fuzzy growing. In: Proc. ACM Multimedia, Berkeley, CA, USA (2003)
5. Ferraro, M., Boccignone, G., Caelli, T.: On the representation of image structures via scale space entropy conditions. IEEE Trans. on Pattern Analysis and Machine Intelligence 21 (1999)
6. Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., Zhou, H.: A visual attention model for adapting images on small displays. ACM Multimedia Systems Journal (2003)
7. Liu, H., Xie, X., Ma, W.Y., Zhang, H.J.: Automatic browsing of large pictures on mobile devices. In: Proceedings of the eleventh ACM international conference on Multimedia (2003) 148–155
8. Suh, B., Ling, H., Bederson, B.B., Jacobs, D.W.: Automatic thumbnail cropping and its effectiveness. In: Proceedings of the ACM symposium on user interface software and technology, Vancouver, Canada (2003)
9. Bormans, J., Hill, K.: MPEG-21 Overview v.5. In: ISO/IEC JTC 1/SC 29/WG 11/N5231 (2002)
10. Panis, G., Hutter, A., Heuer, J., Hellwagner, H., Kosch, H., Timmerer, C., Devillers, S., Amielh, M.: Bitstream syntax description: a tool for multimedia resource adaptation within MPEG-21. Signal Processing: Image Communication, EURASIP 18 (2003)
11. Mukherjee, D., Kuo, G., Liu, S., Beretta, G.: Motivation and use cases for decision-wise BSDLink, and a proposal for usage environment descriptor-adaptation-QoS linking. In: ISO/IEC JTC 1/SC 29/WG 11, Hewlett Packard Laboratories (2003)