Robust Subspace Analysis for Detecting Visual Attention Regions in Images

Yiqun Hu, Deepu Rajan and Liang-Tien Chia
Center for Multimedia and Network Technology
School of Computer Engineering
Nanyang Technological University, Singapore 639798
{y030070, asdrajan, asltchia}@ntu.edu.sg

ABSTRACT

Detecting visually attentive regions of an image is a challenging but useful problem in many multimedia applications. In this paper, we describe a method to extract visually attentive regions in images using subspace estimation and analysis techniques. The image is represented in a 2D space using a polar transformation of its features so that each region in the image lies in a 1D linear subspace. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed. The robustness of subspace estimation is improved by using weighted least square approximation, where the weights are calculated from the distribution of the K nearest neighbors to reduce the sensitivity to outliers. A new region attention measure is then defined to calculate the visual attention of each region by considering both feature contrast and the geometric properties of the regions. Experiments show that the method is effective and overcomes the scale dependency of other methods. Compared with existing visual attention detection methods, it directly measures global visual contrast at the region level, as opposed to pixel-level contrast, and can correctly extract the attentive region.

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Perceptual reasoning; I.5.2 [Pattern Recognition]: Design Methodology—Pattern analysis

General Terms
Algorithms

Keywords
Subspace Analysis, GPCA, Visual Attention

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’05, November 6–11, 2005, Singapore. Copyright 2005 ACM 1-59593-044-2/05/0011 ...$5.00.

1. INTRODUCTION

The bottom line for “efficient” transmission of multimedia content lies in its fast and perceptually pleasing delivery. The visual components of multimedia data in the form of images and videos are more prone to user dissatisfaction and to network induced distortions than text, speech and audio. One possible remedy is to detect regions of interest in images and to process them to suit specific operational constraints. For instance, the JPEG2000 image compression standard encodes the region of interest in more detail than the background. In image browsing applications, a user could be provided with coarser versions initially and a feedback mechanism could enable parts of the image (i.e., regions of interest) to be presented at a higher resolution. A variety of display devices with limited screen sizes would require appropriate regions of interest of images to be displayed. Visual attention is a mechanism of the human visual system to focus on certain parts of a scene first, before attention is drawn to the other parts. Such areas that capture primary attention are called visual attention regions (VARs). For multimedia applications, the VARs should then indeed be the regions of interest, and an automatic process of extracting the VARs becomes necessary. Identification of VARs has been shown to be useful for object recognition [1, 2, 3] and region based image retrieval [4, 5]. Similarly, images can be adapted for different users with different device capabilities based on the VARs extracted from the image, thus enhancing viewing pleasure. Examples of such adaptation include automatic browsing for large images [6], image resolution adaptation [7, 8] and automatic thumbnail generation [9].

Models for capturing VARs have been proposed in [10] and [11]. In [10], Itti et al. constructed pyramids of image features (intensity, colour and orientation) and used center-surround differences to calculate contrast. Various combination strategies were proposed to combine the features in order to identify the VAR [12]. Another model proposed by Ma and Zhang [11] relies on the HSV color space, and the contrast is calculated as the difference of features between the center block and the spatial neighborhood blocks. Compared with [10], this model is computationally efficient if color is the right feature to detect the visual attention region. Another visual attention model used in [4, 13] measures competitive novelty to estimate the attention of each pixel. The model estimates the mismatch of the neighboring configurations with other randomly selected neighborhoods to increase or decrease the attention value iteratively.

Although the above methods have yielded interesting results, they do not address the following issues:

• Global attention: The mechanism of capturing visual attention should be perceived as a global process, necessitating a global approach rather than local contrast calculated as spatial differences in features among pixels or blocks.

• Scalability: The visual attention extraction algorithm should be scale invariant. In the existing methods [10, 11], a priori knowledge of the levels in a pyramid or the size of the blocks implies that the scale is fixed.

• Region extraction: The distribution of contrast at the pixel or block level, as represented in a saliency map, does not indicate similar differences in feature values (e.g. intensity) and hence does not contain information about regions.

In [10] and [11], attentive points are foveae distributed at different locations. However, it is not trivial to relate such a distribution of attentive points to image regions. As for scalability, [10] uses a set of predefined scales that may not be optimal for images with attentive objects of different sizes, while in [11], blocks of size 8 × 8 are chosen empirically. Hence, we see that the use of spatial neighborhood configurations to extract attention regions could fail (as shown later in the experiments) since (a) it is hard to establish correspondence between fovea locations and image regions and (b) segmentation of the saliency map only groups together pixels with similar contrasts without considering region information.

In this paper, we try to overcome the above problems by proposing a visual attention region detection algorithm based on subspace estimation and analysis. First, the image is mapped onto a 2-D space through a polar transformation so that possible visual attention regions are forced onto linear subspaces. Generalized Principal Component Analysis (GPCA) [14, 15] is then used to estimate the linear subspaces corresponding to VARs without actually segmenting the data. In order to handle noise in the transformed space, we embed a constraint based on the distribution of the K nearest neighbors within the GPCA framework. This extension improves the robustness of the proposed method to noisy data and results in a more accurate estimation of the subspaces. We call the new subspace estimation method NN-GPCA. The attentive region is then determined by a new attention measure that considers feature contrasts as well as the geometric properties of regions. We show that the proposed visual attention detection method can solve the issue of scalability and determine attentive regions from a global rather than a local perspective. Not only are inter-region contrasts examined, but intra-region similarities are also involved to explore visual attention at the region level. Notice that detection of VARs is different from conventional image segmentation in that the former results in only those regions which are visually significant, while the latter simply partitions an image into numerous homogeneous regions (e.g. in intensity).

The rest of the paper is organized as follows. In section 2, the simple polar transformation of features is described and illustrated. The proposed NN-GPCA linear subspace estimation algorithm is described in section 3. The application of NN-GPCA to extract VARs in images using a proposed visual attention measure, the Cumulative Projection (CP), is described in section 4. In section 5, four experiments are designed to evaluate the proposed method. Finally, conclusions and discussions related to future work are presented in section 6.

2. POLAR TRANSFORMATION OF FEATURES

In order to apply subspace analysis for visual attention region detection, we need a transformation that maps image regions onto linear subspaces. We consider a simple polar transformation of a feature value at one location into a point denoted by (θ, r). The angle θ is given by

θ(f_i) = (f_i − min_j(f_j)) / (max_j(f_j) − min_j(f_j)) × π/2    (1)

where f_i is the feature value at pixel location i and θ is restricted to [0, π/2]. The radius r is the Euclidean distance of a pixel from the center pixel of the image. It is interesting to note that this simple transformation satisfies two conditions that ensure the correspondence of an image region to a subspace [16]. They are

• Subspace Constraint: We require that each homogeneous region in an image corresponds to one and only one subspace (linear or nonlinear). The angle θ in the polar transformation ensures the mapping of the feature values of one region onto a line in 2D space.

• Cluster Constraint: In addition to the features lying on a subspace, they should also reside in clusters within the subspace. Thus, data not belonging to a cluster are considered as outliers. The radius r in the polar transformation forces clusters to be formed within the subspace.
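The transformation above can be sketched as follows (a minimal illustration assuming a single grayscale feature per pixel; the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def polar_transform(features):
    """Map each pixel's feature value to a 2-D point, following equation (1):
    theta encodes the normalized feature value, r the distance from the image
    center, so homogeneous regions fall on lines through the origin."""
    f = np.asarray(features, dtype=float)
    h, w = f.shape
    # equation (1): theta restricted to [0, pi/2]
    theta = (f - f.min()) / (f.max() - f.min()) * (np.pi / 2)
    # r: Euclidean distance of each pixel from the center pixel
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - (h - 1) / 2, xs - (w - 1) / 2)
    # return the transformed points in Cartesian coordinates, one row per pixel
    return np.stack([(r * np.cos(theta)).ravel(),
                     (r * np.sin(theta)).ravel()], axis=1)
```

For a two-region image, the points of each region are collinear with the origin (one line per region, satisfying the subspace constraint), while the spread of r values within a region forms the cluster required by the cluster constraint.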

Figure 1: (a) Synthetic monochrome image and (b) polar transformation of its intensity

Figure 1 shows a synthetic monochrome image and the polar transformation of its intensity feature. The two linear subspaces corresponding to the two regions are evident. Figure 2 shows a noisy synthetic color image and the polar transformation of its color (hue + saturation). The four distinct regions are mapped into four linear subspaces, although other subspaces are also formed due to noise. However, note that the sizes of the clusters formed by data and by noise are significantly different, so they can be distinguished easily. Hence, the true regions can be detected while noise is filtered out by imposing the subspace and cluster constraints.


Note also that the sizes of the clusters indicate the sizes of the corresponding regions.

Figure 2: (a) Synthetic color image and (b) polar transformation of its hue

3. NN-GPCA SUBSPACE ESTIMATION

Having transformed the image regions into a subspace representation, the objective is to estimate the subspaces. This involves determining the number of subspaces and their orientations in the presence of noise. In doing so, we do not rely on any segmentation algorithms. Vidal et al. [14, 15] have proposed an algebraic geometric approach called Generalized Principal Component Analysis (GPCA) for subspace modeling. As we will show later, and for reasons elaborated in the following sub-section, the performance of GPCA degrades with noise. The GPCA algorithm is made more robust to outliers by combining it with the distribution of K nearest neighbors, g_knn, which yields a weighted estimation of the subspaces so that outliers are weighted less than the inliers. We design a new NN-GPCA method which extends GPCA by using g_knn as weight coefficients for a weighted least square estimation of subspaces that fits both the subspace constraint and the cluster constraint. This enables us to distinguish those linear subspaces that contain one or more big clusters from a set of noisy data. In this section, we first give a brief review of the GPCA algorithm including its limitations, followed by the description of the proposed NN-GPCA algorithm.

3.1 Review of GPCA

GPCA is an algebraic geometric approach to the problem of estimating a mixture of linear subspaces from sample data points. It utilizes the fact that each data point x ∈ R^K satisfies b_i^T x = Σ_{j=1}^K b_ij x_j = 0, where b_i is the normal vector of the subspace it belongs to. Since every sample point lies on one of the subspaces, the following equation involving a homogeneous polynomial of degree n in x with real coefficients holds:

p_n(x) = Π_{i=1}^n (b_i^T x) = 0    (2)

where n is the number of subspaces and {b_i}_{i=1}^n are the normal vectors of the subspaces. The problem of estimating the subspaces is to solve this nonlinear system for all {b_i}_{i=1}^n. It can be converted to a linear expression by expanding the product of all b_i^T x and viewing all monomials x^n = x_1^{n_1} x_2^{n_2} ... x_K^{n_K} of degree n as system unknowns. Using the definition of the Veronese map [14, 15], v_n : [x_1, ..., x_K]^T → [..., x^n, ...]^T, equation (2) becomes the following linear expression:

p_n(x) = v_n(x)^T c = Σ c_{n_1...n_K} x_1^{n_1} ... x_K^{n_K} = 0    (3)

where c_n ∈ R represents the coefficient of the monomial x^n. Given a collection of N sample points {x^j}_{j=1}^N, a linear system can be generated as

L_n c = [v_n(x^1)^T ; v_n(x^2)^T ; ... ; v_n(x^N)^T] c = 0    (4)

to solve for all normal vectors {b_i}_{i=1}^n to the subspaces. In the absence of noise, the number of subspaces can be estimated from the requirement that there is a unique solution for c. The normal vectors {b_i}_{i=1}^n can be solved for once c is determined. However, in the presence of noise, the subspace estimation problem is cast as a constrained nonlinear optimization problem which is initialized using the normal vectors obtained as above. Further details of the GPCA algorithm are available in [14, 15]. While the GPCA algorithm provides an elegant solution to the problem of linear subspace estimation without segmenting data, there are some inherent limitations that we discuss next.

1. Subspace number estimation: GPCA relies on a predefined threshold to combine subspaces that are very close to each other. This threshold does not have a meaningful interpretation and there is no easy way to decide its value.

2. Effect of outliers: Each data point is either true data or noise that appears as an outlier in the subspace representation. GPCA applies least square error approximation on the entire data including outliers. This could lead to an erroneous estimation of the subspaces.

3. Approximation bias: Since the objective function in the optimization problem consists of the sum of approximation errors, for a fixed number of subspaces, the estimation will be biased towards those subspaces that are populated more.

We illustrate the failure of the GPCA algorithm using synthetic data containing outliers in Figure 3, where the red lines represent the estimated subspaces. The data lie on four linear subspaces shown in Figure 3 (a), of which two subspaces contain true data and the other two contain outliers. In the absence of outliers, as shown in Figure 3 (b), the GPCA estimation performs very well. However, the initial estimate of the subspaces shown in Figure 3 (c) and the final estimate using nonlinear optimization shown in Figure 3 (d) are erroneous when noisy data is also taken into account. We propose to overcome these drawbacks by weighting the data points using a K nearest neighbor distribution.

3.2 Assigning weights to data points

A subspace clustering method using the K-th nearest neighbor distance (kNND) metric is shown to detect and remove outliers and small data clusters in [16]. The kNND metric uses the fact that in a cluster larger than K points, the kNND for a data point will be small; otherwise it will be large. According to the polar transformation of features, the true data lies not in just any cluster but in a cluster inside its subspace. Instead of using kNND, we utilize the distribution of


all K nearest neighbors, denoted g_knn, to differentiate inliers and outliers. In this paper, we assign a weight to each data point x^i calculated from g_knn(x^i). This provides a simple method to reduce the sensitivity to outliers without segmentation of the data. The weight is related to the probability of a data point lying on a subspace corresponding to an image region. Given a sample data point x^i, its K nearest neighbors are detected, and the variance svar(g_knn(x^i)) along the direction of the subspace of x^i (from the origin to the current point) and the variance nvar(g_knn(x^i)) along the orthogonal direction are calculated using Principal Component Analysis (PCA). The sum S(g_knn(x^i)) = svar(g_knn(x^i)) + nvar(g_knn(x^i)) corresponds to the characteristic variance of the K nearest neighbors of the current point. It will be small if these K neighbors form a cluster; otherwise it will be large. Since only the clusters inside the subspaces are true data, we use the ratio R(g_knn(x^i)) = nvar(g_knn(x^i))/svar(g_knn(x^i)) as the factor to bias the weights towards those clusters in the subspaces that correspond to true data. Hence, the weight for x^i is calculated as

W(x^i) = 1 / (1 + S(g_knn(x^i)) × R(g_knn(x^i)))    (5)

When the data point x^i lies in a cluster larger than K inside a subspace, nvar(g_knn(x^i)) is 0 in the absence of noise and the ratio R(g_knn(x^i)) is very small in the presence of noise. The sum S(g_knn(x^i)) is also small because these K data points form a cluster. So W(x^i) is equal or close to 1. Otherwise, R(g_knn(x^i)) and/or S(g_knn(x^i)) are large and W(x^i) is small or even close to zero. Additionally, the parameter K decides the minimum size of a cluster that is considered as true data. Since the outliers are always far away from the true data, any small value of K can differentiate outliers from true data, and the selection of this value can be fixed for all cases. Hence the weight of each data point relates to the probability that it is inside the cluster of a specific subspace corresponding to one image region. Thus, the analysis of g_knn(x^i) assigns small weights to both outliers and small clusters to reduce their effect on subspace estimation.

Figure 3: Effect of outliers on subspace estimation by GPCA (a) Synthetic data; (b) estimation using true data only (without outliers); (c) initial estimate used to determine (d) final estimate after optimization

3.3 The NN-GPCA algorithm

The weights obtained from the analysis of the distribution g_knn of K nearest neighbors are used in the GPCA algorithm to improve the robustness and accuracy of subspace estimation. By taking the weight of each data point x^i into account, the linear system of equation (4) is modified as

W L_n c = diag(W(x^1), W(x^2), ..., W(x^N)) [v_n(x^1)^T ; v_n(x^2)^T ; ... ; v_n(x^N)^T] c = 0    (6)

where W(x^i) is the weight of x^i. In order to estimate the number of subspaces, we first modulate L_n using the weights W(x^i) for each x^i as

ṽ_n(x^i) = v̄_n + W(x^i)(v_n(x^i) − v̄_n)    (7)

where v̄_n is the mean of the data. If x^i is an outlier, its small weight causes it to be pulled closer to the mean of the data. Next, we do a Singular Value Decomposition (SVD) on W L_n and eliminate the outliers using a very weak threshold. We emphasize that the choice of threshold in this case is not crucial, since the weights allow less dependency on the threshold. This is unlike the case for GPCA, where the presence of outliers may cause the number of dominant singular values to be large.

The subspace estimation problem is then formulated as an approximation problem using the weighted least square technique for the purpose of calculating the coefficient vector c. Since c can be obtained only up to a scale factor, we normalize it by the first component c_1. Thus the left side of equation (6) becomes

W [v_n(x^1)(2..M)^T ; v_n(x^2)(2..M)^T ; ... ; v_n(x^N)(2..M)^T] [c_2, c_3, ..., c_N]^T + [W(x^1)v_n(x^1)(1)c_1 ; W(x^2)v_n(x^2)(1)c_1 ; ... ; W(x^N)v_n(x^N)(1)c_1]    (8)

where v_n(x^i)(2..M) represents a vector containing all components of v_n(x^i) except for the first component v_n(x^i)(1). With c_1 = 1, equation (6) can now be rewritten as

W [v_n(x^1)(2..M)^T ; v_n(x^2)(2..M)^T ; ... ; v_n(x^N)(2..M)^T] [c_2, c_3, ..., c_N]^T = [−W(x^1)v_n(x^1)(1) ; −W(x^2)v_n(x^2)(1) ; ... ; −W(x^N)v_n(x^N)(1)]    (9)

The above equation can be succinctly written as W A c = d, where A is the matrix whose rows are v_n(x^i)(2..M)^T, i = 1, 2, ..., N, and d is the right side of equation (9). By minimizing the objective function ||d − Ac||_W, we obtain the weighted least square approximation

[c_2, ..., c_N]^T = (A^T W^T W A)^{−1} (A^T W^T W d)    (10)

The estimation error of the coefficient vector c is reduced by the normalization c_1 = 1 and the diagonal matrix of weight coefficients W. Through W, the contribution of the outliers to the system is reduced by their small weights. The normal vectors {b_i}_{i=1}^n are calculated from c using the same method as in GPCA [14, 15]. These vectors serve to initialize the following constrained nonlinear optimization, which differs from [14, 15] in the weight matrix:

min Σ_{j=1}^N W(x^j) ||x̃^j − x^j||²  subject to  Π_{i=1}^n (b_i^T x̃^j) = 0,  j = 1, ..., N.    (11)

By using Lagrange multipliers λ_j for each constraint, the above optimization problem is equivalent to minimizing the function

Σ_{j=1}^N ( W(x^j) ||x̃^j − x^j||² + λ_j Π_{i=1}^n (b_i^T x̃^j) ).    (12)

Taking partial derivatives w.r.t. x̃^j and equating them to 0, we can solve for λ_j/2 and W(x^j)||x̃^j − x^j||². By replacing them into the objective function (11), the simplified objective function on the normal vectors can be derived as

E_n(b_1, ..., b_n) = Σ_{j=1}^N W(x^j) (Π_{i=1}^n (b_i^T x^j))² / || Σ_{i=1}^n b_i Π_{l≠i} (b_l^T x^j) ||².    (13)

We found the convergence of equation (13) to be very slow. Hence, a weighted k-means iteration method was used to determine the optimized {b_i}_{i=1}^n. The weighted data points are assigned to the nearest subspace and the updated subspaces are estimated. This process is continued till there is no change in the subspaces. This method of optimizing the b_i's achieves the same performance as equation (13) but at a faster convergence rate.

We illustrate the improvement in subspace estimation using NN-GPCA on the synthetic data used in Figure 3. The initial estimation calculated from the weighted linear system and the final optimized estimation are shown as green lines in Figure 4 (a) and (b), respectively. Compared with GPCA, we note that the effect of the two subspaces due to outliers on the initial estimation is reduced. Subsequently, the optimization process results in the correct estimation of the two subspaces that satisfy the cluster constraint while ignoring the outlier subspaces.

Figure 4: Subspace estimation using NN-GPCA (a) initial estimate used to determine (b) final estimate after optimization (compare with Figure 3).

Figure 5 shows the result of subspace estimation on real data. Here, we transform hue information into the polar 2-D space, resulting in a noisy representation in the subspaces. However, the proposed NN-GPCA method can correctly estimate the three main subspaces (three red lines) corresponding to three image regions with different hue. All outliers not lying in a cluster are ignored because of their small weights. Notice that multiple objects with similar visual appearance at different locations will be mapped to a single subspace. Hence, the proposed approach can detect similar regions simultaneously and assign the same amount of attention to them using the attention measure introduced later.

Figure 5: NN-GPCA on Natural Image (a) original image; (b) estimation result of three subspaces

3.4 Computational Complexity

We derive the computational complexity of the NN-GPCA algorithm for an image that contains n subspaces and which is divided into N blocks of 8 × 8 pixels each, where N is much smaller than the total number of pixels.

Computation of weights
Estimation of the K nearest neighbors requires complexity O(N²) without any optimization. Since K is much smaller than N, the subsequent process of calculating weights using SVD on the K nearest neighbors is of much lower order and hence can be ignored. Therefore, the complexity of this step is O(N²).

Subspace estimation
This process consists of (i) estimating the number of subspaces n, for which the complexity is O(N) since n is a small number; (ii) solving for c_i, i = 1, 2, ..., N from a linear system of N linear equations in n + 1 unknowns; without any optimization, the complexity is O(n² · N) without noise and O(n · N²) in the presence of noise; (iii) solving for {b_i}_{i=1}^n — in the 2D case, it is only required to find the n roots of a polynomial of degree n in one unknown, whose complexity can be ignored; (iv) optimization using weighted k-means, which has complexity O(N · log(N)), lower than O(N²). So the total complexity is O(N + n·N² + N·log(N)) = O(n·N²), which is the same as the complexity of GPCA.

From the above analysis, we can see that the overall computational complexity of NN-GPCA is O(N² + n·N²) = O(n·N²), which again is the same as the complexity of GPCA. We conclude that our NN-GPCA extension improves the robustness of subspace estimation in the presence of outliers without increasing the computational complexity. Since dividing the image into small blocks makes the number of data points N much smaller than the number of pixels, and the number of subspaces n is also small, an algorithm with complexity O(n · N²) is quite efficient. When applied to multiple features, the complexity of NN-GPCA increases only linearly with the number of features.
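The neighbor-based weighting of equation (5) can be sketched as below. This is a simplified illustration: the paper obtains the two variances via PCA of the neighbors, which is approximated here by projecting them onto the subspace direction (origin to point) and its orthogonal; the function name is illustrative, not from the paper.

```python
import numpy as np

def knn_weights(points, k=15):
    """Weight each 2-D point by the spread of its k nearest neighbors,
    as in equation (5): W = 1 / (1 + S * R). Points inside a cluster lying
    along a subspace get a weight near 1; outliers get small weights."""
    pts = np.asarray(points, dtype=float)
    weights = np.empty(len(pts))
    for i, p in enumerate(pts):
        dist = np.linalg.norm(pts - p, axis=1)
        nn = pts[np.argsort(dist)[1:k + 1]]   # k nearest neighbors, excluding p
        u = p / np.linalg.norm(p)             # subspace direction: origin -> p
        v = np.array([-u[1], u[0]])           # orthogonal direction
        centered = nn - nn.mean(axis=0)
        svar = np.var(centered @ u)           # variance along the subspace
        nvar = np.var(centered @ v)           # variance across the subspace
        S = svar + nvar                       # characteristic variance
        R = nvar / (svar + 1e-12)             # bias towards in-subspace clusters
        weights[i] = 1.0 / (1.0 + S * R)
    return weights
```

Points lying in a collinear cluster with the origin get weight 1 (nvar = 0), while scattered outliers are down-weighted before the weighted least square estimation of equation (10).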


4. THE CUMULATIVE PROJECTION

Having described the NN-GPCA algorithm to estimate the subspaces of a transformed image, we now present a new attention measure to detect visual attention regions within the subspaces. Due to the polar transformation of image features, each region lies on a linear subspace, and hence a measure of feature contrast between two regions is simply |θ_1 − θ_2|, where θ_1 and θ_2 are the angles made by the subspaces. The projection of data points onto the normal to a subspace is an indication of how visually important the corresponding region is to the human visual system and serves as a good measure for region attention because it considers both feature contrast and the geometric properties of the region. We call this measure the Cumulative Projection (CP) and define it as

CP(b̃_j) = Σ_{i=1}^N |(x^i)^T b̃_j| / Σ_{l=1}^N ||x^l||    (14)

i.e., the sum of the projections of all normalized data onto the normal of a subspace. Besides feature contrast, the CP attention measure inherits two important properties related to the size and location of a region that make it consistent with the human visual system (HVS) in correctly detecting VARs. Firstly, an image consisting of a small object within a large background draws attention to the object first. Even if the differences in the feature values between object and background are not large, the CP attention value biases the attention to the smaller object in the foreground. This is because the projection of all data onto the normal of the subspace representing the small object will be larger than that onto the normal of the subspace representing the background. Secondly, CP also captures the dependence of attention on the location of regions within an image. Most often, attention is drawn to the center of an image first. The closer a region is to the center of the image, the higher its CP value will be.

We illustrate these properties through a synthetic example in Figure 6, which shows three images with (a) a small object on a large background, (b) a large object in the foreground and (c) two objects in the foreground but at different distances from the center of the image. As shown in Table 1, the smaller, brighter region in Figure 6 (a) has a larger CP attention value than the background. Similarly, attention is drawn to the edges in Figure 6 (b) first, which is reflected by the CP in Table 1. In Figure 6 (c), because the sizes of the subspaces representing the white region and the black region are the same and the contrasts between these regions and the background are identical, both would possibly capture visual attention to the same extent. However, a higher CP attention value is obtained for the white region by virtue of it being closer to the center of the image. So the white region will be focused on before the black region.

Figure 6: Images to illustrate effect of region size and location on Cumulative Projection

Table 1: Cumulative Projection values

  Image      Light Region   Dark Region
  Image (a)  0.5772         0.0005
  Image (b)  0.1549         0.4223
  Image (c)  0.7071         0.5528
Coming back to Figure 6 (b), one could argue that since the location of the darker edge region is farther from the center of the image, it should garner lower CP value than the lighter region. But in this case, the effect of the region size overrides the location property to generate a higher CP value, which is once again consistent with the HVS. Furthermore, if we were to imagine the lighter square region to shrink so that its size is the same as the size of the background region, attention would now be drawn to this region since it is closer to the center.
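The CP measure of equation (14) can be sketched as follows (points are the transformed 2-D data of section 2; the function name is illustrative, not from the paper):

```python
import numpy as np

def cumulative_projection(points, normal):
    """Cumulative Projection (equation (14)): the sum of absolute projections
    of all data points onto the unit normal of a subspace, normalized by the
    total norm of the data. A larger CP marks a more attentive region."""
    pts = np.asarray(points, dtype=float)
    b = np.asarray(normal, dtype=float)
    b = b / np.linalg.norm(b)   # unit normal of the subspace
    return float(np.abs(pts @ b).sum() / np.linalg.norm(pts, axis=1).sum())
```

With a large background on one subspace and a small object on another, the normal of the object's subspace collects the projections of the many background points, so the object's CP is the larger one, reproducing the size bias discussed above.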

The Cumulative Projection measures visual attention from a global perspective by considering inter-region contrasts as well as the size and location of the regions. In the case of multiple features, we do a similar analysis on each feature, so that whichever feature yields the highest CP value is automatically selected as the one finally used to extract VARs. In Figure 7 (a), the hue and intensity features of an image are shown. Their corresponding subspaces are shown in Figure 7 (b), in which the largest CP for the hue feature is 0.8139 and for the intensity feature is 0.5816 (both shown in green). The system automatically selects hue to detect visual attention. From the extracted attentive region, we can see that the detected region corresponds to the attentive object detected by the HVS.

Figure 7: Multiple feature analysis: (a) feature map, (b) subspace estimation and (c) extracted attention region using Top row: Hue feature and Bottom row: Intensity feature

5. EXPERIMENTAL RESULTS

We conduct several experiments to demonstrate various properties of the proposed VAR extraction mechanism. We illustrate specifically the robustness and scale invariance of the algorithm, and also compare the performance of the proposed algorithm with Itti’s model [10]. We also show the utility of the algorithm to automatically detect VAR when multiple features are used. Each data point consists of the

721

Hue

Intensity

Subspace Mask

Hue

Intensity

Subspace Mask

Subspace Mask

Subspace Mask

Intensity Win

Hue Win

(a)

(b)

Figure 8: Two examples of automatic feature selection mean of the feature over an 8 × 8 block in the image. Unlike the partitioning of the image into blocks in [11] for the purpose of measuring contrast among blocks, here this process is done only to accelerate the analysis and to reduce the effect of noise. Hence, the size of the block is not crucial. The value of K in the subspace estimation is chosen as 15 for fixed image size of (256 × 384). The features used are hue,intensity and the R,G,B channels.

Experiment 1 In the first experiment, we compare the proposed NN-GPCA subspace estimation method with GPCA on synthetic data. We randomly pick n = 2 one-dimensional subspaces of R^2, each containing N = 400 points. Zero-mean Gaussian noise with a standard deviation of 20 is added to the sample points. We consider six cases in which outliers are randomly added to the data, amounting to 0%, 2.5%, 5.0%, 7.5%, 10%, and 12.5% of the original number of data points. For each of the six cases, the NN-GPCA and GPCA algorithms are run 500 times. The error between the true unit normals {b_i}_{i=1}^n and their estimates {\tilde{b}_i}_{i=1}^n is computed for each run as [14, 15]

error = \frac{1}{n} \sum_{i=1}^{n} \cos^{-1}\left(b_i^T \tilde{b}_i\right)    (15)

Figure 9 plots the mean error over all 500 trials as a function of the number of outliers. When there are no outliers, the estimation errors of NN-GPCA and GPCA are identical; as the number of outliers increases, NN-GPCA outperforms GPCA.

Figure 9: Subspace estimation error v/s number of outliers for NN-GPCA (blue) and GPCA (red) for synthetic data.

Experiment 2 The second experiment evaluates the automatic feature selection capability of the proposed method. The largest CP over the subspaces formed from each feature is the cue for selecting the optimal feature for attention region extraction. In the examples shown in Figure 8, we use the five features mentioned at the beginning of this section, although only the intensity and hue cases are shown for brevity. In Figure 8 (a) the intensity feature is more suitable than hue for detecting the visual attention region, while in Figure 8 (b) it is vice versa. This automatic feature selection capability is similar to the method of using the convex hull of salient points in a contrast map [17]; however, instead of relying on heuristic knowledge as in that case, we rely on the computational measure of cumulative projection, which is more robust.

Experiment 3 The third experiment compares the attention detection results of the proposed method with the widely used visual attention model of [10]. Figure 10 (a) shows the saliency maps generated by [10], and (b) shows the bounding boxes indicating the visual attention regions extracted by segmenting the saliency maps. Here, the saliency map indicates only the foci of attention. We used methods of segmenting the


saliency map as described in [2, 3], but the results were unsatisfactory. Figure 10 (c) and (d) show, respectively, the candidate visual attention regions and the smallest bounding boxes that include these regions, extracted using the proposed method. Notice that the proposed method detects the visual attention region correctly.
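The per-run error of Eq. (15) used in Experiment 1 follows directly from its definition; a minimal NumPy sketch, assuming the true and estimated unit normals are stored as rows of two arrays:

```python
import numpy as np

def subspace_error(B_true, B_est):
    """Mean angle (radians) between true unit normals b_i and estimates
    b~_i: error = (1/n) * sum_i arccos(b_i^T b~_i), as in Eq. (15)."""
    dots = np.sum(B_true * B_est, axis=1)        # row-wise inner products b_i^T b~_i
    return float(np.mean(np.arccos(np.clip(dots, -1.0, 1.0))))

# Identical normals for two 1-D subspaces of R^2 give zero error.
B = np.array([[1.0, 0.0], [0.0, 1.0]])
print(subspace_error(B, B))  # 0.0
```

The clip guards against arguments drifting marginally outside [-1, 1] from floating-point noise; the formula assumes the estimated normals are sign-aligned with the true ones, as in the cited error measure.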

Figure 10: Comparison of the proposed subspace method with [10]: (a) saliency map; (b) VAR from (a); (c) attentive region using our proposed method; (d) VAR from (c)

More VAR detection results are shown in Figure 12, in which the first row contains the original images, the second row shows the candidate attention regions (white regions indicate the region corresponding to the subspace with the largest CP), and the last row shows the VAR represented by the smallest bounding box that includes all the elements belonging to the selected subspace. While these results are encouraging, we realize that the choice of features is important in that they should reflect the homogeneity of the region. Even if the subspaces are correctly estimated, the VAR extraction could fail if there is no correspondence between a subspace and a region. Choosing a suitable feature may not be trivial for some images, as shown in Figure 13, which gives the intensity, hue, and green-channel feature maps of two images together with their corresponding transformation graphs. In these cases, neither intensity nor color delineates the interesting region (object) as a subspace, so the attention region cannot be detected regardless of the performance of the subspace estimation.
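The "smallest bounding box" step amounts to taking the extremal row and column indices of the candidate-region mask. A minimal sketch, assuming the candidate region (the elements belonging to the selected subspace) is given as a 2-D boolean array:

```python
import numpy as np

def bounding_box(mask):
    """Smallest axis-aligned box (top, left, bottom, right), inclusive,
    containing all True elements of a 2-D boolean mask."""
    rows = np.flatnonzero(mask.any(axis=1))      # rows containing region elements
    cols = np.flatnonzero(mask.any(axis=0))      # columns containing region elements
    return tuple(int(v) for v in (rows[0], cols[0], rows[-1], cols[-1]))

# A 3 x 3 candidate region inside a 6 x 8 mask.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:6] = True
print(bounding_box(mask))  # (2, 3, 4, 5)
```

If the mask is defined over the grid of 8 × 8 blocks rather than pixels, the returned indices scale back to pixel coordinates by multiplying by the block size.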

Experiment 4 Models for VARs proposed in the literature calculate contrast among regions using a priori knowledge of a block size or pyramid level, which restricts them to detecting regions only within the prescribed scale. The proposed method achieves scale invariance because the CP attention value can identify regions of any size by considering both inter-region contrast and intra-region homogeneity. Figure 11 shows the visual attention regions extracted from images showing flowers at different scales. We compare our method with that proposed in [11] in Figure 11 (ii) and observe that the results are the same at the small scale, but as the scale increases, the performance of [11] degrades rapidly.

Figure 11: VAR extraction across different scales: (left) small, (center) medium and (right) large using (i) our proposed method and (ii) the method of [11]

Figure 12: More examples of visual attention region detection using the proposed method

Figure 13: Failed examples: (a, c) feature maps of the original images and (b, d) their transformation graphs

6. CONCLUSION & DISCUSSION

In this paper, we propose to solve the image attentive region detection problem using linear subspace estimation and analysis techniques. Through a simple polar transformation, each image feature value is transformed to a data point (θ, r) in a 2-D space. The value of θ ensures that all pixels within a homogeneous region are mapped to a linear subspace, while r encodes the location of the region within the image. We propose a new linear subspace estimation method, called NN-GPCA, in which the distribution of the K nearest neighbors is used to weight the data points according to their probability of belonging to a large cluster in some subspace corresponding to an image region; a weighted least squares technique is then used to estimate the subspaces. A new attention measure for image regions is proposed, defined as the cumulative projection of all data onto the normal of the corresponding subspace. The subspace with the largest CP is extracted to detect the attentive region. Several experiments are performed to evaluate the performance of the proposed method. The experimental results are promising, showing that the proposed method can detect VARs correctly. Future work will involve (i) choosing optimal features to be transformed into the subspace estimation domain, (ii) selecting a suitable transformation for the weights as the data is embedded from the low-dimensional space into the Veronese map, and (iii) developing an algorithm for estimating the optimal features (in the multiple-feature case) simultaneously rather than as a sequential process.
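The weighted least squares step summarized above can be illustrated for a single 1-D subspace: the normal minimizing the weighted sum of squared projections is the eigenvector of the weighted scatter matrix with the smallest eigenvalue. This is only the generic building block, not the full NN-GPCA procedure, and the weights below are placeholders for the K-nearest-neighbor-derived weights described in the paper.

```python
import numpy as np

def weighted_normal(X, w):
    """Unit normal b of the line through the origin minimizing
    sum_i w_i * (b^T x_i)^2: the eigenvector of the weighted scatter
    matrix S = sum_i w_i x_i x_i^T with the smallest eigenvalue."""
    S = (X * w[:, None]).T @ X                   # weighted scatter matrix
    vals, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
    return vecs[:, 0]                            # eigenvector for the smallest eigenvalue

# Points near the line y = x; the outlier is down-weighted, so the
# recovered normal is close to (1, -1) / sqrt(2).
X = np.array([[1.0, 1.0], [2.0, 2.1], [-1.0, -0.9], [5.0, -5.0]])
w = np.array([1.0, 1.0, 1.0, 0.01])              # small weight on the outlier
b = weighted_normal(X, w)
```

With uniform weights this reduces to ordinary total least squares; the point of the weighting, as in NN-GPCA, is that outliers with small weights barely perturb the scatter matrix and hence the estimated normal.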

7. REFERENCES

[1] U. Rutishauser, D. Walther, C. Koch, and P. Perona. Is bottom-up attention useful for object recognition? In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 37–44, Washington, DC, USA, July 2004.

[2] U. Rutishauser, D. Walther, C. Koch, and P. Perona. On the usefulness of attention for object recognition. In 2nd International Workshop on Attention and Performance in Computational Vision 2004, pages 96–103, Prague, Czech Republic, May 2004.

[3] D. Walther, U. Rutishauser, C. Koch, and P. Perona. Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Computer Vision and Image Understanding, pages 745–770, to be published 2005.

[4] A. Bamidele, F. W. Stentiford, and J. Morphett. An attention-based approach to content based image retrieval. British Telecommunications Advanced Research Technology Journal on Intelligent Spaces (Pervasive Computing), 22(3), July 2004.

[5] X.-J. Wang, W.-Y. Ma, and X. Li. Data-driven approach for bridging the cognitive gap in image retrieval. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, volume 3, pages 2231–2234, Taipei, Taiwan, June 2004.

[6] H. Liu, X. Xie, W.-Y. Ma, and H.-J. Zhang. Automatic browsing of large pictures on mobile devices. In Proceedings of the Eleventh ACM International Conference on Multimedia, pages 148–155, Berkeley, CA, USA, 2003. ACM Press.

[7] L. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and H. Zhou. A visual attention model for adapting images on small displays. ACM Multimedia Systems Journal, 9(4):353–364, November 2003.

[8] Y. Hu, L.-T. Chia, and D. Rajan. Region-of-interest based image resolution adaptation for MPEG-21 digital item. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 340–343, New York, NY, USA, 2004. ACM Press.

[9] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs. Automatic thumbnail cropping and its effectiveness. In Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology, pages 95–104, New York, NY, USA, November 2003. ACM Press.

[10] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, November 1998.

[11] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. In Proceedings of the Eleventh ACM International Conference on Multimedia, pages 374–381, New York, NY, USA, November 2003. ACM Press.

[12] L. Itti and C. Koch. A comparison of feature combination strategies for saliency-based visual attention systems. In Proceedings of SPIE Human Vision and Electronic Imaging IV (HVEI'99), volume 3644, pages 473–482, San Jose, CA, January 1999.

[13] A. P. Bradley and F. W. Stentiford. Visual attention for region of interest coding in JPEG 2000. Journal of Visual Communication and Image Representation, 14(3):232–250, September 2003.

[14] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 621–628, Madison, Wisconsin, USA, June 2003.

[15] R. Vidal. Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation. PhD thesis, School of Electrical Engineering and Computer Sciences, University of California at Berkeley, August 2003.

[16] Q. Ke and T. Kanade. Robust subspace clustering by combined use of kNND metric and SVD algorithm. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 592–599, Washington, DC, USA, July 2004.

[17] Y. Hu, X. Xie, W.-Y. Ma, L.-T. Chia, and D. Rajan. Salient region detection using weighted feature maps based on the human visual attention model. In Proceedings of the Fifth IEEE Pacific-Rim Conference on Multimedia, volume 2, pages 993–1000, Tokyo Waterfront City, Japan, November 2004.
