Scale Adaptive Visual Attention Detection by Subspace Analysis
Yiqun Hu, Deepu Rajan and Liang-Tien Chia
School of Computer Engineering Nanyang Technological University, Singapore, 639798
{y030070, asdrajan, asltchia}@ntu.edu.sg
ABSTRACT
We describe a method to extract visual attention regions in images by robust subspace analysis of a simple feature, intensity, endowed with scale adaptivity in order to represent textured areas in an image. The scale-adaptive descriptor is mapped onto clusters in linear subspaces. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed to estimate multiple linear subspaces. The visual attention of each region is calculated using a new region attention measure that considers feature contrast and spatial geometric properties. Compared with existing visual attention detection methods, the proposed method directly measures global visual attention at the region level rather than the pixel level.

Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis

General Terms
Algorithms

Keywords
visual attention, scale selection, GPCA

1. INTRODUCTION
Visual attention detection in images is useful in several applications such as surveillance and image retrieval. Features that exhibit sufficient local contrast enable the detection of visual attention regions (VARs) in images in which the VARs are easily identifiable. For instance, a tiger in a dense forest can be easily identified as a VAR using current visual attention detection methods [3]. A more challenging scenario is one in which the VAR is not easily distinguishable from the background, as in highly textured images; for example, a zebra in the woods. To solve this problem, we propose that the features be endowed with the ability to automatically ascertain the scale at which they should be observed in order to be useful for VAR detection. Texture features such as Gabor features [5] reside in high-dimensional feature spaces, and their extraction requires a priori information, such as the directions for Gabor transforms or the distances in the case of co-occurrence matrices [1]. Moreover, the problem we address in this paper is not texture segmentation but the detection of salient regions as determined by the human visual system. We have previously proposed a framework [2] to detect attention region(s) by mapping the color information of image patches onto a collection of linear subspaces, so that each subspace corresponds to region(s) of homogeneous color. The nearest neighbor GPCA (NN-GPCA) was proposed to robustly extract those subspaces containing cluster(s) corresponding to VARs, and the cumulative projection was used as a region-based attention measure. In this paper, we extend this framework to detect visual attention regions using only intensity. The average intensity of scale-adaptive patches is mapped onto linear subspaces (Sec. 2). An iterative refinement scheme is applied to NN-GPCA to remove the estimation bias when detecting linear subspaces with cluster(s) (Sec. 3). We also explicitly consider a cluster factor in the region attention measure to remove spurious noise (Sec. 4). Experiments on both synthetic and real data illustrate the promising results of the proposed method (Sec. 5).

2. SCALE-SENTIENT REPRESENTATION
In [2], the simple polar transformation was proposed to transfer image color information into a collection of linear subspaces so that each subspace corresponds to a region with homogeneous color. Homogeneity of image regions is not only in color but also in texture. Here, we introduce the scale-sentient mean feature to map texture regularity and proximity into linearity and clustering property of subspace.
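As a toy illustration of how a mapping of this kind can turn homogeneity into linearity (this is a hypothetical embedding for intuition only, not the exact polar transformation of [2]), consider sending each patch to a 2-D point whose direction is fixed by its mean intensity:

```python
import numpy as np

# Hypothetical toy embedding (not the exact polar transformation of [2]):
# map each patch to a 2-D point whose direction depends only on its mean
# intensity and whose radius encodes an unrelated quantity (here, a value
# standing in for patch position). Patches sharing an intensity then lie
# on one line through the origin, i.e. a 1-D linear subspace.
def embed(mean_intensity, radius, max_intensity=255.0):
    theta = np.pi * mean_intensity / max_intensity
    return radius * np.array([np.cos(theta), np.sin(theta)])

a = embed(100.0, 1.0)
b = embed(100.0, 3.5)   # same intensity, different radius
c = embed(200.0, 2.0)   # different intensity

def collinear(u, v, tol=1e-9):
    # 2-D cross product vanishes exactly when u and v share a direction
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

print(collinear(a, b), collinear(a, c))
```

Under this embedding, subspace clustering separates intensity-homogeneous regions exactly as the subspace analysis below requires.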
2.1 Adaptive Scale Selection
Patches of regular texture display homogeneity when they are selected so as to correspond to a "texton". The scale of these patches can differ, especially for near-regular texture, so feature extraction must be scale-sentient. We use the automatic scale selection method of [4] to locate the patches and determine their characteristic scales. The Gaussian scale space of an image is generated by convolving the image with a set of Gaussian kernels of increasing variance. The determinant, ||H_norm L||, of the normalized Hessian matrix is used to select the locations where ||H_norm L|| is a local maximum in both the spatial and scale domains. The characteristic scale of a patch is the scale at which this local maximum is attained. Figure 1 illustrates scale selection on a synthetic image consisting of nine blobs of four different radii with additive white Gaussian noise of variance 0.1. We compute ||H_norm L|| at every location and every scale (the scales at which the maximum is attained are shown for two interest points). The local maxima of these values in the scale-space volume are ranked, and the top 9 values indicate the locations of interest points in the image; the scales associated with those 9 values reflect the characteristic sizes of the blobs. The interest points are represented in 3-D (x-y-scale) space and a ball is drawn at each local maximum with a radius proportional to the scale.

MM'07, September 23-28, 2007, Augsburg, Bavaria, Germany. Copyright 2007 ACM 978-1-59593-701-8/07/0009.

Figure 1: Illustration of local scale estimation on a synthetic image (Gaussian scale space shown at scales s = 2 and s = 8).
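The scale selection step can be sketched as follows (a minimal sketch in the spirit of [4]; the helper names and the scale sampling are our own, not from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

# Sketch of automatic scale selection: compute the scale-normalized
# determinant of the Hessian over a Gaussian scale space and keep the
# strongest maxima in both the spatial and scale domains.
def det_hessian_stack(image, sigmas):
    image = image.astype(float)
    layers = []
    for s in sigmas:
        Lxx = gaussian_filter(image, s, order=(0, 2))
        Lyy = gaussian_filter(image, s, order=(2, 0))
        Lxy = gaussian_filter(image, s, order=(1, 1))
        # each second derivative is normalized by s**2, so det(H) by s**4
        layers.append((s ** 4) * (Lxx * Lyy - Lxy ** 2))
    return np.stack(layers)                      # shape: (n_scales, H, W)

def interest_points(image, sigmas, n_points):
    D = det_hessian_stack(image, sigmas)
    # local maxima over the 3x3x3 (scale, y, x) neighbourhood
    local_max = (D == maximum_filter(D, size=3)) & (D > 0)
    s_idx, ys, xs = np.nonzero(local_max)
    top = np.argsort(D[s_idx, ys, xs])[::-1][:n_points]
    return [(int(xs[i]), int(ys[i]), sigmas[s_idx[i]]) for i in top]
```

For a single Gaussian blob of standard deviation t, the strongest response is expected at the blob centre with a characteristic scale near t, matching the blob-radius behaviour described above.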
2.2 Scale-adaptive Feature Extraction
Returning to the extraction of scale-sentient features, we carry out the above scale-space analysis only on the intensity channel of an image. This yields the interest points and their associated scales. The feature used for further analysis in the VAR extraction algorithm is then simply the mean of the intensity over a square window that is centered at the interest point and whose size is determined by the scale. At locations where no interest point was detected at any scale, the mean is computed over an 8 × 8 neighborhood. This implies that the windows can overlap. The mean is an attractive feature because of its simplicity and the adaptive scale at which it is computed; the adaptivity ensures that the texture regularity of image regions is captured in the feature. The polar transformation converts such descriptors into multiple linear subspaces.

3. UNBIASED SUBSPACE ESTIMATION

3.1 Review of NN-GPCA
NN-GPCA is an extension of the original GPCA [6], a geometric approach to estimating multiple linear subspaces with clusters without segmentation. GPCA treats the estimation of multiple linear subspaces as solving for c from the linear system formed by the Veronese map [6],

p_n(x) = v_n(x)^T c = Σ c_{n_1...n_K} x_1^{n_1} · · · x_K^{n_K} = 0,   (1)

and then solving for the normal vector b of every subspace from c. NN-GPCA improves robustness by considering the clustering constraint of the subspaces: a weight matrix W, calculated from a k-nearest-neighbor analysis, is utilized in the weighted least squares estimation of c from W · L_n · c = 0. This reduces the effect of outliers that do not form a cluster in the linear subspace(s). Although NN-GPCA improves the robustness of GPCA by reducing the effect of outliers, the problem of estimation bias due to subspace skew remains. Subspace skew is the phenomenon by which one subspace is dominant (in terms of the number of data points in it) compared to the others. Since the objective function in the optimization problem of both GPCA and NN-GPCA is the sum of approximation errors, for a fixed number of subspaces the estimate is biased towards the more populated subspaces. Fig 2 (a)-(b) show an example of this problem, in which the estimates are biased towards the subspace with more data points.

3.2 Iterative Refinement of c
In both GPCA and NN-GPCA, multiple linear subspaces are transformed into a single linear model in a high-dimensional embedding space via the Veronese map. The least squares estimate in the high-dimensional space minimizes only the sum of squared errors in the embedding space and does not necessarily minimize the error in the low-dimensional space. Hence, we propose an iterative refinement algorithm that adjusts the weight of each data point to solve the dominant-bias problem of NN-GPCA. We begin the iteration with the coefficient vector c obtained from the weighted least squares approximation as in NN-GPCA. The error of each data point x_i is then calculated according to

e_i = W(x_i) × |v_n(x_i)^T c_0| / ||v_n(x_i)||   (2)

where W(x_i) is the weight of x_i, c_0 is the initial estimate of c, |·| is the absolute value of the projection of v_n(x_i) onto
Figure 2: Subspace skew in the subspace estimation of GPCA and the improvement of the proposed method. (a) Synthetic data; (b) initial and (c) final estimate of GPCA after optimization; (d) estimate after the 3rd iteration of the proposed refinement and (e) final estimate after convergence.
c_0, and ||·|| is the length of the vector v_n(x_i). The weight of each data point is then updated as

W^{p+1}(x_i) = W^p(x_i) + e_i   (3)

where p is the iteration number. The new value of c is obtained from the weighted least squares solution of W · L_n · c = 0. During the initial iterations, the weights of data points in the smaller subspace change more than those in the dominant subspace; as the iteration proceeds, the weights in the smaller subspace change less as the error decreases. The cumulative effect of the weight changes over all subspaces is reflected in the sum and variance of the errors. Hence the sum (s_e) and variance (v_e) of the errors serve as the stopping criterion, rather than the errors of individual data points: the iteration stops when |s_e^{p+1} − s_e^p| < ε and |v_e^{p+1} − v_e^p| < ε′. The proposed refinement process iteratively equalizes the importance of the inliers in the embedding space by adjusting their weights according to the residuals of the current estimate. Notice that since the initial weights of the outliers, calculated from the k nearest neighbors, are small, they are not increased significantly during refinement; thus only the inliers are effectively considered in this process. It was found through experiments that the iterative refinement of c converges in only about 5 to 10 iterations. Applying the proposed refinement to the synthetic data of Fig 2 (a), we see from Fig 2 (d)-(e) that both subspaces can be accurately estimated.

4. SUBSPACE-BASED ATTENTION

In [2], we proposed the Cumulative Projection (CP) to measure the attention of each subspace. The CP of a subspace is the sum of the projections of all inliers onto its normal. This attention measure considers not only feature contrast but also spatial geometric properties (e.g., size and location) of the corresponding region. However, some spurious noise regions may achieve a high CP value because of their small size. In this section, we explicitly integrate the clustering constraint with CP to reduce the effect of noise through a measure called cluster compactness.

Cluster Growing
Due to the clustering constraint, points that do not lie in any cluster are noise and should be removed before calculating the cluster compactness. We present a cluster growing algorithm to remove this noise. We sort the points by weight and pick the top m points to form an initial set of inliers. In the case of multiple clusters within a subspace, these m points naturally come from each of the clusters, which implies that the initial set contains points drawn from all regions falling on that subspace. A normalized average minimum distance d̄ is defined as d̄ = D/W, where D is the sum, over all inliers, of the minimum distance to a neighboring inlier, and W is the sum of the weights of all inliers. Given a data point x_i, we calculate its distance d(x_i) to the nearest inlier. The data point x_i is included as an inlier only when

d(x_i) / W(x_i) ≤ u × d̄   (4)

Here, u is a constant that extends the threshold beyond the average minimum distance; from experiments, it was observed that the choice of u is not critical. Normalizing by the weight of each data point increases the discriminatory power across different images, since a small (large) weight associated with noise (an inlier) makes the left-hand side of equation (4) even larger (smaller). When all data points have been considered, the algorithm returns the set of inliers that form clusters in each estimated subspace.

Cluster Compactness
We now define the cluster compactness of a subspace with normal b_j as

CC(b_j) = d̄^{cl} / d^{cl}_{b_j}   (5)

where d^{cl}_{b_j} is the mean of the kNND (k-nearest-neighbor distance) of all inliers of the subspace with normal b_j, and d̄^{cl} is the mean of d^{cl}_{b_j} over all subspaces. CC(b_j) measures the compactness of the cluster(s) in one subspace compared to the other subspaces. This measure is used in conjunction with the Cumulative Projection to assign attention measures to regions in the image as Attention = CC(b_j) × CP(b_j), where CP(b_j) is the projection of the data points onto the normal b_j of the subspace, referred to as the cumulative projection earlier.
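The attention measure of this section can be sketched as follows (a minimal sketch; the helper names and the choice k = 3 are our own assumptions, not from the paper):

```python
import numpy as np

# Sketch of the subspace attention measure: Cluster Compactness (CC) from
# mean k-nearest-neighbour distances (kNND) of each subspace's inliers,
# combined with the Cumulative Projection (CP) of the data onto the
# subspace normal, giving Attention = CC * CP as in Sec. 4.
def mean_knnd(points, k=3):
    # mean distance to the k nearest neighbours, averaged over all points
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)                       # column 0 is the zero self-distance
    return float(d[:, 1:k + 1].mean())

def attention_scores(all_points, inlier_sets, normals, k=3):
    d_cl = [mean_knnd(p, k) for p in inlier_sets]     # d^cl_{b_j} per subspace
    d_bar = float(np.mean(d_cl))                      # mean over all subspaces
    scores = []
    for n, d_j in zip(normals, d_cl):
        cp = float(np.abs(all_points @ n).sum())      # cumulative projection CP(b_j)
        cc = d_bar / d_j                              # cluster compactness, Eq. (5)
        scores.append(cc * cp)
    return scores
```

With identical CP, a subspace whose inliers form a tighter cluster receives the larger attention score, which is exactly how the CC factor suppresses spurious, loosely scattered noise regions.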
5. EXPERIMENTAL EVALUATION
To evaluate the performance of the proposed method, we conduct experiments on both synthetic data and real images. On synthetic data, the proposed refinement process is shown to remove GPCA's bias under subspace skew; on real images, the proposed framework achieves promising results for detecting attention regions, especially in the challenging scenario of textured regions.
5.1 Synthetic Data
Figure 4: Examples of VAR detection using the proposed method. Top row: Original images, Middle row: patches whose corresponding subspace ranked high, Bottom row: Detected VARs.
To evaluate the robustness to subspace skew, we generate n = 2 subspaces with very different populations and add Gaussian noise with variance 10 to the sample points. We test the algorithm for different ratios of the number of data points in one subspace to that in the other, starting from a ratio of 1:1 up to 10:1. The estimation was run 100 times for each ratio and the error was calculated as error = (1/n) Σ_{i=1}^{n} cos^{-1}(b_i^T b̃_i). Fig 3 shows that GPCA and NN-GPCA achieve the same performance for small ratios, but as the ratio increases the NN-GPCA algorithm far outperforms the GPCA algorithm.
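The error measure above can be sketched directly (a minimal sketch; we additionally take the absolute value of the dot product, a small assumption so that the sign of a normal does not matter):

```python
import numpy as np

# Mean angle between true and estimated subspace normals:
# error = (1/n) * sum_i arccos(|b_i^T b~_i|), with unit-normalized normals.
def subspace_error(true_normals, est_normals):
    angles = []
    for b, b_est in zip(true_normals, est_normals):
        b = np.asarray(b, float) / np.linalg.norm(b)
        b_est = np.asarray(b_est, float) / np.linalg.norm(b_est)
        angles.append(np.arccos(np.clip(abs(b @ b_est), 0.0, 1.0)))
    return float(np.mean(angles))
```

Identical normals give an error of 0, orthogonal normals give π/2, so the measure directly reports the average misalignment of the estimated subspaces in radians.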
Figure 3: Estimation error vs. ratio of subspace skew for NN-GPCA without and with the refinement of c.

5.2 Real Images
We also test the performance of the proposed framework for detecting VARs on real images. Three examples are shown in Fig 4. The top row shows the original images. The patches whose polar representations belong to the most salient detected attention subspace are marked as bright pixels in the middle row, and the corresponding bounding boxes are shown in the bottom row. The results show that the VARs have been successfully detected.

6. CONCLUSION
In this paper, we enhance the proposed NN-GPCA algorithm with iterative solution refinement to address scale adaptivity and the bias due to subspace skew. Using this enhanced version of NN-GPCA, we detect the attention subspace corresponding to salient regions from only the polar transformation of the intensity of scale-adaptive patches. The clustering constraint is integrated into the subspace-based attention measure. The experiments show that the proposed method can estimate skewed subspaces with high accuracy and that the proposed framework can detect VARs in images without distinct visual contrast.

7. REFERENCES
[1] A. Baraldi and F. Parmiggiani. An investigation of the textural characteristics associated with gray level cooccurrence matrix statistical parameters. IEEE Transactions on Geoscience and Remote Sensing, 33(2):293-304, Mar 1995.
[2] Y. Hu, D. Rajan, and L.-T. Chia. Robust subspace analysis for detecting visual attention regions in images. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 716-724, New York, NY, USA, 2005. ACM Press.
[3] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259, Nov 1998.
[4] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):79-116, Nov 1998.
[5] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842, Aug 1996.
[6] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1945-1959, Dec 2005.