IDB-MMS workshop
Department of WINS
Department of Computer Science and Logic
University of Amsterdam
Kruislaan 403
1098 SJ Amsterdam
The Netherlands
Dear Conference Secretary,
Enclosed is the final version of our paper, "Content Based Image Retrieval: Optimal Keys, Texture, Projections, or Templates," for the book pertaining to the First International Workshop on Image Databases and Multi-media Search.
Sincerely,
Michael S. Lew
Department of Computer Science
Leiden University
Postbus 9512, 2300 RA Leiden
The Netherlands
Email: [email protected]
Telephone: 071-527-7034
Content Based Image Retrieval: Optimal Keys, Texture, Projections, or Templates

Michael S. Lew
D.P. (Nies) Huijsmans
Department of Computer Science, Leiden University
Postbus 9512, 2300 RA Leiden, The Netherlands
{mlew, huijsman}@wi.leidenuniv.nl

Dee Denteneer
Philips Research Laboratory
Prof. Holstlaan 4, 5656 AA Eindhoven
[email protected]
Topics: content based image retrieval, Fisher's linear discriminant, Karhunen-Loeve, LBP, texture, optimal keys, template, projections
Abstract. Content based retrieval methods face two significant problems: (1) accuracy: most current content based image retrieval methods have been neither quantitatively compared nor benchmarked with respect to accuracy; and (2) efficiency: image database search methods must be analyzed for their computational efficiency and interrelationships. We assert that the accuracy problem stems from the generality of the applications involved: in current systems the goal of the user is not clear, which makes it difficult to create ground truth. In this paper, we quantitatively compare and evaluate four fundamentally different methods for image copy location, namely optimal keys, texture, projection, and template methods, on a large portrait database. We discuss important theoretical interrelationships, computational efficiency, and accuracy in experiments with real noise.
1 Introduction
There has been vigorous recent interest in searching large image databases. For an overview, see Gudivada and Raghavan [3]. As libraries, museums, etc. become digitized, we will need tools to search through terabyte databases for semantically meaningful information. However, most current methods have not been quantitatively compared nor benchmarked.

In content based image retrieval, there are several different problem spaces: locating (1) exact copies; (2) copies with artificial noise; (3) copies with real noise; and (4) finding similar images, which we refer to as image association. In problem spaces (1), (2), and (3), we can define the ground truth because the goal is clear. In problem space (4), it is difficult to define the ground truth because different users may have different preferences for what a "similar image" is. Thus, systems such as QBIC [1] have not been quantitatively compared with respect to performance accuracy.

There is a wide variety of sources of real noise. We examined print-scanner noise and general image degradation. In print-scanner noise, the image is printed to plain paper and then scanned in; this test is indicative of the performance of an algorithm for finding copies of scanned-in newspaper or magazine pictures. For the general image degradation, the copies of the original image have been subjected to decades of different handling and treatment by their owners.

In this paper, we compare optimal key, texture, projection, and template methods for the purpose of noisy image copy location (problem spaces (2) and (3)) in the Leiden 19th century portrait database (LCPD). Section 2 describes the image database and ground truth. Section 3 discusses the relationship between optimal keys, the Karhunen-Loeve transform (KLT), and Fisher's linear discriminant. Section 4 describes the implementations of the different methods, and in Section 5 we test the methods on the full noise database. Conclusions are given in Section 6.
2 Image Database and Ground Truth
The LCPD (see Figure 1) is currently composed of 5570 images taken during the 19th century, and it will be continually expanded until at least 50,000 images are in the database. Some images are copies of each other; however, due to different storage conditions, the copies exhibit varying kinds and differing amounts of degradation, ranging from light and moisture damage to scratches and writing on the images.
Figure 1. Sample images from the Leiden 19th Century Portrait Database.
3 Optimal Keys, KLT, and Fisher's Approach
Both the Karhunen-Loeve transform (KLT or Hotelling transform) and Fisher’s linear discriminant method are well documented in the pattern recognition and statistics literature [2,4,5]. The KLT determines the optimal (in a truncation sense) linear features for describing a data set. Fisher’s approach focuses on determining the optimal linear discriminant features for classification. Since Fisher’s approach minimizes the classification error, it seems to be the appropriate measure. However, it is often difficult to accurately model real world noise, which is a requirement for optimality of Fisher’s method. In the rest of this section we derive the theoretical mathematical relationship between optimal keys, KLT, and Fisher’s linear discriminant.
With respect to large image databases, it is also of significant importance to develop computationally efficient algorithms. Instead of comparing the raw images, we could compare keys: functions or algorithms which map images onto one or more feature values. Thus, instead of computing

d_i^2 = || X − Ξ_i ||^2 = Σ_{k,l} ( x^{(kl)} − ξ_i^{(kl)} )^2        (1)
for i = 1,...,N, we need only compute k(X) and compare it with k(Ξ_i), i = 1,...,N. Here X denotes an input image, Ξ_i, i = 1,...,N, denote the images in the database, and k denotes the key. Moreover, efficient data structures and algorithms are readily available for comparing these key values. This approach of using keys to index the database may also be combined hierarchically with a more sophisticated matching algorithm: a small subset of the database is selected very efficiently using the key values, and this subset is then searched using another key index.

From a mathematical-statistical viewpoint, there are two stochastic processes of relevance. The first is denoted the image process, and the exact images within the database are considered to be realizations of this process. The second is called the noise process. Input images are assumed to result from a combination of the image process and the noise process. Formally, denote the probability distributions of the image and noise processes by F_Ξ and F_E, respectively. Then the optimal key is the k such that

Pr{ |k(Ξ+E) − k(Ξ)| < |k(Ξ+E) − k(Φ)| }        (2)

is maximized, where Ξ and Φ are distributed according to F_Ξ, E is distributed according to F_E, and Ξ, Φ, and E are independent. Thus equation (2) states that an optimal key maximizes the probability that the key computed from a corrupted image is closer to the key computed from its original than to the key computed from another image. In short, the optimal key criterion in (2) maximizes the probability of finding the correct image.

In practice, however, it is difficult to model F_Ξ and F_E, so that direct optimization of equation (2) is not realistic. We therefore simplify equation (2) by restricting the space of keys to linear keys, for which k(Ξ+E) = k(Ξ) + k(E). This means that k(E) must be small and that k(Ξ) − k(Φ) must be large. Furthermore, we measure the magnitude of a stochastic variable through the magnitude of its variance. Assuming that F_Ξ has covariance matrix Σ_B and that F_E has covariance matrix Σ_W, the requirement that k(E) be small and k(Ξ) − k(Φ) be large amounts to

maximize  ( k^t Σ_B k ) / ( k^t Σ_W k )        (3)

or, equivalently,

maximize  k^t Σ_B k        (4)
subject to  k^t Σ_W k = 1        (5)

Criterion (3) is the criterion used in Fisher's linear discriminant. Furthermore, if we ignore the noise process, then equations (4) and (5) become

maximize  k^t Σ_B k        (6)
subject to  k^t k = 1        (7)

which is the Karhunen-Loeve transform.
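To make the derivation concrete, here is a minimal sketch of how such linear keys could be computed from the two covariance matrices, assuming NumPy/SciPy and vectorized images. The function names and the way Σ_B and Σ_W are estimated are our own illustrative choices, not part of the original method description: Fisher keys come from the generalized eigenvalue problem implied by criterion (3), and dropping Σ_W recovers the KLT of equations (6) and (7).

```python
import numpy as np
from scipy.linalg import eigh

def fisher_keys(sigma_b, sigma_w, n_features):
    """Linear keys maximizing k^t Sigma_B k / k^t Sigma_W k, i.e. criterion (3)."""
    # Generalized symmetric eigenproblem Sigma_B k = lambda Sigma_W k.
    # (In practice a small ridge, sigma_w + eps*I, may be needed if Sigma_W is singular.)
    vals, vecs = eigh(sigma_b, sigma_w)
    order = np.argsort(vals)[::-1]        # largest Fisher ratios first
    return vecs[:, order[:n_features]]    # columns are the key vectors

def klt_keys(sigma_b, n_features):
    """With the noise process ignored (equations (6)-(7)): the Karhunen-Loeve transform."""
    vals, vecs = eigh(sigma_b)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_features]]

# Hypothetical usage: `originals` is an (N, d) array of vectorized database images,
# `diffs` is an (M, d) array of (degraded copy - original) differences used to
# estimate the noise covariance.
# sigma_b = np.cov(originals, rowvar=False)
# sigma_w = np.cov(diffs, rowvar=False)
# K = fisher_keys(sigma_b, sigma_w, n_features=25)
# keys = originals @ K   # k(Xi) for every database image
```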
4 Description of Methods
In the previous section, we discussed the theoretical interrelationships between optimal keys, the KLT, and Fisher's linear discriminant. Optimal keys were defined as functions on the images which maximize the probability of finding the correct copy of an image. When the noise can be accurately modeled, the optimal key method corresponds to Fisher's linear discriminant; when the noise is ignored, it reduces to the Karhunen-Loeve transform. Thus, in the experiments, we use Fisher's linear discriminant for the print-scanner problem and the KLT for the general image degradation.

There are a vast number of texture methods in the research literature [6]. In a recent comprehensive survey [6], the method of local binary patterns (LBP) [7] had the lowest error rate for the first image set, and on this basis we chose the LBP as our representative texture method. In the surveyed LBP method [6], a texture unit is a two-level 3x3 pattern in which the threshold is the center pixel, giving 2^8 = 256 possible texture units. The texture spectrum is the distribution of the 256 patterns. In our copy location problem space, we calculate the texture spectrum for each image and then rank by feature vector distance, as sketched below.

Projections have been used for many different classification problems, and most recently have been applied to image copy location [8]. The projection method refers to comparing the horizontal and vertical projections of the images (the row and column sums), whereas the template method (image differencing) refers to comparing the images pixel by pixel based on spatial location.

The ranking error is defined for our test as the average rank at which a copy of the input image was found in the database. Thus, the best possible score is 1.0, which would mean that the copy in the database was always the best matching image. For the template and projection methods, both intensity and gradient were used as features. For all methods, the metric was the average absolute difference.
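As an illustration of the texture representation described above, the following sketch computes the 256-bin texture spectrum by thresholding every 3x3 neighbourhood at its center pixel. This is a minimal NumPy reading of the surveyed LBP method; the particular bit ordering of the eight neighbours is our own assumption and is not specified here.

```python
import numpy as np

def texture_spectrum(image):
    """256-bin LBP texture spectrum: each 3x3 pattern thresholded at its center pixel."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbours of every interior pixel, visited in a fixed (assumed) order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    # Texture spectrum: distribution of the 2^8 = 256 texture units, normalized to sum to 1.
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Ranking then amounts to comparing these 256-dimensional spectra with the average absolute difference, as for the other feature vectors.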
5 Experiments: KLT, Projections, and Template
In this section, we perform two experiments to compare the ranking error of three different methods: KLT/Fisher, projections, and template (the LBP texture method is included in the tables for comparison). The experiments are intended to duplicate real noise applications. In the print-scanner experiment, the original image is printed to plain paper using a laser printer and then scanned in. An example of the noise is shown in Figure 2.
Figure 2. Image 140 and print-scanner degraded copy
In the general image degradation experiments, the copies of the original were subjected to decades of general noise which includes scratches, writing, environmental exposure, etc. An example of the copies is shown in Figure 3. Note that these copies have different contrast and markings.
Figure 3. Two copies (c000409 and c000412) subjected to general noise.
5.1 Print-Scanner Noise

In this experiment, 25 images were printed to plain paper and then scanned in. Fisher's method was used instead of the KLT since the print-scanner noise can be statistically modeled. Table 1 and Figure 4 show the distribution of the rank of the print-scanner experiments. Note that, for the sake of visual clarity, we only graphed the top three methods. The column labeled Best gives the number of times the copy in the database was in the first rank. The other columns show the distribution as the copy in the database was found in positions 2 - 6, 7 - 16, and 17 - 36. The column labeled Worst gives the number of times the database copy was not among the first 36.

Table 1. Distribution of Position of Print-Scanner Degraded Image

Method                    Best   Top 5   Top 10   Top 20   Worst
Template - intensity         1       1        1        0      22
Template - gradient         17       2        0        0       6
Projection - intensity       1       1        1        1      21
Projection - gradient       17       2        1        1       4
LBP                         17       3        1        1       3
Fisher - 10 features         8       8        4        3       2
Fisher - 25 features        22       1        0        0       2
Figure 4. Chart of the rank distribution for the print-scanner degraded image (projection - gradient, LBP, and Fisher - 25 features).

In this experiment, the Fisher method with 25 features had higher performance relative to the other methods.

5.2 General Image Degradation

Since the image degradation is general, the noise could not be modeled accurately. Thus the KLT was chosen as the optimal key method for this experiment. Table 2 and Figure 5 show the distribution of the rank error for the general degradation. The projection and template methods using intensity as the feature class performed the worst for our database, with average ranking errors of 14.33 and 14.42, respectively. When the gradient images were used for the projection and template methods, the average ranking errors decreased to 5.92 and 5.5, respectively. The KLT with 10 features had the least average ranking error of 4.33.

Table 2. Distribution of Position of General Degraded Image

Method                    Best   Top 5   Top 10   Top 20   Worst
Template - intensity        12       0        2        0       9
Template - gradient          8       3        5        0       7
Projection - intensity      12       0        1        0      10
Projection - gradient       14       1        2        0       6
LBP                          6       5        1        3       8
KLT - 5 features             8       3        0        4       8
KLT - 10 features            8       3        2        5       5
Figure 5. Chart of the rank distribution for the general noise degradation (projection - gradient, LBP, and KLT - 10 features).

Regarding computational efficiency, Table 3 shows the number of features used by each method.

Table 3. Number of Features Used by Each Method

Method                    Number of Features
Template - intensity                   47250
Template - gradient                    47250
Projection - intensity                   440
Projection - gradient                    440
LBP                                      256
KLT - 5 features                           5
KLT - 10 features                         10
Fisher - 10 features                      10
Fisher - 25 features                      25
Overall, the methods based on optimal key selection, the KLT and Fisher's approach, not only had greater accuracy than the other methods, but also required no more than 10% of the number of features used by the LBP or projection methods.
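To make the evaluation procedure concrete, the sketch below shows how a ranking experiment of this kind can be run for any feature extractor: each database image and each degraded query is reduced to a feature vector, the database is ranked by average absolute difference, and the rank at which the true original appears is averaged over the queries. The projection features (row and column sums) serve as the example extractor; the function names and the assumption that all images share the same dimensions are ours, not the paper's.

```python
import numpy as np

def projection_features(image):
    # Horizontal and vertical projections: row sums and column sums.
    img = np.asarray(image, dtype=np.float64)
    return np.concatenate([img.sum(axis=1), img.sum(axis=0)])

def average_ranking_error(db_images, queries, true_index, extract=projection_features):
    """Average rank at which each query's original copy is found (1.0 is ideal)."""
    db_feats = np.stack([extract(im) for im in db_images])   # (N, d) database keys
    ranks = []
    for query, target in zip(queries, true_index):
        qf = extract(query)
        # Average absolute difference between the query key and every database key.
        dists = np.mean(np.abs(db_feats - qf), axis=1)
        order = np.argsort(dists)                             # best match first
        ranks.append(int(np.where(order == target)[0][0]) + 1)  # 1-based rank
    return float(np.mean(ranks))
```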
6 Conclusions
In this paper we applied the Karhunen-Loeve transform and the Fisher linear discriminant to a large portrait database. For the general degradation (light, moisture, physical scratches, etc.), the Fisher linear discriminant is not appropriate until an accurate noise model can be found. For images which had been printed and then scanned in, Fisher's method and the projection method using the gradient had the least ranking error. For the real noise resulting from decades of exposure to different environments combined with scanner noise, the KLT had the least ranking error compared to the projection and template methods. Overall, the optimal key methods performed significantly better than the texture, projection, and template methods when the noise could be modeled accurately, and slightly better in the case of general noise.

With respect to computational efficiency, it was shown that optimal key selection techniques are directly related to principal component methods. Key selection techniques require fewer features for similar accuracy, which results in less computation. In the experiments, the methods based on optimal key selection, namely the KLT and Fisher's approach, were shown to require less than 10% of the features of the texture, projection, and template methods, while still achieving equivalent or better accuracy.
WWW Demo Program

The method described here can be evaluated directly via our WWW image database retrieval demo. See http://ind156b.wi.leidenuniv.nl:2000
Acknowledgments

This research was supported by the Advanced School for Computing and Imaging, Delft, and the Netherlands Computer Science Research Foundation.
References

1. Faloutsos, C., R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, "Efficient and Effective Querying by Image Content," Journal of Intelligent Information Systems, pp. 231-262, 1994.
2. Fisher, R.A., "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics (London) 7, pp. 700-725, 1936.
3. Gudivada, V. N. and V. V. Raghavan, "Finding the Right Image, Content-Based Image Retrieval Systems," Computer, IEEE Computer Society, pp. 18-62, Sept. 1995.
4. Hotelling, H., "Analysis of a Complex of Statistical Variables into Principal Components," Journal of Educational Psychology, vol. 24, pp. 417-441, 498-520, 1933.
5. Therrien, C. W., Decision Estimation and Classification, John Wiley & Sons, Inc., New York, 1989.
6. Ojala, T., M. Pietikainen, and D. Harwood, "A Comparative Study of Texture Measures with Classification Based on Feature Distributions," Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.
7. Wang, L. and D. C. He, "Texture Classification Using Texture Spectrum," Pattern Recognition, vol. 23, pp. 905-910, 1990.
8. Huijsmans, D. P. and M. S. Lew, "Efficient Content-Based Image Retrieval in Digital Picture Collections Using Projections: (Near)-Copy Location," Proc. International Conference on Pattern Recognition, Vienna, Austria, August 1996, pp. 104-108.