Web Based Image Similarity Search

Vinay Menon, Uday Babbar and Ujjwal Dasgupta
Delhi College of Engineering, Delhi
Email: {[email protected], [email protected], [email protected]}

Abstract—This paper proposes an implementation of a web based image similarity search using a robust image matching algorithm and a user definable matching criterion. Matching is done using image signatures, which are generated by decomposing images using wavelet transforms. A database of image signatures is populated by running the signature generator in conjunction with a web crawler. Search results are obtained by calculating the Euclidean distance between the signature of the query image and the signatures in the database. The proposed architecture is time efficient, robust, and uses a rotation-, scale- and perspective-invariant matching method.

Keywords: Image matching, similarity, reverse image search, image signature

I. INTRODUCTION

As the internet becomes increasingly multimedia intensive, there is a growing need for tools to manipulate large collections of media and to make searching for media more intuitive. Traditionally, search engines use manually written captions, text on the surrounding webpage, and the size and dimensions of images for image retrieval. This method is highly subjective and may lead to unrelated search results. The challenge is to bridge the gap between the physical characteristics of digital images (e.g. color, texture) that are used for comparison and the semantic meaning of the images that humans use to query the database. A focus of significant research is an algorithmic process for determining the perceptual similarity of images. The comparison algorithm has to judge differences between images in much the same way a human would. This is easier said than done, because of some special properties of the human eye and brain. What makes humans perceive two images as being similar? This is a difficult question for many reasons.

Two images can be perceived as similar because of the presence of similar objects (e.g. a ball, a girl, a Mona Lisa portrait) or because of a similar look (same colors, shapes, textures). Recognition of objects, and hence content based image retrieval, is extremely difficult. This algorithm therefore analyzes the image composition in terms of colors, shapes and textures. Another problem in implementing a search engine is that comparing images in a large collection is a processing power and bandwidth intensive task, so an image signature/fingerprint based solution has to be used. The algorithm should extract features from an image and use them to calculate a signature. Signature generation should be such that size-, rotation- and segmentation-insensitive matching is possible. In addition to allowing for these comparisons, signature generation should be fast enough for a viable implementation.

II. IMPLEMENTATION

The implementation structure consists of three main blocks: the crawler, the signature generator and the similarity calculator.

A. WEB CRAWLER

The web crawler feeds the signature calculator with images from the internet, which in turn populates the database. This is implemented in PHP. A code snippet is given below:

class Crawler {
    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            return $this->$method();
        }
    }

    // Extract the attribute strings of all <img> tags in the page.
    protected function _get_images() {
        if (!empty($this->markup)) {
            preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    // Extract the attribute strings and anchor text of all <a> tags.
    protected function _get_links() {
        if (!empty($this->markup)) {
            preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }
}
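A minimal usage sketch is shown below; the seed URL and the src-extraction step are illustrative assumptions, not part of the snippet above:

$crawler   = new Crawler('http://www.example.com/');
$imageTags = $crawler->get('images');   // attribute strings of matched <img> tags

if ($imageTags !== FALSE) {
    foreach ($imageTags as $attributes) {
        // Pull the src attribute out of each tag so the image can be
        // downloaded and handed to the signature generator.
        if (preg_match('/src=["\']([^"\']+)["\']/i', $attributes, $m)) {
            echo $m[1], "\n";
        }
    }
}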

B. SIGNATURE (FEATURE VECTOR) GENERATOR

This is the crux of the implementation, where all of the image processing is done. The fingerprint has to be calculated only once for each image and can then be stored in a database for fast searching and comparing. Before the fingerprint is calculated, the image is resized/scaled to a standard size. Since color comparisons are difficult to calculate in most color spaces, especially in the popularly used RGB color space, the image is transformed into the CIE Luv color space. CIE Luv has components L*, u* and v*: the L* component defines the luminance, and u* and v* define the chrominance. CIE Luv is very widely used for measuring color differences, especially with additive colors. The CIE Luv color space is defined from CIE XYZ as follows (CIE XYZ -> CIE Luv):

    L* = 116 * (Y/Yn)^(1/3) - 16    if Y/Yn >  0.008856
    L* = 903.3 * (Y/Yn)             if Y/Yn <= 0.008856

    u* = 13 * L* * (u' - u'n)
    v* = 13 * L* * (v' - v'n)

where

    u' = 4*X / (X + 15*Y + 3*Z)
    v' = 9*Y / (X + 15*Y + 3*Z)

and u'n and v'n have the same definitions as u' and v' but are applied to the white point reference. CIE XYZ is itself defined from RGB using the following transformation:

    |X|   |0.430574  0.341550  0.178325|   |Red  |
    |Y| = |0.222015  0.706655  0.071330| * |Green|
    |Z|   |0.020183  0.129553  0.939180|   |Blue |
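A short PHP sketch of this conversion for a single pixel, assuming RGB components scaled to [0, 1] and the white point implied by the matrix above (the function name rgb_to_luv and the zero-division guards are illustrative, not taken from the paper):

// Convert one RGB pixel (components in [0, 1]) to CIE L*u*v*.
function rgb_to_luv($r, $g, $b) {
    // RGB -> XYZ using the matrix given above.
    $x = 0.430574*$r + 0.341550*$g + 0.178325*$b;
    $y = 0.222015*$r + 0.706655*$g + 0.071330*$b;
    $z = 0.020183*$r + 0.129553*$g + 0.939180*$b;

    // White point: the same matrix applied to RGB = (1, 1, 1).
    $xn = 0.950449; $yn = 1.0; $zn = 1.088916;

    // L* with the two-branch formula.
    $t = $y / $yn;
    $L = ($t > 0.008856) ? 116.0 * pow($t, 1.0/3.0) - 16.0 : 903.3 * $t;

    // u', v' chromaticity coordinates (guard against division by zero).
    $d  = $x + 15.0*$y + 3.0*$z;
    $dn = $xn + 15.0*$yn + 3.0*$zn;
    $up  = ($d > 0) ? 4.0*$x / $d : 0.0;
    $vp  = ($d > 0) ? 9.0*$y / $d : 0.0;
    $upn = 4.0*$xn / $dn;
    $vpn = 9.0*$yn / $dn;

    $u = 13.0 * $L * ($up - $upn);
    $v = 13.0 * $L * ($vp - $vpn);

    return array($L, $u, $v);
}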

The fingerprint or signature vector is composed of three parts: a grayscale DCT part, a color DCT part and an EDH part. The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance with respect to the image's visual quality. The DCT transforms a signal from the spatial domain into the frequency domain; a signal in the frequency domain contains the same information as one in the spatial domain. The values obtained by applying the DCT are ordered from lowest to highest frequency. This property, together with the psychological observation that the human eye is less sensitive to the higher-order frequencies, makes it possible to compress a spatial signal by transforming it to the frequency domain, dropping the high-order values and keeping the low-order ones.

The general equation for a 2D (N by M image) DCT is defined by the following equation:

    F(u,v) = (2/N)^(1/2) * (2/M)^(1/2) * C(u) * C(v)
             * sum_{i=0..N-1} sum_{j=0..M-1} cos[(2i+1)*u*pi / 2N] * cos[(2j+1)*v*pi / 2M] * f(i,j)

where

    C(k) = 1/sqrt(2)  for k = 0
    C(k) = 1          otherwise

The basic operation of the DCT is as follows:
• The input image is N by M; f(i,j) is the intensity of the pixel in row i and column j.
• F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
• For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT.
• The DCT input is an 8 by 8 array of integers containing each pixel's gray scale level; 8-bit pixels have levels from 0 to 255.
• The output array of DCT coefficients contains integers; these can range from -1024 to 1023.
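As a concrete illustration of the equation above, a naive PHP implementation might look like the following sketch (the function name dct_2d is illustrative; a real system would use a fast factored transform rather than this O(N*M*N*M) loop):

// 2D DCT-II of a rectangular array of intensities $f[row][col].
function dct_2d(array $f) {
    $N = count($f);        // number of rows
    $M = count($f[0]);     // number of columns
    $F = array();

    for ($u = 0; $u < $N; $u++) {
        for ($v = 0; $v < $M; $v++) {
            $sum = 0.0;
            for ($i = 0; $i < $N; $i++) {
                for ($j = 0; $j < $M; $j++) {
                    $sum += $f[$i][$j]
                          * cos((2*$i + 1) * $u * M_PI / (2*$N))
                          * cos((2*$j + 1) * $v * M_PI / (2*$M));
                }
            }
            $cu = ($u == 0) ? 1/sqrt(2) : 1.0;
            $cv = ($v == 0) ? 1/sqrt(2) : 1.0;
            $F[$u][$v] = sqrt(2/$N) * sqrt(2/$M) * $cu * $cv * $sum;
        }
    }
    return $F;
}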

(Figure: the 64 (8 x 8) DCT basis functions.)

A 2D-DCT is calculated over the L-channel (luminosity) of the image. The first coefficient is the DC value, i.e. the average luminosity of the image; the following coefficients represent progressively higher frequencies. A number of these coefficients (the first 10) are taken and normalized to form the grayscale fingerprint part. This part of the fingerprint represents the basic composition of the image.

Then a 2D-DCT of each of the two color components is calculated and used for the color part of the fingerprint. A two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II, from above, performed along the rows and then along the columns (or vice versa). That is, omitting normalization and other scale factors, the 2D DCT-II is given by:

    X(k1,k2) = sum_{n1=0..N1-1} sum_{n2=0..N2-1} x(n1,n2)
               * cos[pi/N1 * (n1 + 1/2) * k1] * cos[pi/N2 * (n2 + 1/2) * k2]

Here only the first three coefficients of each color component are taken, since the human eye is much more sensitive to luminosity than to color. This part of the fingerprint represents the color composition of the image with reduced spatial resolution compared to the grayscale part.

On the luminosity channel of the image an EDH (Edge Direction Histogram) is calculated using the Sobel kernels. The Sobel kernels rely on central differences, but give greater weight to the central pixels when averaging:

    Gx = | -1  0  +1 |        Gy = | +1  +2  +1 |
         | -2  0  +2 |             |  0   0   0 |
         | -1  0  +1 |             | -1  -2  -1 |

This is equivalent to first blurring the image using a 3 x 3 approximation to the Gaussian and then calculating first derivatives, because convolutions (and derivatives) are commutative and associative. These kernels are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The kernels can be applied separately to the input image to produce separate measurements of the gradient component in each orientation (call these Gx and Gy). These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

    |G| = sqrt(Gx^2 + Gy^2)

and the edge direction by theta = atan2(Gy, Gx).

The histogram consists of eight equal bins: N, NE, E, SE, S, SW, W and NW. This part of the fingerprint represents the "flow directions" of the image. (Figure: result after applying the Sobel operator.)
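A sketch of the EDH computation in PHP, assuming the L-channel is already available as a 2D array of floats (the function name edge_direction_histogram, the magnitude threshold and the normalization step are illustrative choices, not specified by the paper):

// Build an 8-bin edge direction histogram from a 2D luminosity array
// using the Sobel kernels shown above.
function edge_direction_histogram(array $L, $minMagnitude = 32.0) {
    $rows = count($L);
    $cols = count($L[0]);
    $hist = array_fill(0, 8, 0);

    for ($y = 1; $y < $rows - 1; $y++) {
        for ($x = 1; $x < $cols - 1; $x++) {
            // Horizontal and vertical Sobel responses at (x, y).
            $gx = -$L[$y-1][$x-1] + $L[$y-1][$x+1]
                - 2*$L[$y][$x-1] + 2*$L[$y][$x+1]
                -   $L[$y+1][$x-1] + $L[$y+1][$x+1];
            $gy =   $L[$y-1][$x-1] + 2*$L[$y-1][$x] + $L[$y-1][$x+1]
                -   $L[$y+1][$x-1] - 2*$L[$y+1][$x] - $L[$y+1][$x+1];

            $mag = sqrt($gx*$gx + $gy*$gy);
            if ($mag < $minMagnitude) {
                continue;               // ignore weak, noisy edges
            }

            // Map the angle (-pi, pi] onto one of 8 direction bins.
            $angle = atan2($gy, $gx);
            $bin = (int) round((($angle + M_PI) / (2*M_PI)) * 8) % 8;
            $hist[$bin]++;
        }
    }

    // Normalize so that image size does not affect the comparison.
    $total = array_sum($hist);
    if ($total > 0) {
        foreach ($hist as $k => $v) { $hist[$k] = $v / $total; }
    }
    return $hist;
}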

C. SIMILARITY CHECKER

The similarity checker calculates the closeness of the match between two images by calculating a distance value; the smaller the distance value, the more similar the images. Identical images yield the same fingerprint, similar images yield fingerprints close to each other (according to a distance function) and dissimilar images yield totally different fingerprints. The distance function compares the fingerprints by weighting the coefficients and calculating the Euclidean distance. The user can input various comparison considerations: for example, if the general brightness of the image should not be considered when comparing, the DC component's weight can simply be set to zero. For less detail and a more tolerant search, the higher coefficients can be scaled down or set to zero. For grayscale searching, the color components' weights are simply set to zero. For rotation or mirror invariant searching, the components are shuffled accordingly and compared again. This weight vector can be used for a lot of tuning. The Euclidean distance between points p and q is the length of the line segment connecting them. In Cartesian coordinates, if p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points in Euclidean n-space, then the distance from p to q is given by:

    d(p,q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
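A minimal sketch of this weighted comparison in PHP, assuming both fingerprints are flat numeric arrays of equal length and $weights is the user-supplied tuning vector described above (names and example values are illustrative):

// Weighted Euclidean distance between two fingerprints.
// Setting a weight to zero removes that coefficient from the comparison
// (e.g. weight 0 for the DC term ignores overall brightness).
function fingerprint_distance(array $a, array $b, array $weights) {
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $weights[$i] * $diff * $diff;
    }
    return sqrt($sum);
}

// Example: ignore the DC (first) coefficient, keep the rest at weight 1.
$fingerprintA = array(120.0, 3.5, -2.1, 0.7);
$fingerprintB = array( 95.0, 3.3, -1.8, 0.9);
$weights      = array(  0.0, 1.0,  1.0, 1.0);
echo fingerprint_distance($fingerprintA, $fingerprintB, $weights);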

III. EVALUATION

(Figure: sample results from the implementation.)

There exists no clearly defined benchmark for assessing a proposed similarity measure on images. Indeed, human notions of similarity may vary, and different definitions are appropriate for different tasks. The approach specified here has specific strengths and weaknesses. Since this is a web based implementation, it entails large scale image matching, and speed is an important criterion. Scaling this implementation to a full-fledged search engine would require further improvements in the efficiency of the fingerprint calculation, which is central to this application. A more efficient retrieval system could be constructed by implementing hierarchical image classification, using certain tagged images and extrapolating the groups using this application.


IV. CONCLUSION

Here we have presented a search engine capable of finding similar images using an algorithm for measuring perceptual similarity. In this system a web based image crawler is first used in conjunction with the signature generator to create a database of image fingerprints. Fingerprints are constructed using processes that are fast and not very CPU intensive. During a search, the fingerprint of the subject image is calculated and compared with those in the database. Similarity results are found by calculating Euclidean distances. The comparison is flexible: if desired, images can be compared independently of rotation or of vertical and horizontal mirroring, or in grayscale only, depending on which distance function is used.

V. REFERENCES

[1] N. R. Howe, "Percentile Blobs for Image Similarity," Cornell University, Department of Computer Science, Ithaca, NY 14853.
[2] R. C. Joshi and S. Tapaswi, "Image Similarity: A Genetic Algorithm Based Approach."
[3] Color Spaces FAQ, http://www.ilkeratalay.com/colorspacesfaq.php
[4] T. Moser, "Image Similarity."
[5] Discrete Cosine Transform, http://www.cs.cf.ac.uk/Dave/Multimedia/node231.html
[6] S. Kuthan and A. Hanbury, "Hierarchical Image Classification," TU Vienna.
[7] B. S. Morse, "Edge Detection," Brigham Young University.
[8] J. Ashley, R. Barber, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic, "Automatic and semi-automatic methods for image annotation and retrieval in QBIC," in W. Niblack and R. C. Jain, eds., Storage and Retrieval for Image and Video Databases III, SPIE – The International Society for Optical Engineering, 1995.
[9] J. Smith, "Integrated Spatial and Feature Image Systems: Retrieval, Compression and Analysis," PhD thesis, Columbia University, USA, February 1997.
