Mixed Query Image Retrieval System

Bingjing Cai☆, Chris Zheng※, Sen Yang*, Jeffery Z. J. Zheng﹠

☆ School of Software, Yunnan University, Kunming, Yunnan, P.R. China, 650091. Email: [email protected]

※ Conjugate Systems Pty Ltd, 45 Greenways Road, Glen Waverley, Victoria 3150, Australia. Email: [email protected]

* Department of CS&T, School of Information, Yunnan University, Kunming, Yunnan, P.R. China, 650091. Email: [email protected]

﹠ School of Software, Yunnan University, Kunming, Yunnan, P.R. China, 650091. Email: [email protected]

Abstract - This paper discusses a proposed mixed query image and text retrieval database that combines text-based search technology and image content-based search technology for more accurate results. The target application is large, categorized image sets, such as the image collections of libraries or patent offices. It can be shown that this mixed query image search engine can achieve better efficiency and higher quality than traditional text-based queries and content-based systems. This paper discusses the design principle of the mixed query image search engine, outlines the architecture and shows results from an initial prototype database.

Index Terms - content-based image search technology, keyword-based image search technology, mixed query image retrieval system

I. INTRODUCTION

A. Two Phases of Development of Image Retrieval System Technology

In the sudden explosion into the information age, all types of data are being produced at an enormous and still increasing rate. These data include large sets of image, sound, video and other multimedia data generated by cheap digital capture and storage facilities. While the demand for processing and categorizing this rich multimedia information grows, large amounts of data are being created without a way to classify or search through them. It is with this need in mind that image search engines have become one of the hottest and fastest growing areas of research and application development.

Currently, there exist two ways of searching for an image: by keyword or by content. The first method was introduced with the development of text-based search technology; the second was introduced with content-based search technology.

Keyword-based image retrieval systems are based on traditional text-based retrieval technology. The concept is simple. Images are tagged with keywords and managed by the system. Users search for images via a textbox, and images tagged with the keyword are displayed. The search is easy but the tagging is laborious. Keywords have to be manually attached to each picture by the user. For most people, this way of organizing is much too time consuming. If pictures are not tagged, they cannot be organized and data is lost. Furthermore, text labels do not represent the picture itself, only a very abstract description which differs between people and cultures. There is no standard for tagging, and the only dependable tagging comes from ‘social tagging’, an example being Flickr [1]. However, a tag still cannot represent the content within the picture, and the search results will be unsuitable in many cases.

Content-based search uses queries in the form of image objects rather than text tags. The principle behind content-based retrieval technology is to extract either meaning or measurement from within the picture itself. Image properties such as color, texture, shape and other qualities can be expressed as measurements that can be processed by the computer. Once a picture can be quantitatively and qualitatively defined, it can be managed. As a consequence, the real attraction of content-based search is that it promises the automation of image classification and search.
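As an illustration of how an image property can be expressed as a measurement, the following minimal sketch quantizes RGB pixels into a coarse color histogram and compares two histograms by histogram intersection. It is not the feature extractor used in this paper; the function names, the bin count and the toy pixel lists are assumptions chosen only for illustration.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized (r, g, b) bins.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; higher means more similar color distributions."""
    return sum(min(p, h2.get(bin_id, 0.0)) for bin_id, p in h1.items())


# Toy "images" as flat pixel lists (hypothetical data).
red_image = [(250, 10, 10)] * 90 + [(20, 20, 20)] * 10
reddish_image = [(240, 30, 20)] * 80 + [(200, 200, 200)] * 20
blue_image = [(10, 10, 250)] * 100

h_red = color_histogram(red_image)
h_reddish = color_histogram(reddish_image)
h_blue = color_histogram(blue_image)

red_vs_reddish = histogram_intersection(h_red, h_reddish)
red_vs_blue = histogram_intersection(h_red, h_blue)
```

In this toy data the two mostly red images score much higher against each other than the red image does against the blue one, which is the kind of numeric comparison a content-based engine relies on.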

B. Research and Development of Image Retrieval Techniques

With the emergence of large scale image collections, content-based image retrieval was proposed. Since then, many techniques in this research direction have been developed and many image retrieval systems, both research and commercial, have been built. In the early 90s, IBM developed its first content-based image retrieval system, QBIC (Query By Image Content) [2]. It is the first commercial content-based image retrieval system, and its system framework and techniques have had profound effects on later image retrieval systems. Photobook [3] is a set of interactive tools for browsing and searching images developed at the MIT Media Lab. Photobook consists of three sub-books, from which shape, texture, and face features are extracted respectively. Users can then query based on the corresponding features in each of the three sub-books. There are other content-based image retrieval systems, such as the VisualSEEk system [4] developed at Columbia University and the Like.com website [5] developed by the Riya team in the United States.

Like.com is one of the best true visual search engines: the contents of photos are used to search for and retrieve similar items. Its launch focuses on handbags, jewelry, shoes, and watches, allowing users to search and purchase items from thousands of leading and boutique brands. It has classified image databases, such as separate image databases for handbags and for jewelry, and content-based image retrieval is evidently effective on them.

Currently, however, the popular model for image retrieval in most systems, such as Google, AltaVista, and Yahoo, is still based on text-based search technology. Since the low-level features of images (color, shape, texture, etc.) do not represent the image semantic information, which means they do not tell what the image is, the results of content-based image retrieval are not always satisfying.

C. Existing Problem and the Objective of This Paper

Currently, most people focus on the improvement and optimization of a certain kind of content-based image retrieval technique. Improvement of image content-based search techniques is definitely of great significance for obtaining accurate retrieval results in large image databases. However, image retrieval systems that apply only one kind of image search technique, whether keyword-based or image content-based, cannot always obtain satisfactory results and meet users’ requirements. An effective and efficient system architecture for image retrieval is needed, combining both text-based search and image content-based search.

Therefore, in this paper, we discuss a method of improving image retrieval for large image databases. It combines keyword-based search and image content-based search technologies. We name it the Mixed Query Image Retrieval system. We present the system architecture and illustrate our system prototype. The experiment demonstrated that the accuracy, effectiveness and efficiency of this mixed query image retrieval system are greatly enhanced.

II. THE DESIGN PRINCIPLE OF MIXED QUERY IMAGE RETRIEVAL SYSTEM

In recent years, there has been a rapid increase in the size of digital image collections. Both military and civilian equipment generates gigabytes of images every day. For such large image databases, although image content-based search is faster and more accurate than text-based search, the retrieval results are not always ideal. It is very possible that, for example, when we use an image content-based search engine to seek a picture of a red bus, the returned picture is of a red house. An image content-based search engine extracts visual features from images, such as color, texture and shape, according to different feature extraction algorithms. It does not have human perception or the ability to distinguish and identify the true meaning of images. The color and shape, as well as the texture, of the house may be very similar to those of the red bus, but it is not the result we expect. In a relatively small, categorized image database, however, an image content-based retrieval engine definitely works better.

The design principle of the mixed query image retrieval system is that we first divide the original large image database into a number of relatively small image databases based on categories and establish keyword indexing for each category of the now divided image databases. We then employ an image content-based search engine within each of the classified image databases.
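The red bus example can be made concrete with a small sketch. It assumes that each image has been reduced to a feature vector and carries a category label; the names `IMAGES`, `nearest` and `partition_by_category`, as well as all feature values, are hypothetical and serve only to show why restricting the content-based search to one category removes matches that are visually similar but semantically wrong.

```python
import math

# (image id, category, feature vector). Hypothetical values: the vector
# stands in for color/texture/shape measurements.
IMAGES = [
    ("house_red", "buildings", (0.90, 0.10, 0.50)),
    ("bus_red", "vehicles", (0.85, 0.15, 0.70)),
    ("bus_blue", "vehicles", (0.10, 0.20, 0.70)),
]


def nearest(query, images):
    """Return the id of the image whose feature vector is closest to query."""
    return min(images, key=lambda item: math.dist(query, item[2]))[0]


def partition_by_category(images):
    """Split one large collection into per-category sub-databases."""
    parts = {}
    for item in images:
        parts.setdefault(item[1], []).append(item)
    return parts


query = (0.92, 0.10, 0.55)  # assumed features of a red bus photo

# Searching the whole collection picks the visually closest image,
# which here is the red house.
whole_db_hit = nearest(query, IMAGES)

# Searching only the "vehicles" sub-database returns the red bus.
vehicles_hit = nearest(query, partition_by_category(IMAGES)["vehicles"])
```

With these toy values the unrestricted search returns the red house, while the same query restricted to the vehicles sub-database returns the red bus.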

A. Combination of Keyword-based Search and Image Content-based Search

There are two layers in the combination of text-based and image content-based search. Firstly, the original large image database is divided by category into small image databases, and keyword-on-category indexing associated with the small databases is established. Secondly, in every classified image database, both image content-based indexing and keyword-based indexing are established.

1) Classify image database.

For a large image database, we can refer to the image content information or the descriptive information of images to classify the image database by category. A number of small image sub-databases are generated, each belonging to a certain category, for instance, shoes or handbags. Take the image database of the patent office for example; suppose that the number of patent pictures is about 5,000,000 and the number of patent categories is more than 40. If we divide the image database according to the standard international patent categories, a number of classified image sub-databases are generated, containing about 100,000 to 200,000 images each. In a smaller image database, both the image content-based and text-based search engines can obtain more accurate retrieval results and work faster. The efficiency of the whole system can thus be enhanced, and higher quality search results can be obtained.

A distributed cluster of sub-databases is generated after classifying the original large image database, as shown in Figure 1.

Fig.1. Classify the large image database into small distributed image databases on categories

After classifying, the structure of the image databases is similar to the category structure of libraries’ databases or the patent office IPC category structure [6]. Figure 2 shows an example of such a categorized structure.

Fig.2. An example of classified image databases structure

Then, according to these categories, we can employ traditional text-based search indexing techniques to establish a text index for the cluster of image databases. Thus, we can use keywords to search for a certain kind of image collection.

2) Establishing Content-based Indexing

After classifying the original large image database into relatively small image databases of different categories, we can utilize image content-based search. Since we have divided the large image database into small, categorized sub-databases, the content-based search engine is more accurate and faster. Feature (content) extraction is the basis of content-based image retrieval. In a broad sense, features may include both text-based features (keywords, annotations, etc.) and visual features (color, texture, shape, faces, etc.). Within the visual feature scope, the features can be further classified as general features and domain-specific features. The former include color, texture and shape features, while the latter are application dependent and may include, for example, human faces and fingerprints. Feature extraction algorithms for color-based, texture-based and shape-based image retrieval are available and under active discussion. Here, we propose mixed content-based image retrieval, combining color-based, texture-based and shape-based search. In our actual system, we utilize a mixed content-based image retrieval engine. Since we aim at introducing the method and architecture of the mixed query image retrieval system in this paper, we do not intend to discuss content-based image retrieval techniques, including multi-dimensional indexing techniques, in detail.

III. ARCHITECTURE DESIGN OF MIXED QUERY IMAGE RETRIEVAL SYSTEM

There are four main parts in the mixed query image retrieval system: the user query interface, the management and control module, the text-based and content-based indexing database, and the text and image databases.

A. User Query Interface

The user query interface should be friendly and flexible. To communicate with the user in a friendly manner, the query interface is graphics-based. The interface collects the information needed from the users and displays the retrieval results back to the users. Users can input keywords based on category to access a certain kind of image database, and then use digital image characteristics, a picture or keywords of pictures to search in the image database. After processing, the system returns results via the user query interface.

B. Management and Control Module

The module is the linchpin of the system, controlling and managing the performance of the system. It receives the user request, determines whether the request is text or image, processes it, and then returns results to users.

C. Text-based Indexing and Image Content-based Indexing Module

There are two parts: keyword-on-category indexing and image content-based indexing. According to categories, a large image database is divided into relatively small classified image databases. Keyword-on-category indexing is established and associated with the classified image databases. After the text information is processed in the management and control module, the system automatically gives control to the console of the corresponding image databases, according to the keyword-on-category indexing table. The image content-based indexing database keeps image information and the established image indexing tables. After receiving a query request from the management and control module, the system searches in the image database according to the indexing tables and then returns the result to the management and control module.

D. Database

The database contains two kinds of data, text and image. The text database keeps descriptive information related to images. The image database contains the real pictures. The system architecture is shown in Figure 3.

Fig.3. System architecture of mixed query image retrieval system

IV. INTERACTIVE MODEL OF MIXED QUERY IMAGE RETRIEVAL SYSTEM

Figure 4 shows the processing flow in the mixed query image retrieval system. Several main parts in the process are explained briefly below.

i) From the information that the user inputs, extract keywords based on category and, according to these keywords, select the corresponding clusters of indexing databases.

ii) Use either content-based search or text-based search in the selected image databases.

iii) Gather candidate results from every selected indexing database, re-sort them, and select the optimal set of results.

iv) According to the optimal set of image indexes, search in the image databases and access the actual pictures.

v) Organize the result pages and return them to users.

Fig.4. Interactive model of mixed query image retrieval system

In the current design, processes ii, iii and iv occupy most of the internal processing time for the entire search process. In test queries, the processing time of process ii is less than 0.1 second, that of process iii is less than 1 second, and that of process iv is also less than 1 second. The whole process can be completed within 2 seconds. In actual operation, especially in the real internet environment, the network transmission speed of process v can be improved by small size image output.
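The first three steps of the processing flow (selecting sub-databases by category keyword, searching each of them, then gathering and re-sorting the candidates) can be sketched as follows. This is a simplified illustration under stated assumptions, not the prototype's code: the keyword-on-category table `CATEGORY_INDEX`, the sub-database names and the distance scores are hypothetical, and each candidate list is assumed to be sorted by ascending distance so that the lists can be combined with `heapq.merge`.

```python
import heapq
from itertools import islice

# Keyword-on-category index: category keyword -> sub-database names.
# All names and scores below are hypothetical.
CATEGORY_INDEX = {
    "shoes": ["shoes_db_1", "shoes_db_2"],
    "handbags": ["handbags_db"],
}

# Stand-in for per-database search results: (distance, image id) pairs,
# already sorted by ascending distance (smaller means more similar).
CANDIDATES = {
    "shoes_db_1": [(0.12, "s1_007"), (0.40, "s1_113")],
    "shoes_db_2": [(0.05, "s2_421"), (0.33, "s2_002")],
    "handbags_db": [(0.09, "h_015")],
}


def select_databases(user_input):
    """Pick the sub-databases whose category keyword occurs in the input."""
    words = user_input.lower().split()
    return [db for kw, dbs in CATEGORY_INDEX.items() if kw in words for db in dbs]


def search(db_name):
    """Content-based or text-based search inside one sub-database (stubbed)."""
    return CANDIDATES[db_name]


def mixed_query(user_input, top_k=3):
    """Gather candidates from every selected database and keep the best top_k."""
    result_lists = [search(db) for db in select_databases(user_input)]
    merged = heapq.merge(*result_lists)  # lazy k-way merge of sorted lists
    return [image_id for _, image_id in islice(merged, top_k)]


top = mixed_query("red shoes")
```

Fetching the actual pictures and building the result pages are left out of the sketch; they would consume the returned image ids.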

V. SYSTEM PROTOTYPE

According to the system architecture design in the previous section, we have developed a prototype of the mixed query image retrieval system, shown in Figures 5a and 5b.

Fig.5a. System prototype

Fig.5b. System prototype

VI. CONCLUSION

This paper proposes a method for enhancing the efficiency of image retrieval systems applied to large scale image collections. Categorized segmentation of a large image database is the core concept. We discuss a mixed query image retrieval system that combines image content-based and text-based search technologies, and present the system architecture designed for this combined system together with the system prototype. Experiments confirm that the mixed query image retrieval system can manage a large image database better and achieve higher efficiency and better quality. There are still many open issues to be solved before image retrieval systems can be put into practice. To achieve faster retrieval speed and make the image retrieval system scalable to large image collections, multi-dimensional indexing techniques are of great importance. In this paper we only discuss one effective architecture for such systems. In conclusion, integration of multiple techniques and information sources of humans and computers will lead to a more successful image retrieval system.

ACKNOWLEDGMENT

I would like to thank Tony Chen for editing my English writing.

REFERENCES

[1] Flickr web site, available at http://www.flickr.com

[2] IBM QBIC (Query By Image Content) home page, available at http://wwwqbic.almaden.ibm.com/

[3] Photobook system of MIT University home page, available at http://vismod.media.mit.edu/vismod/demos/photobook/index.html

[4] VisualSEEK system of The DVMM Lab at Columbia University, available at http://www.ee.columbia.edu/ln/dvmm/researchProjects/MultimediaIndexing/VisualSEEk/VisualSEEk.htm

[5] LIKE.COM website, available at http://www.like.com

[6] China Intelligence and Patent Website, available at http://www.cnipr.com/

[7] Jeffrey Z. J. Zheng, Chris H. Zheng and Tosiyasu L. Kunii, “Concept Cell Model for Knowledge Representation”, International Journal of Information Acquisition, Vol 1, No. 2, 149-168 (2004)

[8] [Beynon-Davies 1993] P. Beynon-Davies. “Information System Development”. The Macmillan Press 1993.

[9] [Booch 1990] G. Booch. “Object-oriented Analysis and Design”. Addison Wesley. 1990.

[10] [Coad 1991] P Coad and E Yourdon. “Object-Oriented Analysis”. Prentice-Hall, 1991.

[11] [Dijkstra 1979] E. Dijkstra. “Programming Considered as a Human Activity.” Classics in Software Engineering. Yourdon Press, 1979.

[12] [Myers 1982] G. Myers. “Advances in Computer Architecture”. John Wiley and Sons, 1982.

[13] [Tsai 1988] J. Tsai and J. Ridge, “Intelligent Support for Specifications Transformation”. IEEE Software Vol. 5(6), p.34.

[14] [Yourdon 1979] E. Yourdon and L. Constantine, “Structured Design”. Prentice-Hall, Englewood-Cliffs, 1979.

[15] The google image search page, available at http://images.google.com

[16] Thomas Deselaers, Tobias Weyand, Daniel Keysers, Wolfgang Macherey and Hermann Ney, “FIRE in ImageCLEF 2005: Combining Content-based Image Retrieval with Textual Information Retrieval”, In Workshop of the Cross-Language Evaluation Forum (CLEF 2005), Lecture Notes in Computer Science, volume 4022, pages 652-661, Vienna, Austria, September 2005
