Comparison Between Three Commonly Used Search Engine Techniques

  • Uploaded by: Ashish Nadkar
  • June 2020
COMPARISON OF THREE COMMONLY USED SEARCH ENGINE TECHNIQUES

NADKAR ASHISH ANANT
[email protected]
Contact No: 9892073460

FULIA ISANKUMAR MANGALDAS
[email protected]
Contact No: 9987866545

FR. CONCEICAO RODRIGUES COLLEGE OF ENGINEERING Fr. Agnel Ashram, Bandstand, Bandra (west), Mumbai – 400059 Phone: 022-26423841/42 Fax: 022-26516831

ABSTRACT Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be retrieved within seconds. Almost every tenth click made while browsing the internet is a search; even gathering the information for this TPP would not have been possible without intensive use of search engines.

This TPP compares the search techniques followed by three search giants: Google, Yahoo! and MSN. The backbone of all major search engines is the same and involves three vital stages: crawling, indexing and searching. However, search engines differ in the way they implement these three stages. This paper aims to unravel the mystery behind search. Finally, the TPP suggests some new features that would enhance the current searching and browsing experience.

1. Introduction: Modern web search engines are complex software systems built on technology that has evolved over the years to give users relevant results. Although the exact details of search engines’ methods are commercial secrets (especially concerning results ranking), their mode of operation is broadly known.

2. Components of a search engine:

2.1. Crawling: The crawling process involves identifying, downloading and storing in a database as many potentially useful web pages as possible, given constraints of time, bandwidth and storage. This data is later used for indexing and searching.
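The crawling step can be sketched as a breadth-first traversal with a page budget. This is only an illustration, not how any particular engine crawls: the in-memory FAKE_WEB dictionary, its URLs and the crawl function are hypothetical stand-ins for real HTTP fetching, and the max_pages limit stands in for the time, bandwidth and storage constraints mentioned above.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
FAKE_WEB = {
    "a.example/": ("home page", ["a.example/news", "b.example/"]),
    "a.example/news": ("news page", ["a.example/"]),
    "b.example/": ("another site", ["c.example/"]),
    "c.example/": ("deep page", []),
}


def crawl(seed, max_pages):
    """Breadth-first crawl from seed, storing at most max_pages pages."""
    store = {}                # stands in for the page database
    frontier = deque([seed])  # URLs discovered but not yet fetched
    seen = {seed}             # avoids queueing the same URL twice
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        if url not in FAKE_WEB:  # page cannot be fetched: skip it
            continue
        text, links = FAKE_WEB[url]
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


pages = crawl("a.example/", max_pages=3)
print(sorted(pages))
```

With a budget of three pages the crawl stops before reaching c.example/, which shows how a storage or time limit bounds how deeply a site is covered.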

2.2. Link map & indexing: Pages discovered by crawlers are fed into a service that creates a link map of the pages. This data is stored in data structures that allow fast access. Indexing is the process of extracting text from web pages, tokenizing it, and then creating an index structure (an inverted index) that can be used to quickly find which pages contain a particular word.
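A minimal sketch of the indexing step is given here, assuming the page text has already been extracted. The tokenize and build_inverted_index functions and the sample pages are illustrative names and data, and the tokenizer (lowercase alphanumeric runs) is a simplifying assumption; production engines use far richer tokenization.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each word to the set of page identifiers that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Hypothetical sample pages (identifier -> extracted text).
pages = {
    "p1": "Search engines crawl the web",
    "p2": "Engines index web pages",
}
index = build_inverted_index(pages)
print(sorted(index["web"]))  # identifiers of the pages containing "web"
```

Because the index is keyed by word, finding the pages that contain a term is a single dictionary lookup rather than a scan over every stored page.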

2.3. Searching or result matching: Searching is the process of looking up words in an index to find the documents in which they appear. Because most searchers view only the first two pages of results, it is important to maximize the chance that a relevant URL is listed among the first 10.
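Result matching can be illustrated with a small lookup over an index that records how often each word occurs on each page. The INDEX data and the search function are hypothetical, and scoring by total term occurrences is an assumption chosen for brevity; it is not the ranking used by Google, Yahoo! or MSN.

```python
# Hypothetical index: word -> {page identifier: number of occurrences}.
INDEX = {
    "search": {"p1": 3, "p2": 1, "p3": 2},
    "engine": {"p1": 1, "p3": 4},
    "crawler": {"p2": 2},
}


def search(query, index, limit=10):
    """Return up to `limit` pages containing every query word, best first."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only the pages that appear in the posting list of every word.
    matches = set(postings[0]).intersection(*postings[1:])
    # Score each page by its total number of query-word occurrences.
    scores = {page: sum(p[page] for p in postings) for page in matches}
    # Highest score first; ties are broken by page identifier.
    ranked = sorted(scores, key=lambda page: (-scores[page], page))
    return ranked[:limit]


print(search("search engine", INDEX))
```

The limit parameter defaults to 10 to mirror the point above: only the top of the ranked list is likely to be seen by the searcher.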

3. Comparison of 3 search engine techniques: We’ll be comparing the techniques followed by 3 major search engines, Google, Yahoo! & MSN Live Search, on the basis of:

  • Ranking algorithm
  • Hit Count Estimation (HCE), i.e. a number near the top of the results page estimating the total number of results available to the search engine. Multiple HCEs are sometimes used in research for comparisons.
  • Page content
  • Crawling ability
  • Query processing
  • Link reputation & site age

3.1. Ranking algorithm: Google determines relevancy primarily with its PageRank algorithm. PageRank essentially says that a site that has more inbound links than its competitors is likely a better site and should therefore rank higher. Yahoo! Search’s algorithm is not far from Google’s but differs at some points: Yahoo! places considerable weight on its web directory as part of its ranking algorithm. MSN’s Live Search has a poor relevancy algorithm. New sites that are generally untrusted in other systems can rank quickly in MSN Search.

3.2. Hit Count Estimation: The hit count estimates of Google, Yahoo! and MSN Live Search correlate significantly, with Google and Live Search correlating particularly strongly with each other and Yahoo! correlating less with both. The reason for the different Yahoo! results was that Yahoo! automatically corrected some apparent user errors.

3.3. Page content:

1. Google heavily biases search results toward informational resources.

2. Yahoo! offers a paid inclusion program, so Yahoo! profits when its users click on highly ranked paid inclusion results in the organic search results. Its results are more biased toward commerce.

3. MSN places too much weight on page content. Its poor relevancy algorithms cause a heavy bias toward commercial results.

3.4. Crawling ability: Google is more efficient at crawling than competing engines. Yahoo! is pretty good at crawling sites deeply so long as they have sufficient link popularity to get all their pages indexed. MSN is nowhere near as comprehensive as Yahoo! or Google at crawling deeply through large sites.

3.5. Query processing: Google is much better than Yahoo! or MSN at determining the true intent of a query. It does concept matching. Google's search results are biased toward informational websites. Yahoo! seems to be more about text matching when compared to Google. MSN might be a bit better than Yahoo! at processing queries for meaning instead of taking them quite so literally.

3.6. Link reputation & site age: Google is much better at telling the difference between real, editorial citations and low-quality, spammed, bought, or artificial links. Older sites are trusted more. Yahoo! is still easy to manipulate using low- to mid-quality links and somewhat to aggressively focused anchor text. It places some weight on older sites. MSN is not as good as the other major search engines at telling the difference between real organic citations and low-quality links. MSN ranks new sites higher due to link bursts.
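The inbound-link idea behind PageRank, described in section 3.1, can be sketched with the textbook power-iteration formulation. This is a simplified illustration and not Google's production ranking: the pagerank function, the four-page link graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform rank
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best, round(sum(ranks.values()), 6))
```

In this toy graph C ends up with the highest rank because it has the most inbound links, and A ranks second because its single inbound link comes from the highly ranked C.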

4. Conclusion: We have thus analyzed three commonly used search techniques adopted by horizontal, or general-purpose, search engines.

5. Our Viewpoints:

  • Domain-specific search options
  • Search providing results for image queries
  • Enhanced map searches
  • Integration of different types of searches into a single search engine

Illustration:
