Search Engines


Sudarsun Santhiappan, M.Tech.
Director – R & D, Burning Glass Technologies

Today's Coverage
  • Introduction
  • Types of Search Engines
  • Components of a Search Engine
  • Semantics and Relevancy
  • Search Engine Optimization

Copyleft (ɔ) 2009 Sudarsun Santhiappan


  • What is a search engine?
  • What is a search?
  • Why do we need a search engine?
  • What are we searching against?
  • How good is a search engine?
  • What is search on search (meta search engines)?
  • Have you compared search engines side by side?
  • How are images and videos searched?
  • Apart from web search, what else?


Introduction
A web search engine is a software program that searches the Internet (a collection of websites) based on the words that you designate as search terms (query words). Search engines look through their own databases of information to find what you are looking for. Web search engines are a good example of massively sized information retrieval systems. Have you tried the “Similar pages” link in a Google result set?


Dictionary Definitions
Search: COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information

Engine: something that supplies the driving force or energy to a movement, system, or trend

Search Engine: a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet


About the definition of search engines: oh well … search engines do not search only for keywords; some search for other stuff as well. And they are not really “engines” in the classical sense, but then a mouse is not a “mouse” either.


use of search engines … among others


Types of Search Engines
  • Text search engines
      General: AltaVista, AskJeeves, Bing, Google
      Specialized: Google Scholar, Scirus, CiteSeer
  • Intranet vs. Internet search engines
  • Image search engines: how can we search on the image content?
  • Video search engines: image search with a time dimension!


Types of Search Engine
  • Crawler-powered indexes: Guruji.com, Google.com
  • Human-powered indexes: www.dmoz.org
  • Hybrid models: have you submitted URLs to a search engine?
  • Semantic indexes: Hakia.com

  • Have you tried Hakia?
  • What is semantic search? How is it different from keyword search?
  • What is categorized search?
  • Try a side-by-side comparison with Google!
  • Have you compared Bing with Google?


Directories (www.dmoz.org)
  • Websites are classified into a taxonomy
  • Websites are arranged by category
  • Searching vs. navigation: instead of typing a query, you click and navigate
  • Always accurate results! (if the data is available)
  • Problem: mostly created manually


How does a Search Engine work?


How Search Engines Work (Sherman 2003)
[Figure: a crawler fetches pages (URL1 to URL4) from the Web; an indexer stores them in the search engine database; a query (“Eggs?”) returns results ranked by relevance score, e.g. “All About Eggs” at 90% down to 10%.]

How do search engines work? Elaboration
  • Crawlers, spiders: go out to find content in various ways
      go through the web looking for new and changed sites
      periodically, not for each query; no search engine works in real time
  • Some search engines do the crawling themselves; others do not, and buy content from companies such as Inktomi
  • For a number of reasons, crawlers do not cover all of the web, just a fraction; what is not covered is called the “invisible web”

Elaboration …
  • Organizing content: labeling, arranging
      indexing for searching: automatic keywords and other fields
      arranging by URL popularity: PageRank, as in Google
  • Classifying as a directory: mostly handpicked and classified by humans
  • As a result of the different organizations, there are basically two kinds of search engines:
      search: the input is a query that is searched, and the results are displayed
      directory: classified content; a class is displayed
      and fused: directories have search capabilities and vice versa

Elaboration (cont.)
  • Databases, caches: storing content
      humongous files, usually distributed over many computers
  • Query processor: searching, retrieval, display
      takes your query as input
      engines have differing rules on how queries are handled
      displays ranked output
      some engines also cluster output and provide visualization
      some engines provide categorically structured results
  • At the other end is your browser

Similarities & Differences
  • All search engines have these basic parts in common, BUT the actual processes (the methods by which they do it) are based on various algorithms, and they differ significantly
  • Most are proprietary (patented), with details mostly kept secret (or protected), but they are based on well-known principles from information retrieval or classification
  • To some extent Google is an exception: they published their method

Google Search
  • In the beginning it ran on Stanford computers
  • The basic approach is described in their famous paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”: well written, simple language, and it has their pictures
  • In the acknowledgements they cite support from NSF’s Digital Library Initiative, i.e., initially Google came out of government-sponsored research
  • The paper describes their method, PageRank, which is based on ranking hyperlinks as in citation indexing
  • “We chose our system name, Google, because it is a common spelling of googol, or 10^100”

Coverage differences
  • No engine covers more than a fraction of the WWW
      estimates: none covers more than 16%
      it is hard (even impossible) to discern and compare coverage, but engines differ substantially in what they cover
  • In addition: many national search engines
      own coverage, orientation, governance
  • Many specialized or domain search engines
      own coverage, geared to the subject of interest
  • Many comprehensive sources independent of search engines
      some have compilations of evaluated web sources

Searching differences
  • Substantial differences among search engines in searching, retrieval, and display
  • You need to know how they work and differ with respect to:
      defaults in searching a query
      searching of phrases, case sensitivity, categories
      searching of different fields, formats, types of resources
      advanced search capabilities and features
      possibilities for refinement, using relevance feedback
      display options
      personalization options


Limitations
  • Every search engine has limitations as to:
      coverage: meta engines inherit the coverage limitations of the engines they query, and add limitations of their own search capabilities
      finding quality information
  • Some have compromised search with economics, becoming little more than advertisers
  • But search engines are also often victims of spamdexing, which affects what is included and how it is ranked

Spamming a search engine
  • The use of techniques that push rankings higher than they belong; also called spamdexing
  • Methods typically include textual as well as link-based techniques
  • Like e-mail spam, search engine spam is a form of adversarial information retrieval: the conflicting goals of accurate results (search providers) and high positioning (content providers)

Meta Search Engines: Search on Search


Meta search engines
  • Meta engines search multiple engines, getting combined results from a variety of engines
  • They do not have their own databases, but have their own business models affecting results
  • A number of techniques are used; interesting ones: clustering, statistical analysis


Some meta engines with organized results
  • Dogpile: results from a number of leading search engines; gives the source, so overlap can be compared (also has a (bad) joke of the day)
  • Surfwax: gives statistics and text sources, with links to sources; for some terms gives related terms to focus the search
  • Teoma: results with suggestions for narrowing; links to derived resources; originated at Rutgers
  • Turbo10: provides results in clusters; the engines searched can be edited


Some Meta Engines (cont.)
  • Large directory
      Complete Planet: directory of over 70,000 databases and specialty engines
  • Results with graphical displays
      Vivisimo: clusters results
      Webbrain: innovative results in a tree structure; fun to use
      Kartoo: results displayed by topics of the query


Domain Specific Search Engines


Domain Search Engines & Catalogs
  • Cover specific subjects and topics
  • An important tool for subject searches, particularly for subject specialists; valued by professional searchers
  • Selection is mostly hand-picked rather than done by crawlers, following inclusion criteria that are often not readily discernible, but the content is more trustworthy

Domain Search Engines …
  • Open Directory Project: large edited catalog of the web; global, run by volunteers
  • BUBL LINK: selected Internet resources covering all academic subject areas; organized by the Dewey Decimal System; from the UK
  • Profusion: search in categories for resources and search engines
  • Resource Discovery Network (UK): “UK's free national gateway to Internet resources for the learning, teaching and research community”

Domain Engines … sample
  • Think Quest (Oracle Education Foundation): education resources and programs; web sites created by students
  • All Music Guide: resource about musicians, albums, and songs
  • Internet Movie Database: treasure trove of American and British movies
  • Genealogy links and surname search engines: well.. that is getting really specialized (and popular)
  • Daypop: searches the “living web”: “The living web is composed of sites that update on a daily basis: newspapers, online magazines, and weblogs”

Science, scholarship engines … sample
  • Psychcrawler (American Psychological Association): web index for psychology
  • Entrez PubMed (National Library of Medicine): biomedical literature from MEDLINE and health journals
  • CiteSeer (NEC Research Center): scientific literature, citation index; strong in computer science
  • Google Scholar: searches for scholarly articles and resources
  • Infomine: scholarly internet research collections
  • Scirus: scientific information in journals and on the web

Science, scholarship engines … sample: commercial access
  • An addition to freely accessible engines: many provide search for free, but access to the full text is paid by subscription or per item
  • RUL provides access to these and many more:
      ScienceDirect (Elsevier): “world's largest electronic collection of science, technology and medicine full text and bibliographic information”
      ACM Portal (Association for Computing Machinery): access to the ACM Digital Library and the Guide to Computing

Search Engine Internals


Search Engine Internals
  • Crawlers
  • Indexers
  • Searching
  • Semantics
  • Ranking

Standard Web Search Engine Architecture
[Figure: crawl the web; check for duplicates and store the documents, assigning DocIds; create an inverted index. A user query goes to the search engine servers, which use the inverted index and show the results to the user.]

Typical Search Engine


Crawlers
  • What is crawling?
  • How does crawling happen?
  • Have you tried “wget -r” in Linux?
  • Have you tried “DAP” to download an entire site?
  • Page walk
  • Spidering and crawlbots


Spidering the Web
  • Replicates a spider's behavior: just as a spider builds its web by adding spirals, the crawler builds up the web by adding sites
  • But can the web be fully crawled?
  • By the time one round of indexing is over, the page might have changed already!
  • That's why we have the “cached page” link in the search results!


Crawler Bots
  • How to make your website crawlable?
  • White-listing and black-listing!
  • Meta tags to control the bots
  • Can HTTPS pages be crawled?
  • Are sessions maintained while crawling?
  • Can dynamic pages be crawled?
  • URL normalization:
      cool.com?page=2 [crawler unfriendly]
      cool.com/page/2 [normalized and crawler friendly]
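The query-to-path rewrite in the URL normalization example can be sketched in Python. This is a minimal illustration under assumptions: `query_to_path` is a hypothetical helper, it expects absolute URLs with a scheme (such as `http://cool.com?page=2`), and the web server must separately map the rewritten path back to the dynamic page (for example with an Apache rewrite rule). Only standard-library `urllib.parse` functions are used.

```python
from urllib.parse import parse_qsl, quote, urlsplit, urlunsplit


def query_to_path(url: str) -> str:
    """Rewrite query parameters into path segments (hypothetical scheme).

    Assumes an absolute URL such as "http://cool.com?page=2".
    """
    parts = urlsplit(url)
    # Keep the existing path segments, then append each key/value pair in order.
    segments = [s for s in parts.path.split("/") if s]
    for key, value in parse_qsl(parts.query):
        segments += [quote(key, safe=""), quote(value, safe="")]
    path = "/" + "/".join(segments) if segments else parts.path
    # Host names are case-insensitive, so lowercase them for a canonical form.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", parts.fragment))


print(query_to_path("http://Cool.com?page=2"))  # http://cool.com/page/2
```

URLs that already have no query string pass through unchanged, so the helper can be applied to every link a site emits.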

How to control Robots?
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<TITLE>...
  • Index: tells the spider/bot that it's OK to index this page
  • Noindex: the spider/bot sees this and does not index any of the content on this page
  • Follow: lets the spider/bot know that it's OK to travel down links found on this page
  • Nofollow: tells the spider/bot not to follow any of the links on this page
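On the crawler side, honouring these directives means reading the robots meta tag before indexing a page or following its links. A minimal sketch using Python's standard `html.parser` module, assuming the four directives listed above and treating Index/Follow as the defaults when no tag is present; `RobotsMetaParser` and `robots_policy` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not their values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for token in (attributes.get("content") or "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def robots_policy(html: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow); both default to True (Index, Follow)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return ("noindex" not in parser.directives,
            "nofollow" not in parser.directives)


print(robots_policy('<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"><TITLE>...</TITLE>'))
# (False, False)
```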


Crawling – Process Flow


Data Structures
  • A tree is the primary structure while crawling
  • Both depth-first search and breadth-first search are used
  • Every page that the crawler visits is added as a node to the tree
  • Fan-out information (the links out of a page) is represented as the children of that node (page)
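The crawl tree described above can be sketched over an in-memory link graph. This is a hypothetical stand-in for real HTTP fetching: `web` maps each URL to the URLs it links to, and each newly discovered page becomes a child of the page on which it was first found. Taking pages from the front of the frontier gives breadth-first order; taking them from the back turns the frontier into a stack and gives a depth-first-style order.

```python
from collections import deque


def crawl(web, seed, breadth_first=True, max_pages=100):
    """Traverse an in-memory link graph and record the crawl tree.

    web: {url: [urls linked from that page]} stands in for real HTTP fetches.
    Returns (fetch order, {page: parent page in the crawl tree}).
    """
    parent = {seed: None}       # tree: every discovered page -> its parent
    order = []                  # pages in the order they were "fetched"
    frontier = deque([seed])
    while frontier and len(order) < max_pages:
        # Queue (popleft) = breadth-first; stack (pop) = depth-first style.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        for link in web.get(page, []):   # fan-out becomes the node's children
            if link not in parent:       # skip pages already in the tree
                parent[link] = page
                frontier.append(link)
    return order, parent


web = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"]}
print(crawl(web, "A")[0])                       # ['A', 'B', 'C', 'D', 'E']
print(crawl(web, "A", breadth_first=False)[0])  # ['A', 'C', 'E', 'D', 'B']
```

The `parent` map doubles as the visited set, so a page linked from several places (like D) is fetched only once.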


Inverted Indexes: the IR Way


How Inverted Files Are Created
  • Periodically rebuilt, static otherwise.
  • Documents are parsed to extract tokens. These are saved with the document ID.

Doc 1: Now is the time for all good men to come to the aid of their country
Doc 2: It was a dark and stormy night in the country manor. The time was past midnight

Term (Doc #), in parse order:
now (1), is (1), the (1), time (1), for (1), all (1), good (1), men (1), to (1), come (1), to (1), the (1), aid (1), of (1), their (1), country (1),
it (2), was (2), a (2), dark (2), and (2), stormy (2), night (2), in (2), the (2), country (2), manor (2), the (2), time (2), was (2), past (2), midnight (2)

How Inverted Files are Created
After all documents have been parsed, the inverted file is sorted alphabetically.

Term (Doc #), sorted:
a (2), aid (1), all (1), and (2), come (1), country (1), country (2), dark (2), for (1), good (1), in (2), is (1), it (2), manor (2), men (1), midnight (2),
night (2), now (1), of (1), past (2), stormy (2), the (1), the (1), the (2), the (2), their (1), time (1), time (2), to (1), to (1), was (2), was (2)

How Inverted Files are Created
Multiple term entries for a single document are merged. Within-document term frequency information is compiled.

Term      Doc #  Freq
a         2      1
aid       1      1
all       1      1
and       2      1
come      1      1
country   1      1
country   2      1
dark      2      1
for       1      1
good      1      1
in        2      1
is        1      1
it        2      1
manor     2      1
men       1      1
midnight  2      1
night     2      1
now       1      1
of        1      1
past      2      1
stormy    2      1
the       1      2
the       2      2
their     1      1
time      1      1
time      2      1
to        1      2
was       2      2

How Inverted Files are Created
Finally, the file can be split into:
  • A dictionary or lexicon file, and
  • A postings file

Dictionary/Lexicon             Postings
Term      N docs  Tot Freq     (Doc #, Freq)
a         1       1            (2, 1)
aid       1       1            (1, 1)
all       1       1            (1, 1)
and       1       1            (2, 1)
come      1       1            (1, 1)
country   2       2            (1, 1) (2, 1)
dark      1       1            (2, 1)
for       1       1            (1, 1)
good      1       1            (1, 1)
in        1       1            (2, 1)
is        1       1            (1, 1)
it        1       1            (2, 1)
manor     1       1            (2, 1)
men       1       1            (1, 1)
midnight  1       1            (2, 1)
night     1       1            (2, 1)
now       1       1            (1, 1)
of        1       1            (1, 1)
past      1       1            (2, 1)
stormy    1       1            (2, 1)
the       2       4            (1, 2) (2, 2)
their     1       1            (1, 1)
time      2       2            (1, 1) (2, 1)
to        1       2            (1, 2)
was       1       2            (2, 2)
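The parse, sort, merge and split steps above can be reproduced in a few lines of Python on the same two documents. This is a small in-memory sketch, not how a web-scale indexer works (those sort on disk in batches); the tokenizer that lowercases text and keeps runs of letters is a simplifying assumption.

```python
import re
from collections import Counter, defaultdict

docs = {
    1: "Now is the time for all good men to come to the aid of their country",
    2: "It was a dark and stormy night in the country manor. The time was past midnight",
}


def tokenize(text):
    # Simplifying assumption: lowercase, keep only runs of letters.
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return (lexicon, postings) like the Dictionary/Postings split above."""
    postings = defaultdict(list)        # term -> [(doc_id, freq), ...]
    for doc_id in sorted(docs):         # doc order keeps postings sorted by ID
        counts = Counter(tokenize(docs[doc_id]))   # merge repeated entries
        for term, freq in sorted(counts.items()):
            postings[term].append((doc_id, freq))
    # Lexicon: term -> (N docs, total frequency), in alphabetical order.
    lexicon = {term: (len(plist), sum(f for _, f in plist))
               for term, plist in sorted(postings.items())}
    return lexicon, dict(postings)


lexicon, postings = build_index(docs)
print(lexicon["the"], postings["the"])  # (2, 4) [(1, 2), (2, 2)]
```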

Inverted indexes
  • Permit fast search for individual terms
  • For each term, you get a list consisting of:
      document ID
      frequency of the term in the doc (optional)
      position of the term in the doc (optional)
  • These lists can be used to solve Boolean queries:
      country -> d1, d2
      manor -> d2
      country AND manor -> d2
  • Also used for statistical ranking algorithms
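The “country AND manor -> d2” example is a merge of two sorted postings lists. A minimal Python sketch, assuming postings are kept as sorted lists of document IDs; starting from the shortest list is a common heuristic, not something the slides prescribe.

```python
def intersect(p1, p2):
    """Merge two sorted doc-ID lists, keeping IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, *terms):
    """Evaluate term1 AND term2 AND ... against {term: sorted doc IDs}."""
    # Processing the shortest list first keeps intermediate results small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    if not lists:
        return []
    result = list(lists[0])
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


index = {"country": [1, 2], "manor": [2]}
print(boolean_and(index, "country", "manor"))  # [2]
```

Because both lists are sorted, each step of the merge advances at least one pointer, so the cost is linear in the combined list lengths.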


Inverted Indexes for Web Search Engines
  • Inverted indexes are still used, even though the web is so huge
  • Some systems partition the indexes across different machines; each machine handles different parts of the data
  • Other systems duplicate the data across many machines; queries are distributed among the machines
  • Most do a combination of these

… machines. Additionally, each partition is allocated multiple machines to handle the queries.
(From the description of the FAST search engine, by Knut Risvik.)

Cascading Allocation of CPUs
A variation on this that produces a cost saving:
  • Put high-quality/common pages on many machines
  • Put lower-quality/less common pages on fewer machines
  • A query goes to the high-quality machines first
  • If no hits are found there, it goes to the other machines
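The cascade can be sketched as querying tiers in order and stopping once enough hits are found. In this hypothetical sketch each tier is an in-memory `{term: set of doc IDs}` dictionary standing in for a group of machines, and `min_hits` (default 1, matching “if no hits found there”) is an assumed parameter.

```python
def tiered_search(tiers, terms, min_hits=1):
    """Query machine tiers in order, stopping once enough hits are found.

    tiers: list of {term: set of doc IDs}; tiers[0] holds the
    high-quality/common pages, later tiers the less common ones.
    """
    hits = []
    for tier in tiers:
        doc_sets = [tier.get(term, set()) for term in terms]
        matched = set.intersection(*doc_sets) if doc_sets else set()
        hits.extend(sorted(matched))
        if len(hits) >= min_hits:   # enough hits: skip the remaining tiers
            break
    return hits


tiers = [{"toyota": {1, 2}, "camry": {2}}, {"toyota": {7}, "camry": {7, 8}}]
print(tiered_search(tiers, ["toyota", "camry"]))  # [2]
```

Most queries are answered by the first tier, which is why only that tier needs heavy replication.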


The Search Process


Searching – Process Flow


Google Query Evaluation
1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the search terms.
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist, go to step 4.
8. Sort the documents that have matched by rank and return the top k.
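A simplified Python sketch of these steps, under loudly stated assumptions: barrels are modeled as dictionaries mapping a wordID to its doc IDs, doclists are intersected as sets instead of being scanned in lockstep, and `rank` is a caller-supplied scoring function (hypothetical here). It keeps the essential shape: short barrels first, then full barrels, then the top k by rank.

```python
import heapq


def evaluate(query, lexicon, short_barrel, full_barrel, rank, k=10):
    """Simplified conjunctive evaluation over short and full barrels.

    lexicon: {word: wordID}; barrels: {wordID: doc IDs}; rank(doc, wids) -> float.
    """
    # Steps 1-2: parse the query and convert words to wordIDs.
    word_ids = [lexicon.get(word) for word in query.lower().split()]
    if not word_ids or None in word_ids:
        return []                      # an unknown word can never be matched
    matched = set()
    # Steps 3-7: match all terms in the short barrels, then in the full barrels.
    for barrel in (short_barrel, full_barrel):
        doclists = [set(barrel.get(wid, ())) for wid in word_ids]
        matched |= set.intersection(*doclists)
    # Steps 5 and 8: rank the matches and return the top k.
    return heapq.nlargest(k, matched, key=lambda doc: rank(doc, word_ids))


lexicon = {"eggs": 1, "recipes": 2}
short = {1: [10], 2: [10, 11]}
full = {1: [10, 12, 13], 2: [12, 14]}
# Hypothetical rank function that simply prefers lower doc IDs.
print(evaluate("Eggs recipes", lexicon, short, full, rank=lambda doc, wids: -doc))
# [10, 12]
```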

Queries
  • Search engines are one tool used to answer information needs
  • Users express their information needs as queries
  • Queries are usually informally expressed as two or three words (we call this a ranked query)
  • A recent study showed the mean query length was 2.4 words per query, with a median of 2
  • Around 48.4% of users submit just one query in a session, 20.8% submit two, and about 31% submit three or more
  • Less than 5% of queries use Boolean operators (AND, OR, and NOT), and around 5% contain quoted phrases

Queries...
  • About 1.28 million different words were used in queries in the Excite log studied (which contained 1.03 million queries)
  • Around 75 words account for 9% of all words used in queries
  • The top-ten non-trivial words occurring in 531,000 queries are “sex” (10,757), “free” (9,710), “nude” (7,047), “pictures” (5,939), “university” (4,383), “pics” (3,815), “chat” (3,515), “adult” (3,385), “women” (3,211), and “new” (3,109)
  • 16.9% of the queries were about entertainment, 16.8% about sex, pornography, or preferences, and 13.3% concerned commerce, travel, employment, and the economy

Answers
  • What is a good answer to a query? One that is relevant to the user’s information need!
  • Search engines typically return ten answers per page, where each answer is a short summary of a web document
  • Likely relevance to an information need is approximated by statistical similarity between web documents and the query
  • Users favour search engines that have high precision, that is, those that return relevant answers in the first page of results
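“High precision on the first page” is commonly measured as precision at k, with k = 10 for one page of results. A minimal sketch, assuming relevance judgments are available as a set of relevant document IDs.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k results (one results page) that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


print(precision_at_k([3, 8, 1, 9], {1, 3, 7}, k=4))  # 0.5
```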


Approximating Relevance
  • Statistical similarity is used to estimate the relevance of a query to an answer
  • Consider the query “Richardson Richmond Football”
  • A good answer contains all three words, and the more frequently the better; we call this term frequency (TF)
  • Some query terms are more important (have better discriminating power) than others. For example, an answer containing only “Richardson” is likely to be better than an answer containing only “Football”; we call this inverse document frequency (IDF)
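TF and IDF are often combined into a score like the one below. This is one common variant, stated as an assumption rather than any particular engine's formula: N is the number of documents, f_t the number of documents containing term t, f_{d,t} the number of occurrences of t in document d, and W_d a measure of the length of d.

```latex
w_{d,t} = f_{d,t} \cdot \log\frac{N}{f_t},
\qquad
\mathrm{score}(q, d) = \frac{1}{W_d} \sum_{t \in q} w_{d,t}
```

With hypothetical counts N = 1,000,000, “Football” in 100,000 documents and “Richardson” in 1,000, the base-10 IDFs are log(10) = 1 and log(1000) = 3, so one occurrence of “Richardson” contributes three times as much as one occurrence of “Football”.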


Ranking
To improve the accuracy of search engines:
  • Google Inc. uses its patented PageRank(tm) technology: Google ranks a page higher if authoritative sources link to it, so a link from an authoritative source to a page ranks that page higher
  • Relevance feedback is a technique that adds words to a query based on the user selecting a “more like this” option
  • Query expansion adds words to a query using thesauri or other techniques
  • Searching within categories or groups narrows a search

Resolving Queries
Queries are resolved using the inverted index. Consider the example query “Cat Mat Hat”. This is evaluated as follows:
1. Select a word from the query (say, “Cat”)
2. Retrieve the inverted list from disk for the word
3. Process the list. For each document the word occurs in, add weight to an accumulator for that document based on the TF, IDF, and document length
4. Repeat for each word in the query
5. Find the best-ranked documents with the highest weights
6. Look up the documents in the mapping table
7. Retrieve and summarize the docs, and present them to the user
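Steps 1 to 5 can be sketched as term-at-a-time evaluation with one accumulator per document. This is a minimal in-memory sketch under assumptions: postings are `{term: [(doc_id, tf), ...]}`, the weight is TF times log(N/df) divided by document length (one simple choice among many), and the postings and lengths in the example are made up. A heap picks the top k without sorting every accumulator.

```python
import heapq
import math
from collections import defaultdict


def resolve(query, postings, doc_lengths, k=10):
    """Term-at-a-time evaluation with one accumulator per document.

    postings: {term: [(doc_id, tf), ...]}; doc_lengths: {doc_id: length}.
    """
    n_docs = len(doc_lengths)
    accumulators = defaultdict(float)            # doc_id -> partial score
    for term in query.lower().split():           # steps 1 and 4
        plist = postings.get(term, [])           # step 2: fetch inverted list
        if not plist:
            continue
        idf = math.log(n_docs / len(plist))      # rarer term, larger weight
        for doc_id, tf in plist:                 # step 3: add to accumulators
            accumulators[doc_id] += tf * idf / doc_lengths[doc_id]
    # Step 5: a heap finds the top k without sorting every accumulator.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


postings = {"cat": [(1, 2), (2, 1)], "mat": [(1, 1)], "hat": [(3, 1)]}
doc_lengths = {1: 10, 2: 20, 3: 8, 4: 7}
print([doc for doc, _ in resolve("Cat Mat Hat", postings, doc_lengths)])  # [1, 3, 2]
```

Steps 6 and 7 (mapping doc IDs to documents and summarizing them) would follow on the returned IDs.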


Fast Search Engines
  • Inverted lists are stored in a compressed format. This allows more information per second to be retrieved from disk, and it lowers disk head seek times
  • As long as decompression is fast, there is a beneficial tradeoff in time
  • Documents are stored in a compressed format for the same reason
  • Different compression schemes are used for lists (which are integers) and documents (which are multimedia, but mostly text)
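One widely used integer scheme for inverted lists is variable-byte coding of d-gaps (differences between successive doc IDs), sketched below; it is an example of the idea, not necessarily what any given engine uses. Each gap is split into 7-bit groups, and the high bit marks the last byte of a number, so small gaps take a single byte.

```python
def vbyte_encode(doc_ids):
    """Encode sorted doc IDs as d-gaps using variable-byte codes."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap, prev = doc_id - prev, doc_id    # small gaps need fewer bytes
        groups = []
        while True:                          # split into 7-bit groups, low first
            groups.append(gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        groups[0] |= 0x80                    # high bit marks a number's last byte
        out.extend(reversed(groups))
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode: rebuild the gaps, then the doc IDs."""
    doc_ids, n, prev = [], 0, 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:                      # last byte of this gap
            prev += n
            doc_ids.append(prev)
            n = 0
    return doc_ids


encoded = vbyte_encode([824, 829, 215406])
print(encoded.hex(), vbyte_decode(encoded))  # 06b8850d0cb1 [824, 829, 215406]
```

Here three doc IDs take 6 bytes instead of 12 as 32-bit integers, and decoding needs only shifts and masks, which keeps decompression fast.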


Fast Search Engines
  • Sort disk accesses to minimise disk head movement when retrieving lists or documents
  • Use hash tables in memory to store the vocabulary; avoid slow hash functions that use modulo
  • Pre-calculate and store constants in ranking formulae
  • Carefully choose integer compression schemes
  • Organise inverted lists so that the information frequently needed is at the start of the list
  • Use heap structures when partial sorting is required
  • Develop a query plan for each query

Search Engine Architecture


Search Engine architecture
  • The inverted lists are divided amongst a number of servers, each of which is known as a shard
  • If an inverted list is required for a particular range of words, then that shard's server is contacted
  • Each shard server can be replicated as many times as required; each server in a shard is identical
  • Documents are also divided amongst a number of servers
  • Again, if a document is required within a particular range, then the appropriate document server is contacted
  • Each document server can also be replicated as many times as required
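Routing a term to its shard and then to one of that shard's identical replicas can be sketched as follows. This is a hypothetical sketch: the alphabetical boundaries, the server names, and round-robin replica selection are assumptions; the same idea applies to document servers partitioned by doc-ID range.

```python
import bisect
import itertools


class ShardRouter:
    """Route terms to range-partitioned, replicated index shards (hypothetical)."""

    def __init__(self, boundaries, replicas):
        # boundaries: sorted first terms of shards 1..n-1, e.g. ["h", "p"]
        # splits terms into shards [.."h"), ["h".."p"), ["p"..].
        if len(replicas) != len(boundaries) + 1:
            raise ValueError("need one replica list per shard")
        self.boundaries = boundaries
        self._next_replica = [itertools.cycle(servers) for servers in replicas]

    def shard_for(self, term):
        return bisect.bisect_right(self.boundaries, term)

    def server_for(self, term):
        # Replicas in a shard are identical, so round-robin spreads the load.
        return next(self._next_replica[self.shard_for(term)])


router = ShardRouter(["h", "p"], [["s0a", "s0b"], ["s1a"], ["s2a", "s2b"]])
print(router.shard_for("cat"), router.shard_for("manor"), router.shard_for("time"))
# 0 1 2
```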

Google, Case Study


Google Architecture


Components
  • URL Server: a bunch of URLs (white-list)
  • Crawler: fetches the pages
  • Store Server: stores the fetched pages
  • Repository: compressed pages are put here; every unique page has a DocID
  • Anchors: page transition [to, from] information
  • URLResolver: converts relative URLs to absolute URLs
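The URLResolver's job of turning relative links into absolute ones maps directly onto Python's standard `urllib.parse.urljoin`. A minimal sketch: `resolve_links` is a hypothetical name, and dropping `#fragment` parts (so that links to the same page collapse to one URL) is an assumption.

```python
from urllib.parse import urldefrag, urljoin


def resolve_links(page_url, hrefs):
    """Turn the hrefs found on a page into absolute (from_url, to_url) pairs."""
    links = []
    for href in hrefs:
        # urljoin resolves relative paths; dropping "#fragment" is an assumption.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        links.append((page_url, absolute))
    return links


for _, target in resolve_links("http://example.com/a/b.html",
                               ["c.html", "../d.html", "/e.html#top"]):
    print(target)
# http://example.com/a/c.html
# http://example.com/d.html
# http://example.com/e.html
```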

Indexer
  • Parses the document
  • Builds a word-frequency table of hits: {word, position, font, capitalization}
  • Pushes the hits into barrels as a partially sorted forward index
  • Identifies anchors (outgoing page-transition information)


Searcher
  • Converts the forward index into an inverted index
  • Maps keywords to DocIds
  • Maps DocIds to URLs

Reranker
  • Uses anchor information to rank the pages for the given query keyword
  • Rule of thumb: fan-in (incoming links) increases page rank


Reranking


What about Ranking?
  • Lots of variation here; often messy; details proprietary and fluctuating
  • Combining subsets of:
      IR-style relevance: based on term frequencies, proximities, position (e.g., in title), font, etc.
      popularity information
      link analysis information
  • Most use a variant of vector space ranking to combine these. Here’s how it might work:
      make a vector of weights for each feature
      multiply this by the counts for each feature
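The weight-vector idea is a dot product between per-feature weights and a page's feature values. A minimal sketch; the feature names, weights, and pages are hypothetical, chosen only for illustration, since real engines keep theirs proprietary.

```python
def combined_score(features, weights):
    """Multiply each feature value by its weight and add up the products."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical features and weights, chosen only for illustration.
weights = {"title_hits": 3.0, "body_hits": 1.0, "inlinks": 0.5}
pages = {
    "p1": {"title_hits": 1, "body_hits": 4, "inlinks": 10},
    "p2": {"title_hits": 0, "body_hits": 9, "inlinks": 2},
}
ranking = sorted(pages, key=lambda p: combined_score(pages[p], weights), reverse=True)
print(ranking)  # ['p1', 'p2']
```

Here p1 scores 3 + 4 + 5 = 12 and p2 scores 0 + 9 + 1 = 10; tuning the weights is where engines differ.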


Relevance: Going Beyond IR
  • Page “popularity” (e.g., DirectHit)
      frequently visited pages (in general)
      frequently visited pages as a result of a query
  • Link “co-citation” (e.g., Google)
      which sites are linked to by other sites?
      draws upon sociology research on bibliographic citations to identify “authoritative sources”

Link Analysis for Ranking Pages
  • Assumption: if the pages pointing to this page are good, then this is also a good page
  • References: Kleinberg 98, Page et al. 98
  • Draws upon earlier research in sociology and bibliometrics
      Kleinberg’s model includes “authorities” (highly referenced pages) and “hubs” (pages containing good reference lists)
      The Google model is a version with no hubs, and is closely related to work on influence weights by Pinski-Narin (1976)

Link Analysis for Ranking Pages
Why does this work?
  • The official Toyota site will be linked to by lots of other official (or high-quality) sites
  • The best Toyota fan-club site probably also has many links pointing to it
  • Lower-quality sites do not have as many high-quality sites linking to them

PageRank
  • Let A1, A2, …, An be the pages that point to page A. Let C(P) be the number of links out of page P. The PageRank (PR) of page A is defined as:
      PR(A) = (1-d) + d ( PR(A1)/C(A1) + … + PR(An)/C(An) )
  • PageRank is the principal eigenvector of the (damped) link matrix of the web. It can be computed as the fixpoint of the above equation.
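Computing PageRank as the fixpoint of the equation can be sketched with simple repeated substitution: start every page at 1.0 and reapply the formula until nothing changes. This implements the formula exactly as written (d = 0.85 is the commonly cited default); `links` is a hypothetical in-memory link graph, and every link target is assumed to also be a key.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Ai) / C(Ai)) to its fixpoint.

    links: {page: [pages it links to]}; every link target must also be a key.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(max_iter):
        new = {page: 1.0 - d for page in links}
        for page, outlinks in links.items():
            for target in outlinks:          # page passes PR(page)/C(page) on
                new[target] += d * pr[page] / len(outlinks)
        converged = max(abs(new[p] - pr[p]) for p in links) < tol
        pr = new
        if converged:
            break
    return pr


ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})
print({page: round(r, 4) for page, r in ranks.items()})
# {'A': 1.4595, 'B': 1.3905, 'C': 0.15}
```

A, with two incoming links, ranks highest, and C, with none, keeps only the (1 − d) baseline. In this form the ranks sum to the number of pages (3 here) when every page has outgoing links; dividing the (1 − d) term by the number of pages N gives the normalized variant whose ranks sum to one.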


PageRank: User Model
  • PageRanks form a probability distribution over web pages: the sum of all pages’ ranks is one (strictly, this holds for the normalized variant, in which the (1-d) term is divided by the number of pages N)
  • User model: a “random surfer” selects a page and keeps clicking links (never “back”) until “bored”; then they randomly select another page and continue
  • PageRank(A) is the probability that such a user visits A
  • 1-d is the probability of getting bored at a page; d, the damping factor, is the probability of following another link (usually set to 0.85)

Google computes the relevance of a page for a given search by first computing an IR relevance and then modifying it by taking PageRank into account for the top pages.


Search Engine Optimization


How Search Engines Rank Pages?
  • Location, location, location... and frequency
      tags (<title>, <meta>, <b>, top of the page)
      how close the query words are to each other on the web page
  • Quality of links going to and from a page
  • Penalization for “spamming”, when a word is repeated hundreds of times on a page to increase its frequency and propel the page higher in the listings
  • Off-the-page ranking criteria: analyzing how pages link to each other

Why do results differ?
  • Some search engines index more web pages than others.
  • Some search engines also index web pages more often than others.
  • The result is that no two search engines have exactly the same collection of web pages to search through.
  • Different algorithms are used to compute the relevance of a page to a particular query.

Search Engine Placement Tips
Why is it important to be on the first page of the results? Most users do not go beyond the first page.

How to optimize your website?
Pick your target keywords: how do you think people will search for your web page? The words you imagine them typing into the search box are your target keywords. Pick target words differently for each page on your website. Your target keywords should always be at least two words long.

Position your Keywords
Make sure your target keywords appear in the crucial locations on your web pages. The page's HTML <title> tag is most important. Titles should be relatively short and attractive; a few phrases are enough for the description. Search engines also like pages where keywords appear "high" on the page: in the headline and the first paragraphs of your web page. Keep in mind that tables and large JavaScript sections can make your keywords less relevant because they push them lower on the page.
Have Relevant Content
  • Keywords need to be reflected in the page's content
  • Put more text than graphics on a page
  • Don't use frames
  • Use the ALT tag (alternative text) on images
  • Make good use of <TITLE> and <H1>
  • Consider using the <META> tag
  • Get people to link to your page

Hiding Web pages
You may wish to have web pages that are not indexed (for example, test pages). It is possible to hide web content from robots using the robots.txt file and the robots meta tag. Not all crawlers obey these, so this is not foolproof.

Submitting To Search Engines
  • Search engines should find you naturally, but submitting helps speed up the process and can increase your representation
  • Look for the Add URL link at the bottom of the home page
  • Submit your home page and a few key “section” pages
  • Turnaround ranges from a few days to 2 months

Deep Crawlers
  • AltaVista, Inktomi, and Northern Light will add the most pages, usually within a month
  • Excite and Go (Infoseek) will gather a fair amount; Lycos gathers little
  • Index sizes are going up, but the web is outpacing them… nor is size everything
  • Here are more actions to help even the odds…

“Deep” Submit
  • A “deep” submit is directly submitting pages from “inside” the web site; it can help improve the odds that these pages will get listed
  • At Go, you can email hundreds of URLs. Consider doing this.
  • At HotBot/Inktomi, you can submit up to 50 pages per day. Possibly worth doing.
  • At AltaVista, you can submit up to 5 pages per day. Probably not worth the effort.
  • Elsewhere, a “deep” submit is not worth doing.

Big Site? Split It Up
  • Expect search engines to max out at around 500 pages from any particular site
  • Increase representation by subdividing large sites logically into subdomains
  • Search engines will crawl each subsite to more depth
Here’s an example...

Subdomains vs. Subdirectories
Instead of               Do this
gold.ac.uk/science/      science.gold.ac.uk
gold.ac.uk/english/      english.gold.ac.uk
gold.ac.uk/admin/        admin.gold.ac.uk

I Was Framed
Don't use frames. Period. If you do use them, search engines will have difficulty crawling your site.

Dynamic Roadblocks
  • Dynamic delivery systems that use ? symbols in the URL string prevent search engines from getting to your pages, e.g.
      http://www.nike.com/ObjectBuilder/ObjectBuilder.iwx?ProcessName=IndexPage&Section_Id=17200&NewApplication=t
  • Eliminate the ? symbol, and your life will be rosy
  • Look for workarounds, such as Apache rewrite or Cold Fusion alternatives
  • Before you move to a dynamic delivery system, check out any potential problems
<br /> <br /> How Directories Work<br /> Editors find sites, describe them and put them in a category.<br /> Site owners can also submit their sites to be listed.<br /> A short description represents the entire web site.<br /> Directories usually also show secondary results from a crawler-based search engine.<br /> <br /> The Major Directories<br /> Yahoo<br /> The Open Directory (Netscape, Lycos, AOL Search, others)<br /> LookSmart<br /> UK Plus<br /> Snap<br /> <br /> Submitting To Directories<br /> Directories probably won't find you, or may list you badly, unless you submit.<br /> Find the right category (more in a moment), then use the “Add URL” link at the top or bottom of the page.<br /> Write down who submitted (and their email address), when, which category you submitted to, and other details.<br /> You’ll need this information for the inevitable resubmission attempt; it will save you time.<br /> <br /> Submitting To Directories<br /> Take your time and submit to these properly.<br /> Write 3 descriptions, 15, 20 and 25 words long, that incorporate your key terms.<br /> Search for the most important term you want to be found for, and submit to the first listed category that seems appropriate for your site.<br /> Be sure to note the contact name and email address you provided on the submit form.<br /> If you don't get in, keep trying.<br /> <br /> Subdomain Advantage<br /> Directories tend not to list subsections of a web site. In contrast, they do tend to treat subdomains as independent web sites that deserve their own listings.<br /> This is another reason to prefer subdomains over subdirectories.<br /> <br /> How to Search?<br /> <br /> What do we search for?
<br /> Information<br /> Reviews, news<br /> Advice, methods<br /> Bugs<br /> Educational material<br /> Examples: “Access Violation 0xC0000005”, “Search Engine ppt”<br /> <br /> Main Steps<br /> Make a decision about the search.<br /> Formulate a topic.<br /> Define the type of resources you are looking for.<br /> Find relevant words to describe it.<br /> Find websites with the information.<br /> Choose the best of them.<br /> Feedback: how did you search?<br /> <br /> Main Problems<br /> Why is it difficult to search?<br /> You know the problem but don’t know what to look for.<br /> You lose focus (drift to interesting but irrelevant sites).<br /> You perform a superficial (shallow) search.<br /> Search spam.<br /> <br /> Typical Problems<br /> Links are often out of date.<br /> Usually too many links are returned.<br /> Returned links are not very relevant.<br /> The engines don't know about enough pages.<br /> Different engines return different results.<br /> Political bias.<br /> <br /> Typical Mistakes<br /> Unnecessary words in a query.<br /> Unsuitable choice of keywords.<br /> Not enough flexibility in changing keywords (or search engines).<br /> Not dividing your time between searching and evaluating the search results.<br /> “Your search did not match any documents.”: a bad query!<br /> <br /> Search Tricks<br /> What can we search for?<br /> A thematic resource (http://www.topicmaps.org)<br /> A community<br /> A collection of articles<br /> A forum<br /> A catalog of resources or links<br /> A file (by file type)<br /> An encyclopedia article<br /> A digital library<br /> Contact information (e.g. email)<br /> <br /> Improving Query Results<br /> To look for a particular page, use an unusual phrase you know is on that page.<br /> Use phrase queries where possible.<br /> Check your spelling!
<br /> Progressively add more terms.<br /> If you don't find what you want, use another search engine!<br /> <br /> Useful words<br /> download<br /> pdf, ppt, doc, zip, mp3<br /> forum, directory, links<br /> faq, for newbies, for beginners, guide, rules, checklist<br /> lecture notes, survey, tutorials<br /> how, where, correct, howto<br /> Copy and paste the exact error message.<br /> Have you tried http://del.icio.us/ ?<br /> <br /> Search Engine Features<br /> <br /> Features<br /> Indexing features<br /> Search features<br /> Results display<br /> Costs, licensing and registration requirements<br /> Unique features (if any)<br /> <br /> Indexing Features<br /> File/document formats supported: HTML, ASCII, PDF, SQL, spreadsheets, WYSIWYG (MS Word, WP, etc.)<br /> Indexing level support: file/directory level, multi-record files<br /> Standard formats recognized: MARC, Medline, etc.<br /> Customization of document formats<br /> Stemming: if supported, is it optional or mandatory?
<br /> Stop words support: if supported, is it optional or mandatory?<br /> <br /> Searching Features<br /> Boolean searching: the Boolean operators AND, OR and NOT connect search terms.<br /> Natural language: users can enter the query in natural language.<br /> Phrase: users can search for an exact phrase.<br /> Truncation/wildcard: variations of search terms, including plural forms, can be searched.<br /> Exact match: users can search for terms exactly as entered.<br /> Duplicate detection: duplicate records are removed from the retrieved set.<br /> Proximity: connectors such as WITH, NEAR and ADJ (adjacent) specify the position of a search term relative to the others.<br /> <br /> Searching Features<br /> Field searching: query for a specific field value in the database.<br /> Thesaurus searching: search for broader, narrower or related terms and concepts.<br /> Query by example: users can search for documents similar to a given one.<br /> Soundex searching: search for records whose terms sound like the search term (phonetic matching, which tolerates spelling variations).<br /> Relevance ranking: retrieved records are ranked in order of relevance.<br /> Search set manipulation: search results can be saved as sets, and users can view their search history.<br /> <br /> Results Display<br /> Formats supported: can it display documents in their native format or only as HTML? Can it display results in different formats and show the number of records retrieved?<br /> Relevancy ranking: if the retrieved records are ranked, how is the relevance score indicated?<br /> Keyword-in-context: KWIC display or highlighting of matching search terms.<br /> Customization of results display: users can select different display formats.<br /> Saving options: saving in different formats; number of records that can be saved at a time.<br /> <br /> Evaluation of Search Engines<br /> <br /> CRITICAL EVALUATION<br /> Why Evaluate What You Find on the Web?<br /> Anyone can put up a Web page about anything.<br /> Many pages are not kept up to date.<br /> There is no quality control: most sites are not “peer-reviewed”, so they are less trustworthy than scholarly publications.<br /> Search engines apply no selection guidelines.<br /> <br /> Web Evaluation Techniques<br /> Before you click to view the page...<br /> Look at the URL: is it a personal page or an organization's site? Signs of a personal page include ~, %, “users” or “members” in the URL.<br /> Is the domain name appropriate for the content? edu, com, org, net, gov, ca.us, uk, etc.<br /> Is it published by an entity that makes sense? News from its source? (e.g. www.nytimes.com)<br /> Advice from a valid agency? (e.g. www.nih.gov/, www.nlm.nih.gov/, www.nimh.nih.gov/)<br /> <br /> Web Evaluation Techniques<br /> Scan the perimeter of the page.<br /> Can you tell who wrote it? Look for the name of the page author, or an organization, institution or agency you recognize; an e-mail contact by itself is not enough.<br /> Credentials for the subject matter? Look for links to: “About us”, “Philosophy”, “Background”, “Biography”.<br /> Is it recent or current enough? Look for a “last updated” date, usually at the bottom.<br /> If there are no links or other clues, truncate the URL back one directory at a time, e.g. http://hs.houstonisd.org/hspva/academic/Science/Thinkquest/gail/text/ethics.html<br /> <br /> Web Evaluation Techniques<br /> Indicators of quality<br /> Sources documented: links, footnotes, etc.
<br /> Is it as detailed as you would expect in a print publication?<br /> Do the links work?<br /> Is information retyped or possibly forged? Why not a link to the published version instead?<br /> Are links to other resources biased or slanted?<br /> <br /> Web Evaluation Techniques<br /> What Do Others Say?<br /> Search for the URL in alexa.com: who links to the site? Who owns the domain? Type or paste the URL into the basic search box.<br /> Traffic data for the top 100,000 sites.<br /> See which links appear in Google’s “Similar pages”.<br /> Look up the page author in Google.<br /> <br /> Web Evaluation Techniques<br /> STEP BACK &amp; ASK:<br /> Does it all add up?<br /> Why was the page put on the Web? To inform with facts and data? To explain or persuade? To sell or entice? To share or disclose? As a parody or satire?<br /> Is it appropriate for your purpose?<br /> <br /> Try evaluating some sites...<br /> Search a controversial topic in Google: “nuclear armageddon”, prions danger, “stem cells”, abortion<br /> Scan the first two pages of results.<br /> Visit one or two sites and try to evaluate their quality and reliability.<br /> <br /> Ufff, The End<br /> Have you learned something today?<br /> Try whatever we've discussed today!
If you need help, let me know at sudarsun@gmail.com </div> </div> </div> </div> </div> </div> </body> </html>