Search Analytics For Fun And Profit

An Event Apart
Chicago, Illinois
August 27, 2007
Lou Rosenfeld
www.rosenfeldmedia.com

Who I Am

Information architecture consultant to Fortune 500s
Publisher and founder, Rosenfeld Media
Blog at www.louisrosenfeld.com
Co-author, Information Architecture for the World Wide Web (3rd ed., 2006; O’Reilly)
New book: Search Analytics for Your Site: Conversations with your customers (2008; Rosenfeld Media): www.rosenfeldmedia.com/books/searchanalytics

Anatomy of a Search Log (from Google Search Appliance)

Critical elements in pink: IP address, time/date stamp, query, and # of results:

XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02

XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16

XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF8&proxystylesheet=www&q=regional+transportation+governance+commission&ip=XXX.XXX.X.130 HTTP/1.1" 200 9718 62 0.17

The Zipf Curve: Short Head, Middle Torso, Long Tail

Keep It In Proportion

Count   Query
7218    campus map
5859    map
5184    im west
4320    library
3745    study abroad
3690    schedule of courses
3584    bookstore
3575    spartantrak
3229    angel
3204    cata

What’s the Sweet Spot?

Rank    Cumul. %    Count    Query
1       1.40        7218     campus map
14      10.53       2464     housing
42      20.18       1351     webenroll
98      30.01       650      computer center
221     40.05       295      msu union
500     50.02       124      hotels
7877    80.00       7        department of surgery
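A table like this one can be derived from a simple tally of queries. The following is a minimal Python sketch, not the method used for the slide: the function names are hypothetical, and the counts are made-up illustrative values rather than the figures above.

```python
from collections import Counter


def cumulative_report(counts):
    """Return (rank, cumulative %, count, query) rows, most frequent first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        running += count
        rows.append((rank, round(100 * running / total, 2), count, query))
    return rows


def rank_covering(rows, target_pct):
    """Smallest rank whose cumulative share reaches target_pct, or None."""
    for rank, cumulative_pct, _count, _query in rows:
        if cumulative_pct >= target_pct:
            return rank
    return None


# Illustrative counts only (hypothetical, not taken from the slide).
counts = Counter({"campus map": 50, "housing": 30, "library": 15, "hotels": 5})
rows = cumulative_report(counts)
for row in rows:
    print(row)
print("Queries needed to cover 80% of searches:", rank_covering(rows, 80))
```

Reading off the rank at which the cumulative share crosses a chosen threshold is one way to decide how many queries to tune by hand.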


Topical Patterns and Seasonal Changes

Where will you Capture Search Queries?

1. The search logs that your search engine naturally captures and maintains as searches take place
2. Search keywords or phrases that your users execute, that you capture into your own local database
3. Search keywords or phrases that your commercial search solution captures, records, and reports on (Mondosoft, Visual Sciences, Ultraseek, Google Appliance, etc.)

Querying your Queries: Getting started

1. What are the most frequent unique queries?
2. Are frequent queries retrieving quality results?
3. Click-through rates per frequent query?
4. Most frequently clicked result per query?
5. Which frequent queries retrieve zero results?
6. What are the referrer pages for frequent queries?
7. Which queries retrieve popular documents?
8. What interesting patterns emerge in general?
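As a starting point for the first and fifth questions above (most frequent queries, and queries that retrieve zero results), here is a minimal Python sketch. It assumes each log entry sits on one line in the format of the Google Search Appliance sample shown earlier. The field order after the request (status, bytes, number of results, elapsed seconds) is inferred from those sample entries, not taken from product documentation, and the entries below are abbreviated versions of the slide's examples. All function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed layout, inferred from the sample entries: after the quoted request
# come status, bytes, number of results, and elapsed seconds.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d+) (?P<size>\d+) (?P<results>\d+) (?P<elapsed>[\d.]+)$'
)


def parse_entry(line):
    """Extract IP, timestamp, query, and result count; None if no match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    params = parse_qs(urlsplit(match["url"]).query)
    query = params.get("q", [""])[0].strip().lower()
    return {
        "ip": match["ip"],
        "timestamp": match["timestamp"],
        "query": query,
        "results": int(match["results"]),
    }


def top_queries(entries, n=10):
    """Most frequent unique queries as (query, count) pairs."""
    return Counter(e["query"] for e in entries if e["query"]).most_common(n)


def zero_result_queries(entries):
    """Tally of queries that returned no results."""
    return Counter(
        e["query"] for e in entries if e["query"] and e["results"] == 0
    )


# Abbreviated versions of the slide's entries (most URL parameters omitted).
SAMPLE = [
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] '
    '"GET /search?site=AllSites&q=lincense+plate HTTP/1.1" 200 971 0 0.02',
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] '
    '"GET /search?site=AllSites&q=license+plate&spell=1 HTTP/1.1" 200 8283 146 0.16',
    'XXX.XXX.XX.130 - - [10/Jul/2006:10:24:38 -0800] '
    '"GET /search?site=AllSites&q=regional+transportation+governance+commission '
    'HTTP/1.1" 200 9718 62 0.17',
]

entries = [e for e in map(parse_entry, SAMPLE) if e is not None]
print(top_queries(entries))
print(zero_result_queries(entries))
```

Queries are lowercased and trimmed before counting so that trivial variants are tallied together; whether to go further (for example, merging misspellings) is a judgment call that depends on the site.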

Tune your Questions: From generic to specific

Netflix asks

1. Which movies most frequently searched?
2. Which of them most frequently clicked through?
3. Which of them least frequently added to queue?

Diagnose This: Fixing and improving the UX

1. User Research
2. Content Development
3. Interface Design: search entry interface, search results
4. Retrieval Algorithm Modification
5. Navigation Design
6. Metadata Development

User Research: What do they want?…

SA is a true expression of users’ information needs (often surprising: e.g., SKU #s at clothing retailer; URLs at IBM)
Provides context by displaying aspects of single search sessions


User Research: …what else do they want?…

BBC provides reports to determine other terms searched within same session (tracked by cookies)
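A same-session report of this kind could be approximated as follows. This is a minimal Python sketch, not the BBC's implementation: it assumes each search has already been tagged with a session identifier (for example, from a cookie), and the function name, record layout, and sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def co_searched_terms(records, target):
    """Count other queries issued in sessions that also contain `target`.

    `records` is an iterable of (session_id, query) pairs.
    """
    sessions = defaultdict(set)
    for session_id, query in records:
        sessions[session_id].add(query.strip().lower())

    target = target.strip().lower()
    related = Counter()
    for queries in sessions.values():
        if target in queries:
            related.update(queries - {target})
    return related


# Hypothetical session data for illustration.
records = [
    ("a", "map"), ("a", "campus map"), ("a", "parking"),
    ("b", "map"), ("b", "parking"),
    ("c", "library"),
]
print(co_searched_terms(records, "map").most_common())
```

Each query is counted once per session, so a user who repeats the same search several times does not inflate the tally.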

User Research: …who wants it?…

Specific segments’ needs as determined by:
• Security clearance
• IP address
• Job function
• Account information
• Pages they initiate searches from

Alternatively, you may be able to extrapolate segments directly from SA

User Research: …who wants it?…

BBC’s top queries report from children’s section of site

User Research: …and when do they want it?

Time-based variation (and clustered queries) from MSU
• By hour, by day, by season
• Helps determine “best bets” development
• Also can help tune main page and other editorial content
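Grouping queries by hour is one simple way to look for time-based variation. The sketch below is a minimal Python example that assumes timestamps in the format of the sample log shown earlier (`10/Jul/2006:10:25:46 -0800`) and an English-language locale for the month abbreviation. The function name is hypothetical, and the last record is invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Timestamp layout as it appears in the sample log entries.
TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def queries_by_hour(records):
    """Map hour of day (0-23) to a Counter of queries seen in that hour.

    `records` is an iterable of (timestamp string, query) pairs.
    """
    buckets = defaultdict(Counter)
    for timestamp, query in records:
        hour = datetime.strptime(timestamp, TIMESTAMP_FORMAT).hour
        buckets[hour][query.strip().lower()] += 1
    return dict(buckets)


records = [
    ("10/Jul/2006:10:25:46 -0800", "lincense plate"),
    ("10/Jul/2006:10:25:48 -0800", "license plate"),
    ("10/Jul/2006:10:24:38 -0800", "regional transportation governance commission"),
    ("10/Jul/2006:22:05:00 -0800", "license plate"),  # hypothetical extra record
]
by_hour = queries_by_hour(records)
for hour in sorted(by_hour):
    print(hour, by_hour[hour].most_common(3))
```

The same bucketing works by day or by season by replacing `.hour` with `.weekday()` or `.month`.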

Content Development: Do we have the right content?

Analyze 0 result queries
• Does the content exist?
• If so, there are titling, wording, metadata, or indexing problems
• If not, why not?

Content Development: Are we featuring the right stuff?

Track clickthroughs to determine which results should rise to the top (example: SLI Systems)
Also suggests which “best bets” to develop to address common queries
BBC removes navigation pages from search results
From www.behaviortracking.com
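Clickthrough tracking of the kind described above could be tallied as follows. This is a minimal Python sketch under stated assumptions, not the SLI Systems or BBC approach: it assumes click events are already recorded as (query, clicked URL) pairs, and the function name, URLs, and data are hypothetical.

```python
from collections import Counter, defaultdict


def most_clicked_results(clicks):
    """Map each query to its (url, click count) pairs, most clicked first.

    `clicks` is an iterable of (query, clicked_url) pairs.
    """
    by_query = defaultdict(Counter)
    for query, url in clicks:
        by_query[query.strip().lower()][url] += 1
    return {query: urls.most_common() for query, urls in by_query.items()}


# Hypothetical click records for illustration.
clicks = [
    ("campus map", "/maps/campus"),
    ("campus map", "/maps/campus"),
    ("campus map", "/maps/parking"),
    ("Campus Map", "/maps/campus"),
    ("housing", "/housing/apply"),
]
report = most_clicked_results(clicks)
for query, urls in report.items():
    print(query, urls)
```

The top entry for a frequent query is a natural candidate for a “best bet”, though a heavily clicked result is not necessarily the one that satisfied the searcher.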


Search Entry Interface Design: “The Box” or something else?

Identify “dead end” points (e.g., 0 hits, 2000 hits) where assistance could be added
Query syntax helps you select search features to expose (e.g., use of Boolean operators)

Search Results Interface Design: Which results where?

#10 result is clicked through more often than #s 6, 7, 8, and 9 (ten results per page)
From SLI Systems (www.sli-systems.com)

Search Results Interface Design: How to sort results?

Financial Times has found that users often include dates in their queries
Obvious but effective improvement: allow users to sort by date

Search System: What to change?

Add functionality: Financial Times added spell checking
Retrieval algorithm modifications
• Financial Times weights company names higher
• Netflix determines better weighting for unique terms and phrases

Deloitte, Barnes & Noble, Vanguard demonstrate that basic improvements (e.g., Best Bets) are insufficient (and justify increased $$$)

Navigation: Any improvements?

Michigan State University builds A-Z index automatically based on frequent queries

Navigation: Where does it fail?

Track and study pages (excluding main page) where search is initiated
• What do they search? (e.g., acronyms, jargon)
• Are there other issues that would cause a “dead end”? (e.g., tagging and titling problems)
• Are there user studies that could test/validate problems on these pages? (e.g., “Where did you want to go next?”)


Metadata Development: How do searchers express their needs?

Tone and jargon (e.g., “cancer” vs. “oncology,” “lorry” vs. “truck,” acronyms)
Syntax (e.g., Boolean, natural language, keyword)
Length (e.g., number of terms/query; Long Tail queries longer and more complex than Short Head)
Everything we know from analyzing folksonomic tags applies here, and vice versa

Metadata Development: Which values and attributes?

Uncover hierarchy and identify
• Metadata values (e.g., mobile vs. cell)
• Metadata attributes (e.g., genre, region)
• Content types (e.g., spec, price sheet)

SA combines with AI tools for clustering, enabling concept searching and thesaurus development

Metadata Development: Leveraging differences in the curve

Variations in information needs emerge between Short Head and Long Tail
Example: Deloitte intranet’s “known-item” queries are common; research topics are infrequent
Figure labels: known-item queries; research queries

Organizational Impact: Educational opportunities

“Reverse engineer” performance problems
• Vanguard
  • Tests “best” results for common queries
  • Determines why these results aren’t retrieved or clicked-through
  • Demonstrates problem and solutions to content owners/authors benefits
• Sandia Labs does same, only with top results that are losing rank in search results pages

Organizational Impact: Reexamining assumptions

Financial Times learns about breaking stories from their logs by monitoring spikes in company names and individuals’ names and comparing with their current coverage
Discrepancy = possible breaking story; reporter is assigned to follow up
Next step? Assign reporters to “beats” that emerge from SA

SA as User Research Method: Sleeper, but no panacea

Benefits
• Non-intrusive
• Inexpensive and (usually) accessible
• Large volume of “real” data
• Represents actual usage patterns

Drawbacks
• Provides an incomplete picture of usage: was user satisfied at session’s end?
• Difficult to analyze: where are the commercial tools?

Complements qualitative methods (e.g., persona development, task analysis, field studies)


SA Headaches: What gets in the way?

Problems*
• Lack of time
• Few useful tools for parsing logs, generating reports
• Tension between those who want to perform SA and those who “own” the data (chiefly IT)
• Ignorance of the method
• Hard work and/or boredom of doing analysis

Most of these are going away…
* From summer 2006 survey (134 responses), available at book site.

Please Share Your SA Knowledge: Visit our book in progress site

Search Analytics for Your Site: Conversations with your Customers by Louis Rosenfeld and Richard Wiggins (Rosenfeld Media, 2008)
Site URL: www.rosenfeldmedia.com/books/searchanalytics/
Feed URL: feeds.rosenfeldmedia.com/searchanalytics/

Contact Information

Louis Rosenfeld
Rosenfeld Media, LLC
705 Carroll Street, #2L
Brooklyn, NY 11215 USA
+1.718.306.9396
[email protected]
www.louisrosenfeld.com
www.rosenfeldmedia.com
