Data/Document Creation Process
that exist in the
Physical World
Persons, Places, Things, Ideas
describe
Data/Document Storage System
Descriptions
are represented by
make up
organized into
Tables
Entries in the graph can reference each other. For instance, a request for a stock quote of an entry could point to the symbol, the symbol could point to the company name and the current quote, the current quote could point to the quote history, and the quote history could point to a quote histogram.
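
The chain of references described above can be sketched as a small dictionary-based graph. This is only an illustration: the entry identifiers, field names, company, and prices are all invented, and the map does not say how a real graph would be stored.

```python
# Hypothetical graph of entries; every identifier and value is invented for illustration.
graph = {
    "request:stock-quote": {"symbol": "symbol:EXMP"},
    "symbol:EXMP": {
        "company_name": "Example Corp.",
        "current_quote": "quote:EXMP:current",
    },
    "quote:EXMP:current": {"price": 42.5, "history": "history:EXMP"},
    "history:EXMP": {"prices": [40.0, 41.25, 42.5], "histogram": "histogram:EXMP"},
    "histogram:EXMP": {"buckets": {"40-41": 1, "41-42": 1, "42-43": 1}},
}


def follow(graph, start, path):
    """Follow a chain of named references from one entry to another."""
    entry_id = start
    for field in path:
        entry_id = graph[entry_id][field]  # each field holds the id of the next entry
    return graph[entry_id]


# Request -> symbol -> current quote -> quote history -> quote histogram.
histogram = follow(
    graph,
    "request:stock-quote",
    ["symbol", "current_quote", "history", "histogram"],
)
```

Because each entry only stores the identifier of the next one, any entry can be updated without touching the entries that point to it.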
data that contains information about its format
are organized into
Submissions
... gather all of the text on a given web page, then continue gathering text on all of the pages linked to the given page, then follow the links on those pages, and so on.
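
The gathering behavior described above is roughly a breadth-first traversal of links. The sketch below assumes a tiny in-memory "web" (the `PAGES` dictionary, its URLs, and the `max_pages` limit are all hypothetical) so that it runs without network access; a real crawler would fetch and parse pages instead.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("home page text", ["a.example/about", "b.example/"]),
    "a.example/about": ("about page text", ["a.example/"]),
    "b.example/": ("another site", []),
}


def crawl(start, fetch, max_pages=100):
    """Gather the text of a page, then of the pages it links to, and so on."""
    gathered = {}
    queue = deque([start])
    while queue and len(gathered) < max_pages:
        url = queue.popleft()
        if url in gathered:
            continue  # already visited; avoids looping on circular links
        page = fetch(url)
        if page is None:
            continue  # unknown page in this toy example
        text, links = page
        gathered[url] = text
        queue.extend(link for link in links if link not in gathered)
    return gathered


texts = crawl("a.example/", PAGES.get)
```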
Any processes done on the data returned from the index. Could involve sorting, ordering, or compiling the data retrieved from the index.
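
As one possible illustration of post processing, the sketch below sorts a set of results in different orders. The result records, their field names, and the `post_process` function are assumptions made for the example, not part of any existing system.

```python
from datetime import date

# Hypothetical results data; the field names are assumptions for this sketch.
RESULTS = [
    {"title": "Lycos", "relevance": 0.4, "modified": date(1999, 8, 1)},
    {"title": "Alta Vista", "relevance": 0.9, "modified": date(1999, 6, 15)},
    {"title": "Google", "relevance": 0.7, "modified": date(1999, 8, 8)},
]

# Each ordering maps to (sort key, whether to sort in descending order).
SORT_KEYS = {
    "alphabetical": (lambda r: r["title"].casefold(), False),
    "relevance": (lambda r: r["relevance"], True),
    "modified": (lambda r: r["modified"], True),
}


def post_process(results, order="relevance"):
    """Return a new list of results sorted in the requested order."""
    key, descending = SORT_KEYS[order]
    return sorted(results, key=key, reverse=descending)


titles = [r["title"] for r in post_process(RESULTS, "relevance")]
```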
text analysis, link analysis, metadata analysis
are “sensed” by
Organizing Principle
Algorithms
are
Representations
Inktomi, Google, Alta Vista, Lycos
Crawlers
Relevance Ranking
alphabetically, by relevance, by modification date
WWW Document Databases
Sorted Lists
Direct Hit analyzes the time spent by users on a given result. It also keeps track of whether or not a user returns to the results page after viewing a result. If the user returns, the result receives a lower score.
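
A usage-based score of this kind might look like the sketch below. The formula is invented for illustration (average viewing time, discounted when the user comes back to the results page); the `return_penalty` parameter is hypothetical and nothing here reflects how Direct Hit actually computes its scores.

```python
def usage_score(events, return_penalty=0.5):
    """Score one result from observed visits.

    events: list of (seconds_spent_on_result, returned_to_results_page) pairs.
    return_penalty: hypothetical multiplier applied to visits after which
    the user returned to the results page.
    """
    if not events:
        return 0.0
    total = 0.0
    for seconds, returned in events:
        total += seconds * (return_penalty if returned else 1.0)
    return total / len(events)


# Same viewing time, but the second user came back to the results page.
stayed = usage_score([(60, False)])
came_back = usage_score([(60, True)])
```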
provide structure to
Grapevine allows users to actively rank results (a customization process) but then attempts to guess what the user will want based on what it has learned (a personalization process).
Page Databases
Groups of WWW Objects in context
are organized into
Google ranks results during the indexing process by examining the relationships created by the hyperlinks in the documents. More popular sites score higher.
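
The idea that "more popular sites score higher" can be approximated, very crudely, by counting inbound links. The sketch below does only that; it is a simplification chosen for illustration and is not Google's actual algorithm. The page names and the `rank_by_inbound_links` function are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct other pages link to them.

    links: dict mapping each page to the list of pages it links to.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda page: (-counts[page], page))


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranking = rank_by_inbound_links(links)
```

Because the link counts depend only on the documents themselves, they can be computed once during indexing rather than at query time.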
collect data
send data to
Sensors
make up
Amazon ranks related books and CDs by keeping track of the purchasing habits of groups (group profiles).
Clever does the ranking during post processing. It uses an iterative process that looks at the popularity of a site (much like Google) but does so in regard to the keywords used in the search.
WWW Pages
A collection of technologies that attempt to rank the relative value of a set of results.
can assign quality ratings such as editor’s choice or cool site of the day
Editors: paid, volunteer
can be
WWW Users
Yahoo!, Open Directory, Infoseek
Usenet, Yahoo!, Open Directory, Dewey Decimal, Library of Congress
Graph
could be contained in a
WWW Site Directory Databases
provide structure to
Hierarchical Taxonomies
Can be updated independently of the index that references it.
A repository of data that could be stored in RDF format.
make up
Objects referenceable by a URI such as html, text, gif, jpeg, png, avi, mpeg, real, quicktime, wav, pdf, rss, xml, xul files
Metadata
Alexa, Bigfoot, At Hand
WWW Objects
WWW Sites
make up
generate
3rd Parties
News, Products, Related Links, Trademarks, Recirculation, Stocks
describe
Groups of WWW Pages in context
Attributes
provide multiple views of data in a database
Objects residing in databases that are not referenceable by a simple URI such as company names, addresses, phone numbers, names of people, personal profiles, horoscopes, credit card numbers, product names, prices, maps, event listings, news clips, weather
Databases
Data/Document Retrieval System
provide structure to
Feeds
Data Objects
for
send data to
create an
Analyzers
sends data to a
Indexers
Post Processors
Index
generates
If a graph is in place, the index functions as a lightweight lookup table. If no graph is used, the index contains all of the data in addition to the lookup table, in a list format.
Results Data
I-search and PLS are existing technologies.
Creates a lookup table from all of the data fed into it. If a graph is not in use, all of the data fed into it is stored as entries in the lookup table.
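
The two modes of the lookup table could be sketched as follows. This is a minimal inverted index written for illustration; the `build_index` function, its `use_graph` flag, and the sample documents are assumptions, and the whitespace tokenization is far simpler than anything a real indexer would use.

```python
from collections import defaultdict


def build_index(documents, use_graph=True):
    """Build a term -> entries lookup table.

    documents: dict mapping a document id to its text.
    use_graph: if True, entries are lightweight references (ids) into a
    separate graph; if False, the data itself is stored in the table.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for term in sorted(set(text.lower().split())):
            entry = doc_id if use_graph else (doc_id, text)
            index[term].append(entry)
    return dict(index)


docs = {"d1": "Stock quote", "d2": "quote history"}
light = build_index(docs)                   # references only
full = build_index(docs, use_graph=False)   # data stored alongside the terms
```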
... are the input devices of the data/document storage system
sends data to an
Stored proxies of physical world objects and ideas
Raw Data
such as XML, XUL, or RDF. Data is self-describing and separate from the form it will eventually take.
Interpreter
could be sent to a
sends data
Metadata and Data
Scraper
Aggregator
sent to an
could be the same as the
can be stored in
Customization Settings
such as keyverbs, boolean operators, data type, media type, language or domain
could specify
options that are stored either locally or on a server
Queries
is made up of
involve
exist in the
and
into
large collections of profiles that can be analyzed to produce trend information
Group Profiles
Environmental State
browser type and version, language setting, IP address, current page, monitor size, color settings, javascript capability
Answers
Source
Passive Searchers
Data/Document Retrieval Process
behaviors and states that are stored either locally or on a server
click paths, session tracking, decision tracking
in order to get
Scope is reduced by reducing choices from a found set, in a sequence of ever smaller sets. Best for document retrieval.
Personalization Profiles
can be stored in
Any UI page or widget that displays the results data
contain
Behavioral History
contains data that can improve the specification of
... can be described by their goals, age, gender, income, geographic location, education, hardware and software, connection speed, member status
Results Pages
make
People
Provides the architecture, or “form” of the data for the creation of the results page. The templates can exist in multiple, localized formats.
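
One way to picture localized templates is a small table of format strings keyed by locale. In the sketch below, the `TEMPLATES` table, the locale codes, the placeholder names, and the fallback to English are all assumptions made for the example.

```python
from string import Template

# Hypothetical localized templates: the "form" is kept separate from the data.
TEMPLATES = {
    "en": Template("$count results for $query"),
    "fr": Template("$count résultats pour $query"),
}


def render(locale, **data):
    """Fill the template for the given locale, falling back to English."""
    template = TEMPLATES.get(locale, TEMPLATES["en"])
    return template.substitute(data)


french = render("fr", count=3, query="lycos")
fallback = render("de", count=3, query="lycos")
```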
create
Templates
Options
Active Searchers
made using a
Scope is reduced by specifying additional restrictions upfront. Best for data retrieval.
Could be by relevance, creation date, modification date, alphabetical order, source, or media type; could involve progressive disclosure, pagination, or a custom look and feel (themes).
Any device that can receive data and render it for the user.
contains an
displays
Views
Output Device
could be redrawn by an
Keywords
can influence
Combines or interleaves data from multiple sources.
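
Interleaving could be done in many ways; the sketch below uses a simple round-robin, taking one item from each source in turn. The `interleave` function and the sample source lists are hypothetical, and a real aggregator might instead merge by score or date.

```python
from itertools import chain, zip_longest

_MISSING = object()  # marks the gaps left by sources that ran out early


def interleave(*sources):
    """Take one item from each source in turn until all are exhausted."""
    rounds = zip_longest(*sources, fillvalue=_MISSING)
    return [item for item in chain.from_iterable(rounds) if item is not _MISSING]


news = ["n1", "n2", "n3"]
products = ["p1"]
stocks = ["s1", "s2"]
combined = interleave(news, products, stocks)
```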
organize the form of
make
sends data to a
or ‘Layout Engine’. Combines form and content.
could receive data from
Articulator
Scrapes unwanted form information from the data stream. Classifies remaining data.
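
A scraper along these lines could be sketched with Python's standard `html.parser` module: the markup (form) is discarded and each remaining piece of text is given a class. The two classes used here, "number" and "text", are an assumption for illustration; the map does not say what classification scheme is used.

```python
from html.parser import HTMLParser


class Scraper(HTMLParser):
    """Drops markup and classifies the text that remains."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # whitespace between tags carries no content
        # Hypothetical two-way classification: plain decimal numbers vs. other text.
        kind = "number" if text.replace(".", "", 1).isdigit() else "text"
        self.items.append((kind, text))


def scrape(html):
    scraper = Scraper()
    scraper.feed(html)
    scraper.close()
    return scraper.items


items = scrape("<p><b>Price</b> <i>42.50</i></p>")
```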
sends data to an
Turns the user’s query into something readable by the index. Could handle: boolean operators, spell checking, word stemming, case folding, internationalization, thesauri, phrase searching, related terms.
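
A few of these steps (boolean operators, case folding, and phrase searching) can be sketched as a small tokenizer. The token format and the rule that operators must be written in upper case are assumptions made for this example; spell checking, stemming, and thesauri are left out.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}          # assumed to be written in upper case
TOKEN = re.compile(r'"([^"]+)"|(\S+)')    # a quoted phrase, or a single word


def interpret(query):
    """Turn a raw query string into (kind, value) tokens for the index."""
    tokens = []
    for phrase, word in TOKEN.findall(query):
        if phrase:
            tokens.append(("phrase", phrase.casefold()))
        elif word in OPERATORS:
            tokens.append(("operator", word))
        else:
            tokens.append(("term", word.casefold()))  # case folding
    return tokens


parsed = interpret('Stock AND "Quote History" NOT lycos')
```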
Input Device
such as HTML streams. Form and content are merged.
Information to enable Actions toward
Understanding Internet Search
Goals
Could be organized as cultural responses to human needs (Malinowski): roughly, Food, Kinship, Shelter, Protection, Activities, Training, Hygiene. Or more simply: Work, Play, Learn, and Food, Clothing, Shelter, Love.
User Context
Concepts, Systems and Processes
8 August 1999
The suggested starting point for reading is “People”.
I would like to acknowledge Hugh Dubberly for his many suggestions, and Ken Hickman and Paul Pangaro for their contributions.
Designed by Matt Leacock
Search Concept Map, version 1.2