Finding Paths in Graphs Robert Sedgewick Princeton University
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Subtext: the scientific method is necessary in algorithm design and implementation Scientific method • create a model describing natural world • use model to develop hypotheses
• run experiments to validate hypotheses • refine model and repeat
hypothesis
model
Algorithm designer who does not run experiments risks becoming lost in abstraction
Software developer who ignores resource consumption risks catastrophic consequences Isolated theory or experiment can be of value when clearly identified
experiment
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Warmup: random number generation Problem: write a program to generate random numbers
hypothesis
model: classical probability and statistics hypothesis: frequency values should be uniform weak experiment: • generate random numbers • check for uniform frequencies better experiment: • generate random numbers
• use x test to check frequency values against uniform distribution 2
better hypotheses/experiments still needed • many documented disasters • active area of scientific research
model
experiment
int k = 0; while ( true ) System.out.print(k++ % V);
V = 10
012345678901234567 ... random?
int k = 0; while ( true ) { k = k*1664525 + 1013904223); System.out.print(k % V); } textbook algorithm that flunks
• applications: simulation, cryptography • connects to core issues in theory of computation
x2 test
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Warmup (continued) Q. Is a given sequence of numbers random?
average probes until duplicate is about 24
A. No. Q. Does a given sequence exhibit some property that random number sequences exhibit?
V = 365
Birthday paradox Average count of random numbers generated until a duplicate happens is about
pV/2
Example of a better experiment: • generate numbers until duplicate
• check that count is close to pV/2 • even better: repeat many times, check against distribution “Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin” — John von Neumann
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Finding an st-path in a graph is a fundamental operation that demands understanding Ground rules for this talk • work in progress (more questions than answers) • analysis of algorithms • save “deep dive” for the right problem Applications • graph-based optimization models • networks • percolation • computer vision • social networks • (many more)
s
t
Basic research • fundamental abstract operation with numerous applications • worth doing even if no immediate application • resist temptation to prematurely study impact
Motivating example: maxflow Ford-Fulkerson maxflow scheme • find any s-t path in a (residual) graph • augment flow along path (may create or delete edges) • iterate until no path exists
Goal: compare performance of two basic implementations • shortest augmenting path
• maximum capacity augmenting path Key steps in analysis • How many augmenting paths? • What is the cost of finding each path?
research literature
this talk
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Motivating example: max flow Compare performance of Ford-Fulkerson implementations • shortest augmenting path
• maximum-capacity augmenting path Graph parameters • number of vertices V • number of edges E
• maximum capacity C How many augmenting paths? worst case upper bound shortest
VE/2 VC
max capacity
2E lg C
How many steps to find each path? E (worst-case upper bound)
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Motivating example: max flow Compare performance of Ford-Fulkerson implementations • shortest augmenting path
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
• maximum-capacity augmenting path Graph parameters for example graph • number of vertices V = 177 • number of edges E = 2000
• maximum capacity C = 100 How many augmenting paths? worst case upper bound
for example
shortest
VE/2 VC
177,000 17,700
max capacity
2E lg C
26,575
How many steps to find each path? 2000 (worst-case upper bound)
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Motivating example: max flow Compare performance of Ford-Fulkerson implementations • shortest augmenting path
• maximum-capacity augmenting path Graph parameters for example graph • number of vertices V = 177 • number of edges E = 2000
• maximum capacity C = 100 How many augmenting paths? worst case upper bound
for example
actual
shortest
VE/2 VC
177,000 17,700
37
max capacity
2E lg C
26,575
7
How many steps to find each path? < 20, on average
total is a factor of a million high for thousand-node graphs!
Motivating example: max flow Compare performance of Ford-Fulkerson implementations • shortest augmenting path
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
• maximum-capacity augmenting path Graph parameters • number of vertices V • number of edges E
• maximum capacity C Total number of steps? worst case upper bound shortest
VE2/2 VEC
max capacity
2E2 lg C
WARNING: The Algorithm General has determined that using such results to predict performance or to compare algorithms may be hazardous.
Motivating example: lessons Goals of algorithm analysis worst-case bounds • predict performance (running time) • guarantee that cost is below specified bounds Common wisdom • random graph models are unrealistic • average-case analysis of algorithms is too difficult • worst-case performance bounds are the standard Unfortunate truth about worst-case bounds • often useless for prediction (fictional)
• often useless for guarantee (too high) • often misused to compare algorithms Bounds are useful in many applications:
which ones??
Open problem: Do better! actual costs
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Finding an st-path in a graph is a basic operation in a great many applications Q. What is the best way to find an st-path in a graph?
s
A. Several well-studied textbook algorithms are known • Breadth-first search (BFS) finds the shortest path
• Depth-first search (DFS) is easy to implement • Union-Find (UF) needs two passes t
BUT
• all three process all E edges in the worst case • diverse kinds of graphs are encountered in practice Worst-case analysis is useless for predicting performance Which basic algorithm should a practitioner use?
??
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Grid graphs Algorithm performance depends on the graph model complete
random
grid
neighbor
small-world
s s
s
s
t
s
t t
t
t
... (many appropriate candidates)
Initial choice: grid graphs • sufficiently challenging to be interesting • found in practice (or similar to graphs found in practice) • scalable
• potential for analysis if vertices have positions we can find short paths quickly with A* (stay tuned)
Ground rules • algorithms should work for all graphs • algorithms should not use any special properties of the model
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Applications of grid graphs
conductivity concrete granular materials porous media polymers
Example 1: Statistical physics • percolation model
s
• extensive simulations • some analytic results • arbitrarily huge graphs
forest fires epidemics Internet resistor networks evolution social influence Fermi paradox fractal geometry stereo vision image restoration
t
Example 2: Image processing • model pixels in images
• maxflow/mincut • energy minimization • huge graphs
s
object segmentation scene reconstruction . . .
t
Literature on similar problems
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Percolation
Random walk
Nonselfintersecting paths in grids
Graph covering
??
Which basic algorithm should a practitioner use to find a path in a grid-like graph?
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Finding an st-path in a grid graph M by M grid of vertices undirected edges connecting each vertex to its HV neighbors source vertex s at center of top boundary destination vertex t at center of bottom boundary Find any path connecting s to t s
t
M 2 vertices about 2M 2 edges M
vertices
edges
7 15 31 63 127 255 511
49 225 961 3969 16129 65025 261121
84 420 1860 7812 32004 129540 521220
Cost measure: number of graph edges examined
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Abstract data types separate clients from implementations
A data type is a set of values and the operations performed on them An abstract data type is a data type whose representation is hidden
Clients invoke operations
Interface specifies how to invoke ops
Implementations code that implements ops
Implementation should not be tailored to particular client Develop implementations that work properly for all clients Study their performance for the client at hand
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Graph abstract data type Vertices are integers between 0 and V-1 Edges are vertex pairs Graph ADT implements • Graph(Edge[]) to construct graph from array of edges • findPath(int, int) to conduct search from s to t
• st(int) to return predecessor of v on path found Example: client code for grid graphs int e = 0; Edge[] a = new Edge[E]; for (int i = 0; i < V; i++) {
M=5 s 20
21
22
23
24
if (i >= M) a[e++] = new Edge(i, i-M);
15
16
17
18
19
if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
10
11
12
13
14
5
6
7
8
9
0
1
2
3
4
if (i < V-M) a[e++] = new Edge(i, i+M);
if (i % M != 0) a[e++] = new Edge(i, i-1); } GRAPH G = new GRAPH(a); G.findPath(V-1-M/2, M/2); for (int k = t; k != s; k = G.st(k)) System.out.println(s + “-” + t);
t
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
DFS: standard implementation graph ADT constructor code for (int k = 0; k < E; k++) { int v = a[k].v, w = a[k].w; adj[v] = new Node(w, adj[v]); adj[w] = new Node(v, adj[w]);
graph representation vertex-indexed array of linked lists two nodes per edge
}
DFS implementation (code to save path omitted) void findPathR(int s, int t)
4
7
{ if (s == t) return; visited(s) = true; for(Node x = adj[s]; x != null; x = x.next)
7
4
if (!visited[x.v]) searchR(x.v, t); } void findPath(int s, int t) { visited = new boolean[V]; searchR(s, t); }
6
7
8
3
4
5
0
1
2
Basic flaw in standard DFS scheme cost strongly depends on arbitrary decision in client code! ...
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
for (int i = 0; i < V; i++) { if ((i+1) % M != 0) a[e++] = new Edge(i, i+1); if (i % M != 0) a[e++] = new Edge(i, i-1); if (i < V-M) a[e++] = new Edge(i, i+M); if (i >= M) a[e++] = new Edge(i, i-M);
order of these statements determines order in lists
} ... west, east, north, south
south, north, east, west
s
s order in lists has drastic effect on running time
t
~E/2
t
~E
1/2
bad news for ANY graph model
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Addressing the basic flaw Advise the client to randomize the edges? • no, very poor software engineering
• leads to nonrandom edge lists (!) Randomize each edge list before use? • no, may not need the whole list represent graph with arrays, not lists
Solution: Use a randomized iterator N
standard iterator
x
int N = adj[x].length; i
for(int i = 0; i < N; i++) { process vertex adj[x][i]; }
randomized iterator
i
int N = adj[x].length; x
for(int i = 0; i < N; i++) { exch(adj[x], i, i + (int) Math.random()*(N-i)); process vertex adj[x][i]; }
exchange random vertex from adj[x][i..N-1] with adj[x][i]
x i
N
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Use of randomized iterators turns every graph algorithm into a randomized algorithm Important practical effect: stabilizes algorithm performance s
s
cost depends on problem not its representation t t
s
s s
t
t
Yields well-defined and fundamental analytic problems • Average-case analysis of algorithm X for graph family Y(N)? • Distributions?
• Full employment for algorithm analysts
t
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
(Revised) standard DFS implementation graph ADT constructor code for (int k = 0; k < E; k++) { int v = a[k].v, w = a[k].w; adj[v][deg[v]++] = w; adj[w][deg[w]++] = v; }
graph representation vertex-indexed array of variablelength arrays
DFS implementation (code to save path omitted) void findPathR(int s, int t)
4
7
{ int N = adj[s].length; if (s == t) return; visited(s) = true;
4
7
for(int i = 0; i < N; i++) { int v = exch(adj[s], i, i+(int) Math.random()*(N-i)); if (!visited[v]) searchR(v, t); } } void findPath(int s, int t) { visited = new boolean[V]; findpathR(s, t); }
6
7
8
3
4
5
0
1
2
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
BFS: standard implementation Use a queue to hold fringe vertices s
while Q is nonempty
tree vertex
get x from Q done if x = t
fringe vertex
for each unmarked v adj to x put v on Q
unseen vertex
mark v t
void findPathaC(int s, int t)
FIFO queue for BFS
{ Queue Q = new Queue(); Q.put(s); visited[s] = true; while (!Q.empty()) { int x = Q.get(); int N = adj[x].length; if (x == t) return; for (int i = 0; i < N; i++)
randomized iterator
{ int v = exch(adj[x], i, i + (int) Math.random()*(N-i)); if (!visited[v]) { Q.put(v); visited[v] = true; }
}
} }
Generalized graph search: other queues yield A* and other graph-search algorithms
Union-Find implementation 1. Run union-find to find component containing s and t
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
s
initialize array of iterators initialize UF array while s and t not in same component choose random iterator choose random edge for union t
2. Build subgraph with edges from that component 3. Use DFS to find st-path in that subgraph s
t
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Experimental results for basic algorithms DFS is substantially faster than BFS and UF M 7
V 49
E 168
BFS .75
DFS .32
UF 1.05
15
225
840
.75
.45
1.02
31
961
3720
.75
.36
1.14
63
3969
15624
.75
.32
1.05
127
16129
64008
.75
.40
.99
255
65025
259080
.75
.42
1.08
BFS
Analytic proof? Faster algorithms available?
UF
DFS
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
A faster algorithm for finding an st-path in a graph Use two depth-first searches • one from the source s
• one from the destination t • interleave the two M 7
V 49
E 168
BFS .75
DFS .32
UF 1.05
two .18
15
225
840
.75
.45
1.02
.13
31
961
3720
.75
.36
1.14
.15
63
3969
15624
.75
.32
1.05
.14
127
16129
64008
.75
.40
.99
.13
255
65025
259080
.75
.42
1.08
.12
Examines 13% of the edges 3-8 times faster than standard implementations Not loglog E, but not bad!
Are other approaches faster? Other search algorithms • randomized? • farthest-first? Multiple searches? • interleaving strategy? • merge strategy?
• how many? • which algorithm? Hybrid algorithms • which combination?
• probabilistic restart? • merge strategy? • randomized choice? Better than constant-factor improvement possible? Proof?
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Experiments with other approaches Randomized search • use random queue in BFS • easy to implement Result: not much different from BFS
Multiple searchers • use N searchers
• • • •
one from the source one from the destination N-2 from random vertices Additional factor of 2 for N>2
Result: not much help anyway
BFS
1.40
.70 .40
DFS
.12
Best method found (by far): DFS with 2 searchers
1 2345
10
20
Introduction Motivating example Grid graphs Search methods Small world graphs Small-world Conclusion
Small-world graphs are a widely studied graph model with many applications A small-world graph has • large number of vertices
• low average vertex degree (sparse) • low average path length • local clustering
s
t
Examples: • Add random edges to grid graph • Add random edges to any sparse graph with local clustering • Many scientific models
s
t
Q. How do we find an st-path in a small-world graph?
Introduction Motivating example Grid graphs Search methods Small world graphs Small-world Conclusion
Small-world graphs model the six degrees of separation phenomenon Patrick Allen
Caligola
Glenn Close John Gielguld
The Stepford Wives
Portrait of a Lady
Murder on the Orient Express
Donald Sutherland
Cold Mountain
Vernon Dobtcheff
John Belushi
The Woodsman
Jude
Animal House
High Noon
Lloyd Bridges
Kathleen Quinlan
Bill Paxton
Wild Things
The River Wild
Kate Winslet
Paul Herbert
Tom Hanks
The Da Vinci Code Audrey Tautou
Titanic
Yves Aubert
Eternal Sunshine of the Spotless Mind
A tiny portion of the movie-performer relationship graph
Example: Kevin Bacon number
Joe Versus the Volcano
Apollo 13
Kevin Bacon
Meryl Streep
Enigma
Grace Kelly
To Catch a Thief
The Eagle has Landed
Nicole Kidman
An American Haunting
Hamlet
Dial M for Murder
Shane Zaza
Introduction Motivating example Grid graphs Search methods Small world graphs Small-world Conclusion
Applications of small-world graphs
social networks airlines roads neurobiology evolution
Example 1: Social networks • infectious diseases
• extensive simulations • some analytic results • huge graphs
. . .
An American Haunting
Hamlet
Donald Sutherland
Cold Mountain
Vernon Dobtcheff
John Belushi
The Woodsman
Jude
Animal House
Kathleen Quinlan
Bill Paxton The River Wild Paul Herbert
Meryl Streep
• natural process • experimental validation
Tom Hanks
The Da Vinci Code Audrey Tautou
Titanic
Yves Aubert
A tiny portion of the movie-performer relationship graph
Example 2: Protein interaction • small-world model
Joe Versus the Volcano
Apollo 13
Kevin Bacon
Wild Things
High Noon
Lloyd Bridges
The Eagle has Landed
Nicole Kidman
Grace Kelly
To Catch a Thief
Eternal Sunshine of the Spotless Mind
percolation
political trends
Murder on the Orient Express
Dial M for Murder
The Stepford Wives
Portrait of a Lady
Kate Winslet
protein interaction
electric power grids
Glenn Close John Gielguld
Enigma
social influence
internet
Patrick Allen
Caligola
Shane Zaza
Introduction Motivating example Grid graphs Search methods Small world graphs Small-world Conclusion
Finding a path in a small-world graph is a heavily studied problem Milgram experiment (1960)
Small-world graph models • Random (many variants)
• Watts-Strogatz • Kleinberg
add V random shortcuts to grid graphs and others A* uses ~ log E steps to find a path
How does 2-way DFS do in this model?
no change at all in graph code just a different graph model
Experiment: • add M ~ E1/2 random edges to an M-by-M grid graph • use 2-way DFS to find path Surprising result: Finds short paths in ~ E1/2 steps!
Finding a path in a small-world graph is much easier than finding a path in a grid graph Conjecture: Two-way DFS finds a short st-path in sublinear time in any small-world graph
Introduction Motivating example Grid graphs Search methods Small world graphs Small-world Conclusion s
Evidence in favor 1. Experiments on many graphs 2. Proof sketch for grid graphs with V shortcuts • step 1: 2 E1/2 steps ~ 2 V1/2 random vertices
• step 2: like birthday paradox two sets of 2V randomly chosen vertices are highly unlikely to be disjoint
t
1/2
s
Path length? Multiple searchers revisited? t
Next steps: refine model, more experiments, detailed proofs
More questions than answers
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Answers • Randomization makes cost depend on graph, not representation. • DFS is faster than BFS or UF for finding paths in grid graphs.
• Two DFSs are faster than 1 DFS — or N of them — in grid graphs. • We can find short paths quickly in small-world graphs Questions • What are the BFS, UF, and DFS constants in grid graphs? • Is there a sublinear algorithm for grid graphs?
• • • • • • •
Which methods adapt to directed graphs? Can we precisely analyze and quantify costs for small-world graphs? What is the cost distribution for DFS for any interesting graph family? How effective are these methods for other graph families? Do these methods lead to faster maxflow algorithms? How effective are these methods in practice? ...
Introduction Motivating example Grid graphs Search methods Small world graphs Conclusion
Conclusion: subtext revisited The scientific method is necessary in algorithm design and implementation
hypothesis
Scientific method • create a model describing natural world • use model to develop hypotheses
• run experiments to validate hypotheses • refine model and repeat
model
Algorithm designer who does not run experiments risks becoming lost in abstraction Software developer who ignores resource consumption risks catastrophic consequences We know much less than you might think about most of the algorithms that we use
experiment