Finding Optimum Branchings R. E. Tarjan Stanford University Stanford, California
ABSTRACT
Chu and Liu, Ecbnonds, and Bock have independently devised an e f f i c i e n t aZgorYithm t o f i n d an optimum branching i n a d i rected haph. We give an impzementation of the algorithm which runs i n O(m2ogn) time i f the problem graph has n vertices and m edges. A modification for dense graphs gives a running time 2
of O(n 1. We also show that the unmodified algorithm runs i n 2 O(n(Zogn ) + m) time on an average graph, asswning a uniform probabi Z i t y distribution. 1.
INTRODUCTION
A d i r e c t e d graph G = ( V I E ) c o n s i s t s of a set of vertices V of s i z e IVl = n and a set of edges E of s i z e IE1 = m. Each
edge is an ordered pair ( v , w ) of d i s t i n c t vertices v , w , called t h e endpoints of t h e edge. A semipath j o i n i n g x1 and 5 i n G
,...,\
i s a sequence of vertices x ,x 1 2 or ( X ~ + ~ , X E~ E ) f o r each i, 1< i
such t h a t (x i' xi+l) E E A set of v e r t i c e s W C V
i s weak29 connected i f t h e r e i s a semipath of v e r t i c e s i n W j o i n i n g any pair of vertices i n W. A maximal weakly connected s e t of vertices i s a weakly connected component of G . A path from x1 t o x i n G i s a sequence of edges k The p a t h i s simple i f x1 (x,-~,\). (x,,x2) (x2,x,)
...
,...,5-1
cycle i f x1 = 5. A s e t of vert i c e s S C V i s strongly connected i f t h e r e i s a p a t h whose edges
are distinct.
The p a t h i s a
have endpoints i n S from any v e r t e x i n S t o any o t h e r v e r t e x i n S. A maximal s t r o n g l y connected s e t i s a s t r o n g l y connected component of G. Networks, 7: 25-35 @ 1977 by John Wiley
&
Sons, Inc.
25
26
TARJAN A
bPU?zChing B of
(i) i f ( x l , y l ) ,
G i s a s e t of edges such t h a t
(x2 , y 2 ) a r e d i s t i n c t edges of B t h e n y1 # y 2 ;
(ii) B does n o t c o n t a i n a c y c l e . Given a real v a l u e c(v,w) d e f i n e d f o r each edge of G, w e d e s i r e
1
a branching B such t h a t
c(v,w) i s maximum.
Such a s e t
(vtw) EB
B is c a l l e d an optimum branching. Chu and Liu [31, Edmonds [51, and Bock [ l l have independ e n t l y given e f f i c i e n t a l g o r i t h m s f o r f i n d i n g an optimum branchi n g i n a graph G. The C h u - L i u and Edmonds algorithms are v i r t u a l l y i d e n t i c a l ; t h e Bock algorithm is s i m i l a r b u t s t a t e d as an a l g o r i t h m on matrices r a t h e r t h a n on graphs. This p a p e r g i v e s an e f f i c i e n t implementation of the Chu-Liu-Edmonds algorithm. I f G h a s n vertices and m edges t h e algorithm r u n s i n O(m1ogn)
time. time.
2 A m o d i f i c a t i o n f o r dense graphs g i v e s an O(n ) running
Also, assuming a uniform p r o b a b i l i t y measure, t h e algo-
r i t h m r u n s i n 0 (n ( l o g n ) + m) t i m e on t h e average. These t i m e bounds assume m > k n f o r some c o n s t a n t k. (This is not a serious > n-1. r e s t r i c t i o n ; a n y w e a k l y connected graph h a s m L
2.
AN ALGORITHM FOR OPTIMUM BRANCHINGS
L e t G = (V,E) be a weakly connected d i r e c t e d graph, having edge v a l u e s c(v,w).. The Chu-Liu-Edmonds a l g o r i t h m f i n d s an optimum b r a n c h i n g i n G. The a l g o r i t h m c o n s t r u c t s a s e t of edges H d e f i n i n g a subgraph G (H) = (V,H) A t a l l t i m e s d u r i n g the c o n s t r u c t i o n , G ( H ) i s such t h a t f o r each s t r o n g l y connected component s, t h e r e i s a t m o s t one edge (v,w) E H such t h a t W E S , veV-S. I f S i s such t h a t no edge (v,w) s H s a t i s f i e s w s S , v s V - S w e c a l l S a root component of G ( H ) . I n i t i a l l y H = 53; t h u s i n i t i a l l y each v e r t e x i n V d e f i n e s b o t h a weakly connected and a s t r o n g l y connected component of G(H). Here i s t h e algorithm for c o n s t r u c t i n g t h e s e t H.
.
A Zgozvithm BRANCh:
First Step: F1: F2: F3:
Choose any v e r t e x v w i t h an edge ( x , v ) of v a l u e c ( x , v ) > 0. S e l e c t t h e edge ( u , v ) of l a r g e s t v a l u e . Add ( u , v ) t o H.
FINDING OPTIMUM BRANCHINGS
27
General Step: GI:
G2 : G3 : G4 :
G5 :
Choose any root component S of G(H) having an unexamined edge (x,v) with v E S and c (x,v) > 0. Find the largest unexamined edge (u,v) such that v E S. If U E S , discard the edge and stop. Otherwise go to G4. u $ s . Let W be the weakly connected component of G(H) containing v. If u k W add (u,v) to H and stop. Otherwise go to G5. u$S, U E W . Find the sequence s,l (x1IY11 ,S2, (X2,YZ), #Sk’(\,y,) such that
-
each S. is a strongly connected component of 1
G(H)t all i,
(Xi,Yi) E: H, Yi
sk = s,
(\,yk)
E
Sit and Xi
E
Si+l for
= (u,v), and xk
’1’ G6 : Find the edge (x ,y.) with minimum value among the (xi,yi) j i G7 : For each unexamined edge (x,y) with yesi,
G8 :
modify the value of (x,y) as follows: c(x,y) := c(x,y) -c(xi,yi) +c(x ,y.). j i Add (u,v) to H. (This combines S1, ,Sk into
...
a single strongly connected component which is a root component of G(H). Repeat the general step until there is no root component S of G(H) having an unexamined edge (u,v) with v E S and c(u,v) > 0. Suppose this algorithm is applied to a graph G. The following results are implicit in the work of E w n d s and Karp [5,81. Let G(H) = (V,H).
Lema 1: Each strongly connected component S of G ( H ) has a t most one edge ( u , ~ )E H with v ES, u E V-S. Each weakly connected component W of G ( H ) contains exactly one root component. Lema 2: Let S be any root component of G ( H ) , l e t W be the weakly connected component containing S, and l e t v s S . Then f o r any w E: W there i s a unique simple path i n G ( H ) from v t o w.
28 TARJAN These lemmas follow easily by induction on the number of edges added to H. When the algorithm is completed, H contains an optimum branching, which we can extract with the help of Lemma 2. All we need to do is determine a unique vertex v in the root component of each weakly connected component of G(H); these vertices determine a branching consisting of the simple paths given by Lemma 2. If we determine these vertices using the following algorithm, the resultant branching is optimum [5,81.
A l g o s t h m ROOT: R1: R2:
R3: R4:
Find a root component R in G(H) containing more than one vertex. Find the sequence S1,(xl,yl) Sk, (x,,yk) determined
...
in BRANCH, Step G5, such that edge (\,yk) was added to H to form R. Find the edge (x ,y.) of minimum value among the (Xi,Yi). j i Delete (x.,y.) from H (this step makes S a root 7 7 j component).
Repeat Steps R1-R4 until every root component consists of a single vertex. If BRANCH keeps track of each minimum edge (x ,y,) found j i in Step G6 and of each strongly connected component formed in Step G8, then we do not need to actually carry out ROOT; BRANCH can explicitly keep track of the root vertex which would be returned by ROOT for any given root component. Our implementation uses this observation. 3.
EFFICIENT IMPLEMENTATION
We need several bookkeeping mechanisms to implement BRANCH efficiently. we must keep track of (a) the weakly connected components of G(H), (b) the strongly connected components of G(H), and (c) the unexamined edges entering each strongly connected component. For (a) and (b) we use a disjoint set union algorithm descrl.bed in [7,131. Given a collection of disjoint sets, the set union algorithm implements two operations:
FINDING OPTIMUM BRANCHINGS
29
(i) FIND(X) returns the name of the set containing element x; (ii) UNION(A,B) adds the elements of set A to set B, destroying set B. The time required for 0(m) FINDS and 0(n) UNIONS is O(n log* n+m)
*
[131, where log n = minIi
1 log .log ... log n<-l).
We use one
i times collection of sets, manipulated by WIND and WUNION, to represent the weakly connected components, and another collection of sets, manipulated by SFIND and SUNION, to represent the strongly connected components. To keep track of the edges entering each strongly connected component, we use a priority queue mechanism described in [2,91. Given a collection of elements, each with a value, and a collection of disjoint sets (called queues) of elements, the mechanism implements four operations: (iii) QUNION(C,D) adds the elements in queue D to queue C. Time required: 0 (log ICl + log ID1 ) (iv) MAX(C) returns the largest element in queue C, deleting this element from the queue. (If C is empty, MAX(C) returns a dummy element with value -a.) Time required: 0(log IC 1. (v) INIT(C,L) initializes a queue C to contain all elements in the list L. Time required: 0( L ) (vi) ADD(a,C) adds a constant a to the value of all elements in queue C. Time required: 0(1)
.
1
I 1 .
.
Here is an implementagion of the optimum branching algorithm, expressed in Algol-like notation. In the program, the set roots gives the root components of G(H) which may have entering edges of positive value. The set rset gives the root components of G(H) with no entering edges of positive value. For each i, enter(i) gives the unique edge in H entering strongly connected component i. (If there is no such entering edge, enter(i) = ( 0 , O ) .) The array mh(i) gives, for each i, the root vertex which algorithm ROOT will select if applied to root component i. The variable U a Z and vertez are used to select and record the minimum edge found in Step G6. The algorithm assumes that the graph G = (V,E) has vertex set V = (1,2,. ,n), and that, for each j , I ( j ) = ((i,j) E E ) is an incidence list for vertex j .
..
30 TARJAN BRANCH:
&Ok&%m
begin
:= 8 ; 60k i := 1 u n t i l n
roots
do
begin INIT(i,I
(ill;
i n i t i a l i z e an S-set named i c o n t a i n i n g i as i t s only element; i n i t i a l i z e a W-set named i c o n t a i n i n g i as i t s only element; e n t e r ( i ) := (0,O); roots := roots u { i l i m i n ( i ) := i;
end ; H := 8;
r s e t := 8; wkiee roots #
do
begin d e l e t e some e n t r y k from roots; ( i , j ) := MAX(k); i d v ( i , j ) < 0 Xthen rset := r s e t U { k ) e&e i d SFykD(i) = k lthen roots := POOtsU
{k)
e h e begin H := H U { ( i , j ) } i
.id
WFIND(~)
# WIND(^) ,then
begin WUNION (WIND(i) ,WIND (j) ) ; e n t e r ( k ) := ( i , j ) i
end &he
begin vaZ :=
01;
(x,y) := ( i , j ) ; lokiee (x,y) # ( 0 ~ 0 )do
begin id c ( x , y ) begin vaZ
<
vaZ t h e n
:= c ( x , y ) i
vertex
:= SFIND(y) ;
end i
(x,y) := enter (SFIND(x) 1 ;
end; ADD(VaZ
- c ( i , j ), k ) ;
:= min ( v e r t e x )i (x,y) := enter (SFIND (i)) ;
min ( k )
wkiee (x,y) # (0,O) do
FINDING OPTIMUM BRANCHINGS
begin
31
-
ADD ( V a l c (x,y) ,SFIND (y) ) ; QUNION (k,SFIND (y) ) ; SUNION (k,SFIND (y) ) ; (x,y) := enter(sFrND(x));
end; roots := roots end end end end
u(k};
BRANCH;
After BRANCH i s executed, t h e s e t of v e r t i c e s i E: P S e t ) defines an optimum branching, which may be found i n O(n+m) time by using a search [121 t o f i n d t h e s e t of simple paths from t h e v e r t i c e s i n R t o a l l o t h e r v e r t i c e s . The operations i n algorithm BRANCH a r e of t h r e e types: R =
(a) (b)
(c)
(min(i)
I
p r i o r i t y queue operations; d i s j o i n t s e t operations; o t h e r operations.
The p r i o r i t y queue operations required a r e : O(n) I N I T ope r a t i o n s , one f o r each vertex; O(m) MAX operations, O(n) QUNION operations, and O(n) ADD operations. Thus t h e t o t a l t i m e f o r t h e p r i o r i t y queue operations is O ( m log n). The d i s j o i n t set operat i o n s required are: O(m) WINDS, O(m) SFINDs, O(n) WUNIONs, and O h ) SUNIONs (the s e t H contains a t most 2n-2 edges). Thus
*
t h e t o t a l t i m e f o r t h e s e t operations i s O(n log n+m). The t o t a l t i m e f o r o t h e r operations i s c l e a r l y O(n+m). Thus t h e t o t a l t i m e required t o f i n d an optimum branching i s O(m1ogn) (assuming m > kn f o r some constant k). The space required i s 0 (m)
.
4.
A MODIFICATION FOR DENSE GRAPHS
The time bound given i n Section 2 f o r t h e optimum branch2
ing algorithm i s O(n l o g n ) i f t h e graph G is dense; i.e., m 2 i s n(n ) . However, by changing t h e p r i o r i t y queue representa2 t i m e . This t i o n , we can g e t t h e algorithm t o run i n O(n 2 modification i s analogous t o t h a t used i n a well-known O(n 1 m i n i m u m spanning t r e e algorithm [4,101. We use a d i f f e r e n t mechanism t o represent the p r i o r i t y queues. For each strongly connected component S, we have. a l i s t of edges ( i , j ) , a t most one f o r each i, such t h a t i t s , j E S , and f o r a l l edges ( i , k ) such t h a t k E S , c ( i , k ) L c ( i , j ) . This l i s t is ordered on i, and takes t h e place of t h e p r i o r i t y queue f o r component S. Associated with each edge ( i , j ) i n t h e l i s t i s i t s c u r r e n t value c ( i , j ) .
32
TARSAN
Since such a l i s t has l e n g t h a t m o s t n , a MAX operation req u i r e s O(n) t i m e . Furthermore, s i n c e every edge ( i , j ) on such a l i s t has i !j S , algorithm BRANCH executes only O(n) MAX opera2 t i o n s , and O(n ) time t o t a l is required f o r t h e MAX operations. A QUNION(C,D) operation r e q u i r e s O(nt t i m e p l u s t i m e f o r O h ) SFINDs. Thus t h e t o t a l t i m e f o r t h e QUNION operations i s 2 2 2 O(n 1. (O(n ) SFINDs p l u s O(n) SUNIONs r e q u i r e O(n ) t i m e [131.) A n I N I T ( IL1) operation r e q u i r e s 0 (n) t i m e s i n c e ILI < n ( t h e edges ( i , j ) E L may be s o r t e d on t h e value of i i n OTlLl + n) t i m e by using a r a d i x s o r t [ 9 1 ) . Thus t h e t o t a l t i m e f o r t h e 2 I N I T operations i s O(n 1 , and t h e running t i m e of t h e modified 2 version of BRANCH i s O(n 1 . The space required i s s t i l l O ( m ) .
5.
AVERAGE TIME ANALYSIS
I n this s e c t i o n w e show t h a t the average running t i m e of 2 t h e unmodified version of BRANCH i s O(n(1og n ) + m) assuming a uniform p r o b a b i l i t y measure. The a n a l y s i s i s analogous t o t h a t used i n [ l l l t o show t h e existence of an a l l - p a i r s s h o r t e s t path 2 algorithm with an average running t i m e of O ( ( n 1 o g n ) 1 . L e t G be s e l e c t e d according t o some p r o b a b i l i t y d i s t r i b u -
(n-l) ) l a b e l l e d graphs having v e r t e x set m {1,2, n} and m edges. A s s u m e t h a t , f o r a l l j , t h e probabili t y of t h e e x i s t e n c e of an edge ( i , j ) i n G is t h e same f o r each i. For each j , 1 < j < n , l e t p be a real-valued p r o b a b i l i t y
t i o n from among t h e (
...,
j d i s t r i b u t i o n , and l e t t h e values of t h e edges ( i , j ) i n G be independent v a r i a b l e s with p r o b a b i l i t y d i s t r i b u t i o n p j' With t h i s d e f i n i t i o n of a random graph, t h e p r o b a b i l i t y t h a t t h e r e e x i s t s a n edge ( i , j ) E E and t h a t ( i , j ) has maximum value among edges i n t h e s e t { ( k , j ) ( k , j ) E El i s t h e same f o r each i. Furthermore, a t any t i m e during t h e execution of BRANCH, t h e p r o b a b i l i t y t h a t t h e r e e x i s t s an edge ( i , j ) E E and t h a t ( i , j ) has maximum value among unexamined edges i n t h e set { ( k , j ) ( k , j ) E E ) i s t h e same f o r each i such t h a t ( i , j ) has n o t y e t been returned by a c a l l on MAX. This follows from t h e f a c t t h a t t h e only information known about ( i , j ) i s t h a t i f ( i , j ) i s an edge i n G then i t s value must be g r e a t e r than t h a t of a l l edges returned by previous MAX c a l l s on queues corresponding t o r o o t components containing v e r t e x j ; and t h i s information i s independent of i.
I
1
FINDING OPTIMUM BRANCHINGS
33
W e now d e r i v e an upper bound on t h e average number of MAX c a l l s used by BRANCH and hence on t h e average running t i m e of BRANCH. Suppose MAX(k) i s executed f o r some r o o t component k c o n t a i n i n g m vertices. Then t h e p r o b a b i l i t y of MAX(k) r e t u r n n-m i n g an edge ( i , j ) with i o u t s i d e component k i s a t least n Thus an upper bound on t h e average number of MAX(k) opera-
.
4,
t i o n s b e f o r e a new edge is added t o H, i s
The maximum number of edges which can be added t o H a f t e r some It follows t h a t an r o o t component reaches s i z e m is 2(n-m). M, t h e average t o t a l number of executions of M a x , upper bound on is
and t h e average t i m e r e q u i r e d by a l l t h e MAX o p e r a t i o n s i s 2
O(n(1og n ) 1 .
Combining t h i s w i t h t h e worst case t i m e bounds
on t h e o t h e r p a r t s of t h e algorithm g i v e s an O(n(log n ) 2 bound on t h e average running time of t h e algorithm. W e a l s o have
+
m)
and
- n-l1
var(M) <
4n
2
m = l (n-m)
2
= O(n
2
1.
Thus t h e s t a n d a r d d e v i a t i o n of M is O h ) , and t h e probabili t y t h a t t h e number of executions M of MAX i s g r e a t e r than M + k n i s O(1/k2).
Thus t h e p r o b a b i l i t y t h a t t h e running t i m e of BRANCH
i s n o t O(n(1og n l 2
+
m) i s
o
1
((log nl2)
.
34
TARJAN
6.
REMARKS
W e have p r e s e n t e d an implementation of Edmonds' optimum branching algorithm which h a s a worst-case running t i m e of 2 O(m l o g n ) , an average running time of 0 (n ( l o g n ) + m ) , and can 2 be modified t o run i n O(n ) t i m e , which i s an improvement i f t h e graph is dense. One n a t u r a l l y wonders i f t h e s e r e s u l t s are best p o s s i b l e . Recently discovered algorithms f o r f i n d i n g minimum spanning trees i n undirected graphs achieve a t i m e bound of 0 (m l o g l o g n ) 12,141 ; t h e y can be used t o f i n d minimum spanning trees i n O(n l o g l o g n + m) time on t h e average, u s i n g r e s u l t s i n 161. These r e s u l t s suggest t h e following r e s e a r c h questions: Can optimum branchings be found i n o(m1ogn) time? If n o t , how can one prove t h a t rmy optimum branching algorithm r e q u i r e s R (m l o g n ) t i m e i n t h e worst c a s e ?
1.
Bock, F., " A n Algorithm t o Construct a Minimum Directed Spanning Tree i n a Directed Network," Devezopments i n Operations Research, Gordon and Breach, New York, 1971, pp. 29-44.
2.
Cheriton, D. and R. Tarjan, "Finding Minimum Spanning Trees," SIAMJ. Cornput., 5, 1976, PP- 724-742.
3.
Chu, Y. J. and T. H. Liu, "On t h e S h o r t e s t Arborescence of a Directed Graph," S c i . Sinica, 14, 1965, pp. 1396-1400.
4.
D i j k s t r a , E. W., "A Note on Two Problems i n Connection with Graphs," NwneKsche Mthematik, 1, 1959, pp. 269-271.
5.
Edmonds, J., " O p t i m u m Branchings,"
6.
Jour. of Research of the NationaZ Bureau of Standurds, 71B, 1967, pp. 233-240.
Erdijs, P. and A. Renyi, "On t h e Evolution of Random Graphs,"
Pub. of t h e Math. I n s t . of the Hung. Acad. of Sciences, 5 , 1960, pp. 17-60.
7.
Hopcroft, J. and J. D. Ullman, "Set-Merging Algorithms,"
SIAM J . C o q u t . , 2 , 1973, pp. 294-303. 8.
"A Simple Derivation of Edmonds' Algorithm f o r Optimum Branchings," Networks, 1, 1971, pp. 265-272.
Karp, R. M.,
FINDING OPTIMUM BRANCHINGS
35
o f Computer Programring, VoZ. 3: Sorting and Searching, Addison-Wesley, Reading, Massachusetts,
9. Knuth, D., The A r t
1973. 10.
Prim, R. C., "Shortest Connection Networks and Some GeneraIizations," Be22 System Tech. J . , 1957, pp. 1389-1401.
11.
Spira, P., "A New Algorithm for Finding All Shortest Paths 2 2 in a Graph of Positive Arcs in Average Time O(n log n)," SIAM J . Cornput., 2, 1973, pp. 28-32.
12.
Tarjan, R., "Depth-First Search and Linear Graph Algorithms," SIAM J . Comput., l, 1972, pp. 146-160.
13.
Tarjan, R., "Efficiency of a Good but Not Linear Set Union Algorithm," J . ACM, 1975, pp. 215-225.
14.
Yao, A., "An 0 ( E log log Vl 1 Algorithm for Finding Minimum Spanning Trees," Info. P O C . Letters, 4 , 1975, pp. 21-23.
I 1
I
Research sponsored by National Science Foundation Grant GJ-35604x1; KZZer Research Fellowship a t the University o f California, BerkeZey; National Science Foundation G r a n t DCR72-03572 A02; and by the Office of Naval Research Contract NR 044-402 a t Stanford University, Stanford, CaZ ifornia. Paper received March 26, 1975.