Study guide on correctness proofs of greedy algorithms, Fall '08, McConnell
Updated on 10/14/08

In Chapter 4, we will skip Sections 4.3 and 4.9.
Optimization Problems

An optimization problem has the following ingredients:

• A set of constraints that a solution must satisfy. A solution is feasible if it satisfies the constraints.

• An objective function that tells, for every feasible solution, how good the solution is.

• An optimum solution to an instance of the problem is a feasible solution that scores at least as well as every other feasible solution to the instance. There may be many optimum solutions. The goal of an algorithm is to produce any optimum solution.

The algorithm is correct if, for every instance of the problem, it either declares that the instance has no feasible solution, or else outputs some optimum solution.

Example: Given the positions of a set S of gas stations along a route and a car with a gas tank that has a range of d miles, find a minimum-sized subset X of S that you can stop at without running out of gas. Assume you start out with a full gas tank.

• The constraint is that no two consecutive elements of X, counting the start and the destination as stops, can be separated by more than d miles.

• A feasible solution is a set A of gas stations that meets the constraint, that is, a set of stops that won't leave you stranded by the roadside.

• A feasible solution exists if no two consecutive elements of S, again counting the start and the destination, are separated by more than d miles.

• The objective function is the number of elements in a feasible solution A; the smaller its value, the better A is.

• An optimum solution is a feasible set X of gas stations that is at least as small as every other feasible set A of gas stations. (It minimizes the number of stops you have to make.)
Proving the Correctness of a Greedy Algorithm for an Optimization Problem

Here is an algorithm for finding an optimum solution X for the gas-station problem. As a base case, if the destination is within d miles, just drive to it, and return the empty set of gas stations as your optimum solution. Otherwise, select y to be the farthest gas station that lies within d miles of the start. Recurse on the remainder of the trip to get an optimum solution Z for the rest of the trip, and return X = {y} ∪ Z as an optimum solution for the whole trip.

Here's an illustration of the template for proving such an algorithm correct. For simplicity, we will always assume that a feasible solution exists, since modifying an algorithm to handle the case when one doesn't is always trivial, and it's an annoying nuisance. The template is a Frankenstein-like piece of surgery on an arbitrary optimum solution to turn it into the solution we returned. The operation shows that our solution is feasible and just as good, and therefore also an optimum solution.

Let U be an arbitrary optimum solution. Imagine it lying on an operating table.

1. First, we perform a head transplant. Let v be the first stop in U, the "head," and let W = U − {v} be the rest, the "body." Replace v with a new head, y, the first stop our algorithm chose, and we still have a feasible solution, A = {y} ∪ W. (Why is A feasible?) The head transplant didn't increase the size of the solution, and the patient is starting to look a little more like our solution.

2. Next, we perform a body transplant. Replace W with Z in A, to obtain X = (A − W) ∪ Z, where Z is what our algorithm returned in the recursive call. That is, we replaced everything but the head. By induction on the number of gas stations, the new body, Z, is an optimum solution to the recursive call. The old body, W, is a feasible solution to the recursive call. (Why?) Therefore |Z| ≤ |W|, and the body transplant didn't increase the size of the solution.

The new solution, X, is exactly what our algorithm produced, it is feasible, and no transplant made the solution any worse, so it's just as good as U. Since U was an optimum solution, so is X.

Every greedy proof that we will study follows this format. The only things that differ from one proof to the next are the answers to the kinds of questions I put in parentheses, namely, why the solution remains feasible after each transplant, and why no transplant made the solution any worse.
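To make the greedy choice concrete, here is a small iterative rendering of the algorithm in Python; it is only a sketch. The function name, the representation of stations as mile markers measured from the start, and the use of None for infeasible instances are my own choices, not anything in the notes.

    def fewest_stops(stations, trip_length, d):
        # Greedy stop selection for the gas-station problem.
        # stations: mile markers of the stations (between 0 and trip_length);
        # d: how far the car goes on a full tank.
        # Returns a minimum-size list of stops, or None if no feasible
        # solution exists.
        stations = sorted(stations)
        stops, position, i = [], 0, 0
        while trip_length - position > d:        # destination not yet in range
            farthest = None
            while i < len(stations) and stations[i] - position <= d:
                farthest = stations[i]           # farthest station still in range
                i += 1
            if farthest is None or farthest == position:
                return None                      # stranded: no way to make progress
            stops.append(farthest)
            position = farthest                  # fill the tank and keep driving
        return stops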
Interval Scheduling (Section 4.1)

We're given a set S of intervals, and we have to find a maximum-size subset of intervals that don't intersect. Here a feasible solution is any set of pairwise disjoint intervals, and the objective function is the number of intervals in the set.

Here's a greedy algorithm: As a base case, if there are no intervals, return the empty set; this is the best we can do. Otherwise, select the interval y with the earliest ending time, toss out all intervals that intersect y, recurse on the remaining intervals to get an optimum solution Z to this recursive call, and then return X = {y} ∪ Z as our solution.

For the proof, let U be an optimum solution. Let v be the earliest-ending interval in U. The head is v and the body is W = U − {v}. Perform the head transplant: swap v with y to obtain A = {y} ∪ W. This is still feasible. (Fill in the reasons here.) Next, perform the body transplant: swap Z for W in A to obtain the solution X returned by our algorithm. By induction, Z is an optimum solution to the recursive problem, and W is a feasible solution to the recursive problem, so |Z| ≥ |W|. The new solution X is feasible (fill in the reasons here), at least as large as U, and therefore also an optimum solution.

For each greedy proof you read in the book, try to see if you can translate the authors' argument into a variant of this story, with only a few details changed to fit the problem at hand.
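A minimal Python sketch of the interval-scheduling rule, written iteratively by sorting on finishing times rather than recursing. The representation of intervals as (start, finish) pairs and the convention that intervals sharing only an endpoint do not intersect are assumptions of mine.

    def max_disjoint_intervals(intervals):
        # Greedy interval scheduling: repeatedly keep the interval that ends
        # earliest and discard everything that overlaps it.
        # intervals: a list of (start, finish) pairs.
        # Returns a maximum-size set of pairwise-disjoint intervals.
        chosen = []
        last_finish = float('-inf')
        for start, finish in sorted(intervals, key=lambda iv: iv[1]):
            if start >= last_finish:          # doesn't intersect anything chosen
                chosen.append((start, finish))
                last_finish = finish
        return chosen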
Minimum Spanning Tree

An undirected graph satisfies one of the following if and only if it satisfies all of them:

1. It is connected and has one fewer edge than it has vertices.
2. It is connected and has no cycles.
3. It has no cycles and has one fewer edge than it has vertices.
4. Between every pair of vertices, there is a unique path.
5. It has no cycles, and adding a new edge between any two vertices creates a cycle.

Convince yourself that a graph that satisfies one of these conditions satisfies all of them. To do this, try to disprove the claim with counterexamples, and explain what sabotages your efforts. A graph that satisfies one (hence all) of these conditions is called an (unrooted) tree.
For the minimum spanning tree problem, an instance is an undirected graph where the edges have weights, the output is a subset of the edges, the constraint is that the subset of edges connect the vertices, and the objective is to minimize the total weight of the selected edges. There is a feasible solution if and only if the graph is connected. An optimum solution connects the vertices and has no cycle, since, if it has a cycle, any edge of the cycle can be removed without disconnecting the vertices, yielding a better solution. The edges of an optimum solution must therefore form a tree. That's why an optimum solution for this problem is nicknamed a minimum spanning tree.

Our algorithm will contract an edge e, merging its endpoints into a single node. The edges incident to this new node are all edges except e that were previously incident to the endpoints of e.
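As a rough illustration, here is one way the contraction operation might be coded in Python, assuming the multigraph is kept as a dictionary from edge identifiers to (endpoint, endpoint, weight) triples; the representation and function name are mine, not the notes'.

    def contract(edges, e_id):
        # Contract the edge named e_id in a multigraph represented as a
        # dictionary mapping edge identifiers to (u, v, weight) triples.
        # Endpoint v is merged into u; every other edge keeps its original
        # identifier, and edges that become loops are kept so that the
        # caller can recognize and skip them.
        u, v, _ = edges[e_id]
        contracted = {}
        for eid, (a, b, weight) in edges.items():
            if eid == e_id:
                continue                       # e itself disappears
            a = u if a == v else a             # redirect endpoints of other edges
            b = u if b == v else b
            contracted[eid] = (a, b, weight)
        return contracted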
[Figure: a weighted multigraph on vertices s, t, u, v, w, x, y, z with edges e1 through e7, shown first in its original form, then after contracting e1, and then after also contracting e2.]
The figure shows the result of a contraction on e1 followed by a contraction on e2. One problem is that this can result in a multigraph, where a pair of vertices can have multiple edges between them, and an edge can be a loop, where its two endpoints are the same vertex. To make the induction go through, if we assume that our recursive call operates on a multigraph, we have to show that the main call works on a multigraph too. As is often the case with induction, it is easier to solve a more general problem in order to get a more general induction hypothesis. In this case, we find a minimum spanning tree of a multigraph. This solves our original problem on graphs, since a graph is just a special case of a multigraph.

The goal of our algorithm is to return the identifiers (names) of the edges of the MST of the original graph. When we contract an edge, the endpoints of some other edges can change, but we can still keep track of which edges of the original graph they correspond to, since they retain their original identifiers.

Our algorithm works by induction on the number of vertices. As a base case, if there is only one vertex in our multigraph, there are zero edges in the spanning tree, so return the empty set. Suppose our multigraph G has n > 1 vertices. Partition the vertices into two nonempty sets S and T of your choosing. Such a partition is called a cut. Find a minimum-weight edge e that crosses the cut. Choose e to be in your minimum spanning tree. To obtain the rest of the tree, contract e to obtain a multigraph G′. Since this has one fewer vertex, we can make a
recursive call on G′. Adding the identifier of e to the list of edges this call returns, and returning the result, gives a minimum spanning tree of G.

For the proof, let U be any optimum solution (the edge set of an MST), and let X be the set of edges our algorithm picked. As before, we will show how to perform two transplants on U to turn it into X without making the solution any worse (heavier). This will show that X is also an MST. Recall that e, the edge our algorithm chose, is a minimum-weight edge across the cut {S, T}. Adding e to U creates a cycle C, and since the endpoints of e are on opposite sides of the cut, C must re-cross the cut on some other edge f. The old head will be f, the new head will be e, the old body will be W = U − {f}, and the new body will be the result Z = X − {e} returned by our recursive call. This summarizes our surgical tasks.

For the head transplant, removing f from U ∪ {e} doesn't disconnect it, because the endpoints of f are still connected by the rest of C. The new set is connected and has n − 1 edges on n vertices, so it is once again a spanning tree. Moreover, since e and f both cross {S, T} and e is a minimum-weight edge that crosses {S, T}, the weight of f is at least the weight of e, so (U ∪ {e}) − {f} = W ∪ {e} is also a minimum spanning tree.

For the body transplant, contracting e makes W connected, since W ∪ {e} is connected in the original multigraph. It has n − 2 edges, and since it spans G′, which has n − 1 vertices, it is a tree. By our induction hypothesis that the algorithm is correct for smaller multigraphs, the recursive call returned a minimum spanning tree Z of G′. Since W is a spanning tree of G′ and Z is a minimum spanning tree of G′, the weight of Z is at most the weight of W. Uncontracting e and adding it to Z keeps everything connected: X = {e} ∪ Z has n − 1 edges and connects G, which has n vertices, so it is therefore a spanning tree of G. Since the weight of Z is at most the weight of W, this tree X = Z ∪ {e} weighs at most what the tree W ∪ {e} did. X is what the algorithm returned for the original multigraph G, and it is a minimum spanning tree.

Efficient implementation

Let's call the above algorithm the generic algorithm. The generic algorithm gives you complete freedom in picking the cut {S, T}. To get a good time bound, we will exercise this freedom to get special cases of the generic algorithm that can be implemented efficiently.

Kruskal's algorithm picks the minimum-weight edge uv that isn't a loop. There is always a cut {S, T} that this edge crosses; for instance, S = {u} and T = V − {u} is such a cut. This edge is therefore fair game for contraction under the generic algorithm. The algorithm uses the union-find structure to figure out efficiently whether an edge is a loop.

In Prim's algorithm, a privileged vertex s is passed into the call. It picks a minimum-weight non-loop edge sx incident to s to contract. The cut {{s}, V − {s}} is an example of one that sx crosses, so it's fair game under the generic algorithm. After contracting sx to obtain a new vertex s′, it passes s′ as the privileged vertex to the
recursive call. To find a minimum-weight edge incident to s efficiently, it uses a priority queue on V − {s}, where the heap key of each vertex y is the weight of the minimum-weight edge between y and s. When we contract sx to obtain s′, we have to remove x from the priority queue and possibly decrease the heap keys of neighbors of x so that they reflect the least-weight edge to s′. We can charge these heap operations, at O(log n) apiece, to the edges incident to the vertex x that gets merged into s. This happens only once per edge, so the total time is O(m log n).
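Here is a Python sketch of Prim's algorithm in this spirit. Instead of the decrease-key operation described above, it uses lazy deletion, pushing duplicate heap entries and skipping stale ones, which keeps the O(m log n) bound; the adjacency-map representation and function name are my own assumptions.

    import heapq

    def prim_mst(adj, s=0):
        # Prim's algorithm with a binary heap on a connected undirected graph.
        # adj maps each vertex to a list of (weight, neighbor) pairs.
        # Returns the MST as a list of (u, v, weight) edges.
        visited = {s}
        heap = []
        for w, v in adj[s]:
            heapq.heappush(heap, (w, s, v))
        mst = []
        while heap and len(visited) < len(adj):
            w, u, v = heapq.heappop(heap)
            if v in visited:
                continue                   # the edge became a loop: skip it
            visited.add(v)                 # "contract" uv into the grown component
            mst.append((u, v, w))
            for weight, x in adj[v]:
                if x not in visited:
                    heapq.heappush(heap, (weight, v, x))
        return mst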
Single-source shortest paths

The strategy of Dijkstra's algorithm for finding the distances from a vertex s to all others in an undirected graph is almost the same as Prim's. All edge weights must be non-negative, however. Only the strategy for updating the weights for the recursive call on G′ is different. As before, we pick a minimum-weight edge sx and contract it. Edge sx is obviously a least-weight (shortest) path from s to x. This creates a new vertex s′. Some of the edges out of s′ were previously out of s, and some were out of x. For the recursive call on G′, we must add the weight of sx to the ones that were out of x, because to use them in G, we must first go through sx. Other than this detail, when the graph is undirected, the algorithm is the same as Prim's, and so is the proof. However, it generalizes to directed graphs. Work out the details on your own, using these remarks and the book for help. A possible exam question is to ask you to write out a proof of correctness of Dijkstra's algorithm, even though I won't go over it in class.

Incidentally, when Dijkstra first published this algorithm, he pointed out that in the special case where all the edges have weight 1, you can replace the priority queue with just a simple queue, giving an O(n + m) algorithm. Thus was born BFS.
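For comparison, here is a Python sketch of Dijkstra's algorithm with the same kind of heap; the only change from the Prim sketch above is that the heap key of a vertex is its tentative distance from s rather than the weight of a single edge. The representation assumptions (vertices 0..n−1, adjacency lists of (weight, neighbor) pairs) are mine.

    import heapq

    def dijkstra(n, adj, s):
        # Single-source shortest-path distances from s, assuming all edge
        # weights are non-negative.  adj[u] is a list of (weight, v) pairs;
        # works for directed or undirected graphs.
        INF = float('inf')
        dist = [INF] * n
        dist[s] = 0
        heap = [(0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                      # stale heap entry: skip it
            for w, v in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w           # shorter path to v found
                    heapq.heappush(heap, (dist[v], v))
        return dist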
Scheduling for minimum lateness

A bunch of customers' jobs are waiting for you to do them. Each job requires a certain number of time units to complete, and has a deadline for it to be done by. Once you've started a job, you have to finish it. If you finish a customer's job more than k time units past the deadline, the customer will complain and you will be fired. To avoid this, if possible, you come up with a strategy that minimizes the maximum tardiness of any job — you don't care how many customers go away grumpy, as long as you don't get fired. (I'm convinced that the construction firm that's building our new building uses this algorithm.) If this algorithm produces a schedule where no job is more than k time units late, great; otherwise,
you should visit the unemployment office during your lunch hour.

All that's required is to know an optimum order in which to schedule the jobs. You start out at time T0. As a base case, if there are no jobs, return the empty list. Otherwise, schedule the job with the earliest deadline first, then recurse on the remaining jobs with a new T0 equal to the old one plus the time required by the first job, to get the rest of a tardiness-minimizing schedule.

For the proof, let (j1, j2, . . . , jn) be an optimum solution. This time, instead of cutting and adding things, the strategy of our proof will be to shuffle the order around in ways that don't hurt the maximum tardiness, yielding the order returned by the algorithm. Let jk be the job with the earliest deadline. Let's show that we can move jk into the first position without hurting the maximum lateness. Suppose you are the job jk with the earliest deadline, and I'm the one just ahead of you in the queue. If we swap positions, then I will finish in the new schedule at the time when you finished in the old schedule. Since my deadline is at least as late as yours, I'm no more tardy in the new schedule than you were in the old. You aren't any tardier, since you are now getting done earlier. You should cut ahead of me in line. Iterating this argument until you cut all the way to the front of the line, we get a schedule where you're first, and this new schedule is no worse than (j1, j2, . . . , jn).

Similarly, the rest of this new schedule is a feasible solution to the recursive problem. Replacing it with the order of the remaining jobs produced by the recursive call, which is an optimum solution to the recursive problem, doesn't cause any harm. The result is the solution to the original problem returned by our algorithm, and it's no worse than the optimum solution (j1, j2, . . . , jn) we started out with.
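A short Python sketch of this earliest-deadline-first rule, written as a sort rather than a recursion; the (duration, deadline) representation and the choice to return the maximum lateness alongside the order are my own.

    def min_max_lateness(jobs, t0=0):
        # Schedule jobs to minimize the maximum lateness.
        # jobs: a list of (duration, deadline) pairs; t0: the starting time.
        # Returns the schedule order and the maximum lateness it incurs.
        order = sorted(jobs, key=lambda job: job[1])   # earliest deadline first
        t, max_late = t0, 0
        for duration, deadline in order:
            t += duration                              # this job finishes at time t
            max_late = max(max_late, t - deadline)     # lateness (0 if on time)
        return order, max_late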
Huffman codes

The following gives a Huffman tree for a message of length 100, where a occurs 45 times, b occurs 13 times, c occurs 12 times, d occurs 16 times, e occurs 9 times, and f occurs 5 times:
[Figure: the Huffman tree. The root, labeled 100, has leaf child a (45) on edge 0 and an internal node labeled 55 on edge 1. The node labeled 55 has children 25 (edge 0) and 30 (edge 1). The node labeled 25 has leaf children c (12, edge 0) and b (13, edge 1). The node labeled 30 has an internal child labeled 14 (edge 0) and leaf child d (16, edge 1). The node labeled 14 has leaf children f (5, edge 0) and e (9, edge 1).]
The leaves have been labeled with the frequency of their letter, and, by induction from the leaves to the root, each internal tree node has been labeled with the sum of the labels of its two children. Convince yourself that this compels each internal node to be labeled with the sum of occurrences of its leaf descendants.

The bit representation of each letter is the sequence of edge labels on the path from the root to the letter. For instance, the bit representation of b is 101 and the bit representation of e is 1101. Suppose the message begins with 101011010110011111011001010 . . .. Start at the root and follow the path given by the initial bits of the string, until you reach a leaf. In this case, 101 takes you to b, so the first letter of the message is b. Recurse on the rest of the bit string to get the rest of the message. Make sure you can decode the above bit string as baeafdecba . . ..

The reason it's possible to decode the message is that no letter's leaf is an ancestor of another's. You can always tell when you've finished a letter. Another way to say this is that no letter's encoding is a prefix of another's.

The length of the encoded message is obtained by summing, over each character, the number of occurrences times its depth in the tree. Convince yourself that this is compelled to be equivalent to just adding up the labels at all the tree nodes, excluding the root. The length of the encoding of this message using the tree in the figure is 224 bits, which compares favorably with the 300 bits it would take if each character required three bits, which is what it would need if we encoded all characters with equal-length codes. The saving comes from giving more frequent characters shorter representations, at the expense of less frequent characters, which get longer representations.

We want to build a tree that minimizes the length of the encoding. Here is Huffman's algorithm. We work by induction on the size of the alphabet. In the base case, there are two letters; create a tree with one root and two leaves. Otherwise, find the two least-frequent characters, call them e and f, and replace all occurrences of e and f in the message with a new character, x. This
reduces the size of the alphabet by one. Recursively build an optimum tree for this reduced message. Then make the two least-frequent characters, e and f, the children of the node corresponding to x to get an optimum tree for the original message.

For the proof, note that the algorithm makes e and f siblings. For the head operation, we show that we can take an arbitrary optimum tree T and shuffle it so that e and f are siblings. Let nodes v1 and v2 be two deepest siblings in T. Swap e with the letter in v1 and swap f with the letter in v2. Convince yourself that the resulting tree, T′, can be no worse than T. For the body operation, trim off e and f from T′, labeling their parent, which is now a leaf, with the new letter x. This gives an encoding tree T1 for the reduced message. Let k be the total number of occurrences of e and f. The length of the message encoded by T1 is shorter by k bits than the one encoded by T′. By induction, a recursive call on the reduced message produces a tree T2 that is at least as good as T1, since they are both encoding trees for the reduced message and T2 is an optimum one. Adding e and f back as children of x increases the length of T2's message by k again. The final tree T″ is therefore at least as good as T′, which was at least as good as the optimum tree T we started with, so T″, the tree the algorithm returned, is also optimum.
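Here is a Python sketch of Huffman's algorithm using a priority queue of subtrees, which is the usual way to carry out the "merge the two least-frequent characters" step. The representation of subtrees and the tie-breaking counter are my own; ties are broken arbitrarily, so the resulting codes (though not the encoded length) may differ from the figure.

    import heapq
    from collections import Counter

    def huffman_codes(message):
        # Build a Huffman tree for `message` and return the bit string for
        # each character.  A subtree is either a single character or a pair
        # (left, right); each heap entry is (frequency, tie-breaker, subtree).
        counts = Counter(message)
        heap = [(freq, i, ch) for i, (ch, freq) in enumerate(counts.items())]
        heapq.heapify(heap)
        if not heap:
            return {}
        next_id = len(heap)
        while len(heap) > 1:
            f1, _, t1 = heapq.heappop(heap)      # two least-frequent subtrees
            f2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
            next_id += 1
        _, _, root = heap[0]

        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")      # left edge labeled 0
                walk(tree[1], prefix + "1")      # right edge labeled 1
            else:
                codes[tree] = prefix or "0"      # one-letter alphabet edge case
        walk(root, "")
        return codes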