The Load Distance Balancing Problem Eddie Bortnikov (Yahoo!) Samir Khuller (Maryland) Yishay Mansour (Google) Seffi Naor (Technion)
The Load-Distance Balancing Problem Given
n clients and k servers s1, s2,…sk we need to assign each client to a server. Cost for client i assigned to server sj is as follows: Cost(i) = Distance(i,sj) + Delay(j,Lj) Delay(j,Lj) is a FUNCTION of the number of clients Lj assigned to sj. OBJECTIVE: Min Max Cost(i)
An Example s2
s1 C A
s3 D
E
B
Each server has its own delay function (can be arbitrary, we just assume its non-decreasing). Note that C is closer to s1, but prefers to attach to s2 since Dist(C,s2)+Delay(s2,2)
Related Work Lots
of research on locating facilities. Here the facilities are all given; we just have to compute assignment of clients to facilities. Notion of capacities has been used for various covering problems such as Vertex Cover, K Centers, Facility Location etc.
Main Results
The problem is NP-hard. We develop a polynomial time 2 approx. We show that the bound 2 cannot be improved to 2-ε unless NP=P. With triangle inequality in the distance function the hardness reduces to 5/3-ε. When all clients and servers are on a line its solvable in polynomial time. For Min Sum Cost(i), we can solve in polynomial time using Min-Weight Matching.
NP-hardness by Exact Set Cover
Given N elements and a collection S of K sets (each set has size m). Does there exist a subset S’ of S, such that each element belongs to exactly one set in S’? In other words, we need to pick exactly N/m subsets from S, to cover each element once.
Example of Exact Cover
Here we have N=16 elements and 9 sets (m=4). The FOUR blue sets form an EXACT COVER, and we discard the FIVE orange sets.
Reduction from Exact Cover
Each element is a client. In addition we create a collection of M(K-N/m) dummy clients. Subset Sj in S corresponds to server sj.
Dist(dummy,server)=d1
Dist(i,sj) = d2 if i ε Sj, o.w. ∞
d2 >> d1
d1 Dummy clients
d2 i Clients (elements)
sj
Reduction from Exact Cover
Delay functions for servers are basically a step function. Delay(j,Lj) = Δ-d2, when load is at most m. Delay(j,Lj)= Δ-d1, when load exceeds m, but is at most M. Δ-d1
Δ-d2 m
M
Reduction from Exact Cover
Suppose there is a solution to exact cover, then there is a solution to the LDB problem with delay at most Δ. For each chosen subset Sj, the corresp. server sj gets m clients each at distance d2.Since the delay is Δ-d2, the total cost is at most Δ. For the remaining subsets, those dummies are all assigned to the remaining servers (K-N/m), each gets M dummies. The proof in the other direction requires some work!
Proof (cont.)
Each server can support at most M dummy clients if the total cost does not exceed Δ, and no more than m real clients. Suppose a server supports both real and dummy clients; then the total number of servers with real clients is k’ > N/m. These serve at most (mk’-N) dummy clients, while the rest can serve only M(k-k’) dummy clients. Adding the two shows (some algebra needed) that we can only assign < M(k-N/m) dummy clients if M>m.
Hardness Results follow…. We
can set d1=ε and d2=Δ-ε. A solution to EXACT COVER exists if and only if a solution with cost Δ exists for LDB. If there is no solution to EXACT COVER, then every solution to LDB has cost 2(Δ-ε). However, here we do violate triangle inequality in the distance function.
Hardness results with triangle inequality We
need to set d1=⅓ Δ, and d2= Δ-ε. With these parameters, the distance between a (real) client i and a server sj such that i is not in Sj, is at least 5/3Δ- ε. A solution to EXACT COVER exists if and only if a solution with cost Δ exists for LDB. If there is no solution to EXACT COVER, then every solution to LDB has cost 5/3Δ-ε.
Approximation Algorithm with factor 2
Suppose a solution exists with maximum cost Δ. For each server sj, we can compute an upper bound on the number of clients that can be served with a delay of at most Δ (say L*j). For each client i we can compute the subset of servers that are within distance Δ (S*i). Now its just a flow problem to check if an assignment exists where each client i is assigned to a server from S*i and each sj has load at most L*j. Minimizing Δ gives a trivial 2 approximation.
All servers and clients on a line Use
dynamic programming!
Minimizing the Sum of Costs
We reduce this to min cost matching in a bipartite graph. Let G=(X,Y,E) where nodes in X correspond to n clients and there are nk nodes in Y. We have n nodes corresp. to each server. We ask for a min cost matching to find a solution.
Capacitated K Centers
The related problem of choosing K facilities has been considered (each client should be assigned to a closeby facility and the load on the facility should not be too high): [Khuller & Sussman] (K,L,5R) or ((2/c)K, cL, 2R) related to K-centers clustering.
Conclusions Can
we improve the 2 approximation when triangle inequality holds? Can we improve the 2 approximation when Delay functions satisfy specific properties? What is a natural delay function? Are there other special cases that can be solved in polynomial time?