The (Conditional) Biased Coin Problem

CH

June 19, 2009
In the past few days, I've had a few phone interviews that asked some variant of the biased coin problem. A couple of them went really smoothly; others didn't, because when I'm nervous I do silly things like exchange multiplication signs with addition signs, or fail to properly reduce fractions. Like so many of my professors, when it comes to doing things like counting actual physical items (like hour-to-hour intervals on a clock face) or adding two integers, I am woefully error-prone. So I decided, once and for all, to solve the coin problem in full generality, knowing that it will be asked again; then I need only plug in some numbers and make one error-prone calculation, rather than an entire sequence (which reduces the probability of a correct answer geometrically with the number of steps in the calculation, independent of whether the mathematical reasoning behind the sequence of steps was correct).

Here's the problem. There are $m$ coins, of which $h$ are head-biased, coming up heads with probability $p > 1/2$, and $c$ are tail-biased, coming up heads with probability only $q < 1/2$ (that is, tails with probability $1 - q > 1/2$). You draw $r$ coins at random and flip each of them $k$ times. All of them come up heads $k$ times. What is the probability that at least $\ell$ of these coins are head-biased?

Granted, this question is far more complicated than the one typically asked (often $m = 5$, $h = 1$, $p = 1$, while $c = 0$ and $k \in \{2, 5\}$), so there the coin is double-headed rather than simply biased, and there's no worry about multiple anything. But the generalized simple problem must be solved first, and then we can move on to the heftier things.

So we let the event $d$ denote that a coin is biased towards heads (and $\bar{d}$ denote that it is not). $H^k$ denotes $k$ heads observed.
This is a simple application of Bayes' rule, which states:

$$P[d \mid H^k] = \frac{P[H^k \mid d]\, P[d]}{P[H^k]}$$

Then we can just plug in:

$$P[d \mid H^k] = \frac{p^k \frac{h}{m}}{p^k \frac{h}{m} + q^k \frac{c}{m} + 2^{-k} \frac{m-h-c}{m}} = \frac{h p^k}{h p^k + c q^k + (m-h-c)\,2^{-k}}$$
which, given that $p > 1/2 > q$, converges to 1 with $k$, as we expect. Since typically $p = 1$ and $q = 0$ (double-headed coins), we might write:

$$P[d \mid H^k] = \frac{h}{h + 2^{-k}(m-h-c)}$$
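This single-coin posterior is easy to check numerically. Below is a minimal Python sketch (the function name and argument order are mine, not part of the original problem) that evaluates the formula exactly with rationals; the printed values are the classic interview answers for $m = 5$, $h = 1$, $p = 1$, $k \in \{2, 5\}$:

    from fractions import Fraction

    def posterior_head_biased(m, h, c, p, q, k):
        # P[d | H^k]: probability a drawn coin is head-biased given k heads
        # in k flips. Pool: h head-biased (P(heads) = p), c tail-biased
        # (P(heads) = q), and m - h - c fair coins. Pass p and q as ints or
        # Fractions for exact arithmetic.
        num = h * p**k
        den = h * p**k + c * q**k + (m - h - c) * Fraction(1, 2)**k
        return num / den

    print(posterior_head_biased(5, 1, 0, 1, 0, 2))   # 1/2
    print(posterior_head_biased(5, 1, 0, 1, 0, 5))   # 8/9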
Now a follow-up question: you draw $r$ coins, all of which pass this test; what is the conditional probability that all are head-biased? This is given by:

$$P[d^r \mid H^k \text{ on each draw}] = \prod_{i=h-r+1}^{h} \frac{i\,p^k}{i\,p^k + c\,q^k + (m-h-c)\,2^{-k}}$$

which can be verified as follows: the first draw has $h$ possible head-biased coins, the second draw has $h - 1$, and the final draw has $h - r + 1$ head-biased coins remaining in the set. Note that the tail-biased and fair counts, $c$ and $m-h-c$, are the same in every factor: each factor conditions on all previous draws being head-biased, so only head-biased coins leave the pool.
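A hedged sketch of this product, too (again, my own naming); it reduces to the single-draw posterior at $r = 1$, and the example prints a value we will meet again in the sample question at the end:

    from fractions import Fraction

    def all_head_biased(m, h, c, p, q, k, r):
        # P[all r drawn coins head-biased | each shows k heads in k flips].
        # Each factor conditions on the earlier draws being head-biased, so
        # only the head-biased count shrinks; the c tail-biased and
        # m - h - c fair counts are unchanged from factor to factor.
        prob = Fraction(1)
        for i in range(h - r + 1, h + 1):
            num = i * p**k
            den = num + c * q**k + (m - h - c) * Fraction(1, 2)**k
            prob *= Fraction(num) / den
        return prob

    # 8 coins, 3 double-headed, draw 3, flip each 7 times, all heads:
    print(float(all_head_biased(8, 3, 0, 1, 0, 7, 3)))   # ~0.9318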
Now we build to the following question. In a set with no tail-biased coins (bet you can guess what the next question is), we draw $r$ coins, but ask only for the probability that exactly $f$ are head-biased, where $f \le r$. That is:

$$P[\text{exactly } f \text{ coins head-biased} \mid \text{all give } H^k]$$

Now, because we are drawing without replacement, the order of draws actually matters, so we need to sum over permutations of the draw, which we can look at as a partitioned set:

$$\{\underbrace{\bar{d}, \bar{d}, \ldots, \bar{d}}_{r-f} \mid \underbrace{d, d, \ldots, d}_{f}\}$$
So we introduce an index set $I_{f,r}$, where $I_{f,r}(j) \in \{d, \bar{d}\}$ is the biased-ness of the $j$th coin in a sequence that contains $f$ head-biased coins and $r$ total coins. Then the probability that exactly $f$ of $r$ coins are head-biased is given by:

$$\sum_{I_{f,r}} P[\text{sequence } I_{f,r} \text{ occurred} \mid \text{all in sequence give } H^k]$$
And the probability of said sequence can be given by:

$$\prod_{j=1}^{r} P[I_{f,r}(j) \mid I;\ I(j) \text{ gives } H^k]$$
Which is simple enough. We introduce a function:

$$\mathrm{ID}_d(I_{f,r}(j)) = \begin{cases} 0 & I_{f,r}(j) = \bar{d} \\ 1 & I_{f,r}(j) = d \end{cases}$$

and let $\tilde{j} = \sum_{i=1}^{j-1} \mathrm{ID}_d(I_{f,r}(i))$, the number of head-biased coins among the first $j-1$ draws. (The sum must stop at $j-1$: when weighing the $j$th draw, only the coins already removed affect the pool.) Then we can write:
$$P[I_{f,r}(j) \mid I;\ I(j) \text{ gives } H^k] = \frac{p^k(h-\tilde{j})}{p^k(h-\tilde{j}) + 2^{-k}(m-h+\tilde{j}-j+1)}\,\mathrm{ID}_d(I_{f,r}(j)) + \left(1 - \frac{p^k(h-\tilde{j})}{p^k(h-\tilde{j}) + 2^{-k}(m-h+\tilde{j}-j+1)}\right)\left(1 - \mathrm{ID}_d(I_{f,r}(j))\right)$$

Here $m-h+\tilde{j}-j+1$ counts the fair coins left before the $j$th draw: $m-h$ to start, minus the $j-1-\tilde{j}$ fair coins already removed. Which means, of course, that if there are no tail-biased coins, the probability that $\ell$ or more of $r$ draws are head-biased is given by:

$$\sum_{f=\ell}^{r} \sum_{I_{f,r}} \prod_{j=1}^{r} \left[\frac{p^k(h-\tilde{j})}{p^k(h-\tilde{j}) + 2^{-k}(m-h+\tilde{j}-j+1)}\,\mathrm{ID}_d(I_{f,r}(j)) + \left(1 - \frac{p^k(h-\tilde{j})}{p^k(h-\tilde{j}) + 2^{-k}(m-h+\tilde{j}-j+1)}\right)\left(1 - \mathrm{ID}_d(I_{f,r}(j))\right)\right]$$
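The nested sum over index sets is tedious by hand but mechanical in code. Here is a hedged enumeration sketch (names mine; exact but exponential in $r$, and it assumes $r$ is small relative to the pool so every sequence is feasible):

    from fractions import Fraction
    from itertools import combinations

    def at_least_ell_no_tails(m, h, p, k, r, ell):
        # P[at least ell of r drawn coins head-biased | all r give H^k],
        # no tail-biased coins in the pool. Enumerates each index set
        # I_{f,r} as the set of draw positions holding head-biased coins.
        half_k = Fraction(1, 2)**k
        total = Fraction(0)
        for f in range(ell, min(r, h) + 1):     # more than h heads is impossible
            for heads_at in combinations(range(r), f):
                prob, jt = Fraction(1), 0       # jt = head-biased among prior draws
                for j in range(r):              # j prior draws before this one
                    num = (h - jt) * p**k
                    den = num + (m - h + jt - j) * half_k
                    p1 = Fraction(num) / den    # P[this coin head-biased]
                    if j in heads_at:
                        prob *= p1
                        jt += 1
                    else:
                        prob *= 1 - p1
                total += prob
        return total

    print(at_least_ell_no_tails(5, 1, 1, 2, 1, 1))   # 1/2: the single-draw case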
Adding tail-biased coins back in will necessitate a third term in this probability expression, which comes from one of the coins being tail-biased but still popping up heads $k$ times. So we introduce $\mathrm{ID}_t(\cdot)$, which functions just like $\mathrm{ID}_d(\cdot)$ but with $t$'s instead of $d$'s. Similarly, we need some new notation for $\tilde{j}$, so now we let $\tilde{j}_d = \sum_{i=1}^{j-1} \mathrm{ID}_d(I_{f,r}(i))$ and $\tilde{j}_t = \sum_{i=1}^{j-1} \mathrm{ID}_t(I_{f,r}(i))$. Then we will have that the probability that $\ell$ or more of the drawn coins are head-biased is:

$$\sum_{f=\ell}^{r} \sum_{I_{f,r}} \prod_{j=1}^{r} \left[P_1\,\mathrm{ID}_d(I_{f,r}(j)) + P_2\,\mathrm{ID}_t(I_{f,r}(j)) + P_3\left(1 - \mathrm{ID}_d(I_{f,r}(j)) - \mathrm{ID}_t(I_{f,r}(j))\right)\right]$$
where:

$$P_1 = \frac{p^k(h-\tilde{j}_d)}{p^k(h-\tilde{j}_d) + q^k(c-\tilde{j}_t) + 2^{-k}(m-h-c+\tilde{j}_d+\tilde{j}_t-j+1)}$$

$$P_2 = \frac{q^k(c-\tilde{j}_t)}{p^k(h-\tilde{j}_d) + q^k(c-\tilde{j}_t) + 2^{-k}(m-h-c+\tilde{j}_d+\tilde{j}_t-j+1)}$$
and $P_3 = 1 - P_1 - P_2$. Perhaps the only item that needs justification is writing the number of remaining fair coins as $(m-h-c+\tilde{j}_d+\tilde{j}_t-j+1)$. We have $m-h-c$ fair coins to start; we "assume" that each of the $j-1$ previous draws was fair, removing $j-1$ of them, but if $\tilde{j}_d$ were actually head-biased, we can correct our estimate by an amount $\tilde{j}_d$, and if $\tilde{j}_t$ were actually tail-biased, we can correct our estimate by an amount $\tilde{j}_t$. In other words, of $j-1$ draws, if $\tilde{j}_d$ and $\tilde{j}_t$ are the head- and tail-biased coins removed, then $j-1-\tilde{j}_d-\tilde{j}_t$ is the number of fair coins removed; subtracting this gives the value subtracted from the starting number of fair coins.

Basically, any question about the number of head-biased coins appearing (say, between 3 and 5, or between 7 and 9) only changes the initial summation over $f$, so this is a nice general expression. One further question might be: rather than having $H^k$ for every coin, what if we allow that to change, having a (possibly) different $H^{\alpha}$ for each coin chosen? Then we simply replace $k$ with $K(j)$, where $K(j)$ is the number of heads appearing on the $j$th coin.
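Here, too, a direct enumeration is straightforward to code. The sketch below (my own naming; exact but slow, intended only for cross-checking small cases) walks every labeled sequence of draw outcomes and applies the $P_1$, $P_2$, $P_3$ terms above:

    from fractions import Fraction
    from itertools import product

    def at_least_ell_general(m, h, c, p, q, k, r, ell):
        # P[at least ell of r drawn coins head-biased | all r give H^k],
        # with c tail-biased coins (P(heads) = q) back in the pool.
        half_k = Fraction(1, 2)**k
        total = Fraction(0)
        # Label each draw d (head-biased), t (tail-biased), or f (fair).
        for seq in product("dtf", repeat=r):
            nd, nt, nf = seq.count("d"), seq.count("t"), seq.count("f")
            if nd < ell or nd > h or nt > c or nf > m - h - c:
                continue                          # impossible or uncounted sequences
            prob, jd, jt = Fraction(1), 0, 0      # biased coins among prior draws
            for j, label in enumerate(seq):       # j prior draws before this one
                den = ((h - jd) * p**k + (c - jt) * q**k
                       + (m - h - c + jd + jt - j) * half_k)
                if label == "d":
                    prob *= (h - jd) * p**k / den
                    jd += 1
                elif label == "t":
                    prob *= (c - jt) * q**k / den
                    jt += 1
                else:
                    prob *= (m - h - c + jd + jt - j) * half_k / den
            total += prob
        return total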
A better question is: can we approximate this for $m$ large and $r \ll h, c \ll m$? You betcha: we say that the probability that the $j$th coin of the sequence is head-biased, given $H^k$, changes negligibly with $j = 1, \ldots, r$, so first we can write that, under these conditions,

$$P_1 \approx \frac{h p^k}{h p^k + c q^k + (m-h-c)\,2^{-k}}$$

$$P_2 \approx \frac{c q^k}{h p^k + c q^k + (m-h-c)\,2^{-k}}$$

and, as always, $P_3 \approx 1 - P_1 - P_2$; these hold for all $j$. Then we can write the product over $j$ as a multinomial, that is:
$$\prod_{j=1}^{r} \left[P_1\,\mathrm{ID}_d(I_{f,r}(j)) + P_2\,\mathrm{ID}_t(I_{f,r}(j)) + P_3\left(1 - \mathrm{ID}_d(I_{f,r}(j)) - \mathrm{ID}_t(I_{f,r}(j))\right)\right] \approx P_1^f P_2^t P_3^{r-f-t}$$
where $t$ is the number of tail-biased coins in the index set $I_{f,r}$. Then, if we break our index sets down by $I_{f,r}^t$, where $t$ indicates the number of tail-biased coins in the index set, we can write the probability that $\ell$ or more coins are head-biased as:

$$\sum_{f=\ell}^{r} \sum_{t=0}^{\min[r-f,\,c]} |I_{f,r}^t|\, P_1^f P_2^t P_3^{r-f-t} = \sum_{f=\ell}^{r} \sum_{t=0}^{\min[r-f,\,c]} \frac{r!}{f!\,t!\,(r-f-t)!}\, P_1^f P_2^t P_3^{r-f-t}$$
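A minimal sketch of this approximation (function name mine); floats are fine here, since the point is speed rather than exactness:

    from math import factorial

    def at_least_ell_approx(m, h, c, p, q, k, r, ell):
        # Multinomial approximation: valid for m large and r small
        # relative to h and c, so P1, P2, P3 barely change across draws.
        den = h * p**k + c * q**k + (m - h - c) * 0.5**k
        p1, p2 = h * p**k / den, c * q**k / den
        p3 = 1.0 - p1 - p2
        total = 0.0
        for f in range(ell, r + 1):
            for t in range(min(r - f, c) + 1):
                coeff = factorial(r) // (factorial(f) * factorial(t)
                                         * factorial(r - f - t))
                total += coeff * p1**f * p2**t * p3**(r - f - t)
        return total

    # Large pool: 1000 coins, 50 head-biased (p = 0.9), 50 tail-biased
    # (q = 0.1); draw 3, flip each 10 times, want at least 2 head-biased:
    print(at_least_ell_approx(1000, 50, 50, 0.9, 0.1, 10, 3, 2))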
And, as we would want, taking $c = 0$ gives us back the upper tail of the binomial distribution:

$$\sum_{f=\ell}^{r} \binom{r}{f} P_1^f P_3^{r-f}$$
In closing, all I've really done is write down some conditional probability formulations that we all probably knew. But it is, of course, a fun exercise in probability.
Sample Question

Here's an example of a question that's simple enough, especially following the above analysis, but which would thrash me if ever asked in an interview. Suppose I have eight coins, three of which are two-headed. I choose three coins at random and flip each of them some number of times, not necessarily the same for each coin. All coins land heads-up on every flip. What is the minimum number of flips I need to be more than 90% sure that the coins I hold are the double-headed coins?

To be more specific: I pick a coin and flip it $k_1$ times, all heads. Then I pick a second coin and flip it $k_2$ times, all heads. Then I pick a third coin and flip it $k_3$ times, all heads. Minimize $k_1 + k_2 + k_3$ subject to $P[d_1, d_2, d_3 \mid H_1^{k_1}, H_2^{k_2}, H_3^{k_3}] \ge 0.9$.

We can rewrite our probability as:

$$P[d_3 \mid d_1, d_2, H_3^{k_3}]\, P[d_2 \mid d_1, H_2^{k_2}]\, P[d_1 \mid H_1^{k_1}] = \frac{1}{1 + 2^{-k_3}(8-3)} \cdot \frac{2}{2 + 2^{-k_2}(8-3)} \cdot \frac{3}{3 + 2^{-k_1}(8-3)}$$

(the five fair coins never leave the pool under this conditioning, so every denominator carries the same $5 \cdot 2^{-k}$ term). Writing $K_i = 2^{-k_i}$:

$$= \frac{6}{(1 + 5K_3)(2 + 5K_2)(3 + 5K_1)} = \frac{6}{125K_3K_2K_1 + 75K_3K_2 + 50K_3K_1 + 25K_2K_1 + 30K_3 + 15K_2 + 10K_1 + 6}$$

So, with $\alpha$ standing for everything in the denominator except the 6, we have:

$$\frac{6}{\alpha + 6} \ge 0.9 \;\Rightarrow\; \frac{6}{0.9} - 6 \ge \alpha \;\Rightarrow\; \alpha \le \frac{2}{3}$$

So we have:

$$125K_3K_2K_1 + 75K_3K_2 + 50K_3K_1 + 25K_2K_1 + 30K_3 + 15K_2 + 10K_1 \le \frac{2}{3}$$

Now we have a decision to make. We can hope the search space is small, and look for $k_3, k_2, k_1$ such that $2^{-k_3}, 2^{-k_2}, 2^{-k_1} = K_3, K_2, K_1$ satisfies this inequality and minimizes $k_3 + k_2 + k_1$. Or, since every term is nonnegative, we can reason about the necessary inequalities that the individual terms must satisfy. The best idea is to do both; looking at:

$$30K_3 \le \frac{2}{3}, \qquad 15K_2 \le \frac{2}{3}, \qquad 10K_1 \le \frac{2}{3}$$
We can easily calculate that $k_1 \ge 4$, $k_2 \ge 5$, $k_3 \ge 6$, but using these points doesn't satisfy the general inequality. But $7, 7, 7$ does, and a quick search shows that $6, 6, 7$ also works with fewer total flips, giving a probability of about $0.903$. Thus we need at least 19 flips, and to succeed with 19 flips, we flip the first coin 6 times, the second coin 6 times, and the third coin 7 times.
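Finally, since the whole point of this post was that I botch exactly this kind of arithmetic, here is a brute-force check (a hedged sketch; names mine) that searches small flip counts for the cheapest allocation:

    from itertools import product

    def success_prob(k1, k2, k3):
        # P[d1, d2, d3 | all heads]: 8 coins, 3 double-headed, 5 fair.
        # The 5 fair coins never leave the pool under the conditioning.
        prob = 1.0
        for heads_left, k in ((3, k1), (2, k2), (1, k3)):
            prob *= heads_left / (heads_left + 5 * 2.0**-k)
        return prob

    best = min((k1 + k2 + k3, (k1, k2, k3))
               for k1, k2, k3 in product(range(1, 13), repeat=3)
               if success_prob(k1, k2, k3) >= 0.9)
    print(best)   # (19, (6, 6, 7))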