Friday, January 4, 2013

A Quick Probability Conundrum

In my discussion of Kingdom Builder yesterday, I mentioned that only after a few months of playing the game did we finally draw all twelve of the scoring objective cards, and that seemed unusual. That got me thinking: how unusual is it really? How many games of Kingdom Builder should I expect to have played before I encountered all twelve at random? Here's the same problem, rephrased to make it more general:

I have twelve cards, each marked with a different integer 1 through 12. The cards are identical except for the markings. Every day, I draw three cards at random, without replacement. I record the number on each card then replace the cards into the deck. Then, I shuffle the deck. The next day, I repeat the same steps: draw without replacement, record the numbers, replace, and shuffle. We'll say for the sake of argument that I shuffle thoroughly enough to make the deck truly random, and that today's picks are independent of what yesterday's were.

How long should I expect to keep drawing cards before I've drawn all twelve? A couple of attributes of the probability distribution are intuitively obvious. The minimum number of days would be four: if you drew three different cards for each of four different days, you would get to twelve. But that has to be a pretty unlikely event; by day three, you're more likely than not to repeat at least one number. At the other end, the distribution should trail toward infinity: after an infinite number of days, you're guaranteed to have drawn all twelve, but the "guarantee" only comes at infinity. After a hundred days, or a thousand, it's less and less likely but still possible that there's one recalcitrant card that hasn't shown up.

I'd be interested in seeing the entire probability distribution, either the pdf (differentiated form) or cdf (integrated form). In particular, I want to know where the cdf crosses 0.5--that is, at what day it becomes more likely than not that I have seen all twelve. It would be an easy enough system to model, but I'm also interested in an analytical solution if it exists.