You’re going to play a game. Like many probability games, this one involves an infinite supply of ping-pong balls. No, this game is not quite beer pong.

The balls are numbered 1 through N. There is also a group of N cups, labeled 1 through N, each of which can hold an unlimited number of ping-pong balls. The game is played in rounds. A round is composed of two phases: throwing and pruning.

During the throwing phase, the player takes balls randomly, one at a time, from the infinite supply and tosses them at the cups. The throwing phase is over when every cup contains at least one ping-pong ball. Next comes the pruning phase. During this phase the player goes through all the balls in each cup and removes any ball whose number does not match the containing cup. Every ball drawn has a uniformly random number, every ball lands in a uniformly random cup, and every throw lands in some cup. The game is over when, after a round is completed, there are no empty cups.

How many rounds would you expect to need to play to finish this game? How many balls would you expect to need to draw and throw to finish this game?

(fivethirtyeight)

Solution

This is best broken down into two problems:

  • How many tosses are expected before each cup has been “retired” – filled for the first time with a ball of the same number?
  • How many tosses are expected per round?

Expected total tosses

When $R$ cups have already been retired (for each $R$ from $0$ to $N-1$), the chance that a new ball will land in a matching, unretired cup is the chance, $(N-R)/N$, that we draw a ball matching an unretired cup, times the chance, $1/N$, that the ball lands in its matching cup. So the chance that we retire a cup throwing a ball with $R$ cups already retired is $(N-R)/N^2$, and the total expected number of tosses is:

\[\sum_{R=0}^{N-1} \frac{N^2}{N-R} = N^2 \sum_{i=1}^N \frac{1}{i} = N^2H_N\]

where $H_N$ is the $N$th harmonic number. This is a coupon collector problem.

Expected tosses to retire j cups

The number of tosses expected to retire j cups can be expressed similarly:

\[T_j = \sum_{R=0}^{j-1} \frac{N^2}{N-R} = N^2 \sum_{i=N-j+1}^N \frac{1}{i} = N^2(H_N - H_{N-j+1})\]

Expected tosses per round

A cup is “visited” in a round if a ball lands in it.

If $R$ cups have been retired before the start of the round, and $V$ of the $N-R$ unretired cups have been visited in it so far, the probability that a new, unretired cup is visited on a given toss is $(N-R-V)/N$, and so the expected number of tosses in the round is:

\[\sum_{V=0}^{N-R-1} \frac{N}{N-R-V} = N \sum_{M=1}^{N-R} \frac{1}{M} = N(H_{N-R})\]

Let $E_i$ be the expected number of tosses in round $i$, and let $R_i$ be the expected number of cups retired by the end of round $i$ (let $R_0$ be $0$). Then:

\[E_i = NH_{N-R_{i-1}}\]

The value $R_i$ is the greatest integer $j$ such that $T_j$ is less than or equal to $E_i$.

Therefore $E_1$ is $NH_N$.