27. November 2022

Pub combinatorics: the joy of rediscovery

Learning is often a process of rediscovering things. I enjoyed this process recently when trying to solve a combinatorial puzzle, in the pub, in the bus, and at home. A tale of allowing myself to explore, to tinker, and to be awed by the great thinkers of long gone times.

I got the following puzzle from my family: Let’s assume you have three positions: _ _ _. You have five chips. You may distribute up to five chips over these three positions. For example, you can put three chips on the first position, zero chips on the second, one chip on the third, and discard the remaining chip: 3 0 1. This forms the number 301. How many such numbers can you form with the five chips?

If you had asked me this question during secondary education, I might have been able to remember the approach on how to count the number of solutions, simply because I might have recently learnt it in maths class. If you had asked me to provide a solution under time pressure today, I might have tried to look it up on Wikipedia. However, my years at school are long gone. There’s also some joy in trying to solve the puzzle first without resorting to research, whether online or in a book made of actual paper.

My first thought was to establish some bounds for the number of solutions. First, there are only 1000 numbers with three digits, including the zero, thus there are fewer than 1000 solutions. Second, the biggest number you can form is 5*100=500, thus there are fewer than 501 solutions, which includes the zero. Third, you can put zero or up to five chips on the first position, and will have up to six choices when deciding on the second and third position. Thus, there are fewer than 6³ = 216 solutions. I put the puzzle aside, and went to work instead.

In the evening, I went to a pub with a friend after work. We discussed the puzzle for a bit, exploring a recursive approach: What if we have zero chips and zero positions? What if we add a chip? What if we add a position? What if we just list a few solutions and try to spot a pattern? However, our conversation moved on, agreeing that we will solve it another time. Thus, I put the puzzle aside, and went home instead.

The next day, I thought, why not write a small program? Counting to 500 is a simple problem for a computer, and writing a small Go program still squarely within my skillset:

func main() {
    s := []int{}
    for i := 0; i <= 500; i++ {
        h, t, o := i/100, (i%100)/10, (i % 10)
        if h+t+o <= 5 {
            s = append(s, i)
        }
    }
    fmt.Printf("%d: %v\n", len(s), s)
}

Spoiler alert, above program told me that there are 56 solutions (no unit tests, would you trust it?). But the actual number of solutions is not as interesting as understanding why there are 56 solutions. The puzzle isn’t solved yet. This is just the beginning.

Sleep is a mysterious and wonderful thing. A day after engaging with the problem in the pub, I intuitively stumbled over a different, recursive algorithm which seemed to produce the correct answer (56), at least for inputs p=3 (positions) and c=5 (chips).

func s(p, c int) int {
    if p == 0 || c == 0 {
        return 1
    }
    return s(p, c-1) + s(p-1, c)
}

In short, above program expresses the following (so far unproven) idea: reduce the problem of counting the solutions to the smaller (3, 4)-problem of three positions and only four chips, as well as the problem of counting the solutions to the smaller (2, 5)-problem of only two positions and five chips. If you have no positions (p=0), then the only number you can form is zero: this makes sense. If you have no chips, then the only number you can form is zero: this also makes sense. In other words:

   s(p, c) = s(p, c-1) + s(p-1, c)    [for p > 0, c > 0]
   s(p, 0) = 1
   s(0, c) = 1

But as I said, above proposal came to me intuitively. Is it correct? If correct, why? Is it wrong? If wrong, how so? Let’s find out. Time to think.

After copious amounts of thinking (it’s the weekend, I say to myself), I realize that we can explain the recursive algorithm as follows, for p>1,c>1: To generate a solution, pick any of the p positions. You can choose to either leave that position empty in the solution, or to have at least one chip on that position.

  1. If you decide to leave the position empty, there are s(p-1, c) ways of distributing the remaining c chips over the p-1 remaining positions.
  2. If you place a chip, there are s(p, c-1) ways of distributing the remaining c-1 chips over p positions.

NOTE: There is a constraint, which I only discovered after thinking about generalizations beyond the (3, 5)-problem: if we want to represent digits in base 10, then we cannot use the algorithm for c>9. Otherwise, we would be able to generate solutions such as 3 10 0. Even if interpreted 3 10 0 as 3*100+10*10=400, the algorithm would start “double-counting” solutions because 4 0 0 also represents 400 in such an interpretation.

Time to think? Or time to plot? In order to see how the function s(p,c) computes the result for the (3, 5)-problem, I let the program construct the graph during execution: linking node s(p,c) with its two children s(p, c-1) and s(p-1,c), showing the number of solutions (= ...) in each node, and showing the partial results contributed to the sum on each edge.

Beautiful. But what’s going on? By this time, I had a hunch that I had seen these numbers before. This is the point where I decided to do some research, and looked up Pascal’s triangle on Wikipedia. Which I painstakenly reproduced here by hitting the spacebar the right number of times, in the right places:

               1
             1   1
           1   2   1
         1   3   3   1
       1   4   6   4   1
     1   5  10  10   5   1
   1   6  15  20  15   6   1
 1   7  21  35  35  21   7   1

Do the numbers look familiar? They do. The Unfortunately, the solution (56) to the (3, 5)-problem would appear in the ninth row, which was no longer shown in the example with eight rows on Wikipedia. So let’s extend Pascal’s triangle into something that looks vaguely like a Christmas tree:

               1
             1   1
           1   2   1
         1   3   3   1
       1   4   6   4   1
     1   5  10  10   5   1
   1   6  15  20  15   6   1
 1   7  21  35  35  21   7   1
          56      56

Can we thus simply read off the values of s(p,c)in Pascal’s triangle? As Wikipedia goes on to lecture, the n-th row and k-th column in Pascal’s triangle has the value of the binomial coefficient “n choose k”, with n=0 denoting the first row, and k=0 denoting the first column in the triangle. For example, with n=8 and k=3

c2f5454022bdc2c79437578fa988b4b26b61b428

Tada. Time to substitute. Let n=p+c and k=c, and fill in the values from the original (3, 5)-puzzle:

89fb44642dd17e4f76707192045d00f9b73f650d

On to something! The binomial coefficient “8 choose 3” counts the number of possible subsets with three elements, chosen out of a set with eight elements. But what does it mean in this context? We have five chips and three positions, after all. How come that this puzzle reduces to calculating “n choose k”?

After a detour into Donald Knuth’s book on The Art of Computer Programming: Volume 4A: Combinatorial Algorithms (where I learned some historical context around the term “multiset”), I looked up how to count multisets, again in Wikipedia. A multiset allows repetition of an element. Why multisets? I realized that we can formulate the original problem (“distribute up to five chips over three positions”) differently. Imagine how you can represent a three-digit number as just a multiset of repeated hundreds, repeated tens, repeated ones.

Let’s say there are four positions p' for the digits (hundreds, tens, ones, zeros): _ _ _ _, and five chips. Placing a chip on position denoting zero is equivalent to not using a chip in the original formulation. How many ways to distribute all five chips over four positions? It’s equialent to asking how many ways to distribute up to five chips over three positions.

The number of k-subsets taken from a set of cardinality n is “n multichoose k”. How to count multisets tells us about the following identity, and provides the missing link to why this puzzle reduces to calculating binomial coefficients:

867bacbc9dd4ba87c28cfdc2195a61797e04a328

How many ways to distribute all c=5 chips over p'=p+1=4 positions? This is a “p’ multichoose c’” problem. Or in the concrete case at hand, a “4 multichoose 5” problem:

1b8a1d2330e1d4cc269b3b43cdd628a42586b2e1

Tada, again. Gone full circle? Almost. Why almost? Well, because I said that if you had asked me this question during secondary education, I might have been able to remember the approach… when I got this far, I started remembering what Wikipedia called the Stars and bars approach. Donald Knuth also gave me the name of the table that we learnt at school, but whose content I had forgotten: The Twelvefold way. With the exception of the pre-work of reframing the problem slightly such that all chips must be used, we can find the the approach and formula for the puzzle in the twelvefold way.

Finally, I tested the recursive algorithm s(p,c) with larger inputs (ignoring the caveat that the result no longer counts the number of p-digit numbers if c>9). Bad experience: the recursive algorithm performed terribly, with exponential complexity. So I rewrote it, with the rediscovered knowledge in mind, to just compute the binomial coefficient:

func s(p, c int64) int64 {
    if p > c {
        return s(c, p)
    }
    var n, d, i int64
    n, d = 1, 1
    for i = 1; i <= p; i++ {
        n *= i + c
        d *= i
    }
    return n / d
}

With the usual caveat that you should not copy-paste untested code from the Internet into your spacecraft, that concludes the long-winded journey of solving the puzzle. This puzzle would have probably taken very little time for someone else, but who cares! For me, it was a great experience of discovery and rediscovery, remembering my passion for combinatorics that lay dormant for so long. If you find small puzzles, remember that there is value and joy to be found in the journey.