Counterfeit Coin Puzzles

September 10, 2023

Daniel Kline has been playing around with an in-person version of Panning for Gold and was wondering what other puzzle variants could be done on a balance scale. This post is an attempt to answer that question!

Setup

Panning for Gold is an activity based on the classic counterfeit coin balance puzzles where the solver is charged with identifying a counterfeit coin using as few weighings as possible on a balance scale. The two classic examples are as follows:

The \(9\)-coin problem

You have \(9\) coins, one of which is counterfeit, weighing slightly less than the others. How many weighings do you need to identify the counterfeit?

The \(12\)-coin problem

You have \(12\) coins, one of which is counterfeit, weighing either slightly less or slightly more than the others -- but you don't know which! How many weighings do you need to identify the counterfeit?

We always mean how many weighings in the worst case -- that is to say, the solver should not expect to get lucky in weighing two coins against each other and see one is lighter with a single weighing!

The JRMF exchanges the roles of counterfeit and real so that there are several counterfeit gold bars and only one real gold bar. This substitution doesn't impact the analysis, so I'll stick with the classic formulation here. The JRMF version is currently app-dependent, however, with the app acting as the scale.

I've run versions of this activity in the past where a facilitator acts as the scale. In this implementation, there is a collection of tokens that are identical except for their labels. The facilitator memorizes a label to be the counterfeit but tells no one. Then students charge the facilitator with weighing different groups of tokens using their hands as the bowls, lowering the hand that "weighs" more and raising the hand that "weighs" least. This works pretty well, but requires a pretty high ratio of facilitators to students unless the students are working in large groups.

They're not the cheapest, but there are some toy balance scales that work reasonably well, which is what we'll be testing at festivals. Here are some options:

If playing with tokens of actually differing weights and a balance, it is useful to have a way to randomize them. For example, for problems where the counterfeit could be light or heavy, you can place a light token and a heavy token in a bag, shake it up, and then choose one randomly to pair with the standard-weight tokens.

Challenges

A traditional question sequence looks something like this:

You have \(8\) coins where one is heavier than the others. Can you devise a strategy for finding this counterfeit coin?
How many weighings does your strategy require in the worst case? Would it ever require \(10\) weighings, for example?
Can you devise a strategy to find the counterfeit coin that never requires more than \(3\) weighings?
Is there a strategy to always find it in \(2\) weighings? What about \(1\) weighing?
How does your strategy change for \(9\) coins? \(10\) coins?
Does having more coins always mean more weighings?
If there are \(n\) coins, can you predict how many weighings you'll need to find the light coin?
If the counterfeit coin is lighter than the others instead of heavier, how does this change your strategy?
If the counterfeit coin weighs a different amount than the others, but you don't know whether it's lighter or heavier, how does this change your strategy?
If there are \(n\) coins, can you predict how many weighings you'll need to find the heavy-or-light fake?

Here are some other problems to consider. (One cannot be solved, but why?)

You have \(4\) coins and \(2\) are heavier than the others. Can you devise a strategy for identifying these counterfeit coins?
Find a strategy that requires only \(2\) weighings in the worst case.
Devise a strategy for finding \(2\) heavy counterfeit coins among \(5\) coins. What's the largest number of weighings you need in the worst case?
What if you have \(3\) heavy counterfeits among \(5\) coins?
\(2\) heavy counterfeits among \(6\) coins?
\(3\) heavy counterfeits among \(6\) coins?
\(2\) counterfeits among \(4\) coins, both of them heavy or both of them light, but you don't know which?
\(2\) counterfeits among \(5\) coins, same conditions.
\(2\) counterfeits among \(4\) coins, one heavy and one light, with the two counterfeits together weighing the same as two non-counterfeits.
\(2\) counterfeits among \(5\) coins, same conditions.
\(2\) counterfeits among \(4\) coins, one heavy and one light, with the two counterfeits together weighing more than two non-counterfeits.
\(2\) counterfeits among \(5\) coins, same conditions.
\(2\) counterfeits among \(4\) coins, one heavy and one light, but we don't know anything about the relative weights of the counterfeits.
\(2\) counterfeits among \(5\) coins, same conditions.
\(2\) counterfeits among \(5\) coins, each independently heavy or light with no known relation.
\(1-2\) heavy counterfeits among \(4\) coins, but we don't know whether it's \(1\) or \(2\).
\(1-2\) heavy counterfeits among \(5\) coins, but we don't know whether it's \(1\) or \(2\).
\(1-2\) heavy or light counterfeits among \(5\) coins, either all heavy or all light
\(1-2\) heavy or light counterfeits among \(5\) coins, independently heavy or light, a heavy and a light weight the same as two non-counterfeits
\(1-2\) heavy or light counterfeits among \(5\) coins, independently heavy or light, no known relation between heavy and light

Notes

The main observation is that anytime you make a weighing, there are up to three possible outcomes:
1. The left bowl is heavier than the right bowl
2. The right bowl is heavier than the left bowl
3. The bowls are balanced
This means that in constructing an algorithm, each step involves a weighing, observing one of the three possible outcomes, and making a decision for what to do next in the next step. That decision could be another weighing or it could be declaring which coins are which.

This gives an algorithm the structure of a ternary tree, with each node either terminal or branching into three new nodes. Since identifying which coins are which requires us to be able to tell the difference between all possible configurations (i.e., possible numberings of the counterfeits among the coins), every possible configuration would have to be a terminal node (or leaf) in this tree. If \(n\) configurations are possible, our tree needs \(n\) leaves.

Here's such a tree for \(5\) coins, where one is heavier than the others:

The way to read it is the \(5\) coins are labeled \(0-4\). The symbol \((a_1, ..., a_i \, | \, b_1, ..., b_j )\) means weigh coins \(a_1, ..., a_i\) in the left bowl against coins \(b_1, ..., b_j\) in the right bowl. Follow an \(L\) arrow if the left bowl is heavier, \(R\) if the right bowl is heavier, and \(B\) if the bowls are balanced. \(H:[c_1, ..., c_k]\) is the conclusion that \(c_1, ..., c_k\) are the heavy coins.

For such a decision tree, the worst case number of weighings is the depth of the tree -- i.e., the maximal number of edges that must be traversed to get from the root to a leaf. The shallowest this tree could be for \(n\) configurations is \(\lceil \log_3 n \rceil\), since this is how deep a tree would be if leaves do not occur in earlier layers than necessary. In this sense, the tree above is the best possible, since there are \(5\) possible configurations (one where each of coins \(0-4\) is the heavy one) and
\[ \lceil \log_3 5 \rceil = \lceil 1.465 \rceil = 2, \]
so a worst case of \(2\) weighings is the best we could have hoped for, which the tree accomplishes.
If there is exactly one counterfeit coin that is known to be heavier, then the best we can do is \(k\) weighings precisely when \(3^{k-1} < n \leq 3^k\), as expected. One way to do this is as follows:

Starting with a pile of \(n \leq 3^k\) coins, we divide it into three piles of sizes \(q,q,r\), where \(q\) is as large as possible but \(\leq 3^{k-1}\). (Since \(n \leq 3^k\), this forces \(r = n-2q \leq 3^{k-1}\).) We then weigh the piles of size \(q\) against each other. If one is heavier, we keep this heavier pile and discard the other two. If they are balanced, we keep the pile of size \(r\) and discard the other two. Now we have a pile of size \(\leq 3^{k-1}\) to work with and repeat the procedure until we finally keep a pile of size \(3^0 = 1\), which must be the heavy coin.

This takes at most \(k\) iterations and thus at most \(k\) weighings.
If there are \(2\) heavy counterfeits, it's much harder to give a descriptive general algorithm with a minimal number of weighings in the worst case. When there is only one counterfeit, each of the three feedback indicators \(L,B,R\) points to the pile that the heavy coin is in. When there are two counterfeits, they are less decisive
- \(L\): both heavies are on the left or one is on the left and the other off the scale
- \(R\): both heavies are on the right or one is on the right and the other off the scale
- \(B\): both heavies are off the scale one is on the left and the other on the right
A further complication when there are \(2\) heavy counterfeits among \(n\) coins is that the total number of possible configurations is instead
\[ \left(\begin{matrix} n \\ 2 \end{matrix}\right) = \frac{n(n-1)}{2}, \]
which grows more quickly than \(n\). For example,
\[ \left(\begin{matrix} 5 \\ 2 \end{matrix}\right) = 10, \]
which is just slightly larger than \(3^2\), so the best we could hope for is \(3\) weighings. Here's a tree that does it:

\[ \left(\begin{matrix} 6 \\ 2 \end{matrix}\right) = 15 \qquad \textrm{ and } \qquad \left(\begin{matrix} 7 \\ 2 \end{matrix}\right) = 21 \]
are both larger than \(3^2\) but smaller than \(3^3\), so it's plausible that we could find two counterfeits among \(6\) or \(7\) coins in \(3\) weighings, and it turns out both are possible! Here are solutions for each:

On the other hand,
\[ \left(\begin{matrix} 8 \\ 2 \end{matrix}\right) = 28 > 3^3, \]
so we will need at least one more weighing to find two fakes among \(8\) coins. An algorithm that uses no more than \(4\) weighings exists, but we're already at a scale where my embedded trees are straining my eyes!
We can grow the number of counterfeits among \(n\) coins. By symmetry, if there are \(k\) counterfeits and \(k > n/2\), then we can adapt a solution for finding \(n-k\) counterfeits. This means the interesting cases are \(1 \leq k \leq n/2\). For example,
\[ \left(\begin{matrix} 6 \\ 3 \end{matrix}\right) = 20, \]
so there could be an algorithm that requires no more than \(3\) weighings to detect \(3\) counterfeits among \(6\) coins. This is possible, and here is a solution:
I've been careful to talk about when nice solutions might plausibly exist versus when they do. As far as I know, if \(n\) coins have \(c\) possible counterfeit configurations with \(3^{k-1} < c \leq 3^k\), whether those \(c\) configurations can be distinguished in \(k\) or fewer weighings is generally an open problem. Every example we've looked at so far has had such a solution.

By our argument above, when there is exactly \(1\) heavy counterfeit we have such an algorithm. When there are \(2\) heavy counterfeits among \(n\) coins, Ratko Tošić proved in 1981 that the worst-case number of weighings \(k\) satisfies
\[ \left\lceil \log_3 \left(\begin{matrix} n \\ 2 \end{matrix}\right) \right\rceil \leq k \leq \left\lceil \log_3 \left(\begin{matrix} n \\ 2 \end{matrix}\right) \right\rceil + 1 \]
This lower bound corresponds to the case where the ternary tree is as shallow as possible, so the proof is that we never need one more than the best plausible number of weighings. Tošić also proved in that paper that for infinitely many \(n\) values, the lower bound can be attained. As far as I know, nobody has found a proof that we can always get away with the ideal number of weighings or identified a number of coins \(n\) where we need the extra weighing.
Alexander Chudnov explored the case when there is a range of possible numbers of counterfeit coins and counterfeits can be heavy or light. He noticed that in the case where there are either \(0\), \(1\), or \(2\) counterfeits among 11 coins, each independently heavier or lighter (heavies weight the same, lights weight the same, but no known relation between heavy and light), there are
\[ 2^2 \cdot \left(\begin{matrix} 11 \\ 2 \end{matrix}\right) + 2^1 \cdot \left(\begin{matrix} 11 \\ 1 \end{matrix}\right) + 2^0 \cdot \left(\begin{matrix} 11 \\ 0 \end{matrix}\right) = 243 = 3^5 \]
configurations. Since this is a power of \(2\), a natural question is whether they can be distinguished in at most \(5\) weighings, and Chudnov found such an algorithm related to the Ternary Golay code.
When both heavy and light counterfeits are in the mix, the number of configurations doesn't change but the feedback we get from weighings does. One way to think about the information gathered from a weighing \(w\) is as a partition of the space of all configurations into three chunks. If we say \(L_w\) is the subset of configurations where the left bowl is heavier, we can similarly define \(R_w\) and \(B_w\) so that if \(C\) is all configurations, \(C\) is the disjoint union
\[ C = L_w \cup B_w \cup R_w \]
If we have two weighings \(v\) and \(w\), then
\[ \begin{align*} C & = (L_v \cup B_v \cup R_v) \cap (L_w \cup B_w \cup R_w) \\ & = (L_v \cap L_w) \cup (L_v \cap B_w) \cup (L_v \cap R_w) \\ & \qquad \cup (B_v \cap L_w) \cup (B_v \cap B_w) \cup (B_v \cap R_w) \\ & \qquad \cup (R_v \cap L_w) \cup (R_v \cap B_w) \cup (R_v \cap R_w) \end{align*} \]
and further weighings partition the space into more buckets. In the language of these buckets, our goal is to cleverly choose weighings so that by the time we are done, there is no bucket that contains both a counterfeit and a noncounterfeit coin. Once that's the case, we can distinguish counterfeits from everything else.
If you look at our solution for finding \(2\) counterfeit coins among \(6\), there are only three types of weighings made:
- Coins \(0,1,2\) against coins \(3,4,5\)
- Coin \(0\) against coin \(2\)
- Coin \(3\) against coin \(5\)
This gives us a sort of autopilot algorithm, since we really just need to make these three weighings in any order and the results can be interpreted to tell us the configuration. A question in a different direction is what is the minimum number of autopilot weighings we need to make before we can distinguish between any two configurations.

(In the language of buckets, the goal is to find weighings so that every bucket contains \(0\) or \(1\) configurations.)

While \(9\) coins had a nice decision-dependent algorithm, we could also carry out the following weighings for coins \(0,1,2,3,4,5,6,7,8\):
- Coins \(0,3,6\) left, coins \(1,4,7\) right, coins \(2,5,8\) off
- Coins \(0,1,2\) left, coins \(3,4,5\) right, coins \(6,7,8\) off
If coin \(m\) is the counterfeit, writing \(m\) in base-\(3\) as
\[ m = a_0 \cdot 3^0 + a_1 \cdot 3^1, \qquad a_i \in \{0,1,2\}, \]
then the first weighing allows us to compute \(a_0\) and the second to compute \(a_1\). (In both cases, \(L\) means \(a_i = 0\), \(R\) means \(a_i = 1\), and \(B\) means \(a_i = 2\).)