19 | June | 2012 | Yet Another Mathblog

In Stanislaw Ulam’s 1976 autobiography (Adventures of a mathematician, Scribner and Sons, New York, 1976), one finds the following problem concerning two players playing a generalization of the childhood game of “20 questions”.
(As a footnote: the history of who invented this game is unclear. It occurred in writings of Renyi, for example “On a problem in information theory” (Hungarian), Magyar Tud. Akad. Mat. Kutató Int. Közl. 6 (1961)505–516, but may have been known earlier under the name “Bar-kochba”.)

Problem: Player 1 secretly chooses a number from $1$ to $M$ ( $M$ is large but fixed, say $M = 1000000$ ). Player 2 asks a series of “yes/no questions” in an attempt to determine that number. Player 1 can lie at most $e$ times ( $e\geq 0$ is fixed). Assuming best possible play on both sides, what is the minimum number of “yes/no questions” Player 2 must ask to (in general) be able to correctly determine the number Player 1 selected?

There is a very large literature on this problem. Just google “searching with lies” (with the quotes) to see a selection of the papers (eg, this paper). It turns out this children’s game, when suitably generalized, has numerous applications to computer science and engineering.

We start with a simple example. Pick an integer from 0 to 1000000. For example, suppose you pick $2^{19}+1 = 524289$ .

Here is how I can ask you 20 True/False questions and guess your number.

Is your number less than 524288? False

Is your number less than 655360? True
Is your number less than 589824? True

Is your number less than 557056? True
Is your number less than 540672? True
Is your number less than 532480? True
Is your number less than 528384? True
Is your number less than 526336? True
Is your number less than 525312? True
Is your number less than 524800? True
Is your number less than 524544? True
Is your number less than 524416? True
Is your number less than 524352? True
Is your number less than 524320? True
Is your number less than 524304? True
Is your number less than 524296? True
Is your number less than 524292? True
Is your number less than 524290? True
Is your number less than 524289? False

This is the general description of the binary search strategy:
First, divide the whole interval of size $2^{20}$ into half and find which half the number lies in. Discard the one it does not lie in.
Second, divide the subinterval of size $2^{19}$ into half and find which half the number lies in. Discard the one it does not lie in.
Third, divide the subinterval of size $2^{18}$ into half and find which half the number lies in. Discard the one it does not lie in.
And so on.
This process must stop after 20 steps, as you run out of non-trivial subintervals.

This is an instance of “searching with lies” where no lies are told and “feedback” is allowed.

Another method:
I don’t know what your number is, but whatever it is, represent it in binary as a vector

$b = (b_0, b_1, ..., b_N)\ \ {\rm so}\ \ M = b_02^0 + b_12^1 + ... + b_N2^N,$

where $N = ceiling(\log_2(N))=20$ , where $2^{20}-1 = 1048574$ . The 20 questions are:
Is $b_0 = 0$ ?
Is $b_1 = 0$ ?
and so on
Is $b_N = 0$ ?
You may determine the value of the $b_i$ ‘s from this (since each $b_i$ is either 0 or 1).

This is an instance of “searching with lies” where no lies are told and “feedback” is not allowed.

Let’s go back to our problem where Player 1 secretly chooses a number from $1$ to $M$ ( $M$ is large but fixed, say $M = 1000000$ ). Use the following notation: If the answers between successive questions is possible and then questions may be adapted to these answers (“feedback allowed”), call this minimum number of “yes/no questions” Player 2 must ask, $f(M,e)$ . If feedback is not allowed, call this minimum number $g(M,e)$ .

Clearly, $f(M,e)\leq g(M,e).$

In practice, we not only want to know $f(M,e)$ (resp., $g(M,e$ ) but an efficient algorithm for determining the “yes/no questions” Player 2 should ask.

In his 1964 PhD thesis (E. Berlekamp, Block coding with noiseless feedback, PhD Thesis, Dept EE, MIT, 1964), Berlekamp was concerned with a discrete analog of a problem which arises when trying to design a analog-to-digital converter for sound recording which compensates for possible feedback.

To formulate this problem in the case when lies are possible, we introduce some new terms.

Definitions

A binary code is a set $C$ of $n$ -tuples of $0$ ‘s and $1$ ‘s, i.e., a subset $C\subset GF(2)^n$ , where $GF(2)={\bf Z}/2{\bf Z}=\{0,1\}$ is the field of integers $\pmod{2}$ . If we assume, in addition, $C$ is a vector space over $GF(2)$ then we say $C$ is a linear binary code. A codeword is an element of a binary code.
The Hamming distance between $n$ -tuples $w,w'\in GF(2)^n$ is the number $d(w,w')$ of non-zero coordinates of $w-w'$ .
The minimum distance of $C$ is the smallest integer $d$ such that $d(c,c')\geq d$ for all distinct $c,c'\in C$ .

For example, let

$C = \{ (0, 0, 0), (1, 1, 1) \} \subset GF(2)^2.$

This is a 1-error correcting code of minimum distance $d=3$ .

We say that $C$ is $e$ -error correcting if the following property always holds: given any $w\in GF(2)^n$ and any $c\in C$ , if $d(w,c)\leq e$ then every $c'\in C$ , $c\not= c'$ , must satisfy $d(w,c')\ge e$ (in other words, there is at most one codeword within distance $e$ from any $w\in GF(2)^n$ ).

Let $A(n,d)$ denote the largest possible size of a binary code $C\subset GF(2)^n$ of minimum distance $d$ .

Fact: If a binary code $C\subset GF(2)^n$ is $e$ -error correcting then $|C|\leq A(n,2e+1)$ .

Let $B(n,d)$ denote the largest possible size of a binary linear code $C\subset GF(2)^n$ of minimum distance $d$ . It is known that $B(n,d)=2^k$ , for some $k$ (called the dimension of $C$ ).

Clearly, $A(n,d)\leq B(n,d)$ .

A Reduction

Fact: Fix any $e$ and $M$ . $g(M,e)$ is the smallest $n$ such that $A(n,2e+1)\geq M$ .

Record the $n$ answers to all the “yes/no questions” as an $n$ -tuple of $0$ ‘s (for “no”) and $1$ ‘s (for “yes”). The binary code $C$ is the set of all codewords corresponding to correct answers. Intuitively, if you are Player 2 and you ask Player 1 what the coordinates of his codeword $c$ associated to his secret number is, and if he lies $e$ times then you will receive from him a binary vector $w$ of Hamming distance $e$ from a codeword. Since $C$ is $e$ -error correcting, you can determine $c$ from $w$ .

We want the smallest possible $n$ satisfying the condition $A(n,2e+1)\geq M$ . The values of $B(n,2e+1)$ (or $\log_2 B(n, d)$ , where we take $d=2e+1$ , to be more precise) have been tabulated for “small” values of $M$ and $e$ online at the MinT website (U. of Salzburg, Wolfgang Ch. Schmid and Rudolf Schürer), codetables.de (Markus Grassl), and Boston U. (Ari Trachtenberg). This gives us an upper bound for $g(M,e)$ , and hence also for $f(M,e)$ for “small” values of $M$ and $e$ .

Here’s a recipe for using online tables to find a good upper bound the smallest number Q of questions Player 2 must ask Player 1, if Player 1 gets to lie $e$ times:

Find the smallest k such that $2^k \geq M$ ,
go to the table minT and go to the row $d=2e+1$ ; find the smallest column number n which contains an entry greater than or equal to k.
that column number n is an upper bound for Q.

Yet Another Mathblog

Daily Archives: 2012/06/19

Searching with lies, 1