Expected maximums and fun with Faulhaber’s formula

A recent Futility Closet post inspired this one. There, Greg Ross mentioned a 2020 paper by P Sullivan titled “Is the Last Banana Game Fair?” in Mathematics Teacher. (BTW, it’s behind a paywall and I haven’t seen that paper).

Suppose Alice and Bob don’t want to share a banana. They each have a fair 6-sided die to throw. To decide who gets the banana, each of them rolls their die. If the largest number rolled is a 1, 2, 3, or 4, then Alice wins the banana. If the largest number rolled is a 5 or 6, then Bob wins. This is the last banana game. In this post, I’m not going to discuss the last banana game specifically, but instead look at a related question.

Let’s define things more generally. Let $I_n=\{1,2,...,n\}$, let $X,Y$ be two independent, uniform random variables taken from $I_n$, and let $Z=max(X,Y)$. The last banana game concerns the case $n=6$. Here, I’m interested in investigating the question: What is $E(Z)$?

Computing this isn’t hard. By definition of independent and max, we have
$P(Z\leq z)=P(X\leq z)P(Y\leq z)$.
Since $P(X\leq z)=P(Y\leq z)={\frac{z}{n}}$, we have
$P(Z\leq z)={\frac{z^2}{n^2}}$.
The expected value of $Z$ is defined as $\sum kP(Z=k)$, but there’s a handy-dandy formula we can use instead:
$E(Z)=\sum_{k=0}^{n-1} P(Z>k)=\sum_{k=0}^{n-1}[1-P(Z\leq k)]$.
Now we use the previous computation to get
$E(Z)=n-{\frac{1}{n^2}}\sum_{k=1}^{n-1}k^2=n-{\frac{1}{n^2}}{\frac{(n-1)n}{6}}={\frac{2}{3}}n+{\frac{1}{2}}-{\frac{1}{6n}}.$
This solves the problem as stated. But this method generalizes in a straightforward way to selecting m independent r.v.s in $I_n$, so let’s keep going.

First, let’s pause for some background and history. Notice how, in the last step above, we needed to know the formula for the sum of the squares of the first n consecutive positive integers? When we generalize this to selecting m integers, we need to know the formula for the sum of the m-th powers of the first n consecutive positive integers. This leads to the following topic.

Faulhaber polynomials are, for this post (apparently the terminology is not standardized) the sequence of polynomials $F_m(n)$ of degree m+1 in the variable n that gives the value of the sum of the m-th powers of the first n consecutive positive integers:

$\sum_{k=1}^{n} k^m=F_m(n)$.

(It is not immediately obvious that they exist for all integers $m\geq 1$ but they do and Faulhaber’s results establish this existence.) These polynomials were discovered by (German) mathematician Johann Faulhaber in the early 1600s, over 400 years ago. He computed them for “small” values of m and also discovered a sort of recursive formula relating $F_{2\ell +1}(n)$ to $F_{2\ell}(n)$. It was about 100 years later, in the early 1700s, that (Swiss) mathematician Jacob Bernoulli, who referenced Faulhaber, gave an explicit formula for these polynomials in terms of the now-famous Bernoulli numbers. Incidentally, Bernoulli numbers were discovered independently around the same time by (Japanese) mathematician Seki Takakazu. Concerning the Faulhaber polys, we have
$F_1(n)={\frac{n(n+1)}{2}}$,
$F_2(n)={\frac{n(n+1)(2n+1)}{6}}$,
and in general,
$F_m(n)={\frac{n^{m+1}}{m+1}}+{\frac{n^m}{2}}+$ lower order terms.

With this background aside, we return to the main topic of this post. Let $I_n=\{1,2,...,n\}$, let $X_1,X_2,...,x_m$ be m independent, uniform random variables taken from $I_n$, and let $Z=max(X_1,X_2,...,X_m)$. Again we ask: What is $E(Z)$? The above computation in the $m=2$ case generalizes to:

$E(Z)=n-{\frac{1}{n^m}}\sum_{k=1}^{n-1}k^m=n-{\frac{1}{n^m}}F_m(n-1).$

For m fixed and n “sufficiently large”, we have

$E(Z)={\frac{m}{m+1}}n+O(1).$

I find this to be an intuitively satisfying result. The max of a bunch of independently chosen integers taken from $I_n$ should get closer and closer to n as (the fixed) m gets larger and larger.

Jesse Douglas – yet another Beautiful Mind?

Nobel prize winner John Nash struggled with mental illness for most of his life. His struggles were described Sylvia Nasar’s well-known biography, A Beautiful Mind, as well as a film of the same name starring Russell Crowe. Now, John Nash is practically a household name, honored for his contributions to the mathematics of game theory and economics.

While Jesse Douglas isn’t nearly as well-known as Nash, and didn’t win a Nobel Prize, he was (with Lars Ahlfors in 1936) the first recipient of the Fields Medal. According to one source, he too struggled with mental illness. However, before talking about the life of Jesse Douglas, let’s talk about his mathematical research.

Jesse Douglas in his 40s.

His Mathematics

Jesse Douglas solved a geometrical problem formulated in the 1700’s now called Plateau’s problem.  Plateau’s problem (also known as the soap bubble problem) is to show the existence of a minimal surface with a given boundary. Plateau was a a Belgian physicist who lived in the 1800s, known for his experiments with soap films. A minimal surface is a surface that locally minimizes its area. For example, the catenoid (see below) obviously doesn’t have minimum area for its (disconnected) boundary, but any piece of it bounded by a (connected) closed curve does have minimum area.

His research was not universally recognized at first. According to Reid [R76]:

Arriving in the year 192901930 at the end of a European tour as a National Research Fellow, he proposed to talk about his not-yet-published work on Plateau’s Problem at the weekly meeting of the mathematische Gesellschaft [at Gottingen University]. The problem had been around for a long time. Many outstanding mathematicians, including Riemann himself, had worked on it. The members of the Gesellschaft simply did not believe that an American had solved it. … When he had finished his presentation, some of the members of the Gesellschaft took him severely to task on almost every detail of the proof. Let left Gottingen deeply offended …

Douglas solved this geometrical problem in his paper “Solution of the problem of Plateau,” Trans. Amer. Math. Soc. 33 (1931)263–321. Gray and Micallef [GM07] do an excellent job of describing Jesse’s mathematical approach to the problem. Jesse continued working on generalizing and refining aspects of his research on this problem for the next 10 years or so. Some of his papers in this field include

• A method of numerical solution of the problem of Plateau, Annals of Mathematics 29(1927 – 1928)180-188
• Solution of the problem of Plateau, Trans. Amer. Math. Soc. 33 (1931) 263–321.
• Systems of K-dimensional manifolds in an N-dimensional space, Mathematische Annalen 105 (1931) 707-733.
• The Problem of Plateau for Two Contours, Studies in Applied Mathematics 10(1931)315-359.
• One-sided minimal surfaces with a given boundary, Trans. Am Math Soc 34(1932)731–756.
• A Jordan space curve no arc of which can form part of a contour which bounds a finite area, Annals of Mathematics 35(1934) 100-103.
• Green’s function and the problem of Plateau, American Journal of Mathematics. 61 (1939) 545–589.
• The most general form of the problem of Plateau, American Journal of Mathematics. 61 (1939)590–608.
• Solution of the inverse problem of the calculus of variations, Proceedings of the National Academy of Sciences. 25(1939) 631–637.
• The analytic prolongation of a minimal surface across a straight line, Proc Natl Acad Sci USA. 25(1939)375-377.
• The higher topological form of Plateau’s problem, Annali della Scuola Normale Superiore di Pisa , 8 (1939)195-218.
• Minimal surfaces of higher topological structure, Annals of Mathematics 40(1939) 205-298 .
• Theorems in the inverse problem of the calculus of variations, Proc Natl Acad Sci USA, 26(1940) 215–221.

He also wrote papers in other fields in mathematics, for example, the theory of finite groups. For example, in 1951 he published a series of short articles On finite groups with two independent generators, all in the Proc. Natl. Acad. Sci. USA. Also, in 1953, he published a series of short articles On the basis theorem for finite abelian groups, again, all in the Proc. Natl. Acad. Sci. USA.

His Life

Jesse’s father, Louis, was a Polish immigrant who moved to Canada sometime in the late 1800s. The family name was changed at Canadian Immigration and I don’t know his original name. At some point, Louis married Sarah Kommel (I don’t know when the move occurred, nor when and where they met) and moved to New York City. I think both Louis and Sarah were Jewish. Jesse was born in NYC on July 3rd, 1897. According to a geneology website, Sarah’s parents were Leazar Louis Kommel (1851-1923) and Chaya Lande, who died in 1906. Jesse’s mother Sarah passed away in 1939, a year before Jesse married (as we will see below).

Jesse was educated at public schools in New York City. According to Steinhardt [St92], Jesse had a photographic memory. After graduation from high school, he entered the City College of New York and won the Belden Medal for excellence in mathematics in his first year at City College of New York, the youngest recipient at the time. He graduated with a B.Sc. cum laude in February 1916 at the age of 18. Then he entered Columbia University to undertake research under the supervision of Edward Kasner. He took part in Kasner’s seminar on differential geometry and it was there that Jesse first heard of Plateau’s Problem. He submitted his doctoral thesis On Certain Two-Point Properties of General Families of Curves in 1920 (some references say 1921). After getting his PhD, he was an instructor at Columbia from 1920 to 1926. (Was this in the statistics department or the mathematics department? I don’t know.)

He won a National Research Fellowship and as a result, traveled widely for four years, visiting Princeton and Harvard in 1927, Chicago in 1928, and Paris from 1928 to 1930, with trips to Gottingen, Hamburg, and Rome. Jesse was offered an Assistant Professor position at MIT in January 1930. He took leave of absence for a term in 1932. After his promotion to Associate Professor at MIT in 1934, he almost at once took leave of absence again to be a Research Worker at the Institute for Advanced Study in Princeton from 1934 to 1935.  After returning to MIT, Jesse was awarded the first Fields Medal, together with Lars Ahlfors, in 1936. Once again, Jesse took leave of absence from MIT in 1936, and on 1 July 1938, he resigned. According to Aronson [A13], Jesse “became ill” at this time and Levinson was eventually hired as his replacement.

This may seem like the romantic life of a world-class mathematician, but it really is very unusual to be “homeless” 18 years after your get your PhD degree. Something is going on. According to Hersh [H10]:

Douglas’s name is almost forgotten today. He is a rather tragic figure, one of several important mathematicians gravely handicapped by what are now called bipolar, and used to be called manic-depressive, symptoms. He had a junior position at M.I.T., which he lost as a result of inability to perform consistently in the classroom.

According to Steinhardt [St92],

… the volatility of this mix of great gifts and sharp emotions [in which his “over-sized sensitivity surfaced not infrequently in intense feelings articulated vigorously to colleagues”] in Douglas’ makeup worked to the disadvantage of the great American mathematician’s external fortunes.

I don’t know what he did professionally in the 1939-1940 period, but he married Jessie Nayler on June 30th, 1940 (they had one son Lewis Philip Douglas). The husband named Jesse and the wife named Jessie! He often taught summer courses at Columbia University, as well as courses at Yeshiva University. The following year, Jesse was a Guggenheim fellow at Columbia University from 1940 to 1941. He taught at Brooklyn College from 1942 to 1946, as an assistant professor. His teaching during these war years went well according to a letter of commendation (did he teach students from the US Military Academy during this time?), and he received a Distinguished Service Award. In 1945, Jesse wrote an application letter for a teaching position saying [St92]:

This semester I have a teaching load of 19 hours per week at Brooklyn College, including two advances courses: complex variables and calculus of variations.

It is no surprise that he published nothing in the period 1942-1950. According to Micallef [M13],

There is a blank in Douglas’s career from 1946 to 1950, and the later part of Douglas’ life seems to have been troubled. His marriage ended in divorce in 1950.

In 1951, Jesse published a string of papers in a completely different field, finite group theory. Perhaps the end of Jesse’s marriage to Jessie caused him to reinvent himself, mathematically?

According to some sources, Douglas harbored resentment towards his research “competition” Rado (who he strongly beieved should not be credited with solving Plateau’s Problem [St92]) and Courant, as well as J.F. Ritt at Columbia, who may have blocked his appointment to a permanent position at his alma mater. (Sadly, anti-Semitic prejudices common of those times may have played a role in Jesse’s troubles in this regard.) Never-the-less, he is remembered as a helpful, sympathetic, and approachable colleague [St92].

His resentment of Courant was unfortunate because some 10 years earlier, in March 1935, Courant invited Jesse to speak at NYU. According to Reid [R76]:

Courant’s invitation to Douglas to speak at NYU was to a certain extent an olive branch. … Remembering this past history [Jesse’s reception at Gottingen 5 years earlier], Courant did his best to make Douglas’ lecture at NYU an event.

Clearly, Courant appreciated Jesse’s work and tried to treat him with due respect.

Other than Jesse’s mathematical research, this does not appear to have been a happy period in Jesse’s life. According to O’Conner and Robertson [OCR],

His [ex-]wife, Jessie Douglas died in 1955, the year in which Douglas was appointed professor of mathematics at the City College of New York. He remained in that post for the final ten years of his life, living in Butler Hall, 88 Morningside Drive in New York.

Steinhardt [St92] recalls the difficulty that Abraham Schwartz had in having CCNY give tenure at the full professor level to Jesse, when he was 1958. Except for the logician Emil Post, there was no one at CCNY at the time close to Jesse’s stature as a researcher. Ironically, a number of CCNY mathematicians “felt threatened by the appointment”, presumably fearing Jesse would require them to work harder.

There are a number of anecdotes of Jesse Douglas and his interactions with students. They aren’t universally flattering and I hesitate to repeat the negative ones, as I don’t know how typical they are. A number of sources suggest that when he was feeling well, he could be an impressive, charismatic, self-confident expositor and a lucid  teacher. While I don’t know the teaching load at Columbia University the time (where he belonged), I do know that, compared to today’s teaching loads, they were often quite heavy (see for example Parikh’s biography, The Unreal Life of Oscar Zariski). Are these unflattering reports of Jesse’s interactions with students merely due to his oppressive teaching load and inability to handle that volume of students? I don’t know. Perhaps he held others to the same very high standards that he held himself in his own mathematical research.

References

[A13] D.G. Aronson, Norman Levinson: 1912-1975, Biographical memoirs, Nat. Acad. Sci., 2013.

[GM07] Jeremy Gray, Mario Micallef, The work of Jesse Douglas on minimal surfaces, 2013. (appeared in Bull AMS 2008)  pdf

[H10] R. Hersh, Under-represented then over-represented, College J. Math, 2010.

[M13] M. Micallef, The work of Jesse Douglas on Minimal Surfaces, talk at Universidad de Granada, 2013-02-07.

[OCR] John J. O’Connor and Edmund F. Robertson, Jesse Douglas.

[R76] C. Reid, Courant, Springer-Verlag, 1976.

[St92] F. Steinhardt, “Jesse Douglas as teacher and colleague”, in The Problem of Plateau, ed. T. Rassias, World Scientific, 1992.

Elizebeth Friedman and the Holmwood case

Elizebeth Smith Friedman was the top cryptographer for the Coast Guard (then the enforcement arm of the Department of the Treasury) during the Prohibition Era.

According to wrecksite.eu, the SS Holmewood (that ESF spelled as Holmwood), also registered as the SS Ara, was a Swedish refrigerated cargo steamer [edited 2021 – see comments below]. Read more at wrecksite: https://www.wrecksite.eu/wreck.aspx?31920

.

The Holmwood case is a prohibition-era legal investigation concerning events up to and including the seizure of the SS Holmwood in October 5, 1933, in the Hudson River, and the subsequent trial of the smugglers. The New York Intelligence Office (of the Coast Guard, as part of the Treasury Department) used Elizebeth Friedman’s cryptanalysis of telegraph messages between the Holmwood and shore-based agents.

The best source of information from the ESF collection at the Marshall library, Box 6, File 24, “Notes on the solution of cipher and code used by the Holmwood” (14 pages), dated October 11, 1934. This document describes the investigation of the liquor smuggling operations of the Holmwood from 1930 to 1934. The first interception was November, 1930, by the New York Intelligence Office of the Treasury Department. The Coast Guard operated radio stations which monitored rum-runner telegraph stations. Usually these were in a telegraph cipher-code but sometimes they were in plain English. For example, it was reported that at 1505 on October 3, 1933, Radioman First Class B. E. Howell, of the New York Intelligence Unit, intercepted the following message

Anchor the boat in good place immediately. Take all men off in one of life boats. Hide the life boat if possible. Come ashore on New York side. Call [undecoded phone number] when you come ashore. PA code.

This message was preceded by a telegraph cipher-code

JDSLE 2221 1612 WJJE …. DEMPY.

which was decoded (by the office of ESF) as

Heave your anchor immediately and get underway. Stand up the river towards Albany.

This led to the seizure of the Holmwood. ESF wrote a strong commendation letter for Radioman Howell for his hard work and dedication to service.

For more on the life of Elizebeth Friedman, see [1], [2], or the (to be published?) book Divine Fire, by Katie Letcher Lyle, with significant additional material by myself.

[1] Smith, G. Stuart, A Life in Code: Pioneer Cryptanalyst Elizebeth Smith Friedman, McFarland & Company, Jefferson, NC, 2017.

[2] Fagone, Jason, The Woman Who Smashed Codes: A True Story of Love, Spies, and the Unlikely Heroine Who Outwitted America’s Enemies, Dey Street Books, New York, 2017

Simple unsolved math problem, 5

This is now almost completely solved! Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and Terrance Tao solved the conjecture below in the “interior” of Pascal’s triangle (see T. Tao’s blog post for further details, with the link to the paper and a discussion).

It seems everyone’s heard of Pascal’s triangle. However, if you haven’t then it is an infinite triangle of integers with 1‘s down each side and the inside numbers determined by adding the two numbers above it:

First 6 rows of Pascal’s triangle

The first 6 rows are depicted above. It turns out, these entries are the binomial coefficients that appear when you expand $(x+y)^n$ and group the terms into like powers $x^{n-k}y^k$:

First 6 rows of Pascal’s triangle, as binomial coefficients.

The history of Pascal’s triangle pre-dates Pascal, a French mathematician from the 1600s, and was known to scholars in ancient Persia, China, and India.

Starting in the mid-to-late 1970s, British mathematician David Singmaster was known for his research on the mathematics of the Rubik’s cube. However, in the early 1970’s, Singmaster made the following conjecture [1].

Conjecture: If $N(a)$ denotes the number of times the number $a > 1$ appears in Pascal’s triangle then $N(a) \leq 12$ for all $a>1$.

In fact, there are no known numbers $a>1$ with $N(a)>8$ and the only number greater than one with $N(a)=8$ is a=3003.

References:

[1] Singmaster, D. “Research Problems: How often does an integer occur as a binomial coefficient?”, American Mathematical Monthly, 78(1971) 385–386.

Hill verses Hamming

It’s easy to imagine the 19th century Philadelphia wool dealer Frank J. Primrose as a happy man. I envision him shearing sheep during the day, while in the evening he brings his wife flowers and plays games with his little children until bedtime. However, in 1887 Frank J. Primrose was not a happy man. This is because in June of that year, he had telegraphed his agent in Kansas instructions to buy a certain amount of wool. However, the telegraph operator made a single mistake in transmitting his message and Primrose unintentionally bought far more wool than he could possibly sell. Ordinarily, such a small error has little consequence, because errors can often be detected from the context of the message. However, this was an unusual case and the mistake cost him about a half-million dollars in today’s money. He promptly sued and his case eventually made its way to the Supreme Court. The famous 1894 United States Supreme Court case Primrose v. Western Union Telegraph Company decided that the telegraph company was not liable for the error in transmission of a message.

Thus was born the need for error-correcting codes.

Introduction

Lester Hill is most famously known for the Hill cipher, frequently taught in linear algebra courses today. We describe this cryptosystem in more detail in one of the sections below, but here is the rough idea. In this system, developed and published in the 1920’s, we take a $k\times k$ matrix K, composed of integers between 0 and 25, and encipher plaintext p by $p\longmapsto c=Kp$, where the arithmetical operations are performed mod 26. Here K is the key, which should be known only to the sender and the intended receiver, and c is the ciphertext transmitted to the receiver.

On the other hand, Richard Hamming is known for the Hamming codes, also frequently taught in a linear algebra course. This will be describes in more detail in one of the sections below, be here is the basic idea. In this scheme, developed in the 1940’s, we take a $k\times k$ matrix G over a finite field F, constructed in a very particular way, and encode a message m by $m\longmapsto c=mG$, where the arithmetical operations are performed in F. The matrix G is called the generator matrix and c is the codeword transmitted to the receiver.

Here, in a nutshell, is the mystery at the heart of this post.

These schemes of Hill and Hamming, while algebraically very similar, have quite different aims. One is intended for secure communication, the other for reliable communication. However, in an unpublished paper [H5], Hill developed a hybrid encryption/error-detection scheme, what we shall call “Hill codes” (described in more detail below).

Why wasn’t Hill’s result published and therefore Hill, more than Hamming, known as a pioneer of error-correcting codes?

Perhaps Hill himself hinted at the answer. In an overly optimistic statement, Hill wrote (italics mine):

Further problems connected with checking operations in finite fields will be treated in another paper. Machines may be devised to render almost quite automatic the evaluation of checking elements $c_1,\dots,c_q$ according to any proposed reference matrix of the general type described in Section 7, whatever the finite field in which the operations are effected. Such machines would enable us to dispense entirely with tables of any sort, and checks could be determined with great speed. But before checking machines could be seriously planned, the following problem — which is one, incidentally, of considerable interest from the standpoint of pure number theory — would require solution.

– Lester Hill, [H5]

By my interpretation, this suggests Hill wanted to answer the question below before moving on. As simple looking as it is, this problem is still, as far as I know, unsolved at the time of this writing.

Question 1 (Hill’s Problem):
Given k and q, find the largest r such that there exists a $k\times r$ van der Monde matrix with the property that every square submatrix is non-singular.

Indeed, this is closely related to the following related question from MacWilliams-Sloane [MS77], also still unsolved at this time. (Since Cauchy matrices do give a large family of matrices with the desired property, I’m guessing Hill was not aware of them.)

Question 2: Research Problem (11.1d)
Given k and q, find the largest r such that there exists a $k\times r$ matrix having entries in GF(q) with the property that every square submatrix is non-singular.

In this post, after brief biographies, an even more brief description of the Hill cipher and Hamming codes is given, with examples. Finally, we reference previous blog posts where the above-mentioned unpublished paper, in which Hill discovered error-correcting codes, is discussed in more detail.

Short biographies

Who is Hill? Recent short biographies have been published by C. Christensen and his co-authors. Modified slightly from [C14] and [CJT12] is the following information.

Lester Sanders Hill was born on January 19, 1890 in New York. He graduated from Columbia University in 1911 with a B. A. in Mathematics and earned his Master’s Degree in 1913. He taught mathematics for a few years at Montana University, then at Princeton University. He served in the United States Navy Reserves during World War I. After the WWI, he taught at the University of Maine and then at Yale, from which he earned his Ph.D. in mathematics in 1926. His Ph.D. advisor is not definitely known at this writing but I think a reasonable guess is Wallace Alvin Wilson.

In 1927, he accepted a position with the faculty of Hunter College in New York City, and he remained there, with one exception, until his resignation in 1960 due to illness. The one exception was for teaching at the G.I. University in Biarritz in 1946, during which time he may have been reactivated as a Naval Reserves officer. Hill died January 9, 1961.

Thanks to an interview that David Kahn had with Hill’s widow reported in [C14], we know that Hill loved to read detective stories, to tell jokes and, while not shy, enjoyed small gatherings as opposed to large parties.

Who is Hamming? His life is much better known and details can be readily found in several sources.

Richard Wesley Hamming was born on February 11, 1915, in Chicago. Hamming earned a B.S. in mathematics from the University of Chicago in 1937, a masters from the University of Nebraska in 1939, and a PhD in mathematics (with a thesis on differential equations)
from the University of Illinois at Urbana-Champaign in 1942. In April 1945 he joined the Manhattan Project at the Los Alamos Laboratory, then left to join the Bell Telephone Laboratories in 1946. In 1976, he retired from Bell Labs and moved to the Naval Postgraduate School in Monterey, California, where he worked as an Adjunct Professor
and senior lecturer in computer science until his death on January 7, 1998.

Hill’s cipher

The Hill cipher is a polygraphic cipher invented by Lester S. Hill in 1920’s. Hill and his colleague Wisner from Hunter College filed a patent for a telegraphic device encryption and error-detection device which was roughly based on ideas arising from the Hill cipher. It appears nothing concrete became of their efforts to market the device to the military, banks or the telegraph company (see Christensen, Joyner and Torres [CJT12] for more details). Incidently, Standage’s excellent book [St98] tells the amusing story of the telegraph company’s failed attempt to add a relatively simplistic error-detection to telegraph codes during that time period.

Some books state that the Hill cipher never saw any practical use in the real world. However, research by historians F. L. Bauer and David Kahn uncovered the fact that the Hill cipher saw some use during World War II encrypting three-letter groups of radio call signs [C14]. Perhaps insignificant, at least compared to the practical value of Hamming codes, none-the-less, it was a real-world use.

The following discussion assumes an elementary knowledge of matrices. First, each letter is first encoded as a number, namely

$A \leftrightarrow 0, B \leftrightarrow 1, \dots, Z \leftrightarrow 25$. The subset of the integers $\{0, 1, \dots , 25\}$ will be denoted by Z/26Z. This is closed under addition and multiplication (mod 26), and sums and products (mod 26) satisfy the usual associative and distributive properties. For R = Z/26Z, let GL(k,R) denote the set of invertible matrix transformations $T:R^k\to R^k$ (that is, one-to-one and onto linear functions).

The construction

Suppose your message m consists of n capital letters, with no spaces. This may be regarded an n-tuple M with elements in R = Z/26Z. Identify the message M as a sequence of column vectors ${\bf p}\in R^k$. A key in the Hill cipher is a $k\times k$ matrix K, all of whose entries are in R, such that the matrix K is invertible. It is important to keep K and k secret.

The encryption is performed by computing ${\bf c} = K{\bf p},$ and rewriting the resulting vector as a string over the same alphabet. Decryption is performed similarly by computing ${\bf p} = K^{-1} {\bf c}.$.

Example 1: Suppose m is the message “BWGN”. Transcoding into numbers, the plaintext is rewritten $p_0=1, p_1=22, p_2=6, p_3=13$. Suppose the key is
$K=\left(\begin{array}{rr} 1 & 3 \\ 5 & 12 \end{array}\right).$
Using Hill’s encryption above gives $c_0=7,c_1=3,c_2=24,c_3=3$. (Verification is left to the reader as an exercise.)

Security concerns: For example, this cipher is linear and can be broken by a known plaintext attack.

Hamming codes

Richard Hamming is a pioneer of coding theory, introducing the binary
Hamming codes in the late 1940’s. In the days when an computer error could crash the computer and force the programmer to retype his punch cards, Hamming, out of frustration, designed a system whereby the computer could automatically correct certain errors. The family of codes named after him can easily correct one error.

Hill’s unpublished paper

While he was a student at Yale, Hill published three papers in Telegraph and Telephone Age [H1], [H2], [H3]. In these papers Hill described a mathematical method for checking the accuracy of telegraph communications. There is some overlap with these papers and [H5], so it seems likely to me that Hill’s unpublished paper [H5] dates from this time (that is, during his later years at Yale or early years at Hunter).

In [H5], Hill describes a family of linear block codes over a finite field and an algorithm for error-detection (which can be easily extended to error-correction). In it, he states the construction of what I’ll call the “Hill codes,” (defined below), gives numerous computational examples, and concludes by recording Hill’s Problem (stated above as Question 1). It is quite possibly Hill’s best work.

Here is how Hill describes his set-up.

Our problem is to provide convenient and practical accuracy checks upon
a sequence of n elements $f_1, f_2, \dots, f_r$ in a finite algebraic
field F. We send, in place of the simple sequence $f_1, f_2, \dots, f_r$, the amplified sequence $f_1, f_2, \dots, f_r, c_1, c_2, \dots, c_k$
consisting of the “operand” sequence and the “checking” sequence.

– Lester Hill, [H5]

Then Hill continues as follows. Let F=GF(p) denote the finite field having p elements, where $p>2$ is a prime number. The checking sequence contains k elements of F as follows:
$c_j = \sum_{i=1}^r a_{i}^jf_i,$
for $j = 1, 2, \dots, k$. The checks are to be determined by means of a
fixed matrix
$A = \left( \begin{array}{cccc} a_{1} & a_{2} & \dots & a_{r} \\ a_{1}^2 & a_{2}^2 & \dots & a_{r}^2 \\ \vdots & & & \vdots \\ a_{1}^k & a_{2}^k & \dots & a_{r}^k \\ \end{array} \right)$
of elements of F, the matrix having been constructed according to the criteria in Hill’s Problem above. In other words, if the operand sequence (i.e., the message) is the vector ${\bf f} = (f_1, f_2, \dots, f_r)$, then the amplified sequence (or codeword in the Hill code) to be transmitted is

${\bf c} = {\bf f}G,$
where $G = \left( I_r, A \right)$ and where $I_r$ denotes the
$r\times r$ identity matrix. The Hill code is the row space of G.

We conclude with one more open question.

Question 3:
What is the minimum distance of a Hill code?

The minimum distance of any Hamming code is 3.

Do all sufficiently long Hill codes have minimum distance greater than 3?

Summary

Most books today (for example, the excellent MAA publication written by Thompson [T83]) date the origins of the theory of error-correcting codes to the late 1940s, due to Richard Hamming. However, this paper argues that the actual birth is in the 1920s due to Lester Hill. Topics discussed include why Hill’s discoveries weren’t publicly known until relatively recently, what Hill actually did that trumps Hamming, and some open (mathematical) questions connected with Hill’s work.

For more details, see these previous blog posts.

Acknowledgements: Many thanks to Chris Christensen and Alexander Barg for
helpful and encouraging conversations. I’d like to explicitly credit Chris Christensen, as well as historian David Kahn, for the original discoveries of the source material.

Bibliography

[C14] C. Christensen, Lester Hill revisited, Cryptologia 38(2014)293-332.

[CJT12] ——, D. Joyner and J. Torres, Lester Hill’s error-detecting codes, Cryptologia 36(2012)88-103.

[H1] L. Hill, A novel checking method for telegraphic sequences, Telegraph and
Telephone Age (October 1, 1926), 456 – 460.

[H2] ——, The role of prime numbers in the checking of telegraphic communications, I, Telegraph and Telephone Age (April 1, 1927), 151 – 154.

[H3] ——, The role of prime numbers in the checking of telegraphic
communications, II, Telegraph and Telephone Age (July, 16, 1927), 323 – 324.

[H4] ——, Lester S. Hill to Lloyd B. Wilson, November 21, 1925. Letter.

[H5] ——, Checking the accuracy of transmittal of telegraphic communications by means of operations in finite algebraic fields, undated and unpublished notes, 40 pages.
(hill-error-checking-notes-unpublished)

[MS77] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes, North-Holland, 1977.

[Sh] A. Shokrollahi, On cyclic MDS codes, in Coding Theory and Cryptography: From Enigma and Geheimschreiber to Quantum Theory, (ed. D. Joyner), Springer-Verlag, 2000.

[St98] T. Standage, The Victorian Internet, Walker & Company, 1998.

[T83] T. Thompson, From Error-Correcting Codes Through Sphere Packings to Simple Groups, Mathematical Association of America, 1983.

A 1932 memorandum on rum-runner’s cryptosystems

During the prohibition era, organized crime made phenomenal amounts of money through illegal smuggling. Eventually, their messages were enciphered. By the late 1920s and early 1930s, these cryptosystems became rather sophisticated. Here is a memorandum form Elizebeth Friedman to CMDR Gorman in January or 1932 which gives an indication of this sophistication.

Many thanks to the George C. Marshall Foundation, Lexington, Virginia, for providing this reproduction!

Lester Hill’s “The checking of the accuracy …”, part 9

We may inquire into the possibility of undisclosed errors occurring in the transmittal of the sequence:

$\begin{array}{ccccccccccccccc} f_1 & f_2 & f_3 & f_4 & f_5 & f_6 & f_7 & f_8 & f_9 & f_{10} & f_{11} & f_{12} & c_1 & c_2 & c_3 \\ 5 & 17 & 13 & 21 & 0 & 8 & 6 & 0 & 11 & 0 & 11 & 11 & 6 & 15 & 2 \\ X & T & Y & P & V & Z & R & V & H & V & H & H & R & I & F \\ \end{array}$

Invoking the theorem established in sections 4 and 5, and formulated at the close of section 5, we may assert:

• (1) If not more than three errors are made in transmitting the fifteen letters of the sequence, and if the errors made affect the $f_i$ only, the $c_j$ being correctly transmitted, then the presence of error is certain to be disclosed.
• (2) If not more than three errors are made, all told, but at least three of them affect the $f_i$, then the presence of error will enjoy only a

$1-{\rm in}-22^3\ \ \ \ \ (1-{\rm in}-10648)$
chance of escaping disclosure.

These assertions result at once from the theorem referred to. But a closer study of the reference matrix employed in this example permits us to replace them by the following more satisfactory statements:

• (1′)
If errors occur in not more than three of the fifteen elements of the sequence $f_1f_2\dots f_{12}c_1 c_2 c_3$, and if at least one of the particular elements $f_{11}f_{12}c_2$ is correctly transmitted, the presence of error will certainly be disclosed. But if exactly three errors are made, affecting presicely the elements $f_{11}f_{12}c_2$, the presence of error will enjoy a $1$-in-$22^2$ ($1$-in-$484$) chance of escaping disclosure.
• (2′)
If more than three errors are made, then whatever the distribution of errors among the fifteen elements of the sequence, the presence of error will enjoy only a
$1$-in-$22^3$ ($1$-in-$10648$) chance of escaping disclosure.

Assertions of this kind will be carefully established below, when a more important finite field is under consideration. The argument then made will be applicable in the case of any finite field. But it is worthwhile here to look more carefully into the exceptional distribution of errors which is italicized in (1′). This will help us note any weakness that ought to be avoided in the construction of reference matrices.

Suppose that exactly three errors are made, affecting precisely $f_{11}f_{12}c_2$. If the mutilated message is to check up, and the errors to escape disclosure, we must have (for error notations, see sections 4,5):

$11\epsilon_{11}+12\epsilon_{12}=0,\ \ \ 11^2\epsilon_{11}+12^2\epsilon_{12}=\delta_2,\ \ \ 11^3\epsilon_{11}+12^3\epsilon_{12}=0.$
These equations may be written:

$11\epsilon_{11}+12\epsilon_{12}=0,\ \ \ 6\epsilon_{11}+6\epsilon_{12}=\delta_2,\ \ \ 20\epsilon_{11}+3\epsilon_{12}=0.$
But $11\epsilon_{11}=-12\epsilon_{12}$ can be written $11\epsilon_{11}=11\epsilon_{12}$, or $\epsilon_{11}=\epsilon_{12}$.
Etc. In this way, we find that the errors can escape disclosure if and only if

$\epsilon_{11}=\epsilon_{12},\ \ \ {\rm and}\ \ \ \delta_{2}=12\epsilon_{11}.$
The error $\epsilon_{11}$ can be made quite arbitrarily. But then the values of $\epsilon_{12}$ and $\delta_2$ are then completely determined. There is evidently a $1$-in-$484$ chance – and no more – that the errors will fall out just right.

The trouble arises from the vanishing, in our reference matrix, of the two-rowed
determinant

$\big{|} \begin{array}{cc} 11 & 12 \\ 11^3 & 12^3 \end{array} \big{|} .$
Note that

$\big{|} \begin{array}{cc} 11 & 12 \\ 11^3 & 12^3 \end{array} \big{|} = 11\cdot 12\cdot \big{|} \begin{array}{cc} 1 & 1 \\ 11^2 & 12^2 \end{array} \big{|} = 17 \big{|} \begin{array}{cc} 1 & 1 \\ 11^2 & 12^2 \end{array} \big{|} =0,$
since $11^2=12^2$.

From the fact that $\big{|} \begin{array}{cc} 11 & 12 \\ 11^3 & 12^3 \end{array} \big{|}$ is the only vanishing determinant of any order in the matrix employed, all other assertions made in (1′) and (2′) are readily justified. This will be made clear in the following sections.

It will be advantageous, as shown more completely in subsequent sections, to employ reference matrices which contain the smallest possible number of vanishing determinants of any orders.

Lester Hill’s “The checking of the accuracy …”, part 7

Backstory: Lester Saunders Hill wrote unpublished notes, about 40 pages long, probably in the mid- to late 1920s. They were titled “The checking of the accuracy of transmittal of telegraphic communications by means of operations in finite algebraic fields”. In the 1960s this manuscript was given to David Kahn by Hill’s widow. The notes were typewritten, but mathematical symbols, tables, insertions, and some footnotes were often handwritten. I thank Chris Christensen, the National Cryptologic Museum, and NSA’s David Kahn Collection, for their help in obtaining these notes. Many thanks also to Rene Stein of the NSA Cryptologic Museum library and David Kahn for permission to publish this transcription. Comments by transcriber will look his this: [This is a comment. – wdj]. I used Sage (www.sagemath.org) to generate the tables in LaTeX.

Here is just the seventh section of his paper. I hope to post more later. (Part 6 is here.)

Section 7: The general form of reference matrix

Let $a_1, a_2, \dots a_n$ be $n$ different elements of our finite field $F$, and let $a_i$ be different from the zero element of $F$. Then we shall emply a reference matrix of the form

$Q = \left( \begin{array}{cccc} a_1 & a_2 & \dots & a_{n} \\ a_1^2 & a_2^2 & \dots & a_{n}^2 \\ \vdots & & & \vdots \\ a_1^q & a_2^q & \dots & a_{n}^q \\ \end{array} \right)$

or of the form

$Q' = \left( \begin{array}{cccc} 1 & 1 & \dots & 1 \\ a_1 & a_2 & \dots & a_{n} \\ a_1^2 & a_2^2 & \dots & a_{n}^2 \\ \vdots & & & \vdots \\ a_1^{q-1} & a_2^{q-1} & \dots & a_{n}^{q-1} \\ \end{array} \right) \, .$

The actual procedure involved, and the real equivalence for checking purposes, of $Q$ and $Q'$, will presently be illustrated. We note here merely that each of the matrices $Q$, $Q'$ is of index $q$. Let us examine $Q'$. A similar argument will be applicable to $Q$.

Suppose, for argument, that $Q'$ contains a vanishing $q$-rowed determinant. Without loss of generality, we suppose that it is

$Q' = \left| \begin{array}{cccc} 1 & 1 & \dots & 1 \\ a_1 & a_2 & \dots & a_{q} \\ a_1^2 & a_2^2 & \dots & a_{q}^2 \\ \vdots & & & \vdots \\ a_1^{q-1} & a_2^{q-1} & \dots & a_{q}^{q-1} \\ \end{array} \right| \, .$
Since this determinant is equal to zero, Lemma 4 guarantees the existence, in our field $F$, of the elements

$\lambda_1, \lambda_2, \dots \lambda_q,$

not all equal to zero, for which we may write

$\lambda_1 + \lambda_2 a_i +\lambda_3 a_i^2 + \dots + \lambda_q a_q^{q-1} = 0,$
($i=1,2, \dots, q$). The $a_i$ being, as assumed, all different, this implies that the equation

$\lambda_1 + \lambda_2 x +\lambda_3 x^2 + \dots + \lambda_q x^{q-1} = 0,$

has $q$ different roots in $F$, an implication which contradicts Lemma 5.

Thus $Q'$ contains no vanishing $q$-rowed determinant, and if of index $q$. In the same way, $Q$ may be shown to be of index $q$. But unless these matrices are constructed with care, they will, of course, generally contain vanishing determinants of various orders less than $q$. We defer for the moment the further consideration of this matter.

Lester Hill’s “The checking of the accuracy …”, part 6

Backstory: Lester Saunders Hill wrote unpublished notes, about 40 pages long, probably in the mid- to late 1920s. They were titled “The checking of the accuracy of transmittal of telegraphic communications by means of operations in finite algebraic fields”. In the 1960s this manuscript was given to David Kahn by Hill’s widow. The notes were typewritten, but mathematical symbols, tables, insertions, and some footnotes were often handwritten. The manuscript is being LaTeXed “as we speak”. I thank Chris Christensen, the National Cryptologic Museum, and NSA’s David Kahn Collection, for their help in obtaining these notes. Many thanks also to Rene Stein of the NSA Cryptologic Museum library and David Kahn for permission to publish this transcription. Comments by transcriber will look his this: [This is a comment. – wdj]. I used Sage (www.sagemath.org) to generate the tables in LaTeX.

Here is just the sixth section of his paper. I hope to post more later. (Part 5 is here.)

Section 6: Arbitrary distrbution of errors affecting the $f_i$ and the $c_j$

It is highly desirable, although not strictly necessary, to fashion the reference matrix $Q$ in such a manner that every determinant contained in it, of every order, be different from zero.

Reference matrices of a practical type, and without a vanishing determinant of any order, may be construted in certain finite fields $F$ by very direct methods which can be more easily illustrated than described. It will be expedient at this poin, even at the expense of breaking the continuity of our discussion, to introduce concrete examples of operations in finite fields, and to show how reference matrices be conveniently set up in particular cases. The desirability of contriving that such matrices contain no vanishing determinant can then be brought out more effectively. The examination of arbitrary distributions of errors among the $n+q$ elements of a telegraphic sequence

$f_1, f_2, \dots, f_n, c_1, c_2, \dots, c_q$

can be deferred, therefore, until later.

But, before going into details, we will describe the general form of reference matrix to be employed in all cases.

Lester Hill’s “The checking of the accuracy …”, part 5

Backstory: Lester Saunders Hill wrote unpublished notes, about 40 pages long, probably in the mid- to late 1920s. They were titled “The checking of the accuracy of transmittal of telegraphic communications by means of operations in finite algebraic fields”. In the 1960s this manuscript was given to David Kahn by Hill’s widow. The notes were typewritten, but mathematical symbols, tables, insertions, and some footnotes were often handwritten. The manuscript is being LaTeXed “as we speak”. I thank Chris Christensen, the National Cryptologic Museum, and NSA’s David Kahn Collection, for their help in obtaining these notes. Many thanks also to Rene Stein of the NSA Cryptologic Museum library and David Kahn for permission to publish this transcription. Comments by transcriber will look his this: [This is a comment. – wdj]. I used Sage (www.sagemath.org) to generate the tables in LaTeX.

Here is just the fifth section of his paper. I hope to post more later. (Part 4 is here.)

Section 5: The error distribution with at least $q$ errors $\epsilon_i$

Let the sequence (footnote: Arbitrary error distributions will be considered later. Cf. sections 15-18.)

$f_1, f_2, \dots, f_n,\ c_1, c_2, \dots, c_q$
be transmitted in the form

$f_1', f_2', \dots, f_n',\ c_1', c_2', \dots, c_q' \, ,$
containing $t\geq q$ errors which affect the $f_i$, and $h\geq 0$ errors which affect the $c_j$.

Let $t=q+s$. Then, to avoid discussing again the case already considered, we assume that at least one of the two integers $s, h$
is greater than $0$.

Let the errors among the $c_j$ be $\delta_{j_1}, \delta_{j_2}, \dots , \delta_{j_h}$, affecting the $c_{j_1}, c_{j_2}, \dots , c_{j_h}$. Without loss in generality, we may assume that the $t$ errors affecting the $f_i$ are $\epsilon_1, \epsilon_2, \dots, \epsilon_t$, so that the first $t$ of the $f_i$ are transmitted in error.

Suppose that the mutilated sequence is in full check. Then:

$\sum_{i=1}^t k_{ji}(f_i+\epsilon_i)+\sum_{i=t+1}^n k_{ji}f_i =c_j+\delta_j$

or

$\sum_{i=1}^n k_{ji}f_i+\sum_{i=1}^t k_{ji}\epsilon_i =c_j+\delta_j$

whence

$\sum_{i=1}^t k_{ji}\epsilon_i=\delta_j$

where $j=1,2,\dots, q$ and where $\delta_j$ in $c_j$ and may, of course, be
equal to $0$.

It follows that

$\sum_{i=1}^q k_{ji}\epsilon_i = \delta_j - \sum_{i=1}^t k_{ji}\epsilon_i = b_j = \phi_j(\epsilon_{q+1}, \epsilon_{q+2}, \dots , \epsilon_t, \delta_j),$
where $j=1,2,\dots, q$ and where $\phi_j$ denotes a polynomial in the errors $\epsilon_{q+1}, \epsilon_{q+2}, \dots , \epsilon_t, \delta_j$.

The index of the reference matrix being $q$, the determinant

$\left| \begin{array}{cccc} k_{11} & k_{12} & \dots & k_{1n} \\ k_{21} & k_{22} & \dots & k_{2n} \\ \vdots & & & \vdots \\ k_{q1} & k_{q2} & \dots & k_{qn} \\ \end{array} \right|$

of the systems of equations

$\sum_{i=1}^q k_{ji}\epsilon_i = b_j \ \ \ \ \ \ \ \ \ \ (j=1,2,\dots, q)$

is not zero. Hence if all the $b_j$ are equal to zero, Lemma 1 demands that all the $\epsilon_i$, for $i=1, 2, \dots, q$, be equal to zero, contrary to hypothesis.

If at least one of the $b_j$ is different from $0$, Lemma 3 shows that $\epsilon_{1}, \epsilon_{2}, \dots , \epsilon_q$ are uniquely determined as functions of the $b_j$, and therefore as functions of $\epsilon_{q+1}, \epsilon_{q+2}, \dots , \epsilon_t, \delta_1, \delta_2, \dots, \delta_q$:

$\epsilon_i = \xi_i (\epsilon_{q+1}, \epsilon_{q+2}, \dots , \epsilon_t, \delta_1, \delta_2, \dots, \delta_q),\ \ \ \ \ \ \ \ \ \ (j=1,2,\dots, q)$

the functions $\xi_i$ being, of course, rational functions. But, by assumption, all of the $\delta_j$, except $\delta_{j_1}, \dots, \delta_{j_h}$, vanish. Hence $\epsilon_i$ ($i=1,2,\dots, q$) is a rational function of $\epsilon_{q+1}, \epsilon_{q+2}, \dots , \epsilon_t, \delta_{j_1}, \delta_{j_2}, \dots, \delta_{j_h}$.

We conclude that if the mutilated message, containing $t+h$ errors, is to check up, so that the presence of error can escape detection, $q$ of the errors must be carefully adjusted as rational functions of the remaining $t+h-q$ errors.

It is easy to measure the chance that mere accidents of erroneous telegraphic transmittal might effect this adjustment. Let $\Gamma$ denote the total number of elements in the finite algebraic field $F$ which we are dealing. When an error is made in transmitting a sequence of elements of $F$, the error may evidently have any one of $\Gamma -1$ values — an error with the value $0$ being no real error at all. It is therefore clear that our set of $t+h$ errors will enjoy only $1$ chance in $(\Gamma -1)^q$ of being accidentally adjusted so as to escape detection.

When $t=q+s$ errors occur among the $f_i$ and $h$ errors occur among the $c_j$ in the transmittal of the sequence $f_1, f_2, \dots, f_n,\ c_1, c_2, \dots, c_q$, the errors being all of accidental telegraphic origin, the presence of error will infallibly be disclosed if $h=s=0$; and will enjoy a $1$-in-$(\Gamma -1)^q$ of escaping disclosure if either of the two integers $h$, $s$ is greater than $0$.

This statement summarizes the results of the present section and those of section 4.