Hill verses Hamming

It’s easy to imagine the 19th century Philadelphia wool dealer Frank J. Primrose as a happy man. I envision him shearing sheep during the day, while in the evening he brings his wife flowers and plays games with his little children until bedtime. However, in 1887 Frank J. Primrose was not a happy man. This is because in June of that year, he had telegraphed his agent in Kansas instructions to buy a certain amount of wool. However, the telegraph operator made a single mistake in transmitting his message and Primrose unintentionally bought far more wool than he could possibly sell. Ordinarily, such a small error has little consequence, because errors can often be detected from the context of the message. However, this was an unusual case and the mistake cost him about a half-million dollars in today’s money. He promptly sued and his case eventually made its way to the Supreme Court. The famous 1894 United States Supreme Court case Primrose v. Western Union Telegraph Company decided that the telegraph company was not liable for the error in transmission of a message.

Thus was born the need for error-correcting codes.


Lester Hill is most famously known for the Hill cipher, frequently taught in linear algebra courses today. We describe this cryptosystem in more detail in one of the sections below, but here is the rough idea. In this system, developed and published in the 1920’s, we take a k\times k matrix K, composed of integers between 0 and 25, and encipher plaintext p by p\longmapsto c=Kp, where the arithmetical operations are performed mod 26. Here K is the key, which should be known only to the sender and the intended receiver, and c is the ciphertext transmitted to the receiver.

On the other hand, Richard Hamming is known for the Hamming codes, also frequently taught in a linear algebra course. This will be describes in more detail in one of the sections below, be here is the basic idea. In this scheme, developed in the 1940’s, we take a k\times k matrix G over a finite field F, constructed in a very particular way, and encode a message m by m\longmapsto c=mG, where the arithmetical operations are performed in F. The matrix G is called the generator matrix and c is the codeword transmitted to the receiver.

Here, in a nutshell, is the mystery at the heart of this post.

These schemes of Hill and Hamming, while algebraically very similar, have quite different aims. One is intended for secure communication, the other for reliable communication. However, in an unpublished paper [H5], Hill developed a hybrid encryption/error-detection scheme, what we shall call “Hill codes” (described in more detail below).

Why wasn’t Hill’s result published and therefore Hill, more than Hamming, known as a pioneer of error-correcting codes?

Perhaps Hill himself hinted at the answer. In an overly optimistic statement, Hill wrote (italics mine):

Further problems connected with checking operations in finite fields will be treated in another paper. Machines may be devised to render almost quite automatic the evaluation of checking elements c_1,\dots,c_q according to any proposed reference matrix of the general type described in Section 7, whatever the finite field in which the operations are effected. Such machines would enable us to dispense entirely with tables of any sort, and checks could be determined with great speed. But before checking machines could be seriously planned, the following problem — which is one, incidentally, of considerable interest from the standpoint of pure number theory — would require solution.

– Lester Hill, [H5]

By my interpretation, this suggests Hill wanted to answer the question below before moving on. As simple looking as it is, this problem is still, as far as I know, unsolved at the time of this writing.

Question 1 (Hill’s Problem):
Given k and q, find the largest r such that there exists a k\times r van der Monde matrix with the property that every square submatrix is non-singular.

Indeed, this is closely related to the following related question from MacWilliams-Sloane [MS77], also still unsolved at this time. (Since Cauchy matrices do give a large family of matrices with the desired property, I’m guessing Hill was not aware of them.)

Question 2: Research Problem (11.1d)
Given k and q, find the largest r such that there exists a k\times r matrix having entries in GF(q) with the property that every square submatrix is non-singular.

In this post, after brief biographies, an even more brief description of the Hill cipher and Hamming codes is given, with examples. Finally, we reference previous blog posts where the above-mentioned unpublished paper, in which Hill discovered error-correcting codes, is discussed in more detail.

Short biographies

Who is Hill? Recent short biographies have been published by C. Christensen and his co-authors. Modified slightly from [C14] and [CJT12] is the following information.

Lester Sanders Hill was born on January 19, 1890 in New York. He graduated from Columbia University in 1911 with a B. A. in Mathematics and earned his Master’s Degree in 1913. He taught mathematics for a few years at Montana University, then at Princeton University. He served in the United States Navy Reserves during World War I. After the WWI, he taught at the University of Maine and then at Yale, from which he earned his Ph.D. in mathematics in 1926. His Ph.D. advisor is not definitely known at this writing but I think a reasonable guess is Wallace Alvin Wilson.

In 1927, he accepted a position with the faculty of Hunter College in New York City, and he remained there, with one exception, until his resignation in 1960 due to illness. The one exception was for teaching at the G.I. University in Biarritz in 1946, during which time he may have been reactivated as a Naval Reserves officer. Hill died January 9, 1961.

Thanks to an interview that David Kahn had with Hill’s widow reported in [C14], we know that Hill loved to read detective stories, to tell jokes and, while not shy, enjoyed small gatherings as opposed to large parties.

Who is Hamming? His life is much better known and details can be readily found in several sources.

Richard Wesley Hamming was born on February 11, 1915, in Chicago. Hamming earned a B.S. in mathematics from the University of Chicago in 1937, a masters from the University of Nebraska in 1939, and a PhD in mathematics (with a thesis on differential equations)
from the University of Illinois at Urbana-Champaign in 1942. In April 1945 he joined the Manhattan Project at the Los Alamos Laboratory, then left to join the Bell Telephone Laboratories in 1946. In 1976, he retired from Bell Labs and moved to the Naval Postgraduate School in Monterey, California, where he worked as an Adjunct Professor
and senior lecturer in computer science until his death on January 7, 1998.

Hill’s cipher

The Hill cipher is a polygraphic cipher invented by Lester S. Hill in 1920’s. Hill and his colleague Wisner from Hunter College filed a patent for a telegraphic device encryption and error-detection device which was roughly based on ideas arising from the Hill cipher. It appears nothing concrete became of their efforts to market the device to the military, banks or the telegraph company (see Christensen, Joyner and Torres [CJT12] for more details). Incidently, Standage’s excellent book [St98] tells the amusing story of the telegraph company’s failed attempt to add a relatively simplistic error-detection to telegraph codes during that time period.

Some books state that the Hill cipher never saw any practical use in the real world. However, research by historians F. L. Bauer and David Kahn uncovered the fact that the Hill cipher saw some use during World War II encrypting three-letter groups of radio call signs [C14]. Perhaps insignificant, at least compared to the practical value of Hamming codes, none-the-less, it was a real-world use.

The following discussion assumes an elementary knowledge of matrices. First, each letter is first encoded as a number, namely

A \leftrightarrow 0, B \leftrightarrow 1, \dots, Z \leftrightarrow 25. The subset of the integers \{0, 1, \dots , 25\} will be denoted by Z/26Z. This is closed under addition and multiplication (mod 26), and sums and products (mod 26) satisfy the usual associative and distributive properties. For R = Z/26Z, let GL(k,R) denote the set of invertible matrix transformations T:R^k\to R^k (that is, one-to-one and onto linear functions).

The construction

Suppose your message m consists of n capital letters, with no spaces. This may be regarded an n-tuple M with elements in R = Z/26Z. Identify the message M as a sequence of column vectors {\bf p}\in R^k. A key in the Hill cipher is a k\times k matrix K, all of whose entries are in R, such that the matrix K is invertible. It is important to keep K and k secret.

The encryption is performed by computing {\bf c} = K{\bf p}, and rewriting the resulting vector as a string over the same alphabet. Decryption is performed similarly by computing {\bf p} = K^{-1} {\bf c}..

Example 1: Suppose m is the message “BWGN”. Transcoding into numbers, the plaintext is rewritten p_0=1, p_1=22, p_2=6, p_3=13. Suppose the key is
K=\left(\begin{array}{rr} 1 & 3 \\ 5 & 12 \end{array}\right).
Using Hill’s encryption above gives c_0=7,c_1=3,c_2=24,c_3=3. (Verification is left to the reader as an exercise.)

Security concerns: For example, this cipher is linear and can be broken by a known plaintext attack.

Hamming codes

Richard Hamming is a pioneer of coding theory, introducing the binary
Hamming codes in the late 1940’s. In the days when an computer error could crash the computer and force the programmer to retype his punch cards, Hamming, out of frustration, designed a system whereby the computer could automatically correct certain errors. The family of codes named after him can easily correct one error.

Hill’s unpublished paper

While he was a student at Yale, Hill published three papers in Telegraph and Telephone Age [H1], [H2], [H3]. In these papers Hill described a mathematical method for checking the accuracy of telegraph communications. There is some overlap with these papers and [H5], so it seems likely to me that Hill’s unpublished paper [H5] dates from this time (that is, during his later years at Yale or early years at Hunter).

In [H5], Hill describes a family of linear block codes over a finite field and an algorithm for error-detection (which can be easily extended to error-correction). In it, he states the construction of what I’ll call the “Hill codes,” (defined below), gives numerous computational examples, and concludes by recording Hill’s Problem (stated above as Question 1). It is quite possibly Hill’s best work.

Here is how Hill describes his set-up.

Our problem is to provide convenient and practical accuracy checks upon
a sequence of n elements f_1, f_2, \dots, f_r in a finite algebraic
field F. We send, in place of the simple sequence f_1, f_2, \dots, f_r, the amplified sequence f_1, f_2, \dots, f_r, c_1, c_2, \dots, c_k
consisting of the “operand” sequence and the “checking” sequence.

– Lester Hill, [H5]

Then Hill continues as follows. Let F=GF(p) denote the finite field having p elements, where p>2 is a prime number. The checking sequence contains k elements of F as follows:
c_j = \sum_{i=1}^r a_{i}^jf_i,
for j = 1, 2, \dots, k. The checks are to be determined by means of a
fixed matrix
A = \left( \begin{array}{cccc} a_{1} & a_{2} & \dots & a_{r} \\ a_{1}^2 & a_{2}^2 & \dots & a_{r}^2 \\ \vdots & & & \vdots \\ a_{1}^k & a_{2}^k & \dots & a_{r}^k \\ \end{array} \right)
of elements of F, the matrix having been constructed according to the criteria in Hill’s Problem above. In other words, if the operand sequence (i.e., the message) is the vector {\bf f} = (f_1, f_2, \dots, f_r), then the amplified sequence (or codeword in the Hill code) to be transmitted is

{\bf c} = {\bf f}G,
where G = \left( I_r, A \right) and where I_r denotes the
r\times r identity matrix. The Hill code is the row space of G.

We conclude with one more open question.

Question 3:
What is the minimum distance of a Hill code?

The minimum distance of any Hamming code is 3.

Do all sufficiently long Hill codes have minimum distance greater than 3?


Most books today (for example, the excellent MAA publication written by Thompson [T83]) date the origins of the theory of error-correcting codes to the late 1940s, due to Richard Hamming. However, this paper argues that the actual birth is in the 1920s due to Lester Hill. Topics discussed include why Hill’s discoveries weren’t publicly known until relatively recently, what Hill actually did that trumps Hamming, and some open (mathematical) questions connected with Hill’s work.

For more details, see these previous blog posts.

Acknowledgements: Many thanks to Chris Christensen and Alexander Barg for
helpful and encouraging conversations. I’d like to explicitly credit Chris Christensen, as well as historian David Kahn, for the original discoveries of the source material.


[C14] C. Christensen, Lester Hill revisited, Cryptologia 38(2014)293-332.

[CJT12] ——, D. Joyner and J. Torres, Lester Hill’s error-detecting codes, Cryptologia 36(2012)88-103.

[H1] L. Hill, A novel checking method for telegraphic sequences, Telegraph and
Telephone Age (October 1, 1926), 456 – 460.

[H2] ——, The role of prime numbers in the checking of telegraphic communications, I, Telegraph and Telephone Age (April 1, 1927), 151 – 154.

[H3] ——, The role of prime numbers in the checking of telegraphic
communications, II, Telegraph and Telephone Age (July, 16, 1927), 323 – 324.

[H4] ——, Lester S. Hill to Lloyd B. Wilson, November 21, 1925. Letter.

[H5] ——, Checking the accuracy of transmittal of telegraphic communications by means of operations in finite algebraic fields, undated and unpublished notes, 40 pages.

[MS77] F. MacWilliams and N. Sloane, The Theory of Error-Correcting Codes, North-Holland, 1977.

[Sh] A. Shokrollahi, On cyclic MDS codes, in Coding Theory and Cryptography: From Enigma and Geheimschreiber to Quantum Theory, (ed. D. Joyner), Springer-Verlag, 2000.

[St98] T. Standage, The Victorian Internet, Walker & Company, 1998.

[T83] T. Thompson, From Error-Correcting Codes Through Sphere Packings to Simple Groups, Mathematical Association of America, 1983.

Remarks on the Hamming [7.4.3] code and Sage

This post simply collects some very well-known facts and observations in one place, since I was having a hard time locating a convenient reference.

Let C be the binary Hamming [7,4,3] code defined by the generator matrix G = \left(\begin{array}{rrrrrrr}  1 & 0 & 0 & 0 & 1 & 1 & 1 \\  0 & 1 & 0 & 0 & 1 & 1 & 0 \\  0 & 0 & 1 & 0 & 1 & 0 & 1 \\  0 & 0 & 0 & 1 & 0 & 1 & 1  \end{array}\right) and check matrix H = \left(\begin{array}{rrrrrrr}  1 & 1 & 1 & 0 & 1 & 0 & 0 \\  1 & 1 & 0 & 1 & 0 & 1 & 0 \\  1 & 0 & 1 & 1 & 0 & 0 & 1  \end{array}\right). In other words, this code is the row space of G and the kernel of H. We can enter these into Sage as follows:

sage: G = matrix(GF(2), [[1,0,0,0,1,1,1],[0,1,0,0,1,1,0],[0,0,1,0,1,0,1],[0,0,0,1,0,1,1]])
sage: G
[1 0 0 0 1 1 1]
[0 1 0 0 1 1 0]
[0 0 1 0 1 0 1]
[0 0 0 1 0 1 1]
sage: H = matrix(GF(2), [[1,1,1,0,1,0,0],[1,1,0,1,0,1,0],[1,0,1,1,0,0,1]])
sage: H
[1 1 1 0 1 0 0]
[1 1 0 1 0 1 0]
[1 0 1 1 0 0 1]
sage: C = LinearCode(G)
sage: C
Linear code of length 7, dimension 4 over Finite Field of size 2
sage: C = LinearCodeFromCheckMatrix(H)
sage: LinearCode(G) == LinearCodeFromCheckMatrix(H)

The generator matrix gives us a one-to-one onto map G:GF(2)^4\to C defined by m \longmapsto m\cdot G. Using this map, the codewords are easy to describe and enumerate:

\begin{tabular}{ccc}  decimal & binary & codeword \\   0 & 0000 & 0000000 \\   1 & 0001 & 0001011 \\   2 & 0010 & 0010101 \\   3 & 0011 & 0011110 \\   4 & 0100 & 0100110 \\   5 & 0101 & 0101101 \\   6 & 0110 & 0110011 \\   7 & 0111 & 0111000 \\   8 & 1000 & 1000111 \\   9 & 1001 & 1001100 \\   10 & 1010 & 1010010 \\   11 & 1011 & 1011001 \\   12 & 1100 & 1100001 \\   13 & 1101 & 1101010 \\   14 & 1110 & 1110100 \\   15 & 1111 & 1111111  \end{tabular}  .

Using this code, we first describe a guessing game you can play with even small children.

Number Guessing game: Pick an integer from 0 to 15. I will ask you 7 yes/no questions. You may lie once.
I will tell you when you lied and what the correct number is.

Question 1: Is n in {0,1,2,3,4,5,6,7}?
(Translated: Is 1st bit of Hamming_code(n) a 0?)
Question 2: Is n in {0,1,2,3,8,9,10,11}?
(Is 2nd bit of Hamming_code(n) a 0?)
Question 3: Is n in {0,1,4,5,8,9,12,13}?
(Is 3rd bit of Hamming_code(n) a 0?)
Question 4: Is n in {0,2,4,6,8,10,12,14} (ie, is n even)?
(Is 4th bit of Hamming_code(n) a 0?)
Question 5: Is n in {0,1,6,7,10,11,12,13}?
(Is 5th bit of Hamming_code(n) a 0?)
Question 6: Is n in {0,2,5,7,9,11,12,14}?
(Is 6th bit of Hamming_code(n) a 0?)
Question 7: Is n in {0,3,4,7,9,10,13,14}?
(Is 7th bit of Hamming_code(n) a 0?)

Record the answers in a vector (0 for yes, 1 for no): v = (v_1,v_2,...,v_7). This must be a codeword (no lies) or differ from a codeword by exactly one bit (1 lie). In either case, you can find n by decoding this vector.

We discuss a few decoding algorithms next.

Venn diagram decoding:

We use a simple Venn diagram to describe a decoding method.

sage: t = var('t')
sage: circle1 = parametric_plot([10*cos(t)-5,10*sin(t)+5], (t,0,2*pi))
sage: circle2 = parametric_plot([10*cos(t)+5,10*sin(t)+5], (t,0,2*pi))
sage: circle3 = parametric_plot([10*cos(t),10*sin(t)-5], (t,0,2*pi))
sage: text1 = text("$1$", (0,0))  
sage: text2 = text("$2$", (-6,-2)) 
sage: text3 = text("$3$", (0,7))
sage: text4 = text("$4$", (6,-2)) 
sage: text5 = text("$5$", (-9,9)) 
sage: text6 = text("$6$", (9,9)) 
sage: text7 = text("$7$", (0,-9))  
sage: textA = text("$A$", (-13,13)) 
sage: textB = text("$B$", (13,13)) 
sage: textC = text("$C$", (0,-17)) 
sage: text_all = text1+text2+text3+text4+text5+text6+text7+textA+textB+textC
sage: show(circle1+circle2+circle3+text_all,axes=false)

This gives us the following diagram:

Decoding algorithm:
Suppose you receive v = ( v_1, v_2, v_3, v_4, v_5, v_6, v_7).
Assume at most one error is made.
Decoding process:

  1. Place v_i in region i of the Venn diagram.
  2. For each of the circles A, B, C, determine if the sum of the bits in four regions add up to 0 or to 1. If they add to 1, say that that circle has a “parity failure”.
  3. The error region is determined form the following table.

    \begin{tabular}{cc}  parity failure region(s) & error position \\   none & none \\   A, B, and C & 1 \\   B, C & 4 \\   A, C & 2 \\   A, B & 3 \\   A & 5 \\   B & 6 \\   C & 7  \end{tabular}

For example, suppose v = (1,1,1,1,1,0,1). The filled in diagram looks like

This only fails in circle B, so the table says (correctly) that the error is in the 6th bit. The decoded codeword is c = v+e_6 = (1,1,1,1,1,1,1).

Next, we discuss a decoding method based on the Tanner graph.

Tanner graph for hamming 7,4,3 code

The above Venn diagram corresponds to a bipartite graph, where the left “bit vertices” (1,2,3,4,5,6,7) correspond to the coordinates in the codeword and the right “check vertices” (8,9,10) correspond to the parity check equations as defined by the check matrix. This graph corresponds to the above Venn diagram, where the check vertices 8, 9, 10 were represented by circles A, B, C:

sage: Gamma = Graph({8:[1,2,3,5], 9:[1,2,4,6], 10:[1,3,4,7]})
sage: B = BipartiteGraph(Gamma)
sage: B.show()
sage: B.left
set([1, 2, 3, 4, 5, 6, 7])
sage: B.right
set([8, 9, 10])
sage: B.show()

This gives us the graph in the following picture:

Decoding algorithm:
Suppose you receive v = ( v_1, v_2, v_3, v_4, v_5, v_6, v_7).
Assume at most one error is made.
Decoding process:

  1. Place v_i at the vertex i on the left side of the bipartite graph.
  2. For each of the check vertices 8,9,10 on the right side of the graph, determine of the if the sum of the bits in the four left-hand vertices connected to it add up to 0 or to 1. If they add to 1, we say that that check vertex has a “parity failure”.
  3. Those check vertices which do not fail are connected to bit vertices which we assume are correct. The remaining bit vertices
    connected to check vertices which fail are to be determined (if possible) by solving the corresponding check equations.

    check vertex 8: v_2+v_3+v_4+v_5 = 0

    check vertex 9: v_1+v_3+v_4+v_6 = 0

    check vertex 10: v_1+v_2+v_4+v_7 = 0

Warning: This method is not guaranteed to succeed in general. However, it does work very efficiently when the check matrix H is “sparse” and the number of 1’s in each row and column is “small.”

For example, suppose v = (1,1,1,1,1,0,1). The check vertex 9 fails its parity check, but vertex 8 and 10 do not. Therefore, only bit vertex 6 is unknown, since vertex 6 is the only one not connected to 8 and 10. This tells us that the decoding codeword is c = (1,1,1,1,1,v_6,1), for some unknown v_6. We solve for this unknown using the check vertex equation v_1+v_3+v_4+v_6 = 0, giving us v_6 = 1. The decoded codeword is c = (1,1,1,1,1,1,1).

This last example was pretty simple, so let’s try v=(0,1,1,1,1,1,1). In this case, we know the vertices 9 and 10 fail, so c = (v_1,1,1,1,1,v_6,v_7). We solve using

v_1+1+1+v_6 = 0

v_1+1+1+v_7 = 0

This simply tells us v_1=v_6=v_7. By majority vote, we get c = (1,1,1,1,1,1,1).

Digital steganography and Sage

This post shall explain how the F5 stegosystem (in a simple case) can be implemented in Sage. I thank Carlos Munuera for teaching me these ideas and for many helpful conversations on this matter. I also thank Kyle Tucker-Davis [1] for many interesting conversations on this topic.

Steganography, meaning “covered writing,” is the science of secret communication [4]. The medium used to carry the information is called the “cover” or “stego-cover” (depending on the context – these are not synonymous terms). The term “digital steganography” refers to secret communication where the cover is a digital media file.

One of the most common systems of digital steganography is the Least Significant Bit (LSB) system. In this system, we assume the “cover” image is represented as a finite sequence of binary vectors of a given length. In other words, a greyscale image is regarded as an array of pixels, where each pixel is represented by a binary vector of a certain fixed length. Each such vector represents a pixel’s brightness in the cover image. The encoder embeds (at most) one bit of information in the least significant bit of eaach vector. Care must be taken with this system to ensure that the changes made do not betray the stego-cover, while still maximizing the information hidden.

From a short note of Crandell [3] in 1998, it was realized that error-correcting codes can give rise to “stego-schemes”, ie, methods by which a message can be hidden in a digital file efficiently.

Idea in a nutshell: If C is the r-th binary Hamming code (so, n = 2^r-1 is its length, k = 2^r - r - 1 is its dimension, and d = 3 is its minimum distance), G is a generating matrix, H is a check matrix, and C^\perp is the dual code of C, then we take an element v \in GF(2)^n to be a cover, and the message m we embed is an element of GF(2)^r. Once we find a vector z of lowest weight such that H(v+z)=m, we call v+z the stegocover. The stegocover looks a lot like the original cover and “contains” the message m. This will be explained in more detai below.
(This particular scheme is called the F5 stegosystem, and is due to Westfeld.)

Quick background on error-correcting, linear, block codes [5]

A linear error-correcting block code is a finite dimensional vector space over a finite field with a fixed basis. We assume the finite field is the binary field GF(2).

We shall typically think of a such a code as a subspace C of GF(2)^n with a fixed basis, where n>0 is an integer called the length of the code. Moreover, the basis for the ambient space GF(2)^n will be the standard basis,

e_1=(1,0,\dots, 0),  e_2=(0,1,0,\dots, 0),  \dots,  e_n=(0,\dots, 0,1).

There are two common ways to specify a linear code C.

  1. You can give C as a vector subspace of GF(2)^n by specifying a set of basis vectors for C. This set of basis vectors is, by convention, placed as
    the rows of a matrix called a generator matrix G of C. Obviously, the order in which the rows are presented does not
    affect the code itself.

    If g_1, \dots, g_k are the rows of G then

    C= \{c=m_1g_1+\dots +m_kg_k\ |\ {\rm some}\ m_i\in GF(2)\},

    is the set of linear combinations of the row vectors g_i. The vector of coefficients, m=(m_1,\dots, m_k) represents the information you want to encode and transmit.

    In other words, encoding of a message can be defined via the generator matrix:

    \begin{array}{ccc}  m = (m_1,\dots, m_k) & \to & c=m_1g_1+\dots +m_kg_k = m^t\cdot G,\\  GF(2)^k & \to & C.  \end{array}

  2. You can give C as a vector subspace of GF(2)^n by specifying a matrix H for which C is the kernel of H, C={\rm ker}(C). This matrix is called a check matrix of C. Again, the order in which the rows are presented does not affect the code itself.

Note that if G is a full rank k\times n matrix then a full rank check matrix H must be a (n-k) \times n matrix.

These two ways of defining a code are not unrelated.

If G=(I_k\ \vert\ A) is the generating matrix for C then H=(-A^t\ \vert\ I_{n-k}) is a parity check matrix.

Let r>0 be an integer and let H be a r\times (2^r-1) matrix whose columns are all the distinct non-zero vectors of GF(2)^r. Then, the code having H as its check matrix is called a binary Hamming code, denoted Ham(r,2).

Let r=3, and let

H=  \begin{pmatrix}  0 & 1 & 1 & 1 & 1 & 0 & 0 \\  1 & 0 & 1 & 1 & 0 & 1 & 0 \\     1 & 1 & 0 & 1 & 0 & 0 & 1   \end{pmatrix}.

In this form, namely when the columns of H are arranged in standard form such that the rightmost k\times k entries is the identity matrix I_k, the generator matrix G can be quickly found to be

G=  \begin{pmatrix}  1 & 0 & 0 & 0 & 0 & 1 & 1 \\  0 & 1 & 0 & 0 & 1 & 0 & 1 \\     0 & 0 & 1 & 0 & 1 & 1 & 0 \\  0 & 0 & 0 & 1 & 1 & 1 & 1   \end{pmatrix}.

A coset of GF(2)^n/C is a subset of GF(2)^n of the form C+v for some v\in GF(2)^n. Let S be a coset of C. Then,
a coset leader of S is an element of S having smallest Hamming weight.

The coset leaders of a Hamming code are those vectors of wt \leq 1.

Steganographic systems from error-correcting codes

This section basically describes Crandell’s idea [3] in a more formalized language.

Following Munuera’s notation in [6], a steganographic system S can be formally defined as

S= \{\mathcal{C}, \mathcal{M}, \mathcal{K}, emb, rec \},

  1. \mathcal{C} is a set of all possible covers

  2. \mathcal{M} is a set of all possible messages

  3. \mathcal{K} is a set of all possible keys

  4. emb:\mathcal{C}\times\mathcal{M}\times\mathcal{K}\to\mathcal{C} is an embedding function

  5. rec:\mathcal{C}\times\mathcal{K}\to\mathcal{M} is a recovery function

and these all satisfy


for all m\in\mathcal{M}, c\in\mathcal{C}, k\in\mathcal{K}. We will assume that a fixed key k\in\mathcal{K} is used, and therefore,
the dependence on an element in the keyspace \mathcal{K} can be ignored. The original cover c is called the plain cover, m is called the message, and emb(c,m,k)=emb(c,m) is called the stegocover. Let \mathcal{C} be GF(2)^n, representing both plain covers and stegocovers. Also, let \mathcal{M} be GF(2)^k, where k is a fixed integer such that 0 \leq k \leq n.

Sage examples

How do we compute the emb map?

We need the following Sage function.

def hamming_code_coset_leader(C, y):
      Finds the coset leader of a binary Hamming code C.
      F = C.base_ring()
      n = C.length()
      k = C.dimension()
      r = n-k
      V0 = F^r
      if not(y in V0):
          RaiseError, "Input vector is not a syndrome."
      H = C.check_mat()
      colsH = H.columns()
      i = colsH.index(y)
      V = F^n
      v = V(0)
      v[i] = F(1)
      return v

Let V = GF(2)^{63} and consider the cover v = ([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,0,0,1,1,1, 1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1). Regarded as a 7\times 9 matrix,

V = GF(2)^(63)
rhino = V([1,1,1,1,1,1,1,1,1,
A = matrix(GF(2),7,rhino.list())

this looks like an elephant or a rhino:

Now we embed the message m = (1,0,1,0,1,0). First we compute the stegocover:

C = HammingCode(6,GF(2))
H = C.check_mat()
V0 = GF(2)^6
m = V0([1,0,1,0,1,0])
z = hamming_code_coset_leader(C, m)
stegocover = rhino+z
A = matrix(GF(2),7,stegocover.list())

It looks like another rhino/elephant:

Note only one bit is changed since the Hamming weight of z is at most 1. To recover the message m, just multiply the vector “stegocover” by H.

That’s how you can use Sage to understand the F5 stegosystem, at least in a very simple case.

[1] Kyle Tucker-Davis, “An analysis of the F5 steganographic system”, Honors Project 2010-2011

[2] J. Bierbrauer and J. Fridrich. “Constructing good covering codes for applications in Steganography}. 2006.

[3] Crandall, Ron. “Some Notes on Steganography”. Posted on a Steganography Mailing List, 1998.

[4] Wayner, Peter. Disappearing Cryptography. Morgan Kauffman Publishers. 2009.

[5] Hill, Raymond. A First Course in Coding Theory. Oxford University Press. 1997.

[6] Munuera, Carlos. “Steganography from a Coding Theory Point of View”. 2010.

[7] Zhang, Weiming and S. Li. “Steganographic Codes- a New Problem of Coding Theory”. preprint, 2005.