Sports ranking methods, 3

This is the third of a series of expository posts on matrix-theoretic sports ranking methods. This post discusses the random walker ranking.

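We follow the presentation in the paper by Govan and Meyer (Ranking National Football League teams using Google’s PageRank). The table of “score differentials”, based on the table in a previous post, is below. The entry in row x and column y is the total number of runs by which team x outscored team y, or 0 if team x was outscored: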

\begin{tabular}{c|cccccc} \verb+x\y+ & Army & Bucknell & Holy Cross & Lafayette & Lehigh & Navy \\ \hline Army & 0 & 0 & 1 & 0 & 0 & 0 \\ Bucknell & 2 & 0 & 0 & 2 & 3 & 0 \\ Holy Cross & 0 & 3 & 0 & 4 & 14 & 0 \\ Lafayette & 10 & 0 & 0 & 0 & 0 & 0 \\ Lehigh & 2 & 0 & 0 & 11 & 0 & 0 \\ Navy & 11 & 14 & 8 & 22 & 6 & 0 \\ \end{tabular}
This leads to the following matrix:

M_0=\left(\begin{array}{cccccc} 0 & 0 & 1 & 0 & 0 & 0 \\ 2 & 0 & 0 & 2 & 3 & 0 \\ 0 & 3 & 0 & 4 & 14 & 0 \\ 10 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 11 & 0 & 0 \\ 11 & 14 & 8 & 22 & 6 & 0 \\ \end{array}\right) .

The edge-weighted score-differential graph associated to M_0 (regarded as a weighted adjacency matrix) is in the figure below.


This matrix M_0 must be normalized to create a (row) stochastic matrix, by dividing each row by its row sum:

M = \left(\begin{array}{cccccc} 0 & 0 & 1 & 0 & 0 & 0 \\ {2}/{7} & 0 & 0 & {2}/{7} & {3}/{7} & 0 \\ 0 & {1}/{7} & 0 & {4}/{21} & {2}/{3} & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ {2}/{13} & 0 & 0 & {11}/{13} & 0 & 0 \\ {11}/{61} & {14}/{61} & {8}/{61} & {22}/{61} & {6}/{61} & 0 \\ \end{array}\right) .

Next, to ensure irreducibility, we replace M by A=(M+J)/2, where J is the 6\times 6 doubly stochastic matrix with every entry equal to 1/6. Every entry of A is then positive:

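A=\left(\begin{array}{cccccc} {1}/{12} & {1}/{12} & {7}/{12} & {1}/{12} & {1}/{12} & {1}/{12} \\ {19}/{84} & {1}/{12} & {1}/{12} & {19}/{84} & {25}/{84} & {1}/{12} \\ {1}/{12} & {13}/{84} & {1}/{12} & {5}/{28} & {5}/{12} & {1}/{12} \\ {7}/{12} & {1}/{12} & {1}/{12} & {1}/{12} & {1}/{12} & {1}/{12} \\ {25}/{156} & {1}/{12} & {1}/{12} & {79}/{156} & {1}/{12} & {1}/{12} \\ {127}/{732} & {145}/{732} & {109}/{732} & {193}/{732} & {97}/{732} & {1}/{12} \end{array}\right).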
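We can check the two normalization steps above with exact rational arithmetic. Below is a minimal Python sketch. It is not code from Govan and Meyer, and the helper names `row_normalize` and `damp` are hypothetical. It divides each row of M_0 by its row sum and then averages the result with J:

```python
from fractions import Fraction

# Score-differential matrix M_0 (rows/columns: Army, Bucknell, Holy Cross,
# Lafayette, Lehigh, Navy), copied from the table above.
M0 = [
    [0, 0, 1, 0, 0, 0],
    [2, 0, 0, 2, 3, 0],
    [0, 3, 0, 4, 14, 0],
    [10, 0, 0, 0, 0, 0],
    [2, 0, 0, 11, 0, 0],
    [11, 14, 8, 22, 6, 0],
]


def row_normalize(mat):
    """Divide each row by its row sum, giving a row-stochastic matrix.

    Assumes every row sum is nonzero (true here: every team won at least once).
    """
    return [[Fraction(x, sum(row)) for x in row] for row in mat]


def damp(stoch):
    """Return A = (M + J)/2, where J is the n x n matrix with every entry 1/n."""
    n = len(stoch)
    return [[(x + Fraction(1, n)) / 2 for x in row] for row in stoch]


M = row_normalize(M0)
A = damp(M)

if __name__ == "__main__":
    for row in A:
        print(" ".join(str(x) for x in row))
```

Using `Fraction` instead of floats means the printed entries can be compared directly with the fractions in M and A.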

Let

{\bf v}_0 = \left( \frac{1}{6} , \frac{1}{6} , \frac{1}{6} , \frac{1}{6} , \frac{1}{6} , \frac{1}{6}\right).

The ranking determined by the random walker method is the reverse of the ordering given by the left eigenvector of A associated to its largest eigenvalue \lambda_{max}=1. By reverse, I mean that this vector orders the teams from worst to best, rather than from best to worst as in the previous ranking methods. Since A is row stochastic with all entries positive, this eigenvector is unique up to scaling. Normalized so that its entries sum to 1, it is the limit

{\bf r}^*=\lim_{n\to \infty}{\bf v}_0A^n.

This is approximately

{\bf r}^* \cong \left(0.2237\dots ,\,0.1072\dots ,\,0.2006\dots ,\,0.2077\dots ,\,0.1772\dots ,\,0.0833\dots \right).

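A larger entry of {\bf r}^* indicates a weaker team. Reversing the order of its entries therefore gives the ranking:

Army < Lafayette < Holy Cross < Lehigh < Bucknell < Navy.

This ranking fails to predict 3 of the 15 pairings: Army beat Holy Cross, and Holy Cross beat both Bucknell and Lehigh. That is a prediction failure rate of 20\%.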
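The limit defining {\bf r}^* can be approximated numerically by power iteration. Below is a minimal Python sketch using plain floats and no libraries; the names `walker_matrix` and `limit_vector` are hypothetical. It does four things:

- rebuilds A from M_0;
- starts from {\bf v}_0;
- applies {\bf v}\mapsto {\bf v}A a fixed number of times;
- lists the teams by decreasing entry of the result, i.e. from worst to best.

The choice of 200 steps is an assumption. Because every entry of A is positive, the iteration converges geometrically.

```python
# Teams in the order used for the rows and columns of M_0.
TEAMS = ["Army", "Bucknell", "Holy Cross", "Lafayette", "Lehigh", "Navy"]

# Score-differential matrix M_0 from the table above.
M0 = [
    [0, 0, 1, 0, 0, 0],
    [2, 0, 0, 2, 3, 0],
    [0, 3, 0, 4, 14, 0],
    [10, 0, 0, 0, 0, 0],
    [2, 0, 0, 11, 0, 0],
    [11, 14, 8, 22, 6, 0],
]


def walker_matrix(mat):
    """Return A = (M + J)/2, where M is mat with each row divided by its
    row sum and J is the n x n matrix with every entry 1/n."""
    n = len(mat)
    return [[(x / sum(row) + 1 / n) / 2 for x in row] for row in mat]


def limit_vector(A, steps=200):
    """Approximate r* = lim v_0 A^k (v_0 uniform) by repeated
    left multiplication v <- v A."""
    n = len(A)
    v = [1 / n] * n
    for _ in range(steps):
        v = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]
    return v


if __name__ == "__main__":
    r = limit_vector(walker_matrix(M0))
    # A larger entry of r* means a weaker team, so sort in decreasing order.
    worst_to_best = [team for _, team in sorted(zip(r, TEAMS), reverse=True)]
    print([round(x, 4) for x in r])
    print(" < ".join(worst_to_best))
```

Since {\bf v}_0 sums to 1 and A is row stochastic, every iterate also sums to 1. No extra normalization step is needed.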

Sports ranking methods, 2

This is the second of a series of expository posts on matrix-theoretic sports ranking methods. This post discusses Keener’s method (see J.P. Keener, The Perron-Frobenius theorem and the ranking of football teams, SIAM Review 35 (1993), 80-93, for details).

See the first post in the series for a discussion of the data we’re using to explain this method. We recall the table of results:

X\Y Army Bucknell Holy Cross Lafayette Lehigh Navy
Army x 14-16 14-13 14-24 10-12 8-19
Bucknell 16-14 x 27-30 18-16 23-20 10-22
Holy Cross 13-14 30-27 x 19-15 17-13 9-16
Lafayette 24-14 16-18 15-19 x 12-23 17-39
Lehigh 12-10 20-23 13-17 23-12 x 12-18
Navy 19-8 22-10 16-9 39-17 18-12 x

Win-loss digraph of Patriot League men’s baseball from 2015

Suppose T teams play each other. Let A=(a_{ij})_{1\leq i,j\leq T} be a non-negative square matrix determined by the results of their games, called the preference matrix. In his 1993 paper, Keener defined the score of the ith team to be given by

s_i = \frac{1}{n_i}\sum_{j=1}^T a_{ij}r_j,

where n_i denotes the total number of games played by team i and {\bf r}=(r_1,r_2,\dots ,r_T) is the rating vector (where r_i\geq 0 denotes the rating of team i).

One possible preference matrix is the matrix A of total scores obtained from the pre-tournament table above:

A = \left(\begin{array}{rrrrrr} 0 & 14 & 14 & 14 & 10 & 8 \\ 16 & 0 & 27 & 18 & 23 & 28 \\ 13 & 30 & 0 & 19 & 27 & 43 \\ 24 & 16 & 15 & 0 & 12 & 17 \\ 12 & 20 & 43 & 23 & 0 & 12 \\ 19 & 42 & 30 & 39 & 18 & 0 \end{array}\right),

(In this case, n_i=4 for every team. The factor 1/n_i therefore rescales every score by the same amount, and we ignore it.)

In his paper, Keener proposed a ranking method where the ranking vector {\bf r} is proportional to its score. The score is expressed as a matrix product A{\bf r}, where A is a square preference matrix. In other words, there is a constant \rho>0 such that s_i=\rho r_i, for each i. This is the same as saying A {\bf r} = \rho {\bf r}.

Since every off-diagonal entry of A is positive, A is irreducible. The Perron-Frobenius theorem then implies that A has an eigenvector {\bf r}=(r_1,r_2,r_3,r_4,r_5,r_6) with positive entries, associated to the largest eigenvalue \lambda_{max} of A, which has (geometric) multiplicity 1. Indeed, A has maximum eigenvalue \lambda_{max}= 110.0385..., of multiplicity 1, with eigenvector

{\bf r}=(1, 1.8313\dots , 2.1548\dots , 1.3177\dots , 1.8015\dots , 2.2208\dots ).

Therefore, according to Keener’s method, the teams are ranked as follows:

Army < Lafayette < Lehigh < Bucknell < Holy Cross < Navy.

This gives a prediction failure rate of just 6.7\%.
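The Perron eigenpair can be approximated by power iteration: multiply by A repeatedly and rescale. Every off-diagonal entry of A is positive, so A^2 has all entries positive and power iteration converges to the positive eigenvector. Below is a minimal Python sketch with plain floats and a fixed number of steps; the step count and the name `perron_pair` are assumptions, not from Keener’s paper. Like the eigenvector above, it rescales so that r_1=1:

```python
TEAMS = ["Army", "Bucknell", "Holy Cross", "Lafayette", "Lehigh", "Navy"]

# Preference matrix A of total scores (row team against column team), from above.
A = [
    [0, 14, 14, 14, 10, 8],
    [16, 0, 27, 18, 23, 28],
    [13, 30, 0, 19, 27, 43],
    [24, 16, 15, 0, 12, 17],
    [12, 20, 43, 23, 0, 12],
    [19, 42, 30, 39, 18, 0],
]


def perron_pair(A, steps=500):
    """Power iteration: approximate the largest eigenvalue of A and a
    positive eigenvector scaled so that its first entry is 1."""
    n = len(A)
    r = [1.0] * n
    lam = 0.0
    for _ in range(steps):
        s = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        lam = s[0] / r[0]          # r[0] == 1, so this is (A r)_1 / r_1
        r = [x / s[0] for x in s]  # rescale so that r[0] == 1
    return lam, r


if __name__ == "__main__":
    lam, r = perron_pair(A)
    print(f"lambda_max ~ {lam:.4f}")
    print(" < ".join(team for _, team in sorted(zip(r, TEAMS))))
```

Sorting the teams by increasing entry of {\bf r} lists them from worst to best.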

Memories of TS Michael, by Thomas Quint

TS Michael passed away on November 22, 2016, from cancer. I will miss him as a colleague and a kind, wise soul. Tom Quint has kindly allowed me to post these reminiscences that he wrote up.


Well, I guess I could start with the reason TS and I met in the first place. I was a postdoc at USNA in about 1991 and pretty impressed with myself. So when USNA offered to continue my postdoc for two more years (rather than give me a tenure track position), I turned it down. Smartest move I ever made, because TS got the position and so we got to know each other.

We started working with each other one day when we both attended a talk on “sphere of influence graphs”. I found the subject moderately interesting, but he came into my office all excited, and I couldn’t get rid of him: he wouldn’t leave until we had developed a few research ideas.

Interestingly, his position at USNA turned into a tenure track, while mine didn’t. It wasn’t until 1996 that I got my position at U of Nevada.

Work sessions with him always followed the same pattern. As you may or may not know, T.S. a) refused to fly in airplanes, and b) didn’t drive. Living across the country from each other, the only way we could work together face-to-face was: once each summer I would fly out to the east coast to visit my parents, borrow one of their cars for a week, and drive down to Annapolis. First thing we’d do is go to Whole Foods, where he would load up my car with food and other supplies, enough to last at least a few months. My reward was that he always bought me the biggest package of nigiri sushi we could find — not cheap at Whole Foods!

It was fun, even though I had to suffer through eight million stories about the USNA Water Polo Team.

Oh yes, and he used to encourage me to sneak into one of the USNA gyms to work out. We figured that no one would notice if I wore my Nevada sweats (our color is also dark blue, and the pants also had a big “N” on them). It worked.

Truth be told, TS didn’t really have a family of his own, so I think he considered the mids as his family. He cared deeply about them (with bonus points if you were a math major or a water polo player :-).

One more TS anecdote, complete with photo. Specifically, TS was especially thrilled to find out that we had named our firstborn son Theodore Saul Quint. Naturally, TS took to calling him “Little TS”. At any rate, the photo below is of “Big TS” holding “Little TS”, some time in the Fall of 2000.


TS Michael in 2000.

Simple unsolved math problem, 7

Everyone’s heard of the number \pi = 3.141592…, right?


Robert Couse-Baker / CC BY 2.0 / Flickr: 29233640@N07

And you probably know that \pi is not a rational number (i.e., a quotient of two integers, like 7/3). Unlike the decimal expansion of a rational number, which is eventually periodic, the digits of \pi seem random:

3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482…

But are they really? No one really knows. There’s a paper that explores the statistics of these digits using the first 22.4 trillion digits of \pi. For each k, does every finite sequence of k digits (say, the 4-digit sequence 2016) occur just as often as every other sequence of the same length (say, 1492)? When the answer is yes, the number is called ‘normal’ (in base 10). That is, a normal number is a real number whose infinite sequence of digits is uniformly distributed in the following sense:

- each digit has natural density 1/10;
- each of the 100 possible pairs of digits has density 1/100;
- each of the 1000 possible triplets of digits has density 1/1000;
- and so on.
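To make the idea of density concrete, here is a small Python sketch. It computes the relative frequency of each overlapping block of k consecutive digits in a finite string of digits. It uses only the first 100 decimal digits of \pi quoted above. No finite computation like this can prove or disprove normality; it only shows what is being counted. The name `block_frequencies` is hypothetical.

```python
from collections import Counter

# The first 100 decimal digits of pi, copied from the expansion above.
PI_DIGITS = (
    "1415926535897932384626433832795028841971"
    "6939937510582097494459230781640628620899"
    "86280348253421170679"
)


def block_frequencies(digits, k):
    """Relative frequency of each overlapping block of k consecutive digits."""
    windows = [digits[i:i + k] for i in range(len(digits) - k + 1)]
    counts = Counter(windows)
    return {block: c / len(windows) for block, c in counts.items()}


if __name__ == "__main__":
    # For a normal number these frequencies tend to 1/10 as more digits are used.
    for digit, freq in sorted(block_frequencies(PI_DIGITS, 1).items()):
        print(digit, freq)
```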

The following simple problem is unsolved:

Conjecture: \pi is normal.

Sports ranking methods, 1

This is the first of a series of expository posts on matrix-theoretic sports ranking methods. This post, which owes much to discussions with TS Michael, discusses Massey’s method.

Massey’s method, currently in use by the NCAA (for football, where teams typically play each other once), was developed by Kenneth P. Massey while an undergraduate math major in the late 1990s. We present a possible variation of Massey’s method adapted to baseball, where teams typically play each other multiple times.

There are exactly 15 pairings among the six teams (since \binom{6}{2}=15). These pairs are sorted lexicographically, as follows:

(1,2),(1,3),(1,4), …, (5,6).

In other words, sorted as

Army vs Bucknell, Army vs Holy Cross, Army vs Lafayette, …, Lehigh vs Navy.

The cumulative results of the 2016 regular season are given in the table below. We count only the games played in the Patriot League, not including the Patriot League post-season tournament (see, e.g., the Patriot League site for details). Because the teams play multiple games against each other, each entry gives total runs, with the total for the team in the row listed first. In other words, “a-b” in row i and column j means that team i scored a total of a runs against team j and allowed a total of b runs against team j. Here, we order the six teams as above: team 1 is Army (USMA at West Point), team 2 is Bucknell, and so on. For instance, if X played Y and the scores were 10-0, 0-1, 0-1, 0-1, 0-1, 0-1, then the table would read 10-5 in row X and column Y.

X\Y Army Bucknell Holy Cross Lafayette Lehigh Navy
Army x 14-16 14-13 14-24 10-12 8-19
Bucknell 16-14 x 27-30 18-16 23-20 10-22
Holy Cross 13-14 30-27 x 19-15 17-13 9-16
Lafayette 24-14 16-18 15-19 x 12-23 17-39
Lehigh 12-10 20-23 13-17 23-12 x 12-18
Navy 19-8 22-10 16-9 39-17 18-12 x

Win-loss digraph of Patriot League men’s baseball from 2015

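In this ordering, we record their (sum total) win-loss records in a 15\times 6 matrix M. The row for the pairing (i,j) has a 1 in the column of the winning team, a -1 in the column of the losing team, and 0 elsewhere: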

M = \left(\begin{array}{cccccc} -1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & -1 & 0 & 0 & 1 \\ 0 & 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{array}\right).

We also record the corresponding margins of victory (the winner’s total runs minus the loser’s total runs) in a column vector:

{\bf b}= \left(\begin{array}{c} 2 \\ 1 \\ 10 \\ 2 \\ 11 \\ 3 \\ 2 \\ 3 \\ 14 \\ 4 \\ 14 \\ 10 \\ 11 \\ 22 \\ 6 \\ \end{array}\right).

The Massey ranking of these teams is a vector {\bf r} which best fits the equation

M{\bf r}={\bf b}.

Since the corresponding linear system is over-determined, we can look for a best (in the least-squares sense) approximate solution using the orthogonal projection formula

P_V = B(B^tB)^{-1}B^t,

valid for matrices B with linearly independent columns. Unfortunately, in this case B=M does not have linearly independent columns. Each row of M contains exactly one 1 and one -1, so the columns of M sum to the zero vector. Hence the formula doesn’t apply.

Massey’s clever idea is to solve

M^tM{\bf r}=M^t{\bf b}

by row-reduction and determine the rankings from the parameterized form of the solution. To this end, we compute

M^tM= \left(\begin{array}{rrrrrr} 5 & -1 & -1 & -1 & -1 & -1 \\ -1 & 5 & -1 & -1 & -1 & -1 \\ -1 & -1 & 5 & -1 & -1 & -1 \\ -1 & -1 & -1 & 5 & -1 & -1 \\ -1 & -1 & -1 & -1 & 5 & -1 \\ -1 & -1 & -1 & -1 & -1 & 5 \end{array}\right)

and

M^t{\bf b}= \left(\begin{array}{r} -24 \\ -10 \\ 10 \\ -29 \\ -10 \\ 63 \\ \end{array}\right).
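We can double-check M^tM and M^t{\bf b} mechanically. Below is a Python sketch using plain lists; the helper names `massey_system` and `normal_equations` are hypothetical. It rebuilds M and {\bf b} from the winner and margin of each pairing, transcribed from M and {\bf b} above, and uses the same lexicographic order of pairs:

```python
from itertools import combinations

# For each pairing (i, j) with i < j (1-based: Army = 1, ..., Navy = 6):
# (winning team, margin of victory), transcribed from M and b above.
RESULTS = {
    (1, 2): (2, 2), (1, 3): (1, 1), (1, 4): (4, 10), (1, 5): (5, 2), (1, 6): (6, 11),
    (2, 3): (3, 3), (2, 4): (2, 2), (2, 5): (2, 3), (2, 6): (6, 14),
    (3, 4): (3, 4), (3, 5): (3, 14), (3, 6): (6, 10),
    (4, 5): (5, 11), (4, 6): (6, 22),
    (5, 6): (6, 6),
}


def massey_system(n, results):
    """Build M (one row per pairing: +1 for the winner, -1 for the loser) and b."""
    M, b = [], []
    for i, j in combinations(range(1, n + 1), 2):  # lexicographic order of pairs
        winner, margin = results[(i, j)]
        loser = j if winner == i else i
        row = [0] * n
        row[winner - 1] = 1
        row[loser - 1] = -1
        M.append(row)
        b.append(margin)
    return M, b


def normal_equations(M, b):
    """Return the matrix M^t M and the vector M^t b."""
    n = len(M[0])
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]
    Mtb = [sum(row[i] * bk for row, bk in zip(M, b)) for i in range(n)]
    return MtM, Mtb


if __name__ == "__main__":
    M, b = massey_system(6, RESULTS)
    MtM, Mtb = normal_equations(M, b)
    for row in MtM:
        print(row)
    print(Mtb)
```

Every pair of teams appears exactly once, so the diagonal of M^tM counts each team’s 5 opponents. Each off-diagonal entry is -1.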

Then we compute the rref of

A= (M^tM,M^t{\bf b}) = \left(\begin{array}{rrrrrr|r} 5 & -1 & -1 & -1 & -1 & -1 & -24 \\ -1 & 5 & -1 & -1 & -1 & -1 & -10 \\ -1 & -1 & 5 & -1 & -1 & -1 & 10 \\ -1 & -1 & -1 & 5 & -1 & -1 & -29 \\ -1 & -1 & -1 & -1 & 5 & -1 & -10 \\ -1 & -1 & -1 & -1 & -1 & 5 & 63 \end{array}\right),

which is

rref(M^tM,M^t{\bf b})= \left(\begin{array}{rrrrrr|r} 1 & 0 & 0 & 0 & 0 & -1 & -\frac{87}{6} \\ 0 & 1 & 0 & 0 & 0 & -1 & -\frac{73}{6} \\ 0 & 0 & 1 & 0 & 0 & -1 & -\frac{53}{6} \\ 0 & 0 & 0 & 1 & 0 & -1 & -\frac{92}{6} \\ 0 & 0 & 0 & 0 & 1 & -1 & -\frac{73}{6} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).
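Since M^tM has rank 5, the solution has one free parameter. Below is a minimal exact sketch in Python. It assumes we fix that parameter by setting r_6=0; any other choice shifts all ratings by the same constant and leaves the ranking unchanged.

The rows of M^tM and the entries of M^t{\bf b} both sum to zero. The last equation is therefore a consequence of the others, so we drop it together with the last unknown. We then solve the remaining 5\times 5 system by Gauss-Jordan elimination over the rationals. The names `solve` and `massey_ratings` are hypothetical.

```python
from fractions import Fraction

# Normal equations M^t M r = M^t b from above.
MtM = [[5 if i == j else -1 for j in range(6)] for i in range(6)]
Mtb = [-24, -10, 10, -29, -10, 63]


def solve(A, y):
    """Solve a nonsingular square system A x = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[Fraction(a) for a in row] + [Fraction(v)] for row, v in zip(A, y)]
    for col in range(n):
        # Swap a row with a nonzero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def massey_ratings(MtM, Mtb):
    """Ratings solving M^t M r = M^t b, normalized by r_n = 0 (an assumed convention)."""
    reduced = [row[:-1] for row in MtM[:-1]]  # drop the last equation and last unknown
    return solve(reduced, Mtb[:-1]) + [Fraction(0)]


if __name__ == "__main__":
    for team, rating in zip(["Army", "Bucknell", "Holy Cross", "Lafayette", "Lehigh", "Navy"],
                            massey_ratings(MtM, Mtb)):
        print(team, rating)
```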

If {\bf r}=(r_1,r_2,r_3,r_4,r_5,r_6) denotes the rankings of Army, Bucknell, Holy Cross, Lafayette, Lehigh, Navy, in that order, then

r_1=r_6-\frac{87}{6},\ \ r_2=r_6-\frac{73}{6},\ \ r_3=r_6-\frac{53}{6},\ \ r_4=r_6-\frac{92}{6},\ \ r_5=r_6-\frac{73}{6}.

Therefore

Lafayette < Army < Bucknell = Lehigh < Holy Cross < Navy.

If we use this ranking to predict wins and losses over the season, it fails on three pairings:

- Army vs Holy Cross (Army won);
- Bucknell vs Lehigh (the ranking ties them);
- Lafayette vs Army (Lafayette won).

This gives a prediction failure rate of 3/15=20\%.
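The prediction failure rate used throughout this series can be computed as follows. This Python sketch uses hypothetical names (`WINNERS`, `failure_rate`). A pairing counts as a failure when the winner is not rated strictly higher than the loser, so a tie also counts as a failure. That convention is an assumption, chosen to match the count above. Only the order of the ratings matters, so the example uses the Massey ratings with r_6=0, multiplied by 6.

```python
# Winner of each pairing (i, j), i < j, from the table of cumulative results
# (1-based team numbers: Army = 1, Bucknell = 2, ..., Navy = 6).
WINNERS = {
    (1, 2): 2, (1, 3): 1, (1, 4): 4, (1, 5): 5, (1, 6): 6,
    (2, 3): 3, (2, 4): 2, (2, 5): 2, (2, 6): 6,
    (3, 4): 3, (3, 5): 3, (3, 6): 6,
    (4, 5): 5, (4, 6): 6,
    (5, 6): 6,
}


def failure_rate(ratings, winners):
    """Fraction of pairings not predicted by the ratings (higher rating = stronger).

    A pairing counts as a failure when the winner is not rated strictly higher
    than the loser, so ties count as failures (an assumed convention).
    """
    failures = 0
    for (i, j), w in winners.items():
        loser = j if w == i else i
        if ratings[w - 1] <= ratings[loser - 1]:
            failures += 1
    return failures / len(winners)


if __name__ == "__main__":
    # Massey ratings with r_6 = 0, multiplied by 6 (scaling keeps the order).
    massey = [-87, -73, -53, -92, -73, 0]
    print(f"Massey failure rate: {failure_rate(massey, WINNERS):.1%}")
```

The same function applies to any rating vector in which a larger entry means a stronger team.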

Memories of TS Michael, by Bryan Shader

TS Michael passed away on November 22, 2016, from cancer. I will miss him as a colleague and a kind, wise soul.


TS Michael in December 2015 at the USNA


Bryan Shader has kindly allowed me to post these reminiscences that he wrote up.


Indirect influence
TS indirectly influenced my choice of U. Wisconsin-Madison for graduate school. In my senior year as an undergraduate, Herb Ryser gave a talk at my school. After the talk I was able to meet Ryser and ask for advice on graduate schools. Herb indicated that one of his very good undergraduate students had chosen UW-Madison and really liked the program. I later found out that the person was TS.

About the name
Back in the dark ages, universities still did registration by hand. This meant that for a couple of days before each semester the masses of students would wind their way through a maze of stations in a large gymnasium. For TS’s first 4 years, he would invariably encounter a roadblock because someone had permuted the words in his name (Todd Scott Michael) on one of the forms. After concretely verifying the hatcheck probabilities and fearing that this would cause some difficulties in graduating, he legally changed his name to TS Michael.

Polyominoes & Permanents
I recall many stories about how TS’s undergraduate work on polyominoes affected his life. In particular, he recalled how, once he started working on polyomino tilings, he could no longer shower or swim without visualizing polyomino tilings in the tiles on the wall or floor. We shared an interest in and a passion for permanents (the permanent is a function of a matrix much like the determinant, and it plays a critical role in combinatorics). When working together, we frequently found that neither of us could calculate the determinant of a 3 by 3 matrix correctly, because we were calculating the permanent rather than the determinant.

Presentations and pipe-dreams
TS and I often talked about how best to give a mathematical lecture or a presentation at a conference. Perhaps this is not at all surprising, as our common advisor (Richard Brualdi) is an incredible expositor, as was TS’s undergraduate advisor (Herb Ryser, our mathematical grandfather). TS often mentioned how Herb Ryser scripted every moment of a lecture; he knew each word he would write on the board and exactly where it would be written. TS wasn’t quite so prescriptive, but before any presentation he would go to the actual room a couple of times and run through the talk. This would include answering questions from the “pretend” audience. After being inspired by TS’s talks, I adopted this preparation method.
TS and I also fantasized about our talks ending with the audience lifting us up on their shoulders and carrying us out of the room in triumph! That has never happened to either of us (that I know of), but having it as a dream has always been a good motivation.

Mathematical heritage
TS was very interested in his mathematical heritage and in his mathematical brothers and sisters. TS was the 12th of Brualdi’s 37 PhD students; I was the 15th. In 2005, TS and I organized a conference (called the Brualdi-Fest) in honor of Richard Brualdi. Below I attach some photos of the design for the T-shirt.


T-shirt design for Brualdi-Fest, 1

The first image shows a biclique partition of K_5: for each color, the edges of that color form a complete bipartite graph, and each edge of the complete graph on 5 vertices lies in exactly one of these complete bipartite graphs. This is related to one of TS’s favorite theorems: the Graham-Pollak Theorem.
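The Graham-Pollak theorem says that the edges of K_n cannot be partitioned into fewer than n-1 complete bipartite subgraphs. A biclique partition of K_5 therefore needs at least 4 parts. The Python sketch below checks whether a list of bicliques, each given as a pair of disjoint vertex sets, partitions the edges of K_n. The function names are hypothetical, and the partitions are illustrative; they need not match the colors on the T-shirt.

```python
from itertools import combinations


def star_partition(n):
    """One biclique partition of K_n into n - 1 stars: part i joins vertex i
    to every vertex j > i (vertices are 0, ..., n - 1)."""
    return [({i}, set(range(i + 1, n))) for i in range(n - 1)]


def is_biclique_partition(n, parts):
    """Check that every edge of K_n lies in exactly one of the complete
    bipartite graphs (left, right) in parts."""
    covered = []
    for left, right in parts:
        if left & right:  # the two sides of a biclique must be disjoint
            return False
        covered.extend(frozenset((u, v)) for u in left for v in right)
    all_edges = {frozenset(e) for e in combinations(range(n), 2)}
    # Each edge covered exactly once: no repeats, and nothing missing or extra.
    return len(covered) == len(set(covered)) and set(covered) == all_edges


if __name__ == "__main__":
    print(is_biclique_partition(5, star_partition(5)))
    print(is_biclique_partition(5, [({0, 1}, {2, 3, 4}), ({0}, {1}), ({2}, {3, 4}), ({3}, {4})]))
```

Both example partitions of K_5 use exactly 4 parts, matching the Graham-Pollak bound.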


T-shirt design for Brualdi-Fest, 2

The second image (when the symbols are replaced by 1s) is the incidence matrix of the projective plane of order 2; one of TS’s favorite matrices.
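One concrete labeling of this matrix takes only a few lines to generate. Take the points to be the integers mod 7 and the lines to be the translates \{i, i+1, i+3\} of the perfect difference set \{0, 1, 3\}. The Python sketch below builds the 7\times 7 incidence matrix N this way. It then checks that any two lines meet in exactly one point, i.e. NN^t = 2I + J with J the all-ones 7\times 7 matrix. The function names are hypothetical, and the T-shirt may use a different but equivalent labeling.

```python
def fano_incidence():
    """7x7 line-point incidence matrix of the projective plane of order 2:
    line i contains point j exactly when (j - i) mod 7 lies in {0, 1, 3}."""
    n = 7
    return [[1 if (j - i) % n in (0, 1, 3) else 0 for j in range(n)] for i in range(n)]


def gram(N):
    """Return N N^t; its (i, k) entry counts the points shared by lines i and k."""
    return [[sum(a * b for a, b in zip(r, s)) for s in N] for r in N]


if __name__ == "__main__":
    for row in fano_incidence():
        print(" ".join(map(str, row)))
```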

Here’s a photo of Brualdi and his students at the conference:


From L to R they are: John Mason (?), Thomas Forreger, John Goldwasser, Dan Pritikin, Suk-geun Hwang, Han Cho, T.S. Michael, B. Shader, Keith Chavey, Jennifer Quinn, Mark Lawrence, Susan Hollingsworth, Nancy Neudauer, Adam Berliner, and Louis Deaett.

Here’s a picture from a 2012 conference:


From bottom to top: T.S. Michael (1988), US Naval Academy, MD; Bryan Shader (1990), University of Wyoming, WY; Jennifer Quinn (1993), University of Washington, Tacoma, WA; Nancy Neudauer (1998), Pacific University, OR; Susan Hollingsworth (2006), Edgewood College, WI; Adam Berliner (2009), St. Olaf College, MN; Louis Deaett (2009), Quinnipiac University, CT; Michael Schroeder (2011), Marshall University, WV; Seth Meyer (2012), Kathleen Kiernan (2012).

Here’s a caricature of TS made by Kathy Wilson (spouse of mathematician Richard Wilson) at the Brualdi-Fest:


TS Michael, by Kathy Wilson

Long Mathematical Discussions
During graduate school, TS and I would regularly bump into each other as we were coming and going from the office. Often this happened as we were crossing University Avenue, one of the busiest streets in Madison. The typical conversation started with a “Hi, how are you doing? Have you considered X?” We would then spend the next 60-90 minutes on the street corner (whether it was a sweltering 90 degrees+, or a cold, windy day) considering X. In more recent years, these conversations have moved to hotel lobbies at conferences that we attend together. These discussions have been some of the best moments of my life, and through them I have become a better mathematician.

Here’s a photo of T.S. Michael with Kevin van der Meulen at the Brualdi-Fest.


I’m guessing they are in the midst of one of those “Have you considered X?” moments that TS is famous for.

Mathematical insight
TS has taught me a lot about mathematics, including:

  •  How trying to generalize a result can lead to better understanding of the original result.
  •  How phrasing a question appropriately is often the key to a mathematical breakthrough.
  •  Results that are surprising (e.g., go against one’s intuition), have an elegant proof (e.g., bring in matrices in an unexpected way), and are aesthetically pleasing are worth pursuing (as Piet Hein said, “Problems worthy of attack prove their worth by hitting back.”)
  •  The struggle to present the proof of a result in the simplest, most self-contained way is important because often it produces a better understanding. If you can’t say something in a clean way, then perhaps you really don’t understand it fully.

TS’s mathematical fathers are:
Richard Brualdi ← Herb Ryser ← Cyrus MacDuffee ← Leonard Dickson ← E.H. Moore ← H. A. Newton ← Michel Chasles ← Simeon Poisson ← Joseph Lagrange ← Leonhard Euler ← Johann Bernoulli.