This is the first of a series of expository posts on matrix-theoretic sports ranking methods. This post, which owes much to discussions with TS Michael, discusses Massey’s method.
Massey’s method, currently in use by the NCAA (for football, where teams typically play each other once), was developed by Kenneth P. Massey
while an undergraduate math major in the late 1990s. We present a possible variation of Massey’s method adapted to baseball, where teams typically play each other multiple times.
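There are exactly $\binom{6}{2} = 15$ pairings between the six Patriot League teams (Army, Bucknell, Holy Cross, Lafayette, Lehigh, Navy, numbered 1 through 6 in that order). These pairs are sorted lexicographically, as follows: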
(1,2),(1,3),(1,4), …, (5,6).
In other words, sorted as
Army vs Bucknell, Army vs Holy Cross, Army vs Lafayette, …, Lehigh vs Navy.
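The lexicographic ordering of the pairings can be generated mechanically. The following is a minimal sketch; the variable names `teams` and `pairs` are illustrative choices, not part of the original post.

```python
from itertools import combinations

# Team order as used in the post: team 1 is Army, team 2 is Bucknell, and so on.
teams = ["Army", "Bucknell", "Holy Cross", "Lafayette", "Lehigh", "Navy"]

# combinations() yields the pairs (i, j) with i < j in lexicographic order.
pairs = list(combinations(range(1, len(teams) + 1), 2))

for i, j in pairs:
    print(f"({i},{j}): {teams[i - 1]} vs {teams[j - 1]}")
```

The list starts with (1,2), Army vs Bucknell, and ends with (5,6), Lehigh vs Navy.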
The cumulative results of the 2016 regular season are given in the table below. We count only the games played in the Patriot League, not including the Patriot League post-season tournament (see, e.g., the Patriot League site for details). In the table, the total score (since the teams play multiple games against each other) of the team in the vertical column on the left is listed first. In other words, “a – b” in row $i$ and column $j$ means the total runs scored by team $i$ against team $j$ is $a$, and the total runs allowed by team $i$ against team $j$ is $b$. Here, we order the six teams as above (team $1$ is Army (USMA at West Point), team $2$ is Bucknell, and so on). For instance, if X played Y several times, the entry in row X and column Y lists the sum of X's runs over those games first, followed by the sum of Y's runs.
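As a small illustration of how one table entry is formed, here is a sketch that sums per-game scores into a single “a – b” entry. The function name `table_entry` and the three scores are hypothetical; they are not taken from the 2016 season.

```python
def table_entry(games):
    """Sum per-game (runs_scored, runs_allowed) pairs into one (a, b) entry."""
    a = sum(scored for scored, _ in games)
    b = sum(allowed for _, allowed in games)
    return a, b

# Made-up scores for team X against team Y, from X's point of view.
x_vs_y = [(3, 1), (0, 4), (5, 2)]

a, b = table_entry(x_vs_y)
print(f"{a} - {b}")  # prints "8 - 7"
```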
Win-loss digraph of the Patriot League men's baseball from 2015
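In this ordering, we record their (sum total) win-loss record (a 1 for a win, -1 for a loss) in a $15\times 6$ matrix $M$, with one row for each pairing and one column for each team.
We also record their total losses in a column vector ${\bf b}$, with one entry for each pairing.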
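The construction of the matrix and the column vector can be sketched as follows. Several things here are assumptions rather than statements from the post: that the winner of a pairing is the team with more cumulative runs, that the row for a pairing has a 1 in the winner's column and a -1 in the loser's, and that the column vector holds the absolute difference in cumulative runs (the usual choice in Massey's method). The function name `massey_system` and the three-team totals are made up for illustration.

```python
from itertools import combinations

def massey_system(n_teams, totals):
    """Build (M, b) from cumulative scores.

    totals[(i, j)] = (a, c): total runs scored by team i and by team j
    over all of their meetings, for i < j (1-based team numbers).
    """
    M, b = [], []
    for i, j in combinations(range(1, n_teams + 1), 2):
        a, c = totals[(i, j)]
        row = [0] * n_teams
        if a != c:  # a tied pairing leaves the row all zeros
            winner, loser = (i, j) if a > c else (j, i)
            row[winner - 1] = 1
            row[loser - 1] = -1
        M.append(row)
        b.append(abs(a - c))
    return M, b

# Hypothetical three-team league (made-up totals, not Patriot League data).
totals = {(1, 2): (10, 7), (1, 3): (4, 9), (2, 3): (8, 6)}
M, b = massey_system(3, totals)
print(M)  # [[1, -1, 0], [-1, 0, 1], [0, 1, -1]]
print(b)  # [3, 5, 2]
```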
The Massey ranking of these teams is a vector ${\bf r}$ which best fits the equation
$$M{\bf r} = {\bf b},$$
where $M$ is the win-loss matrix and ${\bf b}$ is the column vector above.
While the corresponding linear system is over-determined, we can look for a best (in the least squares sense) approximate solution using the orthogonal projection formula
$${\bf r} = (M^tM)^{-1}M^t{\bf b},$$
valid for matrices $M$ with linearly independent columns. Unfortunately, in this case $M$ does not have linearly independent columns, so the formula doesn’t apply.
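One way to see the dependence, assuming each row of the win-loss matrix (call it $M$) contains a single $1$ for the winner, a single $-1$ for the loser, and zeros elsewhere: every row then sums to zero, so the all-ones vector ${\bf 1}$ lies in the kernel of $M$:

$$M{\bf 1} = {\bf 0}, \qquad\text{hence}\qquad M({\bf r} + c\,{\bf 1}) = M{\bf r}\ \text{ for every constant } c.$$

In particular $M^tM$ is singular, and any least squares solution is only determined up to adding a common constant to all six ratings.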
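Massey’s clever idea is to solve
$$M^tM{\bf r} = M^t{\bf b}$$
by row-reduction and determine the rankings from the parameterized form of the solution. To this end, we compute $M^tM$ and $M^t{\bf b}$.
Then we compute the rref of the augmented matrix $(M^tM \mid M^t{\bf b})$.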
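The row-reduction step can be sketched in exact rational arithmetic. This is not the computation from the post: the three-team matrix `M` and vector `b` below are hypothetical, and the helpers `transpose`, `matmul` and `rref` are written here for illustration using only the standard library.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]

def rref(A):
    """Reduced row echelon form, computed exactly over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]
        # Clear the pivot column in every other row.
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return A

# Hypothetical three-team data: 1 beats 2 by 3, 3 beats 1 by 5, 2 beats 3 by 2.
M = [[1, -1, 0],
     [-1, 0, 1],
     [0, 1, -1]]
b = [[3], [5], [2]]

Mt = transpose(M)
MtM = matmul(Mt, M)   # [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
Mtb = matmul(Mt, b)   # [[-2], [-1], [3]]
augmented = [row + rhs for row, rhs in zip(MtM, Mtb)]
R = rref(augmented)
for row in R:
    print([str(x) for x in row])
```

For this toy data the reduced matrix has a zero last row, and the remaining rows read $r_1 = r_3 - 5/3$ and $r_2 = r_3 - 4/3$, with $r_3$ a free parameter. The resulting order is team 1 < team 2 < team 3, whatever value $r_3$ takes.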
If ${\bf r} = (r_1, r_2, r_3, r_4, r_5, r_6)$ denotes the rankings of Army, Bucknell, Holy Cross, Lafayette, Lehigh, Navy, in that order, then the parameterized solution gives the ordering
Lafayette < Army = Bucknell = Lehigh < Holy Cross < Navy.
If we use this ranking to predict wins and losses over the season, it would fail to correctly predict Army vs Holy Cross (Army won), Bucknell vs Lehigh, and Lafayette vs Army. With 3 failures among the 15 pairings, this gives a prediction failure rate of $3/15 = 20\%$.
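The failure-rate computation can be sketched as below. The function name `failure_rate`, the four teams P, Q, R, S, and their ratings and results are all hypothetical. A pairing with equal ratings is counted as a failure, in line with how the tied Bucknell vs Lehigh pairing is counted above.

```python
def failure_rate(ratings, winners):
    """Fraction of pairings whose actual winner differs from the predicted one.

    ratings: dict mapping team -> rating
    winners: dict mapping (team_a, team_b) -> actual winner of that pairing
    Equal ratings predict no winner, so such a pairing counts as a failure.
    """
    failures = 0
    for (a, b), winner in winners.items():
        if ratings[a] == ratings[b]:
            failures += 1
            continue
        predicted = a if ratings[a] > ratings[b] else b
        if predicted != winner:
            failures += 1
    return failures / len(winners)

# Made-up ratings and season results for a four-team league.
ratings = {"P": 0, "Q": 1, "R": 1, "S": 2}
winners = {
    ("P", "Q"): "Q",
    ("P", "R"): "P",  # upset: the lower-rated team won
    ("P", "S"): "S",
    ("Q", "R"): "Q",  # equal ratings, counted as a failure
    ("Q", "S"): "S",
    ("R", "S"): "S",
}
rate = failure_rate(ratings, winners)
print(rate)  # 2 failures out of 6 pairings
```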