This is the second in the series of blog posts inspired by the 2004 Ken Ross book entitled A Mathematician at the Ballpark. The first one is here. As in that post, we illustrate everything using the Baltimore Orioles’ 2022 season.
For an experienced baseball fan, baseball is a game of patterns. We “know” what a well-executed pitch looks like, how a double play should be turned, how a pop-up should be fielded, and so on. Because we know which plays should emerge, the desire to track how often they occur should come as no surprise; it has been done since professional baseball started in the 1870s. This week we discuss a pitching statistic you often see on televised games, WHIP, which is short for “walks plus hits allowed per inning pitched”.
Earned run average
We start off with the most basic pitching statistic, the Earned Run Average or ERA. This is the number of earned runs per 9 innings pitched:
ERA = 9·ER/IP,
where
IP, the number of Innings Pitched,
ER, the number of Earned Runs allowed by the pitcher. That is, it counts the number of runs enabled by the offensive team’s production in the face of competent play from the defensive team.
It is possible to have ERA = ∞, since innings pitched are measured by the number of outs recorded (three outs per inning): if a pitcher allows an earned run without getting any batters out, his IP = 0. The lower the ERA, the better the pitcher. In the 2022 season, right-handed Félix Bautista, who entered late innings as either a closer or a reliever, had an ERA of 2.19. Left-handed reliever Cionel Pérez had an ERA of 1.40.
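Because IP is really a count of outs divided by 3, it is convenient to compute ERA from outs directly. The Python sketch below is only an illustration: the helper names `outs_from_ip` and `era` are our own (hypothetical), and it assumes the usual box-score convention that an IP entry such as 6.2 means 6 innings plus 2 outs.

```python
import math


def outs_from_ip(ip: str) -> int:
    """Convert box-score innings pitched ("6.2" = 6 innings + 2 outs) to outs."""
    whole, _, partial = ip.partition(".")
    outs = int(partial or 0)
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the point counts outs: 0, 1, or 2")
    return 3 * int(whole) + outs


def era(earned_runs: int, outs: int) -> float:
    """ERA = 9*ER/IP with IP = outs/3, which simplifies to 27*ER/outs."""
    if outs == 0:
        # No outs recorded: infinite if an earned run scored, undefined otherwise.
        return math.inf if earned_runs > 0 else math.nan
    return 27 * earned_runs / outs


# Toy numbers (not real Orioles data): 3 earned runs over 9 innings.
print(era(3, outs_from_ip("9")))  # 3.0
print(era(1, 0))                  # inf
```

Working in whole outs avoids the floating-point trouble of treating "6.2" as the decimal 6.2, which would be wrong.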
WHIP
We define walks plus hits allowed per innings pitched by:
WHIP = (BB+H)/IP,
where (as in the previous post) BB is the number of walks and H is the number of hits allowed by the pitcher (so, for example, reaching base due to a fielding error doesn’t count). WHIP reflects a pitcher’s propensity for allowing batters to reach base, so a lower WHIP indicates better performance.
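The WHIP formula is a one-liner once IP is expressed in outs (IP = outs/3). This is a minimal sketch; the function name `whip` is our own, and we assume the same IP = 0 convention as for ERA above.

```python
import math


def whip(walks: int, hits: int, outs: int) -> float:
    """WHIP = (BB + H)/IP, with IP = outs/3."""
    if outs == 0:
        # No outs recorded: infinite if anyone reached base, undefined otherwise.
        return math.inf if walks + hits > 0 else math.nan
    return 3 * (walks + hits) / outs


# Toy numbers (not real Orioles data): 2 walks and 7 hits over 9 innings.
print(whip(2, 7, 27))  # 1.0
```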
When we plot the ERA vs the WHIP for the top 20 Orioles pitchers in 2022, we get
Again, the line shown is the line that best fits the data. Since the line of best fit doesn’t fit the data very well, these two statistics aren’t strongly correlated. In other words, a low ERA indicates a good pitcher and a low WHIP indicates a good pitcher, but for “average” pitchers the two values seem less related to each other.
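One common way to put a number on “doesn’t fit the data very well” is the Pearson correlation coefficient r, which is close to ±1 when the points hug the least-squares line and close to 0 when they don’t. The post doesn’t say exactly how its line was computed, so the sketch below assumes ordinary least squares. The function name is our own, and the usage runs on toy data rather than the Orioles numbers (which are only in the plot).

```python
import math


def best_fit_line(xs, ys):
    """Least-squares line y = slope*x + intercept, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient, between -1 and 1
    return slope, intercept, r


print(best_fit_line([0, 1, 2], [1, 3, 5]))  # (2.0, 1.0, 1.0): a perfect fit
```

For a cloud of points like the ERA vs WHIP plot, r would land well inside (−1, 1).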
Yes, I more-or-less stole the above title from the 2004 Ken Ross book entitled A Mathematician at the Ballpark. As with that book, anyone familiar with middle-school (or junior high school) math should have no problem with most of what we do here. However, I will try to go into baseball in more detail than the book did.
Paraphrasing slightly, I read somewhere the following facetious remark:
From a survey of 1000 random baseball fans
across the nation, 183% of them hate math.
If you are one of these 183%, then this series could be for you. Hopefully, even if you aren’t a baseball expert but would like to learn some baseball statistics (now often called “sabermetrics”), these posts will help. I’m no expert myself, so we’ll learn together.
In this series of blog posts, each post will introduce a particular metric in baseball statistics as well as some of the math and baseball behind it. We illustrate all these notions using the Baltimore Orioles’ 2022 season.
This week we look at one of the most popular statistics you see on televised games: OPS or “On-base Plus Slugging,” which is short for on-base percentage plus slugging percentage. Don’t worry, we’ll explain all these terms as we go.
On-base percentage
First, On-Base Percentage or OBP is a more recent version of On-Base Average or OBA (the same as OBP but the SF term is omitted). We define
OBP = (H + BB + HBP)/(AB + BB + HBP + SF),
where
H is the number of Hits (the times the batter reaches base because of a batted, fair ball without error by the defense),
BB is the number of Base-on-Balls (or walks), where a batter receives four pitches that the umpire calls balls, and is in turn awarded first base,
HBP, or Hit By Pitch, counts the times this hitter is touched by a pitch and awarded first base as a result, and
SF is the number of Sacrifice Flies and AB the number of At-Bats, which take more care to define.
The official scorer keeps track of all these numbers, and more, as the baseball game is played. We still have to define the expressions AB and SF.
First, SF, or Sacrifice Flies, counts the number of fly balls hit to the outfield for which one of the following is true:
the fly ball is caught for an out and a baserunner scores after the catch (so there must be fewer than two outs at the time), or
the fly ball is dropped and a runner scores, if in the scorer’s judgment the runner could have scored after the catch had the fly ball been caught.
A sacrifice fly is only credited if a runner scores on the play. (By the way, this is a relatively “recent” statistic, as sacrifice flies weren’t tabulated before 1954. Between 1876, when the National League was founded, and 1954, baseball analysts used the OBA instead.)
Second, AB, or At-Bats, counts those plate appearances that are not one of the following:
A walk,
being hit by a pitch,
a sacrifice bunt (or Sacrifice Hit, SH),
a sacrifice fly,
interference (the catcher hitting the bat with his glove, for example), or
an obstruction (by the first baseman blocking the base path, for example).
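The list above amounts to a subtraction. Here is a minimal sketch; the function and argument names are our own, and the hypothetical `interference` argument lumps catcher’s interference and obstruction awards together.

```python
def at_bats(pa, bb, hbp, sh, sf, interference=0):
    """At-bats = plate appearances minus the kinds of PA excluded from AB."""
    return pa - (bb + hbp + sh + sf + interference)


# Mullins' 2022 numbers from the worked example later in this post.
print(at_bats(pa=672, bb=47, hbp=9, sh=1, sf=5, interference=2))  # 608
```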
Incidentally, the self-explanatory number Plate Appearances, or PA, can differ from AB by as much as 10%, mostly due to the number of walks that a batter can draw.
The main terms in the OBP expression are H and AB, so we naturally expect OBP to be closely related to the Batting Average, defined by
BA = H/AB.
For example, if we take the top 18 Orioles players and plot the BA vs the OBP, we get the following graph:
The line shown above is simply the line of best fit to visually indicate the correlation.
Example: Let’s look at the Orioles’ All-Star center fielder, Cedric Mullins, who had 672 plate appearances and 608 at-bats, for a difference of 672 − 608 = 64. He had 1 sacrifice bunt and 5 sacrifice flies, was hit by a pitch 9 times, and walked 47 times. These add up to 62, so (using the above definition of AB) the number of times he was awarded 1st base due to interference or obstruction was 64 − 62 = 2.
Mullins’ H = 157 hits break down into 105 singles, 32 doubles, 4 triples, and 16 home runs.
Next, adding his 126 strikeouts and the 64 plate appearances that were not at-bats gives a total of 157 + 126 + 64 = 347.
The remaining 672 − 347 = 325 plate appearances were balls Mullins put in play that were either caught on the fly by a fielder or landed fair with Mullins thrown out at a base (or reaching on an error or a fielder’s choice).
These account for all of Mullins’ plate appearances. Mullins had a batting average of BA = 157/608 ≈ 0.258 and an on-base percentage of OBP = (157 + 47 + 9)/(608 + 47 + 9 + 5) = 213/669 ≈ 0.318.
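The arithmetic in this example is easy to check in a few lines of Python. This is a sketch with function names of our own choosing; the counts are the ones quoted above.

```python
def batting_average(h, ab):
    """BA = H/AB."""
    return h / ab


def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP)/(AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Mullins' 2022 counts as given in the example above.
H, BB, HBP, AB, SF, SO = 157, 47, 9, 608, 5, 126
print(round(batting_average(H, AB), 3))                   # 0.258
print(round(on_base_percentage(H, BB, HBP, AB, SF), 3))   # 0.318
print(AB - H - SO)  # at-bats ending on a ball in play without a hit: 325
```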
Slugging percentage
The slugging percentage, SLG (short for SLuGging), is the total number of bases achieved on hits divided by the number of at-bats:
SLG = TB/AB.
Here, TB or Total Bases, is the weighted sum
TB = 1B + 2*2B + 3*3B + 4*HR,
where
1B is the number of “singles” (hits where the batter makes it to 1st Base),
2B is the number of doubles,
3B is the number of triples, and
HR denotes the number of Home Runs.
On-base Plus Slugging
With all these definitions under our belt, we are finally ready to compute “on-base plus slugging,” that is, the on-base percentage plus the slugging percentage:
OPS = OBP + SLG.
Example: Again, let’s consider Cedric Mullins. He had 1B = 105 singles, 2B = 32 doubles, 3B = 4 triples, and HR = 16 home runs, so his TB = 105 + 2·32 + 3·4 + 4·16 = 105 + 64 + 12 + 64 = 245. Therefore, his SLG = 245/608 ≈ 0.403, so his on-base plus slugging is OPS = OBP + SLG = 0.318 + 0.403 = 0.721.
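Here is the same computation as a short Python sketch (function names are our own). It adds the unrounded OBP and SLG and rounds only at the end, which here gives the same 0.721.

```python
def total_bases(singles, doubles, triples, home_runs):
    """TB = 1B + 2*2B + 3*3B + 4*HR."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(singles, doubles, triples, home_runs, ab):
    """SLG = TB/AB."""
    return total_bases(singles, doubles, triples, home_runs) / ab


tb = total_bases(105, 32, 4, 16)
slg = slugging(105, 32, 4, 16, 608)
obp = (157 + 47 + 9) / (608 + 47 + 9 + 5)  # OBP from the counts earlier in the post
ops = obp + slg
print(tb, round(slg, 3), round(ops, 3))    # 245 0.403 0.721
```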
Caroline Melles and I have written a preprint that collects numerous examples of harmonic quotient morphisms $\Gamma \to \Gamma/H$, where $\Gamma/H$ is a quotient graph obtained from some subgroup $H$ of the automorphism group of $\Gamma$. The examples are for graphs having a small number of vertices (no more than 12). For the most part, we also focused on regular graphs with small degree (no more than 5). They were all computed using SageMath and a module of special-purpose Python functions I’ve written (available on request). I’ve not counted, but the number of examples is relatively large, maybe over one hundred.
I’ll post it to the math arxiv at some point but if you are interested now, here’s a copy: click here for pdf.
While GAP has a group_id function that locates a “small” group in a small groups database (see the SageMath page or the GAP page for more info), AFAIK SageMath doesn’t have a similar function for graphs. I’ve written one (see below) based on the mountain of hard work done years ago by Emily Kirkman.
def graph_id_graph6(Gamma, verbose=False):
    """
    Returns the graph6 id of Gamma = (V,E).
    If verbose then it also displays the table of all graphs with
    |V| vertices and |E| edges.
    Assumes Gamma is a "small" graph.

    EXAMPLES:
        sage: Gamma = graphs.HouseGraph()
        sage: graph_id_graph6(Gamma, verbose=False)
        'Dbk'
        sage: graph_id_graph6(Gamma, verbose=True)

         graphs with 5 vertices and 6 edges:

        Graph6               Aut Grp Size         Degree Sequence
        ------------------------------------------------------------
        DB{                  2                    [1, 2, 2, 3, 4]
        DFw                  12                   [2, 2, 2, 3, 3]
        DJ[                  24                   [0, 3, 3, 3, 3]
        DJk                  2                    [1, 2, 3, 3, 3]
        DK{                  8                    [2, 2, 2, 2, 4]
        Dbk                  2                    [2, 2, 2, 3, 3]

        'Dbk'
    """
    n = Gamma.order()  # number of vertices
    m = Gamma.size()   # number of edges
    # Query Sage's database of small graphs for every graph with n vertices and m edges.
    Q = GraphQuery(display_cols=['graph6', 'aut_grp_size', 'degree_sequence'],
                   num_edges=['=', m], num_vertices=['=', n])
    for g in Q:
        # The first database graph isomorphic to Gamma supplies the id.
        if g.is_isomorphic(Gamma):
            if verbose:
                print("\n graphs with %s vertices and %s edges:\n" % (n, m))
                Q.show()
                print("\n")
            return g.graph6_string()
    # No isomorphic graph was found (e.g., Gamma is too large for the database).
    return None
The 1874 poem “The Mathematician in Love” by Scottish mechanical engineer William Rankine (in the book Songs and Fables) has been published in many places (e.g., poetry.com and New Scientist), and a scanned version is available at the Internet Archive. However, the mathematical equations Rankine presented at the end of his poem are only available in the scanned versions. As WordPress can render LaTeX, the poem quoted below includes those last few lines.
The Mathematician in Love
William J. M. Rankine
I.
A mathematician fell madly in love
With a lady, young, handsome, and charming:
By angles and ratios harmonic he strove
Her curves and proportions all faultless to prove,
As he scrawled hieroglyphics alarming.
II.
He measured with care, from the ends of a base,
The arcs which her features subtended:
Then he framed transcendental equations, to trace
The flowing outlines of her figure and face,
And thought the result very splendid.
III.
He studied (since music has charms for the fair)
The theory of fiddles and whistles, —
Then composed, by acoustic equations, an air,
Which, when ’twas performed, made the lady’s long hair
Stand on end, like a porcupine’s bristles.
IV.
The lady loved dancing: – he therefore applied,
To the polka and waltz, an equation;
But when to rotate on his axis he tried,
His centre of gravity swayed to one side,
And he fell, by the earth’s gravitation.
V.
No doubts of the fate of his suit made him pause,
For he proved, to his own satisfaction,
That the fair one returned his affection; – “because,
As every one knows, by mechanical laws,
Re-action is equal to action.”
VI.
“Let x denote beauty, – y manners well-bred, –
z, Fortune, – (this last is essential), –
Let L stand for love” – our philosopher said, –
“Then L is a function of x, y, and z,
Of the kind which is known as potential.”
VII.
“Now integrate L with respect to dt,
(t standing for time and persuasion);
Then, between proper limits, ’tis easy to see,
The definite integral Marriage must be: —
(A very concise demonstration).”
VIII.
Said he – “If the wandering course of the moon
By Algebra can be predicted,
The female affections must yield to it soon” –
But the lady ran off with a dashing dragoon,
And left him amazed and afflicted.
End notes:
Equation referred to in Stanza VI.–
Equation referred to in Stanza VII.–
Take an ordinary deck of 52 cards and place them, face up, in the following pattern. Going from the top of the deck to the bottom and placing cards down left to right, put 13 cards in the top row, then 11 cards in the next row, then 9, then 7, then 5, then 3, and finally the remaining 4 cards in the last row (13 + 11 + 9 + 7 + 5 + 3 + 4 = 52).
Now, take the left-most card in each row and move it to the right of the others (effectively, this is a cyclic shift of that row of cards to the left).
Finally, reassemble the deck by reversing the order of the placement.
In memory of the great German mathematician Edmund Landau (1877-1938, see also this bio), I call this the Landau shuffle. As with any card shuffle, this shuffle permutes the original ordering of the cards. To restore the deck to its original ordering you must perform this shuffle exactly 180180 times; this number is called the order of the shuffle. Yes, one hundred eighty thousand, one hundred and eighty times. (It is the least common multiple of the row lengths 13, 11, 9, 7, 5, 3, and 4.) Moreover, no other shuffle of a 52-card deck requires more repetitions to restore the deck. So, in some sense, the Landau shuffle is the shuffle that most effectively rearranges the cards.
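Here is a small Python sketch that simulates one Landau shuffle and computes its order from its cycle structure (the LCM of the cycle lengths). It assumes that reassembling the deck puts the cards back in dealing order (row by row, left to right), so the only net effect is the cyclic shift within each row; the helper names are our own.

```python
from math import gcd

ROW_SIZES = [13, 11, 9, 7, 5, 3, 4]  # row lengths of the layout described above


def landau_shuffle(deck):
    """Apply one Landau shuffle to a list of 52 cards (index 0 = top card).

    Assumes the deck is picked back up in dealing order, so each row
    is simply shifted cyclically to the left.
    """
    if len(deck) != sum(ROW_SIZES):
        raise ValueError("expected a 52-card deck")
    shuffled, start = [], 0
    for size in ROW_SIZES:
        row = deck[start:start + size]
        shuffled.extend(row[1:] + row[:1])  # left-most card moves to the right end
        start += size
    return shuffled


def permutation_order(perm):
    """Order of a permutation of 0..n-1 (given as a list): LCM of its cycle lengths."""
    seen = [False] * len(perm)
    order = 1
    for i in range(len(perm)):
        length, j = 0, i
        while not seen[j]:  # walk the cycle through i, if not already visited
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            order = order * length // gcd(order, length)
    return order


perm = landau_shuffle(list(range(52)))  # perm[i] = original position of the card now at i
print(permutation_order(perm))           # 180180
```

Computing the order from the cycles is instant, whereas actually repeating the shuffle 180180 times would also work but is far slower.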
Left to right: Oswald Veblen, Edmund Landau, Harald Bohr. Location, photographer, date unknown.
Now suppose we have a deck of $n$ (distinctly labeled) cards, where $n$ is a positive integer. The collection of all possible shuffles, or permutations, of this deck is denoted $S_n$ and called the symmetric group. The above discussion leads naturally to the following question(s).
Question: What is the largest possible order of a shuffle of this deck (and how do you construct it)?
This requires a tiny bit of group theory. You only need to know that any permutation of $n$ symbols (such as the card deck above) can be decomposed into a composition (or product) of disjoint cycles. To compute the order of an element $\sigma \in S_n$, write that element in disjoint cycle notation. Denote the lengths of the disjoint cycles occurring in $\sigma$ as $k_1, k_2, \dots, k_m$, where the $k_i \geq 1$ are integers forming a partition of $n$: $n = k_1 + k_2 + \dots + k_m$. Then the order of $\sigma$ is known to be given by $\mathrm{ord}(\sigma) = \mathrm{LCM}(k_1, k_2, \dots, k_m)$, where LCM denotes the least common multiple.
The Landau function $g(n)$ is the function that returns the maximum possible order of an element $\sigma \in S_n$. Landau introduced this function in a 1903 paper where he also proved the asymptotic relation $\lim_{n \to \infty} \frac{\ln g(n)}{\sqrt{n \ln n}} = 1$.
Example: If $n = 52$ then note $52 = 13 + 11 + 9 + 7 + 5 + 3 + 4$ and that $g(52) = \mathrm{LCM}(13, 11, 9, 7, 5, 3, 4) = 180180$.
Oddly, my favorite mathematical software program SageMath does not have an implementation of the Landau function, so we end with some SageMath code.
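def landau_function(n):
    # Landau's g(n): the largest LCM of the parts of a partition of n,
    # i.e., the maximum order of a permutation of n symbols.
    L = Partitions(n).list()
    lcms = [lcm(P) for P in L]
    return max(lcms)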
Here is an example (the time is included to show this took about 2 seconds on my rather old mac laptop):
sage: time landau_function(52)
CPU times: user 1.91 s, sys: 56.1 ms, total: 1.97 s
Wall time: 1.97 s
180180
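Enumerating every partition, as the SageMath code above does, gets expensive quickly (52 already has more than 200,000 partitions). A standard alternative, sketched here in plain Python without SageMath, uses the fact that an optimal cycle type can be taken to consist of powers of distinct primes (padded with fixed points). That turns the problem into a small knapsack problem solved by dynamic programming. The names `landau` and `primes_up_to` are our own.

```python
from math import isqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: the list of primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]


def landau(n):
    """Landau's g(n): the maximum order of a permutation of n symbols."""
    # best[b] = largest product of powers of distinct primes with sum <= b
    best = [1] * (n + 1)
    for p in primes_up_to(n):
        # Descending budgets, so each prime contributes at most one power.
        for b in range(n, p - 1, -1):
            q = p
            while q <= b:
                best[b] = max(best[b], best[b - q] * q)
                q *= p
    return best[n]


print(landau(52))  # 180180, matching the SageMath computation above
```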
This was once posted on my USNA webpage. Since I’ve retired, I’m going to repost it here.
Coding Theory and Cryptography: From Enigma and Geheimschreiber to Quantum Theory (David Joyner, ed.) Springer-Verlag, 2000. ISBN 3-540-66336-3
Summary: These are the proceedings of the “Cryptoday” Conference on Coding Theory, Cryptography, and Number Theory held at the U. S. Naval Academy during October 25-26, 1998. This book concerns elementary and advanced aspects of coding theory and cryptography. The coding theory contributions deal mostly with algebraic coding theory. Some of these papers are expository, whereas others are the result of original research. The emphasis is on geometric Goppa codes, but there is also a paper on codes arising from combinatorial constructions. There are both historical and mathematical papers on cryptography. Several of the contributions on cryptography describe the work done by the British and their allies during World War II to crack the German and Japanese ciphers. Some mathematical aspects of the Enigma rotor machine and more recent research on quantum cryptography are described. Moreover, there are two papers concerned with the RSA cryptosystem and related number-theoretic issues.
Chapters
Reminiscences and Reflections of a Codebreaker, by Peter Hilton pdf
This is now out of print and will not be reprinted (as far as I know). The above pdf files are posted by written permission. I thank Springer-Verlag for this.
In 2003, a math major named Steven McMath approached Fred Crabbe and me about directing his Trident thesis. (A Trident is like an honors thesis, but the student gets essentially the whole year to focus on writing the project.) After he graduated, I put a lot of his work online at the USNA website. Of course, I’ve retired since then and the materials were removed. This blog post is simply an attempt to repost many of the materials that arose as part of his thesis (which are, as official works of the US government, all in the public domain; of course, Dan Shanks’ notes are copyright of his estate and are posted for scholarly use only).
McMath planned to study Dan Shanks’ SQUFOF method, which he felt had not been fully exploited as of 2004. This idea had a little bit of a personal connection for me. Dan Shanks was at the University of Maryland during the time (1981-1983) I was a grad student there. While he and I didn’t talk as much as I wish we had, I remember him as a friendly guy, looking a lot like the picture below, with his door always open, happy to discuss mathematics.
One of the first things McMath did was to type up the handwritten notes we got (I think) from Hugh Williams and Duncan Buell. A quote from a letter from Buell:
“Dan probably invented SQUFOF in 1975. He had just bought an HP 67 calculator, and the algorithm he invented had the advantages of being very simple to program, so it would fit on the calculator, very powerful as an algorithm when measured against the size of the program, and it factors double precision numbers using single precision arithmetic.”
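Buell’s point that SQUFOF is “very simple to program” is easy to appreciate from a sketch. Below is a minimal, unoptimized Python rendering of the commonly published two-phase formulation. The first phase walks forward through the continued fraction expansion of $\sqrt{kN}$, looking for a square form. The second walks back from its square root until a symmetry point is reached. This is not Shanks’ or McMath’s code, and the multiplier list and iteration bound are assumptions borrowed from common presentations. Python integers are unbounded, so the sketch does not show the single-precision advantage Buell mentions.

```python
from math import gcd, isqrt

# Square-free multipliers k; SQUFOF is retried on k*n when k = 1 fails.
# (This particular list is an assumption taken from common presentations.)
MULTIPLIERS = [1, 3, 5, 7, 11, 3 * 5, 3 * 7, 3 * 11, 5 * 7, 5 * 11, 7 * 11,
               3 * 5 * 7, 3 * 5 * 11, 3 * 7 * 11, 5 * 7 * 11, 3 * 5 * 7 * 11]


def squfof(n):
    """Try to find a nontrivial factor of n (intended for odd composite n).

    Returns a factor, or None if every multiplier fails.
    """
    if n % 2 == 0:
        return 2
    s = isqrt(n)
    if s * s == n:
        return s
    for k in MULTIPLIERS:
        d = k * n
        p0 = isqrt(d)
        if p0 * p0 == d:
            continue  # k*n is a square; the expansion below would divide by zero
        # Forward cycle: step through the continued fraction of sqrt(d),
        # looking for a perfect-square Q at an even-indexed step.
        p_prev = p = p0
        q_prev, q = 1, d - p0 * p0
        bound = 6 * isqrt(2 * s)
        r = None
        for i in range(2, bound):
            b = (p0 + p) // q
            p = b * q - p
            q, q_prev = q_prev + b * (p_prev - p), q
            if i % 2 == 0 and isqrt(q) ** 2 == q:
                r = isqrt(q)
                break
            p_prev = p
        if r is None:
            continue  # no square form found; try the next multiplier
        # Reverse cycle: start from the square root of the square form and
        # iterate until two consecutive P values agree (a symmetry point).
        b = (p0 - p) // r
        p = b * r + p
        q_prev, q = r, (d - p * p) // r
        while True:
            b = (p0 + p) // q
            p_prev, p = p, b * q - p
            q, q_prev = q_prev + b * (p_prev - p), q
            if p == p_prev:
                break
        f = gcd(n, q_prev)
        if 1 < f < n:
            return f
    return None


print(squfof(11111))  # 41, since 11111 = 41 * 271
```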
Here are some papers on this topic (with thanks to Michael Hortmann for the references to Gower’s papers):
David Joyner, Notes on indefinite integral binary quadratic forms with SageMath, preprint 2004. pdf
F. Crabbe, D. Joyner, S. McMath, Continued fractions and Parallel SQUFOF, 2004. arxiv
D. Shanks, Analysis and Improvement of the Continued Fraction Method of Factorization, handwritten notes 1975, typed into latex by McMath 2004. pdf.
D. Shanks, An Attempt to Factor N = 1002742628021, handwritten notes 1975, typed into latex by McMath 2004. pdf.
D. Shanks, SQUFOF notes, handwritten notes 1975, typed into latex by McMath 2004. pdf.
S. McMath, Daniel Shanks’ Square Forms Factorization, 2004 preprint. pdf1, pdf2.
S. McMath, Parallel Integer Factorization Using Quadratic Forms, Trident project, 2004. pdf.
Since that time, many excellent treatments of SQUFOF have been presented. For example, see
J. Gower, S. Wagstaff, Square Form Factorization, Math Comp, 2008. pdf.
J. Gower, Square Form Factorization, PhD Thesis, December 2004. pdf.
At first, you might think this is obvious – just “clip” off each corner of the tetrahedron to create the truncated tetrahedron (by essentially creating a triangle from each of these clipped corners – see below for the associated graph). Then just map each such triangle to the corresponding vertex of the tetrahedron. No, it’s not obvious because the map just described is not a covering. This post describes one way to think about how to construct any covering.
First, color the vertices of the tetrahedron in some way.
The coloring below corresponds to a harmonic morphism from the truncated tetrahedron to the tetrahedron:
All others are obtained from this one by permuting the colors. They are all covers of the tetrahedron, with no vertical multiplicities and all horizontal multiplicities equal to 1. These 24 harmonic morphisms from the truncated tetrahedron to the tetrahedron are all coverings, and there are no other harmonic morphisms.
If you search hard enough on the internet you’ll discover an 1898 pamphlet by Si Stebbins entitled “Card tricks and the way they are performed” (which I’ll denote by [S98] for simplicity). In it you’ll find the “Si Stebbins system,” which he claims is entirely his own invention. I’m no magician, but from what I can dig up on Magicpedia, Si Stebbins’ real name is William Henry Coffrin (May 4, 1867 – October 12, 1950), born in Claremont, New Hampshire. The system presented below was taught to Si by a Syrian magician named Selim Cid, with whom Si sometimes worked. However, this system seems to have been known to Italian card magicians in the late 1500s. In any case, this blog post is devoted to discussing parts of the pamphlet [S98] from a mathematical perspective.
In stacking the cards (face down), put the 6 of Hearts first, the 9 of Spades next (so it is below the 6 of Hearts in the deck), and so on to the end, reading across left to right as indicated in the table below. (BTW, the pamphlet [S98] uses the reversed ordering.) My guess is that with this ordering of the deck, with the ranks of consecutive cards spaced 3 apart, it still looks random at first glance.
Hearts   Spades   Diamonds   Clubs
-----------------------------------
6        9        Queen      2
5        8        Jack       Ace
4        7        10         King
3        6        9          Queen
2        5        8          Jack
Ace      4        7          10
King     3        6          9
Queen    2        5          8
Jack     Ace      4          7
10       King     3          6
9        Queen    2          5
8        Jack     Ace        4
7        10       King       3
Si Stebbins’ System
Next, I’ll present a more mathematical version of this system to illustrate its connections with group theory.
Following the ordering suggested by the mnemonic CHaSeD, we identify the suits with numbers as follows: Clubs is 0, Hearts is 1, Spades is 2 and Diamonds is 3. Therefore, the suits may be identified with the additive group of integers (mod 4), namely $\mathbb{Z}/4\mathbb{Z} = \{0, 1, 2, 3\}$.
For the ranks, identify King with 0, Ace with 1, 2 with 2, $\dots$, 10 with 10, Jack with 11, Queen with 12. Therefore, the ranks may be identified with the additive group of integers (mod 13), namely $\mathbb{Z}/13\mathbb{Z} = \{0, 1, \dots, 12\}$.
Rearranging the columns slightly, we have the following table, equivalent to the one above.
0    1    2    3
-------------------
3    6    9    12
2    5    8    11
1    4    7    10
0    3    6    9
12   2    5    8
11   1    4    7
10   0    3    6
9    12   2    5
8    11   1    4
7    10   0    3
6    9    12   2
5    8    11   1
4    7    10   0
Mathematical version of the Si Stebbins Stack
In this way, we identify the card deck with the abelian group $\mathbb{Z}/13\mathbb{Z} \times \mathbb{Z}/4\mathbb{Z}$, each card being a pair (rank, suit).
For example, if you spot the card $(r, s)$, then you know that 13 cards later (if you reach the end of the deck, continue counting with the top card) is the card $(r, s+1)$, 13 cards after that is $(r, s+2)$, and 13 cards after that is $(r, s+3)$, with suits taken mod 4.
Here are some rules this system satisfies:
Rule 1 “Shuffling”: Never riffle shuffle or mix the cards. Instead, split the deck in two, with the “bottom half” as the left stack and the “top half” as the right stack. Take the left stack and place it on the right one. This has the effect of rotating the deck but preserving the cyclic ordering. Do this as many times as you like. Mathematically, each such cut adds a fixed element of the group to every card in the deck. Some people call this a “false shuffle” or “false cut.”
Rule 2 “Rank position”: The ranks of successive cards in the deck differ by 3 (mod 13).
Rule 3 “Suit position”: Cards of the same rank are 13 cards apart and run in the same order of suits as in the CHaSeD mnemonic, namely, Clubs (0), Hearts (1), Spades (2), Diamonds (3).
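Here is a short Python sketch of the stack in this group model, with checks of Rules 1-3. Cards are (rank, suit) pairs using the numbering above (King = 0, Ace = 1, ..., Queen = 12; Clubs = 0, Hearts = 1, Spades = 2, Diamonds = 3). The function names are our own.

```python
RANKS = ["King", "Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen"]
SUITS = ["Clubs", "Hearts", "Spades", "Diamonds"]  # CHaSeD order

STEP = (3, 1)  # each card is the previous card plus (3, 1) in Z/13 x Z/4


def add(card, other):
    """The group operation in Z/13 x Z/4."""
    return ((card[0] + other[0]) % 13, (card[1] + other[1]) % 4)


def stebbins_stack(top=(6, 1)):
    """The 52-card stack as (rank, suit) pairs, top card first (default: 6 of Hearts)."""
    deck = [top]
    for _ in range(51):
        deck.append(add(deck[-1], STEP))
    return deck


def cut(deck, k):
    """Rule 1: a cut moves the top k cards to the bottom."""
    return deck[k:] + deck[:k]


def name(card):
    return f"{RANKS[card[0]]} of {SUITS[card[1]]}"


deck = stebbins_stack()
print([name(c) for c in deck[:4]])
# ['6 of Hearts', '9 of Spades', 'Queen of Diamonds', '2 of Clubs']
```

Since (3, 1) has order 52 in this group, the 52 cards are all distinct and the stack wraps around seamlessly. That is why a cut (Rule 1) preserves Rules 2 and 3: cutting k cards adds $k \cdot (3, 1)$ to every position.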
Finally, we can give a few simple card tricks based on this system.
Trick 1: A player picks a card from the deck, keeps it hidden from you. You name that card.
This trick can be set up in more than one way. For example, you can either
(a) spread the cards out behind the back in such a manner that when the card is drawn you can separate the deck at that point bringing the two parts in front of you, say a “top” and a “bottom” stack, or
(b) give the deck to the player, let them pick a card at random, which separates the deck into two stacks, say a “top” and a “bottom” stack, and have the player return the stacks separately.
You know that the card the player has drawn is the card following the bottom card of the top stack. If the card on the bottom of the top stack is denoted $(r, s)$ and the card drawn is $(r', s')$, then
$r' \equiv r + 3 \pmod{13}$ and $s' \equiv s + 1 \pmod{4}$.
For example, a player draws a card and you find that the bottom card of the top stack is the 9 of Diamonds. What is the card the player picked?
Solution: Use the first congruence listed: add 3 to 9, which gives 12, or the Queen. Use the second congruence listed: add 1 to Diamonds (which is 3) to get $3 + 1 \equiv 0 \pmod{4}$ (which is Clubs). The card drawn is the Queen of Clubs.
Trick 2: Run through the deck of cards (face down) one at a time until the player tells you to stop. Name the card you were asked to stop on.
First note the bottom card, then place the cards behind your back. To get the top card, add 3 to the rank of the bottom card and add 1 to its suit. As you run through the deck, silently say the name of the next card (adding 3 to the rank and 1 to the suit each time). Therefore, you know the card you are asked to stop on, since you have been naming the cards to yourself as you go along.
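Both tricks rest on one operation: add 3 to the rank (mod 13) and 1 to the suit (mod 4). Here is a minimal Python sketch of that step, reusing the numbering above (King = 0, ..., Queen = 12; Clubs = 0, Hearts = 1, Spades = 2, Diamonds = 3). The function names are our own.

```python
RANKS = ["King", "Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen"]
SUITS = ["Clubs", "Hearts", "Spades", "Diamonds"]  # CHaSeD order


def next_card(card):
    """The card following `card` in the stack: rank + 3 (mod 13), suit + 1 (mod 4)."""
    rank, suit = card
    return (rank + 3) % 13, (suit + 1) % 4


def name(card):
    return f"{RANKS[card[0]]} of {SUITS[card[1]]}"


def silent_count(bottom, how_many):
    """Trick 2: starting from the bottom card, name the cards dealt from the top."""
    card = bottom
    for _ in range(how_many):
        card = next_card(card)
        yield name(card)


# Trick 1: the bottom card of the top stack is the 9 of Diamonds, i.e. (9, 3).
print(name(next_card((9, 3))))         # Queen of Clubs
# Trick 2 with the 3 of Clubs, i.e. (3, 0), on the bottom of the full stack.
print(list(silent_count((3, 0), 3)))   # ['6 of Hearts', '9 of Spades', 'Queen of Diamonds']
```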