The baseball states graph

A state of a baseball game is a 10-tuple (s0,s1,s2,s3,j,vs,hs,tab,b,s), where

  • s0 denotes the number of outs (represented as one of the integers 0,1,2)
  • s1 is 0 if there is no runner on 1st base and 1 otherwise,
  • s2 is 0 if there is no runner on 2nd base and 1 otherwise,
  • s3 is 0 if there is no runner on 3rd base and 1 otherwise,
  • j is the inning number (represented usually as one of the integers 1, 2, . . . , 9, but it can have a larger value if the score is tied),
  • vs is the current score (number of runs) of the visiting team,
  • hs is the current score of the home team,
  • tab is the “team at bat” – 0 for visitor and 1 for home,
  • b counts the balls to the batter,
  • s counts the strikes.

For simplicity, we will always work within a given inning and omit the inning number and all the variables after it. Therefore, for the remainder of this post, we regard a state as a 4-tuple (s0,s1,s2,s3). The 24 possible states can be listed in an 8×3 array:

(0,0,0,0), (1,0,0,0), (2,0,0,0),
(0,1,0,0), (1,1,0,0), (2,1,0,0),
(0,0,1,0), (1,0,1,0), (2,0,1,0),
(0,0,0,1), (1,0,0,1), (2,0,0,1),
(0,1,1,0), (1,1,1,0), (2,1,1,0),
(0,1,0,1), (1,1,0,1), (2,1,0,1),
(0,0,1,1), (1,0,1,1), (2,0,1,1),
(0,1,1,1), (1,1,1,1), (2,1,1,1)
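
As a quick check on the array above, here is a minimal Python sketch that generates the same 24 states in the same order. The names bases, grid and states are just illustrative choices, and the sort key is simply one way to reproduce the row order used above (fewest runners first, then lower bases first).

```python
from itertools import product

# Base configurations (s1, s2, s3), ordered like the rows of the array:
# fewest runners first and, for a given runner count, lower bases first.
bases = sorted(product((0, 1), repeat=3),
               key=lambda b: (sum(b), tuple(-x for x in b)))

# Each row holds one base configuration; the columns are 0, 1 and 2 outs.
grid = [[(outs, *b) for outs in range(3)] for b in bases]
states = [state for row in grid for state in row]

for row in grid:
    print(", ".join(str(s) for s in row))
```

Each printed row corresponds to one row of the 8×3 array, and states is the flat list of all 24 tuples.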

Similarly, the so-called run expectancy matrix (denoted REM or RE24) is formed by taking the 24 states of a baseball game and entering, for each state, the expected number of runs scored from that state through the end of the inning. (This matrix depends on the year the games were played and on the ballpark. Interactive RE24 visualizations and downloadable datasets can be found on the FanGraphs website, https://blogs.fangraphs.com/introducing-the-batter-specific-run-expectancy-tool/ .) We are only looking at the states here.

There are often several ways to transition from one state to another. For example, the transition (0,0,0,0) -> (0,0,0,1) could come from a triple, where the batter hits the ball and runs the bases to 3rd base. In the other direction, the transition (0,1,1,1) -> (0,0,0,0) is also possible, when the batter hits a grand slam home run. We list (most of) the possible transitions below. Omitted are the “self-transitions”, such as (1) (0,0,0,0) -> (0,0,0,0), when the batter hits a home run, or (2) (0,1,0,0) -> (0,1,0,0), when the batter hits a single and gets an RBI because the runner on 1st base advances all the way home. (Question: These self-transitions yield loops in the associated graph, but should they be allowed for wider applications?) As indicated above, we also omit the transition to the next inning (3 outs).

Transitions from (0,0,0,0): (0,1,0,0), (0,0,1,0), (0,0,0,1), (1,0,0,0).
Transitions from (0,1,0,0): (0,0,0,0), (0,0,1,0), (0,0,0,1), (0,1,1,0),  (0,1,0,1),  (0,0,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (2,0,0,0).
Transitions from (0,0,1,0): (0,0,0,0), (0,1,0,0), (0,0,0,1), (0,1,1,0),  (0,1,0,1),  (0,0,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (2,0,0,0).
Transitions from (0,0,0,1): (0,0,0,0), (0,1,0,0), (0,0,1,0), (0,1,0,1),  (0,0,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (2,0,0,0).
Transitions from (0,1,1,0): (0,0,0,0),  (0,1,0,0), (0,0,1,0), (0,0,0,1), (0,1,0,1), (0,0,1,1), (0,1,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (0,1,0,1): (0,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1), (0,1,1,0), (0,0,1,1), (0,1,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (0,0,1,1): (0,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1), (0,1,1,0), (0,1,0,1), (0,1,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (0,1,1,1): (0,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1),  (0,1,1,0), (0,1,0,1), (0,0,1,1), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).

Transitions from (1,0,0,0): (1,1,0,0), (1,0,1,0), (1,0,0,1), (2,0,0,0).
Transitions from (1,1,0,0): (1,0,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1),  (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (1,0,1,0): (1,0,0,0), (1,1,0,0), (1,0,0,1), (1,1,1,0), (1,1,0,1),  (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (1,0,0,1): (1,0,0,0), (1,1,0,0), (1,0,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (1,1,1,0): (1,0,0,0), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,0,1), (1,0,1,1), (1,1,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (1,1,0,1): (1,0,0,0), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,0,1,1), (1,1,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (1,0,1,1): (1,0,0,0), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,1,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (1,1,1,1): (1,0,0,0), (1,1,0,0), (1,0,1,0), (1,0,0,1), (1,1,1,0), (1,1,0,1), (1,0,1,1), (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1), (2,1,1,1).

Transitions from (2,0,0,0): (2,1,0,0), (2,0,1,0), (2,0,0,1).
Transitions from (2,1,0,0): (2,0,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (2,0,1,0): (2,0,0,0), (2,1,0,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (2,0,0,1): (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,1,1,0), (2,1,0,1), (2,0,1,1).
Transitions from (2,1,1,0): (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,0,1), (2,0,1,1), (2,1,1,1).
Transitions from (2,1,0,1): (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,0,1,1), (2,1,1,1).
Transitions from (2,0,1,1): (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,1,1,1).
Transitions from (2,1,1,1): (2,0,0,0), (2,1,0,0), (2,0,1,0), (2,0,0,1), (2,1,1,0), (2,1,0,1), (2,0,1,1).
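
To make the triple and home run examples mentioned earlier concrete, here is a small Python sketch of those two events acting on a state. The function names home_run and triple are hypothetical, and the run count they return is an extra piece of bookkeeping that is not part of the 4-tuple state itself.

```python
def home_run(state):
    """Apply a home run to (outs, s1, s2, s3); return (new_state, runs)."""
    outs, s1, s2, s3 = state
    # Every runner and the batter score, and the bases are left empty.
    return (outs, 0, 0, 0), s1 + s2 + s3 + 1


def triple(state):
    """Apply a triple to (outs, s1, s2, s3); return (new_state, runs)."""
    outs, s1, s2, s3 = state
    # Every runner scores, and the batter ends up on 3rd base.
    return (outs, 0, 0, 1), s1 + s2 + s3


print(home_run((0, 1, 1, 1)))  # grand slam
print(home_run((0, 0, 0, 0)))  # a self-transition
print(triple((0, 0, 0, 0)))
```

The second call shows why the home run with empty bases is a self-transition: the state does not change, only the score does.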

The graph whose vertices are the 24 states, and whose edges are determined by the above neighborhoods, has 182 edges. (Here two states are joined by a single edge if a transition is listed in either direction.) The degree of each state is as follows:

(0, 0, 0, 0), 8
(0, 0, 0, 1), 11
(0, 0, 1, 0), 11
(0, 0, 1, 1), 17
(0, 1, 0, 0), 11
(0, 1, 0, 1), 17
(0, 1, 1, 0), 17
(0, 1, 1, 1), 20
(1, 0, 0, 0), 9
(1, 0, 0, 1), 18
(1, 0, 1, 0), 18
(1, 0, 1, 1), 18
(1, 1, 0, 0), 18
(1, 1, 0, 1), 18
(1, 1, 1, 0), 18
(1, 1, 1, 1), 15
(2, 0, 0, 0), 22
(2, 0, 0, 1), 18
(2, 0, 1, 0), 18
(2, 0, 1, 1), 12
(2, 1, 0, 0), 18
(2, 1, 0, 1), 12
(2, 1, 1, 0), 12
(2, 1, 1, 1), 8

Note that the state of maximal degree (namely 22) is (2,0,0,0).
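
Here is a minimal Python sketch of how such degrees can be computed from the neighborhoods. It assumes the graph is undirected and simple, that is, a transition listed in either direction gives one edge. To keep it short, only the eight two-out states are entered, so the degrees it reports are for that subgraph and not for the full 24-state graph. The names transitions, undirected_edges and degrees are illustrative.

```python
# Neighborhoods of the two-out states, copied from the list above.
transitions = {
    (2, 0, 0, 0): [(2, 1, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1)],
    (2, 1, 0, 0): [(2, 0, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1),
                   (2, 1, 1, 0), (2, 1, 0, 1), (2, 0, 1, 1)],
    (2, 0, 1, 0): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 0, 1),
                   (2, 1, 1, 0), (2, 1, 0, 1), (2, 0, 1, 1)],
    (2, 0, 0, 1): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0),
                   (2, 1, 1, 0), (2, 1, 0, 1), (2, 0, 1, 1)],
    (2, 1, 1, 0): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1),
                   (2, 1, 0, 1), (2, 0, 1, 1), (2, 1, 1, 1)],
    (2, 1, 0, 1): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1),
                   (2, 1, 1, 0), (2, 0, 1, 1), (2, 1, 1, 1)],
    (2, 0, 1, 1): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1),
                   (2, 1, 1, 0), (2, 1, 0, 1), (2, 1, 1, 1)],
    (2, 1, 1, 1): [(2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0), (2, 0, 0, 1),
                   (2, 1, 1, 0), (2, 1, 0, 1), (2, 0, 1, 1)],
}


def undirected_edges(transitions):
    """Collapse the directed transitions into a set of unordered pairs."""
    return {frozenset((u, v))
            for u, neighbors in transitions.items()
            for v in neighbors
            if u != v}  # self-transitions (loops) are ignored


def degrees(transitions):
    """Count, for each state, the undirected edges that touch it."""
    deg = {u: 0 for u in transitions}
    for edge in undirected_edges(transitions):
        for u in edge:
            deg[u] = deg.get(u, 0) + 1
    return deg


print(len(undirected_edges(transitions)))
for state, d in sorted(degrees(transitions).items()):
    print(state, d)
```

On this two-out subset every pair of states turns out to be joined, so each state has degree 7 here. The larger degrees in the table come from also counting the neighbors with 0 and 1 outs; entering all 24 neighborhoods into the same dictionary would let you compare against the full table.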

Another mathematician visits the ballpark – WHIP

This is the second in the series of blog posts inspired by the 2004 Ken Ross book entitled A Mathematician at the Ballpark. The first one is here. In this post, again, we illustrate all these notions using the Baltimore Orioles’ 2022 season.

For an experienced baseball fan, baseball is a game of patterns. We “know” what a well-executed pitch looks like, how a double play is to be executed, how a pop-up is to be fielded, and so on. Because of these expected patterns, we know the plays which should emerge and so the desire to track their occurrences should come as no surprise. It’s been done since professional baseball started in the 1870s.

This week we discuss a pitching statistic you see on televised games, WHIP. WHIP is short for “walks plus hits allowed per inning pitched”.

Earned run average

We start off with the most basic pitching statistic, the Earned Run Average or ERA. This is the number of earned runs per 9 innings pitched:

ERA = 9·ER/IP,

where

  • IP is the number of Innings Pitched, and
  • ER is the number of Earned Runs allowed by the pitcher. That is, it counts the number of runs enabled by the offensive team’s production in the face of competent play from the defensive team.

It is possible to have ERA = ∞: innings are measured by the number of outs recorded, so a pitcher who allows an earned run without getting any batter out has IP = 0. The lower the ERA, the better the pitcher. In the 2022 season, right-handed Félix Bautista, who entered late innings as either a closer or a reliever, had an ERA of 2.19. Left-handed closer Cionel Pérez had an ERA of 1.40.
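
Here is a short Python sketch of the ERA formula, written in terms of outs recorded so that the IP = 0 case is handled explicitly. The function name era and the choice to pass outs rather than innings are my own assumptions, and the numbers in the example are made up rather than taken from any pitcher’s record.

```python
import math


def era(earned_runs, outs_recorded):
    """Earned run average, 9 * ER / IP, with IP = outs_recorded / 3."""
    if outs_recorded == 0:
        # No outs recorded: infinite if any earned run was allowed,
        # and undefined (NaN) if none was.
        return math.inf if earned_runs > 0 else math.nan
    innings_pitched = outs_recorded / 3
    return 9 * earned_runs / innings_pitched


# Made-up example: 2 earned runs over 27 outs (9 full innings).
print(era(2, 27))
# Made-up example: 1 earned run allowed without recording an out.
print(era(1, 0))
```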

WHIP

We define walks plus hits allowed per inning pitched by:

WHIP = (BB+H)/IP,

where (as in the previous post) BB is the number of walks and H stands for the number of Hits allowed by the pitcher (so, for example, reaching base due to a fielding error doesn’t count). WHIP reflects a pitcher’s propensity for allowing batters to reach base, so a lower WHIP indicates better performance.
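
The WHIP formula can be sketched in Python in the same way. As before, the function name whip and the use of outs recorded are assumptions for illustration, and the example numbers are made up.

```python
def whip(walks, hits, outs_recorded):
    """Walks plus hits per inning pitched, with IP = outs_recorded / 3."""
    if outs_recorded == 0:
        raise ValueError("WHIP is undefined when no outs have been recorded")
    innings_pitched = outs_recorded / 3
    return (walks + hits) / innings_pitched


# Made-up example: 3 walks and 6 hits over 27 outs (9 full innings).
print(whip(3, 6, 27))
```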

When we plot the ERA vs the WHIP for the top 20 Orioles pitchers in 2022, we get the following graph:

Again, the line shown is the line that best fits the data. As the line of best fit doesn’t fit the data too well, this tells us that these two statistical measurements aren’t too well-correlated. In other words, low ERA indicates a good pitcher and low WHIP indicates a good pitcher, but the values for “average” pitchers seem less related to each other.
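
For readers who want to quantify “how well the line fits”, here is a minimal Python sketch that computes the least-squares line and the Pearson correlation coefficient for a list of points. The function name best_fit_and_correlation is hypothetical, and the sample points are artificial (they lie exactly on a line); they are not the Orioles data behind the plot.

```python
from math import sqrt


def best_fit_and_correlation(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Artificial points on the line y = 2x + 1, so the fit is perfect.
slope, intercept, r = best_fit_and_correlation([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r)
```

A value of r close to 1 or −1 means the points hug the line, while a value near 0 means the two statistics are only weakly related.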

Another mathematician visits the ballpark – OPS

Yes, I more-or-less stole the above title from the 2004 Ken Ross book entitled A Mathematician at the Ballpark. As with that book, anyone familiar with middle-school (or junior high school) math should have no problem with most of what we do here. However, I will try to go into baseball in more detail than the book did.

Paraphrasing slightly, I read somewhere the following facetious remark:

From a survey of 1000 random baseball fans across the nation, 183% of them hate math.

If you are one of these 183%, then this series could be for you. Hopefully, even if you aren’t a baseball expert but would like to learn some baseball statistics (now often called “sabermetrics”), these posts will help. I’m no expert myself, so we’ll learn together.

In this series of blog posts, each post will introduce a particular metric in baseball statistics as well as some of the math and baseball behind it. We illustrate all these notions using the Baltimore Orioles’ 2022 season.

This week we look at one of the most popular statistics you see on televised games: OPS or “On-base Plus Slugging,” which is short for on-base percentage plus slugging percentage. Don’t worry, we’ll explain all these terms as we go. 

On-base percentage

First, On-Base Percentage or OBP is a more recent version of On-Base Average or OBA (the same as OBP but the SF term is omitted). We define 

OBP = (H + BB + HBP)/(AB + BB + HBP + SF), 

where 

  • H is the number of Hits (the times the batter reaches base because of a batted, fair ball without error by the defense), 
  • BB is the number of Base-on-Balls (or walks), where a batter receives four pitches that the umpire calls balls, and is in turn awarded first base,
  • HBP, or Hit By Pitch, counts the times this hitter is touched by a pitch and awarded first base as a result, and 
  • SF is the number of Sacrifice Flies and AB the number of At-Bats, which are more complicated to carefully define.

The official scorer keeps track of all these numbers, and more, as the baseball game is played. We still have to define the expressions AB and SF.

First, SF, or Sacrifice Flies, counts the number of fly balls hit to the outfield for which either of the following is true:

  • the fly is caught for an out, and a baserunner scores after the catch (so there must be at most one out at the time),
  • the fly is dropped, and a runner scores, if in the scorer’s judgment the runner could have scored after the catch had the fly ball been caught.

A sacrifice fly is only credited if a runner scores on the play. (By the way, this is a “recent” statistic, as sacrifice flies weren’t tabulated before 1954. Between 1876, when Major League Baseball’s National League was born, and 1954, baseball analysts used the OBA instead.)

Second, AB, or At-Bats, counts those plate appearances that are not one of the following:

  • a walk,
  • being hit by a pitch,
  • a sacrifice bunt (or Sacrifice Hit, SH),
  • a sacrifice fly,
  • interference (the catcher hitting the bat with his glove, for example), or
  • an obstruction (by the first baseman blocking the base path, for example).

Incidentally, the self-explanatory number Plate Appearances, or PA, can differ from AB by 10% or more, mostly due to the number of walks that a batter can draw.

The main terms in the OBP expression are H and AB. So we naturally expect OBP to be approximately equal to the Batting Average, defined by

BA = H/AB.

For example, if we take the top 18 Orioles players and plot the BA vs the OBP, we get the following graph:

The line shown above is simply the line of best fit to visually indicate the correlation.

Example: Let’s look at the Orioles’ All-Star center fielder, Cedric Mullins, who had 672 plate appearances and 608 at bats, for a difference of 672 − 608 = 64. He had 1 bunt, 5 sacrifice flies, he was hit by a pitch 9 times, and walked 47 times. These add up to 62, so (using the above definition of AB) the number of times he was awarded 1st base due to interference or obstruction was 64 − 62 = 2.
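
The bookkeeping in this example can be checked with a few lines of Python. The variable names are my own; the numbers are the ones quoted above.

```python
# Mullins' 2022 totals, as quoted in the example above.
plate_appearances = 672
at_bats = 608
walks, hit_by_pitch, bunts, sacrifice_flies = 47, 9, 1, 5

# Plate appearances that do not count as at-bats.
non_at_bats = plate_appearances - at_bats

# The part of that difference explained by the four listed categories.
accounted_for = walks + hit_by_pitch + bunts + sacrifice_flies

# Whatever is left must come from interference or obstruction.
interference_or_obstruction = non_at_bats - accounted_for

print(non_at_bats, accounted_for, interference_or_obstruction)
```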

Mullins’ H = 157 hits break down into 105 singles, 32 doubles, 4 triples, and 16 home runs.

Next, let’s add to these his 126 strikeouts, for a total of 157+126+64 = 347.

The remaining 672 − 347 = 325 plate appearances were balls hit by Mullins that were either caught on the fly by a fielder or landed fair with Mullins thrown out at a base.

These account for all of Mullins’ plate appearances. Mullins has a batting average of BA = 157/608 = 0.258 and an on-base percentage of OBP = 0.318.
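
As a sanity check on these two numbers, here is a small Python sketch of the BA and OBP formulas applied to the totals quoted in this example (H = 157, AB = 608, BB = 47, HBP = 9, SF = 5). The function names are illustrative.

```python
def batting_average(h, ab):
    """BA = H / AB."""
    return h / ab


def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


ba = batting_average(157, 608)
obp = on_base_percentage(h=157, bb=47, hbp=9, ab=608, sf=5)
print(f"BA = {ba:.3f}, OBP = {obp:.3f}")  # BA = 0.258, OBP = 0.318
```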

Slugging percentage

The slugging percentage, SLG (for SLuGging), is the total bases achieved on hits divided by at-bats:

SLG = TB/AB.

Here, TB or Total Bases, is the weighted sum

TB = 1B + 2*2B + 3*3B + 4*HR,

where

  • 1B is the number of “singles” (hits where the batter makes it to 1st Base),
  • 2B is the number of doubles,
  • 3B is the number of triples, and
  • HR denotes the number of Home Runs.

On-base Plus Slugging

With all these definitions under our belt, we are finally ready to compute “on-base plus slugging”, that is, the on-base percentage plus slugging percentage:

OPS = OBP + SLG.

Example: Again, let’s consider Cedric Mullins. He had 1B = 105 singles, 2B = 32 doubles, 3B = 4 triples, and HR = 16 home runs, so his TB = 105+64+12+64 = 245. Therefore, his SLG = 245/608 = 0.403, so his on-base plus slugging is OPS = OBP + SLG = 0.318 + 0.403 = 0.721.
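
The same computation can be sketched in Python. The function names total_bases and slugging are my own, and the inputs are the totals quoted in the examples of this post (including BB = 47, HBP = 9 and SF = 5 for the OBP part).

```python
def total_bases(singles, doubles, triples, home_runs):
    """TB = 1B + 2*2B + 3*3B + 4*HR."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(tb, ab):
    """SLG = TB / AB."""
    return tb / ab


ab = 608
tb = total_bases(singles=105, doubles=32, triples=4, home_runs=16)
slg = slugging(tb, ab)

# OBP = (H + BB + HBP) / (AB + BB + HBP + SF), with H = 157.
obp = (157 + 47 + 9) / (ab + 47 + 9 + 5)

ops = obp + slg
print(tb, f"{slg:.3f}", f"{ops:.3f}")  # 245 0.403 0.721
```

Note that adding the unrounded OBP and SLG gives the same three-decimal OPS as adding the rounded values.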

This finishes our discussion of OPS. I hope this helps explain it better. For more, see the OPS page at the MLB site or the Wikipedia page for OPS.