The Beginner’s Guide To Deriving wOBA
We feature many statistics on FanGraphs, but one of the most fundamental is Weighted On-Base Average (wOBA). If you’re not familiar with the merits of wOBA in general, I invite you to head over to our full library page on it or to learn about why it’s a gateway sabermetric statistic. For our purposes, I’ll simply include the summary:
wOBA is designed to weigh the different offensive results by their actual average contribution to run scoring. Batting average treats all hits equally and ignores walks. OBP treats all times on base equally. Slugging percentage weighs hits based on the number of bases achieved but ignores walks. Adding OBP and SLG is better than any one of AVG/OBP/SLG, but it still isn’t quite right.
Hitting a single and drawing a walk are both positive outcomes, but they have a different impact on the inning. A walk always moves each runner up one base while a single could have a variety of outcomes depending on who is on base and where the ball is hit. We want a statistic that captures that nuance.
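Written out with the 2015 weights from the table further down and the plate-appearance denominator described in the Scaling section, the wOBA formula looks roughly like this. Here uBB is my shorthand for unintentional walks (BB minus IBB), and the weights change a little every season:

```latex
\[
\text{wOBA} =
\frac{0.687\,\text{uBB} + 0.718\,\text{HBP} + 0.881\,\text{1B}
      + 1.256\,\text{2B} + 1.594\,\text{3B} + 2.065\,\text{HR}}
     {\text{AB} + \text{BB} - \text{IBB} + \text{SF} + \text{HBP}}
\]
```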
Granted, wOBA doesn’t adjust for park, league, quality of competition, or a number of other factors, but it’s a good starting point on which to build. So how do we take the beautiful chaos of baseball and create the formula listed above?
The exact numbers are going to change each year based on the run environment (how many runs are being scored league-wide), but they are consistent enough that we won’t have any problems understanding each other. I’m going to use the 2015 data, but you can view every year here. Allow me to bring a chunk of that table into this post for convenience:
Season | wOBA | wOBAScale | wBB | wHBP | w1B | w2B | w3B | wHR | runSB | runCS | R/PA | R/W | cFIP |
2015 | .313 | 1.251 | .687 | .718 | .881 | 1.256 | 1.594 | 2.065 | .200 | -.392 | .113 | 9.421 | 3.134 |
2014 | .310 | 1.304 | .689 | .722 | .892 | 1.283 | 1.635 | 2.135 | .200 | -.377 | .108 | 9.117 | 3.132 |
2013 | .314 | 1.277 | .690 | .722 | .888 | 1.271 | 1.616 | 2.101 | .200 | -.384 | .110 | 9.264 | 3.048 |
2012 | .315 | 1.245 | .691 | .722 | .884 | 1.257 | 1.593 | 2.058 | .200 | -.398 | .114 | 9.544 | 3.095 |
2011 | .316 | 1.264 | .694 | .726 | .890 | 1.270 | 1.611 | 2.086 | .200 | -.394 | .112 | 9.454 | 3.025 |
2010 | .321 | 1.251 | .701 | .732 | .895 | 1.270 | 1.608 | 2.072 | .200 | -.403 | .115 | 9.643 | 3.079 |
You can ignore the last four columns for the purposes of wOBA, but this is a truncated version of our Guts! page and shows us each year’s league wOBA, the wOBA scale, and the weights for each of our six offensive events of interest. Hopefully those numbers will look similar to the wOBA equation you saw earlier.
Our ultimate goal is to create a statistic that measures each offensive action’s context-neutral contribution to run scoring, because scoring runs is the currency of baseball. We have decided that we want to measure walks, HBP, singles, doubles, triples, and home runs. If you wanted to, you could build wOBA with more nuanced stats like fly ball outs, ground outs, strikeouts, etc.; it would just get more complicated without much added value.
We have a specific goal and the set of offensive actions we want to measure, but now we need a method of putting them together.
Run Expectancy
The first thing we need is a run expectancy matrix. If you need a complete introduction to the concept, head over to this page. In general, run expectancy measures the average number of runs scored (through the end of the current inning) given the current base-out state.
Base-out states are a record of the number of outs (0, 1, or 2) and how many runners are on base and where (no one on, man on 1B, men on 1B and 3B, etc). There are three out-states and eight base-states, meaning that there are 24 base-out states. Each plate appearance has a base-out state.
Let’s use one out, man on first as our example. In order to calculate the run expectancy for that base-out state, we need to find all instances of that base-out state from the entire season (or set of seasons) and find the total number of runs scored from the time each instance occurred until the end of the inning in which it occurred. Then we divide by the total number of instances to get the average. If you do the math using 2010-2015, you get 0.509 runs. In other words, if all you knew about the situation was that there was one out and a man on first, you would expect 0.509 runs to score between that moment and the end of the inning on average.
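To make the averaging concrete, here is a minimal Python sketch of that calculation. The data layout (a list of half-innings, each with its final run total and the runs already scored before each plate appearance) and the tiny two-inning sample are my own assumptions for illustration, not the real play-by-play format or real data:

```python
from collections import defaultdict

def run_expectancy(half_innings):
    """Average runs scored from each base-out state to the end of the inning."""
    totals = defaultdict(float)  # runs scored after the state occurred
    counts = defaultdict(int)    # number of times the state occurred
    for inning in half_innings:
        for state, runs_before in inning["plate_appearances"]:
            # Runs from this moment on = inning total minus runs already in.
            totals[state] += inning["runs"] - runs_before
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

# Made-up sample: state is (runners, outs); "1__" means a man on first only.
toy_half_innings = [
    {"runs": 1, "plate_appearances": [
        (("___", 0), 0), (("___", 1), 0), (("1__", 1), 0),
        (("_2_", 1), 1), (("_2_", 2), 1)]},
    {"runs": 0, "plate_appearances": [
        (("___", 0), 0), (("___", 1), 0), (("1__", 1), 0), (("1__", 2), 0)]},
]

toy_re = run_expectancy(toy_half_innings)
print(toy_re[("1__", 1)])  # one out, man on first in this toy sample
```

With a full season of real play-by-play instead of two invented innings, the same averaging is what produces a figure like 0.509 for one out, man on first.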
You repeat the process for the other 23 base-out states and wind up with a table like this:
Runners | 0 outs | 1 out | 2 outs |
__ __ __ | 0.481 | 0.254 | 0.098 |
1B __ __ | 0.859 | 0.509 | 0.224 |
__ 2B __ | 1.100 | 0.664 | 0.319 |
1B 2B __ | 1.437 | 0.884 | 0.429 |
__ __ 3B | 1.350 | 0.950 | 0.353 |
1B __ 3B | 1.784 | 1.130 | 0.478 |
__ 2B 3B | 1.964 | 1.376 | 0.580 |
1B 2B 3B | 2.292 | 1.541 | 0.752 |
The table listed here was calculated by Tom Tango using 2010-2015 data for the entire league and serves as a good baseline. At FanGraphs, we park adjust the matrix for each game, so the exact numbers might be a touch different if you’re trying to play along at home in excruciating detail.
Linear Weights
Now that you have a run expectancy matrix, you need to learn how to use it. Each plate appearance moves you from one base-out state to another. So if you walk with a man on first base and one out, you move to the “men on first and second and one out” box. That box has an RE value of 0.884. Because your plate appearance moved you from 0.509 to 0.884, that PA was worth +0.375 in terms of run expectancy.
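Here is a small Python sketch of that lookup using the 2010-2015 matrix from the table above. The function names are mine, and I'm assuming two conventions the text doesn't spell out: any runs that score on the play are added to the change, and a third out leaves an expectancy of zero because the inning is over:

```python
# Run expectancy by runners, indexed by outs (0, 1, 2), from the table above.
RE = {
    "___": (0.481, 0.254, 0.098),
    "1__": (0.859, 0.509, 0.224),
    "_2_": (1.100, 0.664, 0.319),
    "12_": (1.437, 0.884, 0.429),
    "__3": (1.350, 0.950, 0.353),
    "1_3": (1.784, 1.130, 0.478),
    "_23": (1.964, 1.376, 0.580),
    "123": (2.292, 1.541, 0.752),
}

def re_value(runners, outs):
    """Look up run expectancy; after three outs nothing more can score."""
    return 0.0 if outs == 3 else RE[runners][outs]

def re_change(before, after, runs_scored=0):
    """Run value of one plate appearance: end state minus start state,
    plus any runs that crossed the plate on the play (assumed convention)."""
    return re_value(*after) - re_value(*before) + runs_scored

# The walk from the example: man on first, one out -> first and second, one out.
walk = re_change(("1__", 1), ("12_", 1))
print(round(walk, 3))  # 0.375
```

The same function handles negative plays: a strikeout with the bases empty and nobody out is `re_change(("___", 0), ("___", 1))`, which comes out below zero.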
Every plate appearance has one of these values, either positive or negative. You can learn more about this by following the earlier link.
What we want to determine is the average run value of a walk, HBP, single, double, triple, and home run. To do this, we take the total RE value of all walks (unintentional in this case), for example, and divide that number by the number of walks in that season. You’re going to wind up getting something around 0.3. You repeat this for the other five actions. This gives you the runs above average produced by each of these kinds of events.
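As a rough sketch of that averaging step in Python: given one (event, RE change) pair per plate appearance, group by event and take the mean. The handful of plays below is a toy sample I built from the matrix above, not league data, so the averages only illustrate the mechanics:

```python
from collections import defaultdict

def linear_weights(plays):
    """Average RE change per event type, i.e. runs above average per event."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event, delta in plays:
        sums[event] += delta
        counts[event] += 1
    return {event: sums[event] / counts[event] for event in sums}

# Toy sample; each delta is an RE change read off the 2010-2015 matrix.
toy_plays = [
    ("BB", 0.375),    # walk with a man on first, one out
    ("BB", 0.378),    # walk with the bases empty, no outs
    ("1B", 0.378),    # single with the bases empty, no outs
    ("OUT", -0.227),  # out with the bases empty, no outs
    ("OUT", -0.285),  # out with a man on first, one out (runner holds)
]

toy_weights = linear_weights(toy_plays)
for event, value in toy_weights.items():
    print(event, round(value, 3))
```

Run over every plate appearance in a season, this is the calculation that yields something around 0.3 for an unintentional walk.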
In theory, we could essentially be done right now because we have everything we need to build a statistic that will weigh the offensive actions properly. However, the inventors of wOBA decided that it would probably be best to scale it to something familiar to make it easier to understand. And they picked OBP.
Scaling
We have the runs above average for walks (0.29), HBP (0.31), singles (0.44), doubles (0.74), triples (1.01), and home runs (1.39), but what we want to do now is put wOBA on a scale that will look like OBP. In OBP, an out is worth zero, so the first thing we want to do is adjust the run value scale so that an out is equal to zero.
There is an easy way to do this. First, we need to find the linear weight for all outs using the same method we used to find the value for the other events. We’ll call it -0.26 for 2015. This means that an out is worth 0.26 runs less than the average PA when it comes to run expectancy. What we want to do now is add 0.26 to each of our run values so that outs are equal to zero. So for walks, which we said are worth 0.29 runs above average, we bump those up to 0.55 runs relative to an out. Using linear weights, walks are worth 0.55 runs more than outs. We repeat this for each of the five other positive offensive outcomes.
Event | Run Value |
BB | 0.55 |
HBP | 0.57 |
1B | 0.70 |
2B | 1.00 |
3B | 1.27 |
HR | 1.65 |
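The shift that produces this table is a single subtraction per event. A quick Python sketch using the rounded 2015 values quoted above (the variable names are mine):

```python
# Runs above average per event, as quoted in the text (rounded, 2015).
RUNS_ABOVE_AVERAGE = {
    "BB": 0.29, "HBP": 0.31, "1B": 0.44,
    "2B": 0.74, "3B": 1.01, "HR": 1.39,
}
OUT_VALUE = -0.26  # run value of an out relative to the average PA

# Subtracting the out value (i.e. adding 0.26) re-centers the scale on outs.
relative_to_out = {
    event: round(value - OUT_VALUE, 2)
    for event, value in RUNS_ABOVE_AVERAGE.items()
}

for event, value in relative_to_out.items():
    print(f"{event}: {value:.2f}")
```

An out itself becomes -0.26 - (-0.26) = 0, which is exactly the property we borrowed from OBP.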
As you’ll notice, these are not the weights you saw in the wOBA equation. We’re not done scaling them yet. We know that we want BB, HBP, 1B, 2B, 3B, and HR in the numerator of the wOBA formula and plate appearances (minus weird stuff like sac bunts) in the denominator, so what we’re going to do is calculate “wOBA” for the entire league using the linear weights in this table and the total number of events of each type.
In other words, we’re going to multiply 0.55 times the number of walks in MLB in 2015 and add that to 0.57 times the number of HBP and so on, and then divide the entire sum by the number of plate appearances (really AB + BB – IBB + SF + HBP). If we do that, we wind up with 0.250.
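In Python, that league-wide calculation might look like the sketch below. The totals are invented round numbers, not the real 2015 league counts, so the result will not be exactly 0.250; I'm also using unintentional walks (BB minus IBB) in the numerator, in line with the earlier note that the walk weight is for unintentional walks:

```python
# Out-relative run values from the table above.
RELATIVE_TO_OUT = {
    "uBB": 0.55, "HBP": 0.57, "1B": 0.70,
    "2B": 1.00, "3B": 1.27, "HR": 1.65,
}

def unscaled_woba(totals, weights=RELATIVE_TO_OUT):
    """Weighted events divided by AB + BB - IBB + SF + HBP."""
    ubb = totals["BB"] - totals["IBB"]  # unintentional walks only
    numerator = weights["uBB"] * ubb + sum(
        weights[event] * totals[event]
        for event in ("HBP", "1B", "2B", "3B", "HR")
    )
    denominator = (totals["AB"] + totals["BB"] - totals["IBB"]
                   + totals["SF"] + totals["HBP"])
    return numerator / denominator

# Hypothetical totals for illustration only.
toy_totals = {"AB": 900, "BB": 90, "IBB": 10, "SF": 5, "HBP": 5,
              "1B": 150, "2B": 45, "3B": 5, "HR": 25}

toy_league = unscaled_woba(toy_totals)
print(round(toy_league, 3))
```

Feeding in the actual league totals for a season in place of `toy_totals` gives the figure that the next step scales to match OBP.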
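But remember that we want wOBA to look like OBP. So we need to scale the entire thing so that the league’s wOBA is .313 (to match OBP with IBB removed). To do that, we divide .313 by the league figure we just calculated and get 1.251, which we call the wOBA Scale. (Dividing by the rounded .250 gives about 1.252; the published 1.251 presumably reflects unrounded inputs.)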
We take the wOBA Scale and multiply it by the linear weights from the table above and voilà, we have ourselves the weights listed in the wOBA equation. And we’re done!
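As a sanity check, here is that last multiplication sketched in Python. Because the run values in this post are rounded to two decimals, the products land within a few thousandths of the published 2015 weights rather than matching them exactly:

```python
WOBA_SCALE = 1.251  # 2015 wOBA Scale from the Guts! table

# Out-relative run values from the Scaling section (rounded).
RELATIVE_TO_OUT = {
    "BB": 0.55, "HBP": 0.57, "1B": 0.70,
    "2B": 1.00, "3B": 1.27, "HR": 1.65,
}

# Published 2015 weights (wBB through wHR) for comparison.
PUBLISHED_2015 = {
    "BB": 0.687, "HBP": 0.718, "1B": 0.881,
    "2B": 1.256, "3B": 1.594, "HR": 2.065,
}

scaled = {event: value * WOBA_SCALE for event, value in RELATIVE_TO_OUT.items()}

for event in scaled:
    gap = scaled[event] - PUBLISHED_2015[event]
    print(f"{event}: {scaled[event]:.3f} vs {PUBLISHED_2015[event]:.3f} "
          f"(gap {gap:+.3f})")
```

The small gaps come from rounding in the inputs, not from a different method.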
Concluding Thoughts
It’s important to remember that wOBA is one implementation of a linear-weights-based offensive metric. Baseball Prospectus has its own version, True Average, which is built on the same pillars but implemented differently. Choosing to scale it to OBP is an aesthetic choice. We could scale it to batting average or to nothing. The important part is just that we understand the scale we’re using.
The main idea is that we’re giving each type of outcome a value based on the average change in run expectancy that particular outcome yields. The goal is to give the right amount of credit to each kind of event. Doing so does not make wOBA a perfect statistic; it simply makes it a better one than the traditional AVG/OBP/SLG.
There are lots of little nuances you can add to something like wOBA to get it closer and closer to the truth. All we’re doing here is creating the foundation for all of that work.
Neil Weinberg is the Site Educator at FanGraphs and can be found writing enthusiastically about the Detroit Tigers at New English D. Follow and interact with him on Twitter @NeilWeinberg44.
Hey,
I am new to advanced stats. Is someone able to explain to me why Intentional Walks are not included in the calculation for wOBA? Unless I am not understanding… it looks like IBB is isolated from the calculation?
Thanks
IBB are generally worth much less than uBB using this method because IBB occur when walks are least damaging to the pitcher (first base open). wOBA treats them as if they never happened at all, but when we use wOBA to build wRAA/Batting Runs/WAR, we are multiplying wOBA*PA (essentially) and are giving them credit for their average wOBA for each IBB (this has the effect of giving more credit to good hitters who get IBB than bad hitters who get IBB). You could create a wOBA that includes IBB, but they would be much less valuable than a normal walk.
I have a question about the paragraph below in your article, which I really enjoyed reading. In the beginning of the paragraph, you state that you’re determining “the average run value” of six different offensive events. Then, you focus on how to calculate the average RE value of a walk. [Are “run value” and “RE value” the same thing?] But at the paragraph’s end, you state that the results are “runs above average produced by each of these kinds of events.” What is the difference between the “average run value” and “runs above average produced” by each event? Further, why is the final result called “runs above average produced” by each event? What is the average it is above?
“What we want to determine is the average run value of a walk, HBP, single, double, triple, and home run. To do this, we take the total RE value of all walks (unintentional in this case), for example, and divide that number by the number of walks in that season. You’re going to wind up getting something around 0.3. You repeat this for the other five actions. This gives you the runs above average produced by each of these kinds of events.”
Apologies for the confusion. In that paragraph, they are all the same thing. Each PA has an RE value, and you want to determine the average RE/PA for all singles (for example). RE is calculated as runs relative to the average plate appearance, so RE/PA is centered on zero as average. The rest of the post then goes through how to adjust those numbers to make the scale work.
1. How is “the average plate appearance” for all singles determined?
2. It’s interesting that “RE/PA is centered on zero as average.” For example, in 2015 there were 183,627 plate appearances and 20,647 runs scored. Thus, the average number of runs per plate appearance that year was 0.1124. In calculating the average RE/PA that year for each offensive event, was 0.1124 subtracted from each? Or was it done some other way to center on zero?
1. You take the actual RE change for every single in the league (i.e. the difference between the RE at the beginning of the PA and the end of the PA) and divide it by the number of singles for that year.
2. Think of it this way. When you start an inning you start at 0 outs, 0 men on base, right? That has a value of about .47 runs. If you do the math, this is the average number of runs scored per inning in MLB. This makes sense because every inning starts in that state, so any time a team scores more than that they have an above average inning and any time they score 0 it is a below average inning. Because you go into an inning starting at expected runs of 0.47, everything comes out to runs above or below average.