One of the hallmark statistics available at FanGraphs is Wins Above Replacement (WAR), and we’ve just rolled out an updated Library entry that spells out the precise calculations for pitchers in more detail than ever before. There’s always been a clear sense of the kinds of things that go into our WAR calculation, but until this point finding the specific formula for pitcher WAR has been a little complicated.
As of today, we’ve resolved that and I encourage you to go check out our basic primer on WAR and our detailed breakdown of how we calculate it for pitchers. If you’re a hands-on learner, grab a pen and paper or spreadsheet and follow along. I’m going to walk you through a complete example of how to calculate WAR for pitchers. Let’s use the 2016 version of Marcus Stroman as our exemplar. Please note that I will be rounding off certain numbers in the example to keep the page as neat as possible, so if you wind up being off by 0.1 WAR or so, don’t sweat it.
See also: Position Player Formula | Position Player Example
Baseball statistics are designed to answer questions. Some questions are simple, such as “who reached base most often in 2016?” while others are more complicated, like “who was the best base runner in the American League?” Statistics allow us to gather up data points from individual events and summarize them in ways that are easy to understand.
Different statistics answer different questions and therefore have different uses. You can’t figure out who the best hitter is solely by looking at his batting average. Batting average tells you something, but batting average itself answers a very specific and limited question. Over the years, we’ve attempted to expand the statistics we use to better capture the game we love. Instead of batting average, we moved to OBP, then OPS, then wOBA, and so on. The key is to decide on your question and then find which available statistic(s) best answers that question.
One issue that comes up regularly is whether a statistic is predictive of the future or merely descriptive. You’ve likely heard that FIP is a better predictor of future ERA than current ERA. For this reason, many people believe that FIP was designed as a predictive statistic. As I discuss here, that is not accurate, but the perception persists because FIP is useful for prediction.
One fundamental aspect of baseball is how well you can describe every situation. With modern technology, this is becoming true of every sport, but the ability to categorize baseball stretches far back into history. We don’t have batted-ball type from 1951, but we know who was batting and pitching, the number of outs and base runners, the inning, the score, the park, and loads of other information for every plate appearance. The data gets better as we approach the present day, but organizing the game based on a collection of variables has a long history.
Want to know how well a batter hit against left-handed pitchers at home in 2004? That’s a question we can answer using something commonly known as “splits.” Some splits, such as handedness splits, are particularly meaningful. Others, such as day of the week splits, are entirely trivial. The compartmentalized nature of baseball, in which individual events occur in an orderly fashion, allows us to record tons of data and then sort it later however we choose.
This post covers basic information you will want to know as you dive into this realm of baseball data. If you’re an advanced consumer of baseball statistics, you probably won’t find a ton of information that’s new to you.
Earlier this year, FanGraphs began carrying shift data compiled by our friends at Baseball Info Solutions. With shifting on the rise every year, this kind of information has become more and more vital to fans and analysts. This post walks you through how to access and use the data available on FanGraphs.
First, let’s start by describing what data is available. We have pitcher and hitter shift data going back to 2010 that is viewable on the leaderboards (for players, teams, and league) and player splits pages. The data exists for balls in play only (non-home run batted balls), so if a team shifts mid-plate appearance we only have the alignment for the final pitch. This also means we don’t have data for walks, hit batters, strikeouts, and home runs. If you want to know about how well a team deploys the shift while on defense, you want to look at pitchers. If you want to know about which hitters get shifted against, you want to look at hitters.
If you have at least a passing familiarity with sabermetrics, you’ve probably heard something like this: Fielding Independent Pitching (FIP) is what a pitcher’s ERA should have been based on his walks, hit batters, strikeouts, and home runs. In other words, FIP is described as a predictive tool to tell you what should have happened rather than as a retrospective assessment of actual pitcher performance.
But this is wrong. This is a shorthand way of describing FIP that well-meaning analysts (myself included) have used, but I’ve come to realize that by aiming to put FIP in terms of ERA, we’ve actually made it more difficult for people to grasp and embrace what FIP is really telling us. It’s time to change the way we talk about FIP, because while the concept of defense-independent stats has gained popularity, there is often pushback against FIP as a measure of value, in part because of less-than-ideal presentation.
We feature many statistics on FanGraphs, but one of the most fundamental is Weighted On-Base Average (wOBA). If you’re not familiar with the merits of wOBA in general, I invite you to head over to our full library page on it or to learn about why it’s a gateway sabermetric statistic. For our purposes, I’ll simply include the summary:
wOBA is designed to weigh the different offensive results by their actual average contribution to run scoring. Batting average treats all hits equally and ignores walks. OBP treats all times on base equally. Slugging percentage weighs hits based on the number of bases achieved but ignores walks. Adding OBP and SLG is better than any one of AVG/OBP/SLG, but it still isn’t quite right.
Hitting a single and drawing a walk are both positive outcomes, but they have a different impact on the inning. A walk always moves each runner up one base while a single could have a variety of outcomes depending on who is on base and where the ball is hit. We want a statistic that captures that nuance.
Granted, wOBA doesn’t adjust for park, league, quality of competition, or a number of other factors, but it’s a good starting point on which to build. So how do we take the beautiful chaos of baseball and create the formula listed above?
The exact numbers are going to change each year based on the run environment (how many runs are being scored league wide), but they are consistent enough that we won’t have any problems understanding each other. I’m going to use the 2015 data, but you can view every year here. Allow me to bring a chunk of that table into this post for convenience:
You can ignore the last four columns for the purposes of wOBA, but this is a truncated version of our Guts! page and shows us each year’s league wOBA, the wOBA scale, and the weights for each of our six offensive events of interest. Hopefully those numbers will look similar to the wOBA equation you saw earlier.
Our ultimate goal is to create a statistic that measures each offensive action’s context-neutral contribution to run scoring, because scoring runs is the currency of baseball. We have decided that we want to measure walks, HBP, singles, doubles, triples, and home runs. If you wanted to, you could build wOBA with more nuanced stats like fly ball outs, ground outs, strikeouts, etc.; it would just get more complicated without much added value.
We have a specific goal and the set of offensive actions we want to measure, but now we need a method of putting them together.
The first thing we need is a run expectancy matrix. If you need a complete introduction to the concept, head over to this page. In general, run expectancy measures the average number of runs scored (through the end of the current inning) given the current base-out state.
Base-out states are a record of the number of outs (0, 1, or 2) and how many runners are on base and where (no one on, man on 1B, men on 1B and 3B, etc). There are three out-states and eight base-states, meaning that there are 24 base-out states. Each plate appearance has a base-out state.
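If it helps to see the counting spelled out, here is a minimal Python sketch that enumerates the states. The representation (a tuple of three booleans for first, second, and third base) and the variable names are my own assumptions for illustration, not anything FanGraphs uses internally.

```python
from itertools import product

# Three out-states.
OUTS = (0, 1, 2)

# Eight base-states: each is (runner on 1B, runner on 2B, runner on 3B).
BASE_STATES = list(product((False, True), repeat=3))

# Every pairing of an out-state with a base-state.
BASE_OUT_STATES = [(outs, bases) for outs in OUTS for bases in BASE_STATES]

print(len(BASE_STATES), len(BASE_OUT_STATES))  # 8 24
```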
Let’s use one out, man on first as our example. In order to calculate the run expectancy for that base-out state, we need to find all instances of that base-out state from the entire season (or set of seasons) and find the total number of runs scored from the time that base-out state occurred until the end of the innings in which they occurred. Then we divide by the total number of instances to get the average. If you do the math using 2010-2015, you get 0.509 runs. In other words, if all you knew about the situation was that there was one out and a man on first, you would expect there to be 0.509 runs scored between that moment and the end of the inning on average.
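The averaging step can be sketched in a few lines of Python. This assumes you already have one record per plate appearance giving the base-out state and the runs scored from that point to the end of the inning. The function name, the state labels, and the toy records are all made up for illustration; real numbers like 0.509 require the full play-by-play data.

```python
from collections import defaultdict

def run_expectancy(records):
    """Average runs scored through the end of the inning for each state.

    records: iterable of (state, runs_rest_of_inning) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, runs in records:
        totals[state] += runs
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

# Toy records (hypothetical, not real data): (outs, bases) -> runs that followed.
toy_records = [
    ((1, "1B"), 0), ((1, "1B"), 2), ((1, "1B"), 0), ((1, "1B"), 0),
    ((0, "empty"), 1), ((0, "empty"), 0),
]

toy_matrix = run_expectancy(toy_records)
print(toy_matrix[(1, "1B")])  # 0.5 for this toy sample
```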
You repeat the process for the other 23 base-out states and wind up with a table like this:
The table listed here was calculated by Tom Tango using 2010-2015 data for the entire league and serves as a good baseline. At FanGraphs, we park adjust the matrix for each game, so the exact numbers might be a touch different if you’re trying to play along at home in excruciating detail.
Now that you have a run expectancy matrix, you need to learn how to use it. Each plate appearance moves you from one base-out state to another. So if you walk with a man on first base and one out, you move to the “men on first and second and one out” box. That box has an RE value of 0.884. Because your plate appearance moved you from 0.509 to 0.884, that PA was worth +0.375 in terms of run expectancy.
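Here is the walk example as a small Python sketch. Only the two cells quoted in the text are included, and the state labels and function name are my own. The `runs_scored` argument is an addition the example above doesn't need: when runs cross the plate during the play, they are normally added to the change in run expectancy.

```python
# Two cells from the 2010-2015 matrix quoted in the text; a full table has 24.
RUN_EXPECTANCY = {
    (1, "1B"): 0.509,      # one out, man on first
    (1, "1B-2B"): 0.884,   # one out, men on first and second
}

def re_change(before, after, runs_scored=0):
    """Run value of a plate appearance: ending RE minus starting RE,
    plus any runs that scored on the play (an assumption added here)."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored

walk_value = re_change((1, "1B"), (1, "1B-2B"))
print(round(walk_value, 3))  # 0.375
```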
Every plate appearance has one of these values, either positive or negative. You can learn more about this by following the earlier link.
What we want to determine is the average run value of a walk, HBP, single, double, triple, and home run. To do this, we take the total RE value of all walks (unintentional in this case), for example, and divide that number by the number of walks in that season. You’re going to wind up getting something around 0.3. You repeat this for the other five actions. This gives you the runs above average produced by each of these kinds of events.
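As a rough sketch, assume each plate appearance has already been tagged with its event type and its change in run expectancy. The function name and the toy plays below are hypothetical and exist only to show the division.

```python
from collections import defaultdict

def average_run_values(plays):
    """Mean change in run expectancy for each event type.

    plays: iterable of (event, re_change) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, change in plays:
        totals[event] += change
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}

# Toy plays (made up): two walks and two singles in different base-out states.
toy_plays = [("BB", 0.375), ("BB", 0.225), ("1B", 0.50), ("1B", 0.40)]

toy_values = average_run_values(toy_plays)
print({event: round(value, 2) for event, value in toy_values.items()})
```

Run over a full season, the same division is what produces a figure around 0.3 for the walk.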
In theory, we could essentially be done right now because we have everything we need to build a statistic that will weigh the offensive actions properly. However, the inventors of wOBA decided that it would probably be best to scale it to something familiar to make it easier to understand. And they picked OBP.
We have the runs above average for walks (0.29), HBP (0.31), singles (0.44), doubles (0.74), triples (1.01), and home runs (1.39), but what we want to do now is put wOBA on a scale that will look like OBP. In OBP, an out is worth zero, so the first thing we want to do is adjust the run value scale so that an out is equal to zero.
There is an easy way to do this. First, we need to find the linear weight for all outs using the same method we used to find the value for the other events. We’ll call it -0.26 for 2015. This means that an out is worth 0.26 runs less than the average PA when it comes to run expectancy. What we want to do now is add 0.26 to each of our run values so that outs are equal to zero. So for walks, which we said are worth 0.29 runs above average, we bump those up to 0.55 runs relative to an out. Using linear weights, walks are worth 0.55 runs more than outs. We repeat this for each of the five other positive offensive outcomes.
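The shift is simple enough to do by hand, but here it is in Python using the rounded 2015 values quoted in this post. The variable names are mine.

```python
# Runs above average per event (rounded 2015 values from the text).
RUNS_ABOVE_AVERAGE = {
    "BB": 0.29, "HBP": 0.31, "1B": 0.44, "2B": 0.74, "3B": 1.01, "HR": 1.39,
}
OUT_VALUE = -0.26  # run value of an out relative to the average PA

# Subtracting the (negative) out value re-centers the scale so an out is zero.
runs_above_out = {
    event: round(value - OUT_VALUE, 2)
    for event, value in RUNS_ABOVE_AVERAGE.items()
}

print(runs_above_out)
# {'BB': 0.55, 'HBP': 0.57, '1B': 0.7, '2B': 1.0, '3B': 1.27, 'HR': 1.65}
```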
As you’ll notice, these are not the weights you saw in the wOBA equation. We’re not done scaling them yet. We know that we want BB, HBP, 1B, 2B, 3B, and HR in the numerator of the wOBA formula and plate appearances (minus weird stuff like sac bunts) in the denominator, so what we’re going to do is calculate “wOBA” for the entire league using the linear weights in this table and the total number of events of each type.
In other words, we’re going to multiply 0.55 times the number of walks in MLB in 2015 and add that to 0.57 times the number of HBP and so on, and then divide the entire sum by the number of plate appearances (really AB + BB – IBB + SF + HBP). If we do that, we wind up with 0.250.
But remember that we want wOBA to look like OBP. So we need to scale the entire thing so that the league’s wOBA is .313 (to match OBP with IBB removed). To do that, we divide .313 by .250 and get 1.251 (the rounded figures shown here give 1.252), which we call the wOBA Scale.
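Here is a sketch of both steps in Python. The `raw_woba` function and the stat line fed to it are hypothetical. I'm not reproducing the real 2015 league totals, so the toy line only shows the mechanics, and the scale is computed from the .250 and .313 quoted above.

```python
# Out-equals-zero weights from the previous step (rounded 2015 values).
RUNS_ABOVE_OUT = {
    "BB": 0.55, "HBP": 0.57, "1B": 0.70, "2B": 1.00, "3B": 1.27, "HR": 1.65,
}

def raw_woba(line, weights=RUNS_ABOVE_OUT):
    """Unscaled wOBA for a stat line (a dict of counting stats)."""
    events = dict(line)
    events["BB"] = line["BB"] - line["IBB"]  # unintentional walks only
    numerator = sum(weight * events[event] for event, weight in weights.items())
    denominator = line["AB"] + line["BB"] - line["IBB"] + line["SF"] + line["HBP"]
    return numerator / denominator

# Toy stat line (made up, not league data), just to exercise the function.
toy_line = {"AB": 500, "BB": 55, "IBB": 5, "HBP": 5, "SF": 5,
            "1B": 90, "2B": 30, "3B": 3, "HR": 20}
toy_raw = raw_woba(toy_line)

# League figures quoted in the text.
LEAGUE_RAW = 0.250     # league "wOBA" before scaling
LEAGUE_TARGET = 0.313  # league OBP with IBB removed
woba_scale = LEAGUE_TARGET / LEAGUE_RAW

print(round(woba_scale, 3))  # 1.252 from these rounded inputs
```

The rounded inputs give 1.252 rather than the 1.251 quoted above; presumably the gap comes from rounding the league figures for display.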
We take the wOBA Scale and multiply it against the linear weights from the table above and voilà, we have ourselves the weights listed in the wOBA equation. And we’re done!
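That last multiplication looks like this in Python, using the rounded values from this post (the names are mine). Because every input here is rounded, expect the results to differ from the published weights in the third decimal place.

```python
# Out-equals-zero weights (rounded 2015 values from the text).
RUNS_ABOVE_OUT = {
    "BB": 0.55, "HBP": 0.57, "1B": 0.70, "2B": 1.00, "3B": 1.27, "HR": 1.65,
}
WOBA_SCALE = 1.251  # the 2015 wOBA Scale quoted in the text

# Final wOBA weights: each run value stretched onto the OBP-like scale.
woba_weights = {
    event: round(value * WOBA_SCALE, 3)
    for event, value in RUNS_ABOVE_OUT.items()
}

for event, weight in woba_weights.items():
    print(f"{event}: {weight:.3f}")
```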
It’s important to remember that wOBA is one implementation of a linear weights based offensive metric. Baseball Prospectus has their own version, True Average, which is based on the same pillars and implemented differently. Choosing to scale it to OBP is an aesthetic choice. We could scale it to batting average or to nothing. The important part is just that we understand the scale we’re using.
The main idea is that we’re giving each type of outcome a value based on the average change in run expectancy that particular outcome yields. The idea is to give the right amount of credit to each kind of event. Doing so does not make wOBA a perfect statistic; it simply makes it a better one than the traditional AVG/OBP/SLG.
There are lots of little nuances you can add to something like wOBA to get it closer and closer to the truth. All we’re doing here is creating the foundation for all of that work.
Once upon a time, all we had were box scores. We might know a player went 1-3 with a double and a walk, but we wouldn’t know how exactly all of the game’s events unfolded. We’ve come a long way since then, getting play-by-play data, pitch-by-pitch data, video tracking, PITCHf/x, and Statcast. We have results data stretching back more than a century, but the way those results came about gets easier to understand with new information.
What direction was the double hit? How far did it go? Who fielded it? Hearing a player hit a double seems like specific information, but there’s plenty more you might want to know about that event. One of the ways we communicate that information is through Spray Charts.
There are certainly other ways to communicate information of this nature, but one is to display it visually on a diamond graphic. You can find our implementation of spray charts on the player pages here at FanGraphs.
Pitchers and catchers are reporting for Spring Training this week and, before you know it, there will be real, live baseball happening in Arizona and Florida. As the season approaches, I’d like to take a little time to welcome any statistical newcomers to the FanGraphs Library.
If you’ve made it this far, you’ve almost certainly read articles on the main site, visited some of our player pages or leaderboards, or played our fantasy baseball game, Ottoneu. But what you might not know is that we have an entire section of the site devoted to helping you get the most out of the information housed at FanGraphs.
The most well-known components of the Library are detailed descriptions of the statistics available at the site. These pages are written presuming no previous knowledge of sabermetrics, statistical theory, or mathematics. If you understand the rules of the game, you’ll have no trouble following along. For example, if you come across “wOBA” in an article or on one of the stat pages and have no idea what it stands for, our Library entry is here to help. Not only can you find a basic description of that stat, but there is also a detailed breakdown of how to calculate it, how to use it, why it is important, and all sorts of other information that will help you get more out of the site.
While it’s January and many free agents have decided where they will be playing in 2016 and beyond, there are still some notable players without new teams. One thing I’m struck by each offseason is how frequently some people comment on new contracts without a good grasp of how teams and players settle on a term of years and dollars. In particular, it’s common to hear these comments from pundits and fans who aren’t quite as plugged into the game as regular readers of sites like FanGraphs.
So for the new reader, or the old one looking to explain the finer points to their friends, here are some basic principles about free agent contracts to remember when thinking about their prudence.
This time of year is about roster decisions. Teams are working to build their 2016 rosters with an eye on how 2016 fits into their overall plan. Some teams are looking at their current roster and payroll and deciding to go for it, while others are setting themselves up for a bright future. Clubs are making trades and signing free agents, and from the outside, we’re trying to figure out which moves are good and which aren’t.
There are a lot of factors that go into evaluating a particular transaction or set of transactions. Far too many to talk about all at once. But we can generally agree that our attempt to forecast future player performance is central to any effort. In order to know if the Cubs made a smart move in signing Ben Zobrist, we need to develop some prediction about how good Zobrist will be over the life of his four-year deal. Obviously, this is a tricky business.
We are trying to project Zobrist’s future. We’ve talked about projections in this space before. They are estimates of true talent, adjusted for aging. You can read more about the basics here, but this article will focus on the aging component. In order to make decisions about players, we need to know how good they are presently and how those skills will improve or decline in the future.