FIP May Look Like ERA, But It’s Designed Like wOBA
If you have at least a passing familiarity with sabermetrics, you’ve probably heard something like this: Fielding Independent Pitching (FIP) is what a pitcher’s ERA should have been based on his walks, hit batters, strikeouts, and home runs. In other words, FIP is described as a predictive tool to tell you what should have happened rather than as a retrospective assessment of actual pitcher performance.
But this is wrong. This is a shorthand way of describing FIP that well-meaning analysts (myself included) have used, but I’ve come to realize that by aiming to put FIP in terms of ERA, we’ve actually made it more difficult for people to grasp and embrace what FIP is really telling us. It’s time to change the way we talk about FIP, because while the concept of defense-independent stats has gained popularity, there is often pushback (by some) against FIP as a measure of value, in part because of its less-than-ideal presentation.
The central problem is that FIP is sold as a better version of ERA that is based on the outcomes a pitcher controls independent of his defense. The problem with this presentation — a problem its inventor, Tom Tango, likely understood when he created it — is that FIP is actually nothing like ERA in terms of how it’s created. ERA tracks the number of earned runs (you can apply this to RA9 too) credited to the pitcher. ERA and RA9 are simple, results-based, context dependent stats. ERA is like RBI or runs scored. It doesn’t matter what caused a run to score as long as it did score.
To make a more flattering comparison, ERA is like a less sophisticated RE24 or WPA. ERA fails to properly account for runners left on base for relievers and for unearned runs, but ERA and RE24 don’t care how well you pitched to individual batters; they care how those individual plate appearances blended together. (As an aside, RA9 is superior to ERA, but that’s for another day.)
FIP doesn’t care if the runs score or don’t score; it cares what you did during individual plate appearances. In this way, FIP is much more like Weighted On-Base Average (wOBA) than it is like ERA. In fact, there’s a good case to be made that FIP is exactly like wOBA in its construction.
In wOBA, batters get credit for BB, HBP, 1B, 2B, 3B, and HR based on the relative value of each of those events using a methodology called Linear Weights. Putting wOBA on a scale equal to OBP wasn’t the only way to implement a stat like this. Baseball Prospectus has a similar stat scaled to batting average. It could be scaled to anything! Scaling is just about presentation.
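To make the linear-weights idea concrete, here is a minimal Python sketch of a wOBA-style calculation. The weights below are illustrative assumptions only (published weights are recalculated each season), and the function and parameter names are hypothetical, not taken from any library.

```python
# Illustrative linear weights (assumed values; real weights change every season).
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69,  # unintentional walk
    "hbp": 0.72,  # hit by pitch
    "1b": 0.89,   # single
    "2b": 1.27,   # double
    "3b": 1.62,   # triple
    "hr": 2.10,   # home run
}


def woba(ab, ubb, hbp, sf, singles, doubles, triples, hr,
         weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the weighted value of each event, then divide by opportunities.

    Every single counts the same no matter who was on base or what the
    score was, which is what makes the stat context neutral.
    """
    opportunities = ab + ubb + sf + hbp
    if opportunities <= 0:
        raise ValueError("need at least one plate appearance")
    numerator = (
        weights["ubb"] * ubb
        + weights["hbp"] * hbp
        + weights["1b"] * singles
        + weights["2b"] * doubles
        + weights["3b"] * triples
        + weights["hr"] * hr
    )
    return numerator / opportunities


# Hypothetical batting line, used only to exercise the function.
example_woba = woba(ab=500, ubb=50, hbp=5, sf=5,
                    singles=100, doubles=30, triples=3, hr=20)
```

Multiplying every weight by the same factor would change the scale of the result without changing how hitters rank against one another, which is the sense in which scaling is only about presentation.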
FIP is the same way. FIP gives pitchers credit based on the individual batter-pitcher results, not the results for the full inning. You could simply look at pitcher wOBA, but research by Voros McCracken found that pitchers have very little control over results on balls put into play. In other words, it doesn’t seem like pitchers have much control over the rates at which batted balls become outs or hits.
There is disagreement about this point, but it’s secondary to this particular argument. If you think pitchers can truly control their BABIP, that is a separate question. If you think they can truly influence weak contact that leads to more outs on balls in play, FIP just isn’t going to be the method for you and you could turn to pitcher wOBA. But that doesn’t mean FIP fails as a retrospective statistic.
Pitchers control whether the outcome is a strikeout, walk/hit batter, home run, or ball in play independent of their defense. There’s really no argument here (ignoring the catcher). So what Tom Tango did with FIP was use linear weights to determine the relative values of walks/hit batters, strikeouts, home runs, and balls in play. He could have stopped right there and had a statistic that weighed these four categories per plate appearance, but he chose to scale it to ERA to make it easier to determine whether a pitcher had a good FIP. I think this has led to the problematic language we use to talk about FIP.
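As a rough illustration of the construction described here, the sketch below uses the commonly published form of FIP: 13 per home run, 3 per walk or hit batter, minus 2 per strikeout, all per inning, plus a constant chosen so that league FIP equals league ERA. The function names and the league totals in the example are hypothetical, and innings are assumed to be true fractions (200 1/3 innings is 200.333, not 200.1).

```python
def fip_core(hr, bb, hbp, k, ip):
    """Weighted events per inning, before any scaling to ERA."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """The scaling step: shift the core so league FIP matches league ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    """Core value plus the league constant, giving an ERA-like number."""
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals and pitcher line, for illustration only.
constant = fip_constant(lg_era=4.00, lg_hr=5000, lg_bb=15000,
                        lg_hbp=1500, lg_k=38000, lg_ip=43000)
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, constant=constant)
```

The constant is the only place ERA enters the calculation. Everything that separates one pitcher from another comes from `fip_core`, which never looks at runs allowed.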
FIP classifies pitching outcomes into four buckets and judges pitchers based on the relative value of those outcomes in a context neutral way. We use FIP instead of pitcher wOBA because of McCracken’s BABIP findings, but we use FIP/wOBA when we are trying to measure pitchers based on individual events, not in terms of how particular innings unfolded.
FIP is designed so that every ball in play is treated the same. wOBA is designed so that every single, every double, and so on is treated the same. ERA and RE24 treat singles differently depending on what happened in the previous plate appearance. That might be a worthwhile question, but it is a fundamentally different question. And this is why I think we made a mistake in scaling FIP to ERA. It makes people think it should behave the way ERA behaves when it, for better or worse, behaves like wOBA.
I am not arguing that this is a normative good, just that it is the truth. You might think that this isn’t a good way to talk about pitcher value, but it is a way to talk about pitcher value. FIP isn’t designed to predict the future; it’s designed to reflect the past based on those four buckets of outcomes. It is a measure of how the pitcher actually pitched.
It’s probably a worthwhile pursuit to measure players with both context-neutral and context-dependent stats. I’m not going to advocate for one or the other in this article, but I think it’s important to understand the difference as it relates to FIP. No one would say wOBA isn’t what actually happened, and you shouldn’t say it about FIP. FIP isn’t trying to be ERA; we’ve forced it into the ERA mold for the wrong reasons.
Fortunately, we have minus stats. FIP- uses the same inputs and spits out a number relative to league average. Saying a pitcher’s fielding independent numbers are 10% better than average is a better way to communicate the concept than saying he has a 3.60 FIP. Saying “3.60 FIP” implies we’re trying to tell you something about his ERA and that’s not really what we’re doing. FIP is saying that if you chose to judge a pitcher in a context-neutral way based on these four buckets of outcomes, this is how well he has pitched. Maybe you don’t care about that question, but it’s time for us to separate FIP and ERA in the way that we separate wOBA and RE24.
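A simplified sketch of the minus-stat idea follows. It assumes no park adjustment (published FIP- figures also adjust for park and league), and the function name is hypothetical.

```python
def fip_minus(pitcher_fip, league_fip):
    """Pitcher FIP as a percentage of league FIP; 100 is average, lower is better.

    Simplifying assumption: no park factor is applied.
    """
    if league_fip <= 0:
        raise ValueError("league FIP must be positive")
    return 100 * pitcher_fip / league_fip


# Hypothetical league FIP of 4.00, chosen only to make the arithmetic easy.
example_fip_minus = fip_minus(3.60, 4.00)
```

Under that assumed league average, a 3.60 FIP works out to a FIP- of about 90, or 10% better than average, with no ERA-shaped number involved.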
Neil Weinberg is the Site Educator at FanGraphs and can be found writing enthusiastically about the Detroit Tigers at New English D. Follow and interact with him on Twitter @NeilWeinberg44.
I agree with the analogy of FIP to wOBA. They both:
1. use a subset of performances(*)
2. weight the values based on their run impact, regardless of when they happened
3. are scaled to a common scale
(*) FIP ignores batted balls in the field of play and baserunner movement (SB, CS, WP, etc.). wOBA ignores baserunner movement.
It seems worth mentioning that part of the reason that FIP is occasionally used as an ERA predictor is that it happens to be more predictive of future ERA than ERA itself. Even if it wasn’t designed for the purpose, it happens to be the “best” tool for the job quite often, with “best” defined as some combination of accuracy, availability, explainability, and familiarity.