The Biggest ERA-FIP Differences of 2014
by Neil Weinberg
October 6, 2014

Fielding Independent Pitching (FIP) is one of the more prominently featured statistics on FanGraphs and one of the bedrocks of sabermetric analysis. We all know that FIP is an imperfect measure of pitcher performance because it assumes average results on all balls in play, but we also know that it does a better job isolating the individual pitcher’s performance than simply looking at their ERA or RA9 because it only looks at strikeouts, walks, home runs, and hit batters.

It’s a very informative tool, but it’s a metric derived from a subset of results. When a pitcher’s ERA is significantly different from their FIP, the standard credo is that they were lucky or unlucky, but there are genuine reasons why a pitcher might have results that are better or worse than their FIP. To illustrate this, let’s take a peek at the biggest FIP over- and under-performers of 2014.

There are all sorts of reasons why a particular pitcher might end up with a better or worse ERA than their FIP over the course of a season. Some of it is simply random variation. Any time one statistic estimates another, it won’t be perfect, so a 3.50 ERA with a 3.59 FIP is not worth noticing. You want to make note of large differences, on the order of 0.30 runs or more. That’s not a hard and fast rule, but if the gap is much smaller than that, you just aren’t looking at anything very unusual.

Beyond simple variation, here are some factors that can influence the ERA-FIP difference that aren’t “luck.”

Defense
Bullpen (stranding inherited runners or not)
Sequencing (the order in which you allow hits/walks/outs)
Quality of contact allowed
Type of contact allowed (batted ball type)
Ability to control the running game

There are probably other factors that play a role, but those are the big ones. The goal of FIP is to strip out the first two on this list.
You want the statistic to be defense independent, and you’d prefer that it normalize what happens to the runners you leave on base when a reliever follows you into the game. We don’t want to judge pitchers based on sequencing either, but it’s perfectly defensible to attribute it to them when making retrospective evaluations, like in WAR. You would absolutely like to judge the quality and type of contact allowed, but FIP treats every ball in play the same. It’s a trade-off you make in order to remove defense, and one that you make because we know that a pitcher has only limited control over his BABIP. Finally, you would like to credit or debit a pitcher for how he holds runners, but again, FIP doesn’t account for that.

To put it all together, the difference between ERA and FIP is not solely based on defense or luck. Some pitchers can influence the difference depending on their various skills, and we’re always working to estimate just how much pitchers can do that.

Now that we have a sense of the potential factors, let’s take a look at the pitchers who had the largest positive and negative ERA-FIP differences in 2014. With the dubious honor of the largest “under-performance,” Clay Buchholz comes in with a 5.33 ERA and 4.01 FIP (1.33 difference). On the other side, it’s Doug Fister’s 2.41 ERA and 3.93 FIP (-1.52 difference). You can use the Custom Leaderboards to compare their seasons.

Let’s run through the different factors to see if we’re looking at luck or something meaningful, and then whether that something meaningful is about the pitcher or about the factors we want to remove from the analysis.

Defense

We can’t look directly at the precise defensive performance behind each player because that data isn’t available, but we can get a sense of how good their defenses were in general. The Nationals had +10 DRS and -6.3 UZR in 2013 with worse performances coming in the outfield.
On average, the entire pitching staff had a slightly better ERA than FIP, but a good deal of that is driven by Fister’s large gap. The staff’s BABIP allowed is .294 (including Fister’s .262 in 164 innings). We can’t say for sure, but it doesn’t appear as if the Nationals defense is extreme enough (think Tigers or Royals) to seriously influence a pitcher’s ERA-FIP difference. Certainly, defensive performance is asymmetric, but there aren’t signs that this is obviously about defense for Fister.

Turning to Buchholz, the Red Sox pitchers didn’t have a big gap across the board, and most of the starters had small ERA-FIP differences. Even more surprising is that the Sox had +27 DRS and a +48.5 UZR, so there isn’t much evidence that Buchholz is getting killed by his defense. Again, we can’t say for sure, but it doesn’t look like we’re getting a big effect.

One small factor is that Fister allowed a slightly higher percentage of unearned runs than Buchholz. You could play with the numbers and maybe bring his ERA up to 2.70 if you wanted to normalize that effect. It’s hard to say, but Fister has a 2.85 RA9 and Buchholz has a 5.71 RA9. Do with that what you will.

Bullpens

There’s a very simple way to evaluate this effect. If you leave a runner on first base every time you get pulled, but the bullpen allows them to score every time, those runs aren’t all really your fault. You should really only be charged for the expected number of runs, which we can approximate using RE24 on a per-nine-inning scale. We’ll also have to park adjust their RA9. Here’s what we get (note that these are park adjusted differently, so just use the numbers as a rough guide).

Fister: 2.85 pRA9, 3.04 RE24/9
Buchholz: 5.48 pRA9, 5.24 RE24/9

So Fister’s bullpen is helping him a little bit and Buchholz’s is hurting him a little more. Fister left 10 men on the bases for his bullpen and only one scored. Buchholz left 21 on and 11 scored, so we have confirmed an effect!
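To make the bullpen arithmetic concrete, here is a minimal Python sketch that uses only the figures quoted above. The class and method names are hypothetical, chosen for illustration, and the sign convention (a negative pRA9 minus RE24/9 gap read as help from the bullpen) is an assumption about how to interpret the comparison. Because the two measures are park adjusted differently, treat the gaps as a rough guide only.

```python
from dataclasses import dataclass


@dataclass
class BullpenSupport:
    """Figures quoted in the article; names here are illustrative only."""
    name: str
    runners_left: int     # runners on base when the starter was pulled
    runners_scored: int   # how many of those the bullpen allowed to score
    park_adj_ra9: float   # pRA9 as quoted in the text
    re24_per_9: float     # RE24/9 as quoted in the text

    def scored_rate(self) -> float:
        """Share of runners left for the bullpen that came around to score."""
        return self.runners_scored / self.runners_left

    def ra9_minus_re24(self) -> float:
        """Actual minus expected runs per nine innings.

        Assumed reading: negative means the bullpen helped the starter,
        positive means it hurt him.
        """
        return round(self.park_adj_ra9 - self.re24_per_9, 2)


fister = BullpenSupport("Fister", 10, 1, 2.85, 3.04)
buchholz = BullpenSupport("Buchholz", 21, 11, 5.48, 5.24)

for pitcher in (fister, buchholz):
    print(
        f"{pitcher.name}: {pitcher.scored_rate():.0%} of runners left on scored, "
        f"pRA9 - RE24/9 = {pitcher.ra9_minus_re24():+.2f}"
    )
```

On these inputs, about 10% of the runners Fister left on base scored, against roughly 52% for Buchholz, and the per-nine gaps come out to -0.19 and +0.24 runs respectively.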
Sequencing

This is a trickier one to measure, but we can use LOB% as an approximation. Buchholz has a very poor 62.1% and Fister a robust 83.1%. Can we attribute this solely to sequencing? No. But the idea of sequencing is important. A single, single, home run, out, out, out in one inning counts for three runs, while home run, single, single, out, out, out counts for just one run, even though the two innings are identical in everything but order. There isn’t evidence that pitchers can control the order of events, but you’re welcome to give them credit for it when looking back, if you wish. That said, sequencing seems to be playing a role in the Fister and Buchholz ERA-FIP differences.

Quality and Type of Contact

We can’t say very much about the quality of contact based on the data we have publicly available. It stands to reason that higher quality contact will lead to hits more often (we know that it does on average), but we don’t know how these factors play into Fister’s and Buchholz’s seasons.

We can look at the type of contact, however. Ground balls go for hits more often than fly balls but less often than line drives, and they rarely go for extra bases. The two pitchers had identical 34% fly ball rates, and Fister’s two-point lead in ground ball rate (about 49% to 47%) translates into two extra points of line drive rate for Buchholz (19% to 17%). So you would expect a higher BABIP from Buchholz than Fister, but probably not a 50-point difference based only on trajectory. We think of Fister as a guy who induces weak contact, but we can’t say for sure exactly how much this matters.

The Running Game

This is one of FIP’s obvious limitations. If you allow a runner to reach base, but you also make it very easy for them to steal a base or get big jumps, you will allow more runs than if you are great at keeping them close to the base. And since we know that pitchers have a big influence over this part of the game, they should get credit for this.
Fister and Buchholz both register with +1 stolen base runs, with Buchholz allowing six steals in 11 attempts (and 277 total opportunities). Fister allowed zero steals in one attempt (and 228 total chances). Some of this relates to the catcher, but it appears as if Fister probably gets a bit of a bump from his ability to control the running game and Buchholz is probably neutral.

Summary

What have we learned from this exercise? Their bullpens and sequencing certainly seem to have a big effect on things, with an unknown amount coming from the quality of contact. It makes sense that Fister is beating his FIP and Buchholz isn’t, but we can’t know for sure exactly how much influence each factor has.

Typically speaking, there are pitchers who have the right skills to allow them to beat their FIP consistently, but those combinations are relatively rare. Lefty, fly ball pitchers who don’t allow home runs come to mind, but even those guys on average don’t beat their FIP by more than a run per season most years.

Obviously, some of Fister’s and Buchholz’s differences are simple randomness, but they do seem to also feature a few of the characteristics of guys who should have done a little better or worse in 2014. Fister’s beaten his FIP by about 20 points for his career, but Buchholz is also on the positive side of things. Some of this is a repeatable skill and some is luck and defense.

It should serve as a reminder that the ERA-FIP gap is not simply about luck that will regress. There are lots of identifiable reasons why a pitcher’s FIP might be better or worse than his ERA, and you can decide which of them should be credited to the pitcher and which should be left alone.

Evaluating pitchers is very difficult. FIP is a great tool for doing so, but looking at it in the context of ERA and RA9 can help you come to more useful conclusions. It’s a bit of work, but it will lead you to a much better place.

Have questions? Ask them in the comments!