ERA, FIP, and Answering the Right Question

August 15, 2014

One of the things baseball fans and analysts work very hard to do is isolate individual performance. At the end of a game, there is a final score that tells you how many runs each team scored. At a very basic level, that’s all that really matters. Baseball is a battle to score more runs than your opponent over the span of nine innings repeated 162 times. Yet analyzing the game requires more information than that because we want explanations. We want to know which players are good and which players aren’t so good. We care about how individual performance contributes to winning.

For pitchers, this is especially difficult because while pitchers have a huge impact on the number of runs they allow, they don’t have complete control. You can’t just look at the number of runs a pitcher allowed and say they were definitively responsible for those runs and call it a day. You aren’t isolating their performance and if you aren’t isolating individual performance you’re looking only at outcomes, and that’s not typically very interesting.

Every statistic, or really any analysis in general, should start with a question. On a basic level, the question we have is “How good is this pitcher?” which more specifically translates into “How effective is this pitcher at preventing runs?”

Typically, most fans and commentators turn to Earned Run Average (ERA) to answer this question. This tells you how many earned runs a pitcher allows per nine innings. It seems to answer the question we’ve posed, but it actually doesn’t. It’s a bit of a trap, largely because it has a sneaky name.
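As a quick illustration of the arithmetic behind that definition, here is a minimal Python sketch. The function name and the choice to take outs recorded rather than innings (which sidesteps box-score notation like 6.1 for six and a third) are assumptions made for this example, and the stat line is made up.

```python
def era(earned_runs: int, outs_recorded: int) -> float:
    """Earned runs allowed per nine innings (27 outs)."""
    if outs_recorded <= 0:
        raise ValueError("outs_recorded must be positive")
    innings = outs_recorded / 3
    return 9 * earned_runs / innings


# Hypothetical season: 70 earned runs over 200 innings (600 outs).
print(f"{era(70, 600):.2f}")  # prints 3.15
```

Notice that nothing in the calculation knows how those earned runs came about, which is exactly the problem discussed next.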

Implied in “earned” is the idea that ERA strips out the runs scored that were not the pitcher’s fault. This is simply not true. If a pitcher walks four batters in a row and then strikes out the next three, no one would dispute that the pitcher was responsible for that particular run. But things get much trickier when we’re talking about balls that are put into play.

ERA sounds like it is stripping out the effect of poor defense by not punishing pitchers when errors are committed behind them, but errors are just a small part of the overall run prevention equation. Sure, if the shortstop boots an easy ground ball, ERA doesn’t count that baserunner against the pitcher, but defense impacts the game in ways other than errors.

Imagine a scenario in which there is a routine fly ball to left-center field. It’s a lazy fly ball that isn’t struck with any sort of particular force and 95 times out of 100, the center fielder trots over and the ball lands harmlessly in their glove. Now imagine that the center fielder, for any reason, simply doesn’t get to the ball. Maybe it was a late break, maybe they fell over, maybe they thought their left fielder was about to crash into them. It doesn’t matter, but let’s assume they don’t physically drop the baseball. No error is charged, but the play isn’t made.

It might look something like this:

That goes in the book as a hit. The pitcher induced a routine fly ball and his defense completely let him down. It wasn’t an error because that’s not how official scorers define errors, so any runs that come as a result of this play are earned runs that count against the pitcher. As a result of this play, we end up thinking the pitcher, Robbie Ray, allowed a run that simply should not have scored based on how he performed. For the question we’re asking about this individual pitcher, ERA actually gives us inaccurate information.

This is a particularly extreme example selected for illustrative purposes, but defense impacts the number of runs pitchers allow every night. A slow transfer, a bad route, or a poorly executed first step can all lead to plays not being made that should be made by an average big league defender.

Two immediate responses emerge. First, these kinds of defensive miscues and dumb luck do not even out until you get about three seasons’ worth of balls in play. It’s not as easy as saying Ray got super unlucky in the situation above but that run will come right back to him courtesy of a diving play the following weekend. It takes a long time for that to balance out, and that means we’d be making poor evaluations in the meantime. Across the league, about as many pitchers benefit from good luck as suffer from bad luck, but that average is going to be misleading if you’re looking at any one pitcher.
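To get a feel for why the evening-out takes so long, here is a rough Python sketch that treats every ball in play as an independent coin flip with the same chance of becoming a hit. That binomial model, the .300 hit rate, and the figure of 550 balls in play per season are all simplifying assumptions made for this example, not numbers from the research behind the three-season estimate.

```python
import math


def hit_rate_standard_error(balls_in_play: int, true_rate: float = 0.300) -> float:
    """Standard error of an observed hit rate on balls in play.

    Assumes each ball in play is an independent trial with the same
    probability of falling for a hit (a simplification).
    """
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return math.sqrt(true_rate * (1 - true_rate) / balls_in_play)


# Assumed workload: roughly 550 balls in play per full season.
for seasons in (1, 3):
    n = seasons * 550
    se = hit_rate_standard_error(n)
    print(f"{seasons} season(s), {n} balls in play: +/- {se:.3f}")
```

Under these assumptions, one season leaves a typical swing of about 20 points in the hit rate on balls in play from chance alone, while three seasons shrink it to about 11 points. The noise fades slowly, which is the point of the paragraph above.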

Second, at a fundamental level we don’t want to know how a pitcher’s defense performed behind them. If you could field a team of the best defenders at every position and run them out there behind a mediocre pitcher, he would probably allow fewer runs than a better pitcher who had to throw in front of a below-average crop of fielders. Maybe you wouldn’t mistake Clayton Kershaw for Tommy Milone as a result, but you might mistake Clayton Kershaw for Jon Lester.

If our goal is to determine how well a pitcher prevents runs, we want to compare pitchers in a manner that strips out all of the run-scoring factors that have nothing to do with them. We want to know how well Kershaw and Lester would perform if they had the same defenders behind them. And we’d like their luck to be even too.

Obviously, we can’t run the kinds of experiments required to provide a definitive answer to these questions (and that wouldn’t be very much fun), but we can look at the aspects of their performance that aren’t dependent on their defense. This is where Fielding Independent Pitching (FIP) comes in.

FIP provides you with an estimate of a pitcher’s ERA based only on their strikeouts, walks, hit batters, and home runs allowed while assuming league average defense, sequencing, and luck on balls in play. Click the link above to learn more about the precise mechanics of FIP, but the basic idea is that FIP will tell you more about a pitcher’s specific contributions to the run prevention process than ERA will, because the difference between the two is made up almost entirely of factors that research has shown to be out of the pitcher’s control.
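For readers who want to see the arithmetic, the commonly published form of FIP weights home runs by 13, walks plus hit batters by 3, and strikeouts by -2, divides by innings pitched, and adds a league constant that puts the result on the ERA scale. Neither the formula nor the constant appears in this article. The default of 3.10 below is only a placeholder, since the real constant is recalculated each season, and the stat line is made up.

```python
def fip(home_runs: int, walks: int, hit_batters: int, strikeouts: int,
        outs_recorded: int, fip_constant: float = 3.10) -> float:
    """Fielding Independent Pitching in its commonly published form.

    fip_constant is a placeholder; the actual value varies by season
    and is chosen so that league-average FIP equals league-average ERA.
    """
    if outs_recorded <= 0:
        raise ValueError("outs_recorded must be positive")
    innings = outs_recorded / 3
    core = 13 * home_runs + 3 * (walks + hit_batters) - 2 * strikeouts
    return core / innings + fip_constant


# Hypothetical season: 20 HR, 55 BB, 5 HBP, 200 K over 200 innings.
print(f"{fip(20, 55, 5, 200, 600):.2f}")  # prints 3.30
```

Hits on balls in play and fielding plays never enter the calculation, so two pitchers with identical strikeout, walk, hit-batter, and home run totals get the same FIP no matter who was standing behind them.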

FIP isn’t perfect, but it does a much better job answering the question we want to answer. We care about measuring an individual pitcher’s contribution to the run prevention process, and a statistic like ERA, which doesn’t attempt to remove defense from the equation, is not going to give you an accurate representation at the individual level. FIP provides a better estimate of that performance.

In the future, we may be able to measure the speed and trajectory of every ball in play and assign it a run value, but for now, we’re left to decide how much credit we want to assign to the pitcher for each ball in play. FIP holds pitchers responsible for how often they allow a ball in play and then assumes that those balls in play fall for hits with average frequency. Some pitchers can consistently beat that average, but the vast majority can’t.
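The “average frequency” idea can be made concrete with batting average on balls in play (BABIP), which is usually calculated as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The Python sketch below is only an illustration: the .300 league rate is a round-number assumption, the function names are hypothetical, and the stat line is made up.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Share of balls in play (home runs excluded) that fell for hits."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def expected_hits_in_play(balls_in_play: int, league_babip: float = 0.300) -> float:
    """Hits you would expect if balls in play fell in at the league rate."""
    return balls_in_play * league_babip


# Hypothetical stat line against one pitcher.
hits, home_runs, at_bats, strikeouts, sac_flies = 180, 20, 760, 200, 5
balls_in_play = at_bats - strikeouts - home_runs + sac_flies  # 545

print(f"Observed BABIP: {babip(hits, home_runs, at_bats, strikeouts, sac_flies):.3f}")
print(f"Hits in play: {hits - home_runs}, "
      f"expected at league rate: {expected_hits_in_play(balls_in_play):.1f}")
```

A gap between the observed and expected figures is the kind of difference FIP declines to credit or blame the pitcher for.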

It comes down to finding a statistic that matches the question you want to answer. If your question is how well this pitcher prevents runs, looking at the runs they’ve allowed isn’t going to do the trick, as strange as that sounds. Defense plays a huge role in determining runs allowed, and crediting or debiting a pitcher based on the quality of their defense won’t help you evaluate the performance of that pitcher. FIP does a much better job.

This isn’t to say we should ignore ERA or RA9, but rather that we should understand that a pitcher cannot be judged solely by that number and that the difference between ERA and FIP can often provide you with some insight about what has happened around that pitcher. The key is that ERA isn’t telling you what you think it is at first glance. Replacing it in your analysis with something that more appropriately answers your question is a big step forward.

Have FIP questions? Ask them in the comments!
