Archive for October, 2014

Stats To Avoid: Runs Batted In (RBI)

Even the best statistics, things like wRC+, are imperfect. You can’t take wOBA as a perfect measure of truth or be certain that FIP is a perfect estimate of pitcher performance. In many cases, they may be the best we have, but we acknowledge their limitations. While it’s true that even our favorite metrics have flaws, that doesn’t mean we should give equal consideration to extremely flawed statistics.

This post will be the first in a series, scattered across the offseason months, that demonstrates the serious problems associated with some of the more popular traditional metrics. Many of you are well aware of these issues, but plenty of people are reading up on sabermetrics for the first time every day, and our goal here is to create a comprehensive guide that helps everyone get the most out of everything we have to offer. Part of that puzzle is explaining why you might not want to look at things like batting average, RBI, and wins. Today, we’ll start with Runs Batted In (RBI).

How To Use FanGraphs: Player Pages!

The mission of the FanGraphs Library is to make it easier for readers to understand and use our data and site. This means providing information about the statistics and principles we use, but it’s also a place to point out the various features of the site and how to get the most out of the metrics we offer. A couple of months ago, I wrote about our leaderboards and today I will discuss everything you can do on individual player pages.

We’ll be using Lorenzo Cain as an example because he’s the rising star of the moment. Pitcher pages differ only in the specific statistics they offer; the basic format and set of features are the same.

What Exactly is a Projection?

It’s always important to know exactly what question you’re asking. In baseball, one of the most difficult distinctions for many people is the difference between how a player has performed and how the player is going to perform. It is very common to see analysis, even from well-versed fans of advanced metrics, that goes something like this:

“You want Player X up in this situation because he has a .380 wOBA against LHP this year.”

Even if the sample size is sufficient, that statement isn’t an ideal reflection of our expectations about the future. We frequently treat the recent past as an estimate of future skill even though we know it isn’t a reliable one. Certainly, the educated observer doesn’t need to be convinced that the ten most recent plate appearances aren’t useful information on their own, but even the last 600 PA aren’t what you want. What you want, when making a claim about the present or future, is a projection.

Now you might not always be asking a future-oriented question. If you want to decide who the best hitter in baseball was in 2014, you don’t need a projection. If you want to know who has thrown the most effective slider since 2012, you don’t need a projection. But if you want to know who is the best hitter in baseball right now or who is going to be a better signing next year, you want a projection.

A projection is a forecast about the future. It is certainly imperfect; it’s an estimate. Projecting a .400 wOBA doesn’t mean you’d bet $1,000 on that player running a .400 wOBA; it means that’s the best guess for how that player is going to perform. Across many players, some will do better and some will do worse. There’s error in the actual calculations, but the idea behind them is sound.

You want to make decisions about the future based on every relevant piece of data, and you want to weight that data by its importance. Steamer projects that Miguel Cabrera will have a .407 wOBA in 2015. That means that, based on everything Steamer knows about Cabrera’s history and the way players typically age, we should expect a .407 wOBA. Steamer knows that Cabrera had a “down” year in 2014, but it also knows he had a great 2013 and that hitters of his caliber usually age in a certain way. It’s all built in. You don’t just care how a guy did last year or how he did over his career; you care about the entire body of work and the underlying factors driving it.

Think about it like the weather. You want to know if it’s going to rain today. How would you go about predicting whether it will? You would obviously pay some attention to the recent weather, but you would also look at historical weather patterns and at the conditions in and around your area. It rained to your west last night: when that happens, how likely is it that the rain will come your way? There is a certain mix of pressure and airflow; what does that usually lead to? It’s all relevant information.

The same is true for baseball players. You care how Cabrera has hit for the last 600 PA. Those are super important data points, but they aren’t the only ones. You also care about the 600 PA before that. And before that. The older the data, the less important, but it never becomes useless. Additionally, you don’t just care about performance, you care about the underlying numbers.

If a player has a .400 wOBA with a .390 BABIP, you know much of their great season is predicated on getting far more hits on balls in play than average. You wouldn’t automatically expect that .390 BABIP to continue, so you need to determine the typical BABIP regression for players of this type based on everything else you know about them.
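
To make the regression idea concrete, here is a minimal Python sketch of one common way to shrink an observed rate: add a fixed number of “ballast” balls in play at an expected rate to the player’s actual sample. The `regress_babip` helper is hypothetical, and both defaults (a .300 expected BABIP and 800 balls in play of ballast) are illustrative assumptions, not values published by any projection system. In practice the expected rate would reflect what we know about players of this type.

```python
def regress_babip(hits_in_play: int, balls_in_play: int,
                  expected: float = 0.300, ballast: int = 800) -> float:
    """Shrink an observed BABIP toward an expected rate (hypothetical helper).

    Pretends the player also had `ballast` extra balls in play at the
    `expected` rate. Both defaults are illustrative assumptions.
    """
    if hits_in_play < 0 or hits_in_play > balls_in_play:
        raise ValueError("need 0 <= hits_in_play <= balls_in_play")
    return (hits_in_play + expected * ballast) / (balls_in_play + ballast)


if __name__ == "__main__":
    # 156 hits on 400 balls in play is a .390 BABIP.
    print(f"{regress_babip(156, 400):.3f}")
```

Under these assumptions, a .390 BABIP on 400 balls in play regresses to (156 + 240) / (400 + 800) = .330. The larger the real sample, the less the ballast pulls it toward the expected rate.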

You never want to make a decision based on a player’s past performance alone. You want to use that data to make a valid inference about the future, and the process of doing so constitutes a projection. There are all sorts of methods. Some are as simple as taking a couple of years of data and weighting them by recency. Others, like ZiPS, Steamer, and Oliver, use much more advanced methodologies to estimate how well a player will perform, drawing on all sorts of information about that player and about similar players from years past.
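
The simplest kind of projection, a few seasons weighted by recency, might look something like this in Python. It is a sketch, not any real system’s method: the geometric decay (each older season counts 0.8 times as much as the one after it) and the choice to also weight each season by its plate appearances are assumptions for illustration. Because the weights shrink but never reach zero, older data matters less without ever becoming useless. The more advanced systems also account for things like aging and regression toward the mean, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Season:
    woba: float  # observed wOBA for the season
    pa: int      # plate appearances that season


def recency_weighted_woba(seasons: Sequence[Season], decay: float = 0.8) -> float:
    """Blend past seasons into one estimate; `seasons` is most recent first.

    Season i (0 = most recent) gets weight decay**i, further scaled by its PA,
    so recent and larger samples count more. `decay` is an illustrative choice.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    weights = [decay ** age for age in range(len(seasons))]
    denominator = sum(w * s.pa for w, s in zip(weights, seasons))
    if denominator == 0:
        raise ValueError("need at least one season with plate appearances")
    return sum(w * s.pa * s.woba for w, s in zip(weights, seasons)) / denominator


if __name__ == "__main__":
    # Hypothetical hitter: .350 last year, .400 and .380 the two years before.
    history = [Season(0.350, 600), Season(0.400, 600), Season(0.380, 600)]
    print(f"{recency_weighted_woba(history):.3f}")
```

For those three hypothetical 600-PA seasons the blend comes out to about .374: above last year’s .350, because the two stronger seasons before it still count.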

There is no ideal system, but the idea of projection is ideal. You care that the Royals won X games last month, but that doesn’t mean they’ll win X games this month. Last month is relevant, but it isn’t the whole story. Baseball is volatile and unpredictable, so any one sample of data is going to deviate from a player’s true, underlying skill. You want to make the best guess you can about a player’s future and then use that guess to make decisions. That’s projection.

We like projections at FanGraphs. They’re useful for approximating current true talent levels and they help us predict which teams will be successful and which teams won’t. You could guess who was going to win the divisions based on the previous year’s player performances, but those players are going to perform differently this year and you want to account for that.

Many people are turned off by projections because they seem like a black box. If you see that a guy has a .380 wOBA this year but his projection says he’s a .340 wOBA hitter, it’s hard to internalize that a .340 wOBA hitter has produced a .380 wOBA to date. It’s human nature to assume the outcomes we observe are measures of truth, when in reality they are influenced by randomness.
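
To see how far a sample can drift from true talent, here is a rough Python simulation. It treats every plate appearance as a simple success or failure with probability .340 over a hypothetical 400 PA. That is a simplification: wOBA gives extra credit for extra-base hits, so real wOBA samples likely spread out even more than this binomial stand-in. The function names, seed, and trial count are arbitrary choices for the sketch.

```python
import math
import random


def simulate_samples(true_rate: float, pa: int, trials: int = 5_000, seed: int = 2014) -> list[float]:
    """Observed success rate in each of `trials` simulated samples of `pa` PA."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [sum(rng.random() < true_rate for _ in range(pa)) / pa for _ in range(trials)]


def summarize(rates: list[float], threshold: float) -> tuple[float, float, float]:
    """Mean, sample standard deviation, and share of samples at or above threshold."""
    n = len(rates)
    mean = sum(rates) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / (n - 1))
    share = sum(r >= threshold for r in rates) / n
    return mean, sd, share


if __name__ == "__main__":
    true_rate, pa = 0.340, 400
    mean, sd, share = summarize(simulate_samples(true_rate, pa), threshold=0.380)
    print(f"analytic SD: {math.sqrt(true_rate * (1 - true_rate) / pa):.3f}")
    print(f"simulated mean {mean:.3f}, SD {sd:.3f}, share at .380+ {share:.1%}")
```

Under this model, the standard deviation of a 400-PA sample is about √(.340 × .660 / 400) ≈ .024, and roughly 5% of simulated samples land at .380 or better. With hundreds of hitters in the league, some genuine .340 hitters will post .380 marks over a few hundred PA purely by chance.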

So when a stat-geek says they don’t want Player X to hit because the player isn’t great, even though he has a .350 wOBA over his last 400 PA, it can seem like it doesn’t make sense. After all, he has hit well, so he must be good. But that isn’t exactly right. His last 400 PA matter, but they don’t tell the whole story. A projection is trying to tell the whole story.

The systems aren’t perfect and the nature of the beast means they won’t get very many players exactly right, but they do a better job predicting the future than the last six weeks or six months of data will.

But it all comes down to the question. You might not care very much about predicting the future or approximating true talent. If you only care about past value, you can stick to the raw stats. But if you want to say something about how well a player is going to perform and what their true talent is, you want a projection. FanGraphs houses many of these each year, and you can follow along not only with the preseason numbers but also with how they change as data from the new season comes in.

Questions about projections? Ask them in the comments!

The Biggest ERA-FIP Differences of 2014

Fielding Independent Pitching (FIP) is one of the more prominently featured statistics on FanGraphs and one of the bedrocks of sabermetric analysis. We all know that FIP is an imperfect measure of pitcher performance because it assumes average results on all balls in play, but we also know that it does a better job isolating the individual pitcher’s performance than simply looking at their ERA or RA9 because it only looks at strikeouts, walks, home runs, and hit batters. It’s a very informative tool, but it’s a metric derived from a subset of results.
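
For reference, here is FIP’s standard form as a small Python sketch: (13×HR + 3×(BB+HBP) - 2×K) / IP, plus a constant that puts league-average FIP on the same scale as league-average ERA. The real constant changes from season to season; the league totals and the pitcher line below are made up purely for illustration. Note that box-score innings like 180.1 mean 180⅓, so they need converting before use.

```python
def innings(whole: int, thirds: int = 0) -> float:
    """Convert box-score innings such as 180.1 (180 and 1/3) into true innings."""
    return whole + thirds / 3


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int, lg_k: int, lg_ip: float) -> float:
    """Constant that makes league-average FIP equal league-average ERA."""
    return lg_era - (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching: only HR, BB, HBP, and K, per true inning."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(er: int, ip: float) -> float:
    """Earned runs allowed per nine innings."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return 9 * er / ip


if __name__ == "__main__":
    # Hypothetical league totals, chosen only to give a round constant (3.05).
    c = fip_constant(lg_era=4.00, lg_hr=100, lg_bb=300, lg_hbp=50, lg_k=700, lg_ip=1000)
    # Hypothetical pitcher: 180 IP, 15 HR, 45 BB, 5 HBP, 170 K, 70 ER.
    ip = innings(180)
    p_fip = fip(hr=15, bb=45, hbp=5, k=170, ip=ip, constant=c)
    p_era = era(er=70, ip=ip)
    print(f"constant {c:.2f}, FIP {p_fip:.2f}, ERA {p_era:.2f}, ERA-FIP {p_era - p_fip:+.2f}")
```

This prints a FIP of 3.08 against an ERA of 3.50, so the hypothetical pitcher’s ERA sits about 0.42 runs above his FIP, the kind of gap that usually gets a pitcher labeled unlucky.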

When a pitcher’s ERA is significantly different from their FIP, the standard credo is that they were lucky or unlucky, but there are genuine reasons why a pitcher might have results that are better or worse than their FIP. To illustrate this, let’s take a peek at the biggest FIP over- and under-performers of 2014.