Statistic Percentile Charts
How awesome is this chart? It’s simple, easy to understand, and imparts a swath of information all at once. I had no idea what league-average Isolated Power (ISO) was until now, but bam, there it is. This, my friends, is a thing of beauty. Until I saw this chart, I had no idea I needed all this information, but now that I’ve seen it, I want more.
I suppose I shouldn’t be surprised: this work of genius was created by Lee Panas, the writer at Tiger Tales and the author of “Beyond Batting Average”, a concise, well-written book geared toward introducing everyday baseball fans to sabermetric statistics and analysis. It’s a great book and I recommend it highly (although, in the spirit of full disclosure, I must admit that I have a soft spot for Lee: not only am I a fellow saber-ed nut, but like Lee, I’m forced to root for my favorite ball team from afar, stuck in the wintery hellscapes of New England).
But I’m not writing this article as a book review (you can find those elsewhere); I’m writing because of that beautiful graphic up above. One of the complaints I hear most frequently from saber-newbies is that while they want to use these new statistics, they have no idea if the numbers they’re seeing are good or bad. Is a .320 wOBA good? Exactly how bad is a -5 UZR? I know it’s bad, but is it only mildly bad or tear-your-eyes-out bad? And what, pray tell, does a 4.00 tERA mean? It’s one thing to understand the theory behind the statistic, but sometimes understanding its scale can be just as challenging.
And so, I’ve taken Lee’s lead and included similar charts on each of the statistic pages here in the Library. The league-average rates are all accurate, and I’ve estimated percentiles based on the scores of all batters with more than 400 PA and pitchers with more than 90 IP. These percentiles may not be 100% accurate in all instances, but they are close enough to serve as estimates that provide context.
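A minimal sketch of the approach described above, in Python: keep only batters with more than 400 PA, sort their values for a stat, and read off percentiles with linear interpolation. The `Batter` record, the `percentile_table` helper, the chosen percentile levels, and the interpolation convention are all assumptions for illustration, not the method actually used for the Library charts, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    # Hypothetical record: one row per batter-season.
    name: str
    pa: int
    iso: float


def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already-sorted list, p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no qualifying players")
    k = (len(sorted_values) - 1) * p / 100  # fractional position in the list
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def percentile_table(batters, min_pa=400, levels=(10, 25, 50, 75, 90)):
    # Keep only regulars: strictly more than min_pa plate appearances.
    values = sorted(b.iso for b in batters if b.pa > min_pa)
    return {p: percentile(values, p) for p in levels}


# Made-up numbers; the 120-PA batter falls below the cutoff and is ignored.
batters = [
    Batter("A", 600, 0.100), Batter("B", 550, 0.150), Batter("C", 450, 0.200),
    Batter("D", 500, 0.250), Batter("E", 620, 0.300), Batter("F", 120, 0.400),
]
for p, value in percentile_table(batters).items():
    print(f"{p}th percentile: {value:.3f}")
```

For pitchers, the same idea would filter on innings pitched (more than 90 IP) instead. The interpolation here matches NumPy’s default `percentile` behaviour; other conventions give slightly different numbers on small samples.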
Thanks again to Lee for the inspiration. If you like the charts, go check out some of his work.
Piper was the editor-in-chief of DRaysBay and the keeper of the FanGraphs Library.
This is great. Can it be broken down by position, as well?
It could, but I didn’t want to do that for every stat page since it would a) take a massive amount of time, and b) make the pages really crowded and tough to digest. This is something I could research and write about going forward, though.
Is this chart based on the 3.1 PA/game standard?
Okay, I see it now: “more than 400 PA and pitchers with more than 90 IP.” It might make more sense to weight by PAs and batters faced. Many PAs go to players who never reach 400, like utility infielders, and relievers can be great and never reach 90 IP. Just a thought.
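The commenter’s idea of weighting by playing time could look something like the sketch below. It uses one common definition of a weighted percentile (the smallest value whose cumulative weight reaches p percent of the total), which is an assumption here and not the method used for the charts. Every player counts, with plate appearances as the weight; for pitchers, batters faced would play the same role. The function name and the numbers are hypothetical.

```python
def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight reaches p percent of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    pairs = sorted(zip(values, weights))  # sort players by stat value
    target = sum(weights) * p / 100
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # guard against float rounding when p is 100


# Made-up wOBA values with plate appearances as weights; no PA cutoff applied.
woba = [0.280, 0.300, 0.320, 0.360]
pa = [100, 300, 500, 600]
print(weighted_percentile(woba, pa, 50))
```

Because the high-PA players dominate the weights, the weighted median in this example lands on .320, whereas an unweighted median of the same four players would sit halfway between .300 and .320.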
Oh, I know. When I calculate stuff for relievers, I set the bar lower (around 10 IP) to get a nice big sample.
I know there are probably more accurate ways to do it, but I tested a couple of them more rigorously and found that the results come in close enough to not make a difference. The league average rates are all accurate, as I said, so the percentiles are there to add some context. If they’re off by a couple decimals, it shouldn’t affect anyone’s understanding.
Thanks for the thoughts, though.
This is going to be very useful, Steve. Thanks!
Suggestion: the typical “number of events” for stats to become significant/have predictive power. E.g., how many PAs does it take for OBP, BA, BABIP, etc. to become significant? How many IP/BF for HR/9?
The main reason I ask this is that lots of different stats are used to make a point about how a player performed or will perform, with little context to indicate whether those stats come from a significant sample. While I assume we can trust FG writers to avoid articles that rely on too-small sample sizes, it would help in evaluating articles from outside FG, our own ideas, etc.
And of course you beat me to it: http://www.fangraphs.com/library/index.php/principles/sample-size/
Maybe put this info on each of the stat primers, where appropriate?
Steve, your library is outstanding. It’s a really nice guide for people new to sabermetrics. I’m glad I was able to make a small contribution with the percentile chart idea. Your addition of player names along with percentiles is a great idea.
Thanks, Lee! Again, great idea with the percentiles. I love them.
Kudos, Lee.
I’ve been clamoring for these percentile breakdowns for years! Fantastic.
Not to be pedantic, but it’s confusing that the middle row in the charts claims to be both the 50th percentile (i.e., the median) and the average. It can’t be both, since the stat distributions generally aren’t symmetric.
Also, what do you mean by “estimated the percentiles”? Does it simply mean that you’re calculating the sample percentiles of end-of-season stats for players with more than 400 PA/90 IP? If so, I think the word “estimated” is misplaced.
I would suggest listing the percentiles as you currently do, except substituting the median for the average. The average should then be included as a separate row, as it is providing different information (it is independent of playing time definitions).
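The layout suggested above can be sketched in Python with the standard `statistics` module: the 50th-percentile row is the median of qualified players, and the league average gets its own row computed from league totals, so it does not depend on the playing-time cutoff. Treating the league average as total events over total opportunities is an assumption here, as are the `chart_rows` name and the made-up numbers.

```python
import statistics


def chart_rows(qualified_values, league_events, league_opportunities):
    """Percentile rows from qualified players plus a separate league-average row."""
    # "inclusive" quartiles interpolate linearly between the observed values.
    q1, median, q3 = statistics.quantiles(qualified_values, n=4, method="inclusive")
    return [
        ("25th percentile", q1),
        ("50th percentile (median)", median),
        ("75th percentile", q3),
        ("League average", league_events / league_opportunities),
    ]


# Made-up, right-skewed ISO values for five qualified hitters.
iso = [0.100, 0.120, 0.130, 0.140, 0.300]
for label, value in chart_rows(iso, league_events=1500, league_opportunities=10000):
    print(f"{label}: {value:.3f}")

# With a skewed sample, the mean of qualified players is pulled above the median.
print(statistics.mean(iso), statistics.median(iso))
```

For ISO, for example, the events would be total bases minus hits and the opportunities would be at-bats.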
Whoops! Great points, Mikkel. This is why I’m not a statistician/researcher. I just assumed the 50th percentile was the average, but yeah, that’s not the case. My mistake.
Practically speaking, it’s going to be really time-consuming for me to go back and change all of those, and I simply don’t have the time right now. I will keep it in mind for when I get around to my next batch of changes to the site. In the meantime, ignore the 50th-percentile label: the values in those columns are means.
And yeah, I’d say “estimated” is the correct word, because I’m not about to guarantee that all these percentiles are 100% correct. In many cases, I had to weigh technical accuracy (using a sample of every player to reach the majors last year) against relevancy (using a sample of all MLB regulars last season). I mostly sided with relevancy, since this is meant to be a practical learning tool more than anything, but you could make a case that they aren’t 100% accurate. I wanted to hedge my bets and remind people of the limitations of the numbers.
That sounds good to me! The problem with using the word “estimate” is that it has a precise meaning in statistics. In essence, all the stats on FanGraphs are estimates of true ability. What you have done with your percentiles is to stick to a certain definition of your sample and then compute some summary statistics. That is perfectly acceptable and shouldn’t be considered any more of an “estimate” than any other number on the site.
Thanks again for the effort!
How can you make a similar chart for your own statistics over the course of this upcoming season?
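One way to put your own numbers on such a scale is to compute a percentile rank: the share of a reference group that falls below a given value. The sketch below counts ties as half below and half above (a common mid-rank convention, chosen here as an assumption); the function name and the sample wOBA values are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(values, x):
    """Percent of values below x, with values equal to x counted as half below."""
    if not values:
        raise ValueError("need at least one reference value")
    s = sorted(values)
    below = bisect_left(s, x)  # number of values strictly less than x
    ties = bisect_right(s, x) - below  # number of values equal to x
    return 100 * (below + 0.5 * ties) / len(s)


# Made-up wOBA values for a group of qualified hitters.
league = [0.290, 0.305, 0.315, 0.320, 0.335, 0.350, 0.370, 0.400]
print(percentile_rank(league, 0.320))
```

Rerunning this as the season’s numbers come in shows how a given value moves against the reference group over time.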