The Beginner’s Guide To Splits
One fundamental aspect of baseball is how well you can describe every situation. With modern technology, this is becoming true of every sport, but the ability to categorize baseball stretches far back into history. We don’t have batted-ball type from 1951, but we know who was batting and pitching, the number of outs and base runners, the inning, the score, the park, and loads of other information for every plate appearance. The data gets better as we approach the present day, but organizing the game based on a collection of variables has a long history.
Want to know how well a batter hit against left-handed pitchers at home in 2004? That’s a question we can answer using something commonly known as “splits.” Some splits, such as handedness splits, are particularly meaningful. Others, such as day of the week splits, are entirely trivial. The compartmentalized nature of baseball, in which individual events occur in an orderly fashion, allows us to record tons of data and then sort it later however we choose.
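Mechanically, a split is just the same stat computed on a filtered subset of plate appearances. The sketch below groups a handful of made-up plate appearances by one field and computes a simplified on-base percentage (times on base divided by PA) for each group. The records, the field layout, and the `split_obp` helper are assumptions for illustration, not real data or any site's actual format.

```python
from collections import defaultdict

# Hypothetical plate-appearance records: (pitcher_hand, location, reached_base)
plate_appearances = [
    ("L", "home", True),
    ("L", "home", False),
    ("L", "away", False),
    ("R", "home", True),
    ("R", "away", True),
    ("R", "away", False),
    ("R", "home", False),
    ("R", "home", True),
]

def split_obp(records, key_index):
    """Return {split value: simplified OBP}, grouping on one field of each record."""
    times_on_base = defaultdict(int)
    pa_count = defaultdict(int)
    for record in records:
        key = record[key_index]
        pa_count[key] += 1
        times_on_base[key] += int(record[2])
    return {key: times_on_base[key] / pa_count[key] for key in pa_count}

by_hand = split_obp(plate_appearances, 0)      # split on pitcher handedness
by_location = split_obp(plate_appearances, 1)  # split on home/away
print(by_hand)
print(by_location)
```

The same records can be sliced on any recorded variable, which is why the number of possible splits is so large.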
This post covers basic information you will want to know as you dive into this realm of baseball data. If you’re an advanced consumer of baseball statistics, you probably won’t find a ton of information that’s new to you.
Basics
The most important thing to remember when dealing with splits is that the results are only as meaningful as the underlying concept. For example, there is a long history of research and observation indicating that most hitters do better against pitchers of the opposite hand (LHH vs RHP and RHH vs LHP) than against pitchers of the same hand (LHH vs LHP and RHH vs RHP). The inverse is true for pitchers. This is known as a platoon split. In other words, the handedness of the batter and pitcher has a meaningful impact on the likely outcome of the plate appearance.
On the other hand, some splits are simply a product of random variation. Day of the week splits (Monday, Tuesday, etc.) are the clearest example of the other extreme. There’s no underlying reason why a player would perform better on certain days than others, but if you carve performance up into seven groups, you will find differences simply due to the fluctuation that occurs in the game. Splits like these don’t tell us anything about the player other than what happened over a given period of time. They can be interesting, but they are very different from a guy’s numbers against lefties, for instance.
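To see how this kind of noise arises, here is a minimal simulation sketch. It assumes a hypothetical hitter whose true OBP is exactly the same every day of the week and treats each plate appearance as an independent coin flip. The .330 OBP, the 600 PA, and the seed are arbitrary choices for the example.

```python
import random

def simulate_day_of_week_obp(true_obp=0.330, total_pa=600, seed=1):
    """Simulate a hitter whose true OBP is identical every day of the week."""
    rng = random.Random(seed)
    days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    on_base = {day: 0 for day in days}
    pa = {day: 0 for day in days}
    for i in range(total_pa):
        day = days[i % 7]  # spread plate appearances evenly across the days
        pa[day] += 1
        if rng.random() < true_obp:
            on_base[day] += 1
    return {day: on_base[day] / pa[day] for day in days}

obp_by_day = simulate_day_of_week_obp()
spread = max(obp_by_day.values()) - min(obp_by_day.values())
for day, obp in obp_by_day.items():
    print(f"{day}: {obp:.3f}")
print(f"Best day minus worst day: {spread:.3f}")
```

Any gap between the best and worst day here comes purely from chance, because the simulated hitter has no day-of-the-week skill by construction. Changing the seed changes which day looks "best."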
Second, it’s important to think about sample size. As you likely know, you learn a lot more about a player in 400 PA than you do in 40. This is true over a full season or within a split. You need to pay attention to the sample size for each part of the split and not the overall sample size. A platoon split over 300 total PA won’t tell you very much because you’re likely only looking at 75-100 PA against lefties, which is simply not enough to tell you much about the hitter. You also have to be careful about using more and more seasons to increase your sample size because player talent changes and data from five years ago isn’t as informative as data from this year.
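One rough way to put numbers on this is the binomial standard error, which treats every plate appearance as an independent trial with the same chance of reaching base. That is a simplification, and the .320 OBP below is an assumed value, but the sketch shows how quickly the uncertainty grows as the sample shrinks.

```python
import math

def obp_standard_error(true_obp, pa):
    """Binomial standard error of an observed OBP over `pa` plate appearances."""
    return math.sqrt(true_obp * (1 - true_obp) / pa)

for pa in (40, 100, 400):
    se = obp_standard_error(0.320, pa)
    print(f"{pa:>3} PA: roughly +/- {se:.3f} (one standard error)")
```

Because the error shrinks with the square root of the sample, the uncertainty at 40 PA is a little more than three times the uncertainty at 400 PA. The relevant number is the PA inside the split, not the season total.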
Third, you have to pay close attention to the potential for variance in the quality of competition. This is perhaps most evident for left-handed relievers, who come into the game to face lefties and often get pulled before a good righty comes to the plate. For this reason, the average quality of the right-handed hitters these relievers face is lower than that of the average right-handed hitter. This means that their true talent against righties is probably worse than their statistics show because the very best righties aren’t part of the sample. This can manifest in other ways, but the platoon issue is the most obvious.
For these reasons, it’s important to note that observed splits (i.e., the actual stats) are not a direct reflection of a player’s true talent splits (i.e., what you would expect going forward). Now, you might be interested in observed splits for informational reasons, but it’s important not to conflate the two. Just because a player has a .340 wOBA with runners in scoring position compared to a .310 wOBA without men in scoring position, it does not mean that his true talent split is +.030, even if he does have an RISP skill. Observed splits are one part true talent and one part random variation.
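A common way to express the "one part true talent, one part random variation" idea is to shrink the observed split toward a prior, with more shrinkage when the sample is small. The sketch below is one simple version of that idea, not a published method. The `regression_pa` constant, the prior of zero, and the 150 PA sample are all hypothetical values chosen for illustration.

```python
def regress_split(observed_split, pa, prior_split=0.0, regression_pa=1000):
    """Shrink an observed split toward a prior.

    regression_pa is a hypothetical constant: the number of PA at which the
    observed split and the prior get equal weight. It is not a published value.
    """
    weight = pa / (pa + regression_pa)
    return weight * observed_split + (1 - weight) * prior_split

observed = 0.340 - 0.310  # the wOBA gap from the example above
estimate = regress_split(observed, pa=150)  # 150 PA is an assumed sample size
print(f"Observed split: {observed:+.3f}, regressed estimate: {estimate:+.3f}")
```

Under these assumptions, most of the observed +.030 gap gets attributed to noise. A split with a well-established underlying cause, such as a platoon split, would call for a nonzero prior and a different constant.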
A great cautionary tale is batter-pitcher matchup stats. You see these featured on lots of broadcasts, but most research has found they are not predictive of future matchups. This is because these matchup stats are based on a small number of PA and occur over a long period of time in which the players change. If you faced a guy 100 times over two weeks, that might be useful data, but if you faced him 40 times over five years, you’re just not going to have a lot to work with. The idea that a certain pitcher is good against a certain batter makes plenty of sense, but you’re likely not going to be able to tell the real from the random by looking at matchup stats.
Types of Splits
There are many different kinds of splits you can apply to a player’s numbers. Handedness splits are quite common, but you can look at all kinds of things:
- Home/away/ballpark,
- Situational (men on base, bases empty, etc.),
- Times through the order,
- Role (starter/reliever),
- Positional (as 1B, as 2B, etc.),
- Batted-Ball (on grounders, on fly balls, pulled, hard-hit, etc.),
- Against certain teams/opponents,
- Leverage (low leverage, high leverage, etc.),
- Time (April, May, last week, since June 15, etc.),
- Batting order (hitting 2nd, hitting 8th, etc.),
- and many others.
Things like platoon splits and times through the order can be quite informative, but even the splits that don’t carry a lot of weight can have some value. How well a player performs against a particular team usually doesn’t tell you much about how they will perform in the future, but if they have been particularly bad for a long period of time, that might suggest the opposing team has a good scouting report on them. Even when the splits are largely trivial, there can be something useful hidden in there.
Summary
When dealing with splits, it’s important to pay attention to sample size and the underlying concept the split is conveying. If a player has a large platoon split over 1,500 PA, there’s a good chance that split is communicating something meaningful. It’s also important to think about the data contained in the split, as some players are selected in/out of a situation because of the very split you want to test. If you’re terrible against left-handed pitching, you’re likely never going to face the best left-handed pitching.
You can find splits on our player pages, on the dropdown menu in the center of our leaderboard dashboard, or at the many other fine sites that publish baseball statistics. Don’t put too much stock into a split unless you’ve really studied it, but even the ones that are meaningless can still be fun to explore.
Neil Weinberg is the Site Educator at FanGraphs and can be found writing enthusiastically about the Detroit Tigers at New English D. Follow and interact with him on Twitter @NeilWeinberg44.
How do I make a table showing most home runs over x number of consecutive games going back to the live ball era?
We don’t offer that option with any of our data. You might be able to do that with B-Ref’s Play Index, but you will probably have to use some sort of Retrosheet data and a database program.
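If you do go the Retrosheet route, the core of the calculation is a sliding window over a player's game-by-game home run totals. The sketch below assumes you have already built that list in chronological order from game logs; the `most_hr_in_span` helper and the sample numbers are made up for illustration.

```python
def most_hr_in_span(hr_by_game, span):
    """Return (most HR, start index) over any `span` consecutive games."""
    if span <= 0 or span > len(hr_by_game):
        raise ValueError("span must be between 1 and the number of games")
    window = sum(hr_by_game[:span])
    best, best_start = window, 0
    for end in range(span, len(hr_by_game)):
        # Slide the window: add the newest game, drop the oldest one.
        window += hr_by_game[end] - hr_by_game[end - span]
        if window > best:
            best, best_start = window, end - span + 1
    return best, best_start

game_log = [0, 1, 2, 0, 0, 3, 1]  # made-up HR totals, one entry per game played
print(most_hr_in_span(game_log, 3))
```

Running this for every player and sorting by the first value of the result would give you the table. Ties go to the earliest stretch.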