The Projection Rundown: The Basics on Marcels, ZiPS, CAIRO, Oliver, and the Rest

by Piper Slowinski
February 16, 2011

Now that football season is over and baseball is once again close at hand, Projection Season is well underway. Fantasy players, analysts, bloggers, and plain ol’ fans – everyone turns to projections for help this time of year. The Hot Stove has cooled down and Spring Training has just started, so really…what else is there to do?

With that in mind, I’ve got a handful of posts on projections in the works for the next week. This is the first one, and in it I deal with a basic question: what are the different projection systems available, and how is each of them calculated? To use each projection properly, it’s always a good idea to understand what data it takes into account and how it uses that data.

Remember: there is no one “gold standard” for projection systems. Each system will tell you something slightly different, so whenever you try to draw conclusions from projections, it’s best to use as many sources as possible. You can also find this information on the Library’s Projections page, so it’ll always be available there as a reference.

– Marcel –

Developed by Tom Tango, Marcel is a simple projection system that is still quite reliable. I’ll let Tango do the explaining:

“The Marcel the Monkey Forecasting System (or the Marcels for short) is the most advanced forecasting system ever conceived. Not. Actually, it is the most basic forecasting system you can have, that uses as little intelligence as possible. So, that’s the allusion to the monkey. It uses 3 years of MLB data, with the most recent data weighted heavier. It regresses towards the mean. And it has an age factor.”

Theoretically, projections that do more work than Marcels (like ZiPS, Bill James, CAIRO, Oliver, and PECOTA) should be more accurate, but in the past, those systems have been only slightly more accurate than Marcel.
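The three ingredients Tango lists (weighted seasons, regression toward the mean, and an age factor) fit in a few lines of Python. This is a minimal sketch for a single hitter rate stat. It assumes the parameters commonly cited for Tango’s published Marcel method: season weights of 5/4/3, regression by adding 1,200 PA of league-average performance, an age adjustment centered on age 29, and a playing-time forecast of 0.5 times last season’s PA plus 0.1 times the prior season’s PA plus 200. None of those numbers appear in this post, so treat them as assumptions. The `Season` class and the function names are hypothetical, and the player’s stats are made up.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One season of a hypothetical hitter's line."""
    pa: int      # plate appearances
    events: int  # count of the stat being projected, e.g. home runs


def marcel_rate(seasons, league_rate, age,
                weights=(5, 4, 3), regress_pa=1200, peak_age=29):
    """Project a per-PA rate from up to three seasons, most recent first.

    The default parameters are assumed Marcel values, not taken from the post.
    """
    # 1. Weight the seasons so the most recent one counts the most.
    w_events = sum(w * s.events for w, s in zip(weights, seasons))
    w_pa = sum(w * s.pa for w, s in zip(weights, seasons))

    # 2. Regress toward the mean by mixing in `regress_pa` PA of league-average play.
    rate = (w_events + league_rate * regress_pa) / (w_pa + regress_pa)

    # 3. Age factor: boost players younger than the peak age, dock older ones.
    if age > peak_age:
        age_adj = (peak_age - age) * 0.003
    else:
        age_adj = (peak_age - age) * 0.006
    return rate * (1 + age_adj)


def marcel_pa(seasons):
    """Playing-time forecast: 0.5 * last season's PA + 0.1 * the season before + 200."""
    last = seasons[0].pa if len(seasons) > 0 else 0
    prior = seasons[1].pa if len(seasons) > 1 else 0
    return 0.5 * last + 0.1 * prior + 200


# Hypothetical 27-year-old hitter: his last three seasons, newest first.
history = [Season(pa=600, events=30), Season(pa=550, events=25), Season(pa=500, events=20)]
hr_rate = marcel_rate(history, league_rate=0.03, age=27)
projected_pa = marcel_pa(history)
print(round(hr_rate * projected_pa, 1))  # 24.6
```

The fixed 1,200 regression PA matter most for players with short track records: a hitter with only one partial season gets pulled much closer to the league rate than a regular with three full seasons, which is the “regresses towards the mean” step in Tango’s description.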
Even though it is very basic, the Marcel system is still quite accurate and serves as a good reference point when looking at other projections. The 2011 Marcel projections can be found here and on FanGraphs.

– Bill James –

Created by Baseball Info Solutions, the Bill James projections use at most eight seasons of data per player, with a strong focus on the previous three. While the exact methodology is proprietary, the Bill James projections are based on past performance, age, home park, and expected playing time. These projections tend to be the most optimistic of all the major systems, especially for young players.

– ZiPS –

The work of Dan Szymborski over at Baseball Think Factory, ZiPS uses weighted averages of four years of data (three if a player is very old or very young), regresses pitchers based on DIPS theory and BABIP rates, and adjusts for aging by looking at similar players and their aging trends. It’s an effective projection system, and FanGraphs displays it for both off-season and in-season projections.

– Oliver –

This system was created by Brian Cartwright and is available over at The Hardball Times. It’s a comparatively simple projection system – using weighted averages of the past three seasons of data, and adjusting for aging and regression – but it calculates its major league equivalencies (MLEs) differently than most systems do, taking the raw minor league numbers and adjusting them based on park and league. Since most projection systems simply try to adjust for the transition between each minor-league level, Oliver’s projections are better at showing how young players will perform at the major league level. This is also the only projection system to include a fielding and WAR component.
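Oliver’s direct approach (raw minor league numbers adjusted for park and league) can be illustrated with a deliberately simplified sketch. The function name and the factor values are hypothetical placeholders, not Cartwright’s actual factors or formula; the sketch only shows the order of operations: first remove the home park’s influence, then scale the neutralized rate to the major league level.

```python
def translate_rate(raw_rate, park_factor, league_factor):
    """Convert a raw minor league rate into a rough major league equivalent.

    park_factor:   above 1.0 means the home park inflates this stat (hypothetical scale).
    league_factor: how much of the stat carries over from this league to MLB;
                   below 1.0 means the stat shrinks against major league competition.
    """
    neutral_rate = raw_rate / park_factor  # strip out the home park's effect
    return neutral_rate * league_factor    # scale to the major league level


# Hypothetical Triple-A hitter: .300 batting average in a hitter-friendly park.
mle_avg = translate_rate(0.300, park_factor=1.05, league_factor=0.90)
print(f"{mle_avg:.3f}")  # 0.257
```

A real MLE system would estimate separate factors for each stat, league, and park from players who moved between levels; the single pair of numbers here stands in for that work.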
– CAIRO –

A system developed by the folks at Revenge of the RLYW, CAIRO starts with a basic Marcel projection model, then adds minor league statistics, adjusts for park and league effects, varies the aging curve by statistic, takes age and position into account when regressing a player’s performance, and uses four years of data instead of three. These projections are then run through the Diamond Mind simulator, and team projections are estimated from the results of 50,000 simulations. 2011 projections can be found here.

– Fans –

During the off-season following the 2009 season, FanGraphs began the Fan projections, which rely on a “wisdom of the crowds” approach to evaluating a player. Fans are asked to fill out ballots on various players, predicting how those players will perform in the upcoming season. The ballots are then compiled and averaged for each player, giving us that player’s Fan projection. These projections are normally quite optimistic, but in some cases they can add real value for players who follow an unusual career path. They’re also a good way to estimate a player’s potential playing time, a variable that most projection systems struggle with.

– PECOTA –

Developed by Nate Silver and Baseball Prospectus, PECOTA is one of the more complicated projection models, combining a player’s own statistics with the historical statistics of similar players to arrive at a projection. Colin Wyers has done work in recent years to improve PECOTA’s accuracy, and a stripped-down version of PECOTA has been shown to be as effective as the Marcel projection system (implying that the full PECOTA would be slightly more accurate). PECOTA also produces team-level projections and lists comparable historical players for each projection. You can find PECOTA at the Baseball Prospectus website.

– CHONE –

Developed by Sean Smith, this system used four years of data for hitters and three years for pitchers.
It adjusted for park, league, and aging effects, and it also used batted ball data and minor league statistics. CHONE was widely considered one of the most accurate projection systems, but it is no longer available to the public.

For more on the accuracy of each projection system, I recommend reading Tom Tango’s recent study.
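The advice at the top of this post, to consult as many systems as possible, can be sketched as a simple consensus: average one stat across several systems, optionally giving less weight to systems you expect to run hot. The system names come from this post, but the home run figures are made-up placeholders, and counting the Bill James and Fan projections at half weight is only an illustrative choice based on their reputation for optimism, not a recommendation from any published study.

```python
def blend(projections, weights=None):
    """Weighted mean of one stat across systems; equal weights if none are given."""
    if weights is None:
        weights = {name: 1.0 for name in projections}
    total_weight = sum(weights[name] for name in projections)
    weighted_sum = sum(value * weights[name] for name, value in projections.items())
    return weighted_sum / total_weight


# Hypothetical home run projections for one player from five systems.
hr = {"Marcel": 24, "ZiPS": 26, "Bill James": 29, "Oliver": 25, "Fans": 31}

# Illustrative choice: count the two most optimistic sources at half weight.
half_weight = {name: 0.5 if name in ("Bill James", "Fans") else 1.0 for name in hr}

print(blend(hr))               # 27.0
print(blend(hr, half_weight))  # 26.25
```

An equal-weight average is the simplest defensible default; any other weighting should come from evidence about each system’s track record, such as the accuracy study mentioned above.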