Measuring Pitching Value is Complicated

by Neil Weinberg
May 4, 2015

You’re likely aware that there are different versions of Wins Above Replacement (WAR) housed here, at Baseball-Reference, and at Baseball Prospectus (called WARP). For a lot of people, this makes the statistic confusing because it seems like there shouldn’t be multiple ways to calculate something with the same name. To the credit of the critics, somewhere along the way we should have agreed on a clearer way to communicate which statistic is which than fWAR, rWAR, and WARP, but that’s not the focus of the discussion today.

When it comes to WAR for position players, the differences among the models are less philosophical and more technical. The sites use different defensive components, different base running stats, and a few other differences in the same vein, but the overall approach is pretty much equivalent. The inputs are different, but the different WARs agree on what should be measured. When it comes to pitching, it gets more complicated because what should be measured becomes the debate itself. This article doesn’t intend to tell you which WAR is best, but rather to walk through the decisions that one needs to make when evaluating a pitcher’s value.

It always starts with a question, and in this instance, we want to know how valuable a pitcher is to his team overall. Leaving aside pitchers who bat in the NL, the goal of pitching is to prevent runs. When your team is in the field, the goal is to allow the other team to score the fewest number of runs possible and the pitcher is the central figure in that pursuit.

So the more precise question is how well does a pitcher prevent runs, and then of course, how often does the pitcher do that?

This is a question that sounds easy but winds up being pretty complicated. It seems like you should be able to take the number of runs a pitcher allows and be done with it, but unfortunately pitchers aren’t the only ones involved in that part of the game, so we need some way of isolating the responsibility of the pitcher. In other words, we want to know how many runs the pitcher is responsible for and over how many batters faced or innings pitched.

And this is where the various WARs diverge quickly. The goal of each is to measure the role of the pitcher in run prevention, but they all take a very different approach to doing so. At FanGraphs, we use Fielding Independent Pitching (FIP) as the building block. FIP-based WAR is all about charging the pitcher for the parts of the game we know don’t involve his defense. So fWAR considers strikeouts, walks, hit batters, home runs, and infield flies. There’s more complexity than that, but the basic assumption is that stripping out defense works best when assuming equal results on balls in play.
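
As a rough illustration of the building block described above, here is a minimal sketch of the standard FIP formula with an optional infield-fly term that treats infield flies like strikeouts. The function name, the default constant of 3.10, and the sample stat line are assumptions for illustration only; the real constant is set each season so that league FIP matches league ERA, and the infield-fly variant uses its own constant.

```python
def fip(hr, bb, hbp, k, ip, iffb=0, constant=3.10):
    """Fielding Independent Pitching for one pitcher.

    hr, bb, hbp, k: home runs, walks, hit batters, strikeouts allowed.
    ip: innings pitched as a true decimal (200 1/3 innings -> 200.333...),
        not the box-score shorthand "200.1".
    iffb: infield fly balls, counted like strikeouts when nonzero.
    constant: ASSUMED league constant; it changes every season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * (k + iffb)) / ip + constant


# Hypothetical stat line, not a real pitcher.
standard = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
with_infield_flies = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, iffb=10)
print(round(standard, 3), round(with_infield_flies, 3))
```

Nothing that happens on a ball in play, other than the optional infield flies, enters the calculation, which is the "equal results on balls in play" assumption in code form.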

No one would argue that pitchers have no control over anything that happens on balls in play, but fWAR treats balls in play that way because pitchers don’t have that much control over them and it’s more accurate to assume average responsibility than to try to carve it up with the available data. Again, everyone will tell you that this is not a perfect way to measure pitcher value, as it chooses to make an assumption about a part of the question we can’t answer very well right now.

At Baseball-Reference, they use runs allowed as the driving force and then they adjust that mark based on the quality of the team’s defense (among other things). In other words, rWAR takes the actual number of runs allowed and then pushes it in one direction or the other based on what we believe to be the quality of the defense played behind the pitcher. This, again, is not a perfect solution to the problem, as pitchers are affected by defense in different ways and the aggregate adjustment made at Baseball-Reference isn’t granular enough to account for that because we’re limited by the available data.
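
The idea of pushing runs allowed up or down by defensive quality can be sketched roughly as follows. This is not Baseball-Reference's actual formula. The function name, the choice to prorate team defense by the pitcher's share of balls in play, and the sample numbers are all assumptions made for illustration.

```python
def defense_adjusted_runs(runs_allowed, team_defense_runs_saved,
                          pitcher_balls_in_play, team_balls_in_play):
    """Charge a pitcher for runs his fielders saved (or cost) behind him.

    team_defense_runs_saved: positive for a good defense, negative for a bad one.
    The team total is spread over pitchers by share of balls in play, which is
    a simplifying ASSUMPTION: every pitcher gets the same help per ball in play.
    """
    if team_balls_in_play <= 0:
        raise ValueError("team balls in play must be positive")
    share = pitcher_balls_in_play / team_balls_in_play
    # A good defense hid some runs, so the pitcher's adjusted total goes up;
    # a bad defense added runs, so the adjusted total goes down.
    return runs_allowed + team_defense_runs_saved * share


# Hypothetical pitcher: 80 runs allowed, 500 of the team's 4000 balls in play.
print(defense_adjusted_runs(80, 20, 500, 4000))   # good defense behind him
print(defense_adjusted_runs(80, -20, 500, 4000))  # poor defense behind him
```

The single `share` factor is where the granularity problem lives: a ground-ball pitcher in front of a strong infield and a fly-ball pitcher in front of a weak outfield would receive the same per-ball adjustment here.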

At Baseball Prospectus, their new DRA-based WARP is all about controlling for as much context as possible. They’re basing the measure on run expectancy allowed and then they use an advanced modeling strategy to attempt to control for all kinds of things like defense, catching, umpiring, weather, etc. The potential drawback here is that WARP may be overfitting the model and assuming its adjustments work equally well in all situations.

All of the WAR versions are useful models of pitching value, but as the old adage goes, all of the models are also wrong. Each model makes choices and assumptions that constrict the accuracy of the results, which is true any time you model anything.

But what you observe when looking at the three versions of WAR is that all three take a very different approach to addressing the exact same question. Based on how a pitcher actually pitched, how many runs should they have allowed? What is the pitcher’s specific responsibility? That’s the question at hand.

FanGraphs has a method to answer that question. So does Baseball-Reference. So does Baseball Prospectus. The issue is that we don’t actually know which method is best. The Baseball Prospectus model is brand new, so some of their technological advances are clear improvements, but their fundamental building blocks are still built on the shaky foundation that all pitching metrics are built upon.

We have to make a lot of choices when we measure pitching. What units matter? Do we judge by the pitch? By the plate appearance? By the inning? By the game? It may seem trivial, but you can strike out three batters in an inning while facing only those three hitters or you can do it while walking two guys in between. Per inning, that’s a great strikeout rate, but per PA, it’s a little worse. Is a pitcher who gets ahead 0-2 a lot but gives up some weak singles the same as a pitcher who gets behind 2-0 and gives up solid singles? Does it matter as much if a pitcher gives up four runs in a game that his team is winning by 10 as it does when he gives up four runs in a tie game?
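
The strikeout example above works out as follows. The helper names are assumptions for illustration; the two innings are the ones described in the paragraph.

```python
def k_per_9(strikeouts, innings):
    """Strikeouts per nine innings: the per-inning view."""
    return 9 * strikeouts / innings


def k_pct(strikeouts, batters_faced):
    """Strikeouts per plate appearance: the per-batter view."""
    return strikeouts / batters_faced


# Inning A: three batters faced, three strikeouts.
# Inning B: five batters faced, three strikeouts and two walks.
for label, batters_faced in (("A", 3), ("B", 5)):
    print(label, k_per_9(3, 1), round(k_pct(3, batters_faced), 3))
```

Both innings come out to 27 strikeouts per nine, but the strikeout rate per plate appearance falls from 100% to 60% once the two walks are counted, so the choice of denominator changes how the same inning is graded.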

Everyone believes in removing defense from the equation, but how can you possibly remove something so closely tied to run prevention? Should you just assume a pitcher can’t control what happens once a ball is put in play? Should you try to adjust their overall runs allowed based on defense? Should you try to control for everything that’s happening when a ball is put into play? How well can we do any of those things based on the data we have? Would launch angle and velocity help?

How do we adjust for opponent quality? How about park factors? Do the umpire and the catcher matter?

These are all important questions without clear answers. I won’t tell you which WAR is right because none of them are right. They’re the manifestation of different answers to the questions I just posed. They’re different in design so they’re different in the results they offer.

You want to be able to look at a pitcher’s body of work and make some estimate of how many runs he should have allowed relative to his peers in the same context or some generic context. There are a lot of ways to get to an answer.

Imagine a single start in which the pitcher allowed three runs in eight innings. That’s not enough information to tell you how well the pitcher pitched, but the tough question we are all asking is what additional information do we need to properly evaluate the start? We care about the opponent, the park, the defense, the catcher, the umpire, and we also care about how those three runs happened. Did the pitcher walk a lot of hitters? Was the contact hard? Did the ball take a lucky hop or a terrible bounce?

In a perfect world, you would somehow simulate thousands of games at every turn. It’s not just about the runs allowed, controlling for defense. That’s a blunt tool for a very nuanced job. FIP is blunt. DRA is less blunt, but it’s also averaging out a lot of effects that might operate in very complex ways. There’s a lot we don’t know about how to properly credit a pitcher for his performance.

Ultimately, the correct answer is to use each tool as it was intended to be used. Each WAR does a fairly nice job of approximating a pitcher’s value, and most of the time they’re close enough that you can be confident about the general tier in which a given pitcher has performed. Sometimes they disagree, and when they do it could be because of randomness in the various components of the models or because you’re looking at a pitcher whose skills and value are complicated to understand.

We want to know which pitchers contribute most effectively to run prevention, but due to all of the things outside of a pitcher’s control that affect run prevention, we have to make some decisions about how to isolate the pitcher. At this point, no one has figured out the ideal way to do that. Until someone does, you should learn about the different options and choose the one that you think makes the best set of assumptions according to your definition of pitching value.





Neil Weinberg is the Site Educator at FanGraphs and can be found writing enthusiastically about the Detroit Tigers at New English D. Follow and interact with him on Twitter @NeilWeinberg44.

3 Comments
randplaty
7 years ago

“its more accurate to assume average responsibility than to try to carve it up with the available data.”

is it? The very premise of DRA is that it’s more accurate to carve it up.

“That’s a blunt tool for a very nuanced job. FIP is blunt. DRA is less blunt, but it’s also averaging out a lot of effects that might operate in very complex ways.”

But FIP is averaging out a lot MORE of those effects. You’re always going to have to average out complex events, but the goal is to get more accurate. So the question is, “Is DRA’s averaging out more accurate than FIP’s averaging out?” BP has provided a lot of data that says yes. So I’m going to go with BP for now unless you can provide data that shows FIP’s averaging out is better.

Neil Weinberg (member)
7 years ago
Reply to randplaty

The quotation you’re discussing is explaining the assumption of FIP, not making a declaration that such an assumption is correct or best. Just wanted to clarify that part.

martin (member)
7 years ago

So far I very much like the 50/50 approach to pitcher WAR.
I know it is not perfect, nor does it pretend to be, but I would guess it works as well as any other method so far. So how come it is not mentioned in this article? Just curious.