Analyzing the strike zone is not so simple.
Why? Because borderline pitches present huge problems that even the computers don't know how to solve.
In August, we wrote about Pitch f/x error and made the point, albeit a passing one, that "pitch f/x provides a px value and a pz value, both of which are exact data points; however, the baseball is a sphere with a certain volume and a diameter of approximately 2.9 to three inches." For the sake of this analysis, we'll place the baseball at a 9.25" circumference, or 2.94" diameter, which is the maximum afforded by Rule 3.01 (The Ball). (Math: C / π = d = 2r, so 9.25 divided by pi is ~2.94.)
The dimensional properties of a baseball.
To expand on this concept, per the laws of physics, a baseball takes up more space than a single coordinate. Its radius is approximately 1.47 inches, which converts to .123 feet (1.47" divided by 12" is .1225', which rounds to .123'). Today's analysis uses three significant digits where available.
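As a quick check on the arithmetic above, here is a minimal Python sketch that derives the diameter and radius from the 9.25" circumference. The variable names are ours, chosen purely for illustration.

```python
import math

# Maximum circumference used in this analysis, in inches.
CIRCUMFERENCE_IN = 9.25

# C / pi = d = 2r
diameter_in = CIRCUMFERENCE_IN / math.pi
radius_in = diameter_in / 2

# Pitch locations are reported in feet, so convert the radius as well.
radius_ft = radius_in / 12

print(f"diameter: {diameter_in:.2f} in")
print(f"radius:   {radius_in:.2f} in = {radius_ft:.3f} ft")
```

Carrying the unrounded radius through the conversion still lands on .123 feet at three decimal places, so the rounding order does not change the figure used in the rest of the analysis.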
Taking this into account, here's one example from Game 2 of the 2016 World Series where sites such as MLB's Baseball Savant and the independent FanGraphs pegged a pitch as incorrectly called when, pursuant to the physical reality that a baseball occupies more than a single point in space, the call was actually correct.
In other words, this is an example of #RoboUmp getting it wrong.
Brooks Baseball plot of 10/26's WS Game 2.
As seen in the corresponding annotated graphic based on the Brooks Baseball strike zone plot, Brooks presents a static strike zone on which it places all callable pitches for the entire game: the solid black line represents a "league average" strike zone and assumes every batter is the same height. The dashed "Fast Map" line represents the strike zone the average umpire typically would call to a league-average batter. For obvious reasons (e.g., multiple batters' variable heights and stances), the so-called black box is not particularly useful for strike zone analysis.
A site such as Baseball Savant or a program such as StatCast would calculate that the Arrieta-Lindor pz (the pitch's vertical location as it crosses the plate, in feet) of 3.34 is greater than the sz_top (the top of the batter's strike zone, in feet) of 3.29, compare that to the fact that the pitch was called a strike, and deem it an incorrect call; after all, according to the computer's formula, when pz > sz_top, the call should be "ball". StatCast's method actually places such a pitch into a "Zone" number that corresponds to "Ball." It's such an easy and simple algorithm that anyone can be an analyst.
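The point-based comparison described above can be sketched in a few lines of Python. This is only an illustration of the pz versus sz_top logic as we've described it, not Baseball Savant's or StatCast's actual code; the function name is hypothetical, and the sketch looks only at the top edge of the zone.

```python
def naive_top_edge_call(pz: float, sz_top: float) -> str:
    """Treat the ball as a single point and compare it to the top of the zone.

    Hypothetical helper: it ignores the bottom and sides of the zone and
    assumes the pitch is otherwise within the zone's other boundaries.
    """
    return "ball" if pz > sz_top else "strike"


# Arrieta-Lindor pitch: pz = 3.34, sz_top = 3.29 (both in feet).
print(naive_top_edge_call(3.34, 3.29))  # ball
```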
Well, isn't that cute...but it's wrong.
If pz 3.34 represents the center of the baseball, then we must add/subtract the ball's radius of .123 to create an interval that represents the true physical property of the baseball. We'll call it "pz_True": 3.340 +/- .123 => pz_True = (3.217, 3.463).
Thus, we see that Lindor's measured sz_top of 3.29 falls within the pz_True interval, meaning the baseball straddled the top of the strike zone. A portion of the baseball traveled through the strike zone (0.876 inches of it, to be exact: 3.29 minus 3.217 is .073 feet, times 12), meaning the strike call was correct, even though conventional zone algorithms from Baseball Savant and the like would indicate it was incorrect.
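The pz_True interval and the overlap with the top of the zone can be sketched as follows. This is a minimal illustration under our stated assumptions (a .123-foot radius, and a check against the top edge only); the function names are hypothetical and do not come from any tracking system.

```python
BALL_RADIUS_FT = 0.123  # radius used in this analysis, in feet


def pz_true_interval(pz: float, radius: float = BALL_RADIUS_FT) -> tuple[float, float]:
    """Vertical span of the ball, given the height of its center."""
    return (pz - radius, pz + radius)


def inches_below_top(pz: float, sz_top: float, radius: float = BALL_RADIUS_FT) -> float:
    """How much of the ball's vertical span sits below sz_top, in inches."""
    low, _high = pz_true_interval(pz, radius)
    # Clamp between 0 (ball entirely above) and the full diameter.
    overlap_ft = min(max(sz_top - low, 0.0), 2 * radius)
    return overlap_ft * 12


def radius_aware_top_edge_call(pz: float, sz_top: float) -> str:
    """Strike if any part of the ball dips to or below the top of the zone.

    Top edge only: assumes the pitch is otherwise within the zone.
    """
    low, _high = pz_true_interval(pz)
    return "strike" if low <= sz_top else "ball"


low, high = pz_true_interval(3.34)
print(f"pz_True = ({low:.3f}, {high:.3f})")
print(f"overlap = {inches_below_top(3.34, 3.29):.3f} in")
print(radius_aware_top_edge_call(3.34, 3.29))
```

For the Arrieta-Lindor pitch, this reproduces the interval (3.217, 3.463), an overlap of 0.876 inches, and a "strike" result.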
Recall the following Official Baseball Rules definition: "A STRIKE is a legal pitch when so called by the umpire, which...Is not struck at, if any part of the ball passes through any part of the strike zone." The Strike Zone ("determined from the batter’s stance as the batter is prepared to swing at a pitched ball") is even more confounding.
Thus, even with only .876 inches of the 2.94"-diameter baseball (29.8% of the ball's diameter) traversing the strike zone, the proper call, by rule, is "Strike." And this doesn't even begin to address the other types of error we previously discussed, nor manufacturer-reported margin-of-error, which is another can of worms entirely.
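For anyone checking the percentage, here is a short Python sketch of the calculation. The function name is hypothetical, and note the assumption baked in: the figure is a share of the ball's diameter, not of its volume or cross-sectional area.

```python
BALL_DIAMETER_IN = 2.94  # diameter used in this analysis, in inches


def share_of_diameter(overlap_in: float, diameter_in: float = BALL_DIAMETER_IN) -> float:
    """Fraction of the ball's diameter that passed through the zone."""
    return overlap_in / diameter_in


share = share_of_diameter(0.876)
print(f"{share:.1%}")  # 29.8%
```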
Now, multiply this ONE case study by tens of pitches per game, hundreds per regular season day, thousands per week, etc., and you'll see why using mass data sites like Baseball Savant to evaluate strike zone accuracy can be misleading at best and significantly inaccurate at worst.
All this hopefully will give you an idea as to why we take our time in providing strike zone analysis, relative to QOC. There are many variables that play a role, and quality work in that regard requires time and care.