Author: Alex Vigderman

  • Who will judge this year’s SIS Football Analytics Challenge?


    Update: The competition will be held August 4 at 8pm ET.  You can make a donation here.

    Watch here!

    In case you hadn’t heard, the second annual SIS Analytics Challenge is underway. If you’re not already entered, you can learn more about the contest here, but it’s more likely that you’ve come to find out who will be judging your research in the coming weeks.

    We are fortunate to have had so many great industry experts agree to take part in the challenge. Each track will have four judges. Their backgrounds cover a wide spectrum of expertise and we’re excited about the different perspectives they will provide.

    If you’re answering the football research prompt, your work will be judged by:
    Caio Brighenti, Detroit Lions football analyst
    Matt Manocherian, SIS VP of Football & Research and former NFL scout
    Najee Goode, 8-year NFL linebacker and Super Bowl LII champion
    Seth Walder, ESPN sports analytics writer
    And the sports betting track panel is composed of:
    Aaron Schatz, Football Outsiders editor-in-chief and ESPN+ NFL analyst
    Dan Hannigan-Daley, SIS CEO
    Johnny Avello, Las Vegas legend and current DraftKings Sportsbook director
    Rick Neuheisel, former NFL offensive coordinator and Power 5 head coach for 12 years
     

    While we think you’ll be excited about the opportunity to present to these judges, SIS has other stuff in the works and wants to empanel even more judges before presentations are due. Stay tuned for further updates in the coming days.

  • A Primer for SIS’s NFL Wins Above Replacement (WAR) Metric


    BY ALEX VIGDERMAN

    This article is meant to be both an introduction to and reference documentation for the “Above Replacement” stats added to the SIS DataHub Pro in April 2021.

    They say, “the greatest ability is availability.” That may not be entirely true, but the saying does reflect the true idea that a player’s ability to stay on the field consistently over the course of a season and over many seasons is crucial. Being able to not only perform better than one’s peers but to do so over a large sample makes a player truly valuable. Even being an average player over time has value because the alternative is to use a below-average player—in short, Efficiency x Volume = Value.

    In baseball, the Wins Above Replacement (WAR) statistic was created to measure a player’s ability to accumulate value relative to his peers, both by playing consistently and being consistently better than some benchmark. With WAR, the benchmark is a “replacement” player, conceptually one who can be easily acquired in free agency or promoted from the minor leagues (or from the practice squad in the case of football).

    Comparing to replacement level means that a player is valued relative to the easily-obtainable alternatives a team might have. That benchmark is, of course, different for different positions, as it’s more difficult to play certain positions competently than others. We would expect an average quarterback to be more valuable to a team than an average running back.

    Embracing the WAR concept in football, Sports Info Solutions has its own WAR metric, using concepts coming from baseball’s implementation (at Fangraphs and Baseball Reference specifically) as well as previous attempts in football by groups like Yurko et al. The metric builds off the structure created by SIS’s Total Points system—specifically, Points Above Average, or PAA—by adding two layers of computation on top of it: the comparison to the PAA of a replacement-level player, and a Points-to-Wins conversion that puts player contributions on the scale of team wins.

    WAR starts with the difference between a player's Points Above Average and the value that a replacement-level player would have accumulated, then multiplies it by a Points-to-Wins conversion factor.
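
    As a minimal sketch of that calculation, assuming the conversion factor is expressed as points per win (about 33, in line with the range quoted later in this primer) and using hypothetical function and argument names:

```python
def wins_above_replacement(paa, replacement_paa, points_per_win=33.0):
    """Sketch only: (PAA - replacement-level PAA), converted from points to wins.

    points_per_win is an assumed default; the actual factor varies by season.
    """
    points_above_replacement = paa - replacement_paa
    return points_above_replacement / points_per_win


# Hypothetical player: 20 PAA where a replacement would have posted -13.
print(wins_above_replacement(20.0, -13.0))
```

    Dividing by points per win is the same as multiplying by its reciprocal, which is the "conversion factor" in the description above.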

    These WAR values (and the underlying Points Above Replacement, or PAR) are available on the SIS DataHub Pro in all the same places that you’d find Total Points, with all the same ability to filter and sort the results. These numbers are only available to SIS DataHub Pro subscribers and will not be made available on the free version of the DataHub.

    What’s the difference between Points Above Replacement and Total Points?

    In most cases, these metrics will tell you similar things. They both aim to express a player’s value in terms of points to his team, and so they’re roughly on the same scale, and both have Points Above Average as their basis.

    The key reason to consider using PAR (and correspondingly WAR) instead of Total Points is that there is a specific notion of the relative value of different positions and facets of the game. Total Points draws a distinction between quarterbacks and everyone else, but it does not make any distinction between the value gained by a guard run blocking and that of a wide receiver in the passing game. In reality, the former is less valuable than the latter, but because the extent of that difference is relatively small you can consider Total Points to be a reasonable facsimile of Points Above Replacement.

    Methodology

    Determining Replacement Level

    What makes a player “above replacement”?

    Replacement level could be defined in a few conceptually different ways. One option is to treat it as the performance of backups, and another is to treat it as the performance of players who are only on a roster part of the time. However, the latter group can be quite hard to measure because you need enough play data to evaluate those players accurately.

    SIS has chosen to use playing time in the first 15 offensive or defensive plays of a game as the determinant. The 15-play threshold was chosen to align with a “scripted play” structure that comes from Bill Walsh’s game-calling strategy. This makes it so that players who regularly fail to make it onto the field when the team’s plans are still in place are essentially considered backups or easily-replaced players.

    Specifically, players who appeared in the first fifteen snaps for either side in at least half of the games and at least a quarter of the total snaps in a given season are considered above replacement level, and any player who fails to qualify for that standard is “replacement level” (or below). This line for replacement level is a bit higher than might be ideal, but we can’t measure the performance of players who don’t make it onto the field, so this serves as a viable delineation.

    Steps Used to Determine Replacement Level Performance
    1. Start with only the first 15 offensive plays and defensive plays of each game of the last three years (and all special teams plays, for kickers and punters)
    2. For each season, identify the players who did not appear in enough games or enough plays to qualify as at least rotational players.
      1. Offense/Defense: at least half the games and at least a quarter of snaps
      2. Kickers/Punters: at least half the games and at least 15 kick events
      3. This list of limited-playing-time players could also include starters who missed substantial time to injury. That’s an unfortunate side-effect of this method. The effect of this is tempered by our using three years’ worth of data to inform our replacement level estimates, so a starting-level player who gets injured one year will still primarily be considered a starter.
    3. Replacement level performance is taken as the average per-play Points Above Average among “replacement level” players for each combination of position and facet of the game (e.g., running back receiving, offensive tackle run blocking, safety pass rushing)
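
    The qualification thresholds in step 2 can be sketched as simple checks. The function and argument names here are hypothetical, and the sketch assumes that game and snap counts are tallied per season within the first-15-plays sample:

```python
def is_above_replacement(games_appeared, team_games, snaps, team_snaps):
    """Offense/defense: at least half the games and at least a quarter of snaps."""
    return games_appeared >= team_games / 2 and snaps >= team_snaps / 4


def specialist_is_above_replacement(games_appeared, team_games, kick_events):
    """Kickers/punters: at least half the games and at least 15 kick events."""
    return games_appeared >= team_games / 2 and kick_events >= 15


# Players failing these checks go into the "replacement level" pool.
print(is_above_replacement(8, 16, 60, 240))
print(specialist_is_above_replacement(8, 16, 14))
```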

    This replacement-level definition isn’t used directly for those players’ WAR, because it doesn’t take into account their performance on those plays. After all, a player who was considered a backup by this standard could perform so well in a small sample that he ends up above zero WAR.

    But the average performance of these backup-level players is used as the value standard for a replacement-level player. That standard leaves us with, for example, 32 quarterbacks, 109 wide receivers, 84 defensive tackles, and 107 cornerbacks who are considered at least rotational players, and everyone else is included in the determination of replacement level.

    Within that pool of “replacement level” players, the average per-play Points Above Average is calculated for every combination of position and facet of the game, and that average is taken as the replacement level for the combination. A player’s Points Above Replacement is just his Points Above Average in a given facet minus his position’s replacement-level Points Above Average in that facet.
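
    A rough sketch of that averaging step follows. The record layout (keys such as 'position', 'facet', 'paa', 'is_replacement') is hypothetical, and scaling the per-play replacement level by the player's own play count is an assumption about how the subtraction is applied:

```python
from collections import defaultdict


def replacement_levels(plays):
    """Average per-play PAA of replacement-level players by (position, facet)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of PAA, number of plays]
    for play in plays:
        if play["is_replacement"]:
            key = (play["position"], play["facet"])
            totals[key][0] += play["paa"]
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def points_above_replacement(player_paa, player_plays, per_play_replacement):
    """PAA minus what a replacement player would post over the same number of plays."""
    return player_paa - per_play_replacement * player_plays


plays = [
    {"position": "RB", "facet": "rushing", "paa": -0.1, "is_replacement": True},
    {"position": "RB", "facet": "rushing", "paa": -0.3, "is_replacement": True},
    {"position": "RB", "facet": "rushing", "paa": 0.5, "is_replacement": False},
]
levels = replacement_levels(plays)
print(points_above_replacement(10.0, 200, levels[("RB", "rushing")]))
```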

    In general, when a skill has a low replacement level, it means that there’s a wide gap between the best and worst players, and therefore that skill is valuable. To illustrate how that works out in terms of positional value, here are the common position-phase combinations with the most and least average value relative to replacement level for the 2018-2020 seasons.

    Most and Least Valuable Skills by SIS WAR, 2018-20

    Rank  Highest Average WAR       Rank  Lowest Average WAR
    1     Quarterback Passing       1     Defensive Tackle Pass Rush
    2     Running Back Rushing      2     Center Run Blocking
    3     Wide Receiver Receiving   3     Cornerback Pass Rush
    4     Cornerback Pass Coverage  4     Linebacker Run Defense
    5     Tight End Receiving       5     Defensive End Run Defense

    The biggest thing that pops out here—aside from the fact that Aaron Donald is single-handedly holding up the value of defensive tackles—is that running backs might matter after all, as they show up second on the list.

    While it is true that rushing performance is relatively valuable, RB rushing is closer in average value to the bottom of this list than the top. You can consider passing in one category and everything else in another, although there is still room to distinguish among the rest.

    Beyond that, the Total Points system gives backs more credit than other EPA-based systems because it makes an adjustment to each player on each play based on the likely EPA gain or loss on the play given the call (pass or run) and situation. While most 3rd-and-8 runs are dead in the water, it’s not held against the offensive players that such a call was made. This makes running backs in general more able to gain and lose value than you might think given how EPA on run plays tends to look.

    One last point is that we selectively remember the replacements who do well. We hold up players like Tony Pollard, Alexander Mattison, and Austin Ekeler as examples of replacements who can perform comparably to their better-compensated teammates. The counter to that is there are just as many examples of players who are given brief opportunities and squander them. And because running backs can really mess things up with a poorly-timed fumble, players like Jeremy McNichols and C.J. Prosise counterbalance the strong performances of the names above.

    Converting from Points to Wins

    The conversion from points into wins is to some extent a rough estimate, because it doesn’t take into account the context of the plays involved. We all understand that the same play can have very different effects on a team’s chance of winning depending on the context. But to more consistently evaluate players regardless of the situation around them, we treat all plays as having a neutral context in this respect. To that end, the points-to-wins conversion is instead a multiplier based on the concept of Pythagorean win percentage (or more specifically, Pythagenpat) which essentially uses points scored and allowed to estimate win percentage.

    Advanced mathematical interlude:

    We can use calculus to estimate, in essence, the slope of the line of wins vs. points, i.e., the extra points needed to add a win. Sabermetric writer Patriot—writing about baseball in a way that applies equally to football—shows how to start from the formula and use partial derivatives to convert it into a formula for points-per-win at a league level that depends only on the scoring environment (in points per game, PPG):, where x is the value that makes the Pythagorean win expectancy most accurate when it’s used as the exponent instead of squaring each term. For 2016 to 2019, with a PPG just over 45, z = 0.73.

    The actual number will vary over time, but the translation in recent seasons has hovered around 32 or 33 points per win. Put into different terms, we would expect that a team with a full-season Points Above Replacement of 160 would be around two wins better on average than a team with a total of 96.
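
    As a rough check on those figures, the sketch below assumes the league-level formula takes the form points per win = 2 × PPG^z. That form is an assumption, chosen because it reproduces the quoted 32 to 33 points per win from a PPG just over 45 and z = 0.73. The function names are hypothetical.

```python
def points_per_win(ppg, z=0.73):
    """Assumed form of the league-level conversion: 2 * PPG ** z."""
    return 2 * ppg ** z


def expected_win_gap(par_a, par_b, ppw):
    """Expected difference in wins between two teams given their total PAR."""
    return (par_a - par_b) / ppw


print(round(points_per_win(45.5), 1))   # lands between 32 and 33
print(expected_win_gap(160, 96, 32))    # the two-win example from the text
```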

    The last piece of the puzzle is that because there’s a defined number of wins available each year (although that number changes starting in 2021), there needs to be a constant number of Wins Above Replacement each year.

    For a 16-game season, SIS has chosen the number 192, which corresponds to the idea that a 2-14 team is replacement level. This is based on the notion that while teams do occasionally win fewer than two games, the “true talent” of those teams isn’t quite that poor, and a “true” replacement level team would align with that baseline. The total WAR is calculated in the following manner:

    Total WAR is the difference between the total wins available in the season and the total wins if every team were replacement level.

    If after all the previous calculations the total WAR for the league in a season isn’t 192, the player values are adjusted slightly so that the total ends up at that number. This ensures each season is measured consistently.

    Note: From 2021 forward, with a 17-game season, we are still using a 2-win team as the baseline, and therefore there are an additional 16 WAR available, for a total of 208.
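
    The league totals above follow from simple arithmetic, sketched here. The proportional rescaling in `rescale_to_total` is an assumption, since the text says only that player values are "adjusted slightly"; both function names are hypothetical.

```python
def total_war(teams=32, games=16, replacement_wins=2):
    """Wins available in a season minus wins if every team were replacement level."""
    total_wins = teams * games / 2  # every game produces one win (ties ignored)
    return total_wins - teams * replacement_wins


def rescale_to_total(player_war, target):
    """Scale every player's WAR by one factor so the league total hits the target."""
    factor = target / sum(player_war.values())
    return {player: war * factor for player, war in player_war.items()}


print(total_war(games=16))  # 16-game season
print(total_war(games=17))  # 17-game season
```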

    What does it look like?

    Here’s a glimpse of the 2020 Passing WAR leaderboard found via the DataHub Pro (via the Value tab), with some other metrics that might help give context to how we get to WAR. You can see that Points Earned, Points Above Average, and PAR tell similar stories about the relative value of these MVP candidates, and in particular Tom Brady’s season stands out as one that deserves much more credit than the EPA of his throws would suggest.

    Player           Team        Att  EPA     Points Above Avg  Points Earned  PAR    WAR
    Patrick Mahomes  Chiefs      588  138.87  90.51             162.05         152.6  4.6
    Deshaun Watson   Texans      544  115.39  78.32             157.58         145.9  4.4
    Tom Brady        Buccaneers  610  82.05   80.12             158.70         138.6  4.2
    Josh Allen       Bills       572  132.46  71.20             143.19         137.4  4.1
    Aaron Rodgers    Packers     526  143.56  85.12             154.21         137.0  4.1

    Each year roughly half the quarterbacks in the league produce more WAR than any player at any other position, so you just get out of here with your RB-as-MVP conversations. In a 2,000-yard season where he was the centerpiece of his team’s offense, Derrick Henry produced about as much value as an about-to-retire Drew Brees and an about-to-be-shipped-out Teddy Bridgewater.

    We know that quarterbacks are much more valuable than other players, but WAR shows just how large that gap really is.

  • DataHub Pro features many new NFL, CFB additions


    BY ALEX VIGDERMAN

    The SIS DataHub Pro has put in some work in the offseason. And that work isn’t done.

    First, a quick bit of background.

    We have two portals for people to interact with our treasure trove of football data.

    One is the SIS DataHub, which is a great resource for dozens of statistics at the NFL level, including our proprietary total-value statistic, Total Points. That site is available to everyone free of charge.

    The other is the SIS DataHub Pro, which has a price tag but adds in the ability to sort, filter, and download data in whatever configuration you like, and also includes college football data. You can sign up for a demo here.

    This week, we are publishing updates to the DataHub Pro that seriously beef up what it offers you as an analyst, especially on the college side. With the draft coming this week, it’s the perfect time to check out the DataHub Pro.

    Consistent Breadth and Depth Across Levels

    Aside from the fact that we’re adding dozens of new items to the site, we also made it a point to keep the available stats and filters consistent regardless of what you’re looking for. That means that you’ll find the same filters and statistics available on each of the NFL, CFB, player, and team leaderboards.

    So if you find that, for example, the Patriots were very poor when they used zone blocking against a light box in 2020, you can use the same filters to find a possible addition in the draft that might suit that deficiency (might I suggest Western Michigan tackle Jaylon Moore?).

    Major Overhaul to College Leaderboards

    Because we didn’t have quite as much available on the college side previously, you instantly feel the impact of these updates when you run your first query.

    Here’s an example. The DataHub Pro’s Receiving leaderboard for college players already had 18 statistics and 32 filters for you to slice and dice data on pass-catching prospects.

    Now, we’re offering 37 statistics and 43 filters. Here’s a slice of that.

    You could already find out that another Western Michigan product, receiver D’Wayne Eskridge, led the nation in yards per catchable target when lined up out wide against man coverage in 2020.

    Highest Yards per Target on Catchable Throws Lined up Wide Against Man Coverage, 2020 (min. 10 tgt)

    Player            Team              Tgts  Y/Tgt  ADoT
    D’Wayne Eskridge  Western Michigan  13    21     9.4
    Cornell Powell    Clemson           18    18.6   13.6
    Erik Ezukanma     Texas Tech        14    18.1   13.5
    Dax Milne         BYU               13    17.3   16.2
    DeVonta Smith     Alabama           18    16.8   12.3

    Now you can truly go as deep with your analysis as you can anywhere on the Internet. We can now also find out who saw the most on-target throws into the end zone when they were isolated on their side of the formation. That was SIS’s 38th-ranked receiver headed into the draft, Jonathan Adams Jr. of Arkansas State, who had nine such targets and led the nation with 1.3 EPA per target on such throws.

    Like we have for the NFL side, statistics are now split into three tabs: one for Totals, one for Rates, and one for Value statistics.

    That last category is really exciting, because this release of the SIS DataHub Pro now offers the ability to explore EPA-based stats at the college level. That starts with Expected Points Added itself but moves on to things like Positive Play %, Boom% / Bust% (the percent of plays with an EPA above 1 or below -1), and of course our flagship metric Total Points. This was available for draft-eligible players via the SIS Football Rookie Handbook, but now you can pair it with the filtering functionality that the DataHub Pro provides.
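
    The Boom% and Bust% definitions above translate directly into code. This is a minimal sketch over a plain list of per-play EPA values; the function name and input format are hypothetical.

```python
def boom_bust_rates(epas):
    """Share of plays with EPA above 1 (boom) and below -1 (bust)."""
    n = len(epas)
    boom = sum(1 for epa in epas if epa > 1) / n
    bust = sum(1 for epa in epas if epa < -1) / n
    return boom, bust


print(boom_bust_rates([1.5, 0.2, -1.2, -0.5]))
```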

    Most Points Saved per Play in 2020, Man Coverage Snaps Lined Up Outside (min. 10 targets)

    Player             Team              Cov. Snaps  Positive%  Points Saved / Play
    Deommodore Lenoir  Oregon            64          36%        0.24
    Jaylon Jones       Georgia State     61          13%        0.23
    Caelen Carson      Wake Forest       78          28%        0.22
    Coney Durr         Minnesota         41          50%        0.22
    Kenderick Marbles  Louisiana-Monroe  46          31%        0.21
    Lenoir, SIS’s 30th-ranked CB entering the draft, was a man coverage asset in 2020

    What about the NFL?

    It’s draft season, so we’re really excited about what we’re adding on the college side. But that doesn’t mean we’re shirking our responsibilities in providing the best NFL charting data around.

    Most of the filters that are new on the college side are also new on the NFL side. Here’s a sampling of the filters we’re excited about:

    • Passing – What was the QB’s footing like at the snap?
    • Rushing – Did the back use the designed gap?
    • Receiving – Was the throw into the end zone?
    • Blocking – How deep did the QB drop?
    • Pass Defense – Was the QB pressured on the play?
    • Pass Rush – What technique was the player lined up as?
    • Run Defense – Was there motion on the play?

    And beyond that, we still have plenty of stats up our sleeves for NFL analysts.

    Here are the NFL leaders in Wins Above Replacement on plays with the clearest of clear running lanes: through the designed gap, no blown blocks, not contacted for at least 5 yards downfield. For conciseness, we’ll call these “clean runs.”

    Most Wins Above Replacement on “Clean Runs”, 2020

    Player         Att  WAR
    J.K. Dobbins   27   0.7
    Miles Sanders  21   0.6
    Derrick Henry  44   0.6
    Melvin Gordon  29   0.5
    6 tied              0.4
    Dobbins really capitalized on the opportunities afforded by his blocking and scheme in 2020

    We’re ecstatic to finally get these updates out the door and into your hands, especially in advance of the NFL draft. Sign up for a free trial and take the new features for a spin! And if you have any feedback, we definitely want to hear from you. We have more updates in the pipeline for this offseason, but we want to make sure we’re doing well by our users first.

  • An update to the last few years of Defensive Runs Saved

    BY ALEX VIGDERMAN

    This is a less fun update than what we did during the 2019-20 offseason, in which we were announcing a revamp of how we evaluate defense, but one that is no less crucial.

    The lifeblood of our work at SIS is recording data points from video. Defensive Runs Saved in particular relies on elements like the direction, timing, and trajectory of batted balls. As it turns out, since 2017 MLB games have more and more commonly had a disconnect between the actual timing of a play and what shows on the broadcast. Specifically, when the camera switches away from the center field camera to show the batted ball on balls to the infield, the play appears to take less time than it does in reality. This means that in timing any event on that play from video, the resulting measurement will be a couple hundred milliseconds short on average.

    You can see where this might be a problem for defensive metrics. A ball gets hit at a certain speed, but when recording the time over video it seems like the ball is traveling faster than it is. That means that we’ll end up giving the fielder more credit than he deserves for making that play (or less blame if he fails to make it).

    This isn’t a problem going forward—we’re now able to use a camera angle that will give us the full view of the play from the start—but for recent seasons, we had to resort to an automated method to fix affected balls in play. Going back to 2017, SIS’s ball-in-play times have been retroactively modified using an automated process, one that checks each park each month for the existence of a 100-ms-or-greater offset, and modifies all relevant batted balls to account for that offset. This phenomenon didn’t start until a few years ago, so it affects only a couple parks in 2017, but that number jumped to more than half of parks in 2018 and 2019. In 2020, we had access to the new camera angle, but because Defensive Runs Saved uses multiple years of data as the basis for its evaluation, those numbers do change slightly.
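
    A minimal sketch of that park-by-month correction follows. How the offset for each park and month is estimated is not described here, so the sketch takes those estimates as an input. The field names, the `apply_offsets` function, and the choice to add the offset to the recorded time (since affected times run short) are all assumptions.

```python
def apply_offsets(batted_balls, estimated_offsets_ms, threshold_ms=100):
    """Lengthen recorded times for park-months whose estimated offset meets the threshold.

    batted_balls: list of dicts with hypothetical keys 'park', 'month', 'time_ms'.
    estimated_offsets_ms: dict mapping (park, month) -> estimated offset in ms.
    """
    corrected = []
    for ball in batted_balls:
        offset = estimated_offsets_ms.get((ball["park"], ball["month"]), 0)
        adjusted = dict(ball)  # copy so the original record is left untouched
        if offset >= threshold_ms:
            adjusted["time_ms"] = ball["time_ms"] + offset
        corrected.append(adjusted)
    return corrected


balls = [
    {"park": "Park A", "month": 5, "time_ms": 900},
    {"park": "Park B", "month": 5, "time_ms": 900},
]
offsets = {("Park A", 5): 200, ("Park B", 5): 40}
print(apply_offsets(balls, offsets))
```

    Only Park A is adjusted in this example, because Park B's estimated offset falls under the 100 ms threshold.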

    When and where will I see these changes?

    The updated numbers are currently available on FieldingBible.com. Because there are so many other sites with different preseason work of their own to take care of, we couldn’t make it so that we had a single point in time when all sources of Defensive Runs Saved update in tandem. So what you find on Fangraphs, Baseball-Reference, etc. won’t match with what you find on the Fielding Bible site for a bit. But we’ll be aligned by the start of the MLB season.

    It’s worth noting that this will also affect metrics that depend on our ball-in-play times, like Hard Hit Rate and Ultimate Zone Rating. As mentioned, we’re working with Fangraphs to get their numbers backfilled, but because of the mechanics of it we can’t update them without updating ourselves first.

    Which teams were affected by this?

    As mentioned above, this asynchrony has crept to more and more parks over time. These are the parks that had at least one month where times in that park were offset, by season (excluding 2020 because the offset has been compensated for by the use of the new camera angle).

    2017             2018                           2019
    Comerica Park    Angel Stadium of Anaheim       Angel Stadium of Anaheim
    Tropicana Field  AT&T Park                      AT&T Park
                     Busch Stadium                  Busch Stadium
                     Chase Field                    Chase Field
                     Comerica Park                  Citizens Bank Park
                     Dodger Stadium                 Comerica Park
                     Great American Ballpark        Dodger Stadium
                     Kauffman Stadium               Great American Ballpark
                     Marlins Park                   Kauffman Stadium
                     Miller Park                    Marlins Park
                     O.co Coliseum                  Miller Park
                     PETCO Park                     Minute Maid Park
                     Progressive Field              O.co Coliseum
                     Rangers Ballpark in Arlington  PETCO Park
                     SunTrust Park                  Progressive Field
                     Target Field                   Rangers Ballpark in Arlington
                     Tropicana Field                SunTrust Park
                     Yankee Stadium                 Target Field
                                                    Tropicana Field
                                                    Yankee Stadium

    In 2017, only two parks had an asynchrony to speak of. That means that Rays and Tigers players were being artificially buoyed in terms of their Defensive Runs Saved—and, because DRS has to add up to zero, the rest of the league was artificially suppressed to a small extent. More recently, it’s been closer to a 50/50 split in terms of teams who were helped or hurt by this phenomenon.

    To show what the results look like for a full season, here is the extent of the changes by team from the 2019 season. Obviously, this is fairly dramatic, but that’s because the size of the effect depends on the park, so most of the players on a given team will move in the same direction.

    Defensive Runs Saved Changes by Team After This Update, 2019 Season

    Team       Previous  Updated  Diff  Team          Previous  Updated  Diff
    Mets       -86       -34      52    Brewers       40        27       -13
    Cubs       -14       32       46    Padres        17        4        -13
    Orioles    -95       -53      42    Indians       82        67       -15
    Mariners   -88       -48      40    Marlins       25        10       -15
    Blue Jays  0         40       40    Reds          58        41       -17
    Red Sox    -28       10       38    Cardinals     91        73       -18
    White Sox  -56       -25      31    Braves        41        22       -19
    Rockies    9         40       31    Royals        5         -16      -21
    Nationals  -3        25       28    Diamondbacks  112       91       -21
    Pirates    -46       -21      25    Dodgers       126       105      -21
    Phillies   51        68       17    Tigers        -84       -111     -27
    Astros     96        96       0     Giants        47        19       -28
    Yankees    -5        -14      -9    Twins         3         -28      -31
    Rays       53        42       -11   Angels        9         -23      -32
    Rangers    -52       -65      -13   Athletics     36        3        -33

    How are players impacted by this change?

    In general, players don’t change much as a result of this update. Roughly 90% of player seasons move by two runs or fewer in either direction. But some players do move by more than five runs. That might not make the difference between someone we view as a good defender or a bad defender, but it is definitely noticeable.

    Players Most Affected by the Update to Defensive Runs Saved, 2019 Season

    Player                 Pos  Team       Previous  Updated  Diff
    Trevor Story           SS   Rockies    14        21       7
    Amed Rosario           SS   Mets       -10       -3       7
    Freddy Galvis          SS   Blue Jays  3         9        6
    Tim Anderson           SS   White Sox  -12       -6       6
    Adam Frazier           2B   Pirates    -1        5        6
    Kris Bryant            3B   Cubs       -6        0        6
    Vladimir Guerrero Jr.  3B   Blue Jays  -9        -3       6
    Brandon Crawford       SS   Giants     -4        -10      -6
    Kole Calhoun           RF   Angels     -1        -7       -6
    Miguel Sano            3B   Twins      -7        -13      -6
    Matt Olson             1B   Athletics  18        12       -6
    Matt Chapman           3B   Athletics  34        28       -6

    Two Fielding Bible Award winners in the Oakland infield take a dip as a result of this update, but you can see that we’re still quite bullish on the Matts. And on the flip side, a couple of well-regarded shortstops look even more legit, and two less-stellar defenders have their poor numbers tempered a bit.

    As mentioned before, these numbers are updated on FieldingBible.com right now, and we are working with other sites to update previous seasons in advance of the start of the 2021 season. We will make sure to communicate when those updates happen so that you can use whichever source you prefer going forward.

  • Enhancing the Way Total Points Evaluates the Running Game


    BY ALEX VIGDERMAN

    Sports Info Solutions’ flagship player value system, Total Points, has been upgraded again.

    The incorporation of relatively new data points like defensive line techniques and the combination of initial and eventual run directions allow us to evaluate players on both sides of the ball with much more confidence as to who was involved on each play and to what extent. This new methodology combines new data points, improved evaluation strategies, and the usual tweaks and bug fixes that come every season.

    Overview of the Methodological Improvements

    1. Identify the blockers and relevant defenders based on the players’ alignment and the run direction (both the designed run direction and the eventual run direction, where previously only the designed direction was used)
    2. For the offense, divide responsibility for the yards before contact (plus expected yards after contact) so that the back has more responsibility if he is contacted late in the play and less if he is contacted early in the play. Previously the distribution was a consistent amount across all kinds of runs.
    3. For the defense, estimate how likely each player is to have made the tackle given his alignment, and compare his actual results with that expectation. Tackles upfield are better than tackles downfield, and both are better than not making a tackle at all. Previously players who made a tackle downfield were losing value relative to not making a tackle, and players who did not make a tackle were not evaluated.

    Details on the Methodological Improvements

    Using both initial and eventual run direction

    For the purposes of this discussion, a “bounce” is any run where the runner eventually ran to a gap further outside than intended, and a “cutback” is any run where the runner eventually ran to a gap further inside than intended.

    With this enhancement, each run that features a bounce or cutback is evaluated first based on the initial run gap (where the run was designed to go) and then again based on the eventual run gap (where the runner ended up going). The difference between those two evaluations is based on the number of gaps moved and the blocking scheme, treating any moves of at least three gaps as similar. Then, any evaluation of the linemen the run was initially designed to go behind is based on the change in expectation for runs with a similar bounce or cutback.

    In most circumstances, a back bouncing a run outside or cutting it back means that the frontside linemen will be debited and the backside linemen credited. That’s consistent with the idea that if the play develops as designed the blockers at the point of attack are likely to have done well. And typically, the blockers at the gap the back ends up targeting are doing a good job to allow the cutback lane. That said, for example, cutbacks of three or more gaps are better than most bounces or cutbacks, so the frontside linemen won’t be debited as much (because a cutback isn’t such a bad thing in that spot) and the backside linemen won’t be credited as much (because the result isn’t expected to be as bad).

    Adjusting credit based on the yards before contact

    SIS uses a stat called “adjusted yards before contact”, which adds the expected yards after contact to the yards before contact on a play based on what typically happens on plays with similar blocking scheme and run direction. In the context of Total Points, the blockers on a play don’t get credit beyond the adjusted yards before contact.

    Adjusted yards before contact are now split into “first level” (YBC <= 5), “second level” (YBC between 6 and 15), and “open field” (YBC > 15). In allocating the EPA associated with adjusted yards before contact, the offensive line now receives 3/4 of the credit at the first level, half of the credit at the second level, and 1/4 of the credit in the open field. For runs that are stuffed where the rusher is contacted behind the line of scrimmage, the line is given 90% of the responsibility.

    Here is an example of how reworking this breakdown affects how one would distribute EPA responsibility between the back and the offensive line, depending on the yards before contact on the run. The EPA shown is the value of the adjusted yards before contact.

    The new system punishes the offensive line much more when the back is contacted early, and dramatically increases the back’s responsibility for downfield yards.
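
    To make the tiers concrete, here is a minimal Python sketch of the split described above. It is an illustration, not the actual Total Points code: the function names are hypothetical, a negative yards-before-contact value stands in for a stuffed run, and it assumes the tier's share applies to the whole EPA of the adjusted yards before contact instead of being applied yard by yard.

```python
# Hypothetical helper names; the tier boundaries and shares are the ones quoted above.
STUFFED_LINE_SHARE = 0.90  # rusher contacted behind the line of scrimmage

# (upper bound in yards before contact, offensive line's share of the credit)
TIERS = [
    (5, 0.75),             # first level: YBC <= 5
    (15, 0.50),            # second level: YBC 6-15
    (float("inf"), 0.25),  # open field: YBC > 15
]


def line_share(ybc):
    """Return the offensive line's share of credit for a given contact depth."""
    if ybc < 0:
        return STUFFED_LINE_SHARE
    for upper, share in TIERS:
        if ybc <= upper:
            return share
    raise ValueError("yards before contact must be a number")


def split_credit(epa, ybc):
    """Split the EPA of adjusted yards before contact into (line, back)."""
    share = line_share(ybc)
    return epa * share, epa * (1 - share)
```

    For example, split_credit(1.0, 20) returns (0.25, 0.75): on a run where the back goes untouched for 20 yards, a quarter of the value goes to the line and the rest to the back.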

    Adding defensive technique data

    Defensive alignment data informs which players were run toward, and correspondingly how responsibility for a run’s initial success or failure should go to each player on the defensive front.

    From 2019 forward the defensive alignment includes technique info for all down linemen. This allows for much more accurate judging of which defenders are most relevant. 

    To go with that, the alignment of defenders is now being considered along a continuum, where 0 is an outside cornerback on the offense’s left, 1 is an outside cornerback on the offense’s right, and anyone positioned between them has a number assigned based on their relative position. This makes it so that it’s not assumed that all adjacent defensive linemen are the same distance apart, which helps handle the variety of fronts defenses employ. This is used for determining which defenders are most relevant for runs in a certain direction or blown blocks by a certain offensive player.
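
    One way to picture the continuum is a simple rescaling of each defender's lateral position. The Python sketch below is only an assumption about how such a scale could be built: the coordinates and labels are made up, the function names are hypothetical, and it assumes the two outside cornerbacks are the widest defenders on the field.

```python
def alignment_continuum(lateral_positions):
    """Rescale lateral positions so the two outside cornerbacks sit at 0 and 1.

    lateral_positions maps a defender label to a made-up lateral coordinate,
    measured from the sideline on the offense's left.
    """
    left = min(lateral_positions.values())
    right = max(lateral_positions.values())
    if right == left:
        raise ValueError("defenders must not all share one lateral position")
    return {
        name: (x - left) / (right - left)
        for name, x in lateral_positions.items()
    }


def most_relevant(continuum, target, n=2):
    """Return the n defenders closest to a target spot on the 0-to-1 scale."""
    return sorted(continuum, key=lambda name: abs(continuum[name] - target))[:n]


# CB_left / CB_right are the outside cornerbacks on the offense's left / right.
front = alignment_continuum(
    {"CB_left": 10.0, "DE_1": 28.0, "NT": 32.0, "DE_2": 37.0, "CB_right": 50.0}
)
closest = most_relevant(front, 0.6)  # defenders nearest a run aimed at 0.6
```

    Because the scale is continuous, the uneven spacing between DE_1, NT, and DE_2 is preserved, which is the point of not assuming that adjacent linemen are equally far apart.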

    Refining how tackles are evaluated

    Before the 2019 season, SIS overhauled its evaluation of tackling in the run game to allow players to be measured based on how their tackle compared to the average tackle on similar plays from a similar position. For example, a tackle made by a MIKE on a strongside run into a heavy box would be compared to the average tackle made in those circumstances.

    One big improvement is using what’s called a plus-minus system. We measure each player’s odds of making a tackle using his alignment and the run direction, and every player is given credit or debit based on whether he made the tackle and how likely he was to make it. That plus-minus value—which will be positive for the tackler(s) and negative for everyone else—is multiplied by the EPA value of that tackle. For the non-tacklers, that EPA is the average result of similar plays, since we don’t know where they would have made the tackle. 

    The plus-minus calculation described above is modified such that it’s better to make a tackle than not, even if it was after a big gain. 

    Here is how this works out for two sample plays from last season.

    Each player has a percentage that indicates how likely he was to make a tackle based on historical data, and a decimal value that shows how much value (in terms of EPA) he was credited or debited based on his tackling or lack thereof.

    Please keep in mind that the positions of the safeties and off-ball linebackers are estimated based on typical locations for those players and are not the players’ specific locations for that play.

    Total Points Run Tackle Evaluation: Pitch to right D-gap for 30 yard run

    On this play, the left safety is one of the more likely players to make the tackle, and he does so, getting some credit. Everyone else on the play is dinged slightly, with the linebackers punished most because they were the most likely possible tacklers.

    Total Points Run Tackle Evaluation: Goal Line Power to right C-Gap for 0 yard gain

    In this case, the left end plugs up the hole and tackles the ballcarrier at the point of attack, getting a decent chunk of credit. The other players get a very small demerit for not being involved on the tackle (even though it was very unlikely for the backside players).

    It’s worth noting that in both of these cases, any of the value that these players might have accumulated based on the yards before contact on the play (as described above) are carved out, so that there isn’t any double-counting of responsibility for players on the defensive front.
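
    The basic plus-minus arithmetic can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names and made-up inputs. It treats the tackle values as numbers where higher is better for the defense, and it leaves out the modification that guarantees a tackle is always better than no tackle, since the details of that adjustment aren't spelled out here.

```python
def tackle_plus_minus(made_tackle, tackle_prob):
    """1 minus the tackle probability for a tackler, 0 minus it otherwise."""
    return (1.0 if made_tackle else 0.0) - tackle_prob


def tackle_credit(made_tackle, tackle_prob, tackle_value, avg_similar_value):
    """Plus-minus multiplied by an EPA value.

    Tacklers use the value of the tackle they actually made. Non-tacklers
    use the average result of similar plays, because we don't know where
    they would have made the tackle.
    """
    plus_minus = tackle_plus_minus(made_tackle, tackle_prob)
    value = tackle_value if made_tackle else avg_similar_value
    return plus_minus * value
```

    With a 25% tackle probability, a tackle worth 2.0 gives tackle_credit(True, 0.25, 2.0, 1.0) = 1.5, while a non-tackler on a play with an average value of 1.0 gets tackle_credit(False, 0.25, 2.0, 1.0) = -0.25.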

    Other improvements

    • In calculating the expected yards after contact on a run play, research suggests that yards after contact is higher when the runner is contacted early or late (and lower around the line of scrimmage). As a result, several tiers of yards before contact (e.g. 0-1, 2-3, 4-5, 6+) are now being used to determine expected yards after contact more accurately.
    • To ensure that there isn’t double-counting when evaluating the defensive backfield and defensive front on a pass attempt, the value accumulated by the defensive front is subtracted from that of the defensive backfield when calculating Total Points. For example, if the defense forced two blown blocks and the quarterback attempted the throw under duress, the defensive backfield is punished more if the pass is successful and credited less if the pass is incomplete.
    • Over the last couple of seasons, SIS has added several detailed route types, including a variety of screens (e.g. bubble, tunnel) which are quite similar to each other. In calculating how likely a throw is to be completed and therefore how valuable a completion or incompletion is, routes are now grouped into about a dozen categories, with screens being bundled as their own group.

    Whose Evaluations Changed the Most?

    Any of the new numbers can be found on the SIS DataHub. Let’s take a look at some players who were notable movers due to our adjustments.

    The Unanimous MVP is a Little More Unanimous

    Lamar Jackson set the league aflame last year, winning the MVP award unanimously. Total Points disagreed with the assessment, as his rushing value wasn’t enough to offset his merely above average passing value.

    The funny thing was, if you looked at EPA on run plays (designed or scrambles), Jackson’s runs were nearly three times as valuable as those of any other rusher, while Total Points didn’t even have him as the most valuable rusher in the league.

    The new update gives him more credit for his results, particularly on short yardage, and now he’s comfortably the most valuable rusher in terms of Total Points.

    2019 Rushing Points Earned Leaders

    Player | Previous | Current
    Lamar Jackson | 28 | 46
    Ezekiel Elliott | 32 | 31
    Nick Chubb | 32 | 28
    Chris Carson | 25 | 26
    Derrick Henry | 22 | 26

    That surge in rushing value puts Jackson a bit ahead of Aaron Rodgers, the previous leader in terms of Total Points (139 vs. 133). And considering he didn’t play in Week 17, the gap in performance is a bit larger than that 6-point difference suggests.

    Meanwhile, in 2020…

    The changes in terms of rusher/blocker division of credit have an impact on big plays as well. The biggest gainers in terms of Rushing Points Earned line up quite well with the players who have had long runs where they were untouched into the open field. To illustrate, here are the players who have gained the most yards before contact beyond the first fifteen this season, and how their Rushing Points Earned change with this update.

    Most Yards Gained After the First 15 Yards Before Contact, 2020 (through Week 15)

    Player | YBC Beyond First 15 | Rushing Points Earned Change
    Daniel Jones | 130 | +7
    Miles Sanders | 101 | +12
    Raheem Mostert | 94 | -7
    Lamar Jackson | 84 | +15
    Russell Wilson | 70 | -3
    Kenyan Drake | 67 | +9
    Kyler Murray | 62 | +19

    Moving Defensive Backs Forward

    Total Points used to lean its defensive back evaluation slightly in favor of those who made impact tackles in the running game, primarily because other defensive backs were being suppressed for their tackles (or lack thereof).

    The new system no longer debits DBs for making a tackle downfield (after all, it’s better to make a tackle than to not make one). As a result, the elite pass defenders bubble to the top in the updated DB rankings.

    2020 Defensive Back Total Points Saved Leaders (through Week 15)

    Player | Previous | Updated | Change
    Tre’Davious White | 42 | 57 | +15
    Xavien Howard | 47 | 55 | +8
    Malcolm Butler | 35 | 53 | +18
    Carlton Davis | 41 | 50 | +9
    Kyle Fuller | 42 | 50 | +8
    Jaire Alexander | 30 | 48 | +18

    What to Expect Next

    These updates started as an offseason project to enhance how we evaluated run blocking, and (as projects tend to do) extended from there in a few different directions.

    What kind of things can we expect from Total Points heading into the 2021 season?

    • Using timing data and drop types to better divide credit between the offensive line and quarterback (and defensive line and defensive backs)
    • Enhanced evaluation of quarterback accuracy (i.e. using overthrown/underthrown as well as catchable/uncatchable)
    • Using the depth of broken or missed tackles to better measure their value

  • A Primer on Total Points, Our Total Value Stat for Football

    Below is the living documentation of the Total Points system, which is Sports Info Solutions’ player value metric for football (NFL and CFB).

    As updates get added to the system, we’ll add notes in this font to illustrate the most recent enhancements.

    The most recent set of enhancements were made in August 2024.

    Pass Plays / Run Plays / Additional Adjustments / How to Use It
     

    What does Total Points do?

    Total Points takes nearly everything that SIS measures about a play and uses it to evaluate each player on a scale that allows you to compare them more easily.

    It’s always useful to be able to understand the different ways in which players can be valuable. Does he break a lot of tackles? Does he get a lot of yards after the catch? Does he make the best out of a poor offensive line? Total Points offers the opportunity to take all of those elements and get a quick picture of how well a player is performing overall.

    What does the number mean?

    All of Total Points uses the Expected Points Added (EPA) framework. EPA works by taking any given situation and finding the odds that each different scoring possibility comes next. For example, if the next scoring play is a field goal by the current defensive team two drives from now, you count that as a -3. Average those values across all instances of the same situation and you get its Expected Points. Take the change in Expected Points on any given play and you get its EPA.

    Roughly, you can think of a 0 EPA play as one that “stays on schedule”, an EPA of 1 or more as a big play for the offense, and an EPA of negative-1 or less as a big play for the defense.
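
    As a toy illustration of that arithmetic, here is a short Python sketch. The function names are hypothetical and the next-score values are invented for the example. A real Expected Points model is built from many seasons of plays and a much richer description of the game situation.

```python
from statistics import mean


def expected_points(next_scores):
    """Average the next-score values seen across instances of one situation.

    Each value is signed from the point of view of the team with the ball,
    e.g. +7 for its own touchdown or -3 for an opponent's field goal.
    """
    return mean(next_scores)


def expected_points_added(ep_before, ep_after):
    """EPA is the change in Expected Points from before to after the play."""
    return ep_after - ep_before


# Made-up next-score histories for the situations before and after a play.
ep_before = expected_points([7, 3, -3, 0])
ep_after = expected_points([7, 7, 3, -3])
play_epa = expected_points_added(ep_before, ep_after)
```

    Here the situation before the play is worth 1.75 points and the situation after it is worth 3.5, so the play's EPA is 1.75, which would count as a big play for the offense.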

    Total Points starts by evaluating each player on that scale, where 0 is average. That’s what we call Points Above Average. Then to both reward players who play full seasons and keep the sum of Total Points around what we’d expect a team to score or allow, we scale the results to the league scoring average (around 22 points per game). So when you see Josh Allen’s 171 leading quarterbacks in 2023, you could take that as a rough estimate that he contributed just about 10 points per game to the Bills’ scoring average on his own.

    On the defensive side, it’s a little bit harder to wrap your mind around, because the scaling is exactly the same but points are bad for the defense. T.J. Watt’s 72 Points Saved in 2023 suggests that he was responsible for reducing his opponents’ scoring by that many points over the season.

    How does it work?

    We won’t go into complete detail here, but let’s run down the different data elements we consider, how they are evaluated in terms of EPA, and how they get bundled together.

    Total Points evaluates the passing game and the running game each as a whole, so we’ll walk through them that way.

    Pass Plays

    Blocking

    Everything starts up front. We start with identifying who was rushing the passer and who was blocking.

    Then, those players are assigned a base value based on the expected value of the play overall. Starting in 2019, this takes into account drop type. Prior to 2019, this just involved whether the play was a screen pass.

    Next, the line’s value is modulated by how long the play took to develop (starting in 2022). This uses our Expected Snap to Throw metric, and applies a multiplier to the baseline value corresponding to how much the timing of the play typically affects blown block rates (longer blocking, more credit for the line).

    Then, we estimate how likely each person was to either blow a block (offense) or force a blown block (defense). On each play, credit is assigned to each player based on how they performed compared to that expectation, and the resulting blown block plus-minus value is multiplied by the average EPA of a blown block.

    Players are additionally credited or debited if they were involved for good or for bad in a batted pass, deflection, or pressure, based on the average EPA of those events. For the 2019 season and beyond, our Pressures Above Expectation metric modulates the value assigned, with the form ({Pressure, No Pressure} – {Expected Pressure Rate}) x {Value of a Pressure}.
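
    The Pressures Above Expectation form quoted above translates directly into code. The Python sketch below uses hypothetical names and made-up inputs, and is meant only to show how the per-snap values add up.

```python
def pressure_value(got_pressure, expected_pressure_rate, value_of_pressure):
    """({Pressure, No Pressure} - {Expected Pressure Rate}) x {Value of a Pressure}."""
    outcome = 1.0 if got_pressure else 0.0
    return (outcome - expected_pressure_rate) * value_of_pressure


# Made-up example: four pass rushes, each with a 12.5% expected pressure rate
# and a pressure worth 0.5 EPA. The rusher gets home once.
snaps = [True, False, False, False]
total = sum(pressure_value(hit, 0.125, 0.5) for hit in snaps)
```

    The one pressure is worth 0.4375 and each of the three empty rushes costs 0.0625, so the rusher nets 0.25 over the four snaps.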

     

    Pass Attempts

    Each pass attempt gets split into five portions: throw, accuracy, catch, yards after catch before contact, and yards after contact.

    • Throw: We take the value of the route at the intended depth in terms of its completion rate and interception rate. Starting in 2023, throw openness (contested, wide open, or in-the-middle) is incorporated as well.
      • 75% owned by the quarterback, 25% owned by the receiver
    • Accuracy: Comparing actual throw accuracy to expected accuracy, and multiplying the difference by the value of an accurate throw (based on expected catch and YAC rates). Starting in 2020, this uses our Expected On-Target Rate metric. Prior to 2020, it uses catchable throw rate, because we didn’t have enough granular accuracy data.
      • 90% owned by the quarterback, 10% owned by the receiver
    • Catch: Comparing actual completion success to expected catch rate, and multiplying the difference by the value of a completion (with expected YAC). Drops are considered completions for the passer. Uncatchable passes are not evaluated for the receiver.
      • 10% owned by the quarterback, 90% owned by the receiver
      • From 2020-22, this is 30/70 because we don’t have throw openness data. Prior to 2020, this is 50/50 because we don’t have granular accuracy data.
    • Yards After Catch: Expected YAC is based on route, throw depth, and alignment. Starting in 2020, this includes granular throw accuracy. Starting in 2023, this includes throw openness. We give the receiver credit based on the difference in EPA between what he achieved and what was expected.
      • 0% owned by the quarterback, 100% owned by the receiver
      • From 2020-22, this is 40/60 because we don’t have throw openness data (and openness is the biggest driver of YAC). Prior to 2020, this is 50/50 because we don’t have granular accuracy data.
    • Yards After Contact: Expected YACon is based on route, throw depth, and alignment. Starting in 2020, this includes granular throw accuracy. Starting in 2023, this includes throw openness. We give the receiver credit based on the difference in EPA between what he achieved and what was expected.
      • 0% owned by the quarterback, 100% owned by the receiver
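
    The ownership percentages above can be collected into a small lookup. This Python sketch is an illustration with hypothetical names. Where the list gives only one split for a component, the sketch assumes it applies to every season.

```python
# (quarterback share, receiver share) for each portion of a pass attempt.
# Era-dependent entries are (first season the split applies, split), newest first.
SPLITS = {
    "throw": [(0, (0.75, 0.25))],
    "accuracy": [(0, (0.90, 0.10))],
    "catch": [(2023, (0.10, 0.90)), (2020, (0.30, 0.70)), (0, (0.50, 0.50))],
    "yards_after_catch": [(2023, (0.0, 1.0)), (2020, (0.40, 0.60)), (0, (0.50, 0.50))],
    "yards_after_contact": [(0, (0.0, 1.0))],
}


def ownership_split(component, season):
    """Return the (quarterback, receiver) shares for a component and season."""
    for first_season, split in SPLITS[component]:
        if season >= first_season:
            return split
    raise ValueError("season must be a non-negative year")


def split_value(component, season, epa):
    """Divide a component's EPA between the quarterback and the receiver."""
    qb_share, receiver_share = ownership_split(component, season)
    return epa * qb_share, epa * receiver_share
```

    For instance, ownership_split("catch", 2021) returns (0.30, 0.70), while the same call for 2023 returns (0.10, 0.90) because throw openness data is available from that season on.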

    The defense at large takes responsibility for the throw itself because many factors contribute to the throw that’s selected, but the primary defender in coverage is responsible for the catch and yards after catch.

    Any broken or missed tackles are evaluated according to their average EPA impact.

    If the pass is intercepted, the quarterback is debited and the defender is credited equally, based on where the ball was caught. The defender then gets extra credit for the change in field position from his return.

    All players running routes or defending in coverage have an expected target rate based on the coverage scheme, number of routes being run, route type, and alignment. Each player is assigned a value according to how many targets above expectation they had, scaled according to the EPA value of the potential target.

     

    Pressure, Sacks, and Fumbles

    Quarterbacks are given full responsibility for the sacks they incur (less the value of any blown blocks by the offensive line). They are given neither credit nor blame for pressure unrelated to blown blocks, with the idea that their throws are made more difficult but they also had some part in the pressure in the first place.

    Sacks or evaded sacks are measured using the EPA of the sack (or potential sack) and an expected sack rate. The sacker(s) get full credit, unless it was deemed a coverage sack, in which case the coverage unit splits the credit. Starting in 2019, expected sack rate uses the Pressures Above Expectation framework.

    Pass rushers are given credit for how well they generate pressures relative to the average of players lined up at the same position, with Pressures Above Expectation used from 2019 forward. Any pressure-related events that might have been debited from the line are given back to the receivers (and quarterback in the case of blown blocks), owing to their having a harder job as a result of the pressure.

    All fumbles, recovered or lost, are evaluated similarly. The value of the potential turnover from that spot on the field is multiplied by the odds that possession will be lost based on whether it was in the backfield or not. Lamar Jackson was docked a lot of value for his 15 fumbles in 2018, even if the Ravens recovered most of them.

    The person who recovers the fumble gets the inverse of the value that would be lost if the offense recovers or the “rest” of the fumble value if the defense recovers (i.e. the value of the turnover multiplied by the odds that it is recovered).
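
    Here is one way to read the fumble accounting, sketched in Python. The function names, the sign conventions, and the example numbers are assumptions made for illustration: the turnover value is treated as a positive number, the fumbler's debit is negative, and the recovery credit is positive for whichever side recovers.

```python
def fumbler_debit(turnover_value, p_lost):
    """Value charged to the fumbler, whether or not the ball is actually lost."""
    return -turnover_value * p_lost


def recovery_credit(turnover_value, p_lost, recovered_by_defense):
    """Credit for the player who recovers the fumble.

    An offensive recoverer gets back the inverse of the fumbler's debit.
    A defensive recoverer gets the "rest" of the turnover value, i.e. the
    part not already accounted for by the odds of losing possession.
    """
    if recovered_by_defense:
        return turnover_value * (1 - p_lost)
    return turnover_value * p_lost
```

    With a turnover worth 4.0 points and a 25% chance of losing possession, the fumbler is docked 1.0 either way, an offensive recovery earns 1.0 back, and a defensive recovery is worth 3.0.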

    Run Plays

    Blocking

    Like with passing, the first step is to identify the blockers and box defenders. In addition, we use the intended and eventual run direction to identify the key blockers and defenders on the play (based on data elements like defensive techniques).

    From there, we calculate the play’s expected yards before and after contact based on the number of box defenders, the blocking scheme, the run direction, the spot on the field, etc. The blockers are evaluated based on the play’s performance above that expectation, with most of the credit or blame going to the key blockers identified earlier (unless the runner cut the run back or bounced outside, in which case things are more balanced among blockers).

    Starting in 2018, missed tackles (i.e. eluded without meaningful contact) that occur in the backfield are considered to be the point of contact for the purpose of evaluating the line.

    The earlier the back is contacted on the play, the more responsibility the offensive line takes for the result of the play. The line’s share ranges from 90% of the responsibility on plays that are blown up in the backfield down to 25% of the credit for yards before contact beyond the first fifteen.

    The same value is distributed among the box defenders, again focusing on the defenders at the intended gap. Blown blocks are evaluated similarly to what’s done in the passing game.

    On plays where the back bounces or cuts the run back, the linemen initially run behind are evaluated differently from those who the back eventually runs behind. The extent of the difference depends on the direction and magnitude of the back’s movement. For example, cutbacks of 3 or more gaps are the most valuable bounce or cutback, so the value lost by the initial linemen is small because the cost of the cutback is small.

     

    Rushing

    The runner is evaluated against the offensive line’s expected performance calculated above. The rusher is given some credit for yards before contact because elusive runners can generate their own space, but most of his value will come after contact. The back’s responsibility for yards before contact increases the more yards he gains before contact, as it’s more likely he had a role in that result.

    On any play where a broken or missed tackle was charted, we give the back a standard EPA amount based on the average value of a broken or missed tackle, with an adjustment for how likely an eluded tackle is on average. The EPA value is determined by comparing what happens when a tackle is made or eluded at the same yards downfield.

    Fumbles are treated like they are on pass plays.

     

    Tackling

    Given each defender’s initial alignment, the heaviness of the box, and the run direction, we estimate the probability that each player would make the tackle and the EPA that would be expected if each of the possible defenders made the tackle.

    A plus-minus system is used to combine the expected tackle rate and tackle value for each player and measure that against whether the player actually recorded a tackle. That system is also modified to ensure that making a tackle is always better than not making one, regardless of the value of said tackle compared to expectation.

    Broken or missed tackles are taken independent of where they are on the field, so each one is considered worth the value of an average broken or missed tackle in terms of EPA.

    Additional Adjustments

    Play Selection

    At this point it’s common knowledge that run plays are less valuable on average than pass plays. At a basic level we can see this because the average yards per attempt on passes is much higher than it is on runs. In a similar way, play action passing is generally more effective than straight dropback passing.

    At a more granular level, coaches can make inefficient decisions by electing to, for example, run from heavy personnel on second-and-10. 

    In order to more accurately evaluate the players on a play as opposed to the coaches or situations, we implemented a Play Selection Adjustment, which applies to each player on each play. We take the expected value of the play given the run/pass decision—including whether there was a play fake on a dropback—and some personnel and game state information, compare it to an average play, take the difference, and distribute that value among the players involved. That way, a back being run into a heavy box time and again isn’t punished simply for being on the field in a sub-par situation for him.

    This adjustment generally moves a player a handful of points one way or the other depending on how often he was involved in pass or run plays over the course of a season.
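
    A bare-bones version of the adjustment might look like the Python sketch below. The function name is hypothetical, and the equal split among players is an assumption, since how the value is distributed isn't specified above. The sign is chosen so that players stuck in a below-average play call get value back.

```python
def play_selection_adjustment(expected_value_of_call, average_play_value, n_players):
    """Per-player adjustment that removes the value of the play call itself.

    expected_value_of_call: expected EPA given the run/pass decision, play
        fake, personnel, and game state (a made-up input here).
    average_play_value: expected EPA of an average play.
    n_players: number of players sharing the adjustment (assumed equal split).
    """
    if n_players <= 0:
        raise ValueError("n_players must be positive")
    difference = expected_value_of_call - average_play_value
    return -difference / n_players
```

    For example, if a run into a heavy box is expected to be worth 0.25 EPA less than an average play and four players share the adjustment, each one gets 0.0625 back.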

     

    Season Scoring

    As mentioned above, after all of the initial calculations are done, we re-scale everything so that the league total is in line with the league’s scoring average, or just over 22 points per team per game. Because the quarterback represents the most obviously critical position, he’s given 1/3 of this adjustment for the offense, and the rest is split among the other offensive players.

    The Gist

    Let’s say that you read all this stuff and already kind of forget what you read at the beginning. Here’s a quick-and-dirty version:

    • We take Expected Points Added and give individual value to every player on every scrimmage play, starting in 2016
    • You can find it on the SIS DataHub player pages and leaderboards. Here are the leaderboards for quarterbacks, offensive linemen, defensive linemen, and defensive backs as examples. The SIS DataHub Pro offers more detailed filtering ability and even more in-depth stats.
    • Pass Offense: Quarterbacks and receivers split value for the throw, the catch, after-catch yards, and after-contact yards. Additional considerations for offensive line performance, uncatchable passes, and drops.
    • Pass Defense: Defensive backs are measured on how often they are targeted above expectation, and much of the value that the receivers or QB get on a completion is correspondingly taken away from the defender. Pass rushers are credited for forcing blown blocks and disruptions at the point of attack. 
    • Rush Offense: The offensive line and running back both take responsibility for yards before contact (weighted towards the O-line), while yards after contact beyond what’s expected are totally owned by the back. Broken tackles hold a lot of value.
    • Rush Defense: Preventing yards before contact is the name of the game for the defensive line, while linebackers and defensive backs get value from making tackles that limit yardage compared to expectation and not missing out on easy tackle opportunities. 
    • In general, there’s a lot of value to be gained and lost from turnovers (or turnover-worthy plays) and plays in key spots (e.g. just outside field goal range, third down).

    What do we do with it?

    Now that you’re familiar with what goes into Total Points, what do you do with it?

    The first thing you might do is find players whose traditional stats or reputation don’t line up with their rank in Total Points.

    How was Lamar Jackson barely above zero value rushing in 2018? You saw the reason for that above (his propensity to fumble).

    Why was James Conner such a standout in 2023, even above Offensive Player of the Year Christian McCaffrey? He had a worse offensive line and was elite when it came to yards after contact, production that he doesn’t split with anyone.

    Total Points gives us the opportunity to more critically engage with the stats players compile and consider the context in which they compiled them. And as SIS continues to add more data points to its operation, our assessment of those things will only get better.

  • Machado’s catch on Tuesday was as weird as it gets

    BY ALEX VIGDERMAN

    A day after one Padres star infielder was in the spotlight, third baseman Manny Machado turned some virtual heads on a play where he made a catch here:

    A few more feet to the right and those fans might have had a souvenir. Bummer.

    Of course, Machado wasn’t playing a “typical” third base on that play. He started in shallow right field as part of a Full Ted Williams Shift. But even given that, it’s shocking to see an infielder make a play nearly 300 feet from home plate.

    That got me thinking: is that the weirdest place we’ve seen someone catch a flyball (given the position they were playing)?

    The answer is “yes,” perhaps unsurprisingly. If you consider where each position tends to be aligned (shifts notwithstanding), Machado’s catch was an incredible 300 feet from where you’d expect a third baseman to catch a flyball.

    Of course, nowadays, the notion of a player’s “position” is a little fuzzy. Infield shifts are as common as “standard” alignments, not to mention the slow growth of outfield shifting, four-man outfields, and five-man infields. Like I said before, there’s no way that any infielder—especially not a third baseman—is in position to make that catch if shifts aren’t a thing.

    Just for fun, let’s put ourselves in the mindset of a baseball viewer ten years ago and check out the wonkiest locations for each position to field a ball since 2012 (when shifts started ramping up).

    Pitcher

    Unsurprisingly, pitchers have the shortest distances from typical positioning. They’re not shifting, they are trained to defer on balls in the air, and they start the play falling off the mound. The most extreme play I found was a nice play by Rays reliever Jaime Schultz, but he only had to range 100 feet to get there.

    Honorable Mention: Mike Fiers has thrown two no-hitters and has been newsworthy in other ways, but here we get to watch him stumble his way to a rather incredible foul out.

    Catcher

    Foul territory in Oakland is notoriously extensive, so it’s no surprise that the most extreme catch location for a catcher came in the Coliseum. On this play, Josh Phegley gets on his horse to make a play just before getting to the dugout. Again, this play isn’t so notable from the perspective of this article, for the same reasons that apply to pitchers.

    First Base

    We’re still in tepid waters thanks to first basemen having to be anchored to their base to cover on groundballs. This play by Ryan O’Hearn takes him pretty deep into foul territory, but it’s only 125 feet from where first basemen typically field balls.

    Second Base

    Now we’re talking. You might think that catch location looks an awful lot like where a right fielder would normally stand. And you’d be right! The Rays employed a four-man outfield with three infielders on the right side of second base against Rio Ruiz up by a run in the ninth inning. Second baseman Brandon Lowe makes a routine play that is a small step for a right fielder, but a giant leap for a second baseman.

    Third Base

    Here we have the Machado play. This one’s unique on this list because it combines extreme positioning with pretty good range, so it’s no surprise that it’s the wonkiest fielding location we’ve seen.

    Honorable Mention: For a short time, infield shifts were destroying defensive metrics. Teams would play their third baseman in short right field and he would get credit for incredible range when it wasn’t warranted. Because he was the poster boy for this phenomenon at the time, we at SIS call this the “Brett Lawrie Problem.” Here’s an example of the kind of thing we were seeing at the time.

    Shortstop

    This might be the most impressive play of the group, as Darwin Barney ranges from a similar position to Machado on the previous play to make a sliding grab near the wall in foul territory.

    Like with Oakland, Toronto’s foul ground features heavily in plays like this. In fact, there was a similar play from the same game where second baseman Devon Travis went a long way to make a catch.

    Left Field

    We’ll just call this the DJ LeMahieu Appreciation category. Several of the weirdest catch locations for left fielders involve NL West teams heavily shading their outfielders to right field to defend the former Rockies second baseman’s oppo-heavy approach. Of course, that calls into question whether we should be calling him a left fielder to begin with, but that’s a larger question for another day.

    On the most extreme play, David Peralta is playing center field (sort of), makes a relatively easy play in right field, and then starts running back to his usual spot.

    Honorable Mention: Another fun LeMahieu example is this play by Matt Szczur, who starts the play in left center field and records the out in deep center field.

    Center Field

    When the Dodgers traded for Mookie Betts, I was kind of hoping that they would put him back in center field, where he had started his outfield career. This play from his first month as a major leaguer shows the kind of range Mookie has from that position.

    After a handful of plays whose uniqueness is very much driven by positioning, it’s nice to see a rangy play that’s more “pure.”

    Right Field

    What DJ LeMahieu presents to outfields from the right side of the plate, Joe Mauer presented from the left side. His heavy opposite-field profile led many teams to shade their outfielders towards left field, and this dead-center catch by Avisail Garcia is a great example of it.

  • Emphasizing the value of a first pitch curveball

    BY ALEX VIGDERMAN

    This Sunday I spent some of my afternoon watching top prospect Spencer Howard’s debut for the Phillies. His debut was forgettable, but the opposing starter, the Braves’ Max Fried, caught my eye with a strong outing. After five shutout innings allowing four hits and striking out six, Fried is among the early leaders in Fangraphs’ Pitching Wins Above Replacement.

    One thing that Fried showed early in Sunday’s contest was confidence throwing his curveball to lead off an at-bat. He started three hitters off with a curve in the first inning, getting two called strikes. Over the course of the game, he threw seven first-pitch strikes, and none of them were swung at.

    Despite being part of a strong start for the young lefty, Fried’s curve hasn’t been an effective pitch. This season, it’s been the least valuable curveball in the league by Fangraphs pitch values. But getting it over early in the count gives him an opportunity to “pitch backwards” and keep hitters off balance a bit.

    Starting a hitter off with a “get me over” pitch every once in a while makes sense given how passive hitters tend to be in that spot. Here’s a quick comparison between the first pitch of an at-bat and any other pitch.

                   Swing %    Misses / Swing    Hard Hits / Swing
    First Pitch    21%        33%               14%
    All Others     49%        32%               14%

    Hitters swing at curveballs more than twice as often after the first pitch (49 percent versus 21 percent), but their results when they do swing are similar. So if you’re a pitcher concerned about hitters getting after a shaky hook, leading with it is a nice option.

    Here’s a list of the pitchers who have started hitters off with a curveball most often this season.

    Player                 First Pitch Curveballs    Curve % – First Pitch    Curve % – After First Pitch
    Lance McCullers Jr.    34                        39%                      31%
    Sonny Gray             33                        35%                      23%
    Alex Cobb              26                        42%                      11%
    Jose Berrios           26                        29%                      27%
    Dylan Bundy            25                        31%                      5%

    Lance McCullers Jr., who famously threw 24 consecutive curveballs to close out Game 7 of an ALCS against the Yankees, tops the list. He threw 10 first-pitch curveballs in his no-hit bid against the Giants on Monday night.

    Sonny Gray—who was the only pitcher to throw more first pitch curveballs than Fried on Sunday—is having an outstanding start to the season. That’s been driven by an excellent curveball, which has been the most effective in MLB.

    Alex Cobb and Dylan Bundy stand out because they throw their curveballs more than three times as often on the first pitch as at any other point in the at-bat. Bundy in particular is having a resurgence to start his Angels career, with his curveballs already accumulating more value than in any other season of his career.
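
    As a quick check on the “more than three times as often” observation, the usage rates from the table above can be divided directly. This is only a sketch: the dictionary and the `first_pitch_ratio` helper are illustrative names, and the inputs are the rounded percentages shown in the table.

```python
# Curveball usage from the table above:
# (curve % on the first pitch, curve % after the first pitch).
usage = {
    "Lance McCullers Jr.": (39, 31),
    "Sonny Gray": (35, 23),
    "Alex Cobb": (42, 11),
    "Jose Berrios": (29, 27),
    "Dylan Bundy": (31, 5),
}


def first_pitch_ratio(first_pct, later_pct):
    """How many times more often the curve is thrown on the first pitch."""
    return first_pct / later_pct


ratios = {name: first_pitch_ratio(*pcts) for name, pcts in usage.items()}

# Pitchers who lead with the curve more than three times as often.
heavy_leaders = [name for name, ratio in ratios.items() if ratio > 3]
print(heavy_leaders)
```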

    Let’s continue down this rabbit hole, with Bundy as the poster boy.

    So far, hitters have swung only twice at his 25 0-0 curveballs. He’s taken advantage with 16 called strikes, and he has also avoided any risk of hard contact. Thanks to a combination of batter patience and the element of surprise, he can afford to leave his breaking ball out over the plate a bit more.

    With that in mind, we can try to find candidates who could benefit from mixing things up to lead off an at-bat and get away with an underperforming curveball. For example, here’s a list of players who since the start of 2019 have allowed hard contact on at least 15 percent of swings on curveballs, thrown at most 10 percent curveballs on the first pitch, and have thrown at least 10 percent curveballs after the first pitch. Players also needed to have thrown at least 1,000 pitches in that time.

    Candidates to Throw More First Pitch Curveballs

    • Clayton Kershaw, Dodgers
    • Walker Buehler, Dodgers
    • Antonio Senzatela, Rockies
    • Jon Gray, Rockies
    • Daniel Mengden, Athletics (since moved to the bullpen)
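
    The screening criteria described above amount to four thresholds applied together. Here is a minimal sketch of that filter, assuming a simple per-pitcher record. The `PitcherLine` fields and the sample rows (“Pitcher A” through “Pitcher D”) are hypothetical placeholders, not real data or the actual query used for this list.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    # Hypothetical record layout; field names are assumptions for illustration.
    name: str
    total_pitches: int
    cb_hard_hit_per_swing: float  # hard contact per swing against curveballs
    first_pitch_cb_rate: float    # share of first pitches that are curveballs
    later_cb_rate: float          # share of all other pitches that are curveballs


def is_candidate(p):
    """Apply the four thresholds from the text."""
    return (
        p.total_pitches >= 1000
        and p.cb_hard_hit_per_swing >= 0.15
        and p.first_pitch_cb_rate <= 0.10
        and p.later_cb_rate >= 0.10
    )


# Made-up sample rows, one per way of passing or failing the screen.
sample = [
    PitcherLine("Pitcher A", 2500, 0.18, 0.06, 0.17),  # meets every threshold
    PitcherLine("Pitcher B", 2200, 0.12, 0.05, 0.20),  # curve is not hit hard enough
    PitcherLine("Pitcher C", 800, 0.20, 0.04, 0.15),   # too few pitches
    PitcherLine("Pitcher D", 3000, 0.16, 0.25, 0.22),  # already leads with the curve
]

candidates = [p.name for p in sample if is_candidate(p)]
print(candidates)
```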

    One thing that jumps out from this list is that multiple names appear from two teams. There’s some possibility that a team approach or game plan leads to more or less “pitching backwards.” This brings to mind that any kind of recommendation like “throw more curveballs to start off a plate appearance” has to be taken within the context of the broader strategy that a team or pitcher wants to employ.

    But it’s interesting that even players with tremendous track records (e.g. Kershaw) or varying degrees of recent success (e.g. Buehler in recent seasons or Senzatela in recent weeks) still would benefit from finding new opportunities to keep hitters off balance.

  • Highlights from submissions to our first Football Analytics Challenge

    BY ALEX VIGDERMAN

    Many (hopefully all) of you know that we recently concluded the initial judging of our first Football Analytics Challenge. We released some previously-locked-down defensive alignment data to the public and asked people to come up with an answer as to which defensive line position is the most valuable. To go with the competition, we also asked registrants to donate whatever they could to the United Negro College Fund.

    To bring in 133 donations totaling $3,300 so far was beyond our expectations. And we ended up with a solid crop of 34 submissions for the competition, with the finals being presented on YouTube tomorrow night (Wednesday July 29)!

    While we are obviously excited to show you the research that was done by the finalists, we didn’t want to leave the work of the other 31 teams off to the side. So here are a few highlights of the efforts of the rest of the participants.

    As a company that dabbles in multiple sports, we appreciate it when analysts draw from multiple sports in their work. Both Nate Rowan and Sam Chinitz cited baseball’s Weighted On-Base Average (wOBA) as the inspiration for their approach to valuing the events on a play. Rowan called his key metric “Points Gained,” which essentially measures the value of a charting data point by taking the difference in EPA/play between plays with and without that event occurring.
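
    To make the “with and without” idea concrete, here is a rough sketch of a Points Gained style calculation as described above. It is not Rowan’s actual code: the play records, the `events` field, and the sign convention (offensive EPA, so a negative number favors the defense) are all assumptions made for illustration.

```python
from statistics import mean


def points_gained(plays, event):
    """Average EPA on plays with the event minus average EPA on plays without it.

    Returns None when either group is empty, since the difference is undefined.
    """
    with_event = [p["epa"] for p in plays if event in p["events"]]
    without_event = [p["epa"] for p in plays if event not in p["events"]]
    if not with_event or not without_event:
        return None
    return mean(with_event) - mean(without_event)


# Made-up plays; EPA is from the offense's point of view.
plays = [
    {"epa": -0.5, "events": {"pressure"}},
    {"epa": -0.3, "events": {"pressure", "pass_breakup"}},
    {"epa": 0.2, "events": set()},
    {"epa": 0.4, "events": set()},
    {"epa": 0.0, "events": {"pass_breakup"}},
]

pressure_value = points_gained(plays, "pressure")
print(pressure_value)
```

    A raw difference like this does not control for situation, so it is best read as a starting point rather than a finished valuation.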

    Matthew Reyers, Meyappan Subbaiah, Dani Chu, and Lucas Wu leveraged two key resources outside of the provided data set to aid with their research. The first was the nflWAR paper by Yurko et al (whose work multiple submissions referenced), and the second was the predicted yards at the time of the handoff from the 2019-20 NFL Big Data Bowl winners.

    Sam Struthers and Adrian Cadena used ideas about division of credit from the Yurko paper to distribute EPA among the players who had a chance to be involved on a play. They also estimated the extent to which edge pressure affects the performance of the interior line and vice versa, which was a unique approach.

    Alex Stern invoked multilevel modeling (which does a good job in measuring player-to-player variation when sample sizes can differ wildly) to evaluate the same concept of Individual Points Added. In the passing game, the model focused mostly on generating pressure, which was a decision that many teams made thanks in part to recent research from Timo Riske of Pro Football Focus.

    Calvin Smith used a linear model to predict the EPA of a play based on the existence and direction of pressure. Unsurprisingly, avoiding pressure altogether is the most valuable outcome for the offense, while outside pressure is the most effective at reducing the offense’s EPA.

    Matt Colón, Silas Morsink, Robbie Thompson, and Peter Gofen were one of a couple of teams (including one of the finalists) who used Madden ratings to help quantify player talent. The group’s approach to evaluating play outcomes was what stood out the most, however. They figured that defensive linemen don’t have much impact on the specific final result of the play, but they do affect what kind of play it was, roughly. So, when evaluating the contributions of each defensive line position, instead of using actual play results, they replaced each play’s EPA with the average EPA value for many different kinds of play results (e.g. “Rush big loss”, “Screen under pressure”, “Medium pass”).
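
    The substitution described above can be sketched in a few lines: group plays by outcome category, average the EPA within each category, and assign that average back to every play in the group. The field names and EPA values below are made up, and the team’s actual categories and implementation may well differ.

```python
from collections import defaultdict
from statistics import mean


def smooth_epa_by_outcome(plays):
    """Return copies of the plays with EPA replaced by the outcome-category average."""
    by_outcome = defaultdict(list)
    for play in plays:
        by_outcome[play["outcome"]].append(play["epa"])
    category_avg = {outcome: mean(values) for outcome, values in by_outcome.items()}
    return [{**play, "epa_smoothed": category_avg[play["outcome"]]} for play in plays]


# Made-up plays using two of the outcome labels mentioned in the text.
plays = [
    {"outcome": "Rush big loss", "epa": -1.0},
    {"outcome": "Rush big loss", "epa": -2.0},
    {"outcome": "Medium pass", "epa": 0.5},
    {"outcome": "Medium pass", "epa": 1.5},
]

smoothed = smooth_epa_by_outcome(plays)
for play in smoothed:
    print(play["outcome"], play["epa"], play["epa_smoothed"])
```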

    Dan Rees used some notions of how to break down a play using charting data that we use ourselves within our Total Points statistic. He also focused on the range of possible EPA values on a play when judging a player’s opportunities instead of just the EPA itself, which he called a play’s EPA Range. David Schmerfeld also took a Total-Points-esque angle at valuing plays, and added in explicit measures of “Indirect Impact” that allowed interior linemen to receive identifiable credit for their more subtle play-to-play value.

    Keegan Abdoo and Mehmet Erden took a similar approach, using a linear model that controlled for situational factors to estimate the EPA contribution of a streamlined set of charting data points on both run plays (forcing the rusher to bounce or cut back) and pass plays (pressuring the quarterback or breaking up the pass).

    A few teams used clustering to robustly characterize player positions using some combination of roster position and play-to-play alignment. One of the better implementations of that belonged to James Hyman, Colin Krantz, Brendan McKeown, and Kushal Shah, who used a random forest to model the most likely roster position for a player (including a Hybrid DE/DT position) and then combined those with defensive line techniques to feed the clustering algorithm.

    We’re so glad to have received so many great submissions to our competition. Feel free to check out work by the finalists or by anyone else in the competition on the competition’s GitHub repository.

  • How amazing of a defensive season could we see in 2020?

    BY ALEX VIGDERMAN

    One of the silver linings of the shortened MLB season is that we could see some outstanding small-sample performances. The Athletic’s Jayson Stark and Eno Sarris reviewed some exciting possibilities along those lines.

    Here, as we tend to do at SIS, we’ll look at things from a defensive perspective. What are the best 60-game runs of defensive excellence that we could see this season?

    To look into this question, I used our new PART Runs Saved, which we introduced earlier this year as the primary component of Defensive Runs Saved. How well has a player done in a 60-game span in a season (going back to 2013, at least)?
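
    One way to frame that question is as a sliding-window search over a player’s game-by-game totals. The sketch below assumes a list of per-game runs saved values for one player-season, which is a hypothetical input; it is not a description of how SIS actually stores or computes PART Runs Saved.

```python
def best_window(game_values, window=60):
    """Find the highest-scoring run of `window` consecutive games.

    Returns (total, start_index), or None if there are fewer games than the window.
    """
    if len(game_values) < window:
        return None
    current = sum(game_values[:window])
    best, best_start = current, 0
    for start in range(1, len(game_values) - window + 1):
        # Slide the window one game: add the new game, drop the oldest one.
        current += game_values[start + window - 1] - game_values[start - 1]
        if current > best:
            best, best_start = current, start
    return best, best_start


# Tiny made-up example with a 3-game window so the result is easy to verify by hand.
per_game = [1, -1, 2, 3, -2, 0]
result = best_window(per_game, window=3)
print(result)
```

    Negating the inputs (or tracking the minimum instead) gives the poorest stretch with the same loop.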

    Best 60-game single season PART Runs Saved performances, 2013-19

    Pos    Player               Season    PART Runs Saved
    1B     Freddie Freeman      2018      10
    2B     Jonathan Schoop      2017      16
    3B     Matt Chapman         2019      19
    SS     Andrelton Simmons    2017      20
    LF     Adam Duvall          2018      14
    CF     Juan Lagares         2014      15
    RF     Mookie Betts         2017      17
    Pitchers and catchers excluded because they accumulate games and PART Runs Saved differently

    First off, let’s acknowledge how impressive some of these totals are! Saving 15 runs is enough to lead the position in some seasons, and these players accomplished that over less than half a full season.

    It might not surprise you to find out that the best of the best are also quite good in short sprints. Each of Chapman, Simmons, Lagares, and Betts won the Fielding Bible Award in the season they had their outstanding stretch, and except for Lagares that wasn’t the only season they were crowned the best defender at that position.

    The outstanding runs from Freeman and Schoop are more representative of the funky results we might see in a shortened season. Both of them were below average at converting batted balls into outs over the balance of their outstanding seasons, with Freeman costing the Braves two runs and Schoop costing the Orioles five runs.

    While we’re not in the business of disparaging players in any way, it’s only fair to look at the other side of the coin, the players who were exceptionally poor defensively in a short spurt.

    Poorest 60-game single season PART Runs Saved performances, 2013-19

    Pos    Player              Season    PART Runs Saved
    1B     Luke Voit           2019      -10
    2B     Rickie Weeks Jr.    2014      -16
    3B     Colin Moran         2019      -16
    SS     Eduardo Nuñez       2013      -21
    LF     Trey Mancini        2018      -12
    CF     Charlie Blackmon    2018      -20
    RF     Melky Cabrera       2019      -13
    Pitchers and catchers excluded because they accumulate games and PART Runs Saved differently

    As with the runs of excellence, this list includes some of the most extreme single-season performances we’ve seen. Blackmon and Nuñez finished with the most and second-most runs cost in a single season in the PART era, with Nuñez doing so in roughly half of a full season of innings.

    Which of the perennial Fielding Bible Award contenders will dominate this sprint of a season? Who will crash that party with a surprising stretch of excellence? We’ll know in a couple months!