Injuries. Everyone hates them, and we can all agree they are the worst part of sports.

Part of the reason is that injuries can occur when you least expect them, and so many variables come into play that they are very difficult to predict accurately. Unfortunately, we cannot just turn injuries off like we are playing a game of Madden, so there is great value in having something to base expectations on for how much time a player will miss. With our multi-year injury risk model, we aimed to provide just that.

Methodology

Football varies greatly depending on the position you play, so we split our player dataset into three position groups: offensive skill players, offensive linemen, and defensive players. We also wanted to test various time frames, so we predicted the number of games players will miss over one-, two-, and three-year spans. Features in our dataset include biographical data, injury history, playing time, and other stats that convey what the player does on the field.

We used the XGBoost machine learning framework to build our regression model. To lessen the impact of outlier cases, our target variable was the average of two numbers: the games the player actually missed in the next year (or two or three), and the games he was expected to miss based on the injury prognosis (which our injury staff logs for most injuries).
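The blended target described above can be sketched in a few lines. This is only an illustration, not our production code: the function name `blended_target` is hypothetical, and the fallback for injuries without a logged prognosis is an assumption made for the sketch.

```python
from typing import Optional


def blended_target(actual_missed: float, prognosis_missed: Optional[float]) -> float:
    """Average the games actually missed with the prognosis-based expectation."""
    if prognosis_missed is None:
        # Assumption for this sketch: with no logged prognosis,
        # fall back to the actual games missed.
        return float(actual_missed)
    return (actual_missed + prognosis_missed) / 2


# Hypothetical player: missed 14 games, but the prognosis suggested 4.
print(blended_target(14, 4))     # 9.0
# Hypothetical player with no prognosis on file.
print(blended_target(3, None))   # 3.0
```

Averaging pulls an extreme outcome (14 games) toward what was expected for that injury (4 games), which is how the blend softens outliers.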

Findings

Games Missed Prediction Error by Position Group and Timeframe

| Position Group | One Year | Two Years | Three Years |
| --- | --- | --- | --- |
| Offensive Skill Players | 3.6 | 5.9 | 7.1 |
| Offensive Linemen | 3.7 | 5.3 | 7.3 |
| Defensive Players | 3.7 | 6.2 | 7.3 |

We used root mean squared error (RMSE) to evaluate the accuracy of our model. As we can see above, accuracy is similar across position groups and, predictably, the absolute error is smaller the shorter the time frame. However, it can also be argued that the three-year model was the most accurate: when you divide the RMSE by the number of games in that time frame (17 games in one year, 34 in two, 51 in three), the ratio is smallest for the three-year interval. In other words, the model predicted games missed over the longest time span with the least error percentage-wise.
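To make that ratio concrete, here is a short sketch that computes RMSE and then divides the reported errors by the games available in each span. The `rmse` helper and the variable names are illustrative; the error values are copied from the table above, and 17 games per season is the figure used in the text.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


GAMES_PER_SEASON = 17

# RMSE values from the table above, keyed by span length in years.
reported_rmse = {
    "Offensive Skill Players": {1: 3.6, 2: 5.9, 3: 7.1},
    "Offensive Linemen": {1: 3.7, 2: 5.3, 3: 7.3},
    "Defensive Players": {1: 3.7, 2: 6.2, 3: 7.3},
}

# Error as a share of the games available in each span.
ratios = {
    group: {years: err / (GAMES_PER_SEASON * years) for years, err in spans.items()}
    for group, spans in reported_rmse.items()
}

for group, spans in ratios.items():
    summary = ", ".join(f"{years}-year: {ratio:.1%}" for years, ratio in spans.items())
    print(f"{group}: {summary}")
```

For offensive skill players, for example, the ratio falls from roughly 21% of available games in one year to about 14% over three years, and the other two groups follow the same direction.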

Average Predicted Games Missed by Position Group and Timeframe

| Position Group | One Year | Two Years | Three Years |
| --- | --- | --- | --- |
| Offensive Skill Players | 2.2 | 5.2 | 7.7 |
| Offensive Linemen | 2.4 | 6.4 | 8.2 |
| Defensive Players | 2.1 | 4.6 | 6.4 |

We observe more of a discrepancy across position groups when examining the average predicted games missed. For all three time intervals, offensive linemen were predicted to miss the most games, followed by offensive skill players, and then lastly the defensive players. This is in line with the trend for actual games missed by position group.

Top 10 Feature Importances by Position Group

Here are the top 10 features in terms of importance (how much a feature contributes to a model’s predictions) by position group for the three-year interval.

| Rank | Offensive Skill | Offensive Linemen | Defensive |
| --- | --- | --- | --- |
| 1 | Snap % in the Slot | Snap % Playing Tackle | Age |
| 2 | Snaps Blocking | Projected Snaps | Total Points per Snap |
| 3 | Age | Games Missed Past Year | Projected ST Snaps |
| 4 | Total Points per Snap | Blown Block % | Snap % Playing Linebacker |
| 5 | Special Team Snaps | Total Points per Snap | Special Team Snaps |
| 6 | Snaps in Motion | BMI | Projected Snaps |
| 7 | Projected Snaps | Snaps on Pass Plays | Snap % Playing Run Defense |
| 8 | Routes Run | Age | Snap % Pass Rushing |
| 9 | Games Missed Past 3 Years | Games Missed Past 3 Years | BMI |
| 10 | Snaps Player was Hit | Snap % Playing Guard | Tackles Made |

We can see that projected snaps and age were among the most significant for each position group. To dive deeper into why, we ran a comparative analysis between players in the top half of predicted games missed and players in the bottom half.

Comparison of Key Features for Skill Position Players, by Predicted Games Missed

The table below shows an example from one of our datasets: skill players over a two-year span.

| Feature | High Prediction | Low Prediction |
| --- | --- | --- |
| Predicted Games Missed, Two Years | 6.1 | 4.4 |
| Snaps Played Past Season | 835 | 554 |
| Projected Snaps Next Two Years | 1,503 | 1,045 |
| Age | 26.6 | 27.4 |

We can observe that, on average, players who are predicted to miss more games played more snaps in the previous season, are projected to play more snaps over the next two years, and are about a year younger than players who are predicted to miss fewer games.
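A comparison like this one can be sketched by splitting players at the median prediction and averaging each feature within the two halves. The rows below are hypothetical and exist only to make the example runnable; the field names and the `compare_halves` helper are likewise illustrative rather than taken from our pipeline.

```python
from statistics import mean, median

# Hypothetical rows, not real player data.
players = [
    {"predicted_missed": 6.5, "snaps_last_season": 900, "age": 25.0},
    {"predicted_missed": 5.8, "snaps_last_season": 800, "age": 27.0},
    {"predicted_missed": 4.6, "snaps_last_season": 600, "age": 28.0},
    {"predicted_missed": 4.1, "snaps_last_season": 500, "age": 27.0},
]


def compare_halves(rows, key="predicted_missed"):
    """Return {feature: (top-half mean, bottom-half mean)} split at the median of `key`."""
    cutoff = median(row[key] for row in rows)
    high = [row for row in rows if row[key] > cutoff]
    low = [row for row in rows if row[key] <= cutoff]
    return {
        feature: (mean(r[feature] for r in high), mean(r[feature] for r in low))
        for feature in rows[0]
    }


for feature, (high_avg, low_avg) in compare_halves(players).items():
    print(f"{feature}: high={high_avg}, low={low_avg}")
```

In this toy data the high-prediction half averages 850 snaps against 550 for the low-prediction half, and is younger on average, which is the same direction as the pattern in the table above.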

This pattern was more or less present across all our datasets. At first, this may seem counterintuitive. One could think that older players are more injury-prone, or that players are projected to play less because they got hurt. However, the data tells a different story: players are predicted to miss more games when they are also expected to play more games. Playing football is itself a hazard.

One place where this trend reverses is at the very end of a career. Among the 1,700 players for whom we made predictions for 2025 and beyond, about two dozen were predicted to miss more games in a two-year period than in a three-year one, a finding that was quite confounding at first. However, when we compared these players to the rest of the population, an even larger discrepancy in age emerged: they were three years older on average than their counterparts. The model was picking up signals that a player was close to retirement and inferring reduced playing time over a three-year span. For these cases, we adjusted the number of predicted games missed in three years to be equal to the two-year number.
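The adjustment described above amounts to never letting the three-year number fall below the two-year number. A minimal sketch, with a hypothetical function name and made-up prediction values:

```python
def adjust_three_year(two_year: float, three_year: float) -> float:
    """Raise the three-year prediction to the two-year value when it falls below it."""
    return max(two_year, three_year)


# Hypothetical veteran: the raw three-year prediction dips below the two-year one.
print(adjust_three_year(two_year=6.0, three_year=5.2))  # 6.0
# Typical case: the three-year prediction is already larger, so it is unchanged.
print(adjust_three_year(two_year=5.0, three_year=7.5))  # 7.5
```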

We observed more trends when taking a look at who our model predicted to miss the most games. 

Top 10 Players Predicted to Miss the Most Games Over a Three-Year Span

| Player | Position | One Year | Two Years | Three Years |
| --- | --- | --- | --- | --- |
| Kamari Lassiter, HOU | CB | 4.0 | 8.7 | 14.6 |
| Rashawn Slater, LAC | OT | 3.5 | 9.1 | 13.7 |
| Paulson Adebo, NYG | CB | 4.4 | 8.1 | 13.6 |
| Derek Stingley Jr., HOU | CB | 3.4 | 8.3 | 13.3 |
| Dalton Kincaid, BUF | TE | 3.3 | 9.5 | 12.9 |
| Kyle Pitts, ATL | TE | 2.3 | 7.0 | 12.9 |
| Luke Goedeke, TB | OT | 3.7 | 9.1 | 12.8 |
| Pat Freiermuth, PIT | TE | 2.8 | 8.5 | 12.8 |
| Alaric Jackson, LAR | OT | 3.0 | 7.8 | 12.6 |
| Jake Ferguson, DAL | TE | 3.2 | 8.1 | 12.6 |

First, every player entered the league in 2021 or later, in line with our model's propensity to predict that younger players will miss more games than older players (in large part because younger players are on the field more).

Second, all players had sustained a multi-week injury at some point during the last three years. Indeed, though not as prevalent as the other key features, missed games in past years were greater on average for players in the top half of predictions than for those in the bottom half. 

Lastly, within the three position groups, our model predicted tight ends, offensive tackles, and cornerbacks to be more prone to injury, and accordingly, everyone on this list plays one of those positions.

Conclusion

While attempting something as inherently difficult as predicting the number of games a player will miss due to injury in a given time frame may seem like juice that is not worth the squeeze, our study shows there is value in going through the exercise and extracting inferences from the data. Chief among them is that the more a player is on the field, the more he invites the risk of getting injured.

This is a simple premise, but one that is overlooked more often than it should be, and it underscores just how much we should appreciate players who exhibit extraordinary durability.