I have always been suspicious of plus-minus in football.
Plus-minus is a statistical method for evaluating individual players in team games. It works by measuring the performance of a team when different combinations of individuals are on the pitch, and teasing out from that the contributions of each player. Plus-minus is a ‘top-down’ estimation procedure, which encapsulates the performance of a player in a single number. It can be contrasted with the kind of ‘bottom-up’ procedure used to develop rating schemes like the Opta Index and the Castrol Index. Here a player’s performance is represented by a series of key performance indicators (KPIs), each representing a different skill, and an overall performance score is derived by calculating a weighted sum of the KPI scores.
Conceptually, plus-minus is a neat idea, but there are some practical problems in estimating the player contributions. Plus-minus works best when there is a lot of variance in the player lineups and a high frequency of scoring, so that the effect of a player being on or off the pitch can easily be seen. These conditions hold in a game like basketball, where there are frequent changes of players on court and a constant stream of scoring activity. However, football lineups are relatively static throughout the game and scoring is a rare event. Furthermore, the estimation of plus-minus scores has traditionally involved solving a regression problem with thousands of features, and deploying some statistical contortions to ensure the stability of the player coefficients and to maximise predictive power. So there are good reasons to question how well a plus-minus system can work in football.
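To make the estimation problem concrete, here is a minimal sketch of an 'adjusted' plus-minus regression on synthetic data. This is an illustrative toy, not the actual model discussed in this post: the lineup encoding, the per-segment target, and the ridge penalty are all standard choices in the plus-minus literature, but every number below is made up.

```python
import numpy as np

# Toy 'adjusted' plus-minus on synthetic data.
# Each row of X is a match segment with a fixed lineup; columns are players.
# An entry is +1 if the player was on the pitch for the home side, -1 for
# the away side, 0 if off the pitch. The target y is the segment goal
# difference (home minus away).
rng = np.random.default_rng(0)
n_segments, n_players = 200, 30
X = rng.choice([-1, 0, 1], size=(n_segments, n_players), p=[0.3, 0.4, 0.3])
true_skill = rng.normal(0, 0.1, n_players)
y = X @ true_skill + rng.normal(0, 0.5, n_segments)

# Ridge (L2-regularised) least squares is the usual trick for keeping the
# player coefficients stable: when lineups rarely change, the columns of X
# are highly collinear and ordinary least squares blows up.
lam = 10.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)

# beta[i] is player i's estimated contribution to the segment goal difference.
```

The ridge penalty `lam` is exactly the kind of "statistical contortion" mentioned above: it trades a little bias for a large reduction in the variance of the player coefficients.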
I was therefore intrigued to come across a series of videos by Lars Magnus Hvattum in which he describes the warts-and-all development and testing of a plus-minus system for rating football players. Lars and I decided to collaborate on an investigation into the properties of plus-minus ratings.
One facet of our work is to ask whether plus-minus ratings can be predicted from on-the-ball KPIs, such as goals/90, pass completion percentage, and so on.
The analysis was based on eight seasons of player event-level data from matches played in each of the Big 5 European domestic leagues (EPL, Bundesliga, La Liga, Ligue 1 and Serie A) between 2009 and 2016.
First, plus-minus ratings and performance indicators were calculated for 5,121 players who had played at least 540 minutes. The players were then split into six positional categories, Goalkeeper (N=395), Defender (N=1,859), Defensive Midfielder (N=290), Midfielder (N=1,110), Attacking Midfielder (N=565), and Forward (N=902) according to the positions they most frequently played in.
Separate regression equations were then estimated for each category. Because we had no prior reason to choose particular performance indicators, we started with an array of 23 candidate predictors, some of which were variants of each other (for example, Successful Aerial Duels/90 and Percentage of Successful Aerial Duels). For each category we conducted a feature selection procedure in which the candidate predictors were screened individually for a functional relationship with the plus-minus rating using a five-fold cross-validation scheme. Predictors that survived this filtering process were used in the category regression equations, and finally we calculated the relative importance of the indicators.
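The screening step can be sketched as follows. Again this is a simplified stand-in for the actual procedure: the data are synthetic, the one-feature linear fit and the 0.05 acceptance threshold are illustrative assumptions, and the real analysis screened for any functional relationship, not just a linear one.

```python
import numpy as np

# Univariate screening of candidate KPIs with 5-fold cross-validation.
# Synthetic data: 500 players, 23 candidate KPIs, a rating driven by
# only two of them (columns 0 and 3).
rng = np.random.default_rng(1)
n_players, n_kpis = 500, 23
X = rng.normal(size=(n_players, n_kpis))                        # candidate KPIs
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=n_players)  # ratings

def cv_r2(x, y, k=5):
    """Mean out-of-fold R^2 of a one-feature linear fit."""
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for f in folds:
        train = np.setdiff1d(np.arange(len(y)), f)
        slope, intercept = np.polyfit(x[train], y[train], 1)
        resid = y[f] - (slope * x[f] + intercept)
        scores.append(1 - resid.var() / y[f].var())
    return np.mean(scores)

# Keep only KPIs that predict the rating out of sample; the survivors go
# forward into the per-position regression.
keep = [j for j in range(n_kpis) if cv_r2(X[:, j], y) > 0.05]
```

Screening features one at a time keeps the final per-position regressions small, at the cost of possibly discarding KPIs that only matter in combination.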
The results are shown in Table 1 below. The cross-validation R² is the average variance explained across the five iterations of the cross-validation procedure. The entries in the table are the relative importances (Zuber & Strimmer CAR scores) of the KPIs in each regression equation, and the values in each column sum to 100%. So, for example, the value of 8.4% for Successful Aerial Duels/90 in the Defender column means that this KPI contributes 8.4% of the explained variance in plus-minus scores for Defenders, while the value of 1.8% for Assists/90 means that this KPI contributes a much smaller proportion, and is accordingly a less important predictor of plus-minus scores.
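For readers who want to see how the relative importances arise, here is a small numpy sketch of Zuber & Strimmer's CAR scores on synthetic data. The construction follows their definition (decorrelate the predictors with the inverse square root of their correlation matrix, then correlate with the response); the data and variable names are illustrative only.

```python
import numpy as np

# CAR scores: squared CAR scores decompose the model R^2, so normalising
# them gives each predictor's share of the explained variance -- the
# percentages reported in Table 1. Synthetic data throughout.
rng = np.random.default_rng(2)
n, p = 1000, 4
X = rng.normal(size=(n, p))
X[:, 1] += 0.5 * X[:, 0]                    # make two KPIs correlated
y = X[:, 0] + 0.3 * X[:, 2] + rng.normal(size=n)

Xs = (X - X.mean(0)) / X.std(0)             # standardise predictors
ys = (y - y.mean()) / y.std()

R = np.corrcoef(Xs, rowvar=False)           # predictor correlation matrix
c = Xs.T @ ys / n                           # marginal correlations with y

# Inverse matrix square root of R via its eigendecomposition.
w_eig, V = np.linalg.eigh(R)
R_inv_sqrt = V @ np.diag(w_eig ** -0.5) @ V.T

car = R_inv_sqrt @ c                        # CAR scores
importance = car**2 / np.sum(car**2)        # shares summing to 100%
```

Unlike raw marginal correlations, CAR scores do not double-count correlated predictors, which matters here because many on-the-ball KPIs overlap heavily.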
Table 1. Relative importances of plus-minus predictors for six positional categories of player.
| KPI | Defender | Def. Mid. | Midfielder | Att. Mid. | Forward | Goalkeeper |
|---|---|---|---|---|---|---|
| Successful Aerial Duels/90 | 8.4% | 0.0% | 0.0% | | | |
| Fouls committed/90 [neg] | 1.1% | 1.1% | 1.9% | | | |
| Saves to Shots Ratio | | | | | | 33.7% |
| Interceptions & Blocks/90 | 2.8% | 10.9% | 5.9% | 4.3% | | |
| Successful ground duels % | 5.2% | | | | | |
| Pass Completion Rate | 14.3% | 13.8% | 21.7% | 8.0% | 7.2% | 58.2% |
| Shots on Target/90 | | | | | 14.0% | |
The results are intriguing. First, the substantial R² values show a non-trivial relationship between plus-minus ratings and key performance indicators. But the relationship is by no means perfect; a considerable amount of variance in the plus-minus ratings remains unexplained by our event-level KPIs, especially for Goalkeepers.
Next, the pattern of relationships looks fairly plausible. For example, Goals, Assists and Key Passes are important predictors of plus-minus ratings for Attacking Midfielders and Forwards, but not for the other player categories; Pass Completion Rates are more important predictors of the plus-minus ratings of defensive/midfield players, and less important predictors for the ratings of attacking players.
Other results are provocative. Here we might include the high importance of Successful Passes per 90 for Defenders and Defensive Midfielders, which accounts for roughly 50% of the explained variance. (This KPI does not appear for Midfielders, but it may have been displaced from the regression by Touches per 90.)
Finally, there are some anomalous results. Key Passes/90 appears as a predictor for Defenders, which seems strange. The high importance of Pass Completion Rate for Goalkeepers, and the absence of an Aerial Duels predictor from their regression, also seem counter-intuitive.
What can we conclude from these results?
The key finding is that plus-minus scores contain information about positionally relevant KPIs. Despite the statistical difficulties in estimating plus-minus ratings, they do seem to capture something of what a player actually does on the field. Proponents of plus-minus systems can take some comfort from this.
However, the ratings cannot simply be explained by these KPIs: the bulk of the variance in plus-minus ratings remains unexplained, and more research is needed to discover what other factors might be influencing them.
Possible influencers could be individual performance attributes such as pace or endurance, or differences in off-the-ball actions like finding space, anticipation, and pressing; none of these are captured directly by our present set of regressors, although it could be argued that they are included indirectly. Other influencers might include interpersonal skills like leadership or temperamental attributes like calmness under pressure; we would expect players strong in these areas to contribute to overall team performance in ways above and beyond what is reflected in their individual performance statistics. They would be captured by global plus-minus measures, but not by event-level data.
So have I learned to love plus-minus? Let’s say we are dating, but I am still seeing other rating systems.