In my last post, I looked at passing performance and identified players with pass completion rates higher than expected, given the difficulty of the passes they attempted. This seemed to pass the eyeball test, throwing up players like Kroos, Ozil, Arjen Robben and Messi as top performers on this metric.
But closer inspection reveals a bit of a problem. Kroos, for example, who turns out to be the top-rated midfield passer, has a pass completion rate 3.9% higher than average. This equates to only 3 more completed passes per match. The numbers for the other top players are similar; it seems difficult to believe that a tiny handful of extra passes could have any noticeable effect on the outcome of a match.
So maybe the answer lies in the 80-90 percent or so of passes that all players complete anyway. Perhaps top players somehow make better passes. This brings up the question of how to value a pass, which is the initial topic of this post. Though, fair warning, I do meander and drift somewhat into more open waters.
The scheme I will use depends on the idea of a ‘value surface’; this associates each location on the pitch with some value for being in possession at that location. The reward of a pass is then the difference in values between the start and end of the pass. So how do we assign the values? For present purposes, I’m going to define the value at any location p as the probability that possession at p leads to a shot. (I could have used xG instead of shots, but let’s keep it simple.) We can now define the Pass Reward as the increase in probability of a shot before and after the pass.
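To make the definition concrete, here is a minimal sketch of a value surface and the Pass Reward built on it. The grid resolution, the pitch dimensions, the `(x, y, led_to_shot)` event layout and the function names are all assumptions for illustration; the surface in this post may well have been estimated differently (for example with smoothing).

```python
from collections import defaultdict

# Assumed pitch size in metres and an arbitrary grid resolution.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
N_X, N_Y = 12, 8


def cell_index(x, y):
    """Map a pitch location to the (column, row) of its grid cell."""
    ix = min(max(int(x / PITCH_LENGTH * N_X), 0), N_X - 1)
    iy = min(max(int(y / PITCH_WIDTH * N_Y), 0), N_Y - 1)
    return ix, iy


def value_surface(events):
    """Estimate P(shot | possession in cell) as a per-cell frequency.

    events: iterable of (x, y, led_to_shot) tuples (hypothetical layout).
    Cells with no observations are simply absent from the result.
    """
    touches = defaultdict(int)
    shots = defaultdict(int)
    for x, y, led_to_shot in events:
        cell = cell_index(x, y)
        touches[cell] += 1
        shots[cell] += int(bool(led_to_shot))
    return {cell: shots[cell] / n for cell, n in touches.items()}


def pass_reward(surface, start, end):
    """Value at the end of the pass minus value at the start.

    Returns None if either location falls in a cell with no data.
    """
    v_start = surface.get(cell_index(*start))
    v_end = surface.get(cell_index(*end))
    if v_start is None or v_end is None:
        return None
    return v_end - v_start


# Toy data: two possessions deep in our own half, four near the opposition goal.
toy_events = [
    (10, 34, 0), (10, 34, 0),
    (100, 34, 1), (100, 34, 1), (100, 34, 1), (100, 34, 0),
]
surface = value_surface(toy_events)
reward = pass_reward(surface, start=(10, 34), end=(100, 34))
print(reward)  # 0.75
```

A raw per-cell frequency like this would be noisy in sparsely visited cells, so a real surface would probably need either a coarse grid or some smoothing.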
At this point it is probably worth mentioning some other models that have been developed to value passes, or the passing contributions of individual players. They all look somewhat alike on casual inspection, but there are some key differences to note. (In the next paragraph I oversimplify, but mainly to highlight the real differences in approach.)
- Progressive Passing Value Added (Opta). This model uses a value surface, defined by distance from goal. The reward of a pass is the difference between the values of the start and end locations. Unsuccessful passes count against.
- xG_added. (Nils Mackay). This model also uses a value surface, with values determined by a transform of location. Specifically, the distance to goal and the angle subtended by the ball and the goal posts are transformed to an xG value by an xG model, and the values of locations are xGs. Advanced versions of the model include additional pass attributes, but it remains at heart a weighted location model.
- The xGChain (StatsBomb). This scheme is used to assess players rather than passes. In this model, the value of a pass depends on the outcome of the passing sequence it belongs to. If the sequence ends in a shot, each player who participated in the sequence is credited with the xG value of the shot.
- Goal Probability Added. (Sarah Rudd). In this model, the reward of a pass is the change in probability of scoring a goal before and after the pass. Despite using a different value metric (goals as opposed to shots), the underlying philosophy of this model is the same as mine, in that both determine the values of actions in terms of probabilities. The main difference is that SR’s model (to use Markov decision process language) uses discounted rewards (i.e. including the anticipated future rewards of an action), while my model uses only the immediate rewards.
This by no means exhausts the pass evaluation schemes I have seen. For example, in the IMPECT model, the value of a pass depends on the number of critical defenders it cuts out. The point I want to stress is that we have quite a number of schemes and, although each has its pros and cons, relatively little evidence of their respective merits.
With that out of the way, let’s look at the value surface for shots. The data for the surface pictured below comes from the three seasons 2015-2017 in the top five European leagues.
As might be expected, the value of possession increases as we get closer to the opposition goal. To get a feel for the sizes of pass rewards, the figures below show the rewards for four common types of pass.
The Madness of Metrification
Now let’s look at some of the attributes of pass rewards. The mean is 0.046, and the sd is 0.14. The correlation between pass reward and pass completion is -0.58. This makes sense: high reward passes are more difficult to complete. It seems we have a plausible measure of the importance of a pass.
At this stage of the proceedings, few analysts and bloggers seem able to resist the temptation to metrify – if that’s even a word – the measure they have just constructed, and use it as a yardstick to evaluate players. It goes something like this.
Step 1. For each player, compute a per-90 score for the measure. In the present case, we would tot up the pass rewards for each player, divide by his minutes played and multiply by 90.
Step 2. Construct a list of the 20 or so players who score highest on the per-90 measure.
Step 3. Scour the list for the presence of Messi. If found, declare the birth of a new metric.
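For what it's worth, Steps 1 and 2 amount to very little code. The sketch below assumes passes arrive as `(player, reward)` pairs and minutes as a dictionary; both layouts, the function name and the toy players are hypothetical.

```python
from collections import defaultdict


def rewards_per_90(pass_rewards, minutes_played, top_n=20):
    """Sum pass rewards per player, scale to 90 minutes, return the top_n.

    pass_rewards: iterable of (player, reward) pairs (assumed layout).
    minutes_played: dict mapping player -> total minutes (assumed layout).
    """
    totals = defaultdict(float)
    for player, reward in pass_rewards:
        totals[player] += reward

    # Step 1: total reward / minutes * 90, skipping players with no minutes.
    scores = {
        player: total / minutes_played[player] * 90
        for player, total in totals.items()
        if minutes_played.get(player, 0) > 0
    }
    # Step 2: keep the highest scorers.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy data with made-up players.
toy_passes = [
    ("Player A", 0.1), ("Player A", 0.2), ("Player A", 0.3),
    ("Player B", 0.05), ("Player B", 0.05),
]
toy_minutes = {"Player A": 180, "Player B": 90}
ranking = rewards_per_90(toy_passes, toy_minutes)
```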
Needless to say, this ignores a few problems, not the least of which is that Messi has appeared on every top 20 list constructed by anybody anywhere, but let’s put that aside for now, and boldly metrify pass reward. The table below shows the results.
| Position | Player | No. Passes | Minutes played | Pass Rewards/90 |
|---|---|---|---|---|
| Attacking Midfielders | Mesut Özil | 6078 | 8557 | 1.93 |
| | Ángel Di María | 4033 | 6527 | 1.68 |
Well, it doesn’t look too bad. It’s got Messi and Neymar on it, and a general selection of players who grace some of the best teams in Europe. But there are at least two other things we should do before announcing yet another football metric.
First, we should be clear to what extent we have constructed a state measure, i.e. something that is expected to fluctuate across time or context, or a trait measure, i.e. something with a measure of stability. Of course the distinction is not always clear-cut, but it is important to know where our proposed new metric sits. In the present case, we have 704 players who have played at least 540 minutes in each of two teams. The correlation between Rewards/90 in the two teams is 0.74, which suggests the metric is quantifying a real player characteristic, and can be considered a trait measure.
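The stability check can be sketched as follows. I am assuming one row per player-team spell in the form `(player, team, minutes, reward_per_90)`, listed in chronological order, and that a player with more than two qualifying spells contributes only the first two; none of these details is spelled out above, and the names and numbers are invented.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_team_pairs(spells, min_minutes=540):
    """Pair up each player's score at two different teams.

    spells: iterable of (player, team, minutes, reward_per_90) tuples.
    Spells below min_minutes are dropped before pairing.
    """
    by_player = defaultdict(list)
    for player, _team, minutes, score in spells:
        if minutes >= min_minutes:
            by_player[player].append(score)
    # Keep players with at least two qualifying spells; use the first two.
    return [(s[0], s[1]) for s in by_player.values() if len(s) >= 2]


# Toy data: P4 is excluded because one spell is under 540 minutes.
toy_spells = [
    ("P1", "Team X", 1000, 1.0), ("P1", "Team Y", 900, 1.1),
    ("P2", "Team X", 800, 0.5), ("P2", "Team Y", 700, 0.7),
    ("P3", "Team X", 600, 1.5), ("P3", "Team Y", 650, 1.4),
    ("P4", "Team X", 2000, 0.9), ("P4", "Team Y", 300, 2.0),
]
pairs = two_team_pairs(toy_spells)
r = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
```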
Even more important, before burdening the world with yet another metric to confuse scouts/upset Craig Burley/make money, we should provide at least some shred of evidence that what we have to offer is useful. Useful can mean, for instance, illuminating a new aspect of the game, or predicting outcomes on the field better than existing metrics of the same type. How does Pass Reward (i.e. the average reward of a pass) compare in this respect? Actually not too well. It’s obviously not measuring anything that has not been measured before. And it doesn’t seem to be that brilliant at predicting outcomes on the field, at either the match level or the team*season level.
Take match level first. Pass Reward does add something, but it’s at the margins. Consider the regression
where i and j are the home and away teams, Goal Diff is home team goals minus away team goals, Passes_Completed is the number of home team successful passes minus the number of away team successful passes and Avg_Reward is the average reward per pass for the home team minus the average reward per pass for the away team. The difference in average Pass Rewards is about 10% as important as the difference in the number of passes.
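Using the definitions just given, and assuming a plain linear specification with an intercept and an error term (the functional form is my assumption here; only the variables come from the text), the match-level regression can be written as:

```latex
\mathrm{GoalDiff}_{ij} = \beta_0
  + \beta_1\,\mathrm{PassesCompleted}_{ij}
  + \beta_2\,\mathrm{AvgReward}_{ij}
  + \varepsilon_{ij}
```

Here each variable is the home-minus-away difference for the match between teams i and j, so a positive coefficient means that an advantage on that variable goes with a larger goal difference for the home team.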
If our outcome is a team’s average points per match over a season, and we estimate the linear regression equation below:
it turns out that Passes Completed explains about 53% of the variance, and Pass Reward a further 4%.
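The "a further 4%" reading corresponds to comparing the R² of two nested models: one with Passes Completed alone, and one with Pass Reward added. Below is a minimal pure-Python sketch of that comparison, assuming ordinary least squares with an intercept; the helper names and the toy team-season numbers are made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy team-season data (invented): passes completed per match,
# average pass reward, and points per match.
passes = [300, 350, 400, 450, 500, 550]
avg_reward = [0.05, 0.03, 0.06, 0.04, 0.05, 0.045]
points_per_match = [1.0, 1.1, 1.6, 1.5, 1.9, 2.0]

r2_base = r_squared([passes], points_per_match)
r2_full = r_squared([passes, avg_reward], points_per_match)
extra_variance_explained = r2_full - r2_base
```

Note that the increment depends on the order of entry: entering Pass Reward first and Passes Completed second would generally split the explained variance differently.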
Well, 4% is worth having when you’ve already explained 53%, but it isn’t earth-shattering. Finally, the chart below illustrates the relationships between Pass Completion rates, Pass Reward rates and team performance in terms of points per match. High performing teams are shown in lighter colours and we can see they cluster in a region of high completion rates and moderate rewards. The difference between high and low performing teams doesn’t seem to be a matter of producing more aggressive passes. Manchester City, in fact, stands out as having a low Pass Reward rate.
Now we could probably extend or adapt the pass reward measure in various ways to increase its utility somewhat, for example by using xG instead of shots, or accounting for unsuccessful passes, but I doubt whether it would ever become a major factor in explaining team performance. For these reasons, I would be cautious about metrifying it at this stage; it may or may not be better than other measures of pass importance.
But writing this post has made me realize that – in the public domain at least – we seem to be lacking information about the characteristics of the various metrics that have been developed to assess players, teams and performances. The analogy that springs to mind is psychology. A vast range of psychological scales has been developed to assess individual differences in the areas of ability, personality, values, preferences, temperament and so forth, and other sub-disciplines of psychology boast a similar forest of measures. When other scientists want to use a measure in their own research, they will look for minimum levels of reliability and validity, and some idea of the relationship between that measure and others (what Cronbach and Meehl called the nomological network). Something like this would be very useful in football. We should be clear what our metrics do and do not measure, and what their limitations are; we should not be in the business of assessing players on metrics whose properties we don’t understand.
Finally, to return to the question I began with, I don’t think this post solves the problem of why pass completion rates are so powerful as predictors of team success, and why high pass completers like Kroos are so prized. I thought that the missing dimension was pass quality, and that the key to success was making “better”, i.e. more aggressive, passes. However, it seems that this is unlikely. Perhaps the missing dimension in all this is the way teams string passes together. I plan to look at this next.