In this post I want to look at shooting decisions from an expected goals perspective.
The concept of expected goals (xG) was first proposed by Alan Ryder in 2004, in a paper about ice-hockey entitled Shot Quality. Since its adoption by the football analytics community, xG has become a widely applied metric, regularly presented and discussed by bloggers like DeepxG, Ted Knutson and Michael Caley.
xG is calculated from the shots actually attempted. Given an array of shot descriptors such as the pitch location, whether the strike was kicked or headed, assisted or solo, the game situation preceding the shot and so on, we can estimate the probability of each shot becoming a goal. xG is obtained by summing these probabilities over an appropriate universe of shots, such as a match. It measures the expected (average) number of goals from those shots rather than the single most likely tally. All the expected goals models I know of are based on this principle, and differ only in the shot parameters used to compute the probabilities.
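Here is a minimal sketch of the summation described above. The Python below takes a list of per-shot probabilities and returns the total xG. It also computes the full distribution of the goal count. That step assumes the shots are independent, which is an illustrative simplification and not a property of any particular xG model. The distribution shows why xG is best read as an average rather than as the most likely score. The function names and the shot values are hypothetical.

```python
from typing import List, Sequence


def expected_goals(xg: Sequence[float]) -> float:
    """Total xG: the sum of the per-shot scoring probabilities."""
    return sum(xg)


def goal_count_distribution(xg: Sequence[float]) -> List[float]:
    """Probability of exactly k goals, for k = 0..len(xg).

    Assumes shots are independent (a Poisson-binomial distribution).
    """
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in xg:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot misses
            new[k + 1] += prob * p    # this shot scores
        dist = new
    return dist


# Hypothetical per-shot xg values for one match (illustrative only)
match_xg = [0.03, 0.05, 0.08, 0.12, 0.35, 0.76]
dist = goal_count_distribution(match_xg)

print(round(expected_goals(match_xg), 2))             # 1.39 (the mean)
print(max(range(len(dist)), key=dist.__getitem__))    # 1 (the most likely tally)
```

The mean of `dist` equals the total xG, but the single most likely outcome here is one goal, not 1.39.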
However, relatively little attention has been paid to the individual shot probabilities themselves, and how they are distributed. I denote these individual probabilities xg (lower case g to distinguish it from the overall xG statistic). So in this post I thought it might be useful to have a look under the hood, see what goes into making up the total xG, and think about the implications.
My data sample for this post was 29,800 Premier League shots (excluding direct free kicks and penalty situations), which returned 2,732 goals. The corresponding conversion ratio is 9.2%.
We might conclude from this that the probability of scoring from a typical shot is about 1 in 11. But we would be wrong, because the distribution of xg turns out to be highly skewed, with a long tail towards high-probability chances.
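The headline figures are easy to reproduce. In this small sketch the helper names are my own; they are not from any library.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Share of shots that were scored."""
    return goals / shots


def one_in(p: float) -> int:
    """Express a probability as 'about 1 in N', rounding N to the nearest whole number."""
    return round(1 / p)


rate = conversion_rate(2_732, 29_800)
print(f"{rate:.1%}")  # 9.2%
print(one_in(rate))   # 11
```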
The probability density function for xg shows that most shots have a very low probability of scoring; the density function peaks at 2.7% indicating that the most frequent type of shot has only a 1 in 37 chance of scoring.
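The post does not say how its density curves were smoothed, so what follows is only one plausible approach. It uses a Gaussian kernel density estimate with the common 1.06·σ·n^(-1/5) rule-of-thumb bandwidth, evaluated on a grid to locate the peak. xg is bounded below by 0, so a plain Gaussian kernel leaks some probability mass below zero. A bounded or log-scale kernel would be a reasonable refinement. The function names and the sample are hypothetical.

```python
import math
from statistics import stdev
from typing import Callable, Optional, Sequence


def gaussian_kde(sample: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a function that estimates the probability density of `sample`."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb; xg is far from bell-shaped,
        # so treat this bandwidth as a first approximation only.
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Sum of Gaussian bumps, one centred on each observed xg value
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def density_peak(sample: Sequence[float], grid_step: float = 0.001) -> float:
    """Locate the mode of the estimated density on a grid over [0, 1]."""
    density = gaussian_kde(sample)
    grid = [i * grid_step for i in range(int(round(1 / grid_step)) + 1)]
    return max(grid, key=density)


# Hypothetical xg values for a handful of shots
sample = [0.01, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.06, 0.12, 0.35]
peak = density_peak(sample)
print(peak, one_in := round(1 / peak))
```

With a real shot log, the same `density_peak` call would give the location of the curve's maximum.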
Interestingly, the shape of the xg density function is quite consistent across the top tier of European football. The chart below shows the density functions for a single season in each of 5 European leagues, and we can see they are almost identical.
xg distributions can also be plotted for individual players, as illustrated below for Charlie Adam and Luis Suarez. To keep things honest, I’ve plotted the individual curves against the overall distribution.
We can immediately see that Charlie Adam is a highly speculative shooter – even for a midfielder. His xg density function is even more peaked than the average, showing that he takes a greater proportion of low-scoring chances than the average player. Suarez’s function, on the other hand, is much flatter, showing that he takes far fewer speculative shots and more high-quality ones than average.
What xg and its associated density curve give us, that xG does not, is some insight into shot selection. The same value of xG can result from either a small number of high xg shots, or a large number of low xg shots, but the density function makes the pattern of shot selection quite clear.
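To make the point concrete, here is a toy comparison of two hypothetical shot profiles with the same total xG of about 1.0. One profile is two big chances and the other is 25 long-range efforts. The bin edges are arbitrary choices made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bins: [0, 0.05), [0.05, 0.10), [0.10, 0.20), [0.20, 0.50), [0.50, 1]
EDGES = (0.05, 0.10, 0.20, 0.50)


def xg_histogram(xg: Sequence[float], edges: Sequence[float] = EDGES) -> List[int]:
    """Count the shots falling into each xg bin."""
    counts = [0] * (len(edges) + 1)
    for p in xg:
        counts[bisect_right(edges, p)] += 1
    return counts


clinical = [0.5, 0.5]       # two big chances
speculative = [0.04] * 25   # many long-range efforts

print(round(sum(clinical), 2), xg_histogram(clinical))        # 1.0 [0, 0, 0, 0, 2]
print(round(sum(speculative), 2), xg_histogram(speculative))  # 1.0 [25, 0, 0, 0, 0]
```

The totals are identical and only the shape of the distribution tells the two profiles apart.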
We can also derive xg density curves for teams, and these show differences too. To illustrate this, here are Arsenal and Newcastle; we can see that Arsenal attempt fewer speculative chances.
High and Low Return Shots
Of course we don’t want to be plotting graphs all the time – pretty as they are – so it would be nice to have a number to summarize the different xg distributions. One choice would be the mean. The mean xg for Charlie Adam is 0.056, and the mean xg for Suarez is 0.110. So clearly the mean conveys some information to distinguish the two functions. But because of the skewness of the xg distribution, it does not necessarily convey an intuitive sense of the difference between the two players.
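A toy example shows how a couple of big chances drag the mean of a skewed xg sample well above a typical shot, while the median stays low. The numbers are hypothetical and are not Adam's or Suarez's actual shots.

```python
from statistics import mean, median

# Hypothetical shot log: mostly speculative efforts plus two big chances
player_xg = [0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.08, 0.15, 0.40, 0.70]

print(round(mean(player_xg), 3))    # 0.152
print(round(median(player_xg), 3))  # 0.045
```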
If forced to choose a single number to define an xg distribution, my preference would be for the percentage of high-return shots. I define high-return shots as shots with more than a 1 in 20 chance of scoring. Conversely, low-return shots are defined as having less than a 1 in 20 chance of scoring. This split is quite convenient because on average the share of high- and low-return shots turns out to be about 50-50. For the European data shown above:
| Competition | Percentage of high-return shots |
|---|---|
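The high-return percentage is straightforward to compute from a list of per-shot xg values. The definition above covers only "more than" and "less than" 1 in 20. In this sketch a shot at exactly 0.05 counts as low-return, which is my assumption. The function name and the sample are hypothetical.

```python
from typing import Sequence

HIGH_RETURN_THRESHOLD = 1 / 20  # more than a 1 in 20 chance of scoring


def high_return_share(xg: Sequence[float],
                      threshold: float = HIGH_RETURN_THRESHOLD) -> float:
    """Fraction of shots whose xg exceeds the threshold.

    Shots exactly at the threshold are counted as low-return (an assumption).
    """
    if not xg:
        raise ValueError("need at least one shot")
    high = sum(1 for p in xg if p > threshold)
    return high / len(xg)


# Hypothetical shot log
shots = [0.02, 0.03, 0.04, 0.07, 0.15, 0.35]
print(f"{high_return_share(shots):.0%}")  # 50%
```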
In my Premier League dataset there were 14,805 low-return shots and 14,995 high-return shots, producing 419 and 2,313 goals respectively. This means that high-return shots (average probability of scoring 0.154) are more than 5 times as productive as low-return shots (average probability of scoring 0.028).
The remarkable corollary is that high-return shots completely swamp low-return shots as a source of goals. Fully 85% of goals come from high-return shots and only 15% from low-return shots.
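The figures in the last two paragraphs follow directly from the four counts. Here is a quick check in Python:

```python
low_shots, high_shots = 14_805, 14_995
low_goals, high_goals = 419, 2_313
total_goals = low_goals + high_goals  # 2,732

low_rate = low_goals / low_shots     # average xg of a low-return shot
high_rate = high_goals / high_shots  # average xg of a high-return shot

print(round(low_rate, 3), round(high_rate, 3))  # 0.028 0.154
print(round(high_rate / low_rate, 1))           # 5.5
print(f"{high_goals / total_goals:.0%}")        # 85%
print(f"{low_goals / total_goals:.0%}")         # 15%
```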
For the two players we saw above, Suarez’s percentage of high-return shots is 67% while Adam’s is only 29%; this metric gives a good indication of the shooting strategies of these two players.
To Shoot or Not to Shoot
What can the return classification tell us about shooting decisions?
At the team level, Arsenal and Manchester United attempted 58% high-return shots, Chelsea took 50% and Newcastle 43%. Presumably, Newcastle took such a low percentage of high-return shots because they were limited in ability, and unable to create a greater proportion of quality chances. But if Arsenal took 58% high-return shots they could also have chosen the easier mix of 43% high-return shots. That they did not do so suggests they were refusing to take low-return shots that they could have taken, in favour of high-return ones.
And now for some fag-packet calculations. On average a low-return shot is taken 19.6 m from the opponent’s goal (i.e. generally outside the penalty area), and a high-return shot from 10.3 m (i.e. generally inside the area). Closing that gap by moving the ball 9.3 m closer to the goal increases the chance of scoring 5-fold. So would it be better to attempt a low-return shot or to attempt a pass? It depends on the probability of completing the pass.
A quick and dirty model suggests that the probability of completing a 5-10 m pass, from about 20 m out and within 30 m either side of the pitch mid-line, varies between about 62% and 75%. If we take the lower figure to be on the safe side, the probabilities work out as follows:
| Action sequence | Probability of success | Probability of scoring |
|---|---|---|
| Pass + high-return shot | 0.62 × 0.154 | 0.095 |
| Pass + pass + high-return shot | 0.62 × 0.62 × 0.154 | 0.059 |
A team could theoretically attempt two passes when 20 m from goal and still roughly double the probability of scoring from the possession, compared with taking the low-return shot (0.059 versus 0.028). Of course, this is only possible if a team-mate is in the right position to receive the ball. But if and when the choice is available, the pass is a considerably better bet than the low-return shot. That’s why it makes sense to get players inside the box as much as possible when attacking.
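The arithmetic behind the pass-or-shoot table can be wrapped in two small functions. This sketch makes the same simplifying assumptions as the calculation above:

- each pass succeeds independently with the same probability;
- a failed pass ends the attack without a shot;
- the move finishes with an average high-return shot.

The second function is a hypothetical addition for illustration. It inverts the comparison to find the per-pass completion rate at which passing merely breaks even with shooting immediately.

```python
LOW_RETURN_XG = 0.028   # average low-return shot, from the counts above
HIGH_RETURN_XG = 0.154  # average high-return shot
PASS_COMPLETION = 0.62  # lower end of the 62-75% estimate


def chain_scoring_probability(n_passes: int,
                              p_pass: float = PASS_COMPLETION,
                              shot_xg: float = HIGH_RETURN_XG) -> float:
    """P(score) for n completed passes followed by a high-return shot.

    Assumes independent passes and that a failed pass yields no shot.
    """
    return p_pass ** n_passes * shot_xg


def break_even_completion(n_passes: int,
                          shot_now: float = LOW_RETURN_XG,
                          shot_later: float = HIGH_RETURN_XG) -> float:
    """Per-pass completion rate at which n passes equal shooting now."""
    return (shot_now / shot_later) ** (1 / n_passes)


for n in (1, 2):
    print(f"{n} pass(es): {chain_scoring_probability(n):.3f}")  # 0.095, then 0.059
print(f"{break_even_completion(1):.2f}")  # 0.18
print(f"{break_even_completion(2):.2f}")  # 0.43
```

On these assumptions, a single pass beats the low-return shot whenever it is completed more than about 18% of the time. Two passes beat it at completion rates above about 43% each.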