In 2017 the BBC began reporting ‘Expected Goals’ for matches featured on its Match Of The Day program, and Sky introduced the statistic on its Monday Night Football show.
Expected Goals (or xG) began life as a hockey statistic; it summarizes the value of the shooting opportunities enjoyed by a team or player over some defined period, in units of goals. The term ‘expected’ comes from probability theory; the expected value of a random variable is its long-run average value over many repetitions of the event it quantifies. In the case of football or hockey, the expected value of a shooting opportunity is the probability of scoring from it. An xG of g means that, given the same number and quality of shooting opportunities, the average player would score g goals on average.
Although xG is well understood within the analytic community, outside that community it seems to baffle some people and enrage others; among the anti-analysts it has become a symbol of everything that is wrong with the analytics movement: geeky, opaque and irrelevant. ESPN pundit Craig Burley famously went into meltdown when co-presenter Gabriele Marcotti mentioned xG in his analysis of a Champions League match between Bayern Munich and Atlético Madrid. Mysteriously, I can’t seem to locate the video any more, but the exchange is described here. And Jeff Stelling, the Soccer Saturday presenter, called xG “absolute nonsense – the most useless stat in the history of football”.
I’m not sure what people find so upsetting about xG. Perhaps it’s the concept of decimal goals, perhaps it’s the concept of probabilistic expectation, perhaps it’s a distrust of anything slightly more abstract than a ball fizzing into the back of a net, or perhaps it’s simply fear of the new – not an unknown phenomenon in football.
Perhaps we need a better statistic for reporting the undoubtedly useful information which xG contains. In this post I want to describe a way of reporting potential match outcomes which is based on xG but is, I think, easier to understand. I call it the Most Probable Result (MPR), and it has a little brother called MPR2.
The Most Probable Result
To construct the MPR of a match, we first calculate the expected values of the individual shots in the normal way. For the purposes of this post I used a simple xG model based on location, assists, body part and type of play, but any xG model could be used. Instead of summing the expected values to give the total expected value (i.e. xG) for each team, we simulate a large number of matches to obtain a distribution of probable scores. As the expected value of a single shot is the average probability of scoring from it, we can use a random process to simulate match outcomes. For each shot we pick a uniformly distributed random number between 0 and 1; if this number is less than the expected value, we record a goal, otherwise we record a non-goal. A team’s score in one simulated match is the number of goals recorded across all its shots. If we simulate a match many times, we will end up with a distribution of scores for each team.
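The simulation described above can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function name `simulate_scores`, its parameters and the per-shot values are all made up for the example, and the shots are assumed to be independent of one another.

```python
import random
from collections import Counter


def simulate_scores(shot_xg, n_sims=1000, seed=None):
    """Estimate a team's score distribution from its per-shot xG values.

    shot_xg: list of scoring probabilities, one per shot (hypothetical input).
    Returns a dict mapping number of goals -> fraction of simulated matches.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        # One simulated match: a shot becomes a goal when the random draw
        # falls below that shot's expected value.
        goals = sum(1 for p in shot_xg if rng.random() < p)
        counts[goals] += 1
    return {g: c / n_sims for g, c in sorted(counts.items())}


# Made-up shot values for illustration only (they sum to an xG of 1.0).
shots = [0.05, 0.30, 0.08, 0.12, 0.45]
dist = simulate_scores(shots, n_sims=10000, seed=1)

# The team's most probable score is the mode of the simulated distribution.
most_probable = max(dist, key=dist.get)
print(dist)
print("Most probable score:", most_probable)
```

Fixing the seed makes a run repeatable; with only 1,000 simulations, two scores with similar probabilities can swap order from one run to the next.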
Let’s look at a concrete example. Figure 1 shows the results of simulating the scores in Manchester United’s game against Crystal Palace on 5th March 2018. United had 17 shots in this game, scoring from three of them, and their xG was 1.44. Palace had ten shots and an xG of 0.78 and scored twice.
Figure 1. Distribution of Scores for Crystal Palace v Manchester United
Both teams considerably outperformed their xG in this game, and Figure 1 shows that Manchester United’s most probable score was 1 goal, while Palace’s most probable score was no goals. The MPR for this match was a 1-0 win for Manchester United.
We can also find the second most probable result (MPR2) by computing the joint probability of each pair of scores, assuming that the two teams’ scores are independent, and taking the pair with the second-highest probability. This turns out to be a 1-1 draw. The MPR scores thus suggest that Manchester United deserved their win, but that Palace might have managed a draw.
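The ranking of results can be sketched in the same way. Again this is a hedged illustration rather than the code behind the figures in this post: `rank_results` is a hypothetical helper, and the two score distributions below are invented for the example, not the simulated values from Figure 1.

```python
from itertools import product


def rank_results(home_dist, away_dist):
    """Rank scorelines by joint probability, assuming independent team scores.

    Each argument maps a number of goals to its probability.
    Returns a list of ((home_goals, away_goals), probability), most probable first.
    """
    joint = {
        (h, a): ph * pa
        for (h, ph), (a, pa) in product(home_dist.items(), away_dist.items())
    }
    return sorted(joint.items(), key=lambda item: item[1], reverse=True)


# Invented score distributions for illustration only.
home = {0: 0.22, 1: 0.36, 2: 0.27, 3: 0.15}
away = {0: 0.45, 1: 0.38, 2: 0.17}

ranked = rank_results(home, away)
mpr, mpr2 = ranked[0][0], ranked[1][0]
print("MPR:", mpr, "MPR2:", mpr2)
```

Under the independence assumption, the top-ranked scoreline is simply each team’s most probable score paired together, which is why the MPR can be read straight off the two separate distributions.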
Table 1 shows the MPR and xG values for the 20 most recent Premier League matches this season. The MPRs are based on 1,000 simulations per match.
Table 1. Most Probable Results: Premier League, last 20 matches
Scrolling through Table 1 gives an idea of how MPR works.
For example, Chelsea’s 2-1 win over Crystal Palace on 10th March seems to understate their dominance in the match; the MPR was a 4-1 win, and the next most probable result was 3-1. On the other hand, Tottenham were rather flattered by their 4-1 away win at Bournemouth on 11th March. The MPR indicates that 2-1 was more reflective of the chances they created.
MPR, either alone or especially in combination with MPR2, seems to capture most of the useful information in xG. But I think it communicates the statistical expectation of match results more clearly, in a way that non-specialists can grasp. Everyone knows that some match outcomes are more probable than others before the match takes place. So it is a small step to imagine outcomes that could have happened but perhaps didn’t, either because some good chances weren’t taken, or because some slim chances were.
Football analysts often need to convey unfamiliar and complex concepts like xG to a non-specialist audience. If we cannot do so effectively, we cannot really complain if no-one takes any notice of what we say. I don’t really know if reporting MPR will help to demystify expected goals or endear it to the sceptics. Perhaps there are better ideas than MPR; perhaps it will please no-one and be disdained by analysts and non-analysts alike. But if Craig Burley and Jeff Stelling like it, my work here is done.