In 2017 the BBC began reporting ‘Expected Goals’ for matches featured on its Match Of The Day program, and Sky introduced the statistic on its Monday Night Football show.

Expected Goals (*xG*) began life as a hockey statistic; it summarizes the value of the shooting opportunities enjoyed by a team or player over some defined period, in units of goals. The term ‘expected’ comes from probability theory: the **expected value** of a random variable is the long-run average value over many repetitions of the event it quantifies. In the case of football or hockey, the expected value of a shooting opportunity is the probability of scoring from it. An *xG* of *g* means that, given the same number and quality of shooting opportunities, the average player would score *g* goals.
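As a minimal sketch of this definition (the four shot probabilities below are invented for illustration):

```python
# Hypothetical per-shot scoring probabilities for four shots
shot_probabilities = [0.08, 0.35, 0.02, 0.12]

# A team's xG is simply the sum of its shots' expected values
xG = sum(shot_probabilities)
print(round(xG, 2))  # 0.57
```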

Although *xG* is well understood within the analytics community, outside that community it seems to baffle some people and enrage others; among the anti-analysts it has become a symbol of everything that is wrong with the analytics movement – geeky, opaque and irrelevant. ESPN pundit Craig Burley famously went into meltdown when co-presenter Gabriele Marcotti mentioned *xG* in his analysis of a Champions League match between Bayern Munich and Atlético Madrid. Mysteriously, I can’t seem to locate the video any more, but the exchange is described here. And Jeff Stelling, the Soccer Saturday presenter, called *xG* “*absolute nonsense – the most useless stat in the history of football*”.

I’m not sure what people find so upsetting about *xG*. Perhaps it’s the concept of decimal goals, perhaps it’s the concept of probabilistic expectation, perhaps it’s a distrust of anything slightly more abstract than a ball fizzing into the back of a net, or perhaps it’s simply fear of the new – not an unknown phenomenon in football.

Perhaps we need a better statistic for reporting the undoubtedly useful information that *xG* contains. In this post I want to describe a way of reporting potential match outcomes which is based on *xG* but, I think, easier to understand. I call it the Most Probable Result (*MPR*), and it has a little brother called *MPR2*.

## The Most Probable Result

To construct the *MPR* of a match, we first calculate the expected values of the individual shots in the normal way. For the purposes of this post I used a simple *xG* model based on location, assists, body part and type of play, but any *xG* model could be used. Then, instead of summing the expected values to give each team’s total (i.e. its *xG*), we simulate a large number of matches to obtain a distribution of probable scores. Because the expected value of a single shot is the probability of scoring from it, we can use a random process to simulate match outcomes: for each shot we draw a random number between zero and one, and record a goal if that number is less than the shot’s expected value, or a non-goal otherwise. Simulating the match many times yields a distribution of scores for each team.
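The simulation step can be sketched as follows. The individual shot probabilities are invented for illustration; only their total (1.44) is taken from the Manchester United example discussed next.

```python
import random

def simulate_scores(shot_probs, n_sims=1000, seed=42):
    """Simulate one team's scores from per-shot scoring probabilities.

    Each simulated match treats every shot as an independent Bernoulli
    trial: a goal is recorded when a uniform draw on [0, 1) falls below
    the shot's expected value. Returns a dict mapping goals -> frequency.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        goals = sum(1 for p in shot_probs if rng.random() < p)
        counts[goals] = counts.get(goals, 0) + 1
    return counts

# Hypothetical probabilities for 17 shots, summing to an xG of 1.44
united_shots = [0.30, 0.15, 0.12, 0.10, 0.09, 0.08, 0.08, 0.07, 0.07,
                0.06, 0.06, 0.05, 0.05, 0.04, 0.04, 0.04, 0.04]
distribution = simulate_scores(united_shots)
```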

Let’s look at a concrete example. Figure 1 shows the results of simulating the scores in Manchester United’s game against Crystal Palace on 5th March 2018. United had 17 shots in this game, scoring from three of them, and their *xG* was 1.44. Palace had ten shots and an *xG* of 0.78 and scored twice.

**Figure 1. Distribution of Scores for Crystal Palace v Manchester United**

Both teams considerably outperformed their *xG* in this game, and Figure 1 shows that Manchester United’s most probable score was 1 goal, while Palace’s most probable score was no goals. The *MPR* for this match was a 1-0 win for Manchester United.

We can also find the second most probable result (*MPR2*) by computing the joint probabilities of the scorelines, assuming the two teams’ scores are independent. This turns out to be a 1-1 draw. The *MPR* scores thus suggest that Manchester United deserved their win, but that Palace might have managed a draw.
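Under the independence assumption, finding the *MPR* and *MPR2* amounts to ranking scorelines by joint probability. A sketch, using invented score distributions chosen to mirror this example (in practice the distributions come from the simulation):

```python
from itertools import product

def rank_results(home_dist, away_dist):
    """Rank scorelines by joint probability, assuming the two teams'
    score distributions are independent."""
    joint = {
        (h, a): ph * pa
        for (h, ph), (a, pa) in product(home_dist.items(), away_dist.items())
    }
    return sorted(joint.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical score distributions (each sums to 1)
united = {0: 0.25, 1: 0.35, 2: 0.25, 3: 0.15}
palace = {0: 0.45, 1: 0.35, 2: 0.20}

ranked = rank_results(united, palace)
mpr, mpr2 = ranked[0][0], ranked[1][0]
print(mpr, mpr2)  # (1, 0) (1, 1): a 1-0 United win, then a 1-1 draw
```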

Table 1 shows the *MPR* and *xG* values for the 20 most recent Premier League matches of this season. The *MPR*s are based on 1,000 simulations per match.

**Table 1. Most Probable Results: Premier League, Last 20 Matches**

Scrolling through Table 1 gives an idea of how *MPR* works.

For example, the scoreline of Chelsea’s 2-1 win over Crystal Palace on 10th March seems to understate their dominance in the match; the *MPR* was a 4-1 win, and the next most probable result was 3-1. On the other hand, Tottenham were rather flattered by their 4-1 away win at Bournemouth on 11th March. The *MPR* indicates that 2-1 was more reflective of the chances they created.

## Final Thoughts

*MPR*, either alone or especially in combination with *MPR2*, seems to capture most of the useful information in *xG*. But I think it communicates the statistical expectation of match results more clearly, in a way that non-specialists can grasp. Everyone knows that some match outcomes are more probable than others before the match takes place. So it is a small step to imagine outcomes that could have happened but perhaps didn’t, either because some good chances weren’t taken, or because some slim chances were.

Football analysts often need to convey unfamiliar and complex concepts like *xG* to a non-specialist audience. If we cannot do so effectively, we cannot really complain if no-one takes any notice of what we say. I don’t really know if reporting *MPR* will help to demystify expected goals or endear it to the sceptics. Perhaps there are better ideas than *MPR*; perhaps it will please no-one and be disdained by analysts and non-analysts alike. But if Craig Burley and Jeff Stelling like it, my work here is done.

**Nikola** (March 13, 2018 at 2:38 pm): So is this just a Poisson model?

**admin** (March 27, 2018 at 2:54 pm): Hi Nikola. No, the underlying model is a logistic model for individual chances. Simulation is then used to get a distribution of match scores, and the MPR is derived from that distribution.