In a previous post I described a multi-stage model for expected goals (xG) which included post-shot information such as the ball trajectory and goalmouth location.  In this post I want to compare this model to the more traditional type of xG model which only includes pre-shot information.  Does the multi-stage model produce more accurate predictions than a pre-shot only model?

Predicting Goal Ratios

The methodology I use is borrowed from Sander Ijtsma’s neat study reported at 11tegen11, which I encourage you to read for full details. Briefly, SI used Goal Ratio as a measure of team performance:

SI compared the predictive performance of a ratio based on expected goals with - amongst others - metrics based on observed goals and observed shots. In each case, he computed his ratio metrics on a predictor sample of games early in the season, and predicted the goal ratios for the remaining games in the same season. He varied the size of the predictor sample, and found that over most of the range the expected goals ratio predicted future team performance better than either the observed goals ratio or the total shots ratio.
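The predictor-sample procedure can be sketched in a few lines of Python. This is a minimal illustration, not Sander's code: the function names, the per-game tuple layout, the made-up scores, and in particular the definition of the ratio as goals for divided by total goals are all assumptions made for the example.

```python
def goal_ratio(goals_for, goals_against):
    """Share of all goals in a team's games that the team scored.

    Assumed definition: goals_for / (goals_for + goals_against).
    """
    total = goals_for + goals_against
    if total == 0:
        return 0.5  # no goals either way: treat as an even split (assumption)
    return goals_for / total


def split_ratios(games, n_predictor):
    """Split one team's season into a predictor sample and the remainder.

    games: list of (goals_for, goals_against) tuples in match order.
    Returns (ratio over the first n_predictor games, ratio over the rest).
    """
    early, late = games[:n_predictor], games[n_predictor:]
    early_ratio = goal_ratio(sum(g[0] for g in early), sum(g[1] for g in early))
    late_ratio = goal_ratio(sum(g[0] for g in late), sum(g[1] for g in late))
    return early_ratio, late_ratio


# Hypothetical six-game season for one team (invented scores).
season = [(2, 1), (0, 0), (1, 3), (2, 0), (1, 1), (3, 2)]
predictor, target = split_ratios(season, n_predictor=3)
```

Passing expected goals instead of observed goals in the early games would give the xG-based version of the predictor, and varying `n_predictor` traces out the range of predictor sample sizes.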

I use the same kind of idea here. I had ten seasons' data to work with: the 2010 and 2011 seasons from the Italian Serie A, Spanish La Liga, and French Ligue 1, and the 2010 to 2013 seasons from the EPL. My post-shot model was a three-stage model which included ball trajectory information and the end goal-mouth coordinates as well as location and shot type (assisted, fast-break, free-kick etc.), while my pre-shot model used ball location and shot type only. Penalty goals and own goals were excluded from the analysis.

I used a train-test paradigm in which the xG models were developed on one season’s data in a competition, and the predictions were made on the succeeding season’s data.  So for example, I used the 2010 Serie A data to develop  xG models, and then computed the prediction metrics on the Serie A 2011 season.  In this way I predicted data for six seasons - three from the EPL and one from each of the other leagues  - each set of predictions being based on its own model training season.  The results shown here are the averages over the six seasons.
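As a quick check on the bookkeeping, the season pairing can be written out as below. The dictionary layout and the function name are hypothetical; only the leagues and seasons come from the description above.

```python
# Seasons available per competition, as listed in the post.
seasons = {
    "Serie A": [2010, 2011],
    "La Liga": [2010, 2011],
    "Ligue 1": [2010, 2011],
    "EPL": [2010, 2011, 2012, 2013],
}


def train_test_pairs(seasons_by_league):
    """Pair each season with the one that follows it in the same league."""
    pairs = []
    for league, years in seasons_by_league.items():
        ordered = sorted(years)
        # zip(ordered, ordered[1:]) yields (train, test) for adjacent seasons.
        for train, test in zip(ordered, ordered[1:]):
            pairs.append((league, train, test))
    return pairs


pairs = train_test_pairs(seasons)
for league, train, test in pairs:
    print(f"{league}: train on {train}, predict {test}")
```

This yields six train-test pairs, three from the EPL and one from each of the other leagues, matching the six predicted seasons.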

Assessing the Predictive Performance of the Models

I assessed the predictive performance of the models using the Mean Absolute Error (MAE). This is a “smaller is better” measure of fit calculated by averaging the absolute differences between observed and predicted values. The smaller the MAE, the more closely the predictions match the data. To account for decreases in the test sample size as we increase the predictor sample, all MAE numbers are based on a notional 38 games.
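For concreteness, here is a small sketch of the MAE calculation. The rescaling step is only one possible reading of "a notional 38 games" (pro-rating each total to a 38-game season before comparing); the helper names and the numbers are invented for illustration.

```python
def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def scale_to_season(total, games_played, season_length=38):
    """Pro-rate a total over games_played to a notional full season (assumption)."""
    return total * season_length / games_played


# Hypothetical example: three teams, a 10-game predictor sample, 28 games left.
predictor_totals = [14, 11, 13]   # metric totals in the first 10 games
observed_totals = [42, 28, 35]    # goals actually scored in the last 28 games

predicted = [scale_to_season(t, 10) for t in predictor_totals]
observed = [scale_to_season(t, 28) for t in observed_totals]

mae = mean_absolute_error(observed, predicted)
```

Putting both sides on the same 38-game footing keeps the MAE values comparable as the predictor sample grows and the test sample shrinks.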

I’ll get to Goal Ratios in a moment. First let’s look at Goals For. The chart below compares three models: one based on observed goals (goals scored in the predictor sample), one pre-shot xG model, and one post-shot xG model. The chart shows how the MAE changes as we increase the size of the predictor sample.

Predicting Goals For

The MAE values for observed goals are considerably larger than the MAEs for the expected goals models. This means that both xG metrics outperform observed goals as a predictor of goals scored; however, apart from the very beginning of the range, the post-shot metric generally outperforms the pre-shot metric. Adding post-shot information improves the predictions of Goals For. What about Goals Against?

Predicting Goals Against

For Goals Against, the situation is reversed; although the observed goals metric still performs relatively poorly, here the pre-shot metric performs better than the post-shot metric.

With these results in mind, we can now look at Goal Ratio. The results so far suggest that the best predictor of Goal Ratio would be a hybrid model, which uses a post-shot xG model to predict Goals For and a pre-shot xG model to predict Goals Against.   The next chart compares the MAE measures for Goal Ratio, and this time we include the hybrid xG model as well.
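The hybrid idea is simple to express in code. The sketch below assumes the Goal Ratio takes the form goals for divided by total goals, and the function name and input numbers are hypothetical.

```python
def hybrid_goal_ratio(post_shot_xg_for, pre_shot_xg_against):
    """Goal Ratio built from two different xG models.

    post_shot_xg_for:    expected goals scored, from the post-shot model
    pre_shot_xg_against: expected goals conceded, from the pre-shot model
    Assumed form: xG_for / (xG_for + xG_against).
    """
    total = post_shot_xg_for + pre_shot_xg_against
    if total == 0:
        return 0.5  # no expected goals either way: even split (assumption)
    return post_shot_xg_for / total


# Hypothetical team totals over a predictor sample.
ratio = hybrid_goal_ratio(post_shot_xg_for=15.2, pre_shot_xg_against=11.4)
```

The only change from a single-model ratio is which model feeds each term: the attacking term uses the model that predicted Goals For best, and the defensive term uses the model that predicted Goals Against best.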

Predicting Goal Ratios

All the xG models outperform Goal Ratio predictions based on observed goals, but the hybrid xG model performs best.

How do these results compare to Sander Ijtsma’s study? Instead of using the MAE to measure model fit, Sander used variance explained (R-squared). For comparison with his results, the next chart shows the R-squared values for predictions based on the three xG metrics and observed goals.
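As a reminder of what the statistic measures, the sketch below computes R-squared as the squared Pearson correlation between a predictor and the outcome. That this is the exact form used in either study is an assumption, and the function name and sample values are invented.

```python
def r_squared(predictor, outcome):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predictor)
    if n != len(outcome) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in outcome)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values: predictor-sample ratios vs. rest-of-season ratios.
early_ratios = [1, 2, 3, 4]
later_ratios = [1, 3, 2, 4]
explained = r_squared(early_ratios, later_ratios)
```

Unlike the MAE, this is a "larger is better" measure: it rewards predictors that rank and space teams correctly, even if the predicted values are offset from the observed ones.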

R-squared values for Prediction of Goal Ratios

The R-squared results agree well with Sander’s study. All three xG metrics explain more variance than observed goals at almost every point in the range.  However, the new finding is that a hybrid model does best of all.

A Prediction Asymmetry: Goals For and Goals Against

Including post-shot data in the expected goals model improves the prediction of Goals For.  This makes sense. Post-shot data tells us how well or poorly the ball tends to be struck, which is a characteristic of the strikers in the team.  Aggregating by the scoring team captures this extra information and increases prediction accuracy, as compared to a pre-shot only model.

But there is another interesting comparison we can make. The MAE chart below shows how the pre- and post-shot models fare in predicting Goals For and Goals Against. Here, I am not so much concerned with comparing the models to each other as with how each predicts the two types of goal outcome.

The red lines represent the post-shot model.  The lower MAE for Goals For (the solid red line) shows that the post-shot model is substantially better at predicting Goals For than Goals Against.  I am not completely sure why the Goals Against predictions are degraded, but I presume it is partly because post-shot information is not characteristic of the defending team, and the irrelevant information simply adds noise.

Next, looking at the blue lines (the pre-shot model) we see the exact opposite. The pre-shot model is noticeably better at predicting Goals Against than Goals For. This seems to be a real effect; at any rate Michael Caley reported the same thing, so it seems not to be a quirk of the particular data or the particular model I used. At the moment I don’t have a good explanation for this effect. But if we want to build better xG models, understanding the prediction asymmetry between Goals For and Against would seem a good place to start looking.