A few weeks ago Paul Riley (aka @footballfactman) put out a tweet which showed goalkeeper rankings in the EPL.

The xG model Paul used to determine the goalkeeper rankings was based on shot location and shot context (regular play, penalty, direct free kick, etc.). I contended that it should also have included post-shot data, but Paul doubted it would make much difference. I proposed to do an analysis to find out, and the answer is in.

This post briefly explains the methodology and the results.

Data and Modelling

I developed two models. The pre-shot model (Model 0) was meant to be comparable to Paul’s model and used location (shot x-y coordinates) and contextual features. The post-shot model (Model 1) used location, contextual and placement features (ball height and distance from goal centre).

The features in each model are shown below:

| Feature | Notes | Pre-shot model | Post-shot model |
|---|---|---|---|
| Big Chance | | Y | Y |
| Shot Angle | Angle subtended by ball and goal posts | Y | Y |
| Shot-x | x distance from shot location to goal line | Y | Y |
| Shot-y | Absolute y distance between shot location and pitch mid-line | Y | Y |
| Headed | | Y | Y |
| Assist | Intentional assist | Y | Y |
| Regular Play | | Y | Y |
| Direct Freekick | | Y | Y |
| From Corner | | Y | Y |
| Fast Break | | Y | Y |
| Set Piece | | Y | Y |
| Penalty | All penalties are Big Chances | Y | Y |
| Ball height | Ball height at goal mouth | N | Y |
| Ball height squared | | N | Y |
| Goalmouth-y | Absolute y distance between ball and goal centre | N | Y |
| Ball height*Goalmouth-y | Interaction term | N | Y |
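
For readers who want to reproduce the geometric and placement features, here is a minimal Python sketch. It assumes distances in metres and a standard 7.32 m goal, and the function names `shot_angle` and `post_shot_features` are hypothetical. It illustrates the definitions in the table and is not the exact code behind the models.

```python
import math

GOAL_WIDTH = 7.32  # metres (assumed standard goal width)


def shot_angle(shot_x, shot_y):
    """Angle in radians subtended at the ball by the two goal posts.

    shot_x: distance from the shot location to the goal line.
    shot_y: absolute distance from the shot location to the pitch mid-line.
    """
    half = GOAL_WIDTH / 2
    # Direction to each post, measured from the line perpendicular to the goal line
    angle_far_post = math.atan2(shot_y + half, shot_x)
    angle_near_post = math.atan2(shot_y - half, shot_x)
    return angle_far_post - angle_near_post


def post_shot_features(ball_height, goalmouth_y_signed):
    """Placement features for the post-shot model.

    goalmouth_y_signed: signed horizontal distance of the ball from the
    goal centre as it reaches the goal mouth (sign convention is assumed).
    """
    goalmouth_y = abs(goalmouth_y_signed)
    return {
        "ball_height": ball_height,
        "ball_height_sq": ball_height ** 2,
        "goalmouth_y": goalmouth_y,
        "height_x_goalmouth_y": ball_height * goalmouth_y,  # interaction term
    }


# A central shot from 11 m sees a wider angle than one from 11 m out and 15 m wide
central = shot_angle(11.0, 0.0)
wide = shot_angle(11.0, 15.0)
placement = post_shot_features(2.0, -3.0)
```

The squared height and the interaction term let a model treat high shots and shots into the corners differently from low, central ones, which is the kind of effect the post-shot features are meant to capture.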

I used nine seasons of EPL data (2010 to 2018) with a 70-30 train-test split to develop the pre-shot model. I used an XGBoost classifier, and optimised the hyperparameters using a grid search. The MSE for the pre-shot model in the test set was 0.16 and the AUC was 78%. For the post-shot model, the MSE was 0.13 and the AUC was 87%, a considerable improvement.
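
To make the two metrics concrete, here is a small stdlib-only Python sketch. It assumes the MSE is taken between the predicted goal probabilities and the 0/1 outcomes (often called the Brier score), and it computes the AUC as the share of goal/non-goal pairs in which the goal received the higher probability. The function names and the toy data are illustrative only.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auc(y_true, p_pred):
    """Probability that a randomly chosen goal is scored higher than a
    randomly chosen non-goal (ties count as half)."""
    goals = [p for y, p in zip(y_true, p_pred) if y == 1]
    saves = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum((g > s) + 0.5 * (g == s) for g in goals for s in saves)
    return wins / (len(goals) * len(saves))


# Toy example: five shots on target, two of which were goals
outcomes = [1, 0, 1, 0, 0]
predicted = [0.8, 0.3, 0.4, 0.5, 0.1]

mse = brier_score(outcomes, predicted)
area = auc(outcomes, predicted)
```

A lower MSE means the probabilities sit closer to what actually happened, while a higher AUC means the model is better at ordering shots from least to most dangerous.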

Results

Keeper performance was assessed on the 2017 and 2018 seasons. This differs slightly from Paul’s sample, which included data from the 2019 season-to-date; I did not have that data to hand, so I did not use it. The table below shows the results. Columns marked “PR” are Paul’s figures. The columns labelled “Ratings” are the ratios of expected goals conceded to actual goals conceded; a keeper with a ratio above one concedes fewer goals than predicted by the relevant xG model, and a keeper with a ratio below one concedes more.

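| Rank Preshot Model | Rank Postshot Model | Rank PR | Keeper | Shots OT Faced | Goals Conceded | xG Conceded Preshot Model | xG Conceded Postshot Model | Rating Preshot Model | Rating Postshot Model | Rating PR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | Alisson | 96 | 22 | 29 | 27 | 1.34 | 1.24 | 1.38 |
| 2 | 3 | 4 | David de Gea | 310 | 77 | 97 | 92 | 1.25 | 1.20 | 1.17 |
| 3 | 1 | 11 | Nick Pope | 145 | 34 | 42 | 43 | 1.25 | 1.25 | 1.05 |
| 4 | 6 | 7 | Bernd Leno | 144 | 41 | 49 | 44 | 1.19 | 1.07 | 1.11 |
| 5 | 4 | 2 | Lukasz Fabianski | 388 | 108 | 128 | 123 | 1.19 | 1.14 | 1.18 |
| 6 | 5 | 3 | Hugo Lloris | 247 | 65 | 75 | 72 | 1.16 | 1.10 | 1.17 |
| 7 | 8 | 5 | Martin Dubravka | 178 | 58 | 64 | 61 | 1.11 | 1.05 | 1.13 |
| 8 | 9 | 8 | Jack Butland | 201 | 59 | 63 | 61 | 1.07 | 1.03 | 1.06 |
| 9 | 11 | 13 | Tom Heaton | 98 | 28 | 29 | 29 | 1.05 | 1.02 | 1.01 |
| 10 | 14 | 9 | Ederson | 158 | 47 | 49 | 47 | 1.04 | 1.00 | 1.06 |
| 11 | 16 | 14 | Sergio Rico | 165 | 54 | 56 | 53 | 1.04 | 0.97 | 1.00 |
| 12 | 13 | 18 | Jordan Pickford | 314 | 102 | 105 | 103 | 1.03 | 1.01 | 0.99 |
| 13 | 7 | 10 | Mat Ryan | 324 | 104 | 107 | 110 | 1.03 | 1.06 | 1.05 |
| 14 | 12 | 17 | Wayne Hennessey | 194 | 63 | 64 | 64 | 1.02 | 1.02 | 0.99 |
| 15 | 22 | 12 | Petr Cech | 172 | 56 | 57 | 52 | 1.02 | 0.93 | 1.01 |
| 16 | 18 | 15 | Ben Foster | 331 | 109 | 109 | 105 | 1.00 | 0.97 | 1.00 |
| 17 | 24 | 16 | Rui Patricio | 136 | 42 | 42 | 39 | 1.00 | 0.92 | 1.00 |
| 18 | 10 | 23 | Kepa Arrizabalaga | 121 | 39 | 38 | 40 | 0.97 | 1.02 | 0.89 |
| 19 | 17 | 20 | Neil Etheridge | 207 | 69 | 66 | 67 | 0.96 | 0.97 | 0.93 |
| 20 | 23 | 19 | Adrian | 96 | 28 | 26 | 26 | 0.95 | 0.93 | 0.94 |
| 21 | 21 | 25 | Asmir Begovic | 275 | 105 | 99 | 99 | 0.94 | 0.94 | 0.88 |
| 22 | 15 | - | Thibaut Courtois | 106 | 32 | 30 | 32 | 0.94 | 1.00 | - |
| 23 | 19 | 22 | Alex McCarthy | 187 | 67 | 62 | 64 | 0.93 | 0.95 | 0.92 |
| 24 | 25 | 24 | Jonas Lossl | 297 | 111 | 101 | 98 | 0.91 | 0.88 | 0.88 |
| 25 | 26 | 21 | Kasper Schmeichel | 273 | 92 | 84 | 79 | 0.91 | 0.86 | 0.92 |
| 26 | 28 | - | Fraser Forster | 103 | 33 | 30 | 27 | 0.90 | 0.82 | - |
| 27 | 20 | 27 | Joe Hart | 207 | 79 | 69 | 75 | 0.87 | 0.95 | 0.84 |
| 28 | 27 | - | Heurelho Gomes | 101 | 42 | 36 | 36 | 0.86 | 0.85 | - |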
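
The rating itself is just expected goals conceded divided by actual goals conceded. The sketch below recomputes it for three keepers using the figures in the table and orders them under each model. The xG values in the table are rounded to whole goals, so recomputed ratios can differ from the published ones in the second decimal place; the helper names are hypothetical.

```python
# (goals conceded, xG conceded pre-shot, xG conceded post-shot), from the table
keepers = {
    "David de Gea": (77, 97, 92),
    "Kepa Arrizabalaga": (39, 38, 40),
    "Petr Cech": (56, 57, 52),
}


def rating(xg_conceded, goals_conceded):
    """Ratio above 1 means fewer goals conceded than the model expected."""
    return xg_conceded / goals_conceded


def order_by_rating(xg_index):
    """Keeper names sorted from best to worst rating.

    xg_index selects the model: 1 for pre-shot xG, 2 for post-shot xG.
    """
    return sorted(
        keepers,
        key=lambda name: rating(keepers[name][xg_index], keepers[name][0]),
        reverse=True,
    )


pre_order = order_by_rating(1)
post_order = order_by_rating(2)
```

Within this small subset, Cech sits ahead of Kepa under the pre-shot model, and the order flips once shot placement is included.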

The correlation between Paul’s ratings and my pre-shot ratings is 0.92, which is quite high. The differences seem to be largely due to differences in the data sample and to small sample sizes. For example, Nick Pope ranks high in my list and only middling in Paul’s. Pope didn’t play in the EPL in 2018 due to a serious injury, so my ranking is based only on the 2017 season, which isn’t really enough. In addition, since returning from injury, he hasn’t yet been able to replicate his 2017 performance.
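
For anyone who wants to check a correlation like this, here is a stdlib-only sketch of the Pearson coefficient applied to the two rating columns in the table, restricted to the 25 keepers who have a PR rating. It assumes a Pearson correlation is the measure in question. The table values are rounded and may not cover exactly the same keepers as the full comparison, so the result need not match the quoted figure to the last digit.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Pre-shot ratings and PR ratings, in table order, skipping the three
# keepers without a PR rating (Courtois, Forster, Gomes)
pre_shot = [1.34, 1.25, 1.25, 1.19, 1.19, 1.16, 1.11, 1.07, 1.05, 1.04,
            1.04, 1.03, 1.03, 1.02, 1.02, 1.00, 1.00, 0.97, 0.96, 0.95,
            0.94, 0.93, 0.91, 0.91, 0.87]
pr = [1.38, 1.17, 1.05, 1.11, 1.18, 1.17, 1.13, 1.06, 1.01, 1.06,
      1.00, 0.99, 1.05, 0.99, 1.01, 1.00, 1.00, 0.89, 0.93, 0.94,
      0.88, 0.92, 0.88, 0.92, 0.84]

r = pearson(pre_shot, pr)
print(f"Correlation between pre-shot and PR ratings: {r:.2f}")
```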

However, that is not really important for our present purposes. The crucial point is how much my pre-shot and post-shot rankings differ.  The comparison is illustrated in the chart below:

We can see that when post-shot information is included, some keepers rise quite a lot in the rankings and some fall. Kepa, for example, looks a much better shot-stopper when shot placement is taken into account (18th to 10th), while Petr Cech looks significantly worse (15th to 22nd).

In conclusion, when evaluating goalkeepers, I would always include post-shot information.