A few weeks ago Paul Riley (aka @footballfactman) put out a tweet which showed goalkeeper rankings in the EPL.
> Shall we have one using an half-decent sample size? pic.twitter.com/V1JsnpEawX
>
> — Paul Riley (@footballfactman) January 7, 2020
The xG model Paul used to determine the goalkeeper rankings was based on shot location and shot context (regular play, penalty, direct free kick, etc.). I contended that it should also have included post-shot data, but Paul doubted it would make much difference. I proposed to do an analysis to find out, and the answer is in.
This post briefly explains the methodology and the results.
Data and Modelling
I developed two models. The pre-shot model (Model 0) was intended to be comparable to Paul's model and used location (shot x-y co-ordinates) and contextual features. The post-shot model (Model 1) used location and contextual features plus placement features (ball height and distance from the goal centre).
The features in each model are shown below:
| Feature | Notes | Pre-shot model | Post-shot model |
|---|---|---|---|
| Shot angle | Angle subtended by ball and goal posts | Y | Y |
| Shot-x | x distance from shot location to goal line | Y | Y |
| Shot-y | Absolute y distance between shot location and pitch mid-line | Y | Y |
| Penalty | All penalties are Big Chances | Y | Y |
| Ball height | Ball height at goal mouth | N | Y |
| Ball height squared | | N | Y |
| Goalmouth-y | Absolute y distance between ball and goal centre | N | Y |
| Ball height \* Goalmouth-y | Interaction term | N | Y |
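The shot angle feature in the table is the angle subtended at the ball by the two goal posts. As a rough sketch (assuming metric co-ordinates with shot-x measured from the goal line and shot-y from the pitch mid-line, which matches the table but is my assumption about the co-ordinate system):

```python
from math import atan2, degrees

GOAL_WIDTH = 7.32  # metres between the posts, per the laws of the game

def shot_angle(shot_x: float, shot_y: float) -> float:
    """Angle (radians) subtended at the ball by the two goal posts.

    shot_x: distance from the goal line; shot_y: absolute distance from
    the pitch mid-line. Both in metres (assumed co-ordinate system).
    """
    half = GOAL_WIDTH / 2
    # atan2 keeps the result correct even when the shot is level with,
    # or behind, the line of the posts (denominator <= 0).
    return atan2(GOAL_WIDTH * shot_x, shot_x**2 + shot_y**2 - half**2)

# From the penalty spot (11 m out, central) the posts subtend ~36.8 degrees.
print(round(degrees(shot_angle(11.0, 0.0)), 1))
```

The angle shrinks quickly as shots move wide of the goal, which is why it tends to carry more signal than the raw x-y co-ordinates alone.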
I used nine seasons of EPL data (2010 to 2018) with a 70-30 train-test split to develop the models. I used an XGBoost classifier, and optimised the hyperparameters using a grid search. On the test set, the MSE for the pre-shot model was 0.16 and the AUC was 78%. For the post-shot model, the MSE was 0.13 and the AUC was 87%, a considerable improvement.
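A minimal sketch of that pipeline is below. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for the XGBoost classifier used in the post, and randomly generated data in place of the EPL shot data; the split, grid search, and metrics mirror the workflow described above, not the actual model.

```python
# Sketch only: synthetic data and a stand-in booster, not the real model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 2000
# Four placeholder columns standing in for angle, shot-x, shot-y, penalty.
X = rng.uniform(0, 1, size=(n, 4))
# Synthetic "goal" labels loosely tied to the first feature (the angle).
y = (rng.uniform(0, 1, n) < 0.05 + 0.3 * X[:, 0]).astype(int)

# The post's 70-30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Small grid search over the usual boosting knobs.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)

# MSE on predicted probabilities (the Brier score) and AUC, as in the post.
p = grid.predict_proba(X_test)[:, 1]
print(f"MSE (Brier): {brier_score_loss(y_test, p):.3f}")
print(f"AUC: {roc_auc_score(y_test, p):.2f}")
```

Swapping in `xgboost.XGBClassifier` with the same `GridSearchCV` wrapper would reproduce the post's setup more closely, since XGBoost exposes a scikit-learn-compatible interface.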
Keeper performance was assessed on the 2017 and 2018 seasons. This differs slightly from Paul's sample, which also included data from the 2019 season to date; as I did not have that data to hand, I did not use it. The table below shows the results. The columns labelled “Rating” are the ratios of expected to actual goals conceded: a keeper with a ratio above one concedes fewer goals than the relevant xG model predicts, and a keeper with a ratio below one concedes more.
| Rank (pre-shot model) | Rank (post-shot model) | Rank (PR) | Keeper | Shots on target faced | Goals conceded | xG conceded (pre-shot model) | xG conceded (post-shot model) | Rating (pre-shot model) | Rating (post-shot model) | Rating (PR) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 3 | 4 | David de Gea | 310 | 77 | 97 | 92 | 1.25 | 1.20 | 1.17 |
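As a worked example of the rating definition, here is the calculation on de Gea's row from the table. (The published 1.25 was presumably computed from unrounded xG, so the rounded table inputs give a slightly different second decimal.)

```python
def keeper_rating(xg_conceded: float, goals_conceded: int) -> float:
    """Expected goals conceded divided by actual goals conceded.

    A value above 1 means the keeper let in fewer goals than the
    xG model expected; below 1 means he let in more.
    """
    return xg_conceded / goals_conceded

# De Gea, pre-shot model: 97 xG conceded vs 77 actual goals conceded.
print(round(keeper_rating(97, 77), 2))  # ~1.26 from the rounded table inputs
```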
The correlation between Paul's ratings and my pre-shot ratings is 0.92, which is quite high. The differences seem largely due to differences in the data sample, and to small sample sizes. For example, Nick Pope is high in my rankings but only middling in Paul's. Pope didn't play in the EPL in 2018 due to a serious injury, so my ranking is based only on the 2017 season, which isn't really enough. In addition, since returning from injury he hasn't yet been able to replicate his 2017 performance.
However, that is not really important for our present purposes. The crucial point is how much my pre-shot and post-shot rankings differ. The comparison is illustrated in the chart below:
We can see that when post-shot information is included, some keepers rise considerably in the rankings and others fall. Kepa, for example, looks a much better shot-stopper when shot placement is taken into account, while Petr Cech looks significantly worse.
In conclusion, when evaluating goalkeepers, I would always include post-shot information.