Passing is a skill that sits at the very heart of football. But for a long time the only available measure of passing performance was the pass completion rate, in other words, the percentage of successful passes. This is not very satisfactory, because it doesn’t recognise that some passes are more difficult to execute than others; so a player with a high pass completion rate might simply be a player who chooses to make easy passes.
To control for pass difficulty, football analysts have begun to develop passing models. A passing model uses the features of a pass, such as its start and end locations, to predict the probability it will be completed. Observed completion rates can then be compared to predicted completion rates to determine whether a player or team is performing above or below expectation.
Perhaps the most sophisticated passing model to appear in the public domain to date is Will Spearman’s physics-based model, which uses equations of motion to predict ball motion and player intercept trajectories. This approach requires tracking data, because it needs to know the position of every player at the moment the pass is attempted. Other recent models, such as those developed by Will Gurpinar-Morgan and StatsBomb, use event data to predict pass completion probabilities.
In this post I will describe a model that uses deep learning to predict pass probabilities from event data. Deep learning is a powerful machine learning technique for finding patterns in data. A deep learning model consists of several layers of nodes connected to form an ‘artificial neural network’. Nodes in the first (input) layer of the network receive inputs from features in the data and transform them, passing the results on to nodes in the next layer. This layer in turn transforms its inputs and passes the results on to the next layer, and so on. Learning by example, the network adjusts the strengths of the connections between nodes to produce some desired response in the final (output) layer of the network. In a binary classification task, for example, the output layer would be a single node that yields a high number when the input is a member of the positive class and a low number when it is a member of the negative class.
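As a toy illustration of that forward pass (random, untrained weights; the layer sizes are arbitrary, not those of the model described later), a two-layer network with a single sigmoid output node can be written in a few lines of NumPy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Random (untrained) weights: 4 input features -> 8 hidden nodes -> 1 output node
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

x = rng.normal(size=(1, 4))      # one example with 4 input features
hidden = relu(x @ W1 + b1)       # the first layer transforms the inputs
p = sigmoid(hidden @ W2 + b2)    # output node: a number between 0 and 1
```

Training consists of adjusting W1, b1, W2 and b2 so that p is high for positive examples and low for negative ones.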
Deep learning models are particularly good at decoding high-dimensional data, where the inputs might contain hundreds or even thousands of features, such as the intensities of each pixel in an image. Event data, however, is low-dimensional. A pass can be described by relatively few features - the start and end locations, and some contextual variables such as the previous event. In low-dimensional classification problems, deep learning doesn’t normally have much advantage over more traditional techniques like random forests or extreme gradient boosting, and may in fact perform worse. However, deep learning can catch up with and even outperform other methods when there is a lot of data to train on. This is why I thought a deep learning approach would be worth exploring; we have millions of passes to analyse.
I described the dataset I’m going to use in a previous post. I extracted the following features for each pass:
- Pass geometry (start/end locations, length, angle)
- Previous event type (six binary variables)
- Previous event success/failure (1 if successful, 0 otherwise)
- Previous event team (1 if the same as the passing team, 0 otherwise)
- Kickoff indicators (1 if the pass was a kickoff, 0 otherwise; 1 if the previous pass was a kickoff, 0 otherwise)
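Deriving the geometric features from the raw locations is straightforward; the column names below are hypothetical, not the actual schema of my dataset:

```python
import numpy as np
import pandas as pd

# Two illustrative passes; the coordinates are made up
passes = pd.DataFrame({
    "start_x": [20.0, 55.0], "start_y": [30.0, 40.0],
    "end_x":   [35.0, 80.0], "end_y":   [32.0, 60.0],
})

# Length and angle follow directly from the start and end locations
dx = passes["end_x"] - passes["start_x"]
dy = passes["end_y"] - passes["start_y"]
passes["length"] = np.hypot(dx, dy)
passes["angle"] = np.arctan2(dy, dx)
```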
I also contemplated including higher-level contextual features such as the game state or match time. But while such variables might add predictive power, they seem rather remote from the mechanics of the passing process I want to model, so I decided to leave them out.
I split the data randomly into training, validation and test sets. The training set contained 80% of the data, and the validation and test sets 10% each.
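The post doesn’t show the splitting code; with scikit-learn, an 80/10/10 split can be done in two steps - a sketch on stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)            # stand-in feature matrix
y = (np.arange(1000) % 5 != 0).astype(int)    # stand-in labels

# Hold out 20%, then split the holdout evenly into validation and test sets
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=42)
```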
Training the Model
The task of the model was to classify a pass as successful or unsuccessful when presented with the pass features listed above. I used a densely connected neural network, which is the most straightforward type of deep learning model. Neural networks need to be sized and configured for the problem at hand, to prevent both underfitting, where the network fails to learn at all, and overfitting, where the network learns irrelevant features in the data and fails to generalise to new data. Some experimentation was needed to find a satisfactory configuration, but I found a network with two hidden layers (i.e. two layers sandwiched between the input and output layers) worked quite well. Model performance was evaluated with the binary cross-entropy loss function; the lower the loss, the better the classification performance. I trained the model for 200 epochs, i.e. 200 complete passes through the training data.
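The post doesn’t name the framework or the layer sizes, so as a rough sketch of the setup, here scikit-learn’s MLPClassifier stands in for the network: two hidden layers (the sizes are my assumption), log-loss (i.e. binary cross-entropy) as the objective, and up to 200 training iterations, fitted to synthetic stand-in data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for the pass features and completed/not-completed labels
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

# Two hidden layers; MLPClassifier minimises log-loss (binary cross-entropy)
model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200, random_state=0)
model.fit(X, y)
train_acc = model.score(X, y)   # training-set accuracy
```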
Figure 1 shows how the model learned during training. The top half of the figure shows the binary cross-entropy loss. The green curve shows the loss in the training data and the blue curve shows the loss in the validation data. The initial sharp drop shows the network learns quite quickly to start with, but after about 40 epochs, learning is very slow or non-existent. The similar levels of loss in the training and validation data indicate the network is neither underfitting nor overfitting.
The bottom half of the figure shows the corresponding improvement in network accuracy (the percentage of cases classified correctly). After 200 epochs, the network achieved an accuracy of 88.4% in the training set and 88.3% in the validation set.
Evaluating the Model
How good is our deep learning model?
Figure 2 plots the observed and predicted probabilities of pass completion.
The results show the model is well-calibrated, with observed and predicted probabilities of completion being almost equal at all probability levels.
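scikit-learn’s calibration_curve produces exactly this kind of comparison; a sketch on synthetic data that is perfectly calibrated by construction:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
# Synthetic predicted probabilities, with outcomes drawn to match them exactly
p_pred = rng.uniform(size=5000)
outcomes = (rng.uniform(size=5000) < p_pred).astype(int)

# Bin the predictions and compare observed vs predicted rates in each bin
obs_rate, pred_rate = calibration_curve(outcomes, p_pred, n_bins=10)
```

For a well-calibrated model, the two returned arrays track each other closely, which is what Figure 2 shows.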
The model accuracy of 88.3% seems quite high, but of course this needs to be seen against the background of a ‘no-information’ model, in which we simply assume every pass is successful. This would give us an accuracy of 81%. So we do have some uplift in accuracy, but it’s not fantastic. On the other hand, the AUC is 0.90, which is quite good.
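The no-information baseline is easy to verify: on labels with an 81% base rate, predicting ‘success’ for every pass gives roughly 81% accuracy but an AUC of only 0.5, since a constant prediction has no discriminating power (synthetic labels for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(2)
y_true = (rng.uniform(size=10000) < 0.81).astype(int)   # ~81% completed

# 'No-information' model: call every pass successful
y_const = np.ones_like(y_true)
baseline_acc = accuracy_score(y_true, y_const)   # close to 0.81
baseline_auc = roc_auc_score(y_true, y_const)    # 0.5 for any constant score
```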
To get an idea of whether it’s worth going to the trouble of using deep learning, I compared the neural network model with a random forest model. It turns out the neural network does noticeably better in terms of both accuracy and AUC: the accuracy of the random forest model was 87.1% and its AUC was 0.87. Figure 3 shows the ROC curves for the two models.
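A comparison along these lines might be set up as below (synthetic data; the hyperparameters are illustrative, not the ones behind Figure 3):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(3)
# Synthetic stand-in for the pass features and labels
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
p = rf.predict_proba(X)[:, 1]       # predicted completion probabilities
fpr, tpr, _ = roc_curve(y, p)       # points on the ROC curve
auc = roc_auc_score(y, p)
```

The same roc_curve/roc_auc_score calls applied to both models’ predicted probabilities give the curves and AUC figures being compared.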
In the previous post I referred to a passing model developed by Will Gurpinar-Morgan. Coincidentally, his random forest model also had an AUC of 0.87 - however, this was based on data excluding passes shorter than 5 m. When I exclude these short passes, the AUC of my random forest model jumps to 0.89. The improvement over Will’s model can probably be attributed to the additional features in my data.
Expected Passes - or not?
We can treat the predicted completion rate of a pass as an expected completion rate, and in much the same way as we use expected goals as a baseline for measuring shooting performance, we could use expected passes as a baseline to measure passing performance. Teams or players who complete more passes than expected can be classed as superior performers. The next table shows the top five performers (with at least 1000 passes) in each position according to this method. The gain column is simply the actual completion rate minus the expected completion rate.
|Position|Player|Expected Pass Completion (%)|Actual Pass Completion (%)|Gain (%)|
|---|---|---|---|---|
|Defensive Midfielders|Santiago Cazorla|86.0|90.2|4.2|
|Defensive Midfielders|John Obi Mikel|87.8|90.8|3.0|
|Attacking Midfielders|Arjen Robben|78.9|81.8|2.8|
As we can see, the gain does a pretty good job of picking out some top-rated players. But the analogy with expected goals ignores an important factor; all goals count the same, but not all passes do. Some passes are ‘valuable’, and increase a team’s goal scoring opportunities, while other passes are comparatively inconsequential. In evaluating passing performance, we need to take pass value into account. I’ll deal with this in my next post.
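For reference, the gain column can be computed from per-pass predictions with a simple group-by; the players and numbers here are made up for illustration:

```python
import pandas as pd

# Hypothetical per-pass records: player, predicted completion probability, outcome
passes = pd.DataFrame({
    "player":    ["A", "A", "A", "B", "B", "B"],
    "p_pred":    [0.85, 0.90, 0.80, 0.70, 0.75, 0.95],
    "completed": [1,    1,    1,    0,    1,    1],
})

summary = passes.groupby("player").agg(
    expected=("p_pred", "mean"),     # expected completion rate
    actual=("completed", "mean"),    # actual completion rate
)
summary["gain"] = summary["actual"] - summary["expected"]
```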
The Bottom Line
We have seen that, given enough data, a deep learning model can outperform a random forest model; with more carefully crafted features than those used here, it is possible that even more accurate models could be developed.