Archive: November 2015

Which Team Formations Produce the Most Expected Goals?

Over the last few seasons, teams in the EPL have used a variety of playing formations; OPTA currently recognise 16. In the four seasons 2010-2013, EPL teams used 3,819 separate positional set-ups, which equates to each team using 1.26 formations per match.
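The per-match figure can be checked with a few lines of arithmetic. This is only a sanity check, and it assumes the standard EPL schedule of 380 matches per season (20 teams playing 38 matches each), which is not stated in the text above.

```python
# Assumption: a standard 20-team EPL season has 380 matches.
SEASONS = 4
MATCHES_PER_SEASON = 380
TEAMS_PER_MATCH = 2
SETUPS = 3819  # positional set-ups quoted in the text

# Each match contributes two "team-matches", one per side.
team_matches = SEASONS * MATCHES_PER_SEASON * TEAMS_PER_MATCH
formations_per_match = SETUPS / team_matches

print(team_matches, round(formations_per_match, 2))
```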

Here’s the list of the OPTA formations used, starting with the most popular in terms of minutes played:

Team Formations used in EPL 2010-2013

Formation  Minutes Played  Percent usage
4231 85,248 29.7%
442 69,976 24.4%
4411 45,572 15.9%
451 30,103 10.5%
433 29,005 10.1%
4141 11,228 3.9%
343 3,179 1.1%
41212 3,194 1.1%
352 2,614 0.9%
541 2,068 0.7%
532 1,572 0.5%
3511 914 0.3%
3412 525 0.2%
3421 591 0.2%
4222 506 0.2%
4321 293 0.1%

The 4231 and 442 formations together account for 54% of play in the EPL.  But are these the best formations for creating good shooting chances?
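As a quick check, the usage shares can be recomputed from the minutes in the table. The sketch below uses only the published minutes; the variable names are illustrative.

```python
# Minutes played per formation, copied from the usage table.
minutes = {
    "4231": 85248, "442": 69976, "4411": 45572, "451": 30103,
    "433": 29005, "4141": 11228, "343": 3179, "41212": 3194,
    "352": 2614, "541": 2068, "532": 1572, "3511": 914,
    "3412": 525, "3421": 591, "4222": 506, "4321": 293,
}

total = sum(minutes.values())
# Percentage share of total playing time for each formation.
share = {f: 100 * m / total for f, m in minutes.items()}
top_two = share["4231"] + share["442"]

print(f"4231: {share['4231']:.1f}%  442: {share['442']:.1f}%  combined: {top_two:.0f}%")
```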

We can answer that by calculating the expected goals (Xg) produced in the different formations.  Goals scored is a notoriously noisy statistic because of the low numbers involved; using expected goals rather than actual goals removes some of the noise in the data, and gives a more accurate picture of the production of good scoring chances independent of the finishing quality of individual players.

The first step was to calculate the  expected goals from regular play for each of the 3,819 set-up periods. As the prime focus of the analysis was the production of chances, I used a simple pre-shot model based on shot location.
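To give a flavour of what a location-only pre-shot model can look like, here is a minimal sketch. It is not the model used for this analysis: the logistic form, the two features (distance and goal angle) and all three coefficients are assumptions chosen purely for illustration, and nothing here was fitted to data.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts

# Hypothetical coefficients for illustration only; not fitted to any data.
INTERCEPT, B_DISTANCE, B_ANGLE = -1.0, -0.10, 1.2

def shot_xg(x, y):
    """Scoring probability for a shot taken x metres from the goal line
    and y metres to the side of the centre of the goal."""
    distance = math.hypot(x, y)
    # Angle (radians) subtended by the two goalposts at the shot location.
    angle = math.atan2(GOAL_WIDTH * x, x * x + y * y - (GOAL_WIDTH / 2) ** 2)
    logit = INTERCEPT + B_DISTANCE * distance + B_ANGLE * angle
    return 1.0 / (1.0 + math.exp(-logit))

def period_xg(shots):
    """Expected goals for one set-up period: the sum over its shots."""
    return sum(shot_xg(x, y) for x, y in shots)

# Three made-up shot locations within a single formation period.
shots = [(11.0, 0.0), (20.0, 5.0), (6.0, -2.0)]
print(round(period_xg(shots), 3))
```

Summing the per-shot probabilities over each period gives one expected-goals value per set-up, which is the quantity analysed in the rest of the post.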

It would be tempting to simply calculate the average expected goals for each formation, but there are reasons why that would be misleading.  First of all,  teams might systematically adopt different formations home and away, so the effect of formation on Xg would be confounded with the effect of venue. Similarly, weaker teams might generally tend to adopt more defensive formations, confounding the effects of formation and overall team strength.

For these reasons, I used a statistical model to control for these other variables.  For the technically minded, I used a Gamma distribution, which fitted the observed values quite well. The control variables were venue, team and opposition, and an offset term was included to adjust for the different lengths of time spent in each formation.
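For readers who want to see the mechanics, the sketch below fits a Gamma regression with a log link and an offset, using only the standard library. Several things are assumptions rather than details from the analysis: the log link, the fitting routine (iteratively reweighted least squares), the made-up baseline and home factor, and the synthetic noiseless data. It also controls for venue only; team and opposition would enter as further dummy columns.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_gamma_log(X, y, offset, iters=50):
    """Gamma GLM with log link and offset, fitted by IRLS.
    For this link/variance pair the working weights are all 1, so each
    step is an ordinary least-squares fit to the relative residuals."""
    p = len(X[0])
    beta = [0.0] * p
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r)) + o)
              for r, o in zip(X, offset)]
        resid = [(yi - mi) / mi for yi, mi in zip(y, mu)]
        xtr = [sum(r[i] * e for r, e in zip(X, resid)) for i in range(p)]
        beta = [b + s for b, s in zip(beta, solve(xtx, xtr))]
    return beta

# Synthetic, noiseless periods. The baseline and home factor are invented.
TRUE_RATE = {"442": 1.0, "4231": 0.85, "5--": 0.572}
BASE_XG_PER_90, HOME_FACTOR = 1.3, 1.2
rows = [(f, home, mins, (mins / 90) * BASE_XG_PER_90 * HOME_FACTOR ** home * rate)
        for f, rate in TRUE_RATE.items() for home in (0, 1) for mins in (30, 60, 90)]

levels = ["4231", "5--"]  # 442 is the reference formation
X = [[1.0, float(r[1])] + [1.0 if r[0] == lv else 0.0 for lv in levels] for r in rows]
y = [r[3] for r in rows]
offset = [math.log(r[2] / 90) for r in rows]  # adjusts for time in formation

beta = fit_gamma_log(X, y, offset)
rates = {"442": 1.0}
rates.update({lv: math.exp(b) for lv, b in zip(levels, beta[2:])})
for f, r in rates.items():
    print(f, f"{100 * r:.1f}%")
```

Because the synthetic data contain no noise, the fit recovers the ratios that were put in. With real data the exponentiated formation coefficients would be estimates of each formation's production rate relative to the reference formation, each with its own standard error.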

A preliminary analysis showed that Xg production for the two five-at-the-back formations, 532 and 541, was very similar, so I combined these into a new formation labelled “5--”.  Similarly, 3511 and 352 were combined into a “35-” category.  I then eliminated the remaining rarely used formations (3412, 3421, 4222 and 4321), which together accounted for less than 1% of playing time.
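The recoding step can be sketched as a simple lookup. The names `MERGE`, `DROP` and `recode` are illustrative, and the five-at-the-back codes are taken from the usage table (532 and 541).

```python
# Formations merged into combined categories, and those dropped as too rare.
MERGE = {"532": "5--", "541": "5--", "3511": "35-", "352": "35-"}
DROP = {"3412", "3421", "4222", "4321"}

def recode(formation):
    """Return the analysis label for a formation, or None if it is dropped."""
    if formation in DROP:
        return None
    return MERGE.get(formation, formation)

# Minutes played per formation, copied from the usage table.
minutes = {
    "4231": 85248, "442": 69976, "4411": 45572, "451": 30103,
    "433": 29005, "4141": 11228, "343": 3179, "41212": 3194,
    "352": 2614, "541": 2068, "532": 1572, "3511": 914,
    "3412": 525, "3421": 591, "4222": 506, "4321": 293,
}

# Total minutes per analysis category after merging and dropping.
merged = {}
for formation, mins in minutes.items():
    label = recode(formation)
    if label is not None:
        merged[label] = merged.get(label, 0) + mins

dropped_share = sum(minutes[f] for f in DROP) / sum(minutes.values())
print(len(merged), f"{100 * dropped_share:.2f}% of minutes dropped")
```

This leaves ten categories, one for each row of the production-rate table.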

The next table shows the Xg production rates for the various formations.  442 is the reference formation, and is given a production rate of 100%.

Production Rates of Expected Goals

Formation  Xg Production Rate
442 100%
41212 100.5%
35- 91.2%
4411 92.0% *
343 88.3%
4231 85.0% ***
433 83.2% ***
451 78.4% ***
4141 76.5% ***
5-- 57.2% ***
Significantly different to 442: * p < .05, *** p < .001

The results show there are clear differences in productive efficiency. The only formation outperforming 442 is 41212, but the difference is negligible and not statistically significant.  The next best formations are 35-, 4411 and 343, which are about 90% as efficient at producing expected goals. Because 35- and 343 were used so rarely, however, their differences from 442 don’t reach statistical significance.  All the other formations are significantly less productive than 442. The least productive formations are those with five at the back, which produce less than 60% of the expected goals generated when playing 442.

Finally the figure below shows the expected goals per 90 minutes for an average team playing at home against an average opponent under various different formations.

[Figure: Expected Goals per 90 Minutes]

Of course the picture I’ve presented is incomplete because I haven’t looked at goals conceded.  It’s very likely that there are significant differences here as well. I’ll pick this up at a later date, and see which  of the formations fare best on that metric.