Journey into Space: Using spatial metrics to compare and cluster football players



Tracking data has long been used by football clubs to measure physical activity on the field, such as distance covered, sprints and top speed. In this post I show how TRACAB spatial location data can be visualised and quantified, and used to compare and contrast players.

The data for this post was a sample of five Premier League matches played in 2015.

POINT MAPS

An obvious first step is to map the player locations as points on the pitch. To illustrate, I use the match between Hull City and Manchester United at the KC Stadium on the last day of the 2014-15 season. Hull fought hard to avoid relegation, but to no avail, and the match finished goalless.

Figure 1 shows the point maps of three players in that match (whether they were in possession or not):

  1. Dawson, the Hull captain and defender
  2. Manchester United’s attacking midfielder Juan Mata
  3. Manchester United’s right back Antonio Valencia

Figure 1. Point maps: 1a. Dawson (Hull); 1b. Mata (Man Utd.); 1c. Valencia (Man Utd.)

To avoid overcrowding the plots, I show the locations sampled 2.5 times per second rather than the full 25 times per second. This shows the main features of each player's activity without unnecessary detail. In all the plots the direction of play is left to right.
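Thinning 25 Hz data to 2.5 Hz amounts to keeping every tenth frame. A minimal sketch, assuming one player's track is held as a frame-ordered list of (x, y) tuples; the function name `downsample` is hypothetical and not part of any TRACAB tooling.

```python
def downsample(frames, source_hz=25.0, target_hz=2.5):
    """Thin a frame-ordered list of positions from source_hz to target_hz.

    Keeps every n-th frame, where n = source_hz / target_hz (10 for
    25 Hz -> 2.5 Hz). Assumes source_hz is a multiple of target_hz.
    """
    if not 0 < target_hz <= source_hz:
        raise ValueError("target_hz must be positive and no higher than source_hz")
    step = round(source_hz / target_hz)
    return frames[::step]


# Example: 4 seconds of 25 Hz data (100 frames) becomes 10 frames at 2.5 Hz.
positions = [(float(i), 0.0) for i in range(100)]
thinned = downsample(positions)
print(len(thinned))  # 10
```

Simple decimation is enough for plotting; for metrics, the full 25 Hz data can still be used.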

We can see that Dawson played centrally in Hull’s half of the field, but he was also active in Manchester United’s penalty area. Mata and Valencia obviously played on the right, but the difference between them is not terribly clear.

Point maps are useful visualisations, but are not especially revealing. In the next section I show how point maps can be quantified and used to develop more insightful visualisations and player metrics.

PLAYER RANGE

I define a player’s “Range” as the area encompassing most of his point map. “Most” can be defined however we wish, for example 80%, 85% or 90% of his locations. In this post I mainly use 80%. Importantly, I don’t restrict the Range to a single continuous region: as we shall see, for certain players the Range consists of distinct areas, or islands.
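The post does not say how the Range is computed. One simple way to obtain a region that is allowed to split into islands is a grid-based "highest density region": bin the locations into square cells, then take the most heavily occupied cells until they hold at least 80% of the locations. The sketch below follows that assumption; the 2 m cell size, the metre coordinates and the name `player_range` are all hypothetical choices, not the author's method.

```python
import math
from collections import Counter


def player_range(points, coverage=0.8, cell=2.0):
    """Return the set of grid cells forming a player's Range.

    Hypothetical method: bin the (x, y) locations (metres) into square cells
    of side `cell`, then take cells in order of decreasing occupancy until
    they hold at least `coverage` of all locations. Because cells are chosen
    by density, not adjacency, the Range can split into separate islands.
    """
    if not points:
        return set()
    counts = Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points)
    # Small tolerance so floating-point error in coverage * n cannot push
    # the target up by a whole location.
    needed = math.ceil(coverage * len(points) - 1e-9)
    chosen, covered = set(), 0
    for cell_id, n in counts.most_common():
        if covered >= needed:
            break
        chosen.add(cell_id)
        covered += n
    return chosen


# Example: a defender mostly in his own half (6 points), sometimes in the
# opposition box (3 points), with one stray location.
pts = [(10.5, 10.5)] * 6 + [(80.5, 30.5)] * 3 + [(50.0, 60.0)]
print(sorted(player_range(pts)))  # [(5, 5), (40, 15)]
```

The two returned cells are far apart, which is how a Range like Dawson's can consist of a main region plus a separate island.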

In the next set of diagrams, I have drawn the computed Ranges on top of the point maps. In each case the Range covers 80% of the player’s locations on the field.

Figure 2. Point Maps and Ranges: 2a. Dawson (Hull); 2b. Mata (Man Utd.); 2c. Valencia (Man Utd.)

Here we see that Dawson’s Range consists of two regions: a main region in Hull’s defensive half, and a smaller region in Manchester United’s penalty area. We can also see the difference between Mata and Valencia more clearly: Mata ranges deeper into the opposition half than Valencia, while Valencia is more active on the right wing and in the centre of his own penalty area.

At this stage, you might be wondering: why not just draw heat maps? Heat maps are fine for visualising a single player, but they don’t let you visualise several players at once, and they don’t lend themselves to metrics. I consider these ideas next.

VISUALISING MULTIPLE PLAYERS

If we want to visualise how two or more players work together on the field, we can superimpose their Ranges. The next map shows Liverpool’s back four in their match against Stoke. (This time I have smoothed the Range boundaries to simplify the shapes, and plotted the 60% Range to highlight the player roles.)

Figure 3a. Liverpool Back Four (v Stoke)

We can also use Ranges to map the activity of a whole team, as shown in the next figure.

Figure 3b. Hull City (v Manchester United)

Importantly – and this is an advance on current analytics – we can also go further and quantify the Ranges and the extent of their interactions.

QUANTIFYING RANGES

Range Areas

One basic metric we can derive is the Range Area. The table below shows the Range Areas for the players considered so far:

Table 1. Typical Range Areas

Team                  Player      80% Range Area (square metres)
Liverpool back four   Sakho       1255
                      Moreno      1317
                      Skrtel      1372
                      Can         1446
Hull City             Dawson      1600
Manchester United     Mata        1812
                      Valencia    2402
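If the Range boundary is drawn as one or more polygons (one per island), its area is the sum of the polygon areas, which the shoelace formula gives directly from the boundary vertices. A minimal sketch, assuming each island is a simple polygon given as a list of (x, y) vertices in metres with no holes; this is an illustration, not necessarily how the areas in Table 1 were computed.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes vertex orientation irrelevant


def range_area(islands):
    """Total Range Area: sum over the separate islands (assumes no holes)."""
    return sum(polygon_area(island) for island in islands)


# Example: a 40 m x 30 m main region plus a 10 m x 10 m island in the box.
main = [(0, 0), (40, 0), (40, 30), (0, 30)]
island = [(90, 30), (100, 30), (100, 40), (90, 40)]
print(range_area([main, island]))  # 1300.0
```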

We can also quantify interactions between players. For example:

a) Player gaps

We can calculate the percentage of time that any pair of players is within a certain distance of each other. The table for the Liverpool back four is shown below.

Table 2. Percentage of time each pair in Liverpool's back four is within 10 metres of each other

          Skrtel   Sakho   Moreno
Can       18%      4%      3%
Skrtel             31%     5%
Sakho                      35%
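A minimal sketch of the gap metric, assuming both players' tracks are frame-aligned lists of (x, y) positions in metres, with None marking frames where a player is not on the pitch (those frames are skipped). The post does not say whether "within 10 metres" is strict; this sketch uses "at most 10 m". The function name is hypothetical.

```python
import math


def percent_within(track_a, track_b, threshold=10.0):
    """Percentage of shared frames in which two players are within
    `threshold` metres of each other.

    track_a, track_b: equal-length lists of (x, y) tuples, one per frame;
    None marks a frame where that player is not on the pitch.
    """
    if len(track_a) != len(track_b):
        raise ValueError("tracks must be frame-aligned")
    shared = close = 0
    for a, b in zip(track_a, track_b):
        if a is None or b is None:
            continue  # only count frames where both players are present
        shared += 1
        if math.dist(a, b) <= threshold:
            close += 1
    return 100.0 * close / shared if shared else 0.0


# Example: four shared frames, two of them within 10 m; the last is skipped.
a = [(0, 0), (0, 0), (0, 0), (0, 0), None]
b = [(6, 8), (3, 4), (20, 0), (0, 15), (1, 1)]
print(percent_within(a, b))  # 50.0
```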

b)  Range Overlaps

We can also calculate the degree of overlap between Ranges. Range overlaps for the Liverpool back four are shown below.

Table 3. Liverpool back four: percentage overlap between Ranges

          Skrtel   Sakho   Moreno
Can       57%      17%     2%
Skrtel             58%     33%
Sakho                      66%
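The post does not define the overlap percentage precisely; because Table 3 gives a single value per pair, a symmetric measure is implied. A sketch assuming Ranges are represented as sets of equal-sized grid cells and overlap is the shared area divided by the combined area (intersection over union, the Jaccard index). Dividing by the smaller Range instead would be an equally plausible reading; the function name is hypothetical.

```python
def range_overlap(range_a, range_b):
    """Percentage overlap between two Ranges given as sets of grid cells.

    Assumed definition: shared area / combined area (intersection over union,
    the Jaccard index), as a percentage. With equal-sized cells, cell counts
    are proportional to areas, so counts can be used directly.
    """
    union = range_a | range_b
    if not union:
        return 0.0
    return 100.0 * len(range_a & range_b) / len(union)


# Example: two 4-cell Ranges that share 2 cells -> 2 / 6 of the combined area.
range_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
range_2 = {(1, 0), (1, 1), (2, 0), (2, 1)}
print(round(range_overlap(range_1, range_2), 1))  # 33.3
```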

COMPARING PLAYERS

Perhaps one of the most powerful applications of the techniques described above is comparing players. The degree of overlap between their Ranges is a measure of player similarity. For instance, the map below compares the Ranges of Vardy and Aguero.

Figure 4. Comparing Players: Vardy versus Aguero

The map shows that Vardy’s Range (1999 square metres) is a little larger than Aguero’s (1734 square metres); the degree of overlap (82%) measures their similarity, and shows they operate in very similar areas of the pitch.

We can also overlay narrower Ranges (e.g. 40%) to visualise and contrast the “core” locations of each player. The next map shows this comparison for Vardy and Aguero, with the core Ranges coloured in.

Figure 5. Comparing Players: Vardy versus Aguero: Including 40% Core Ranges

Vardy’s core Range is 552 sq. m and Aguero’s is 424 sq. m. The core Ranges overlap by 68%, and we can see that when Aguero goes forward he goes deep into the box, centrally in front of goal, while Vardy prefers a bilateral region (covering both the left and the right) on the edge of the penalty box.

This kind of analysis could be useful in recruitment or match analysis.

In the same way, we could compare a particular player across different matches, or even in different phases of the same match. Combined with OPTA KPI data, these spatial metrics add further insight into the performance of individual teams or players.

SOME PROPERTIES OF PLAYER RANGES

If the Range is a meaningful concept, we would expect its metrics to show consistent differences across positions. In fact, they do. The table below shows the average Range areas and standard deviations for Goalkeepers, Defenders, Midfielders and Forwards.

Table 4. Average Range Areas for Different Positions

Position      Number of players   Mean 80% Range Area (sq. m)   Std dev. (sq. m)
Goalkeeper    10                  272                           75
Defender      40                  1543                          228
Midfielder    50                  2117                          544
Forward       26                  2104                          479

As expected, Goalkeepers have much smaller Ranges than the outfield players. More interestingly, Defenders have smaller Ranges than Midfielders and Forwards (the difference is statistically significant). However, there are also differences within each position, as shown in the next table.
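The post does not say which significance test was used. As a rough check, Welch's unequal-variance t statistic can be computed from the summary figures in Table 4 alone, assuming the standard deviations are sample standard deviations. Welch's test is an assumption here, not necessarily the author's choice.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from group summary statistics (mean, standard deviation, size)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary figures from Table 4: (mean, std dev, number of players).
defenders = (1543, 228, 40)
midfielders = (2117, 544, 50)
forwards = (2104, 479, 26)

for label, group in [("Midfielders", midfielders), ("Forwards", forwards)]:
    t, df = welch_t(*defenders, *group)
    print(f"Defenders vs {label}: t = {t:.1f}, df = {df:.0f}")
```

Both t values come out above 5, far beyond conventional significance thresholds, which is consistent with the claim above. For p-values, SciPy's `scipy.stats.ttest_ind_from_stats` with `equal_var=False` runs the same Welch test from summary statistics, and R's `t.test()` applies Welch's test by default when given the per-player areas.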

Table 5. Three Smallest and Three Largest Ranges in Each Position

Position      Group      Player       Team          80% Range Area (sq. m)
Defenders     Smallest   Fonte        Southampton   1196
                         Chester      Hull          1224
                         de Laet      Leicester     1236
              Largest    Zabaleta     Man City      1866
                         Jenkinson    West Ham      1927
                         Janmaat      Newcastle     1939
Midfielders   Smallest   Wilson       Stoke         1208
                         Di María     Man Utd       1281
                         Lambert      Liverpool     1431
              Largest    Phillips     QPR           3103
                         Mané         Southampton   3248
                         Milner       Man City      3841
Forwards      Smallest   Bony         Man City      1350
                         Hernandez    Hull          1370
                         Aluko        Hull          1539
              Largest    Mahrez       Leicester     2895
                         E. Rivière   Newcastle     2912
                         Long         Southampton   3052

There are considerable differences in Range Area within each position; for instance, the widest-ranging forward (Long, 3052 sq. m) covers an area more than twice as large as the narrowest-ranging forward (Bony, 1350 sq. m). (The significance of this will be the subject of future research.)

USING RANGES TO CLUSTER PLAYERS

Finally, we can use Ranges to derive similarity measures and cluster or classify players. Here we define the similarity between two players as the degree of overlap between their Ranges. I used multi-dimensional scaling (MDS), although other techniques, such as hierarchical clustering, could be applied to the same similarity measure. Strictly, MDS is a mapping technique rather than a clustering algorithm: it positions players on a 2-D map according to their similarity. Players whose Ranges are similar (i.e. overlap to a considerable extent) appear close together on the map, and players whose Ranges do not overlap much appear far apart.
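The post does not say which MDS variant was used. Below is a sketch of classical (Torgerson) MDS, assuming the dissimilarity between two players is 1 minus the fractional overlap of their Ranges. The input is the Liverpool back-four overlap matrix from Table 3, with the diagonal set to 100% since a Range fully overlaps itself.

```python
import numpy as np


def classical_mds(dissimilarity, dims=2):
    """Classical (Torgerson) MDS: embed items in `dims` dimensions so that
    Euclidean distances approximate the given dissimilarities."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                      # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]           # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0, None))  # negative eigenvalues contribute nothing
    return eigvecs[:, top] * scale


# Range overlaps for Liverpool's back four, from Table 3 (as fractions).
players = ["Can", "Skrtel", "Sakho", "Moreno"]
overlap = np.array([
    [1.00, 0.57, 0.17, 0.02],
    [0.57, 1.00, 0.58, 0.33],
    [0.17, 0.58, 1.00, 0.66],
    [0.02, 0.33, 0.66, 1.00],
])
coords = classical_mds(1.0 - overlap)  # dissimilarity = 1 - overlap (assumption)
for name, (x, y) in zip(players, coords):
    print(f"{name:>7}: ({x:+.2f}, {y:+.2f})")
```

Can and Moreno, whose Ranges barely overlap, end up farthest apart on the map. In R, the built-in `cmdscale()` function performs the same classical MDS.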

The map that emerges represents the clustering of players. In the map below player names are coloured according to their OPTA position.

Figure 7. Cluster Mapping: Players with Similar Ranges appear Close Together

We can see that a considerable degree of organisation emerged from the analysis. Goalkeepers tend to cluster towards the bottom left corner of the map and Forwards towards the top right, with bands of Defenders and Midfielders appearing in succession as we move diagonally across the map from bottom left to top right. Right-sided players appear on the right-hand side of the diagonal, and left-sided players on the left.

This kind of map enables us to see where players fit in. For example, in the matches in our sample, Gerrard, Coutinho and Lambert, who are classified as Midfielders by OPTA, were indistinguishable from Forwards in their use of space. Rooney and Bony are shown as playing deeper than traditional forwards in the matches I examined.

THE BOTTOM LINE

This post has shown that spatial location data can be used to develop cogent and useful metrics for evaluating and classifying players. A simple metric that quantifies player locations and their interactions may have considerable potential for scouting and match analysis.

October 22nd, 2018 | On the Pitch, Recent Posts

4 Comments

    • admin November 21, 2018 at 9:38 pm

      Hi Keith

      Of course. Thanks for your interest.

      Regards
      Garry

  1. Davide November 7, 2018 at 1:48 pm

    Hello Garry,

    thank you for sharing this post!
    Have you developed some specific scripts for this analysis? Which toolchain did you use?

    Thanks
    Davide

    • admin November 21, 2018 at 9:54 pm

      Hi Davide

      I scripted the whole thing in R.

      Regards
      Garry
