2018 marks the tenth anniversary of my involvement in football analytics.

This has prompted me to crystallise a few thoughts on the role of analytics and data science in football.  In particular, I want to talk about how these tools should be integrated into a club’s football operations.

The Growth of Data

Since I first started analysing player performance data at Chelsea FC, advances in technology have dramatically increased the volume and range of performance data available to clubs.

Figure 1 shows the amount of data I was using at three time points. Of course, these time periods would be different for different clubs, but I would guess the progression is fairly typical.  The diagrams represent one season’s data, and are to scale.

Figure 1. Growth of Data 2008-2018

When I started in 2008, we had the OPTA F9 spreadsheet: a simple tally of each player’s actions in a match, together with some contextual information such as his position in the formation. There was no xy data and no time stamping. About 5 MB was enough to store the results for all 380 matches in a Premier League season. We of course had data for many leagues, and the club generated its own internal data, but basically that was it. We had Prozone as well, but only the summary stats; we had no access to the raw data.

Around 2011 I started looking at the OPTA log file. This was a step change in data volume and detail. I suspect some may have cottoned on to this earlier than we did at Chelsea. Here each event is recorded individually, together with its xy pitch position, a time stamp, and a number of descriptors called qualifiers that record, for example, which foot the player used and other information about the event. In a compact .csv form, a season’s worth of data ran to about 200 MB.
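To make the difference between a per-match tally and an event log concrete, here is a minimal Python sketch. The `Event` record and its field names are hypothetical, not OPTA’s actual column names, and the four-row log is invented for illustration. The point it shows is one-directional: a per-player tally like the old summary spreadsheet can be derived from the event log by counting, and the qualifiers allow new cuts (such as shots by foot), but positions and times can never be recovered from a tally.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    """One row of an event log: a single action, with where and when it happened.

    Field names are hypothetical; they are not OPTA's actual column names.
    """
    match_id: int
    player: str
    event_type: str   # e.g. "shot", "pass"
    minute: int
    second: int
    x: float          # pitch position, assumed 0-100 along each axis
    y: float
    qualifiers: dict = field(default_factory=dict)  # e.g. {"foot": "left"}


def tally(events, event_type, **qualifier_filter):
    """Count events of one type per player, optionally filtered on qualifiers.

    This collapses the event log into a summary-style tally; the reverse
    (rebuilding positions and times from a tally) is impossible.
    """
    return Counter(
        e.player
        for e in events
        if e.event_type == event_type
        and all(e.qualifiers.get(k) == v for k, v in qualifier_filter.items())
    )


# A tiny hypothetical log, invented for illustration.
log = [
    Event(1, "Player A", "shot", 12, 5, 88.0, 45.0, {"foot": "right"}),
    Event(1, "Player A", "pass", 13, 40, 60.0, 30.0),
    Event(1, "Player B", "shot", 70, 2, 92.0, 52.0, {"foot": "left"}),
    Event(1, "Player A", "shot", 81, 30, 79.0, 60.0, {"foot": "left"}),
]

print(tally(log, "shot"))                           # Counter({'Player A': 2, 'Player B': 1})
print(tally(log, "shot", foot="left")["Player A"])  # 1
```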

Then in 2016 I got access to some TRACAB player tracking data; once again this represented a step change in volume and complexity. At a resolution of 25 Hz, a season of data could be stored in about 40-50 GB. The introduction of player tracking data took football into the realm of (fairly) Big Data for the first time.
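A quick back-of-envelope calculation shows where that volume comes from. This Python sketch assumes about 95 minutes of play per match (including stoppage time) and 23 tracked objects per frame (22 players plus the ball); neither figure comes from the text above, and gigabytes are taken as decimal (10^9 bytes).

```python
# Back-of-envelope size of one season of tracking data.
# Assumptions (not from the article): ~95 minutes of play per match
# including stoppage time, and 23 tracked objects per frame (22 players + ball).
MATCHES = 380
FRAME_RATE_HZ = 25
MINUTES_PER_MATCH = 95
OBJECTS_PER_FRAME = 23

frames_per_match = FRAME_RATE_HZ * 60 * MINUTES_PER_MATCH
frames_per_season = frames_per_match * MATCHES
positions_per_season = frames_per_season * OBJECTS_PER_FRAME

print(f"{frames_per_match:,} frames per match")       # 142,500 frames per match
print(f"{frames_per_season:,} frames per season")     # 54,150,000 frames per season
print(f"{positions_per_season:,} position records")   # 1,245,450,000 position records

# Bytes per position record implied by the 40-50 GB figure (decimal GB).
for season_gb in (40, 50):
    print(f"{season_gb} GB -> {season_gb * 1e9 / positions_per_season:.0f} bytes/record")
```

On these assumptions, 40-50 GB works out at roughly 32-40 bytes per tracked object per frame, which seems plausible for a text record holding an identifier, coordinates and speed. The same arithmetic puts the three stages in proportion: the event log is about 40 times the size of the summary spreadsheet, and the tracking data about 200-250 times the size of the event log.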

This is of course a somewhat simplified picture of the progression, and omits the sports-science data generated from tracking devices and wearables and so on, but I think it is fairly representative of the increase in volumes of performance data that clubs could potentially take advantage of.

The key question for me is this: has the commitment to analytics in football kept pace with the increase in information available? My answer would be no. Of course, clubs have increasingly accepted analytics over the past nine or ten years, but I’m not sure how deep that acceptance goes. Although some clubs are enthusiastic about the potential of data to inform decision-making, I suspect that others are still rather uncertain: they may claim to be “doing analytics”, but the real impact on their decision-making is limited.

The Four I’s of Analytics

To explain the role of analytics in the decision-making process, it’s convenient to think in terms of a four-stage process like the one depicted in Figure 2. I call it the Four I’s of Analytics. It’s not a new concept; you can find variations on this basic four-stage model all over the internet, but mine stands out due to its cool alliteration!

Figure 2. The Four I’s of Analytics

The Four I’s are as follows.

Information. The first I is the generation of information, or data gathering. Information is raw or descriptive data, like a tally or an average – it describes what happened.  For example “In your last ten matches, you had 22 shots; 10 were off-target.”  Well that is information, and no doubt good to know, but what does it mean in practice?  This question leads naturally to the second I.

Intelligence. The second I, Intelligence, puts the information from the first I in context. Is having 10 of your 22 shots off target good or bad? How does it compare with other players in your position? How does it compare with what you did last season? Are you getting better or worse? This is where the information gathered in stage 1 acquires real meaning.
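The questions in the Intelligence stage translate into simple calculations. In this Python sketch, the 22 shots with 10 off target come from the example above; the peer rates and last season’s figures are hypothetical, invented purely to show the mechanics. The helper names are likewise illustrative. The standard error is included because, with so few shots, context also means knowing how much a rate could move by chance alone.

```python
import math


def on_target_rate(shots: int, off_target: int) -> float:
    """Fraction of shots that were on target."""
    return (shots - off_target) / shots


def percentile_rank(value: float, peer_values: list[float]) -> float:
    """Percentage of peer values strictly below `value`."""
    below = sum(1 for v in peer_values if v < value)
    return 100.0 * below / len(peer_values)


def standard_error(rate: float, n: int) -> float:
    """Binomial standard error of a rate estimated from n attempts."""
    return math.sqrt(rate * (1 - rate) / n)


# The example from the text: 22 shots, 10 off target.
rate_now = on_target_rate(22, 10)            # 12/22, about 0.545

# Hypothetical context, invented for illustration only:
peer_rates = [0.40, 0.45, 0.50, 0.55, 0.60]  # other players in the same position
rate_last_season = on_target_rate(30, 12)    # 18/30 = 0.60

print(f"on-target rate:        {rate_now:.3f}")                                 # 0.545
print(f"percentile vs peers:   {percentile_rank(rate_now, peer_rates):.0f}")   # 60
print(f"change vs last season: {rate_now - rate_last_season:+.3f}")            # -0.055
print(f"standard error:        {standard_error(rate_now, 22):.3f}")            # 0.106
```

With these made-up numbers, the apparent drop of about 0.055 since last season is only around half a standard error, so on its own it would be weak evidence that the player is getting worse. That is exactly the kind of qualification that turns raw information into intelligence.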

Insight. The next stage is the emergence of understanding or Insight, the third I. Insight tells you why something happened.  For example: “You are taking more shots with your weaker foot. That’s why your attempts seem a bit more wayward than they used to be.”  This stage holds many traps for the unwary (such as those unfamiliar with multivariate statistics). It is vitally important for a data scientist to consider alternative possibilities and avoid jumping to conclusions.  Sometimes things that look like strong evidence need a second look.

For example, we might find that the probability of conceding a goal decreases when there are more defenders between the shot location and the goal. This is a real effect, and a strong one, but to understand it fully we need to know that the number of defenders is positively correlated with distance from goal: shots with more defenders in the way tend to be taken from further out. How much of our defender effect is due to distance? All of it? Actually no, but we need to check, or we risk drawing a misleading conclusion.
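To see why the check matters, here is a small simulation sketched in Python with NumPy. Everything in it is an assumption made for illustration: the shots are synthetic, the number of defenders is assumed to rise with distance from goal, and the “true” effects (-0.12 per metre and -0.3 per defender, on the log-odds scale) are invented. The technique is ordinary logistic regression, fitted by Newton-Raphson, once with defenders as the only predictor and once with distance added.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson. X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted scoring probabilities
        w = p * (1.0 - p)                      # observation weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # negative Hessian
        beta += np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
n = 50_000

# Synthetic shots (assumption): more defenders in the way as distance grows.
distance = rng.uniform(5, 35, n)               # metres from goal
defenders = rng.poisson(0.1 * distance)        # mean 0.5 at 5 m, 3.5 at 35 m
# Assumed true model: both distance and defenders lower the scoring probability.
true_logit = 1.0 - 0.12 * distance - 0.3 * defenders
goal = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ones = np.ones(n)
naive = fit_logistic(np.column_stack([ones, defenders]), goal)
adjusted = fit_logistic(np.column_stack([ones, distance, defenders]), goal)

print(f"defender effect, defenders only:        {naive[1]:.2f}")
print(f"defender effect, adjusted for distance: {adjusted[2]:.2f}")
```

The naive model credits the defenders with part of the distance effect, so its coefficient comes out noticeably more negative than the -0.3 built into the simulation; once distance is in the model, the defender coefficient lands close to -0.3. The effect is smaller than it first appeared, but it does not vanish.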

At this stage we may have put in a lot of work and thought to interpret the intelligence we have acquired, but still nothing has actually happened.  If there is going to be a payoff for our efforts, it has to come from the fourth I.

Impact. Impact is the fourth I, and it is where change happens. For example, in the case we are considering, when presented with the output of the third I, the coach may decide to work with the player on his shooting or positioning, or he may decide to reassure him that things will turn around by themselves. The key point is that while the first three I’s are in the realm of data science, the fourth I belongs to the football decision-makers in the club. If you want to know whether a club is really committed to analytics, look at the fourth I. However much they may be spending on the first three I’s, and however sophisticated their data science, if they do not execute the fourth I, there is no real commitment to analytics.

Actually, I’ve drawn the four stages in a circle to suggest that the step following the fourth I is to return to the first I and collect more information to see whether the Impact had any - umm - impact. If the circle can be closed in this way, we have a process of continuous improvement. However, analytics by itself cannot close the circle. Whether and how the Insight is turned into meaningful action depends on the footballing decision-makers.

Integrating Analytics into the Decision-making Process

Brentford, under its remarkable and insightful owner Matthew Benham, has pioneered the serious use of analytics in football. But in 2015 there was a widely reported disagreement between the manager, Mark Warburton, and the club about recruitment decision-making. It wasn’t that Warburton was data-averse or statistically illiterate. On the contrary, before going into football management Warburton had been a successful currency trader at financial institutions such as RBS, AIG and Bank of America. Seemingly the problem was that Warburton wanted a veto on recruitment decisions, but the club were not prepared to give him one. As he put it, “I think the manager has to pick the team and have the final say …”

In my opinion, Warburton was right to insist on his veto. Of course a sensible manager will listen to the club’s analysts and scouts, and also to his assistants, the sports scientists and other specialist staff, and even senior players, and he should give all their views serious consideration.  But in the end, the responsibility for the final decision rests with him.  Many successful business leaders say their success is based on hiring the best people they can, and then trusting them to get on with things and deliver. Whatever the data science (or anyone else) might say, the final decision must rest with the manager.  A spectacular example of what happens when this principle is ignored was Chelsea’s eye-watering acquisition of Fernando Torres, a transaction demanded by the club owner Roman Abramovich which turned out to be a major financial and footballing error.

Analytics is a decision support system

All that I have written so far is a prelude to the key idea I want to communicate: Analytics is a decision support system.

The only reason data science exists as a function at all is to help the manager or coach do his job. The role of the analyst or data scientist is to support the footballing operations of the club by providing insights relevant to decision-making.  In my opinion, the role of the data scientist in actually taking decisions should be precisely zero.  Nil. None. The analyst should certainly have an input - he should even submit recommendations, use persuasion, or suggest alternatives, but there should be no concept of the analyst having a “vote.”

Why is this so important?  There are three reasons.  One is that almost by definition, data scientists are unlikely to have sufficient knowledge or experience at the sharp end to understand the full set of decision variables in the way the team manager or head coach can.  It takes a long time to become good at data science, and a long time to become good at football management, and it would be a rare individual who could master both domains.

The second reason is that the data scientist, the manager and the footballing operations staff all have different objectives. It is the data scientist’s job to provide insights based on sound statistical reasoning and investigation. As an analogy, think about a computer engineer. He is there to support the business objectives of the company; he does not decide what those objectives are. If the company decides to go into a new line of business, the computer department will be asked to develop a new system, or evaluate an off-the-shelf package, that will allow the company to trade. The task of the programmers and systems analysts in the computer department is to provide a robust and cost-efficient system to support the new line of business. They need to understand the company’s business objectives, but they are not there to tell the management team what those objectives should be, or how to run the business.

In the same way, it is not within the remit of the data scientist to argue about the style of football a manager wants to play, how he rotates his squad, or who should be picked on Saturday, although he may well have valuable insights to contribute, and a wise manager would be well advised to listen to what he has to say.

But the third reason is perhaps the most important, and it is the human factor.  The key functional area for analytics nowadays is recruitment and retention: and this is where real conflict can arise if the wrong processes are in place. A manager has to develop his team, plan his tactics and select his starting eleven.  To do so he must have full confidence in his players.  If a player is forced on him, he will not feel that he “owns” the team or that he is fully responsible for its performance.  And however much he tries to conceal it, in the high-pressure environment of elite football that attitude inevitably seeps through to the players and undermines team confidence.

This has implications for the way recruitment is handled. Many clubs nowadays have a transfer committee, whose role is to identify and acquire new players and dispose of unwanted ones. This is a great idea, because it encourages consideration of a broad range of factors, including the data science. But although a committee is a good structure, it will not deliver good results unless the right decision-making process is also in place. And that process needs to recognize a stark difference in accountability between the manager and the committee members. It is usually the manager, not the committee, who gets sacked for a poor run of form. Until the committee members are held equally accountable, there can be no justification for giving them absolute decision-making powers or letting them overrule the man whose own job depends on results.

I don’t know much about the management dynamics of football clubs, but I do wonder whether the high turnover of managers in the Premier League can be attributed to a failure to align decision-making roles and accountability.  And that confusion might explain why some clubs are reluctant to embrace analytics.  Firmly establishing analytics in a decision-support role might erode some of that reluctance.