Although technical scouting has made great strides in the football world in recent years, subjective assessments of players continue to play an important role in recruitment. All clubs rely extensively on the subjective judgements of their scouts, but few have any idea how consistent or trustworthy those judgements are, and even fewer know how to analyse and interpret scout ratings effectively.
In this post I review two important metrics for assessing the quality of scout ratings: reliability and validity.
The Quality of Subjective Ratings
The overall quality of subjective ratings such as the judgements of a scouting team (or indeed a single scout) can be quantified by two metrics: reliability and validity.
Simply put, subjective ratings are said to be “reliable” when they are consistent, and “valid” when they are accurate. The difference between reliability and validity is shown in the diagram below.
The dots or data points represent a set of individual judgements; they could for example represent successive ratings of a player by a single scout, or ratings of a particular player by several different scouts.
The left hand picture illustrates a cluster of reliable but invalid judgements; the data points are close to each other (consistent), but they do not hit the centre of the target. In terms of a scouting team, this represents a situation where all the scouts agree about a player, but they are wrong about him.
The middle picture shows a set of judgements that are valid but unreliable. Although the ratings do cluster around the central target, many ratings are needed to be sure the position of the target is identified correctly. This represents a situation where scouts differ considerably about the true value of a player, but their errors cancel out. This sort of diagram can arise when a player’s performance is very variable. (His inherent ‘ability’, represented by the central target, does not change, however.)
The right hand picture illustrates a set of judgements that are both reliable and valid. This represents a situation where all the scouts agree about a player and they are right about him.
Obviously, validity is the key to effective scouting and player assessment. However, validity is easier to achieve when reliability is high; when reliability is low, large amounts of data are required, and if the data are completely unreliable, validity is impossible to achieve. For this reason, it is useful to understand the reliability of the data before attempting to assess validity.
Reliabilities are measured on a scale from 0 to 1, where 0 represents complete unreliability and 1 represents perfect reliability. Conventionally, psychologists aim for a reliability of at least 0.7.
We can use reliability assessments to answer practical questions, such as: How many times should a scout watch a player? Is once or twice enough? Or does he need to see ten or twenty performances? I investigated the reliability of ratings in a Premiership club. I found that in order to achieve a reliability of 0.7, a scout should watch and rate an individual player on at least four different occasions.
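The standard psychometric tool for questions of this kind is the Spearman–Brown prophecy formula, which predicts the reliability of an average of k ratings from the reliability of a single rating. The sketch below illustrates the idea; the single-viewing reliability of 0.37 is an invented figure chosen so that four viewings clear the 0.7 threshold, not a number from the club data.

```python
def spearman_brown(r1, k):
    """Reliability of the average of k ratings, given single-rating reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

def ratings_needed(r1, target=0.7):
    """Smallest number of ratings whose average reaches the target reliability."""
    k = 1
    while spearman_brown(r1, k) < target:
        k += 1
    return k

# Hypothetical single-viewing reliability of 0.37: four viewings
# are enough to pass the conventional 0.7 threshold.
print(ratings_needed(0.37))  # -> 4
```

Note the diminishing returns: each extra viewing raises the predicted reliability by less than the previous one, which is why piling on further ratings is rarely worth the resources.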
This finding was used to recommend a scouting model for the club. Four ratings is sufficient to get a stable and reliable indication of a single scout’s opinion. Reliability could be increased by adding further ratings, but this uses more resources and follows the law of diminishing returns. Of course, it is advisable for a player to be watched by more than one scout, but no scout need watch a player more than four times.
Assessing reliability becomes more complex when we extend the metric to teams of scouts; for example, how many different scouts should watch a player in order to get a reliable consensus within the scouting team? Such questions require the use of more advanced analytical techniques like Generalizability Theory. In one set of scouting data I examined, I found that to get the desired level of reliability, at least four different scouts should rate at least eight performances between them. Whether this is a typical finding I don’t know; other scouting teams may need to work to a different recommendation – the results depend on the variability of individual scout ratings and the degree of consensus within the scouting team.
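A full G-theory analysis is beyond a blog post, but its simplest building block, a two-way variance decomposition over a players-by-scouts table of ratings, can be sketched briefly. The coefficient below is the ICC(2,k)-style reliability of the average of the scouts' ratings; the table of scores is invented for illustration.

```python
from statistics import fmean

def g_coefficient(ratings):
    """Generalizability coefficient for the mean rating across scouts,
    estimated from a players x scouts table via two-way ANOVA sums of squares."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_players = k * sum((m - grand) ** 2 for m in row_means)
    ss_scouts = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_players = ss_players / (n - 1)
    ms_scouts = ss_scouts / (k - 1)
    ms_error = (ss_total - ss_players - ss_scouts) / ((n - 1) * (k - 1))
    # Reliability of the average of the k scouts' ratings (ICC(2,k) form).
    return (ms_players - ms_error) / (ms_players + (ms_scouts - ms_error) / n)

# Hypothetical table: four players each rated by two scouts.
table = [[7, 8], [5, 6], [9, 9], [4, 5]]
print(round(g_coefficient(table), 2))  # high agreement, so close to 1
```

In a real G-study the same machinery is extended to estimate how the coefficient changes as scouts or performances are added, which is what yields recommendations like “four scouts, eight performances”.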
Subjective judgements of a player are said to be ‘valid’ if they agree with some external criterion.
Validity is difficult to quantify in practice, and although there are various ways to do it, none of them is perfect. One option is to correlate subjective ratings with objective measures of performance (this is called ‘predictive’ validity).
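The computation behind predictive validity is just a correlation between the two sets of numbers. A minimal sketch, with invented ratings and an invented objective performance index standing in for real club data:

```python
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: scout ratings (1-10) and an objective
# performance index for the same six players.
ratings = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5]
performance = [0.42, 0.55, 0.31, 0.61, 0.40, 0.50]
print(round(pearson_r(ratings, performance), 2))
```

In practice the correlation would be computed per position and tested for statistical significance, since a coefficient from a handful of players can easily be noise.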
I examined the relationship between scout ratings and objective performance statistics for the first‑team players in a Premiership club. While the relationship was positive and statistically significant (as we would hope), it was overall quite weak. However, there was a substantial difference between player positions. For strikers and midfielders, the relationship between subjective ratings and objective performance was moderately strong, indicating quite a healthy degree of validity. But for centre‑backs and full‑backs, the relationship was either non-existent or negative. This was a very interesting finding, which suggested that either the scouts were failing to pick up pertinent information for these positions, or that the objective data was not measuring the things the club was looking for. In the latter case of course, we would conclude that the performance data was not an appropriate external measure for assessing validity.
If assessing validity against objective performance data is not always desirable or possible, what other approaches are available? One option is to structure scouting reports in terms of specific criteria, for example ‘Defending crosses’, ‘Tackling’ or ‘Finishing skills’. A list of criteria for each position would be defined by the club, and scouts would score players on each of them. (Dan Ashworth, the FA’s director of Elite Development, is a firm advocate of this kind of structured assessment.) In this case, a valid report would be one where the scouts’ ratings agreed with the ratings of acknowledged experts such as the chief scout or head coach.
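Agreement with an expert can be summarised very simply. The sketch below uses an invented set of criteria and scores, and a deliberately crude agreement index (mean absolute difference rescaled to a 0-1 range); a real analysis might prefer a rank correlation or a weighted scheme.

```python
from statistics import fmean

# Hypothetical structured report: one player scored (1-10) on
# position-specific criteria by a scout and by the chief scout.
criteria = ["Defending crosses", "Tackling", "Positioning", "Distribution"]
scout = {"Defending crosses": 7, "Tackling": 8, "Positioning": 6, "Distribution": 5}
chief = {"Defending crosses": 6, "Tackling": 8, "Positioning": 7, "Distribution": 5}

# Mean absolute difference across criteria, rescaled so that
# 1.0 means perfect agreement on a 1-10 scale.
mad = fmean(abs(scout[c] - chief[c]) for c in criteria)
agreement = 1 - mad / 9
print(round(agreement, 2))  # -> 0.94
```

Tracking an index like this per scout over time would show which scouts’ criterion judgements converge towards the club’s experts and which drift away.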
While the use of objective performance data has gained credibility at the highest levels in football, the treatment of subjective data has lagged somewhat behind.
Subjective judgements deserve the same level of analytical attention that objective measures have received. Tools and techniques exist to evaluate and improve consistency, guide practice, and highlight areas of concern, and clubs can certainly take advantage of them.
As the management guru Peter Drucker said, “What gets measured gets managed.” Clubs that develop best practice in measuring and monitoring the judgements of their scouts will be able to identify areas for improvement, and to develop competitive advantage.