Predicting the Premiership: Is there Wisdom in the Crowd?


The “Wisdom of the Crowd” was discovered by Sir Francis Galton over a hundred years ago.  Galton tabulated the guesses of around 800 people who entered a country fair competition to estimate the weight of an ox;  although the individual guesses varied widely, the crowd average was correct to within 0.8% of the true weight.  A crowd of inexpert judges had made an expert judgement.

In more modern times, the ability of a crowd to outperform the individuals in it, and sometimes even outperform experts, was given a new prominence by the publication of James Surowiecki’s book The Wisdom of Crowds in 2004.

In this post I’m going to see if crowds can predict the end-of-season Premier League table. Well, not exactly crowds as such. I’m going to use Simon Gleave’s fascinating dataset of EPL predictions. Every year Simon collects a bunch of pre-season predictions from media pundits, statistical modellers and fans, and at the end of the season the predicted league positions are compared with the final league standings. If you haven’t seen it, do head over to Simon’s blog where you can find the results for the 2014/15 season. Simon’s publicly available dataset, which I’m going to analyse here, contains 54 pre-season predictions: 27 from modellers, 18 from media pundits and journalists, and 9 from fans. (OK, I know it’s a pretty small crowd.) It’s the same dataset I used in a previous post.

Measuring Prediction Error

As our participants aren’t trying to guess the weight of an ox, how do we measure how good their predictions are? The most commonly used measure of predictive power is the RMSE, or to give it its full title, the Root Mean Square Error of Prediction. Despite its ominously complicated name, the RMSE is remarkably simple to calculate. For each team in the league, we subtract its actual position from its predicted position and square the difference. We add up all twenty squared differences, divide by twenty (the number of teams) and take the square root. Bingo. All we need to remember is that the RMSE is a measure of prediction error, so a small RMSE means a small error and a good prediction, and a large RMSE means a large error and a poor prediction.
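For anyone who wants to try this at home, here’s a minimal Python sketch of the calculation just described. It assumes each prediction and the final table are stored as dicts mapping team name to league position; the four-team league is made up purely to show the mechanics.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual league positions.

    Both arguments are dicts mapping team name -> position (1 = champions).
    """
    # Square the difference between predicted and actual position for every team
    squared = [(predicted[team] - actual[team]) ** 2 for team in actual]
    # Average over the number of teams (20 in the Premier League), then take the root
    return math.sqrt(sum(squared) / len(squared))


# Made-up four-team league: the top two are swapped, the rest are right
actual = {"A": 1, "B": 2, "C": 3, "D": 4}
guess = {"A": 2, "B": 1, "C": 3, "D": 4}
print(rmse(guess, actual))  # sqrt((1 + 1 + 0 + 0) / 4), about 0.707
```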

In this dataset, the RMSEs of the 54 individual participants ranged from 2.3 to 4.7, with an average of 3.5.

Average Prediction Error and the Wisdom of Groups

So how do media journalists, modellers and fans compare?  And are any of these groups wiser than their members?

First of all, it’s important to understand how the ‘wisdom’ of a crowd is calculated. The wisdom of the crowd is the prediction error of the average estimate, as distinct from the average of the prediction errors. To see the difference, let’s look at a toy example. Imagine a crowd of two predicting Liverpool’s final league position (last season Liverpool actually finished 6th). Suppose one person guesses 4th and the other guesses 8th. Each is two places out, so each has an RMSE of 2, and the average RMSE for this crowd is 2. But that’s not the wisdom of the crowd. To calculate the wisdom, imagine that the crowd members get together and agree to average their individual estimates, and then we calculate the RMSE of their consensus prediction. In this case, the average predicted finishing position is (4+8)/2 = 6, so the consensus of the crowd is a 6th-placed finish, which is exactly where Liverpool did finish. So the RMSE of the crowd is 0. In this particular case, the crowd outperforms all its members. Of course there is no guarantee this will happen. If everyone in the crowd is biased in the same direction, the crowd will perform no better than the average individual.
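Here’s the same distinction as a small Python sketch, using the Liverpool toy example. It assumes each member’s prediction is a list of positions in a fixed team order, and it leaves the consensus as fractional average positions rather than re-ranking it into a 1 to 20 table; that choice is an assumption of this sketch, so treat it as an illustration of the idea rather than the exact procedure behind the numbers in this post.

```python
import math

def rmse(predicted, actual):
    """RMSE between two equal-length lists of league positions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def average_error(crowd, actual):
    """Average of the individual prediction errors (NOT the wisdom of the crowd)."""
    return sum(rmse(member, actual) for member in crowd) / len(crowd)

def crowd_wisdom(crowd, actual):
    """Error of the consensus: average the predicted positions first, then score."""
    consensus = [sum(team) / len(team) for team in zip(*crowd)]
    return rmse(consensus, actual)

# Toy example from the text: a one-team "league" (Liverpool finished 6th)
actual = [6]
crowd = [[4], [8]]
print(average_error(crowd, actual))  # 2.0
print(crowd_wisdom(crowd, actual))   # 0.0

# A crowd biased in the same direction gains nothing from averaging
biased = [[7], [9]]
print(average_error(biased, actual))  # 2.0
print(crowd_wisdom(biased, actual))   # 2.0
```

One general point falls out of this. Because the RMSE behaves like a distance, the consensus error can never be larger than the average of the individual errors; averaging only helps to the extent that individual errors point in different directions, which is why the same-direction crowd above gains nothing.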

Armed with this method we can explore the wisdom of the Premier League pundits.

The Wisdom of The Premier League Pundits

First, I’ll look at the wisdom of the three crowds of pundits separately. Prediction errors for the 54 participants in the dataset are plotted in the chart below, split into three ‘sub-crowds’ by type of pundit.  The vertical axis is plotted in reverse, so that the best predictions appear towards the top of the chart.

The central line in each box is the mean prediction error of that sub-crowd (the average of its members’ individual RMSEs), and to give a sense of the spread of the data, the box boundaries are one standard deviation either side of the mean. We can see that the statistical modellers have the lowest (best) average prediction errors, with the media guys close behind, and the fans bringing up the rear.

But how wise are these different crowds?  A crowd’s wisdom is the RMSE for its consensus prediction, and a crowd is considered wise if its consensus prediction outperforms most of its members.  Looking at the wisdom line markers on the chart, we can see that each crowd is substantially wiser (has a lower prediction error) than the vast majority of its members. In fact, only three individuals in each group outperform the wisdom of their crowd. The numbers are:

Group       Average of prediction errors   Crowd wisdom
Fans        3.90                           3.33
Media       3.51                           2.91
Modellers   3.39                           2.68
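Both columns of the table, the box boundaries in the chart, and the count of members who beat their crowd can all be produced from a list of predictions and the final table. The sketch below uses a hypothetical helper, `summarise`, written to mirror the logic described here rather than taken from Simon’s analysis. It assumes the sample standard deviation for the box boundaries and uses a made-up four-team league in place of the real data.

```python
import math
import statistics

def rmse(predicted, actual):
    """RMSE between two equal-length lists of league positions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def summarise(crowd, actual):
    """Hypothetical helper: the table and chart figures for one sub-crowd."""
    errors = [rmse(member, actual) for member in crowd]
    mean_error = statistics.mean(errors)   # "Average of prediction errors"
    sd = statistics.stdev(errors)          # sample SD (an assumption) for the box
    consensus = [statistics.mean(team) for team in zip(*crowd)]
    wisdom = rmse(consensus, actual)       # "Crowd wisdom"
    return {
        "average_error": mean_error,
        "box": (mean_error - sd, mean_error + sd),
        "crowd_wisdom": wisdom,
        # How many individuals beat the consensus of their own crowd
        "members_beating_crowd": sum(e < wisdom for e in errors),
    }

# Made-up four-team league with a crowd of four
actual = [1, 2, 3, 4]
crowd = [[2, 1, 3, 4], [1, 3, 2, 4], [1, 2, 4, 3], [3, 1, 2, 4]]
summary = summarise(crowd, actual)
print(summary["members_beating_crowd"])  # 0
```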

But we can do more. Suppose we combine all three groups into a single crowd of 54. This time we get a crowd wisdom of 2.66, just slightly better than the modellers on their own (2.68). In this case the fans seem to be the limiting factor. Perhaps that’s not surprising: it is a tiny crowd, drawn from an even tinier sample of clubs, and so not very representative. So what about combining just the media and the modellers?

The chart below shows what happens.

The wisdom of the combined crowd now drops to 2.56. We can see from the chart that this crowd is wiser than all except three of its 45 members. In other words, the crowd as a whole outperforms 93% of its members.

An even more interesting fact is that the combined crowd of media pundits and modellers is wiser than either crowd on its own (2.56, against 2.91 for the media and 2.68 for the modellers). It may seem counter-intuitive, but by combining the differing perspectives of the media and the modellers, we can produce a better Premier League prediction than either group could manage by itself. This is the power of the wisdom of crowds.
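Mechanically, combining crowds just means pooling every member’s prediction and averaging again. The small sketch below, with made-up four-team predictions and hypothetical function names, shows that the pooled consensus is a size-weighted average of each sub-crowd’s consensus, so a bigger sub-crowd pulls the combined forecast further towards its own view.

```python
import math

def rmse(predicted, actual):
    """RMSE between two equal-length lists of league positions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def consensus(crowd):
    """Mean predicted position for each team across every member of a crowd."""
    return [sum(team) / len(team) for team in zip(*crowd)]

def pooled_consensus(*crowds):
    """Consensus of the combined crowd, built from each sub-crowd's own consensus.

    Every member counts equally, so each sub-crowd is weighted by its size.
    """
    total = sum(len(c) for c in crowds)
    subs = [consensus(c) for c in crowds]
    return [
        sum(len(c) * sub[i] for c, sub in zip(crowds, subs)) / total
        for i in range(len(subs[0]))
    ]

# Made-up predictions for a four-team league
actual = [1, 2, 3, 4]
media = [[2, 1, 3, 4], [1, 2, 4, 3]]
modellers = [[1, 3, 2, 4], [1, 2, 3, 4], [2, 1, 4, 3]]
combined = pooled_consensus(media, modellers)
print([round(x, 2) for x in combined])  # [1.4, 1.8, 3.2, 3.6]
print(rmse(combined, actual))           # wisdom of the combined crowd
```

With the real group sizes (18 media pundits, 27 modellers), the modellers’ consensus carries a weight of 27/45 = 0.6 in the combined prediction and the media’s carries 0.4.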

Can a Dummy Out-perform the Experts?

Finally, there is another prediction we can explore, one which contains no expertise at all: teams will finish in the same positions as they did last year. This naive prediction turns out to be surprisingly good. It has an RMSE of 2.79, beating all but seven of the predictions in Simon’s dataset. In fact, Roger Pielke Jr. has even suggested that someone who knows nothing about football could outperform most of the experts simply by guessing the same finishing positions as last year.
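The naive forecast is easy to build, with one wrinkle: the promoted teams have no Premier League position from last season. This sketch simply assumes they fill the bottom places in an order you supply, and re-numbers the surviving teams by last season’s position; other conventions are possible. The function name, team names and positions are all hypothetical.

```python
import math

def rmse(predicted, actual):
    """RMSE over the teams in the final table (dicts of team -> position)."""
    return math.sqrt(sum((predicted[t] - actual[t]) ** 2 for t in actual) / len(actual))

def naive_forecast(last_season, promoted):
    """'Same as last year' prediction for the coming season.

    last_season: dict of team -> last season's position, for teams that stayed up.
    promoted: promoted teams in the order they should fill the bottom places
              (an assumption of this sketch).
    """
    # Keep last season's order for the surviving teams, re-numbered 1..k
    survivors = sorted(last_season, key=last_season.get)
    forecast = {team: rank for rank, team in enumerate(survivors, start=1)}
    # Promoted teams take the remaining places at the bottom
    for rank, team in enumerate(promoted, start=len(survivors) + 1):
        forecast[team] = rank
    return forecast

# Made-up four-team league: "D" went down last season and "E" came up
last_season = {"A": 1, "B": 2, "C": 3}
this_season = {"B": 1, "A": 2, "C": 3, "E": 4}
forecast = naive_forecast(last_season, promoted=["E"])
print(forecast)                     # {'A': 1, 'B': 2, 'C': 3, 'E': 4}
print(rmse(forecast, this_season))  # sqrt(0.5), about 0.707
```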

But it’s dangerous to generalize from the results of a single season, and a closer look indicates the experts might be smarter than they look. Historical data suggests that last year was something of an exception, because the correlation between the final table and the previous year’s positions was rather higher than normal (0.88, compared with a 20-year average of 0.68). Correspondingly, last year’s RMSE of 2.79 for the naive prediction was considerably lower than its 20-year average of 4.53. Take the 2013/14 season as an example: the RMSE for the naive prediction was 4.75, and 22 of the 28 predictions Simon collected for that season did better than that.
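Why do a high correlation and a low RMSE go together? If the correlation is Spearman’s rank correlation between two complete orderings of the same n teams (an assumption: the post doesn’t say which correlation was used), there’s an exact link. Spearman’s rho = 1 - 6·Σd²/(n(n²-1)), where d is each team’s difference in position, and RMSE² = Σd²/n, so RMSE = √((1 - rho)(n² - 1)/6). A quick check in Python, with hypothetical function names:

```python
import math

def spearman(predicted, actual):
    """Spearman's rho for two rankings with no ties (permutations of 1..n)."""
    n = len(actual)
    d2 = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def rmse_from_spearman(rho, n_teams=20):
    """RMSE implied by a Spearman correlation between two full orderings of n teams."""
    return math.sqrt((1 - rho) * (n_teams ** 2 - 1) / 6)

print(round(rmse_from_spearman(0.88), 2))  # 2.82
print(round(rmse_from_spearman(0.68), 2))  # 4.61
```

A correlation of 0.88 implies an RMSE of about 2.82, close to the 2.79 quoted above (the correlation is rounded), while 0.68 implies about 4.61. That is near, but not equal to, the 4.53 average: an average of RMSEs over 20 seasons isn’t the same as the RMSE at the average correlation, so the link is only approximate there.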

Even this year, when the naive prediction seems to have got lucky, it failed to beat either the modeller crowd on its own (RMSE 2.68) or the combined media and modellers (RMSE 2.56). Maybe these guys aren’t so clueless after all.

So what do I think will happen this season? Like Yogi Berra, I don’t make predictions, especially about the future. But I’m prepared to stick my neck out this time and predict that once again, in the words of James Surowiecki, we’ll find that “the many are smarter than the few”.