So a while ago on Twitter, I saw this storify by @KateDaddie, talking about ethnic minority representation in the British media, in the context of this article by Joseph Harker in the British Journalism Review. As I am a notorious stats pedant and practically compulsive mansplainer, my initial reaction was to fire up the Pedantoscope and start nitpicking. On the face of it, it is not difficult to think up Devastating Critiques of the idea of counting “#AllWhiteFrontPages” as an indicator of more or less anything. But if I’ve learned one thing from a working life dealing with numbers (and from reading all those Nassim Taleb and Anthony Stafford Beer books), it’s that the central limit theorem will not be denied, and that simple, robust metrics with a broad-brush correlation to the thing you’re trying to measure are usually better management tools than fragile customised metrics which look like they might in principle be better. Anyway, Kate asked me to come up with a simple probability model to give an idea of what sort of frequency of #AllWhiteFrontPages might be considered odd, and this is the way I went about it.
Calculating a few simple descriptive statistics gives you an idea of what I’m on about. We’re talking about six national newspapers here – the Times, Telegraph, Guardian, Independent, Mail and Express. I’m not including the red-top tabloids or the “i” cut-down version of the Indy, not out of any general prejudice against them as news sources (though I have that too), but because they seem to have a slightly different design grid for front page photos, which I don’t want to deal with for reasons set out below.
To a reasonable approximation, the United Kingdom is a country that is 90% white. So if every newspaper were to take a completely random choice of British people to put on its front page, then you’d expect to see a “clean sweep” of #AllWhiteFrontPages – ie, a day with no non-white people in any main page 1 photo – with probability roughly 0.9 ^ 6 = 0.53, so about half the time. Sometimes the big photo is not of a person at all, of course – allowing 20% for these would say that each newspaper would have a 0.8 x 0.1 = 0.08 chance of a non-white subject, and the chance of a clean sweep rises to 0.92 ^ 6 = 0.6. So out of twenty weekdays per month, you would expect to see about 12 clean sweeps. If the average is 12, then the 95% confidence level – ie, the point at which you start saying with a reasonable degree of confidence that the selection is being made from a pool which is not racially representative of the UK – is going to be pretty high. Even a few months of nearly all clean sweeps is not going to refute it. Which starts looking a bit bleak for the #AllWhiteFrontPages count as an indicator.
But for all the papers discussed above, the main splash photo is only one of three or four pictures of people that you get on the front page – the rest of them are illustrating minor news stories or, more usually, feature articles. And of course, the more throws of the coin you take, the less likely you are to get all heads. Assuming an average of four pictures per front page, the chance of an “All White Grand Slam” would be 0.92 ^ 24 = 13% (roughly 1 in 7). Out of twenty weekdays, you would expect somewhat less than three Grand Slams per month, and if you were regularly seeing ten or twelve, you would pretty quickly smell a rat.
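For anyone who wants to check the arithmetic so far, here is a minimal Python sketch of the independent-draws baseline. The function and variable names are made up for illustration, and the 20% allowance for non-person photos is applied to every picture on the page, as the Grand Slam figure above implicitly does.

```python
def all_white_probability(p_nonwhite: float, n_draws: int) -> float:
    """Chance that n_draws independent picks contain no nonwhite subject."""
    return (1.0 - p_nonwhite) ** n_draws


PAPERS = 6               # Times, Telegraph, Guardian, Independent, Mail, Express
WEEKDAYS_PER_MONTH = 20  # the working month used in the text

# Every main photo is a person drawn at random from a 90% white population.
naive = all_white_probability(0.10, PAPERS)

# Allow 20% of main photos not to show a person at all: 0.8 x 0.1 = 0.08.
main_only = all_white_probability(0.8 * 0.10, PAPERS)

# Four pictures per front page across six papers: 24 independent draws.
grand_slam = all_white_probability(0.08, PAPERS * 4)

print(f"clean sweep, all photos are people: {naive:.2f}")
print(f"clean sweep, 20% non-person photos: {main_only:.2f}")
print(f"Grand Slam, 24 pictures:            {grand_slam:.3f}")
print(f"expected clean sweeps per month:    {WEEKDAYS_PER_MONTH * main_only:.1f}")
print(f"expected Grand Slams per month:     {WEEKDAYS_PER_MONTH * grand_slam:.1f}")
```

Everything here rests on the draws being independent, which is exactly the assumption that gets picked apart next.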
At this point, my fellow pedants will be pretty much bursting a blood vessel at the fact that I’m assuming that the front page picture choices are independent. Of course they aren’t – they’re driven by the same news! When Princess Kate has her baby, for example, the chance of #AllWhiteFrontPages is pretty close to 1 (maybe a little bit less than 1, as there’s a chance that the Guardian or Indy would put a global story on the front page just to make a point).
There are all sorts of ways you could choose to deal with this correlation, but to keep it at the level of finger exercises, I’d use a model where a random process called “the news” selects a picture at random and then each newspaper decides to use it (90% chance) or pick its own random draw (10% chance). Your chances of a clean sweep of front pages would then be:
In the case where “the news” picked a white picture, each paper has a 1% chance of picking a nonwhite (10% chance of not going with the flow, 10% chance of picking a nonwhite person if they don’t), so the chance of a clean sweep is (0.99^6)
In the case where “the news” picked a nonwhite picture, each paper has a 91% chance of going with a nonwhite front page, so the chance of a clean sweep all white is (0.09^6)
So allowing for correlation with this model, you would expect to see clean sweeps (0.9 x (0.99^6)) + (0.1 x (0.09^6)) of the time, which is about 85%. That would mean about 17 clean sweeps per twenty-weekday month, which is rather closer to what we actually do see, and would mean that entire months with all white front pages shouldn’t be that uncommon.
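The two cases above can be folded into one small function. This is only a sketch of the toy model as described: `p_follow` is my name for the chance that a paper goes with the picture “the news” picked, and the function name is hypothetical.

```python
def clean_sweep_probability(p_follow: float,
                            p_nonwhite: float = 0.10,
                            papers: int = 6) -> float:
    """Chance that every paper shows a white subject in one grid position.

    A shared process ("the news") picks a subject first; each paper then
    follows that pick with probability p_follow, or makes its own
    independent random draw otherwise.
    """
    p_own = 1.0 - p_follow

    # Shared pick is white: a paper only goes nonwhite by breaking away
    # and then drawing a nonwhite subject itself.
    p_nw_given_white = p_own * p_nonwhite

    # Shared pick is nonwhite: a paper goes nonwhite by following, or by
    # breaking away and drawing nonwhite anyway.
    p_nw_given_nonwhite = p_follow + p_own * p_nonwhite

    return ((1.0 - p_nonwhite) * (1.0 - p_nw_given_white) ** papers
            + p_nonwhite * (1.0 - p_nw_given_nonwhite) ** papers)


main_photo = clean_sweep_probability(p_follow=0.9)
print(f"clean sweep of main photos: {main_photo:.3f}")
print(f"expected per 20 weekdays:   {20 * main_photo:.1f}")
```

Setting `p_follow=0.0` recovers the independent case of 0.9 ^ 6, which is a handy sanity check on the formula.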
But then we also have the minor front page photos, and since these will be driven more by features than news, I wouldn’t expect them to be more than say 30% correlated. By a similar argument, each minor photo position on the grid would show a nonwhite person with probability 7% when the underlying process picked a white person and 37% chance otherwise, giving the probability of a clean sweep for that grid position of
(0.9 x (0.93^6)) + (0.1 x (0.63^6)) = about 0.59.
So, the probability of a “Grand Slam”, assuming a main picture and three minors, would be about
0.85 x (0.59 ^3) = about 17%
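As a cross-check on the closed-form figure, the same toy model can be simulated directly. This is a rough Monte Carlo sketch under the stated assumptions (one main photo at 90% correlation, three minors at 30%, six papers, a 90% white population); the function names, trial count and seed are arbitrary choices of mine.

```python
import random


def position_all_white(rng: random.Random, p_follow: float,
                       p_nonwhite: float = 0.10, papers: int = 6) -> bool:
    """Simulate one grid position; True if every paper shows a white subject."""
    shared_is_nonwhite = rng.random() < p_nonwhite
    for _ in range(papers):
        if rng.random() < p_follow:
            nonwhite = shared_is_nonwhite          # go with the shared pick
        else:
            nonwhite = rng.random() < p_nonwhite   # paper's own random draw
        if nonwhite:
            return False
    return True


def simulate_grand_slam_rate(trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated days on which all four grid positions are all white."""
    rng = random.Random(seed)
    follow_rates = [0.9, 0.3, 0.3, 0.3]  # main photo, then three minor photos
    hits = 0
    for _ in range(trials):
        if all(position_all_white(rng, f) for f in follow_rates):
            hits += 1
    return hits / trials


rate = simulate_grand_slam_rate()
print(f"simulated Grand Slam rate: {rate:.3f}")
```

The simulated rate should land close to the 17% worked out above, with a little sampling noise either way.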
That would make it something that would happen by chance roughly once a week, and again, I would be looking at 12-14 on a 30-day rolling count as the level at which I would be saying with a fairly high degree of confidence that the process by which front page news went onto the front pages was one which wasn’t representative of the racial makeup of the country.
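To get a rough feel for how unusual 12 Grand Slams in 30 days would be under this model, one option is a plain binomial tail. Note the extra assumption, which the model above does not make: each day is treated as an independent draw with a 17% chance, whereas real news stories run over several days. So this is an illustration of the order of magnitude, not a precise confidence level.

```python
from math import comb


def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


DAYS = 30            # length of the rolling window
P_GRAND_SLAM = 0.17  # daily Grand Slam chance from the toy model
THRESHOLD = 12       # lower end of the 12-14 range

expected = DAYS * P_GRAND_SLAM
tail = binomial_tail(DAYS, P_GRAND_SLAM, THRESHOLD)

print(f"expected Grand Slams in {DAYS} days: {expected:.1f}")
print(f"chance of {THRESHOLD} or more by luck alone: {tail:.4f}")
```

Day-to-day correlation would fatten that tail, which is one reason to prefer a generous threshold over a finely tuned one.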
And this matters, I think, for the reasons that Joseph Harker sets out in his essay. His example of the Bangladesh flood was really convincing to me – this was clearly a very important world news event, and nearly everything I ever learned about it, I learned from non-traditional news sources (mainly charity fund raisers via my friends). Something has gone wrong somewhere in the process by which the news is shaped.
Kate and her friends are apparently going to be keeping the rolling count of #AllWhiteFrontPages and recording it on their blog, and I’ll be following it with interest. People tend to roll their eyes a bit when business school grads like me start saying things about “management is measurement” and so on, but the fact is that a) if you don’t measure something, how are you going to find out whether it’s changed or not? and b) if you don’t want to find out whether something’s changing or not, in what sense can you actually claim to care about it? Obviously, the wider project of ethnic minority representation in newsrooms isn’t all about all white front pages, but the simple count is actually a pretty important metric in itself.
 The phrase “Devastating Critique” should be read here as if pronounced with heavy sarcasm. The phrase originates, I believe, with notorious “sound science” hack Steven Milloy, and is generally used as prelude to an absurd piece of nitpicking, usually based on caveats included in the paper itself.
 In fact, the assumption that the selection is from the UK population as a whole is both problematic and overly generous. One of the things we know about UK news media is that they have a strong London bias, and so the relevant population ought to be substantially more nonwhite. It does rather trouble me that a policy of increasing nonwhite representation is de facto a policy of further worsening non-London representation, but I think this is a different debate.
 I am not really interested in calculating precise distributions and 95% confidence intervals, because we are clearly oversimplifying the process here, and a precise parameterisation of an imprecise function is what is known in the literature as “a waste of time”. The rule of thumb that a deviation of three times the mean is not something that can be successfully yaddayaddaed is a good way to do robust statistics.