240 technical marks + 160 artistic merit marks
= 400 marks total.
It meant each rider had a total of 7 × 400 = 2,800 points to play for.
There is an element of subjectivity when assessing how good a horse’s performance has been, so it is hardly surprising that the judges don’t all give the same mark. For a particular technical movement, one judge might give it an ‘8’, while another spots a slightly dropped shoulder and reckons it’s a ‘7’. In fact, in Dujardin’s case, the judges’ total marks ranged from 355 to 370, and when added together she got a total of 2,522.5 points out of a maximum possible 2,800.
And this is where the percentage comes in, because her 2,522.5 total was then divided by 2,800 to give a score out of one hundred, a percentage:
2,522.5 ÷ 2,800 = 90.089%.4
Well, actually that’s not the exact number. It was really 90.089285714285714 … %.
Indeed, this number never stops: the pattern 285714 repeats for ever. This is what happens when you divide a number by a multiple of 7 (so long as the number you are dividing isn’t itself a multiple of 7). So Dujardin’s score had to be rounded, and the authorities responsible for the scoring system decided to round scores to three decimal places.
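If you want to see the repeating decimal and the effect of that rounding for yourself, here is a minimal sketch in Python, using nothing but the numbers quoted above:

```python
from decimal import Decimal, getcontext

getcontext().prec = 30            # enough digits to see the repeating pattern

score = Decimal("2522.5")         # Dujardin's total marks from the judges
maximum = Decimal("2800")         # 7 judges x 400 marks each

percentage = score / maximum * 100
print(percentage)                 # 90.089285714285714285714... (285714 repeating)

# Rounded to three decimal places, as the official scoring system required:
print(round(float(percentage), 3))   # 90.089
```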
What would have happened if Dujardin had been awarded half a mark less by the judges? She would have scored:
2,522 ÷ 2,800 = 90.071%.
In other words, the precision of her actual score of 90.089% was misleading. It wasn’t possible to score any mark between 90.071% and 90.089%: the smallest step her score could move by was not 0.001% but 0.018%. Quoting her score to two decimal places (i.e. 90.09%) was enough.
The second decimal place is needed to guarantee that two contestants with different marks don’t end up with the same percentage, but it still gives a misleading sense of the accuracy of the scoring. In reality, each judge ‘measures’ the same performance differently. A half-mark disagreement in the artistic score (which is then multiplied by 4, remember) shifts the overall mark by 0.072%. And the actual discrepancies between the judges were bigger than that. For ‘Harmony between horse and rider’ one judge marked her 8 out of 10 while another gave her 9.5 out of 10.
A NUMBER IS ONLY AS STRONG AS ITS WEAKEST LINK
There’s a time and a place for quoting numbers to several decimal places, but dressage, and other sports in which the marking is subjective, isn’t one of them.
By using this scoring system, the judges were leaving us to assume that we were witnessing scoring of a precision equivalent to measuring a bookshelf to the nearest millimetre. Yet the tool they were using to measure that metaphorical bookshelf was a ruler that measured in 10-centimetre intervals. And it was worse than that, because it’s almost as if the judges each had different rulers and, on another day, that very same performance might have scored anywhere between, say, 89% and 92%. It was a score with potential for a lot of variability – more of which in the next section.
All of this reveals an important principle when looking at statistical measurements of any type. In the same way that a chain is only as strong as its weakest link, a statistic is only as reliable as its most unreliable component. That dinosaur skeleton’s age of 69 million years and 22 days was made up of two components: one was accurate to the nearest million years, the other to the nearest day. Needless to say, the 22 days are irrelevant.
BODY TEMPERATURE A BIT LOW? BLAME IT ON SPURIOUS PRECISION
In 1871, a German physician by the name of Carl Reinhold Wunderlich published a ground-breaking report on his research into human body temperature. The main finding that he wanted to publicise was that the average person’s body temperature is 98.6 degrees Fahrenheit, though this figure will vary quite a bit from person to person.
The figure of 98.6 °F has become gospel,5 the benchmark body temperature that parents have used ever since when checking if an unwell child has some sort of fever.
Except it turns out that Wunderlich didn’t publish the figure 98.6 °F. He was working in Celsius, and the figure he published was 37 °C, a rounded number, which he qualified by saying that it can vary by up to half a degree, depending on the individual and on where the temperature is taken (armpit or, ahem, orifice).
The figure 98.6 came from the translation of Wunderlich’s report into English. At the time, Fahrenheit was the commonly used scale in Britain. To convert 37 °C to Fahrenheit, you multiply by 9, divide by 5 and add 32; i.e. 37 °C converts to 98.6 °F. So the English translation – which reached a far bigger audience than the German original – gave the figure 98.6 °F as the human norm. Technically this was right, but the decimal place created a misleading impression. If Wunderlich had quoted the temperature as 37.0 °C, it would have been reasonable to quote it as 98.6 °F, but Wunderlich deliberately didn’t quote his rough figure to a decimal place. For a figure that can vary by nearly a whole degree between healthy individuals, 98.6 °F was (and is) spurious precision. And in any case, a study in 2015, using modern, more accurate thermometers, found that we’ve been getting it wrong all these years: the average human body temperature is 98.2 °F, not 98.6 °F.
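To see just how rough that benchmark really is, here is a minimal Python sketch of the conversion (the little helper function is just for illustration), applied to the half-degree range Wunderlich actually described:

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit: multiply by 9, divide by 5, add 32."""
    return celsius * 9 / 5 + 32

print(round(c_to_f(37.0), 1))   # 98.6 -- the figure that became gospel
print(round(c_to_f(36.5), 1))   # 97.7 -- bottom of Wunderlich's half-degree range
print(round(c_to_f(37.5), 1))   # 99.5 -- top of that range
```

Anything from about 97.7 °F to 99.5 °F sits inside Wunderlich’s own range, which makes a benchmark quoted to one decimal place look far more precise than the measurement behind it ever was.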
VARIABILITY
In the General Election of June 2017, there was a shock result in London’s Kensington constituency. The sitting MP was a Conservative with a healthy majority, but in the small hours of the Friday, news came through that the result was too close to call, and there was going to be a recount. Hours later, it was announced that there needed to be a second recount. And then, when even that failed to resolve the result, the counting staff were given a few hours to get some sleep and returned for a third recount the following day.
Finally, the returning officer was able to confirm the result: Labour’s Emma Dent Coad had defeated Victoria Borwick of the Conservatives.
The margin, however, was tiny. Dent Coad won by just 20 votes, with 16,333 to Borwick’s 16,313.
You might expect that if there is one number of which we can be certain, down to the very last digit, it is the number we get when we have counted something.
Yet the truth is that even something as basic as counting the number of votes is prone to error. The person doing the counting might inadvertently pick up two voting slips that are stuck together. Or when they are getting tired, they might make a slip and count 28, 29, 40, 41 … Or they might reject a voting slip that another counter would have accepted, because they reckon that marks have been made against more than one candidate.
As a rule of thumb, some election officials reckon that manual counts can only be relied on within a margin of about 1 in 5,000 (or 0.02%), so with a vote like the one in Kensington, the result of one count might vary by as many as 10 votes when you do a recount.6
And while each recount will typically produce a slightly different result, there is no guarantee which of these counts is actually the correct figure – if there is a correct figure at all. (In the famously tight US Election of 2000, the result in Florida came down to a ruling on whether voting cards that hadn’t been fully punched through, and had a hanging ‘chad’, counted as legitimate votes or not.)
Re-counting typically stops when it becomes clear that the error in the count isn’t big enough to affect the result, so the tighter the result, the more recounts there will be. There have twice been UK General Election counts that went to seven recounts, both of them in the 1960s, and in both cases the final majority was below 10.
All this shows that when it is announced that a candidate such as Dent Coad has received 16,333 votes, it should really be expressed as something vaguer: ‘almost certainly in the range 16,328 to 16,338’ (or, in shorthand, 16,333 ± 5).
If we can’t even trust something as easy to nail down as the number of votes cast on physical slips of paper, what hope is there for accurately counting things that are more fluid?
In 2018, the two Carolina states in the USA were hit by Hurricane Florence, a massive storm that deposited as much as 50 inches of rain in some places. Among the chaos, a vast number of homes lost power for several days. On 18 September, CNN gave this update:
511,000—this was the number of customers without power Monday morning—according to the US Energy Information Administration. Of those, 486,000 were in North Carolina, 15,000 in South Carolina and 15,000 in Virginia. By late Monday, however, the number [of customers without power] in North Carolina had dropped to 342,884.
For most of that short report, numbers were being quoted in thousands. But suddenly, at the end, we were told that the number without power had dropped to 342,884. Even if that number were true, it could only have been true for a period of a few seconds when the figures were collated, because the number of customers without power was changing constantly.
And even the 486,000 figure that was quoted for North Carolina on the Monday morning was a little suspicious – here we had a number being quoted to three significant figures, while the two other states were being quoted as 15,000 – both of which looked suspiciously like they’d been rounded to the nearest 5,000. This is confirmed if you add up the numbers: 15,000 + 15,000 + 486,000 = 516,000, which is 5,000 higher than the total of 511,000 quoted at the start of the story.
So when quoting these figures, there is a choice. They should either be given as a range (‘somewhere between 300,000 and 350,000’) or be brutally rounded to a single significant figure with the qualifying word ‘roughly’ attached (so, ‘roughly 500,000’). This makes it clear that these are not definitive numbers that could be reproduced if there were a recount.
And, indeed, there are times when even saying ‘roughly’ isn’t enough.
Every month, the Office for National Statistics publishes the latest UK unemployment figures. Of course this is always newsworthy – a move up or down in unemployment is a good indicator of how the economy is doing, and everyone can relate to it. In September 2018, the Office announced that UK unemployment had fallen by 55,000 from the previous month to 1,360,000.
The problem, however, is that the figures published aren’t very reliable – and the ONS know this. When they announced those unemployment figures in 2018, they also added the detail that they had 95% confidence that this figure was correct to within 69,000. In other words, unemployment had fallen by 55,000 plus or minus 69,000. This means unemployment might actually have gone down by as many as 124,000, or it might have gone up by as many as 14,000. And, of course, if the latter turned out to be the correct figure, it would have been a completely different news story.
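The arithmetic behind that ‘completely different news story’ takes only a couple of lines to check, as in this Python sketch:

```python
change = -55_000   # headline change in unemployment: a fall of 55,000
margin = 69_000    # the ONS's 95% confidence margin, plus or minus

print(change - margin, change + margin)   # -124000 14000
# i.e. anything from a fall of 124,000 to a rise of 14,000 is consistent with the data
```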
When the margin of error is larger than the figure you are quoting, there’s barely any justification for quoting the statistic at all, let alone to more than one significant figure. The best they can honestly say is: ‘Unemployment probably fell slightly last month, perhaps by about 50,000.’
It’s another example of how a rounded, less precise figure often gives a fairer impression of the true situation than a precise figure would.
SENSITIVITY
We’ve already seen that statistics should really carry an indication of the margin of error we should attach to them.
An understanding of the margins of error is even more important when it comes to making predictions and forecasts.
Many of the numbers quoted in the news are predictions: house prices next year, tomorrow’s rainfall, the Chancellor’s forecast of economic growth, the number of people who will be travelling by train … all of these are numbers that come from somebody feeding figures into a spreadsheet (or something more sophisticated) to represent the situation mathematically, in what is usually known as a mathematical model of the future.
In any model like this, there will be ‘inputs’ (such as prices, number of customers) and ‘outputs’ that are the things you want to predict (profits, for example).
But sometimes a small change in one input variable can have a surprisingly large effect on the number that comes out at the far end.
The link between the price of something and the profit it makes is a good example of this.
Imagine that last year you ran a face-painting stall for three hours at a fair. You paid £50 for the hire of the stand, but the cost of materials was almost zero. You charged £5 to paint a face, and you can paint a face in 15 minutes, so you did 12 faces in your three hours, and made:
£60 income – £50 costs = £10 profit.
There was a long queue last year and you were unable to meet the demand, so this year you increase your charge from £5 to £6. That’s an increase of 20%. Your revenue this year is £6 × 12 = £72, and your profit climbs to:
£72 income – £50 costs = £22 profit.
So, a 20% increase in price means that your profit has more than doubled. In other words, your profit is extremely sensitive to the price. Small percentage increases in the price lead to much larger percentage increases in the profit.
It’s a simplistic example, but it shows that increasing one thing by 10% doesn’t mean that everything else increases by 10% as a result.7
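Here is the face-painting example as a minimal Python sketch (the profit function and its names are just for illustration), which makes the sensitivity easy to see and to experiment with:

```python
def profit(price_per_face, faces_painted=12, stand_hire=50):
    """Profit from the face-painting stall: revenue minus the fixed hire cost (in £)."""
    return price_per_face * faces_painted - stand_hire

last_year = profit(5)    # 12 faces at £5, minus £50 hire
this_year = profit(6)    # the same 12 faces at £6

print(last_year, this_year)                        # 10 22
print((6 - 5) / 5 * 100)                           # 20.0  -> the price rose by 20%
print((this_year - last_year) / last_year * 100)   # 120.0 -> the profit rose by 120%
```

Because the £50 hire fee is fixed, every extra pound of revenue goes straight into profit, which is why a modest percentage rise in price produces such a large percentage jump in profit.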
EXPONENTIAL GROWTH
There are some situations when a small change in the value assigned to one of the ‘inputs’ has an effect that grows dramatically as time elapses.
Take chickenpox, for example. It’s an unpleasant disease, but rarely a dangerous one so long as you get it when you are young. Most children catch chickenpox at some point unless they have been vaccinated against it, because it is highly infectious. A child infected with chickenpox might typically pass it on to 10 other children during the contagious phase, and those newly infected children might themselves infect 10 more children each, meaning there are now 100 new cases. If those hundred infected children pass it on to 10 children each, then within weeks the original case has led to 1,000 further infections.
In their early stages, infections spread ‘exponentially’. There is some sophisticated maths that is used to model this, but to illustrate the point let’s pretend that in its early stages, chickenpox just spreads in discrete batches of 10 infections passed on at the end of each week. In other words:
N = 10^T,
where N is the number of people infected and T is the number of infection periods (weeks) so far.
After one week: N = 10^1 = 10.
After two weeks: N = 10^2 = 100.
After three weeks: N = 10^3 = 1,000,
and so on.
What if we increase the rate of infection by 20%, so that each child now infects 12 others instead of 10 and the formula becomes N = 12^T? (Such an increase might happen if children are in bigger classes in school or have more playdates, for example.)
After one week, the number of children infected is 12 rather than 10: just a 20% increase. However, after three weeks, N = 12^3 = 1,728, which is heading towards double the 1,000 we had when each child infected 10. And this margin continues to grow as time goes on.
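A minimal Python sketch of the same weekly-batch model shows how quickly that 20% difference compounds:

```python
# Weekly-batch model from above: N = r**T infections after T weeks,
# where r is the number of children each infected child passes it on to.
for weeks in range(1, 6):
    n_10 = 10 ** weeks          # each child infects 10 others
    n_12 = 12 ** weeks          # each child infects 12 others (20% more)
    print(weeks, n_10, n_12, round(n_12 / n_10, 2))

# week 1:      10        12   ratio 1.2
# week 2:     100       144   ratio 1.44
# week 3:   1,000     1,728   ratio 1.73
# week 4:  10,000    20,736   ratio 2.07
# week 5: 100,000   248,832   ratio 2.49
```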
CLIMATE CHANGE AND COMPLEXITY
Sometimes the relationship between the numbers you feed into a model and the forecasts that come out are not so direct. There are many situations where the factors involved are inter-connected and extremely complex.
Climate change is perhaps the most important of these. Across the world, there are scientists attempting to model the impact that rising temperatures will have on sea levels, climate, harvests and animal populations. There is an overwhelming consensus that (unless human behaviour changes) global temperatures will rise, but the mathematical models produce a wide range of possible outcomes depending on how you set the assumptions. Despite overall warming, winters in some countries might become colder. Harvests may increase or decrease. The overall impact could be relatively benign or catastrophic. We can guess, we can use our judgement, but we can’t be certain.
In 1952, the science-fiction author Ray Bradbury wrote a short story called ‘A Sound of Thunder’, in which a time-traveller transported back to the time of the dinosaurs accidentally kills a tiny butterfly, and this apparently innocuous incident has knock-on effects that turn out to have changed the modern world they return to. A couple of decades later, the mathematician and meteorologist Edward Lorenz is thought to have been referencing this story when he coined the phrase ‘the butterfly effect’ as a way to describe the unpredictable and potentially massive impact that small changes in the starting situation can have on what follows.
These butterfly effects are everywhere, and they make confident long-term predictions of any kind of climate change (including political and economic climate) extremely difficult.
MAD COWS AND MAD FORECASTS
In 1995, Stephen Churchill, a 19-year-old from Wiltshire, became the first person to die from Variant Creutzfeldt–Jakob disease (or vCJD). This horrific illness, a rapidly progressing degeneration of the brain, was related to BSE, more commonly known as ‘Mad Cow Disease’, and caused by eating contaminated beef.
As more victims of vCJD emerged over the following months, health scientists began to make forecasts about how big this epidemic would become. At a minimum, they reckoned there would be at least 100 victims. But, at worst, they predicted as many as 500,000 might die – a number of truly nightmare proportions.8
Nearly 25 years on, we are now able to see how the forecasters did. The good news is that their prediction was right – the number of victims was indeed between 100 and 500,000. But this is hardly surprising, given how far apart the goalposts were.
The actual number believed to have died from vCJD is about 250, towards the very bottom end of the forecasts, and about 2,000 times smaller than the upper bound of the prediction.
But why was the predicted range so massive? The reason is that, when the disease was first identified, scientists could make a reasonable guess as to how many people might have eaten contaminated burgers, but they had no idea what proportion of the public was vulnerable to the damaged proteins (known as prions). Nor did they know how long the incubation period was. The worst-case scenario was that the disease would ultimately affect everyone exposed to it – and that we hadn’t seen the full effect because it might be 10 years before the first symptoms appeared. The reality turned out to be that most people were resistant, even if they were carrying the damaged prion.
It’s an interesting case study in how statistical forecasts are only as good as their weakest input. You might know certain details precisely (such as the number of cows diagnosed with BSE), but if the proportion of exposed people who would go on to develop the disease could be anywhere between 0.01% and 100%, your predictions can be out by that same factor of 10,000.
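That ‘weakest input’ point can be sketched in a couple of lines of Python. The exposed-population figure below is purely illustrative (it is not taken from the actual vCJD studies); the point is simply that an input spanning a factor of 10,000 drags the output with it:

```python
exposed = 1_000_000     # HYPOTHETICAL number of people exposed -- for illustration only

# Susceptibility -- the proportion of exposed people who develop the disease --
# was the weakest input: it could plausibly have been anywhere from 0.01% to 100%.
low_victims = exposed // 10_000     # 0.01% susceptible
high_victims = exposed              # 100% susceptible

print(low_victims, high_victims)    # 100 1000000
print(high_victims // low_victims)  # 10000 -- no forecast could be tighter than this spread
```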
At least nobody (that I’m aware of) attempted to predict a number of victims to more than one significant figure. Even a prediction of ‘370,000’ would have implied a degree of accuracy that was wholly unjustified by the data.
DOES THIS NUMBER MAKE SENSE?
One of the most important skills that back-of-envelope maths can give you is the ability to answer the question: ‘Does this number make sense?’ In this case, the back of the envelope and the calculator can operate in harmony: the calculator does the donkey work in producing a numerical answer, and the back of the envelope is used to check that the number makes logical sense, and wasn’t the result of, say, a slip of the finger and pressing the wrong button.
We are inundated with numbers all the time; in particular, financial calculations, offers, and statistics that are being used to influence our opinions or decisions. The assumption is that we will take these figures at face value, and to a large extent we have to. A politician arguing the case for closing a hospital isn’t going to pause while a journalist works through the numbers, though I would be pleased if more journalists were prepared to do this.
Often it is only after the event that the spurious nature of a statistic emerges.
In 2010, the Conservative Party were in opposition, and wanted to highlight social inequalities that had been created by the policies of the Labour government then in power. In a report called ‘Labour’s Two Nations’, they claimed that in Britain’s most deprived areas ‘54% of girls are likely to fall pregnant before the age of 18’. Perhaps this figure was allowed to slip through because the Conservative policy makers wanted it to be true: if half of the girls on these housing estates really were getting pregnant before leaving school, it painted what they felt was a shocking picture of social breakdown in inner-city Britain.
The truth turned out to be far less dramatic. Somebody had stuck the decimal point in the wrong place. Elsewhere in the report, the correct statistic was quoted, that 54.32 out of every 1,000 women aged 15 to 17 in the 10 most deprived areas had fallen pregnant. Fifty-four out of 1,000 is 5.4%, not 54%. Perhaps it was the spurious precision of the ‘54.32’ figure that had confused the report writers.
Other questionable numbers require a little more thought. The National Survey of Sexual Attitudes and Lifestyles has been published roughly every 10 years since 1990. It gives an overview of sexual behaviour across Britain.
One statistic that often draws attention when the report is published is the number of sexual partners that the average man and woman has had in their lifetime.
The figures in the first three reports were as follows:
Average (mean) number of opposite-sex partners in lifetime (ages 16–44)

              Men     Women
1990–91       8.6     3.7
1999–2001     12.6    6.5
2010–2012     11.7    7.7

The figures appear quite revealing, with a surge in the number of partners during the 1990s, while the early 2000s saw a slight decline for men and an increase for women.
But there is something odd about these numbers. Every new opposite-sex partnership adds one partner to the men’s overall ‘tally’ and one to the women’s. Some people will be far more promiscuous than others, but across the whole population it is an incontrovertible fact of life that the total number of male partners that women have had must equal the total number of female partners that men have had. And since there are roughly equal numbers of men and women in the population, the two averages ought to be the same.
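This bookkeeping argument is easy to check with a toy simulation – a sketch, not the survey’s actual methodology. Pair people off at random in a closed population with equal numbers of men and women, and the two totals (and hence the two averages) come out identical every time:

```python
import random

random.seed(1)
n_men = n_women = 1_000      # a closed population with equal numbers of each sex

men = [0] * n_men            # lifetime partner count for each man
women = [0] * n_women        # lifetime partner count for each woman

# Form 5,000 opposite-sex partnerships at random: each one adds exactly one
# partner to some man's tally and one to some woman's tally.
for _ in range(5_000):
    men[random.randrange(n_men)] += 1
    women[random.randrange(n_women)] += 1

print(sum(men), sum(women))                      # 5000 5000 -- the totals always match
print(sum(men) / n_men, sum(women) / n_women)    # 5.0 5.0   -- and so do the means
```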
There are ways you can attempt to explain the difference. For example, perhaps the survey is not truly representative – maybe there is a large group of men who have sex with a small group of women who are not covered by the survey.
However, there is a more likely explanation, which is that somebody is lying. The researchers are relying on individuals’ honesty – and memory – to get these statistics, with no way of checking if the numbers are right.