“How to Lie with Statistics” (1954) is a wonderful book on how statistics and numbers, while they give the impression of “objectivity”, can easily be misread, misinterpreted, or flat out “massaged” to manipulate and sway people’s opinions.
- Watch out for missing data: curing a cold in 3 days with drug X is no good if you would have healed quicker without it
- Always ask yourself what biases could go into a survey/statistics/sample
- Ask yourself if the individual presenting the statistics might have ulterior motives
About the Author: Darrell Huff was an American writer and freelance journalist, penning both magazine articles and several books. He is most famous for “How to Lie With Statistics”, and it seems like he knew a thing or two about lying with statistics.
Huff was allegedly paid by the tobacco industry to ridicule the evidence linking smoking to health problems. Basically, Huff was using his own techniques to lie and manipulate. Alex Reinhart explains that Huff was also commissioned to write a follow-up book defending the tobacco industry and discrediting the growing body of evidence on the health effects of smoking.
So let’s learn how to lie with statistics so that we will never fall for it.
The ideal sample is large, measures only one trait while controlling for the others, and removes all sorts of biases.
Such a sample is hard to obtain, and most “statistics” instead have all sorts of biases.
How to Lie with Statistics lists a few of them.
The following biases are some of the most common and highly disruptive biases when measuring a sample:
#1. Asking People
People tend to answer with what makes them look good, with what they wish they did, or with what they would like to do.
They will not always answer with what they actually do.
Even when you meet a fella who wants to tell you the truth and nothing but the truth, will he be able to remember exactly what you’re asking for?
Thus if you ask people what they actually like, they will rarely come up with the truth.
#2. Self-Selection Bias
Many studies are flawed by design.
For example, if I send out a questionnaire, most of the replies will come from people who have more time to reply.
If I ring people’s doorbells to ask a question, I will mostly reach people who are at home more often: for example, unemployed people and people who don’t travel much.
Even if I stop people in the street I will end up selecting friendlier or more agreeable personalities.
#3. Pleasing the Interviewer
Whenever I ask a question about a touchy subject, there might be a built-in bias to please the interviewer.
If I ask people what they think of, say, racism, few will be open and frank about it.
#4. Small Sample Biases
It’s easy to come up with true statistics that mean very little, or that are deceitfully misleading.
For example, if you wanted to prove that a medication is very effective in curing a cold, you could assemble many small samples. Then you can measure how long it takes to get better in each.
Since the samples are small, normal variation will likely give you a group that, by chance, healed quicker.
And you can publish only the result of the group that fared best.
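The trick can be sketched with a small simulation. This is only an illustration under made-up assumptions: every group is drawn from the same distribution (a cold lasting about 7 days, give or take 2), so the "drug" does nothing at all. The function name `simulate_trials` and all the numbers are hypothetical.

```python
import random
import statistics

def simulate_trials(n_groups=20, group_size=5, mean_days=7.0, sd_days=2.0, seed=0):
    """Return the mean recovery time (in days) of each small group.

    Every group is drawn from the SAME distribution: the drug has no effect.
    """
    rng = random.Random(seed)
    group_means = []
    for _ in range(n_groups):
        # Recovery times cannot go below one day in this toy model.
        days = [max(1.0, rng.gauss(mean_days, sd_days)) for _ in range(group_size)]
        group_means.append(statistics.mean(days))
    return group_means

means = simulate_trials()
overall = statistics.mean(means)   # what an honest report would show
best = min(means)                  # what a cherry-picker would publish

print(f"average across all groups: {overall:.1f} days")
print(f"best single group:         {best:.1f} days")
```

With groups this small, the luckiest group will usually sit well below the overall average, even though nothing but chance separates them.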
#5. Intervening Factors Biases
Another issue when comparing statistics across countries or times and, in general, in real life, is that of intervening factors that are difficult to account for.
For example, one study once linked milk to cancer because countries that consumed more milk had a higher incidence of cancer than countries with low milk consumption.
Darrell Huff explains that countries with more milk consumption were also richer, and their people lived longer. Since cancer is more likely to occur in old age, longer lives can explain the difference in cancer rates all by themselves.
Milk notwithstanding.
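A toy calculation shows how age alone can produce the gap. All the numbers here are hypothetical and chosen only for illustration: both countries get exactly the same cancer rate for young people and for old people, and milk does not appear anywhere in the model.

```python
# Hypothetical age-specific cancer rates, identical in both countries.
RATE_YOUNG = 0.001
RATE_OLD = 0.010

def crude_cancer_rate(share_old):
    """Overall cancer rate for a population with the given share of old people."""
    share_young = 1.0 - share_old
    return share_young * RATE_YOUNG + share_old * RATE_OLD

# Assumed age structures: the richer country simply has more old people.
rich_country = crude_cancer_rate(share_old=0.30)  # long-lived, drinks a lot of milk
poor_country = crude_cancer_rate(share_old=0.05)  # shorter-lived, drinks little milk

print(f"rich country: {rich_country:.5f}")
print(f"poor country: {poor_country:.5f}")
```

The richer country ends up with an overall rate more than twice as high, purely because of who is in the population.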
Changing of Meaning
Sometimes bureaus change their definitions, or a word comes to acquire a totally different meaning.
So for example when you calculate the number of “farms”, you should adjust for the changes in the census, because what is considered today a “farm” is not the same as it was fifty years ago.
Manipulating the Charts
There are many ways to “massage” your charts:
- Raise the starting point
To show bigger improvements you can make a chart start not at zero, but at or near the lowest number in your data set.
For example, if you want to show an improvement from 20 to 22 and then 23.5 and you start your chart from zero, the improvement will look small.
But if you make the chart start at 20, it will look like a huge change.
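To put a number on the effect, here is a rough sketch that measures how much of the chart's height the change occupies. The helper `visual_rise` and the axis maximum of 25 are my own assumptions; only the data points come from the example above.

```python
def visual_rise(values, axis_min, axis_max):
    """Fraction of the chart's height covered by the move from first to last value."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (values[-1] - values[0]) / (axis_max - axis_min)

values = [20, 22, 23.5]

honest = visual_rise(values, axis_min=0, axis_max=25)      # axis starts at zero
truncated = visual_rise(values, axis_min=20, axis_max=25)  # axis starts at 20

print(f"axis from 0:  the rise fills {honest:.0%} of the chart")
print(f"axis from 20: the rise fills {truncated:.0%} of the chart")
```

The same data fills 14% of the chart in one case and 70% in the other, while the actual increase from 20 to 23.5 is 17.5%.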
- Leverage Volume
If you want to show the increase of something -say, salaries- a fair way would be to use bar charts that only change in height.
If you want to make the increase look bigger, though, you can use pictures instead. Say, for example, you choose a “bag of money” icon to show the salary increase.
When you double the height of the bag of money to show a doubled salary, you don’t just change the height: you also double the width to keep the proportions. The result is that your bag of money will cover a total area that is 4 times bigger, not twice as big.
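The geometry is simple enough to check. This is a minimal sketch, assuming the icon is scaled uniformly; `scale_factors` is a hypothetical helper name.

```python
def scale_factors(height_ratio):
    """How much a uniformly scaled icon grows in each sense."""
    return {
        "height": height_ratio,               # what the data actually says
        "area": height_ratio ** 2,            # what the eye sees on paper
        "implied_volume": height_ratio ** 3,  # what a 3D-looking bag suggests
    }

factors = scale_factors(2)
print(factors)
```

For a doubled salary this gives 2 for height, 4 for area, and 8 for the volume the reader may imagine if the bag is drawn as a solid object.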
- Don’t Label the Axis
Another trick is to not put measures and labels on a chart.
Let the reader make his own assumptions: if the chart looks impressive, he will make favorable ones.
Proving Without Proving
This one is almost pure marketing.
Say that you cannot prove that your new drug helps cure a cold. But you can prove that it kills thousands of germs. Maybe even more than your previous drugs. Maybe 30% more?
It might not do a thing to help people overcome a cold, but it certainly sounds good.
Or advertisers could say that physicians smoke brand X of cigarettes more than any other cigarettes.
But does that mean anything?
Physicians might not have much insider knowledge of “healthier cigarettes”. As a matter of fact, there is no such thing as healthy tobacco.
When people talk about an increase of 10%, you should always wonder: increase compared to what?
The previous month, the previous week?
And was it an increase across the board or did they pick a figure that increased the most?
Also, people who want to give different impressions can easily switch from percentages to whole numbers depending on which one sounds better.
The percentage drop is even easier to massage.
For example, describing the move from 13% to 10% as “a drop of 3 percentage points” sounds much better than “a decrease of 23%”.
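The two descriptions come from two different calculations on the same pair of numbers. A small sketch (the function name `describe_change` is just for illustration):

```python
def describe_change(old_pct, new_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = new_pct - old_pct
    relative = (new_pct - old_pct) / old_pct * 100
    return points, relative

points, relative = describe_change(13, 10)
print(f"{points:+d} percentage points")
print(f"{relative:+.0f}% relative change")
```

Both figures are true. Which one gets quoted depends on whether the speaker wants the change to look small or large.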
Cherry-Picking the Best Figure
Similarly, you can express the same fact in many different ways, depending on what your goals are and what sounds more impressive (or less scary).
- 2% return on sales
- 12% return on investment
- 5 million profit
- 35% increase in profits
Correlation Not Causation
And here is the big one, one that I have been criticizing in so many of my book reviews.
Many publications use correlation to imply causation.
Take for example the famous case of “studying leads to higher salary”. While that makes a lot of sense, it’s not so straightforward.
Kids who go to college are disproportionately bright and rich.
The bright and rich would likely have earned more than people without a degree even if they had skipped college.
And it’s not true that “the more you study, the more you earn”.
The most dishonest individuals can also look for a correlation and then plug a cause that works well for them.
Politicians do it often. For example, Donald Trump took the crime statistics of a single US city where crime had increased and blamed it on Democrats.
There are three different types of averages:
- Mean: add all values and divide by the number of values
- Mode: most common value in a data sample
- Median: value in the middle of the sorted sample
In a normal distribution, as is often the case for human traits, the three will be similar to each other.
But in data sets with skewed distributions (for example, income and salaries) they can be very different.
And a biased statistician can pick the one that best suits his needs.
Or simply use the more familiar mean when the mean hides a very different reality.
For example, a CEO could say that at company X the salaries went up 40% in the last 10 years. But he wouldn’t say that they went up 1% for doormen and security, 5% for low-level employees and 30% for the C-suite.
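To see how far the three averages can drift apart, here is a quick sketch on a made-up payroll. The salary figures are hypothetical; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical company: nine ordinary salaries and one executive.
salaries = [30_000] * 5 + [35_000] * 3 + [50_000] + [400_000]

mean_salary = statistics.mean(salaries)      # pulled up by the executive
median_salary = statistics.median(salaries)  # the middle of the sorted list
mode_salary = statistics.mode(salaries)      # the most common salary

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"mode:   {mode_salary:,.0f}")
```

Here the mean is 70,500, the median 32,500, and the mode 30,000. All three can honestly be called "the average salary", and nine out of ten employees earn less than the mean.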
How to Critically Approach Statistics
Darrell Huff proposes five questions we can ask ourselves to check that statistics are not misleading us:
- Who Says So?
The first thing you should analyze is the source.
Is the source inherently biased? Is it an advertisement, does a researcher want to defend his theory, does a newspaper want to sell more with a sensationalist story?
- How Does He Know?
How did he conduct the research?
Is the sample big enough? Did he send out a questionnaire, or did he stop people in the street? What could be wrong with that?
- What’s Missing?
What other influences have not been taken into account?
Was there a large control group and does the author mention it? If the author is using percentages, does he also give the raw numbers for a more complete picture?
- Did Somebody Change the Subject?
Changing the subject happens when we move one result to a similar, but actually different environment.
For example, a politician critical of how expensive prisons were for the public said it would cost less to house inmates in a hotel than in Alcatraz.
However, he changed the subject from “total daily cost of an inmate” to “daily charges of a hotel”. Of course, the latter does not include surveillance, food, healthcare, rehabilitation, etc.
- Does it Make Sense?
Sometimes statistics can make no sense.
And the fact that we’re dealing with numbers should not lead to a suspension of common sense.
One example from Darrell Huff is the following: “Since life expectancy is only 63 years, it’s a sham and a fraud to set the retirement age at 65, because virtually everybody dies before that.”
Even in the 1939-1941 period that wasn’t true, and just by looking around one could have realized it was wrong.
Sometimes old truths are taken for granted even when they’re not true anymore.
A major appliance company right after WWII was planning on making smaller appliances to accommodate the declining birth rate. Then one of the guys looked around and realized that he and his friends were not planning to have fewer children.
Just looking around would have revealed that wasn’t true, but everyone was taking old statistics as gospel.
The last example is that of “plotting the results ad infinitum”. We see a trend, and some poor statisticians will extend it into the future. But trends rarely continue forever.
I see examples of this all the time, and even intelligent authors don’t realize it makes no sense.
For example, Goleman in “Working With Emotional Intelligence” said that each increase in EQ means an increase of x amount of income. But that makes no sense, because EQ follows a bell curve, and you can’t keep adding points forever. Furthermore, adding one point when you’re around the middle is very significant. But adding one when you’re already more emotionally intelligent than anyone else will mean little, because you’re already at the top.
Proper treatment cures a cold in 7 days. But left to itself it will hang on for a whole week.
Always Critically Read Data
Don’t be ready to believe the numbers “because they’re numbers”. Anything can be bent to fit propaganda or a biased individual.
- Don’t Stop at The Criticism, Even Flawed Statistics Can Be Valuable
Some people start to learn how to apply critical thinking skills but stop at the first level of critical thinking.
That is, they are able to pinpoint the limitations of this or that piece of evidence.
But they don’t realize that even imperfect statistics can add some value compared to not having any statistics at all.
Even a flawed statistic might be a strong warning signal, or it may capture an important trend.
For example, the author criticizes the statistic that women who attend college marry less than the national average while men who attend college marry more. Instead of dismissing that result, one should ask if that makes sense and why it’s happening.
- Great Education Against Cheats and Power Movers
The more people read books like this, the fewer the people that are easy to dupe and swindle.
And that will make for a better world to live in.
- Short and To The Point
No fluff, no repetition: I love it.
How to Lie with Statistics is a five-star, must-read book for those who aren’t aware of the pitfalls and “sneaky tricks” that hide behind statistics (i.e., most people).
I believe that books such as “How to Lie with Statistics” will make for a better world because critical thinking skills are so crucial to an empowered, civil society.
The more people read it, the more we will be able to think critically, assess claims better, buy better, and even vote better.
It’s also good for a psychological understanding in a way. With “How to Lie with Statistics” you will learn how our mind can unwittingly make wrong connections.
And, of course, it’s great for power moves. Because you will learn how unscrupulous researchers, companies, and journalists can willfully manipulate data to sway people.