How to Lie with Statistics is a wonderful book on how statistics and numbers, while they give the impression of “objectivity”, can easily be misread, misinterpreted or flat out “massaged” to sway people’s opinions.
- Watch out for missing data: “cures a cold in 3 days” with drug X means nothing if you would have healed quicker without it
- Always ask yourself what biases could go into a survey/statistics/sample
- Ask yourself if the individual presenting the statistics might have ulterior motives
The ideal sample is large, measures only one trait while controlling for the others, and is free of bias.
Such a sample is hard to obtain, and most “statistics” instead have all sorts of biases.
How to Lie with Statistics lists a few of them.
The following are some of the most common and most disruptive biases when measuring a sample:
#1. Asking People
People tend to answer with what makes them look good, or with what they wish they did.
They will not always answer with what they actually do.
Even when you meet a fella who wants to tell you the truth and nothing but the truth, will he be able to remember exactly what you’re asking for?
Thus if you ask people what they actually like, they will rarely come up with the truth.
#2. Self-Selection Bias
Many studies are flawed by design.
For example, if I send out a questionnaire, most of the people replying will be those who have more time to reply.
If I ring doorbells to ask questions, I will mostly reach people who are at home more often -unemployed, non-travelers?-.
Even if I stop people in the street, I will end up selecting friendlier or more agreeable personalities.
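A small simulation can make this concrete. The sketch below assumes a hypothetical population in which everyone has between 0 and 10 free hours a day, and the chance of returning a questionnaire grows in proportion to free time. Both the numbers and the response rule are illustrative assumptions, not data from the book.

```python
import random

def simulate_survey(population_size=10_000, seed=0):
    """Compare the true average with the average among people who reply."""
    rng = random.Random(seed)
    # Hypothetical population: daily free hours, uniform between 0 and 10.
    free_hours = [rng.uniform(0, 10) for _ in range(population_size)]
    # Assumed response rule: probability of replying is free_hours / 10.
    respondents = [h for h in free_hours if rng.random() < h / 10]
    true_mean = sum(free_hours) / len(free_hours)
    survey_mean = sum(respondents) / len(respondents)
    return true_mean, survey_mean

true_mean, survey_mean = simulate_survey()
print(f"true average free time:    {true_mean:.2f} h")
print(f"average among respondents: {survey_mean:.2f} h")
```

Nobody lied in this survey, yet the respondents' average overstates the population's, simply because of who chose to answer.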
#3. Pleasing the Interviewer
Whenever I ask a question about a touchy subject, there may be a built-in bias: the respondent wants to please the interviewer.
If I ask people what they think of, say, racism, few will be open and frank about it.
#4. Small Sample Biases
It’s easy to come up with a statistic that is true but means very little, or that is outright misleading.
For example, if you wanted to prove that a medication is very effective at curing a cold, you could assemble many small samples and measure how long it takes to get better in each.
Since the samples are small, normal variation will likely give you a group that, by chance, healed quicker.
And you can publish the result of the group that fared best.
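Here is a minimal sketch of that trick. It assumes a drug with no effect at all and recovery times that vary randomly around 7 days (standard deviation of 2 days); the function name, the number of trials and the group size are all made up for illustration.

```python
import random

def best_of_many_small_trials(n_trials=20, group_size=5, seed=1):
    """Run many small trials of a useless drug; return the overall mean and the best trial."""
    rng = random.Random(seed)
    trial_means = []
    for _ in range(n_trials):
        # Assumed recovery time: about 7 days, standard deviation 2, floor of 1 day.
        days = [max(1.0, rng.gauss(7, 2)) for _ in range(group_size)]
        trial_means.append(sum(days) / group_size)
    overall = sum(trial_means) / n_trials
    return overall, min(trial_means)

overall, best = best_of_many_small_trials()
print(f"average across all trials: {overall:.1f} days")
print(f"the one trial you publish: {best:.1f} days")
```

The published trial is a real measurement, but it was selected after the fact, so it reflects luck rather than the drug.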
#5. Intervening Factors Biases
Another issue when comparing statistics across countries or times and, in general, in real life, is that of intervening factors that are difficult to account for.
For example, one study once linked milk to cancer because countries that consumed more milk had a higher incidence of cancer than countries with low milk consumption.
Darrell Huff explains that the countries with higher milk consumption were also richer and their people lived longer. Since cancer is more likely to strike in old age, longer lives could explain the whole difference in cancer rates.
No milk required.
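The same mechanism can be reproduced with invented numbers. In the sketch below, 200 hypothetical countries get a random “wealth” score that raises both milk consumption and life expectancy, while cancer incidence depends only on life expectancy. All coefficients are arbitrary assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)
milk, cancer = [], []
for _ in range(200):  # 200 hypothetical countries
    wealth = rng.uniform(0, 1)
    # Wealth drives milk consumption ...
    milk.append(wealth + rng.gauss(0, 0.1))
    # ... and, separately, life expectancy.
    life_expectancy = 60 + 20 * wealth + rng.gauss(0, 1)
    # Cancer incidence depends only on life expectancy, never on milk.
    cancer.append(2 * (life_expectancy - 60) + rng.gauss(0, 3))

r = pearson(milk, cancer)
print(f"correlation between milk and cancer: {r:.2f}")
```

Milk and cancer come out strongly correlated even though, by construction, milk has no effect on cancer in this toy model.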
Changing of Meaning
Sometimes bureaus change their description or what a word means, or a word comes to acquire a totally different meaning.
So for example when you calculate the number of “farms”, you should adjust for the changes in census, because what is considered today a “farm” is not the same as it was fifty years ago.
There are many ways to “massage” your charts:
- Raise the starting point
To show bigger improvements you can make a chart start not at zero, but near the lowest number in your data set.
For example, if you want to show an improvement from 20 to 22 and then 23.5 and you start your chart from zero, the improvement will look small.
But if you make the chart start at 20, it will look like a huge change.
- Leverage Volume
If you want to show the increase of something -say, salaries- a fair way would be to use bar charts that only change in height.
If you want to make the increase look bigger, though, you can use pictures instead of bars. Say, for example, you choose a “bag of money” icon to show the salary increase.
When salaries double and you double the height of the bag, you don’t just change the height: the width doubles too. The result is that your bag of money covers an area 4 times bigger, not twice.
- Don’t Label the Axis
Another trick is to leave the scales and labels off a chart.
Let the reader make his own assumptions: if the chart looks impressive, he will make favorable ones.
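The first two tricks come down to simple arithmetic. The sketch below uses the 20, 22, 23.5 example from above; the two helper functions are hypothetical names introduced only for this illustration.

```python
def rise_share_of_axis(values, axis_min):
    """Fraction of the plotted vertical range taken up by the rise from first to last value."""
    return (values[-1] - values[0]) / (max(values) - axis_min)

def icon_area_ratio(height_ratio):
    """Area ratio of an icon scaled uniformly (width grows along with height)."""
    return height_ratio ** 2

values = [20, 22, 23.5]
print(f"axis from 0:  rise fills {rise_share_of_axis(values, 0):.0%} of the chart")
print(f"axis from 20: rise fills {rise_share_of_axis(values, 20):.0%} of the chart")
print(f"icon twice as tall covers {icon_area_ratio(2):.0f}x the area")
```

With the axis starting at zero the rise takes up about 15% of the chart's height; starting at 20 it fills the whole chart, although the data are identical.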
Proving Without Proving
This one is almost pure marketing.
Say that you cannot prove that your new drug helps cure a cold. But you can prove that it kills thousands of germs. Maybe even more than your previous drugs. Maybe 30% more?
It might not do a thing to help people overcome a cold, but it certainly sounds good.
Or an advertiser could say that physicians smoke brand X more than any other cigarette. But does that mean anything? Physicians might not have any insider knowledge on “healthier cigarettes”. As a matter of fact, there is no such thing as healthy tobacco.
When people talk about an increase of 10%, you should always wonder: increase compared to what? The previous month, the previous week? And was it an increase across the board or did they pick a figure that increased the most?
Also, people who want to give different impressions can easily switch between percentages and absolute numbers, depending on which one sounds better.
The percentage drop is even easier to massage.
For example, describing the move from 13% to 10% as “a drop of 3 percentage points” sounds much smaller than “a decrease of 23%”.
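Both descriptions are arithmetically correct, which is what makes the switch so easy. A quick sketch of the two calculations (the function name is made up for this example):

```python
def describe_change(old_pct, new_pct):
    """Return the change in percentage points and as a relative percent change."""
    points = new_pct - old_pct
    relative = (new_pct - old_pct) / old_pct * 100
    return points, relative

points, relative = describe_change(13, 10)
print(f"{points:+.0f} percentage points")
print(f"{relative:+.0f}% relative change")
```

The first figure subtracts one rate from the other; the second divides that difference by the starting rate, so 3 out of 13 becomes roughly 23%.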
Cherry-Picking the Best Figure
Similarly, you can express the same fact in many different ways, depending on what your goals are and what sounds more impressive -or less scary-.
- 2% return on sales
- 12% return on investment
- 5 million profit
- 35% increase in profits
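To see how one result can produce all four headlines, here is a sketch with invented company figures. The sales, investment and last-year profit numbers are assumptions chosen only so that the ratios roughly match the list above; they do not come from the book.

```python
# Hypothetical company figures, in millions (not from the book).
sales = 250.0
investment = 41.7
profit_last_year = 3.7
profit = 5.0

return_on_sales = profit / sales * 100
return_on_investment = profit / investment * 100
profit_growth = (profit - profit_last_year) / profit_last_year * 100

print(f"{return_on_sales:.0f}% return on sales")
print(f"{return_on_investment:.0f}% return on investment")
print(f"{profit:.0f} million profit")
print(f"{profit_growth:.0f}% increase in profits")
```

The profit is the same 5 million in every line; only the denominator changes.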
Correlation Not Causation
And here is the big one, one that I have been criticizing in so many of my book reviews.
Many publications use correlation to imply causation.
Take for example the famous claim that “studying leads to a higher salary”. While that makes a lot of sense, it’s not so straightforward.
Kids who go to college are disproportionately bright and rich.
The bright and rich would likely have earned more than people without a college degree even if they had skipped college themselves.
And it’s not true that “the more you study, the more you earn”.
The most dishonest individuals can also look for a correlation and then plug in a cause that works well for them.
Politicians do it often. For example, Donald Trump took the crime statistics of a single US city where crime had increased and blamed it on Democrats.
There are three different types of averages:
- Mean: add all values and divide by the number of values
- Mode: most common value in a data sample
- Median: the middle value when the sample is sorted
In a normal distribution -as is often the case for human traits- the three will be similar to each other.
But in data sets with skewed distributions -for example income and salaries- they will be very different.
And a biased statistician can pick the one that best suits his needs.
Or simply use the familiar mean when the mean hides a very different reality.
For example, a CEO could say that at company X salaries went up 40% in the last 10 years. But he wouldn’t say that they went up 1% for doormen and security, 5% for low-level employees and 30% for the C-suite.
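To see how far the three averages can drift apart on skewed data, here is a minimal sketch using Python's standard `statistics` module. The salary list is invented for illustration: many modest salaries and one very large one.

```python
from statistics import mean, median, mode

# Hypothetical yearly salaries (in thousands) at a small company.
salaries = [30, 30, 30, 35, 40, 45, 50, 60, 400]

print(f"mean:   {mean(salaries):.0f}")
print(f"median: {median(salaries)}")
print(f"mode:   {mode(salaries)}")
```

Here the mean is 80, the median 40 and the mode 30. All three can honestly be called “the average salary”, and a single top earner is enough to double the mean relative to the median.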
How to Critically Approach Statistics
Darrell Huff proposes five questions we can ask to check that statistics are not misleading us:
- Who Says So?
The first thing you should examine is the source.
Is the source inherently biased? Is it an advertisement? Does a researcher want to defend his theory? Does a newspaper want to sell more copies with a sensationalist story?
- How Does He Know?
How did he conduct the research?
Is the sample big enough? Did he send out questionnaires? Did he stop people in the street? What could be wrong with that?
- What’s Missing?
What other influences have not been taken into account?
Was there a large control group, and does the author mention it? If the author is using percentages, does he also give the raw numbers for a more complete picture?
- Did Somebody Change the Subject?
Changing the subject happens when a result is carried over to a similar, but actually different, setting.
For example, a politician critical of how expensive prisons were for the public said it would cost less to house inmates in a hotel than in Alcatraz.
However, he changed the subject from “total daily cost of an inmate” to “daily charge of a hotel”. Of course the latter does not include surveillance, food, healthcare, rehabilitation etc.
- Does it Make Sense?
Sometimes a statistic simply makes no sense.
And the fact that we’re dealing with numbers should not lead to a suspension of common sense.
One example from Darrell Huff is the following: “Since life expectancy is only 63 years, it’s a sham and a fraud to set the retirement age at 65, because virtually everybody dies before that.”
Even in the 1939-1941 period that wasn’t true, and just by looking around one could have realized it was wrong.
Sometimes old truths are taken for granted even when they’re not true anymore.
A major appliance company right after WWII was planning to make smaller appliances to accommodate the declining birth rate. Then one of the guys looked around and realized that he and his friends were not planning to have fewer children.
Just looking around would have revealed that the assumption was wrong, but everyone was taking old statistics as gospel.
A last example is that of “plotting the results ad infinitum”. We see a trend and some poor statistician extends it into the future. But trends rarely continue forever.
Proper treatment cures a cold in 7 days. But left to itself it will hang on for a whole week.
Real Life Applications
Always Critically Read Data
Don’t believe the numbers just “because they’re numbers”. Anything can be bent to fit propaganda or a biased individual’s agenda.
Don’t Stop at First Critical Level!
One danger I see with How to Lie with Statistics is that people might stop at the first level: taking apart a statistic without reflecting that even a flawed statistic might be a strong warning signal or might capture an important trend.
For example, the author criticizes the statistic that women who attend college marry less than the national average while men marry more. Instead of dismissing that result, one should ask if that makes sense and why it’s happening.
My guess is that it makes sense because more women are attending college -read a scientific approach to dating-.
Great Education Against Cheats and Power Movers
The more people read books like this, the fewer the people that are easy to dupe and swindle.
And that will make for a better world to live in.
Short and To The Point
No fluff, no repetition: I love it.
How to Lie with Statistics is a five-star, must-read book for those who aren’t aware of the pitfalls and “sneaky tricks” that hide behind statistics -i.e., most of us-.
I believe that How to Lie with Statistics can make for a better world.
The more people read it, the more we can think critically, assess claims better, buy better and even vote better.
It’s also good for a psychological understanding in a way. With How to Lie with Statistics you will learn how our mind can unwittingly make wrong connections.
And, of course, it’s great for power moves. Because you will learn how unscrupulous researchers, companies and journalists can willfully manipulate data to sway people.