Data: What Gets Reported, How the Process Works

Tim Harford, in How to Make the World Add Up, his book on how to make sense of data and statistics, highlights the problem with what comes to our attention:

“When media outlets want to grab our attention, they look for stories that are novel and unexpected over a short time horizon – and these stories are more likely to be bad than good.”

 

As an example, he cites the many headlines about growing global inequality. As always, the devil is in the details: yes, the gap between the richest of the rich and the poorest of the poor is indeed growing by leaps and bounds. But it’s also true that about 150,000 people are coming out of poverty every day. Yes, every day. Over the last couple of decades, China and India have pulled almost a billion people out of poverty. See how this example aligns with Harford’s point on the headlines? Short term vs. long term; bad news vs. good news.

 

Plus, as Harford reminds us, both “Another terrible crime has occurred!” and “Overall, crime is way down” can be true at the same time. No prizes for guessing which headline would make the news...

 

Another area to understand is the backstory to the data and facts. Take academia, for example. In people-related topics, the first study that finds something new or unexpected gets published (after some checks, peer review, etc.). But here’s the kicker: if subsequent attempts to reproduce the result don’t succeed, or even contradict the first study, they are far less likely to get published. (This is mostly a problem in the people-related sciences, less so in the physical sciences.) This has major consequences: if a study shows something surprising, it will get published and make headlines. But follow-ups that disagree may never see the light of day. And so, in the general population, the first study and its result stick and become the accepted reality. There’s even a term for this: “publication bias”…
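
To see why this filter matters, here is a minimal simulation sketch in Python. It is not from Harford’s book; every name and number in it (`run_study`, `simulate`, the sample size of 30, the 1.96 cut-off for “surprising”) is an assumption chosen purely for illustration. It imagines many studies of an effect that is really zero, and “publishes” only the ones that happen to look surprising.

```python
import random
import statistics


def run_study(true_effect, n, rng):
    """Simulate one hypothetical study of n people.

    Returns the observed average effect and whether it looks 'surprising'
    (assumption: observed mean more than 1.96 standard errors above zero).
    """
    samples = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / n ** 0.5
    return mean, mean / std_err > 1.96


def simulate(num_studies=2000, true_effect=0.0, n=30, seed=1):
    """Run many studies; 'publish' only the surprising ones."""
    rng = random.Random(seed)
    results = [run_study(true_effect, n, rng) for _ in range(num_studies)]
    all_means = [mean for mean, _ in results]
    published = [mean for mean, surprising in results if surprising]
    return all_means, published


if __name__ == "__main__":
    all_means, published = simulate()
    print("studies run:           ", len(all_means))
    print("studies published:     ", len(published))
    print("average, all studies:  ", round(statistics.fmean(all_means), 3))
    print("average, published only:", round(statistics.fmean(published), 3))
```

Under these assumptions, the average across all studies sits close to the true value of zero, while the average of the “published” subset is clearly positive. Nobody in this toy world lied; the filter alone creates the distorted picture.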

 

In addition, many scientific studies are HARKed. HARK stands for “Hypothesising After the Results are Known”. See the danger with this? You do a study in full sincerity, get some results, come up with a theory based on those results, and then cite the same results as support for the theory. That last step is the problem. Sure, one always gathers facts and then comes up with a theory. But one should run new tests after the theory has been framed, and only those new tests should be cited in support. Can you be sure that study you just saw didn’t HARK it?
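
The difference between citing the original results and running a new test can be sketched in a few lines of Python. This is an illustrative toy, not anything from Harford: the function names (`pearson`, `noise_dataset`, `hark_then_retest`) and the numbers (40 people, 50 traits) are made up for the example. Every “trait” here is pure random noise, so there is no real link to find.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def noise_dataset(num_people, num_traits, rng):
    """An outcome and a set of traits that are all unrelated random noise."""
    outcome = [rng.gauss(0, 1) for _ in range(num_people)]
    traits = [[rng.gauss(0, 1) for _ in range(num_people)]
              for _ in range(num_traits)]
    return outcome, traits


def hark_then_retest(num_people=40, num_traits=50, seed=7):
    """Pick the 'theory' after seeing the data, then test it on new data."""
    rng = random.Random(seed)

    # Step 1: look at the results first, then pick the best-looking trait.
    outcome, traits = noise_dataset(num_people, num_traits, rng)
    correlations = [pearson(trait, outcome) for trait in traits]
    best = max(range(num_traits), key=lambda i: abs(correlations[i]))

    # Step 2: the honest check, the same trait measured on a fresh sample.
    new_outcome, new_traits = noise_dataset(num_people, num_traits, rng)
    fresh = pearson(new_traits[best], new_outcome)
    return best, correlations[best], fresh


if __name__ == "__main__":
    best, original, fresh = hark_then_retest()
    print(f"trait {best}: original r = {original:.2f}, fresh r = {fresh:.2f}")
```

Because the trait was chosen for looking strongest in the first sample, its original correlation is flattering by construction. On fresh data, the same trait typically shows a much weaker correlation, which is why only the second number should count as support.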

 

These examples may be from academia, but the points apply across fields. Once you know how things work behind the scenes, you can understand the possible side effects, the blind spots, and the likelihood that conclusions rest on incomplete data. Note that Harford isn’t talking about deliberate deceit here (we know that problem all too well); he is talking about structural risks that exist no matter how data is gathered and interpreted.

 

Tough lessons to internalize, no doubt, but worth trying anyway.
