Data: Dark, Found, and Crunched
Tim Harford’s book on data and statistics, How to Make the World Add Up, points to a couple of other things to watch out for when we see conclusions drawn from data.

Let’s start with an example:

“Consider the historical under-representation of women in clinical trials. One grim landmark was thalidomide, which was widely used by pregnant women to ease morning sickness only for it to emerge that the drug could cause severe disability and death to unborn children.”

Even if we try to collect data with the right representation of men and women, rich and poor, urban and rural, and so on, we still run into the problem that many people don’t respond, and certain types of people can’t be found easily. All of these are forms of “dark data”: data we never get to see. This problem cannot be solved entirely.

The general danger here is that all analysis is, by definition, done with “found data” only. And that is a lesson Harford asks us to remember:

“We can and should remember to ask who or what might be missing from the data we’re b...