Is Correlation Enough?
Back
in 2008, Chris Anderson wrote an oft-quoted
article on what he called the “End of Theory” (exaggerated for effect) in
science. Here’s the summary of his article:
- The scientific way has been to
come up with models that describe reality; then test those models for any
errors or mismatches with what is observed.
- As he put it:
“Scientists are trained to
recognize that correlation is not causation, that no conclusions should be
drawn simply on the basis of correlation between X and Y (it could just be a
coincidence). Instead, you must understand the underlying mechanisms that
connect the two… Data without a model is just noise.”
- Then came Anderson’s kicker: in
an age where we were getting enormous amounts of data about just about
everything (aka Big Data), he said that “this approach to science —
hypothesize, model, test — is becoming obsolete”. Why? Because:
“Petabytes (of data) allow us
to say: "Correlation is enough."…We can throw the numbers into the
biggest computing clusters the world has ever seen and let statistical
algorithms find patterns where science cannot.”
My
instinctive feeling was that it didn’t make sense to rely on correlations in
science. As Viktor Mayer-Schönberger and Kenneth Cukier wrote in their book
titled Big Data, correlations are
never perfect and if we rely on correlations, we will be fooled by randomness
at times.
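To get a feel for how easily randomness can fool us, here is a minimal Python sketch. It is my own illustration, not something from Anderson or from the Big Data book. It generates series of pure noise that are unrelated by construction, then reports the strongest correlation found between any two of them. The function names, the seed and the sizes (20 observations per series) are arbitrary assumptions chosen for the demonstration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_series, n_points, seed=0):
    """Largest |r| over all pairs of independent random series.

    Every series is independent Gaussian noise, so any correlation
    found here is a coincidence by construction (illustrative setup).
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_series)
    ]
    return max(
        abs(pearson(a, b))
        for a, b in itertools.combinations(series, 2)
    )


if __name__ == "__main__":
    # More variables means more pairs to compare, and more chances
    # for a strong-looking correlation to appear by accident.
    for n_series in (5, 50, 500):
        r = strongest_chance_correlation(n_series, n_points=20)
        print(f"{n_series:>4} unrelated series -> strongest |r| = {r:.2f}")
```

The more variables we compare, the more pairs there are, and the stronger the best coincidental correlation tends to look. That is the trap when we mine huge datasets for patterns without a model to tell us which patterns to expect.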
And yet, if only we had trusted data and correlations, how many mothers’ lives
might have been saved? Here’s that story: In 1847, Ignaz Semmelweis, a Hungarian
physician working in Vienna, noticed that:
“Doctors were killing women by
not washing their hands in between dissecting corpses and delivering their
babies. The death rate from childbed fever was twice as high for
doctor-assisted births as it was for midwife-assisted births.”
How did the medical community react to Semmelweis’ call for doctors to wash their
hands and use antiseptic measures? They ignored him altogether. Why? Mainly
because he “had no theory” to back him up: he only had data. (Germ theory
gained acceptance only after Semmelweis’ death.)
But is
that just a one-off case? Do we want our doctors and scientists to say, as
Anderson proposed, that correlation is good enough? Or do we want them to
understand things, to form their models and theories and only then proceed on
them? Or with so much data, is it now the case that nobody can even analyze all
of it before coming up with a theory to explain/fit it all? There are no easy
answers…
This is a crucial debate. Nobody knows the answer yet.
When the discoverer of a law proposes something derived from limited data, it often not only extends to fit more observed data of the same phenomenon but also reveals some other correlated law, either linked to it or emerging as an extension. That supports the argument quoted in the post: "you must understand the underlying mechanisms that connect the two… Data without a model is just noise".
Even though such cases are abundant and compel our admiration for scientists of such insight, there is a persistent opinion that all theories are only empirical, i.e. just curve fitting of observed data, nothing more.
This is a little confusing for me, because I still find that by addressing something as an underlying cause, we are dealing with things that are universal and also pinpointing a further direction. That view doesn't seem to have takers these days.
Nowadays I am tiring of intellectual debates. I am perhaps not sharp enough to grasp things well.