Is Correlation Enough?

Back in 2008, Chris Anderson wrote an oft-quoted article on what he called the “End of Theory” (exaggerated for effect) in science. Here’s the summary of his article:
- The scientific way has been to come up with models that describe reality, then test those models for errors or mismatches with what is observed.
- As he put it:
“Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two… Data without a model is just noise.”
- Then came Anderson’s kicker: in an age where we were getting enormous amounts of data about just about everything (aka Big Data), he said that “this approach to science — hypothesize, model, test — is becoming obsolete”. Why? Because:
“Petabytes (of data) allow us to say: "Correlation is enough."…We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

My instinctive feeling was that it didn’t make sense to rely on correlations in science. As Viktor Mayer-Schönberger and Kenneth Cukier wrote in their book titled Big Data, correlations are never perfect and if we rely on correlations, we will be fooled by randomness at times.
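
A rough way to see that last point is to hand a computer nothing but noise and ask it for the strongest correlation it can find. The sketch below is only my own illustration, not anything from Anderson or from the book: the helper names (`pearson`, `strongest_correlation`) and the sizes (200 made-up variables with 20 observations each) are arbitrary assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_correlation(series):
    """Largest absolute pairwise correlation among the given series."""
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# 200 unrelated "variables" (an arbitrary choice), each 20 draws of pure noise
noise = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(200)]

# Search ever larger groups of variables for their best-looking correlation
for k in (5, 50, 200):
    print(k, "variables -> strongest |r| =", round(strongest_correlation(noise[:k]), 2))
```

Each larger group contains the smaller ones, so the strongest correlation found can only stay the same or grow as more variables are thrown in, even though none of them has anything to do with any other. That is the sense in which a big enough pile of data will always offer up some pattern.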

And yet, if only we had trusted data and correlations, how many mothers’ lives might have been saved? Here’s that story: In 1847, Ignaz Semmelweis, a Hungarian physician working in Vienna, noticed that:
“Doctors were killing women by not washing their hands in between dissecting corpses and delivering their babies. The death rate from childbed fever was twice as high for doctor-assisted births as it was for midwife-assisted births.”
How did the medical community react to Semmelweis’ call for doctors to wash their hands and use antiseptic measures? They ignored him altogether. Why? Mainly because he “had no theory” to back him up: he only had data (the germ theory of disease only came about after Semmelweis’ death).

But is that just a one-off case? Do we want our doctors and scientists to say, as Anderson proposed, that correlation is good enough? Or do we want them to understand things, to form their models and theories and only then proceed on them? Or with so much data, is it now the case that nobody can even analyze all of it before coming up with a theory to explain/fit it all? There are no easy answers…

Comments

  1. This is a crucial debate. Nobody knows the answer yet.

    When the discoverer of a law proposes something derived from limited data, it often not only extends to fit more observed data of the same phenomenon but also points to some other related law or an emerging extension. That supports the argument mentioned: "you must understand the underlying mechanisms that connect the two… Data without a model is just noise".

    Even though such cases are abundant, and compel our admiration for scientists with that kind of insight, there is a persistent opinion that all theories are only empirical, i.e. just curve fitting of observed data, nothing more.

    This is a little confusing for me, because I still find that by addressing something as an underlying cause, we are dealing with things that are universal and that also pinpoint further directions. That doesn't seem to have takers these days.

    Nowadays I am tiring of intellectual debates. I am perhaps not sharp enough to grasp things well.
