Models v/s Patterns

In response to my blog on the three generations of the Internet, my dad commented:
“Are we all stuffing ourselves with data and information, with very little time and inclination left for sharpening our innate, marvelous tool that evolution has led us to - the ability for digging meaning out of abundant data?...Will we have humankind reduced its ability to mind's ability for keen insights, failing to appropriately sharpening our grand mind potential?”

Such questions have been asked and debated for years (ironically) on the Internet! Is correlation good enough? For example, Big Data would allow algorithms to tell you where a planet would be without ever discovering Kepler’s laws of planetary motion; but is that the same as knowledge? On the other hand, can laws be found only for the inanimate universe? And is Big Data the (only) way to go when it comes to predicting humans? Is George Box’s statement (“All models are wrong, but some are useful.”) so true for humans that it is better to rely on patterns rather than models when it comes to people?

Given the money being minted by software companies that crunch the seemingly infinite datasets to form patterns (not models, but patterns), the (commercial) pendulum seems well and truly on the side of correlation-is-good-enough. Like Google’s auto-prediction as you type. Or Google Translate. Or Facebook feeds. Or Amazon’s recommendations. No wonder then that Kevin Kelly calls this (using data and patterns but never finding an underlying theory) the Google way of doing science! As Chris Anderson puts it:
“Petabytes allow us to say: ‘Correlation is enough.’”

All this raises a different, almost philosophical question, the one Kelly asks:
“We may get answers that work, but which we don’t understand. Is this partial understanding? Or a different kind of understanding?”

Then there’s the eternal warning that correlation is not causation. To bring out this warning, Harvard Law School student Tyler Vigen created this site, which pairs obviously unrelated pieces of data that still have a very high correlation coefficient. The examples on his site convey the point wonderfully well.

Pablo Picasso is rumoured to have said:
“The problem with computers is that they only give you answers.”
And with Big Data and the Internet, computers have become awesome at providing certain types of answers. So perhaps, as Kelly says:
“The real value of the rest of science then becomes asking good questions.”

But computers are still nowhere near coming up with hypotheses about causes; so that job will continue to fall on the Newtons, Darwins and Bohrs of the world.
