Models v/s Patterns
In response to my blog on the three generations of the Internet, my dad commented:
“Are we all stuffing ourselves with data
and information, with very little time and inclination left for sharpening our innate,
marvelous tool that evolution has led us to - the ability to dig meaning
out of abundant data?...Will humankind have reduced its mind's
ability for keen insights, failing to appropriately sharpen our grand mind
potential?”
Such questions have
been asked and debated for years (ironically) on the Internet! Is correlation
good enough? For example, Big Data would allow algorithms to tell you where the
planet would be without ever discovering Kepler’s laws of planetary motion; but
is that the same as knowledge? On the other hand, can laws only be found for the
inanimate universe? And is Big Data the (only) way to go when it comes to predicting
humans? Is George Box’s statement (“All models are wrong, but some are useful.”)
so true for humans that it is better to rely on patterns rather than models
when it comes to people?
Given the money
being minted by software companies that crunch the seemingly infinite datasets
to form patterns (not models, but patterns), the (commercial) pendulum seems
well and truly on the side of correlation-is-good-enough. Like Google’s auto-prediction
as you type. Or Google Translate. Or Facebook feeds. Or Amazon’s
recommendations. No wonder then that Kevin Kelly calls this (using data and
patterns but never finding an underlying theory) the Google way of doing
science! As Chris
Anderson puts it:
“Petabytes allow us to say: ‘Correlation
is enough.’”
All this raises
a different, almost philosophical question, the one Kelly asks:
“We may get answers that work, but which
we don’t understand. Is this partial understanding? Or a different kind of
understanding?”
Then there’s the
eternal warning that correlation is not causation. Harvard Law School student
Tyler Vigen created a site that pairs obviously
unrelated pieces of data that still have a very high correlation coefficient,
which brings out this warning wonderfully well. A few examples from his site should
convey the point:
[Two example charts from Vigen's site appeared here.]
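To see how easily this happens, here is a minimal Python sketch. The two series and their names are entirely made up for illustration (they are not from Vigen's site); the only thing they share is that both happen to rise over the same ten years. Differencing the series, as done at the end, is just one common sanity check and is my own addition, not something Vigen's site does.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(series):
    """Year-over-year differences, which strip out the shared upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical, invented yearly figures: two things with no plausible link
# that both simply grew over the same decade.
ice_cream_sales = [102, 110, 118, 131, 139, 150, 158, 171, 180, 188]
broadband_subscribers = [20, 23, 27, 29, 34, 36, 41, 43, 48, 50]

r_levels = pearson(ice_cream_sales, broadband_subscribers)
r_changes = pearson(yearly_changes(ice_cream_sales),
                    yearly_changes(broadband_subscribers))

print(f"correlation of the raw series:     {r_levels:.2f}")
print(f"correlation of the yearly changes: {r_changes:.2f}")
```

The raw series correlate almost perfectly simply because both trend upward; once the trend is removed, the apparent relationship falls apart. A pattern-finding algorithm looking only at the first number would happily report a strong link.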
Pablo Picasso is
rumoured to have said:
“The problem with computers is that they
only give you answers.”
And with Big
Data and the Internet, computers have become awesome at providing certain types
of answers. So perhaps, as Kelly says:
“The real value of the rest of science
then becomes asking good questions.”
But computers are
still nowhere near coming up with hypotheses about causes; so that job will
continue to fall to the Newtons, Darwins and Bohrs of the world.