When N = All
Statistics is all about analyzing data for patterns. In the past, the size and quality of the sample set were critical. Sometimes they were a problem in their own right.
Enter Big Data. Or, as Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote in their article, The Rise of Big Data:
“But if we collect all the data -- “n =
all,” to use the terminology of statistics -- the problem disappears.”
Sure, “n = all”
is an exaggeration. But it is true that the size of data samples has gone
through the roof over the last decade or so.
And no, Big Data
doesn’t just refer to the size of the data:
“Big data is also characterized by the
ability to render into data many aspects of the world that have never been
quantified before; call it “datafication.””
A couple of examples help make “datafication” concrete: take location and friendship. They were datafied by GPS and Facebook, respectively!
And if datafication is here, can algorithms be far behind? Google Translate is based on statistical analysis of vast amounts of content on the Net. UPS, the courier service, places sensors on vehicle parts to spot signs that a part may break down soon, which lets the company take pre-emptive action. A Japanese institute has datafied how you sit (posture, weight distribution, etc.) and is considering making car seats that will use the data to “recognize” you, thereby preventing car theft!
The critical thing about Big Data is that it shifts the emphasis from causation to correlation. Knowing why is no longer necessary in a lot of scenarios. Of course, correlation is never a guarantee of anything. Remember the time Google’s mining of its search queries pointed to a far bigger flu outbreak than the one that actually happened?
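To see why a correlation on its own proves so little, here is a minimal Python sketch. It is my own illustration, not something from the article: the function names and the numbers (200 candidate series, 10 data points each) are arbitrary assumptions. It generates one random "target" series, compares it against many other unrelated random series, and reports the strongest correlation it finds.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_match(n_series=200, n_points=10, seed=0):
    """Strongest |correlation| between one random target series and
    n_series other random series generated independently of it.

    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return max(abs(pearson(target, c)) for c in candidates)


if __name__ == "__main__":
    print(f"strongest correlation found by chance: {best_chance_match():.2f}")
```

None of these series has anything to do with the target, yet with enough candidates to choose from, one of them will usually look like a decent match. The more data you search through, the easier it is to find a pattern that means nothing.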
Datafication leads to commercial applications, and that in turn has an amusing consequence:
“…in the future, when China censors
Internet searches, it might face complaints not only about unjustly muzzling
speech but also about unfairly restraining commerce.”
But one thing Big Data cannot do is predict innovation or disruptive technologies. As Henry Ford is famously said to have put it:
“If I had asked people what they wanted,
they would have said a faster horse.”
So as long as we
know its potential and its limitations, I think we will benefit a lot from Big
Data, privacy fears notwithstanding.