When N = All

Statistics is all about analyzing data for patterns. In the past, the size and quality of the sample set were critical. Sometimes, even a problem.

Enter Big Data. Or as Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote in their article, The Rise of Big Data:
“But if we collect all the data -- “n = all,” to use the terminology of statistics -- the problem disappears.”
Sure, “n = all” is an exaggeration. But it is true that the size of data samples has gone through the roof over the last decade or so.

And no, Big Data doesn’t just refer to the size of the data:
“Big data is also characterized by the ability to render into data many aspects of the world that have never been quantified before; call it “datafication.””
A couple of examples help make “datafication” concrete: take location and friendship. They got datafied thanks to GPS and Facebook respectively!

And if datafication is here, can algorithms be far behind? Google Translate is based on statistical analysis of all the content on the Net. UPS, the courier service, places sensors on vehicle parts to identify signs that a part may break down soon, thereby allowing the company to take pre-emptive action. A Japanese institute has datafied how you sit (posture, weight distribution, etc.) and is considering making car seats that will use the data to “recognize” you, thereby preventing car theft!

The critical thing about Big Data is that it changed the emphasis from causation to correlation. Knowing why is no longer necessary in a lot of scenarios. Of course, correlation is never a guarantee of anything. Like that time Google’s mining of its search queries suggested there was a flu epidemic…except there wasn’t.

Datafication leads, in turn, to commercial applications, and that has another amusing consequence:
“…in the future, when China censors Internet searches, it might face complaints not only about unjustly muzzling speech but also about unfairly restraining commerce.”

But one thing Big Data cannot do is predict innovation or disruptive technologies. As Henry Ford is famously (if perhaps apocryphally) said to have remarked:
“If I had asked people what they wanted, they would have said a faster horse.”

So as long as we know its potential and its limitations, I think we will benefit a lot from Big Data, privacy fears notwithstanding.
