You Can Run But You Cannot Hide

When it comes to the Internet, European governments and bureaucrats live in a dream world of their own, totally disconnected from reality, the power of technology and the ingenuity of algorithms.

The latest instance is the European Parliament’s attempt at data protection on the Net. The easy (and naïve) approach to data protection would be to ban almost any form of data sharing. But the EU is not that dumb. They know that data sets can be mined for useful patterns in lots of fields, healthcare among them. But here is where the EU is divorced from reality: they believe there are such things as anonymous data and “pseudonymised” data (data where true identifiers have been replaced with pseudonyms). Based on that misconception, they exempted the first category from regulation and subjected the second to milder regulation.

But as Seth David Schoen said:
“Just because something seems anonymous at first glance, doesn't mean it really is – both because of the mathematics of individual distinctiveness and because of the huge number of databases that are becoming available. That means we have to be extremely careful about whether things are truly anonymous, and not rely on our intuition alone.”

In fact, there are three very high-profile instances of the successful de-anonymising of data that had been released assuming it was sufficiently “scrambled”: AOL's pseudonymised search data; Massachusetts's anonymised health records; and Netflix's release of 100 million video-rental records.

But how can that be? How can someone de-anonymise a data set? Cory Doctorow explains the gist of it:
“There are lots of smokers in the health records, but once you narrow it down to an anonymous male black smoker born in 1965 who presented at the emergency room with aching joints, it's actually pretty simple to merge the "anonymous" record with a different "anonymised" database and out pops the near-certain identity of the patient.”
Combine enough data points from multiple data sets and different sources, and bingo! Doctorow talks about this topic in an article titled “Data protection in the EU: the certainty of uncertainty”. Certainty of uncertainty? Reminds me of one Heisenberg, of uncertainty-principle fame!
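
To make the merging trick concrete, here is a minimal sketch in Python of what is usually called a linkage attack. Everything in it is my own assumption for illustration: the records, the names, the choice of sex, birth year and zip code as the matching fields, and the helper names `key` and `link`. It is not the method used in the AOL, Massachusetts or Netflix cases, just the general idea of joining a "scrubbed" data set to a named one on the attributes they share.

```python
# Toy linkage attack. All data below is invented; no real person or data set is represented.

# Attributes assumed to appear in both data sets (hypothetical choice).
QUASI_IDENTIFIERS = ("sex", "birth_year", "zip_code")

# "Anonymised" release: names removed, but quasi-identifiers left in place.
health_records = [
    {"sex": "M", "birth_year": 1965, "zip_code": "02138", "complaint": "aching joints"},
    {"sex": "F", "birth_year": 1980, "zip_code": "02139", "complaint": "migraine"},
    {"sex": "M", "birth_year": 1965, "zip_code": "02139", "complaint": "cough"},
]

# A second, separately published data set that does carry names.
public_register = [
    {"name": "Person A", "sex": "M", "birth_year": 1965, "zip_code": "02138"},
    {"name": "Person B", "sex": "F", "birth_year": 1980, "zip_code": "02139"},
    {"name": "Person C", "sex": "F", "birth_year": 1980, "zip_code": "02139"},
    {"name": "Person D", "sex": "M", "birth_year": 1990, "zip_code": "02140"},
]


def key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, named):
    """Pair each anonymous record with a name when exactly one person matches."""
    # Index the named data set by its quasi-identifier tuple.
    index = {}
    for person in named:
        index.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = index.get(key(record), [])
        # A single candidate means the combination of attributes is unique,
        # so the "anonymous" record points at one named individual.
        if len(candidates) == 1:
            matches.append((candidates[0], record["complaint"]))
    return matches


reidentified = link(health_records, public_register)
print(reidentified)
```

In this toy run only the first health record is re-identified: the second matches two people and the third matches nobody. Add one more shared attribute, or one more outside data set, and the ambiguous cases start to collapse into single names too.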
