You Can Run But You Cannot Hide
When it comes to the Internet, European governments and bureaucrats live in a
dream world of their own, totally disconnected from reality, from the power of
technology and from the ingenuity of algorithms.
The latest
instance is the European Parliament’s attempt at data protection on the Net.
The easy (and naïve)
approach to data protection would be to ban almost any form of data sharing.
But the EU is not that dumb. They know that data sets can be mined for useful patterns
in lots of fields, such as healthcare. But here is where the EU is divorced
from reality: they believe there are such things as anonymous data and “pseudonymised”
data (data in which true identifiers have been replaced with pseudonyms). Based on
that misconception, they exempted the first category from regulation and
subjected the second to milder regulation.
But as Seth
David Schoen said:
“Just because something seems anonymous
at first glance, doesn't mean it really is – both because of the mathematics of
individual distinctiveness and because of the huge number of databases that are
becoming available. That means we have to be extremely careful about whether
things are truly anonymous, and not rely on our intuition alone.”
In fact, there are three very high-profile instances of the
successful de-anonymising of data that had been released assuming it was sufficiently
“scrambled”: AOL's pseudonymised search data; Massachusetts's anonymised health
records; and Netflix's release of 100 million video-rental records.
But how can that
be? How can someone de-anonymise data? Cory Doctorow explains the gist of it:
“There are lots of smokers in the health
records, but once you narrow it down to an anonymous male black smoker born in
1965 who presented at the emergency room with aching joints, it's actually
pretty simple to merge the "anonymous" record with a different
"anonymised" database and out pops the near-certain identity of the
patient.”
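To make the merge Doctorow describes concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the names, and the choice of birth year, sex and postcode as the shared attributes are all assumptions, not data from any real release. It only shows the general idea of joining an "anonymised" data set to a named one on the attributes they have in common.

```python
from collections import defaultdict

# Hypothetical "anonymised" health records: names removed, other attributes kept.
health_records = [
    {"birth_year": 1965, "sex": "M", "postcode": "02138", "complaint": "aching joints"},
    {"birth_year": 1965, "sex": "F", "postcode": "02138", "complaint": "migraine"},
    {"birth_year": 1980, "sex": "M", "postcode": "02139", "complaint": "asthma"},
]

# Hypothetical public data set (for example, a voter roll) that includes names.
public_records = [
    {"name": "Alice Example", "birth_year": 1965, "sex": "F", "postcode": "02138"},
    {"name": "Bob Example", "birth_year": 1965, "sex": "M", "postcode": "02138"},
    {"name": "Carol Example", "birth_year": 1980, "sex": "F", "postcode": "02139"},
    {"name": "Dan Example", "birth_year": 1980, "sex": "M", "postcode": "02139"},
    {"name": "Erin Example", "birth_year": 1980, "sex": "M", "postcode": "02139"},
]

# Attributes assumed to appear in both data sets.
QUASI_IDENTIFIERS = ("birth_year", "sex", "postcode")


def key(record):
    """Build the join key from the shared attributes of a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, public):
    """For each anonymous record, list the named people with the same key."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])
    return [
        (record["complaint"], names_by_key.get(key(record), []))
        for record in anonymous
    ]


results = link(health_records, public_records)
for complaint, candidates in results:
    # Exactly one candidate means the "anonymous" record now has a name.
    status = "re-identified" if len(candidates) == 1 else "ambiguous"
    print(f"{complaint}: {candidates} ({status})")
```

In this toy data, two of the three records match exactly one named person, while the third matches two people and stays ambiguous. Adding one more shared attribute, such as a full date of birth, would likely split that pair as well, which is the point: each extra data point shrinks the crowd a record can hide in.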
Combine enough
data points with multiple data sets from different sources and bingo! Doctorow
talks about this topic in an article titled “Data
protection in the EU: the certainty of uncertainty”. Certainty of
uncertainty? That reminds me of a certain Heisenberg, of uncertainty principle fame!