Big Data and Data Anonymization: Is Anonymization an Illusion?
Data anonymization is a method for removing personally identifiable information from a data set to protect the privacy of the individuals or companies from whom the data was collected. It is sometimes also called data obfuscation. With the growing use of data analytics and big data, anonymized data sets have become popular.
Ramon Krikken, a Gartner analyst, said that “data anonymization techniques allow organizations to modify the data in such a way that the privacy of individuals within the data set remains protected at least in some way.”
Yves-Alexandre de Montjoye, research scientist at the MIT Media Lab, told SearchCompliance that “data anonymization is a two-step process — pseudonymization and de-identification. The idea, if it were to work, is to take sensitive data like mobile phone and medical data and remove any information that can link it back to an individual. We can then use it in research for example without endangering people’s privacy.”
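As a rough illustration of the two steps named in the quote, here is a minimal Python sketch. It is not de Montjoye's method: the field names (`name`, `phone`, `zip`, `diagnosis`), the sample record, and the key are all hypothetical, and the keyed hash (HMAC-SHA256) is just one common way to build a pseudonym.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration; a real key would be stored
# separately from the data set.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash, truncated for readability."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymize_record(record: dict) -> dict:
    """Pseudonymize the phone number, then drop the direct identifiers."""
    # De-identification: keep only fields that are not direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Pseudonymization: a stable stand-in ID so records can still be grouped.
    cleaned["subject_id"] = pseudonymize(record["phone"])
    return cleaned


# Made-up sample record.
record = {
    "name": "Alice Example",
    "phone": "555-0100",
    "zip": "02139",
    "diagnosis": "asthma",
}
anonymized = anonymize_record(record)
print(anonymized)
```

Note that the ZIP code and diagnosis survive both steps. Fields like these are what keep the data useful for research, and they are also what the critics quoted next are concerned about.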
But not everyone agrees that complete anonymization is possible. Pete Warden, writing for O’Reilly, said that “anonymization is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone’s actions has a good chance of matching identifiable public records.”
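The cross-referencing Warden describes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: both data sets are invented, and the choice of `zip`, `birth_year`, and `sex` as the linking fields is hypothetical. The point is only that records stripped of names can still be joined to a named public list when the remaining fields are distinctive enough.

```python
# Hypothetical "anonymized" records: names removed, other attributes kept.
anonymized_records = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94103", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public records that include names (for example, a directory).
public_records = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carol Example", "zip": "10001", "birth_year": 1972, "sex": "F"},
]

# Fields present in both data sets that can act as a join key.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key_of(record: dict) -> tuple:
    """Build the join key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized: list, public: list) -> list:
    """Return (name, diagnosis) pairs where the join key matches one person."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key_of(person), []).append(person["name"])

    matches = []
    for row in anonymized:
        names = names_by_key.get(key_of(row), [])
        if len(names) == 1:  # only an unambiguous match counts
            matches.append((names[0], row["diagnosis"]))
    return matches


matches = reidentify(anonymized_records, public_records)
for name, diagnosis in matches:
    print(name, "->", diagnosis)
```

In this toy data, two of the three "anonymized" rows link back to a single named person, while the third has no counterpart in the public list. Real data sets are larger and noisier, so actual match rates depend heavily on which fields are retained.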
Paul Ohm, a law professor, said that “data can either be useful or perfectly anonymous but never both… Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization. This is no small faith, for technologists rely on it to justify sharing data indiscriminately and storing data perpetually, all while promising their users (and the world) that they are protecting privacy. Advances in reidentification expose these promises as too often illusory.”