Data Anonymity: Current Techniques Don’t Work
The success of an AI algorithm can often be attributed to extensive training based on massive amounts of data. Data collected from individuals is often sanitized in order to protect the privacy of the people from whom the data was collected. The idea is that anonymized data cannot be associated with any particular person.
A number of studies now suggest that the goal of data anonymization is elusive. Data anonymization typically works by substituting fake values for identifying attributes such as names and addresses, removing some attributes entirely, or releasing only a sample of the data set.
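The three techniques just described can be sketched in a few lines of Python. This is only an illustrative sketch: the records, field names, and helper functions (pseudonymize, suppress, sample, anonymize) are all hypothetical and are not taken from any of the studies or from a particular anonymization library.

```python
import random

# Hypothetical records; every field name and value is made up for illustration.
RECORDS = [
    {"name": "Alice Smith", "address": "12 Oak St", "zip": "02139", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Jones", "address": "9 Elm Ave", "zip": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carol Diaz", "address": "4 Pine Rd", "zip": "94305", "birth_year": 1975, "sex": "F"},
    {"name": "Dan Wu", "address": "77 Main St", "zip": "94305", "birth_year": 1984, "sex": "M"},
]

def pseudonymize(records, fields):
    """Substitute a fake placeholder value for each directly identifying field."""
    out = []
    for i, rec in enumerate(records):
        copy = dict(rec)
        for field in fields:
            copy[field] = f"{field}_{i:04d}"
        out.append(copy)
    return out

def suppress(records, fields):
    """Remove the given attributes from every record."""
    return [{k: v for k, v in rec.items() if k not in fields} for rec in records]

def sample(records, fraction, seed=0):
    """Release only a random subset of the records."""
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

def anonymize(records, fake=("name",), drop=("address",), fraction=0.5, seed=0):
    """Apply all three steps: fake values, attribute removal, partial release."""
    return sample(suppress(pseudonymize(records, fake), drop), fraction, seed)

released = anonymize(RECORDS)
```

Note that attributes such as zip, birth_year, and sex pass through untouched in this sketch. Demographic fields of that kind are what the re-identification studies rely on.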
Dr. Yves-Alexandre de Montjoye, assistant professor at Imperial College London, said that “companies and governments have downplayed the risk of re-identification by arguing that the data sets they sell are always incomplete. Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.” His team's study found that 99.98% of Americans could be correctly re-identified in any dataset using 15 demographic attributes.
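To see why a handful of demographic attributes can single people out, it helps to count how many records become unique as attributes are combined. The sketch below does only that simple empirical count on a made-up toy table. It is not the statistical model used in the study, and the data and function names are hypothetical.

```python
from collections import Counter

# Toy table; all values are invented for illustration.
PEOPLE = [
    {"zip": "02139", "birth_year": 1984, "sex": "F"},
    {"zip": "02139", "birth_year": 1984, "sex": "M"},
    {"zip": "02139", "birth_year": 1990, "sex": "F"},
    {"zip": "02139", "birth_year": 1990, "sex": "F"},
    {"zip": "94305", "birth_year": 1984, "sex": "F"},
    {"zip": "94305", "birth_year": 1975, "sex": "M"},
]

def unique_fraction(records, attributes):
    """Fraction of records whose combination of `attributes` appears exactly once."""
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)

def uniqueness_by_attribute_count(records, attributes):
    """Uniqueness after using the first 1, 2, ... attributes in the given order."""
    return [
        (tuple(attributes[:n]), unique_fraction(records, attributes[:n]))
        for n in range(1, len(attributes) + 1)
    ]

results = uniqueness_by_attribute_count(PEOPLE, ["zip", "birth_year", "sex"])
```

On this toy table, nobody is unique by zip alone, two of the six records are unique by zip and birth year, and four of six are unique once sex is added. Real populations are far larger, but the same effect compounds as more attributes become available.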
A paper co-authored by de Montjoye concluded that “even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”
It is a funny title when you consider that census bureaus have been releasing anonymized data about billions of individuals for decades, and yet I have been unable to find a single example of a malicious re-identification of this data.
I can’t think of a single other security technology with that track record. So in what sense is anonymity not working?
Two studies, one from Carnegie Mellon and one from Stanford, used US census data and concluded that “few characteristics are needed to uniquely identify a person.”
https://dataprivacylab.org/projects/identifiability/
http://crypto.stanford.edu/~pgolle/papers/census.pdf
These studies don’t discuss using the data for ‘malicious re-identification’.
The point is that better techniques for handling anonymity are needed.