Data Anonymity: Current Techniques Don’t Work

By Dick Weisinger

The success of an AI algorithm can often be attributed to extensive training based on massive amounts of data. Data collected from individuals is often sanitized in order to protect the privacy of the people from whom the data was collected. The idea is that anonymized data cannot be associated with any particular person.

A number of studies now suggest that the goal of data anonymization is elusive. Data anonymization works by substituting fake values for attributes like names and addresses, removing some attributes, or only releasing some parts of the data set.
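The three techniques just described can be sketched in a few lines. This is only an illustrative sketch, not any particular vendor's pipeline: the field names (`name`, `address`, `zip`, `birth_year`), the `salt` value, and the helper names `pseudonymize` and `anonymize` are all assumptions made for the example.

```python
import hashlib
import random

def pseudonymize(value, salt):
    """Substitute a stable fake token for an identifying value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "id_" + digest[:10]

def anonymize(records, salt, drop=("address",), sample_rate=0.5, seed=0):
    """Apply the three techniques: substitute, remove, release a subset."""
    rng = random.Random(seed)
    released = []
    for rec in records:
        # Release only part of the data set.
        if rng.random() >= sample_rate:
            continue
        # Remove some attributes entirely.
        out = {k: v for k, v in rec.items() if k not in drop}
        # Substitute a fake value for the name.
        out["name"] = pseudonymize(rec["name"], salt)
        released.append(out)
    return released

# Hypothetical input records, invented for illustration.
people = [
    {"name": "Alice Example", "address": "1 Main St", "zip": "94110", "birth_year": 1980},
    {"name": "Bob Example", "address": "2 Oak Ave", "zip": "94110", "birth_year": 1975},
]
released = anonymize(people, salt="demo-salt", sample_rate=1.0)
```

Note that attributes such as `zip` and `birth_year` pass through unchanged. Those remaining demographic attributes are what the re-identification studies discussed here rely on.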

Dr. Yves-Alexandre de Montjoye, assistant professor at Imperial College London, said that “companies and governments have downplayed the risk of re-identification by arguing that the data sets they sell are always incomplete. Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.” His study found that 99.98% of Americans could be correctly re-identified in any dataset using 15 demographic attributes.

A paper co-authored by de Montjoye concluded that “even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”
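To see why a handful of demographic attributes can single people out, it helps to count how many records in a data set are unique on a given combination of attributes. The sketch below is a simple counting illustration on made-up records, not the statistical model used in the study; the function name `unique_fraction` and the attribute names are assumptions for the example.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Fraction of records whose combination of attribute values is unique."""
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0

# Hypothetical "anonymized" records: names removed, demographics kept.
rows = [
    {"zip": "94110", "birth_year": 1980, "gender": "F"},
    {"zip": "94110", "birth_year": 1980, "gender": "M"},
    {"zip": "94110", "birth_year": 1975, "gender": "F"},
    {"zip": "94110", "birth_year": 1975, "gender": "F"},
]

by_zip = unique_fraction(rows, ["zip"])
by_zip_year = unique_fraction(rows, ["zip", "birth_year"])
by_all = unique_fraction(rows, ["zip", "birth_year", "gender"])
```

On this toy data no record is unique by `zip` alone or by `zip` plus `birth_year`, but half of the records become unique once `gender` is added. Each extra attribute can only split groups further, which is the intuition behind the finding that a modest number of attributes is enough to isolate most individuals.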

October 10th, 2019

Category: Artificial Intelligence, Data Management


2 comments on “Data Anonymity: Current Techniques Don’t Work”

Paul Francis says:

October 11, 2019 at 8:15 am

It is a funny title when you consider that census bureaus have been releasing anonymized data for decades about billions of individuals, and yet I have been unable to find a single example of a malicious re-identification of this data.

I can’t think of a single other security technology with that track record. So in what sense is it that anonymity is not working???

- dweisinger says:
  
  October 11, 2019 at 8:58 am
  
  Studies from Carnegie Mellon and Stanford are two examples that used US census data and concluded that “few characteristics are needed to uniquely identify a person.”
  
  https://dataprivacylab.org/projects/identifiability/
  http://crypto.stanford.edu/~pgolle/papers/census.pdf
  
  These studies don’t discuss using the data for ‘malicious re-identification’.
  The point is that better techniques for handling anonymity are needed.
  


