Data Anonymity: Current Techniques Don’t Work

By Dick Weisinger

The success of an AI algorithm can often be attributed to extensive training based on massive amounts of data. Data collected from individuals is often sanitized in order to protect the privacy of the people from whom the data was collected. The idea is that anonymized data cannot be associated with any particular person.

A number of studies now suggest that the goal of data anonymization is elusive. Anonymization typically works by substituting fake values for attributes such as names and addresses, removing some attributes entirely, or releasing only a portion of the data set.
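To make those three techniques concrete, here is a minimal Python sketch. It is only an illustration: the function name `anonymize`, its parameters, and the record fields are hypothetical and do not come from any study or tool mentioned in this article.

```python
import random

def anonymize(records, fake_fields, drop_fields, sample_rate, seed=0):
    """Toy 'release' of a data set (hypothetical helper, for illustration only).

    - fake_fields: attributes whose values are replaced with made-up tokens
    - drop_fields: attributes removed entirely
    - sample_rate: probability that any given record is released
    """
    rng = random.Random(seed)
    pseudonyms = {}  # maps (field, real value) -> fake token
    released = []
    for record in records:
        # Release only part of the data set.
        if rng.random() >= sample_rate:
            continue
        row = {}
        for key, value in record.items():
            if key in drop_fields:
                continue  # remove the attribute
            if key in fake_fields:
                # Substitute a consistent fake value for the real one.
                token = pseudonyms.setdefault(
                    (key, value), f"{key}_{len(pseudonyms) + 1}"
                )
                row[key] = token
            else:
                row[key] = value  # left untouched in the release
        released.append(row)
    return released

people = [
    {"name": "Alice", "address": "1 Main St", "zip": "02139", "birth_year": 1980},
    {"name": "Bob", "address": "2 Oak Ave", "zip": "94105", "birth_year": 1975},
]
release = anonymize(people, fake_fields={"name"}, drop_fields={"address"}, sample_rate=1.0)
```

Note that attributes such as `zip` and `birth_year` pass through unchanged. Attributes like these are what the re-identification research discussed below focuses on.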

Dr. Yves-Alexandre de Montjoye, assistant professor at Imperial College London, said that “companies and governments have downplayed the risk of re-identification by arguing that the data sets they sell are always incomplete. Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.” The study found that 99.98% of all Americans could be correctly re-identified from any dataset with 15 demographic attributes.
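The intuition behind that figure is that each additional attribute narrows the set of people a record could belong to. The sketch below is not the study's method; it is a simple count, on made-up toy records, of how many rows become unique as more attributes are combined. The function name `unique_fraction` and the sample data are assumptions for illustration.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of the given attributes is unique."""
    keys = [tuple(r[attr] for attr in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0

# Made-up records with no names or addresses, only demographic attributes.
toy_records = [
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02139", "birth_year": 1980, "gender": "M"},
    {"zip": "02139", "birth_year": 1975, "gender": "F"},
    {"zip": "94105", "birth_year": 1980, "gender": "F"},
    {"zip": "94105", "birth_year": 1980, "gender": "F"},
]

for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(attrs, unique_fraction(toy_records, attrs))
```

On this toy data the unique share rises from 0.0 with one attribute to 0.2 with two and 0.6 with three. A unique record in a released sample is not automatically a correct re-identification, since someone outside the sample may share the same attributes; estimating that likelihood is the question de Montjoye's quote refers to.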

A paper co-authored by de Montjoye concluded that “even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”

2 comments on “Data Anonymity: Current Techniques Don’t Work”
  1. Paul Francis says:

    It is a funny title when you consider that census bureaus have been releasing anonymized data for decades about billions of individuals, and yet I have been unable to find a single example of a malicious re-identification of this data.

    I can’t think of a single other security technology with that track record. So in what sense is it that anonymity is not working???
