The success of an AI algorithm can often be attributed to extensive training based on massive amounts of data. Data collected from individuals is often sanitized in order to protect the privacy of the people from whom the data was collected. The idea is that anonymized data cannot be associated with any particular person.
Data anonymization works by substituting fake values for attributes such as names and addresses, removing some attributes, or releasing only part of the data set. A number of studies now suggest, however, that the goal of anonymization is elusive.
Dr. Yves-Alexandre de Montjoye, an assistant professor at Imperial College London, said that “companies and governments have downplayed the risk of re-identification by arguing that the data sets they sell are always incomplete. Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.” The study found that 99.98% of Americans could be correctly re-identified in any dataset using 15 demographic attributes.
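The intuition behind such re-identification attacks can be illustrated with a toy sketch (this is not the study's statistical model, and all records and attribute names below are invented for illustration): an attacker who knows a few demographic facts about a target simply filters the "anonymized" records down to the ones consistent with that side knowledge.

```python
# Toy illustration of re-identification by attribute matching.
# The dataset and attribute names are made up; a real attack, as in
# de Montjoye's work, models this statistically over many attributes.

def matches(record, known):
    """True if every attribute the attacker knows agrees with the record."""
    return all(record.get(k) == v for k, v in known.items())

def reidentify(records, known):
    """Return the records consistent with the attacker's side knowledge."""
    return [r for r in records if matches(r, known)]

# A tiny "anonymized" dataset: names removed, demographics retained.
records = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "children": 2},
    {"zip": "02139", "birth_year": 1985, "sex": "M", "children": 0},
    {"zip": "02139", "birth_year": 1990, "sex": "F", "children": 1},
    {"zip": "10001", "birth_year": 1985, "sex": "F", "children": 2},
]

# Knowing only zip code and birth year leaves two candidate records...
print(len(reidentify(records, {"zip": "02139", "birth_year": 1985})))  # 2
# ...but one more attribute narrows the match to a single individual.
print(len(reidentify(records, {"zip": "02139", "birth_year": 1985, "sex": "F"})))  # 1
```

Each additional attribute the attacker knows shrinks the candidate set, which is why a set of 15 demographic attributes is enough to single out nearly everyone.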
A paper co-authored by de Montjoye concluded that “even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”