Security: Privacy with Data Anonymity
Data anonymity is the collection of data and the subsequent removal of any personally identifiable information from the data set. Anonymized data sets could then be published publicly without exposing personal information about the people from whom the data was collected.
While the goal of data anonymity is good, it has been difficult to achieve. There have been numerous cases in which data released as anonymized was later re-identified, often by combining it with other sources of publicly available information.
TechCrunch listed a few of these incidents:
- In 1996, identities in Massachusetts health records were exposed by matching the records against voter registration data.
- In 2006, Netflix movie viewing information was unmasked when combined with IMDb data.
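Both incidents follow the same pattern: a release with names removed still carries attributes that also appear in a public data set, and joining on those attributes restores the names. The sketch below illustrates the idea with invented toy records. The field names, the people, and the `link_records` helper are all hypothetical and are not taken from either incident.

```python
# Hypothetical toy data: an "anonymized" release with names removed,
# and a public record set that still carries names.
anonymized_health = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02140", "birth_date": "1980-11-02", "sex": "F", "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "Person A", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "Person B", "zip": "02139", "birth_date": "1962-03-14", "sex": "F"},
    # Two voters share the same attributes, so the third health record stays ambiguous.
    {"name": "Person C", "zip": "02140", "birth_date": "1980-11-02", "sex": "F"},
    {"name": "Person D", "zip": "02140", "birth_date": "1980-11-02", "sex": "F"},
]

# Attributes that are not names but can still single a person out (assumed set).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link_records(released, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for released rows that match exactly one public row."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link_records(anonymized_health, public_voter_roll)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy example two of the three records are re-identified even though no name was ever released. The third is protected only because two people happen to share the same attributes.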
One of the most secure ways to achieve data anonymity is a technique known as differential privacy. Developed by Cynthia Dwork at Microsoft Research, it is now applied to data managed by Amazon, Facebook, Apple, and other large tech companies.
Differential privacy adds a controlled amount of randomness to the data, which prevents individual records from being revealed even when the data is combined with other sources or background knowledge. When the data is queried, the analysis accounts for the inherent randomness and treats it as noise: the noise largely cancels out in aggregate results, while any single record remains obscured.
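One classic way to see how randomness can be added to each record and still be corrected for in aggregate is randomized response, a simple scheme that satisfies differential privacy. The sketch below is only an illustration under assumed parameters (a fair coin, a simulated population with a 30% true rate). It is not a description of how any of the companies named above implement the technique, and the function names are hypothetical.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the true answer half the time, otherwise report a random answer."""
    if rng.random() < 0.5:
        return truth             # first coin: answer honestly
    return rng.random() < 0.5    # second coin: answer yes/no at random


def estimate_true_rate(responses) -> float:
    """Undo the known noise: E[observed] = 0.5 * p + 0.25, so p = 2 * observed - 0.5."""
    observed = sum(responses) / len(responses)
    return 2 * observed - 0.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated sensitive attribute; roughly 30% of people have it (assumed value).
population = [rng.random() < 0.30 for _ in range(100_000)]
actual_rate = sum(population) / len(population)

# Only the noisy answers are ever collected.
noisy_answers = [randomized_response(x, rng) for x in population]
estimate = estimate_true_rate(noisy_answers)

print(f"actual rate:    {actual_rate:.3f}")
print(f"estimated rate: {estimate:.3f}")
```

Any individual "yes" is deniable, because it may have come from the coin rather than the truth, yet the population-level rate can still be estimated closely once the known bias of the coin flips is subtracted.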
Aaron Roth, professor at the University of Pennsylvania, said that “you can dial up to perfect privacy, but then you can do almost nothing useful with the data, or you can go in the other direction and have no real protections. It’s a tradeoff, because privacy protections always come with a cost.”
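The dial Roth describes is usually called the privacy parameter, epsilon. As a rough sketch of the tradeoff, the code below applies the Laplace mechanism to a counting query and measures the average error at several epsilon values. The true count of 1000, the epsilon values, and the helper names are assumptions chosen for illustration only.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1, so scale = 1 / epsilon)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average distance between the noisy answer and the true answer."""
    rng = random.Random(seed)
    true_count = 1000  # assumed true answer to the query
    total = sum(abs(private_count(true_count, epsilon, rng) - true_count)
                for _ in range(trials))
    return total / trials


# Smaller epsilon means stronger privacy and noisier answers.
errors = {eps: mean_abs_error(eps) for eps in (0.01, 0.1, 1.0, 10.0)}
for eps, err in errors.items():
    print(f"epsilon={eps:<5} mean absolute error = {err:.2f}")
```

Under these assumptions the error grows roughly in proportion to 1/epsilon: turning the dial toward stronger privacy makes each answer less useful, and turning it the other way makes answers accurate but weakly protected.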