
How to Secure Data When Computing with PAC Privacy

By Dick Weisinger

Data privacy is a major concern for many applications of machine learning, especially when sensitive information such as medical records or personal images is involved. How can we share useful models without revealing the data they were trained on?

A new technique developed by MIT researchers could offer a solution. It is called Probably Approximately Correct (PAC) Privacy, and it allows users to automatically determine the minimal amount of noise that needs to be added to a model to protect the data from adversaries.

Unlike other privacy approaches, PAC Privacy does not require knowledge of the model’s architecture or training process. It only focuses on the output of the model and how hard it would be for an adversary to reconstruct any part of the data from it.
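The black-box idea can be illustrated with a minimal sketch. It is a simplified illustration and not the researchers' algorithm: it treats the model as an opaque function, runs it on random half-samples of the data, measures how much the output varies, and scales Gaussian noise to that spread. The names `output_spread`, `noisy_release`, and `noise_multiplier` are hypothetical, and the multiplier is an arbitrary knob rather than a privacy parameter taken from the paper.

```python
import random
import statistics

def output_spread(mechanism, data, trials=200, seed=0):
    """Estimate how much a black-box mechanism's scalar output varies
    when it is run on random half-samples of the data."""
    rng = random.Random(seed)
    half = len(data) // 2
    outputs = [mechanism(rng.sample(data, half)) for _ in range(trials)]
    return statistics.pstdev(outputs)

def noisy_release(mechanism, data, noise_multiplier=1.0, trials=200, seed=0):
    """Return the mechanism's output plus Gaussian noise scaled to the
    measured spread, along with the noise scale that was used.
    noise_multiplier is a hypothetical knob chosen for illustration."""
    rng = random.Random(seed + 1)
    sigma = noise_multiplier * output_spread(mechanism, data, trials, seed)
    return mechanism(data) + rng.gauss(0.0, sigma), sigma

# Hypothetical example: the "model" is just the mean of some readings.
readings = [float(x) for x in range(100)]
released, sigma = noisy_release(statistics.fmean, readings)
```

Nothing in the sketch inspects how `mechanism` works internally, which is the point: only its outputs are observed. A mechanism whose output barely changes across subsamples would receive very little noise under this scheme.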

For example, if the data are images of human faces, PAC Privacy could measure whether an adversary could extract a recognizable silhouette of a face from the model, rather than just whether they could tell if a face was in the dataset or not.

The user can specify their desired level of confidence and accuracy for the privacy guarantee. For instance, they may want to ensure that an adversary will not be more than 1% confident that they have successfully reconstructed the data to within 5% of its actual value. The PAC Privacy algorithm will then tell the user the optimal amount of noise that needs to be added to the model before it is shared publicly.
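To make the "within 5% of its actual value" criterion concrete, here is a small simulation sketch. It assumes a single naive adversary who simply takes the noisy released value as the guess, and it counts how often that guess lands within the tolerance. The real guarantee is meant to hold against any adversary, so this only illustrates how more noise lowers the success rate. The function names and the values used are hypothetical.

```python
import random

def reconstruction_success_rate(true_value, released_values, tolerance=0.05):
    """Fraction of released values that land within `tolerance`
    (relative) of the true value, treating each released value as a
    naive adversary's guess."""
    hits = sum(
        1 for guess in released_values
        if abs(guess - true_value) <= tolerance * abs(true_value)
    )
    return hits / len(released_values)

def simulate(true_value, sigma, trials=10_000, seed=0):
    """Release true_value plus Gaussian noise `trials` times and report
    how often the naive guess falls within the 5% tolerance."""
    rng = random.Random(seed)
    releases = [true_value + rng.gauss(0.0, sigma) for _ in range(trials)]
    return reconstruction_success_rate(true_value, releases)

# Hypothetical secret value of 100.0 released under two noise levels.
low_noise = simulate(100.0, sigma=1.0)
high_noise = simulate(100.0, sigma=500.0)
```

With little noise the naive guess almost always falls inside the tolerance band, while with heavy noise it rarely does. The user's confidence target, such as 1%, corresponds to choosing a noise level that pushes this kind of success rate below the target.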

The researchers show that PAC Privacy can significantly reduce the amount of noise needed to protect sensitive data, compared to other methods. This could help preserve the accuracy and utility of machine-learning models in real-world settings, while still ensuring data privacy.

PAC Privacy is a new framework that exploits the uncertainty, or entropy, already present in the data. It could enable engineers and scientists to share their models with confidence, without compromising the privacy of their data sources.
