Differential Privacy
One of the key mechanisms for addressing the privacy risks of big data is differential privacy. A differentially private data set or algorithm is one that relies on controlled injections of statistical noise to prevent individuals from being identified and linked with their data. Differential privacy is not a single method or algorithm, but rather, as Prof. Cynthia Dwork explains, “a promise, made by a data holder, or curator, to a data subject: ‘You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources, are available.’”
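To make the idea of controlled noise injection concrete, the following is a minimal illustrative sketch (not drawn from any of the sources discussed here) of the classic Laplace mechanism applied to a single counting query. The function name laplace_count and the epsilon values are hypothetical choices for the example; a smaller epsilon means more noise and a stronger privacy guarantee.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing any one person's
    record changes the count by at most 1. Noise drawn from a Laplace
    distribution with scale 1/epsilon therefore satisfies
    epsilon-differential privacy for this query.
    """
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Smaller epsilon -> larger noise -> stronger privacy, lower accuracy.
for eps in (0.1, 1.0, 10.0):
    print(eps, round(laplace_count(1234, eps), 1))
```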
Differential privacy has been deployed in a wide range of contexts, including by Apple, Facebook, LinkedIn, Microsoft, and other major technology firms to protect certain types of personal data. But perhaps the best-known application is the U.S. Census Bureau’s use of differential privacy in its data products starting with the 2020 Census. By injecting controlled amounts of statistical noise into published census tables, the Bureau can produce useful data while simultaneously providing mathematical guarantees of privacy. Differential privacy allows the Bureau to protect against increasingly sophisticated reconstruction and reidentification attacks that threaten the confidentiality of individual census responses. As danah boyd has described, the potential harms of these attacks are significant.
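As a rough sketch of how noise might be injected into a published table of counts, the example below perturbs each cell of a small, disjoint histogram with Laplace noise. The cell names, counts, and epsilon value are invented for illustration, and the Census Bureau’s actual disclosure avoidance system is considerably more elaborate than this; the sketch only shows the basic idea of per-cell noise injection.

```python
import numpy as np

def noisy_table(counts: dict, epsilon: float, rng=None) -> dict:
    """Add Laplace noise to each cell of a table of disjoint population counts.

    Because each person falls into exactly one cell, adding or removing one
    record changes the table by at most 1 in total (L1 sensitivity of 1),
    so per-cell Laplace(1/epsilon) noise gives epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    return {cell: count + rng.laplace(0.0, 1.0 / epsilon)
            for cell, count in counts.items()}

# Hypothetical block-level counts; real census tables are far larger.
block_counts = {"age_0_17": 52, "age_18_64": 210, "age_65_plus": 38}
print(noisy_table(block_counts, epsilon=1.0))
```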
During the 2020 Census, Alabama filed a federal lawsuit challenging the Bureau’s deployment of differential privacy. EPIC filed an amicus brief in the case arguing that differential privacy is “the only credible technique” to guard against reidentification attacks. EPIC also argued that differential privacy “is not the enemy of statistical accuracy,” but rather “vital to securing robust public participation in Census Bureau surveys[.]” The court ultimately denied Alabama’s motion.