Big Data and the Future of Privacy

Big data is a term for collecting data sets that are large and complex and then analyzing them for relationships. The size of these data sets prevents traditional methods of data analysis from being effective. Rather than focusing on precise relationships between individual pieces of data, big data uses various algorithms and techniques to infer general trends over the entire set. What counts is the quantity of the data rather than its quality. It looks for correlation rather than causation, the what rather than the why.

Big data has only become possible in the last few years, with advances in the collection, storage, and interpretation of data. Datafication refers to reinterpreting information into usable sets of data. Data collection, from medicine, financial institutions, social networking, and many other fields, has exploded in recent years. Storage costs for this data have plummeted, which lowers the justification required for holding onto data rather than discarding it. The cost and difficulty of processing this data have also dropped. These factors, along with better techniques for analyzing the data, have allowed relationships to be discovered in ways that would not have been possible in years past.

While the growth of big data analytics brings many benefits, many traditional methods of privacy protection fail. Many notions of privacy rely on informed consent for the disclosure and use of an individual’s private data. However, big data means that data is a resource that can be used and reused, often in ways that were inconceivable at the time the data was collected. Anonymity is also eroded in a big data paradigm. Even if every individual piece of information is stripped of personal information, the relationships between the individual pieces can reveal the individual's identity.
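The erosion of anonymity described above can be illustrated with a minimal sketch of a linkage attack. Everything in it is an assumption made for illustration: the records are invented, the field names (`zip`, `birth_date`, `sex`) are hypothetical, and real re-identification work typically uses fuzzier matching than the exact join shown here. The sketch only shows how a data set with names removed can be matched against a second, public data set that shares a few seemingly harmless attributes.

```python
# Hypothetical, made-up records for illustration only.
# The "anonymized" set has no names, but keeps a few ordinary attributes.
anonymized_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1972-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-11-02", "sex": "F", "diagnosis": "diabetes"},
]

# A second, public set (for example, a directory) that includes names.
public_records = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1972-03-14", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_date": "1980-11-02", "sex": "F"},
]

# Attributes that are not identifying alone but can be in combination.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where the key combination matches
    exactly one person in the public set."""
    index = {}
    for person in public:
        combo = tuple(person[k] for k in keys)
        index.setdefault(combo, []).append(person["name"])

    matches = []
    for record in anonymized:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record))
    return matches


reidentified = link_records(anonymized_records, public_records)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

In this invented example, two of the three "anonymous" records are tied back to a name. The third is not, because its ZIP code does not match any public record exactly.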

Obama Administration Big Data Review

Following the President's speech on reform of the National Security Agency's bulk metadata collection program under Section 215 of the USA Patriot Act, White House counselor John Podesta announced "a comprehensive review of the way that 'big data' will affect the way we live and work; the relationship between government and citizens; and how public and private sectors can spur innovation and maximize the opportunities and free flow of this information while minimizing the risks to privacy." This was the first major privacy initiative announced by the White House since the release of the Consumer Privacy Bill of Rights in 2012. The undertaking will involve key officials across the federal government, including the President’s Science Advisor and the President's Council of Advisors on Science and Technology.

EPIC and a coalition of consumer groups have already written a letter to John Holdren, the Director of the Office of Science and Technology Policy. EPIC urged OSTP to provide the public an opportunity to comment and suggested that the review take into consideration (but not be limited to) the following important questions about the role of Big Data in our society:

(1) What potential harms arise from big data collection and how are these risks currently addressed?

(2) What are the legal frameworks currently governing big data, and are they adequate?

(3) How could companies and government agencies be more transparent in the use of big data, for example, by publishing algorithms?

(4) What technical measures could promote the benefits of big data while minimizing the privacy risks?

(5) What experience have other countries had trying to address the challenges of big data?

(6) What future trends concerning big data could inform the current debate?

Public Comments by EPIC and Other Advocacy Groups

On March 4, 2014, in response to suggestions from EPIC and other consumer privacy groups, the Office of Science and Technology Policy published a Request for Information, which provides the public an opportunity to comment on the Podesta Big Data Review. EPIC submitted comments to the Podesta Review, emphasizing how the current Big Data environment poses enormous risks to ordinary Americans. EPIC highlighted the data security risks and the substantial risks to student privacy that exist in the current big data regulatory environment and called for the Administration to better implement the Fair Information Practices (FIPs) first set out in 1973.

Other groups that submitted comments include the Center for Democracy and Technology, the Future of Privacy Forum, the Privacy Coalition, the Internet Association, the Consumer Federation of America, and the Federation of American Societies for Experimental Biology.

On May 1, 2014, the White House released the Big Data Privacy Report. The President's Council of Advisors on Science and Technology ("PCAST") released a report on the same day, titled "Big Data and Privacy: A Technological Perspective."

Data Brokers

Data brokers are large commercial organizations that collect vast swathes of data on millions, and sometimes hundreds of millions, of consumers in order to resell the data or use it in targeted marketing campaigns. Recently, the data broker industry as a whole has come under a great deal of scrutiny from the Federal Trade Commission and the Senate Commerce Committee. FTC Commissioner Julie Brill has announced a new initiative, "Reclaim Your Name", which is designed to promote more transparency in the data broker industry and give consumers greater control over their individual data. The Senate Commerce Committee, under the leadership of Senator Jay Rockefeller (D-WV), undertook an examination of the data broker industry this past December, holding hearings on the issue and releasing a report of its findings, A Review of the Data Broker Industry: Collection, Use, and Sale of Consumer Data for Marketing Purposes.

Most recently, Senator Rockefeller, along with Senator Ed Markey (D-MA), released a bill entitled The Data Broker Accountability and Transparency Act. This act is designed to provide some broad guidelines for regulating the data broker industry.

Big Data Statistics

  • Google is more than 1 million petabytes in size and processes more than 24 petabytes of data a day, a volume that is thousands of times the quantity of all printed material in the U.S. Library of Congress.
  • 32 billion searches are performed each month on Twitter.
  • More than 1 billion unique users visit YouTube each month and over 6 billion hours of video are watched each month on YouTube - that's almost an hour for every person on Earth, and 50% more than last year.
  • 90 percent of the data in the world today has been created in the past two years.
  • In 2012, data was forecast to double every two years through the year 2020.
  • In 2020, the amount of digital data produced will exceed 40 zettabytes, which is the equivalent of 5,200 gigabytes for every man, woman and child on Earth.
  • * 1 Gigabyte = approximately 1 full-length feature film in digital format; 1 Petabyte = One Million Gigabytes or a Quadrillion Bytes; 1 Exabyte = One Billion Gigabytes; 1 Zettabyte = One Trillion Gigabytes or One Million Petabytes.
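
The per-person figure in the list above can be checked with the unit conversions given in the footnote. The short calculation below is only a rough sanity check: the world population value is an assumption of about 7.7 billion people, which the source does not state, and the variable names are illustrative.

```python
# Unit conversion taken from the footnote: 1 zettabyte = one trillion gigabytes.
GIGABYTES_PER_ZETTABYTE = 10**12

# Assumption (not stated in the source): roughly 7.7 billion people in 2020.
ASSUMED_WORLD_POPULATION = 7_700_000_000

total_gigabytes = 40 * GIGABYTES_PER_ZETTABYTE
per_person_gigabytes = total_gigabytes / ASSUMED_WORLD_POPULATION

# With this assumed population, the result lands close to 5,200 GB per person.
print(f"{per_person_gigabytes:,.0f} GB per person")
```

Under that population assumption, 40 zettabytes works out to a little under 5,200 gigabytes per person, which is consistent with the figure quoted in the list.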
