Web Scraping

Background

Scrapers invade our privacy when they collect personal information from social media profiles and other websites and use the data in ways we did not consent to, like building facial recognition databases and scoring systems.

Web scraping is the automated collection of information from websites. Much of this scraping does not harm our privacy. Journalists and researchers scrape data about corporations and the government to expose misconduct or shed light on obscure processes. Corporations scrape data from other corporations to keep tabs on their competitors or to create one-stop shops where consumers can compare prices on goods and services.

But when scrapers collect personal information from social media profiles and other websites, they take control of our data, and the privacy harms that result can be profound. We should not have to choose between having an online presence and being placed in a facial recognition database, which is why we need to regulate the scraping of personal information.

The Privacy Interest In Publicly Available Data

The internet is an essential part of today’s society, and an online presence of some kind is practically a necessity. We often make information like our names and photos viewable to the public so that people can find us. Many of us are expected, as professionals, to be on networking sites such as LinkedIn, where we sometimes disclose our names, photos, cities, work history, and education. Students are expected to be on Facebook or Instagram to hear about events on campus, and these profiles may make our names, photos, and other information publicly viewable. Many peer-to-peer services also involve some sort of public profile, like our names and photos on Venmo that help friends and family find the right account to pay.

When we make information available for the public to view on social media or the web, we do not expect or intend that others will take that information and do with it as they please. We expect that our data will only be used for purposes we choose, and that the privacy controls that we select for the data will be respected.

Privacy Harms From Web Scraping

Scrapers do not adhere to the privacy policies of the websites they scrape, nor do they ask our permission to take our data or process it. When our personal information is scraped, we lose control of that information. We can no longer limit who can view the information or what it is used for, and we can no longer delete it. The information can be used in ways we never intended or consented to when we posted it.

Scraped personal information might be:

  • Combined and/or enriched with data from other sources to create detailed profiles on individuals;
  • Sold to data brokers, scammers, or governments;
  • Used to score us or otherwise make decisions about us;
  • Used to create biometric profiles, like Clearview AI’s facial recognition database, which is built entirely from scraped photographs.

Preventing these harms requires regulating the collection and use of publicly available personal information on the internet. It also requires companies that host personal data on their websites to actively monitor automated scraping on their servers and to stop scrapers before they cause extensive harm.

EPIC’s Work

EPIC is leading the fight for a comprehensive federal privacy law and a federal data protection agency, which would regulate the collection and processing of data obtained through web scraping. EPIC has also participated as a friend of the court in hiQ v. LinkedIn to represent the interests of the public against the scraping of their personal information from LinkedIn.

Recent Documents on Web Scraping

  • Amicus Briefs

    McCarthy v. Amazon

    US Court of Appeals for the Ninth Circuit

  • Amicus Briefs

    Van Buren v. United States

    US Supreme Court

    Whether a police officer "exceeds authorized access" under the Computer Fraud and Abuse Act when they access personal information in a government database for an improper purpose.

  • Amicus Briefs

    LinkedIn Corp. v. hiQ Labs, Inc.

    US Supreme Court

    Whether a court can compel an internet company to provide third-party data scrapers access to user data.
