Web Scraping

Background

Scrapers invade our privacy when they collect personal information from social media profiles and other websites and use the data in ways we did not consent to, like building facial recognition databases and scoring systems.

Web scraping is the automated collection of information from websites. A lot of the scraping that happens does not harm our privacy. Journalists and researchers scrape data about corporations and the government to expose misconduct or shed light on obscure processes. Corporations scrape data from other corporations to keep tabs on their competitors or to create one-stop shops for consumers to compare prices on goods and services.

But when scrapers collect personal information from social media profiles and other websites, they take control of our data, and the privacy harms that result can be profound. We should not have to choose between having an online presence and being placed in a facial recognition database, which is why we need to regulate the scraping of personal information.

The Privacy Interest In Publicly Available Data

The internet is an essential part of today’s society, and an online presence of some kind is practically a necessity. We often make details such as our names and photos viewable to the public so that people can find us. Many professionals are expected to be on networking sites such as LinkedIn, where we sometimes disclose our names, photos, cities, work history, and education. Students are expected to be on Facebook or Instagram to hear about events on campus, and these profiles may make our names, photos, and other information publicly viewable. Many peer-to-peer services also involve some sort of public profile, like the names and photos on Venmo that help friends and family find the right account to pay.

When we make information available for the public to view on social media or the web, we do not expect or intend that others will take that information and do with it as they please. We expect that our data will only be used for purposes we choose, and that the privacy controls that we select for the data will be respected.

Privacy Harms From Web Scraping

Scrapers do not adhere to the privacy policies of the websites they scrape, nor do they ask our permission to take our data or process it. When our personal information is scraped, we lose control of that information. We can no longer limit who can view it or what it is used for, and we can no longer delete it. The information can be used in ways we never intended or consented to when we posted it.

Scraped personal information might be:

Combined with or enriched by data from other sources to create detailed profiles of individuals;
Sold to data brokers, scammers, or governments;
Used to score us or otherwise make decisions about us;
Used to create biometric profiles, like Clearview AI’s facial recognition database, which is built entirely from scraped photographs.

Preventing these harms requires regulating the collection and use of publicly available personal information on the internet. It also requires companies that host personal data on their websites to actively monitor automated scraping on their servers and to stop scrapers before they cause extensive harm.
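
As one illustration of what monitoring for automated scraping could look like on a host's servers, the sketch below counts each client's requests in a sliding time window and flags clients that exceed a limit. It is a minimal example under stated assumptions, not a description of any company's actual defenses: the class name ScrapeMonitor, the thresholds, and the use of an IP address as the client identifier are all hypothetical, and scrapers that rotate addresses would evade a check this simple.

```python
from collections import defaultdict, deque


class ScrapeMonitor:
    """Hypothetical sliding-window counter that flags unusually fast clients."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def record(self, client_id, timestamp):
        """Record one request; return True if the client exceeded the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard timestamps that have aged out of the window.
        while hits and timestamp - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Example: allow at most 3 requests per 10 seconds from one client.
monitor = ScrapeMonitor(max_requests=3, window_seconds=10.0)
flags = [monitor.record("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 20.0)]
print(flags)  # [False, False, False, True, False]
```

The fourth request is flagged because it is the fourth within ten seconds. By the fifth, the earlier requests have aged out of the window, so the client is no longer over the limit. A flag like this would only be a starting signal for a host to investigate or throttle a client, not proof of scraping.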

EPIC’s Work

EPIC is leading the fight for a comprehensive federal privacy law and a federal data protection agency, which would regulate the collection and processing of data obtained through web scraping. EPIC has also participated as a friend of the court in hiQ v. LinkedIn to represent the interests of the public against the scraping of their personal information from LinkedIn.

Top Updates

EPIC Urges the NTIA to Tackle Privacy Harms, Bias, and Regulatory Hurdles in New Comment on AI Model Openness

March 28, 2024

EPIC Submits Amicus Brief Urging Ninth Circuit to Permit Product Liability Claims Against Amazon in Case About the Sale of Suicide Chemicals to Minors

December 12, 2023

EU Parliament Approves AI Act, Urges Rejection of Transatlantic Data Framework

May 12, 2023

Recent Documents on Web Scraping

EPIC Comments to UK ICO Call for Views on “Consent or Pay” Business Models

McCarthy v. Amazon

Van Buren v. United States

LinkedIn Corp. v. hiQ Labs, Inc.
