Web scraping is the automated collection of information from websites. Much scraping does not harm our privacy. Journalists and researchers scrape data about corporations and the government to expose misconduct or shed light on obscure processes. Corporations scrape data from other corporations to keep tabs on their competitors or to create one-stop shops for consumers to compare prices on goods and services.
But when scrapers collect personal information from social media profiles and other websites, they take control of our data, and the privacy harms that result can be profound. We should not have to choose between having an online presence and being placed in a facial recognition database, which is why we need to regulate the scraping of personal information.
The Privacy Interest In Publicly Available Data
The internet is an essential part of today’s society, and an online presence of some kind is practically a necessity. We often make details such as our name and photo viewable to the public so that people can find us. Many professionals are expected to be on networking sites such as LinkedIn, where we sometimes disclose our names, photos, cities, work history, and education. Students are expected to be on Facebook or Instagram to hear about events on campus, and these profiles may make our names, photos, and other information publicly viewable. Many peer-to-peer services also involve some sort of public profile, such as the names and photos on Venmo that help friends and family find the right account to pay.
When we make information available for the public to view on social media or the web, we do not expect or intend that others will take that information and do with it as they please. We expect that our data will only be used for purposes we choose, and that the privacy controls that we select for the data will be respected.
Privacy Harms From Web Scraping
Scrapers do not adhere to the privacy policies of the websites they scrape, nor do they ask our permission to take our data or process it. When our personal information is scraped, we lose control of that information. We can no longer limit who can view the information or what it is used for, and we can no longer delete it. The information can be used in ways we never intended or consented to when we posted it.
Scraped personal information might be:
- Combined and/or enriched with data from other sources to create detailed profiles on individuals;
- Sold to data brokers, scammers, or governments;
- Used to score us or otherwise make decisions about us;
- Used to create biometric profiles, like Clearview AI’s facial recognition database, which is built entirely from scraped photographs.
Preventing these harms requires regulating the collection and use of publicly available personal information on the internet. It also requires companies that host personal data on their websites to actively monitor automated scraping on their servers and to stop scrapers before they cause extensive harm.
EPIC is leading the fight for a comprehensive federal privacy law and a federal data protection agency, which would regulate data collection and processing from web scraping. EPIC has also participated as a friend of the court in hiQ v. LinkedIn to represent the interests of the public against the scraping of their personal information from LinkedIn.
Recent Documents on Web Scraping
US Supreme Court
Whether a police officer "exceeds authorized access" under the Computer Fraud & Abuse Act when they access personal information in a government database for an improper purpose.
US Supreme Court
Whether a court can compel an internet company to provide third-party data scrapers access to user data.