Vox: The tricky truth about how generative AI uses your data

July 27, 2023

Simply put, generative AI systems need as much data as possible to train on. The more they get, the better they can generate approximations of how humans sound, look, talk, and write. The internet provides massive amounts of data that’s relatively easy to gobble up through web scraping tools and APIs. But that gobbling process doesn’t distinguish between copyrighted works or personal data; if it’s out there, it takes it.

“In the absence of meaningful privacy regulations, that means that people can scrape really widely all over the internet, take anything that is ‘publicly available’ — that top layer of the internet for lack of a better term — and just use it in their product,” said Ben Winters, who leads the Electronic Privacy Information Center’s AI and Human Rights Project and co-authored its report on generative AI harms.