United States v. Wilson

Whether the Fourth Amendment permits constant scanning of images uploaded to Google with corresponding reports automatically sent to law enforcement, absent evidence establishing that the underlying algorithm is accurate and reliably detects only contraband images

Summary

Google has developed a proprietary image matching algorithm that the company uses to scan every file uploaded to Google's services for alleged child pornography. The algorithm takes in an image file, processes the data, and returns an alphanumeric string, called a "hash value", that Google then tries to match to a repository of hash values corresponding to images it has flagged as child pornography. When Google's software detects a match, the company sends a report to the National Center for Missing and Exploited Children (NCMEC), including the user's personal data such as their IP address and secondary email address. NCMEC then gathers even more personal data about the Google user to send to law enforcement. Reporting is often automatic, such that no Google employee checks whether the matched file is, in fact, contraband. In this case, Defendant's file uploads were flagged as child pornography and automatically reported to NCMEC. Defendant challenged use of the evidence at trial as a violation of his Fourth Amendment right against unreasonable search. The District Court denied the motion to suppress, finding that Google conducted a private search, and that police did not expand that search. Defendant appealed to the Ninth Circuit. EPIC also filed an amicus brief in a similar case in the Sixth Circuit, United States v. Miller.

Background

Legal Background

The Fourth Amendment protects only against searches by the government, not searches by private entities. In United States v. Jacobsen, 466 U.S. 109, 131 (1984), the Supreme Court decided that a government search that follows a private search and stays within the scope of that private search is reasonable. The Court held that the Government’s warrantless inspection and testing of the contents of a package that had previously been searched by FedEx was permissible because “there was a virtual certainty” that the law enforcement officer’s search would not reveal “anything more than he had already been told.”

The question in this case is whether the Government has provided sufficient evidence to establish a "virtual certainty" that the files Google sent to NCMEC in a CyberTipline report, and that police ultimately opened, were the same as those a Google employee had previously viewed.

Factual Background

Google maintains a proprietary image matching system that automatically scans files uploaded to Google products, including Gmail, to search for child pornography. The defendant uploaded two images to Google's e-mail system, which flagged the images as "apparent child pornography." Google's system then automatically generated and submitted “CyberTip Report # 5778397” to the National Center for Missing and Exploited Children (“NCMEC”) with the following information:

  • the date and time of the incident;
  • the e-mail address associated with the user account that uploaded the file;
  • the IP address associated with the upload;
  • a list of IP addresses used to access the user account (which can go as far back as the original account registration date);
  • the secondary email address associated with the account;
  • the filename;
  • the "categorization" of the image based on an existing rubric; and
  • copies of the image file(s).

Google was required by law to submit this CyberTipline report once it became aware of apparent child pornography. 18 U.S.C. § 2258A.

When NCMEC received Google's CyberTipline report, NCMEC staff initiated a web search for the email and IP addresses associated with the report without opening the images sent by Google to confirm that they were contraband. NCMEC identifies information associated with the user’s IP address(es): Country, Region, City, Metro Code, Postal Code, Area Code, Latitude/Longitude, and Internet Service Provider or Organization. NCMEC staff also collect "data gathered from searches on publicly-available, open-source websites" using the account and user identifying information provided by the CyberTipline report. This information can include social media profiles, websites, addresses, and other personal data.

After NCMEC staff collected this information on the defendant, the report was referred to local police for potential investigation. A detective opened the images attached to the CyberTipline report and confirmed they were child pornography.

What is known about Google’s use of the hashing technology is described in the declaration of Cathy McGoff, a Senior Manager for Law Enforcement and Information Security at Google:

4. Based on [Google’s] private non-government interests, since 2008, Google has been using its own proprietary hashing technology to tag confirmed child sexual abuse images. Each offending image, after it is viewed by at least one Google employee is given a digital fingerprint (“hash”) that our computers can automatically recognize and is added to our repository of hashes of apparent child pornography as defined in 18 USC § 2256. Comparing these hashes to hashes of content uploaded to our services allows us to identify duplicate images of apparent child pornography to prevent them from continuing to circulate on our products.

5. We also rely on users who flag suspicious content they encounter so we can review it and help expand our database of illegal images. No hash is added to our repository without a corresponding image first having been visually confirmed by a Google employee to be apparent child pornography.

6. Google trains a team of employees on the legal obligation to report apparent child pornography. The team is trained by counsel on the federal statutory definition of child pornography and how to recognize it on our products and services. Google makes reports in accordance with that training.

7. When Google’s product abuse detection system encounters a hash that matches a hash of a known child sexual abuse image, in some cases Google automatically reports the user to NCMEC without re-reviewing the image. In other cases, Google undertakes a manual, human review, to confirm that the image contains apparent child pornography before reporting it to NCMEC.

While Google describes its algorithm as assigning each image in its repository a "digital fingerprint," the declaration does not identify the type of hash function Google uses to assign this "digital fingerprint." This is important because file hashing functions work differently than image hashing functions. A file hashing function creates a hash value that is, for practical purposes, unique to a file, and changing even one bit of data will change the file's hash value. File hashing is thus a method of demonstrating that two files are the same, bit-for-bit, without comparing each bit to the corresponding bit of the other file, which is time and resource intensive. In contrast, image hashing algorithms provide a way to match images even if they have been altered slightly, and so by design they also match files that do not have the same file hash values.
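
The difference between the two kinds of hashing can be illustrated with a minimal sketch. It does not represent Google's proprietary algorithm, which is not public. As assumptions made only for illustration, SHA-256 stands in for a file hashing function, a toy "average hash" stands in for an image hashing function, and the 4x4 grayscale pixel grids and all function names are hypothetical.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """File hash: SHA-256 digest of the raw bytes (illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Toy image hash: one bit per pixel, set when the pixel is
    brighter than the mean brightness of the whole image."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two image hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same image with every pixel brightened slightly.
altered = [[min(255, p + 3) for p in row] for row in original]

original_bytes = bytes(p for row in original for p in row)
altered_bytes = bytes(p for row in altered for p in row)

# The file hashes differ because the underlying bytes differ.
file_hashes_match = file_hash(original_bytes) == file_hash(altered_bytes)

# The image hashes are identical because the light/dark pattern is unchanged.
image_distance = hamming_distance(average_hash(original), average_hash(altered))

print("file hashes match:", file_hashes_match)
print("image hash distance:", image_distance)
```

In this sketch the altered image no longer matches the original under the file hash, but it is still treated as the same image under the image hash. A match under an image hash therefore does not by itself show that two files are bit-for-bit identical.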

Procedural History

The Defendant filed a motion to suppress the email, its attachments, and all other evidence obtained subsequently. He argued that Google acted as a government agent in this case and that its scan was therefore an unreasonable warrantless search under the Fourth Amendment. The Defendant also argued that the Detective's search exceeded the scope of Google's private search. The district court disagreed. In denying the motion to suppress, the district court found that Google's search was a private search and that the police did not exceed the scope of the private search because there was a "virtual certainty" that a Google employee had previously viewed the images before the police did so. The district court relied upon Google's representation that its algorithm assigns each image in its database a "digital fingerprint" that is "unique." Defendant was subsequently convicted on several counts and appealed to the U.S. Court of Appeals for the Ninth Circuit.

EPIC's Interest

EPIC seeks to ensure that Fourth Amendment protections keep pace with advances in technology. For instance, EPIC filed an amicus brief before the Supreme Court in Carpenter v. United States arguing that technological changes justified broader Fourth Amendment protections. The Court declined to extend the “third party doctrine” to permit the warrantless collection of cell site location information. Here, EPIC has an interest in ensuring that the Government does not conduct warrantless searches based on proprietary and potentially unreliable algorithmic search techniques.

This case also implicates questions about the standard of proof required to demonstrate the validity of a new investigative technique, an issue EPIC has advised the courts on previously. EPIC advised the Supreme Court about this issue as amicus curiae in Florida v. Harris, arguing that the government should bear the burden of establishing the reliability of investigative techniques in criminal cases.

EPIC has promoted algorithmic transparency for many years. EPIC has also litigated several cases where algorithms used to make decisions that impact individuals were withheld from the public.

EPIC filed an amicus brief in a similar case in the Sixth Circuit, United States v. Miller.

Legal Documents

U.S. Court of Appeals for the Ninth Circuit, No. 18-50440

U.S. District Court for the Southern District of California, No. 15-2838
