United States v. Wilson
- Appeals Court Questions Government on Reliability of Google Scanning Algorithm: This week a federal appellate judge pressed the government about the reliability of a Google scanning algorithm that provided the basis for the warrantless search of a private email. EPIC raised concerns about the scanning technique in an amicus brief for the appeals court. In United States v. Wilson, EPIC argued that "because neither Google nor the Government explained how the image matching technique actually works or presented evidence establishing accuracy and reliability, the Government's search was unreasonable." Judge Watford told the government attorney that he "would like to hear your defense of the evidentiary record" because what we have "is this declaration from the Google person," and "I would need far more explanation of how reliable the hash matching technology is before I could validate this search." EPIC filed an amicus brief in a similar case in United States v. Miller. EPIC routinely submits amicus briefs on the privacy implications of new investigative techniques. EPIC has also long promoted algorithmic transparency to ensure accountability for AI-based decision making. (Nov. 18, 2019)
- EPIC Warns Appellate Court of Google’s Flawed, Secretive, Massive File Scanning Program: EPIC has filed an amicus brief in United States v. Wilson, a case concerning Google’s scanning of billions of personal files for suspected unlawful content, at the behest of the federal government. EPIC argued that “because neither Google nor the Government explained how the image matching technique actually works or presented evidence establishing accuracy and reliability, the Government’s search was unreasonable.” EPIC also explained that “the lower court made a key mistake” by confusing file hashing, which uniquely identifies a file, and image matching, which is prone to false positives. Last year, EPIC filed an amicus brief in a similar case, United States v. Miller. EPIC has promoted algorithmic transparency for many years. EPIC routinely submits amicus briefs on the application of the Fourth Amendment to investigative techniques. (Mar. 29, 2019)
Google has developed a proprietary image matching algorithm that the company uses to scan every file uploaded to Google's services for alleged child pornography. The algorithm takes in an image file, processes the data, and returns an alphanumeric string, called a "hash value", that Google then tries to match to a repository of hash values corresponding to images it has flagged as child pornography. When Google's software detects a match, the company sends a report to the National Center for Missing and Exploited Children (NCMEC), including the user's personal data such as their IP address and secondary email address. NCMEC then gathers even more personal data about the Google user to send to law enforcement. Reporting is often automatic, such that no Google employee checks whether the matched file is, in fact, contraband. In this case, Defendant's file uploads were flagged as child pornography and automatically reported to NCMEC. Defendant challenged use of the evidence at trial as a violation of his Fourth Amendment right against unreasonable search. The District Court denied the motion to suppress, finding that Google conducted a private search, and that police did not expand that search. Defendant appealed to the Ninth Circuit. EPIC also filed an amicus brief in a similar case in the Sixth Circuit, United States v. Miller.
The Fourth Amendment only protects against searches by the government, not private entities. In United States v. Jacobsen, 466 U.S. 109, 131 (1984), the Supreme Court decided that government searches that follow private searches and are within the scope of the private search are reasonable. In Jacobsen, the Court held that the Government’s warrantless inspection and testing of the contents of a package that had been previously searched by FedEx was permissible because “there was a virtual certainty” that the law enforcement officer’s search would not reveal “anything more than he had already been told.”
The question in this case is whether the Government has provided sufficient evidence to establish a "virtual certainty" that the files Google sent to NCMEC in a CyberTipline Report, which police ultimately opened, were the same as those a Google employee had previously viewed.
Google maintains a proprietary image matching system that automatically scans files uploaded to Google products, including Gmail, to search for child pornography. The defendant uploaded two images to Google's e-mail system, which flagged the images as "apparent child pornography." The system then automatically generated and submitted “CyberTip Report # 5778397” to the National Center for Missing and Exploited Children (“NCMEC”) with the following information:
- the date and time of the incident;
- the e-mail address associated with the user account that uploaded the file;
- the IP address associated with the upload;
- a list of IP addresses used to access the user account (which can go as far back as the original account registration date);
- the secondary email address associated with the account;
- the filename;
- the "categorization" of the image based on an existing rubric; and
- copies of the image file(s).
Google was required by law to submit this CyberTipline report once it became aware of apparent child pornography. 18 U.S.C. § 2258A.
When NCMEC received Google's CyberTipline report, NCMEC staff initiated a web search for the email and IP addresses associated with the report without opening the images sent by Google to confirm that they were contraband. NCMEC identifies information associated with the user’s IP address(es): Country, Region, City, Metro Code, Postal Code, Area Code, Latitude/Longitude, and Internet Service Provider or Organization. NCMEC staff also collect "data gathered from searches on publicly-available, open-source websites" using the account and user identifying information provided by the CyberTipline report. This information can include social media profiles, websites, addresses, and other personal data.
After NCMEC staff collected this information on the defendant, the report was referred to local police for potential investigation. A detective opened the images attached to the CyberTipline report and confirmed they were child pornography.
The extent of what is known about Google’s practices in using the hashing technology is described in the declaration of Cathy McGoff, a Senior Manager for Law Enforcement and Information Security at Google:
4. Based on [Google’s] private non-government interests, since 2008, Google has been using its own proprietary hashing technology to tag confirmed child sexual abuse images. Each offending image, after it is viewed by at least one Google employee is given a digital fingerprint (“hash”) that our computers can automatically recognize and is added to our repository of hashes of apparent child pornography as defined in 18 USC § 2256. Comparing these hashes to hashes of content uploaded to our services allows us to identify duplicate images of apparent child pornography to prevent them from continuing to circulate on our products.
5. We also rely on users who flag suspicious content they encounter so we can review it and help expand our database of illegal images. No hash is added to our repository without a corresponding image first having been visually confirmed by a Google employee to be apparent child pornography.
6. Google trains a team of employees on the legal obligation to report apparent child pornography. The team is trained by counsel on the federal statutory definition of child pornography and how to recognize it on our products and services. Google makes reports in accordance with that training.
7. When Google’s product abuse detection system encounters a hash that matches a hash of a known child sexual abuse image, in some cases Google automatically reports the user to NCMEC without re-reviewing the image. In other cases, Google undertakes a manual, human review, to confirm that the image contains apparent child pornography before reporting it to NCMEC.
While Google describes its algorithm as assigning each image in its repository a "digital fingerprint," it provides no information on the type of hash function used to assign this "digital fingerprint." This is important because file hashing functions work differently than image hashing functions. A file hashing function creates a unique hash value for a file, and changing one bit of data will change the hash value of the file. File hashing is a method of demonstrating that two files are the same, bit-for-bit, without comparing each bit to the corresponding bit of the other file, which would consume considerable time and resources. In contrast, image hashing algorithms provide a way to match images even if they have been altered slightly, but by design they also match files that do not have the same file-hash values.
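The distinction can be illustrated with a minimal sketch. It is not Google's algorithm, which has not been disclosed. It assumes SHA-256 as a representative file hash and uses a toy "average hash" (one bit per pixel, set when the pixel is brighter than the image's mean) as a stand-in for image hashing. The function names and the three 4x4 grayscale images are hypothetical and exist only for illustration.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """File hash (SHA-256 assumed): changing any bit changes the digest."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Toy image hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two image hashes differ."""
    return bin(a ^ b).count("1")


def to_bytes(pixels: list[list[int]]) -> bytes:
    """Serialize the pixel grid so it can be hashed as a file."""
    return bytes(p for row in pixels for p in row)


# Hypothetical 4x4 grayscale image (values 0-255).
original = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

# Same image with a single pixel changed by one brightness level.
altered = [row[:] for row in original]
altered[0][0] = 201

# A different image that happens to share the same bright/dark layout.
different = [
    [90, 120, 30, 40],
    [110, 95, 20, 35],
    [25, 45, 100, 130],
    [30, 15, 125, 99],
]

for name, img in [("altered", altered), ("different", different)]:
    same_file = file_hash(to_bytes(img)) == file_hash(to_bytes(original))
    distance = hamming_distance(average_hash(img), average_hash(original))
    print(f"{name}: file hashes equal={same_file}, image-hash distance={distance}")
```

In this sketch the file hashes of all three images differ, yet the toy image hash is identical for all three. The altered copy matches, which is the intended behavior of image hashing, but so does the unrelated image, which is the kind of false positive a file hash would not produce. Whether and how often a production system behaves this way depends on details that the record does not describe.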
The Defendant filed a motion to suppress the email, its attachments, and all other evidence obtained subsequently. He argued that Google acted as a government agent in this case and that its scan was therefore an unreasonable warrantless search under the Fourth Amendment. The Defendant also argued that the Detective's search exceeded the scope of Google's private search. The district court disagreed. In denying the motion to suppress, the district court found that Google's search was a private search and that the police did not exceed the scope of the private search because there was a "virtual certainty" that a Google employee had previously viewed the images before the police did so. The district court relied upon Google's representation that its algorithm assigns each image in its database a "digital fingerprint" that is "unique." Defendant was subsequently convicted on several counts and appealed to the U.S. Court of Appeals for the Ninth Circuit.
EPIC seeks to ensure that Fourth Amendment protections keep pace with advances in technology. For instance, EPIC filed an amicus brief before the Supreme Court in Carpenter v. United States arguing that the technological changes justified broader Fourth Amendment protections. The Court declined to extend the “third party doctrine” to permit the warrantless collection of cell site location information. Here, EPIC has an interest in ensuring that the Government does not conduct warrantless searches based on proprietary and potentially unreliable algorithmic search techniques.
This case also implicates questions about the standard of proof required to demonstrate the validity of a new investigative technique, an issue EPIC has advised the courts on previously. EPIC advised the Supreme Court about this issue as amicus curiae in Florida v. Harris, arguing that the government should bear the burden of establishing the reliability of investigative techniques in criminal cases.
EPIC filed an amicus brief in a similar case in the Sixth Circuit, United States v. Miller.
U.S. Court of Appeals for the Ninth Circuit, No. 18-50440
- Brief of Defendant-Appellant (Mar. 21, 2019)
- Amicus Briefs in Support of Appellant
- Amicus Brief of Electronic Privacy Information Center in Support of Appellant (Mar. 28, 2019)
- Amicus Brief of the Electronic Frontier Foundation and the ACLU (Mar. 28, 2018)
- Government's Answering Brief (June 21, 2019)
- Amicus Brief of Facebook & Google in Support of Appellee (June 28, 2019)
U.S. District Court for the Southern District of California, No. 15-2838
- United States' Response to Defendant's Motion to Suppress (February 28, 2017)
- Memorandum Order Denying Motion to Suppress (Jun. 26, 2017)