This case follows the prosecution of an individual based on the discovery of two illegal images that he uploaded via Gmail. These images were automatically flagged by Google’s “product abuse detection system” based on the company’s proprietary hashing technology and automatically relayed to government investigators without human review of a matching image. The defendant has argued that the searches of his e-mail data were unreasonable under the Fourth Amendment. The lower court found that the “private search” doctrine applied and exempted these actions from Fourth Amendment scrutiny. Neither Google nor the Government has produced the underlying algorithm used to scan the images, and the Government has not established that the investigative technique is accurate or reliably identifies only contraband images.
The Fourth Amendment only protects against searches by the government, not private entities. In United States v. Jacobsen, 466 U.S. 109, 131 (1984), the Supreme Court decided that government searches that follow private searches and are within the scope of the private search are reasonable. In Jacobsen, the Court held that the Government’s warrantless inspection and testing of the contents of a package that had been previously searched by FedEx was permissible because “there was a virtual certainty” that the law enforcement officer’s search would not reveal “anything more than he had already been told.”
The question in this case is whether the Government has provided sufficient evidence to establish a “virtual certainty” that the files Google sent to NCMEC in a CyberTipline report, which police ultimately opened, were the same as files a Google employee had previously viewed.
Google maintains a proprietary image matching system that automatically scans files uploaded to Google products, including Gmail, to search for child pornography. The defendant uploaded two images to Google’s e-mail system, which flagged the images as “apparent child pornography.” Google’s system then automatically generated and submitted “CyberTip Report # 5778397” to the National Center for Missing and Exploited Children (“NCMEC”) with the following information:
the date and time of the incident;
the e-mail address associated with the user account that uploaded the file;
the IP address associated with the upload;
a list of IP addresses used to access the user account (which can go as far back as the original account registration date);
the “categorization” of the image based on an existing rubric; and
copies of the image file(s).
Google was required by law to submit this CyberTipline report once it became aware of apparent child pornography. 18 U.S.C. § 2258A.
When NCMEC received Google’s CyberTipline report, NCMEC staff initiated a web search for the email and IP addresses associated with the report without opening the images sent by Google to confirm that they were contraband. NCMEC identifies information associated with the user’s IP address(es): Country, Region, City, Metro Code, Postal Code, Area Code, Latitude/Longitude, and Internet Service Provider or Organization. NCMEC staff also collect “data gathered from searches on publicly-available, open-source websites” using the account and user identifying information provided by the CyberTipline report. This information can include social media profiles, websites, addresses, and other personal data.
After NCMEC staff collected this information on the defendant, the report was referred to the Kentucky State Police and the Kenton County Police Department (“KCPD”) for potential investigation. A KCPD detective opened the images attached to the CyberTipline report and confirmed they were child pornography.
What is known about Google’s use of the hashing technology is limited to the declaration of Cathy McGoff, a Senior Manager for Law Enforcement and Information Security at Google:
4. Based on [Google’s] private non-government interests, since 2008, Google has been using its own proprietary hashing technology to tag confirmed child sexual abuse images. Each offending image, after it is viewed by at least one Google employee is given a digital fingerprint (“hash”) that our computers can automatically recognize and is added to our repository of hashes of apparent child pornography as defined in 18 USC § 2256. Comparing these hashes to hashes of content uploaded to our services allows us to identify duplicate images of apparent child pornography to prevent them from continuing to circulate on our products.
5. We also rely on users who flag suspicious content they encounter so we can review it and help expand our database of illegal images. No hash is added to our repository without a corresponding image first having been visually confirmed by a Google employee to be apparent child pornography.
6. Google trains a team of employees on the legal obligation to report apparent child pornography. The team is trained by counsel on the federal statutory definition of child pornography and how to recognize it on our products and services. Google makes reports in accordance with that training.
7. When Google’s product abuse detection system encounters a hash that matches a hash of a known child sexual abuse image, in some cases Google automatically reports the user to NCMEC without re-reviewing the image. In other cases, Google undertakes a manual, human review, to confirm that the image contains apparent child pornography before reporting it to NCMEC.
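The matching step that the declaration describes can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not Google’s system: the declaration does not disclose the hash function, so SHA-256 is used here purely as a stand-in, and the names `fingerprint`, `known_hashes`, and `check_upload` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    # Stand-in hash function (assumption): Google's actual function is undisclosed.
    return hashlib.sha256(data).hexdigest()


# Hypothetical repository: hashes of files a human reviewer previously confirmed.
known_hashes = {fingerprint(b"previously reviewed file")}


def check_upload(data: bytes, manual_review: bool) -> str:
    """Compare an upload's hash to the repository and decide what happens next."""
    if fingerprint(data) not in known_hashes:
        return "no match"
    # Per paragraph 7, a match is sometimes reported without re-review
    # and sometimes reviewed by a person first.
    return "queued for human review" if manual_review else "reported automatically"


print(check_upload(b"previously reviewed file", manual_review=False))
print(check_upload(b"some other file", manual_review=False))
```

In the automatic path, no person looks at the uploaded file before the report is sent, so whether the file is the one an employee once viewed depends entirely on what a hash match means for the undisclosed function.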
While Google describes its algorithm as assigning each image in its repository a “digital fingerprint,” it has provided no information on the type of hash function used to assign that fingerprint. This is important because file hashing functions work differently than image hashing functions. A file hashing function produces a hash value that is, for practical purposes, unique to a file, and changing even one bit of data changes the hash value. File hashing is thus a method of demonstrating that two files are the same, bit for bit, without the time- and resource-intensive work of comparing each bit of one file to the corresponding bit of the other. In contrast, image hashing algorithms provide a way to match images even if they have been altered slightly, and so by design they also match files that do not have the same file-hash values.
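The difference between the two kinds of hashing can be illustrated with a small example. This is a sketch under assumptions, not a description of Google’s technology: SHA-256 stands in for a file hash, and a toy “average hash” (one bit per pixel, set when the pixel is brighter than the image’s mean) stands in for an image hash. The pixel values and the function names are made up for illustration.

```python
import hashlib


def file_hash(data: bytes) -> str:
    # Cryptographic file hash: any change to the bytes changes the digest.
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels):
    # Toy image hash (assumption): 1 if a pixel is brighter than the mean, else 0.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(a, b):
    # Number of positions at which two bit lists differ.
    return sum(x != y for x, y in zip(a, b))


# A tiny 4x4 grayscale "image" and a copy with one pixel brightened by 1.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
altered = [row[:] for row in original]
altered[0][0] += 1

original_bytes = bytes(p for row in original for p in row)
altered_bytes = bytes(p for row in altered for p in row)

file_hashes_match = file_hash(original_bytes) == file_hash(altered_bytes)
image_distance = hamming_distance(average_hash(original), average_hash(altered))

print(file_hashes_match)  # False: the files differ, so the file hashes differ
print(image_distance)     # 0: the toy image hash treats the two as the same image
```

Under the file hash, a match indicates the same file bit for bit. Under the image hash, a match indicates only that two images are similar enough by the algorithm’s own measure, which is why the type of hash function matters to the “virtual certainty” question.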
Detective Aaron Schihil of the Kenton County Police Department received the information from NCMEC and “opened the attachments viewed the relevant images, which he confirmed to be child pornography.” After confirming they were child pornography, Detective Schihil obtained a search warrant for several categories of data held by Google and associated with the Defendant’s account. Detective Schihil later obtained a search warrant for the Defendant’s residence and a separate search warrant for various electronic devices seized at the Defendant’s residence.
The Defendant was charged in the U.S. District Court for the Eastern District of Kentucky and subsequently filed a motion to suppress the evidence obtained by Detective Schihil. He argued that Google acted as a government agent in conducting its search, making it an unreasonable warrantless search under the Fourth Amendment, or in the alternative that the Detective’s search exceeded the scope of Google’s private search. The district court disagreed. In denying the motion to suppress, the district court found that Google’s search was a private search and that the police did not exceed the scope of the private search because there was a “virtual certainty” that a Google employee had previously viewed the images before the police did so. The district court relied upon Google’s representation that its algorithm assigns each image in its database a “digital fingerprint” that is “uniquely associated with the input data.” The Defendant was subsequently convicted on several counts and appealed to the U.S. Court of Appeals for the Sixth Circuit.
EPIC seeks to ensure that Fourth Amendment protections keep pace with advances in technology. For instance, EPIC filed an amicus brief before the Supreme Court in Carpenter v. United States arguing that the technological changes justified broader Fourth Amendment protections. The Court declined to extend the “third party doctrine” to permit the warrantless collection of cell site location information. Here, EPIC has an interest in ensuring that the Government does not conduct warrantless searches based on proprietary and potentially unreliable algorithmic search techniques.
This case also implicates questions about the standard of proof required to demonstrate the validity of a new investigative technique, an issue EPIC has advised the courts on previously. EPIC advised the Supreme Court about this issue as amicus curiae in Florida v. Harris, arguing that the government should bear the burden of establishing the reliability of investigative techniques in criminal cases.