Search Engine Privacy

Introduction

Internet search engines are the primary means by which individuals access Internet content. Internet users submit more than 15 billion searches per month. Typically, search engines collect detailed information that is personally identifiable or can be made personally identifiable. This information includes the search terms submitted to the search engine, as well as the time, date, and location of the computer submitting the search. This data is collected for marketing and consumer profiling purposes. Companies also use search engine data to carry out research and compile usage statistics. Search engines also link individuals' names and other personal information with websites and news stories that may be inaccurate, misleading, or harassing.

Search data is one of the most sensitive types of personal information, and its collection and use by Internet firms poses significant consumer privacy risks. As a result of behavioral marketing methods and the potential exposure of sensitive personal information, privacy groups have called for greater protections for search data. Specifically, privacy advocates have called for strict limitations on the collection, retention, and disclosure of information relating to Internet Protocol (IP) addresses. IP addresses are one of the main methods of identifying Internet users. Other methods include browser fingerprinting, tracking cookies, and search query analysis (particularly with regard to vanity searches). Most users are unaware that search engines collect their personally identifiable data. The majority of users polled in 2015 think that online advertisers should not have any information about their online activities.

Background

IP and MAC Addresses

An Internet Protocol ("IP") address is a numerical identifier that a computer uses to send and receive data on a network. An IP address is to a computer what a telephone number is to a telephone: in effect, the "housing address" of a networked device. Most modern networks use the TCP/IP protocol suite to communicate, but there are now two different standards for IP addresses. Most computers that connect to IP networks are assigned an IPv4 address, which is a 32-bit address expressed as four numbers separated by dots (e.g. 192.168.1.1). Many modern devices now also use IPv6 addresses, which are 128-bit identifiers expressed as eight groups of hexadecimal digits separated by colons (though consecutive groups consisting entirely of zeroes are often omitted to save space).
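The two address formats described above can be inspected with Python's standard `ipaddress` module. This is only an illustrative sketch: the IPv6 address uses the 2001:db8::/32 documentation prefix and does not belong to any real device.

```python
import ipaddress

# The IPv4 example from the text, plus an illustrative IPv6 address
# written out in full (all eight groups).
v4 = ipaddress.ip_address("192.168.1.1")
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")

# max_prefixlen is the address width in bits: 32 for IPv4, 128 for IPv6.
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)

# The dotted form is just a readable rendering of one 32-bit number.
print(int(v4))

# The compressed form drops the run of all-zero groups; the exploded
# form writes every group out again.
print(v6.compressed)
print(v6.exploded)
```

The compressed rendering (`2001:db8::1`) is the shorthand mentioned above, in which groups made up entirely of zeroes are left out.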

Due to the limited size of the IPv4 address space (4,294,967,296 total addresses) and to avoid confusion, the Internet Assigned Numbers Authority (IANA) has reserved three "blocks" for use by private networks (the 10/8, 172.16/12, and 192.168/16 prefixes). These private addresses are commonly assigned to computers on local networks for homes, businesses, or educational institutions. As a result, a single "public" IP address can be shared by multiple computers. A single computer can also be assigned multiple IP addresses if it has multiple network interfaces (e.g. wireless, wired, etc.). The IPv6 address space, by contrast, is much larger (approximately 3.4 × 10^38 addresses), large enough for each device to be assigned its own unique address. In addition to the IP address, each device with a network connection has a unique media access control (MAC) address for each "distinct point of attachment" (network card or interface). Marketing agencies rely on usernames, IP addresses, and other digital identifiers to track users across the web, and to deliver targeted ads.
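As a rough illustration of the figures above, the following sketch checks whether an address falls inside one of the three reserved private blocks and computes the size of each address space. The function name `in_reserved_private_block` is hypothetical and introduced only for this example.

```python
import ipaddress

# The three blocks reserved for private networks, written in full CIDR form.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def in_reserved_private_block(addr: str) -> bool:
    """Return True if addr lies in one of the three private IPv4 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in PRIVATE_BLOCKS)

# Address-space sizes follow directly from the address widths.
ipv4_total = 2 ** 32    # 32-bit addresses
ipv6_total = 2 ** 128   # 128-bit addresses

print(in_reserved_private_block("192.168.1.1"))  # a typical home-network address
print(in_reserved_private_block("8.8.8.8"))      # a publicly routed address
print(f"{ipv4_total:,}")
print(f"{ipv6_total:.1e}")
```

Because many devices behind one router can all hold addresses from these private blocks, the public address a search engine sees may correspond to a household or an office rather than to one person.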

Behavioral Marketing

The emergence of targeted Internet advertising has led to "behavioral marketing." In the course of recording users' viewing habits and monitoring their search terms, companies collect information about user interests and tastes, including the things they buy, the stories they read, and the websites they visit, in addition to very sensitive personal information. Search terms entered into search engines may reveal a plethora of personal information such as an individual's medical issues, religious beliefs, political preferences, sexual orientation, and investments. The expansion of the behavioral marketing industry, as well as its ability and incentive to monitor online search behavior, has produced significant privacy problems and substantial risks to Internet users. Opaque industry practices leave consumers largely unaware of how their online behavior is monitored, how securely this information is stored, and the extent to which it is kept confidential. In the absence of strong privacy principles, industry practices also prevent users from exercising any meaningful control over the personal data collected about them.

Right to Be Forgotten

In 2014, the European Court of Justice ruled that European citizens have a limited right to have websites deindexed from the results of searches on their names. A website is subject to removal if it contains information that is “inadequate, irrelevant or excessive in relation to” the information’s original purpose. In so ruling, the Court concluded that the fundamental right to privacy outweighs the economic interest of the commercial firm and, in some circumstances, the public interest in access to information.

Regulation of Search Engines

Public Disclosure of Search Engine Data by US Service Providers

In 2006, America Online (AOL) published three months of search records for 658,000 Americans. AOL attempted to "anonymize" the records, and intended for academics and technologists to use the data for research purposes. The records did not link searches to IP addresses or user names, but did group searches by individual users via randomly-assigned numerical IDs. Subsequent events demonstrated that storing numerical IDs in place of usernames or IP addresses does not necessarily prevent search data from being linked back to individuals. Though the search logs released by AOL had been "anonymized," identifying each user only by a number, quick research by New York Times reporters matched some user numbers with the correct individuals. Other sources identified sensitive and occasionally disturbing personal information in the AOL search data, including user searches for "how to kill your wife," "anti psychotic drugs," and "aftermath of incest." In response, several privacy groups filed complaints with the Federal Trade Commission.
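The weakness of this kind of "anonymization" can be shown with a minimal sketch. It is not AOL's actual process, which the account above does not detail; it only assumes, as described, that each username is replaced by a random number while that user's searches stay grouped together. The log entries, usernames, and function names are invented for illustration.

```python
import random
from collections import defaultdict

def pseudonymize(log, seed=None):
    """Replace each username with a random numeric ID (one ID per user)."""
    rng = random.Random(seed)
    ids, used, released = {}, set(), []
    for username, query in log:
        if username not in ids:
            new_id = rng.randrange(1_000_000, 10_000_000)
            while new_id in used:  # avoid giving two users the same ID
                new_id = rng.randrange(1_000_000, 10_000_000)
            used.add(new_id)
            ids[username] = new_id
        released.append((ids[username], query))
    return released

def group_by_id(released):
    """Rebuild a per-ID search history from the released records."""
    profiles = defaultdict(list)
    for user_id, query in released:
        profiles[user_id].append(query)
    return dict(profiles)

# Hypothetical log entries, invented for illustration only.
log = [
    ("user_a", "landscapers in springfield"),
    ("user_b", "weather tomorrow"),
    ("user_a", "jane doe springfield"),
    ("user_a", "homes sold in maple street subdivision"),
]

released = pseudonymize(log, seed=0)
profiles = group_by_id(released)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

No username or IP address appears in the released records, yet the queries grouped under one number still combine a name, a town, and a neighborhood. The search terms themselves act as the identifier.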

EU Regulation of Search Engines

The European Union Data Protection Directive requires search engines to "delete or irreversibly anonymise personal data once they no longer serve the specified and legitimate purpose" for which they were collected. Retention of personal data by search engines for more than six months is presumed to be unnecessary. Search engines that retain personal data for longer periods must "demonstrate comprehensively that it is strictly necessary for the service." This requirement applies to IP address data, which virtually all search engines collect each time a user runs a search. The EU also imposes limits on the lifetime of search engines' cookies - small computer files that can track users between multiple sessions and web sites. As a technical matter, every cookie expires eventually, and web sites can easily select the expiration dates for their cookies. EU guidelines prohibit search engines from setting expiration dates farther in the future than necessary to provide search services.
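To illustrate how easily a web site selects a cookie's expiration, here is a minimal sketch using Python's standard `http.cookies` module. The cookie name `session_prefs`, its value, and the 30-day lifetime are all assumptions chosen for the example; the guidelines described above do not fix a specific number for cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical lifetime: 30 days, expressed in seconds.
COOKIE_LIFETIME_SECONDS = 30 * 24 * 60 * 60

cookie = SimpleCookie()
cookie["session_prefs"] = "en"  # hypothetical cookie name and value
cookie["session_prefs"]["max-age"] = COOKIE_LIFETIME_SECONDS
cookie["session_prefs"]["path"] = "/"

# The header line a server would send; the browser discards the cookie
# once Max-Age seconds have elapsed.
header = cookie.output()
print(header)
```

The lifetime is a single value under the site's control, so keeping it short is a matter of policy rather than technical difficulty.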

Article 29 Data Protection Working Party

  • Article 29 Data Protection Working Party Opinion on data protection issues related to search engines, April 4, 2008.
  • Article 29 Data Protection Working Party Statement, February 19, 2008.
  • Article 29 Working Group - Main Page.
  • The Article 29 Working Group's April 4, 2008 report issued a set of obligations to search engine firms, including:
    • Search engines should get informed consent from users if they correlate personal data across different services, such as desktop search;
    • Search engine providers must delete or anonymise (in an irreversible and efficient way) personal data once they are no longer necessary for the purpose for which they were collected;
    • Personal data should not be held by search engines for longer than six months;
    • In case search engine providers retain personal data longer than six months, they must demonstrate comprehensively that it is strictly necessary for the service;
    • It is not necessary to collect additional personal data from individual users in order to be able to perform the service of delivering search results and advertisements;
    • If search engine providers use cookies, their lifetime should be no longer than demonstrably necessary;
    • Search engine providers must give users clear and intelligible information about their identity and location and about the data they intend to collect, store, or transmit, as well as the purpose for which they are collected.
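The six-month retention limit in the obligations above could be enforced with a routine along the following lines. This is a sketch under stated assumptions: the record layout, the field name `collected_at`, the function name, and the choice of 183 days as an approximation of six months are all hypothetical, and the sample records are invented.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "six months" approximated as 183 days.
RETENTION_LIMIT = timedelta(days=183)

def split_by_retention(records, now):
    """Separate records still within the retention limit from expired ones."""
    cutoff = now - RETENTION_LIMIT
    keep = [r for r in records if r["collected_at"] >= cutoff]
    expired = [r for r in records if r["collected_at"] < cutoff]
    return keep, expired

# Hypothetical records, invented for illustration only.
now = datetime(2009, 1, 1, tzinfo=timezone.utc)
records = [
    {"query": "example query one",
     "collected_at": datetime(2008, 12, 1, tzinfo=timezone.utc)},
    {"query": "example query two",
     "collected_at": datetime(2008, 3, 1, tzinfo=timezone.utc)},
]

keep, expired = split_by_retention(records, now)
print(len(keep), "retained;", len(expired), "due for deletion or anonymization")
```

Records in the `expired` list are the ones that would have to be deleted or irreversibly anonymized unless the provider can show that longer retention is strictly necessary.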

EPIC's Work

IP Addresses and Privacy

IP Address Privacy in the United States

In the United States, federal law does not provide uniform privacy protections for personal data submitted to search engines or for IP addresses. Some federal regulations (e.g. 45 C.F.R. § 164.514(b)(2)(i)(O)) treat IP addresses as "individually identifiable" information for specific purposes, but such treatment is not comprehensive.

IP Address Privacy in the European Union

The European Commission classifies IP addresses as personal data. Search engine data falls under the relevant EU data protection directives, and EU regulations generally apply to search engine companies even when they are headquartered outside Europe. Search engines must comply with European privacy provisions if they maintain an establishment in one of the EU Member States, or if they use automated equipment based in one of the Member States for the purposes of processing personal data. European privacy rules limit the collection, use, and disclosure of personal information. The privacy officials who make up the EU Article 29 Working Group have stated that "the protection of the users' privacy and the guaranteeing of their rights, such as the right to access to their data and the right to information as provided for by the applicable data protection regulations, remain the core issues of the ongoing debate."

Corporate Policies Regarding IP Address Privacy

Google, the leading Internet search engine, automatically collects its users' search terms in connection with their IP addresses. Google states that, after collection, it retains the personally identifiable information for 18 months, and then "anonymizes" the data linking search terms to specific IP addresses by erasing the last octet of the IP address.

On December 17, 2008, Yahoo announced that it would erase the last octet of the IP address after 90 days. The search engine company previously retained the data for 13 months.
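Erasing the last octet of an IPv4 address, as described above, amounts to zeroing its final 8 bits. The sketch below shows the operation in general terms; it is not either company's actual implementation, the function name is hypothetical, and the sample address comes from a range reserved for documentation.

```python
import ipaddress

def erase_last_octet(addr: str) -> str:
    """Zero the final 8 bits of an IPv4 address (e.g. a.b.c.d -> a.b.c.0)."""
    ip = ipaddress.IPv4Address(addr)
    masked = int(ip) & 0xFFFFFF00  # keep the first three octets only
    return str(ipaddress.IPv4Address(masked))

truncated = erase_last_octet("203.0.113.77")
print(truncated)

# Every address sharing the first three octets maps to the same truncated
# value, so the original is one of this many candidates:
candidates = ipaddress.ip_network(truncated + "/24").num_addresses
print(candidates)
```

Since only 256 addresses share any given three-octet prefix, a truncated address still narrows a search to a small group of possible sources, particularly when it is stored alongside the search terms themselves.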

Microsoft makes search query data anonymous after 18 months by permanently removing cookie IDs, the entire IP address and other identifiers from search terms.

Ixquick states that it deletes users' search data (including IP addresses) within 48 hours. Ixquick further states that it does not set any uniquely identifying cookies, and that it shares data with third parties only in limited circumstances.
