March 25, 2003
Representative Adam Putnam
Chair, House Government Reform Subcommittee
on Technology, Information Policy, Intergovernmental
Relations, and the Census
Washington, DC 20515
Representative William Clay
Ranking Member, House Government Reform Subcommittee
on Technology, Information Policy, Intergovernmental
Relations, and the Census
Washington, DC 20515
Re: Hearing on Data Mining: Current Applications and Future Possibilities
Dear Chairman Putnam and Ranking Member Clay,
The Electronic Privacy Information Center (EPIC) submits this letter for inclusion in the hearing record for the March 25, 2003 Oversight Hearing on Data Mining. EPIC is a not-for-profit research center based in Washington, D.C. It was established in 1994 to focus public attention on emerging civil liberties issues and to protect privacy, the First Amendment, and constitutional values. We appreciate the Committee's attention to data mining and its civil liberties implications.
We write to call your attention to the growing practice of federal agencies purchasing commercial databases for law enforcement purposes. It is our view that these activities violate the intent of the Privacy Act and should be suspended.
EPIC initiated a Freedom of Information Act (FOIA) request to seven federal law enforcement agencies in July 2001 to obtain agency records relating to government purchase of personal data from commercial information brokers. The documents obtained from the request and subsequent litigation show that a number of information brokerage companies provide law enforcement agencies with information ranging from Social Security Numbers to professional licenses.
As Congress considers the impact of data mining on privacy and civil liberties, it should focus attention on the risks created by relationships between federal agencies and their private-sector information broker partners. Some of these private-sector information brokers sell detailed consumer purchasing data that exists in no public records system. The information sold by these commercial data brokers could be used for data mining.
We recommend that Congress act now to limit private-sector collection of information, because information collected by private entities is regularly sold to federal law enforcement agencies. This practice contravenes the clear intent of the Privacy Act of 1974.
Data mining is "the process of finding patterns in information contained in large databases." Data mining is employed in different contexts in order to achieve different goals. For instance, data mining is commonly used to detect fraudulent use of credit cards. It has also been employed by companies to detect defective parts in a manufacturing line.
When employed for the limited purposes of fraud detection or product quality, data mining poses little risk to privacy and civil liberties. However, when these systems are employed to evaluate future intent or action, data mining presents serious risks to a distinctly American value: "the right to be let alone." For instance, the Transportation Security Administration is currently developing a data mining system called CAPPS II, the Enhanced Computer Assisted Passenger Profiling System. CAPPS II would sift through credit report header information and over one hundred unnamed commercial and government databases to attempt to assess a passenger's risk to a transportation system.
Retired Admiral John Poindexter leads a research project at the Defense Advanced Research Projects Agency that is developing a data mining system similar to CAPPS II. The system, Total Information Awareness, purports to capture the "information signature" of people so that the government can track suspicious persons. The project calls for the development of "revolutionary technology for ultra-large all-source information repositories," which would contain information from multiple sources to create a "virtual, centralized, grand database." This database would be populated by transaction data contained in current databases such as financial records, medical records, communication records, and travel records as well as new sources of information. Intelligence data would also be fed into the database.
These two systems are highly invasive because they are operated by federal agencies, are conducted in secret, draw upon a wide array of data sources, and attempt to predict future human action. These two systems have been made possible by private sector information sources, which have voraciously collected information on individuals.
The Privacy Act
In 1974, the Congress, with broad bipartisan support, enacted comprehensive legislation to prevent precisely the type of data profiling that is now under consideration by several federal agencies.
In passing the Privacy Act, Congress found that:
(1) the privacy of an individual is directly affected by the collection, maintenance, use, and dissemination of personal information by Federal agencies;
(2) the increasing use of computers and sophisticated information technology, while essential to the efficient operations of the Government, has greatly magnified the harm to individual privacy that can occur from any collection, maintenance, use, or dissemination of personal information;
(3) the opportunities for an individual to secure employment, insurance, and credit, and his right to due process, and other legal protections are endangered by the misuse of certain information systems;
(4) the right to privacy is a personal and fundamental right protected by the Constitution of the United States; and
(5) in order to protect the privacy of individuals identified in information systems maintained by Federal agencies, it is necessary and proper for the Congress to regulate the collection, maintenance, use, and dissemination of information by such agencies.
The Privacy Act set out several specific purposes. These are to:
(1) permit an individual to determine what records pertaining to him are collected, maintained, used, or disseminated by such agencies;
(2) permit an individual to prevent records pertaining to him obtained by such agencies for a particular purpose from being used or made available for another purpose without his consent;
(3) permit an individual to gain access to information pertaining to him in Federal agency records, to have a copy made of all or any portion thereof, and to correct or amend such records;
(4) collect, maintain, use, or disseminate any record of identifiable personal information in a manner that assures that such action is for a necessary and lawful purpose, that the information is current and accurate for its intended use, and that adequate safeguards are provided to prevent misuse of such information;
(5) permit exemptions from the requirements with respect to records provided in this Act only in those cases where there is an important public policy need for such exemption as has been determined by specific statutory authority; and
(6) be subject to civil suit for any damages which occur as a result of willful or intentional action which violates any individual's rights under this Act.
The Privacy Protection Study Commission created by the Privacy Act recommended that these protections be extended to private-sector collection of information. However, Congress did not act to extend protections to private-sector information collectors.
Now that private sector entities are engaging in practices that enable federal agencies to violate the purposes of the federal Privacy Act, we believe that Congress should regulate these businesses.
Private-Public Sector Partnerships Create New Data Mining Risks
EPIC initiated a FOIA request to seven federal law enforcement agencies in July 2001. Documents obtained from the request and subsequent litigation show that a number of companies provide law enforcement with personal information. There is a risk that this information could be used for wide-scale data mining.
The documents obtained by EPIC under the FOIA demonstrate that commercial database vendors sell volumes of personal information to federal investigative agencies. These companies possess multi-million dollar contracts with federal agencies to provide desktop computer access to personal information. Use of these databases for data mining would represent a serious threat to First and Fourth Amendment values.
The documents obtained by EPIC show that a number of companies are selling personal data to the government:
1. The Department of Justice obtained an $11,000,000 contract for access to ChoicePoint databases in fiscal year 2002. ChoicePoint is a large provider of credit header and public records information. A credit header lists the name, address, previous address, place of employment, spouse's name, and the Social Security Number of an individual. The company's databases include financial reports, education and employment verification, criminal records checks, and motor vehicle records. ChoicePoint also sells personal information on citizens of Argentina, Brazil, Colombia, Costa Rica, Mexico, Honduras, Nicaragua, Guatemala, and Venezuela.
2. Several agencies have contracts with Dun and Bradstreet in order to obtain personal information of business owners.
3. Lexis Nexis sells a broad array of information to government, including access to its "Nationwide Person Tracker," a database of 324 million individuals along with their Social Security Numbers. Lexis Nexis also sells motor vehicle records, flight license records, professional license records, and a military personnel location service.
4. One document obtained from the Internal Revenue Service shows that the agency wished to obtain 25,000 credit headers a month from private databases. Experian, one of the largest credit bureaus, is listed as the source for credit headers and full reports for IRS access.
5. One document obtained from the Immigration and Naturalization Service shows that the agency queries private sector databases 20,000 times a month.
6. Although some documents reference regulations that prohibit personal use of these information services, none indicates that the agencies audit or otherwise monitor agency use or misuse of the records systems.
The federal agencies that purchase this information are circumventing privacy protections passed by Congress in the Privacy Act. In effect, federal agencies are able to access detailed personal information, maintained by the private sector, while technically side-stepping obligations under the Privacy Act. Simply put, since the federal government is prohibited from building a general national data center, agencies have privatized this function, and can now obtain information on anyone from their desktop computers.
Now that commercial-sector brokers regularly sell information to the federal government, thereby allowing the government to have access to detailed dossiers without actually maintaining the database, Congress should revisit this issue, and apply Privacy Act protections to the private sector.
Future Possibilities: Employment of Consumer Data for Government Data Mining
A future data mining risk flows from private-sector collection of consumer habit information. Some of the same companies that are engaged in private-public sector partnerships also maintain databases of consumer information that could be sold to the government. Experian, for instance, sells marketing databases with the names, addresses, and other personal details of racial and ethnic minorities. The company also sells medical information for marketing. Its medical marketing databases, for instance, include a list of people believed to be suffering from bladder control problems.
Collectors of consumer information are willing to categorize, compile, and sell virtually any tidbit of information. For instance, the Medical Marketing Service sells lists of persons suffering from various ailments. These lists are cross-referenced with information regarding age, educational level, family dwelling size, gender, income, lifestyle, marital status, and presence of children. The list of ailments includes diabetes, breast cancer, and heart disease. Other companies sell databases of information relating to individuals' lifestyle habits, reading preferences, and even religion.
Another consumer profiling company divides individuals into fifteen different groups, which are in turn categorized into various subgroups. These include "Pools & Patios," "Big Fish Small Pond," "Shotguns and Pickups," and "Urban Cores." The assumptions drawn from these categories of people can often be racially charged and objectionable. They can also catalog populations of people who are at risk for hate crimes or other stigmatization. For instance, PlanetOut.com sells lists of consumers identified as homosexual.
Consumer information is collected by aggregating data from online and offline purchase data, supermarket savings cards, white pages, surveys, sweepstakes and contest entries, financial records, property records, U.S. Census records, motor vehicle data, automatic number information, credit card transactions, phone records (Customer Proprietary Network Information or "CPNI"), credit records, product warranty cards, the sale of magazine and catalog subscriptions, and public records.
There are no standards for the collection of consumer data, and it is widely known in the industry that consumer information databases are riddled with errors. There is a serious and credible risk that this consumer information may be employed for data mining purposes related to risk assessment. Congress should act now to prevent this improper, secondary use of personal information.
Congress should take action to ensure that the government does not use commercial data sources for data mining.
1. Congress should begin oversight hearings on the information brokers' practices.
2. Agencies should be asked to routinely report on the private-sector databases that they have purchased, including the number of records obtained, and the specific characteristics of the data.
3. Congress should determine whether Privacy Act obligations should be applied to the entire information broker industry, as these businesses are now engaged in the practice of building government profiles of individuals that would be regulated under the Privacy Act.
We appreciate this opportunity to share with the Committee information about the risks inherent in certain types of data mining. Please contact us if we can be of more assistance in this debate.
Chris Jay Hoofnagle
1 Usama Fayyad, Data Mining, in ENCYCLOPEDIA OF COMPUTER SCIENCE (A. Ralston, E. Reilly, & D. Hemmendinger eds., 4th ed. 2000).
2 The Commission recommended that Privacy Act protections extend to the consumer credit, insurance, banking, and medical care industries. U.S. Privacy Protection Study Commission, Personal Privacy in an Information Society (Washington: GPO, 1977), available at http://aspe.hhs.gov/datacncl/1977privacy/toc.htm
3 Sample documents are enclosed as attachments "A-D." An entire collection of documents obtained from the Justice Management Division is online at http://www.epic.org/privacy/publicrecords/jmdchoicepoint.pdf.
4 See attachment A.
5 See generally ChoicePoint Online List of Services, available at http://www.epic.org/privacy/profiling/choicepointlistofservices.pdf.
6 See attachment B (B2) (B3).
7 See attachment C.
8 See attachment D. (D2)
9 Experian List Services Catalog (on file with EPIC), excerpts available at http://www.epic.org/privacy/profiling/experianlistservices.pdf.
11 Consumers By Ailment, Medical Marketing Service (on file with author). This list has been removed from the Internet, but is still available via the Google Cache: http://22.214.171.124/search?q=cache:kKDlOrzU2Q4C:www.mmslists.com/consumers_by_ailment_counts.htm+&hl=en&ie=UTF-8.
12 A number of companies sell religious affiliation information, including the Post-Newsweek company's "Catholic Subscriber" database, which is described online at http://dmipublic.directmedia.com/datacard/dmicards/dmi/47/dm47610.stm.
13 The Claritas Prizm and MicroVision clustering services are online at http://cluster2.claritas.com/YAWYL/Default.wjsp?System=WL.
14 Meet Your Best Customer, PlanetOut Partners, at http://www.planetoutpartners.com/sales.html (last visited Jan 20, 2003).
15 See generally Experian Insource Enhancement, available at http://www.epic.org/privacy/profiling/experianinsourceenhancement.pdf