The Census and Privacy

Introduction

Every ten years, as directed by the US Constitution, the Government conducts a census of all individuals in the country. This enumeration is used both for the apportionment of the members of Congress and for the distribution of taxes. Along with the benefits of the census have come many risks. This page outlines risks to privacy posed by the census.

History

The counting of citizens can be traced back to the Biblical account of Moses. In the Book of Numbers, Moses counted people in areas surrounding his kingdom in order to strengthen the count of the population under his control. Scholars have argued that the list of names served as an original census, creating a legal identity for, and control over, a group of people.

The history of the United States census dates back to pre-Revolutionary times. It is thought that the census was developed to establish an equitable way to distribute the burden of the Revolutionary War, both economically and in manpower. As the new United States government was created, it was proposed that the expense of the war be distributed among the 13 colonies based upon population. In order to make this uniform, the concept of payment by distribution was included in the Articles of Confederation. The original Congress finally voted that the first distribution method would be by the cumulative value of property within each State. Enumeration of population became the chosen method directly after the Revolutionary War.

The modern census was established in Article I, Section 2, Clause 3 of the US Constitution, providing "Representatives...shall be apportioned among the several States which may be included within this Union, according to their respective Numbers...The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct."

The minimal enumeration of the population described in the Constitution was quickly expanded to include business and socioeconomic information. Jefferson was a main proponent of expanding the enumeration, as he wished to obtain "a more detailed view of the inhabitants" of the country. The first enumeration counted free persons, including women, children, and those bound to service, but only counted three-fifths of slaves and excluded untaxed American Indians. As of March 1, 1790, it was directed that US Marshals gather the names of heads of families, the number of people in each family, all other free people, and slaves. In addition to this information, James Madison sought occupational and industrial information, but Congress did not authorize collection of this information in the first census.

Nevertheless, as historian Robert C. Davis argues, "the crucial point is that the first act pushed beyond the simple constitutional provision, thereby establishing a precedent for the enormous expansion of the census in the following century." By 1800, the census collected more refined age information; by 1810 the census collected economic information; by 1820 the census collected more detailed occupational information; by 1830 the census collected information on physical disability; and by 1840 the census collected investment and productivity information. Through this expansion, protections were developed to maintain the confidentiality of economic questions, but the population survey was publicly posted until 1850 in order to allow individuals to check for errors.

After the first census, Thomas Jefferson and the American Philosophical Society lobbied for the expansion of reported census information on age, birthplace, and occupation for the purpose of ascertaining "the causes which influence life and health" and "the conditions and vocations of our fellow citizens." Since then, the census has become an established method of gathering information about American society.

The US Census has been administered every ten years since 1790, and it was intended to be used primarily for the apportionment of Representatives for Congress. The complexity of the census has grown with the expansion of the United States; the US government has found extensive uses for census-related statistics. The census has also been crucial in tracking the population needs of various regions and understanding the structural composition of the nation's population. Politically, the census has become a tool in the process of congressional reapportionment.

  • Wright, Carroll D. "The History and Growth of the US Census" (excerpt).
  • Privacy, the Census and Federal Questionnaires, Hearings Before the Subcommittee on Constitutional Rights of the Committee on the Judiciary, US Senate (April 1969).
  • Minnesota Population Center (MPC), University of Minnesota "The Public Use Microdata Samples of the U.S. Census: Research Applications and Privacy Issues". Census 2000 Users' Conference on PUMS, Alexandria, VA (May 22, 2000).
  • Robert C. Davis, Confidentiality and the Census, 1790-1929, appendix to Records, Computers and the Rights of Citizens, Report of the Secretary's Advisory Committee on Automated Personal Data Systems, July, 1973.
  • Records and Record Keepers, Records, Computers and the Rights of Citizens, Report of the Secretary's Advisory Committee on Automated Personal Data Systems, July, 1973.

The Census and Privacy

The risks that accompany the electronic compilation of personal information include re-identification, which is the practice of linking individuals' identities to anonymous census records; marketing solicitations; and the even more serious consequences of political abuse. The use of information to identify individuals rather than for the statistical collection of information offers room for abuses of privacy and confidentiality.

Risks regarding privacy and confidentiality are not new issues for the Census. According to Thomas S. Mayer, privacy interests have evolved from the very first census in 1790. In the history of the American census, these privacy concerns have shaped the confidentiality of released information and the privacy considerations of individuals. Recorded protests from 1870 through 1960 reflect the constitutional issues resulting from the requirement for US residents to provide sensitive personal information. Questions on the census about diseases, mortgage values, and other items have raised many concerns.

The census forms the most inclusive federal database of American citizens. The information it contains is protected under law from disclosure, yet with the advent of technology many of the traditional legislative protections are inadequate. The recent use of computers has dramatically altered the structure of the US census. It has allowed the Census Bureau to retain information in an efficient format, while also challenging the traditional methods of information collection. Along with this growing technology, the potential harm has grown exponentially. Technology has allowed the collection of information to move at remarkable speeds, and the protection of such information remains a struggle.

  • Kysar, Douglas A., Kids & Cul-De-Sacs: Census 2000 and the Reproduction of Consumer Culture, 87 Cornell L. Rev. 853 (2001).
  • Mayer, Thomas S. "Privacy and Confidentiality Research and the US Census Bureau: Recommendations Based on a Review of the Literature". Research Report Series (Survey Methodology #2002-01) Statistical Research Division, US Bureau of the Census, Feb. 7, 2002, Statistical Review of public and interviewer perceptions.

Statutory Authority for the Census

Title 13 of the US Code regulates the structure of the census and its uses by the government and private entities. 13 USC § 6 authorizes the Census Bureau to acquire data from other agencies instead of conducting direct inquiries. Agencies can then share information rather than collecting it again. This allows the Census Bureau to receive certain information from agencies like the Social Security Administration or the Internal Revenue Service.

13 USC § 9 regulates the privacy of information collected in the Census. Section 9 requires that information gathered by the Bureau be kept confidential and be used exclusively for statistical purposes. The statute provides penalties for employees who willfully disclose such information illegally. Part A of Section 9 expressly restricts the Census Bureau from: 1) using the information for any purpose other than statistics, 2) making any publication allowing any individual to be identified, or 3) permitting any unauthorized person to examine the census reports. Only authorized people may have official copies of the census reports. Even in the case of litigation, census reports may not be used as legal evidence.

Under federal law, individuals can be fined for not completing census questionnaires. Section 221 imposes a fine when an individual does not fill out the form, as well as a fine for providing false information. Several cases in the 1960s confirmed that a penalty would be enforced in instances of incomplete questionnaires or lack of response. U.S. v. Rickenbacker held that even if social scientists find the questionnaires to be larger than necessary, the enforcement of fines is constitutional. However, fines for non-completion have not been levied in recent years.

Part C of Section 221 makes it illegal for the government to ask questions regarding religion on the census form. Even with the restrictions dictated in Title 13, problems still exist in distinguishing between authorized use and appropriate use. The standard is left to the judgment of agencies, which then weigh the costs and benefits of restricting census information.

  • Dunne, Timothy. Issues in the Establishment and Management of Secure Research Sites, Confidentiality, Disclosure, and Data Access: Theory and Practical Application for Statistical Agencies (Doyle, Lane, Theeuwes, Zayatz, eds., NY: North-Holland 2001)
  • US v. Rickenbacker, 309 F.2d 462 (1962).
  • US v. Sharrow, 309 F.2d 77 (1962).

Risks of the Census

The census performs many useful functions for society. However, there is widespread evidence of the misuse of census data. Hitler notably used European census data in his conquests across Europe. The misuse of census data can be found in much of the world, including in our own nation. Even recently, privacy risks led to public statements by politicians regarding the intrusiveness of census questions. The Washington Post reported that former Senate Majority Leader Trent Lott encouraged citizens not to answer invasive questions.

  • D'Vera Cohn, Census Too Nosy? Don't Answer Invasive Questions, GOP Suggests, Washington Post, Mar. 30, 2000, at A1

The Civil War

Along with its benefits for war planning, census information can be used as a tool of destruction in wartime. General Sherman used census data to locate targets during the famed Civil War March through Georgia.

World War II and Japanese Internment

A specific example of the privacy risks of the US census can also be found in the 1940s. During World War II, Japanese-American citizens were rounded up and sent to internment camps. The Census Bureau might not have given out individual Japanese-American names or numbers, but the Bureau did work with the US War Department to offer aggregated data about certain localities. Although there is still a lack of consensus concerning specific conclusions, the Census Bureau has issued a formal apology and now reports that the Bureau did not protect Japanese-Americans.

It has been recorded that even before the Japanese attack on Pearl Harbor, President Franklin Delano Roosevelt ordered the Census Bureau to collect information on "American-born and foreign-born Japanese" from the Census data lists. Information was gathered from the 1930 and 1940 censuses on all Japanese-Americans and then given to the FBI and top military officials. These sources point directly to the census information as one of the reasons that led to the internment of almost 110,000 Japanese-Americans on the West Coast, two-thirds of whom were U.S. citizens.

United Kingdom

A recent example of abuse from abroad can be found in the United Kingdom. It recently became public that compulsory population transfers were considered in Northern Ireland in 1972. A top-secret UK government memo has surfaced describing a plan to relocate Irish Catholics. The plan was written with census data. Although the plan was never implemented, the use of census data for non-statistical purposes has caused great concern in Europe.

Germany

Germany has a contrasting history in census reporting. The most extreme example of census abuse is Hitler's use of the census to track minorities for extermination during the Nazi regime. Although this example remains perhaps the most horrifying abuse of the census, Germany's modern use of the census is exactly the opposite. In the aftermath of World War II, privacy protections were placed in the German Constitution. In the 1980s, the German Government instituted a law requiring more information to be provided on the national census. After a public outcry, the law was challenged in court. The issue was brought before the German Federal Constitutional Court by representatives who had been instrumental in the passage of the first German Data Protection Act during the 1970s. The court found the census law unconstitutional based upon what the court termed a fundamental right to informational self-determination implicit in the German Constitution.

After the court decision, the legislature amended the German Data Protection Act in 1990 to include the right of informational self-determination regarding government uses of information as well as information use in the private sector. By including private uses as well, Germany created one of the most broadly reaching privacy protections relating to the census.

European privacy concerns over the census have been widespread. Mayer reports on several surveys taken in the 1970s regarding privacy risks of census reports. In particular, people in England, Germany, the Netherlands, and Northern Ireland reportedly protested in large numbers against the census's undermining of information privacy.

  • Kent Walker, Where Everybody Knows Your Name: A Pragmatic Look at the Costs of Privacy and the Benefits of Information Exchange, 1 Stan. Tech. L. Rev. 106 (2000)(citing Jerry M. Rosenberg, The Death Of Privacy 1 (1969)).
  • Mayer, Thomas S. "Privacy and Confidentiality Research and the US Census Bureau: Recommendations Based on a Review of the Literature". Research Report Series (Survey Methodology #2002-01) Statistical Research Division, US Bureau of the Census, Feb. 7, 2002, Statistical Review of public and interviewer perceptions.
  • Richard Sobel. The Demeaning of Identity and Personhood in the National Identification Systems, 15 Harv. J.L. & Tech. 319 (2002).
  • Viktor Mayer-Schonberger, Privacy and the Law: A Symposium, No Choice: Trans-Atlantic Information Privacy Legislation and Rational Choice Theory, 67 Geo. Wash. L. Rev. 1309 (1999).
  • Edwin Black, IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America's Most Powerful Corporation (Crown Publishers 2002).
  • Commission belge de la Protection de la Vie Privée, Avis d'initiative No. 37/2001 of October 8, 2001 concernant l'enquête socio-économique 2001. (Belgian Data Protection Authority's opinion on the compatibility of the 2001 ten-yearly census survey with Belgian privacy regulations.)

Obscuring Individual Data

There are many techniques used to mask data. The challenges of obscuring individual data derive from growing technology and the increasing demand for information. With every new technological advancement in the protection of private information, technology is also created to undo such protections. Along with the technological problems of protecting information, there is also the problem of the societal demand for information. The government faces pressure both from the information brokerage industry, which derives its profit from the sale of private information, and from branches within the government.

One of the primary methods used to obscure individuals' data is to inflate certain lists of information. List inflation can be used to protect confidentiality by adding nonexistent variables. This is also known as "commingling" or "salting". One controversial technique of traditional data protection is the use of "noise," which deliberately fudges numbers. "Noise" adds random error to quantitative variables. It can also switch variables between similar respondents to obscure the results further. The noise technique does not provide absolute protection. As technology has advanced, the noise method has become more vulnerable to modern computer programs that can unscramble noise patterns.
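The two operations described above, adding random error to a quantitative variable and switching a variable between similar respondents, can be illustrated with a minimal sketch. The function names, the toy records, and the choice of Gaussian noise are assumptions made for illustration; nothing here describes which distributions or matching rules the Census Bureau actually uses.

```python
import random

def add_noise(values, scale, rng):
    """Return a copy of `values` with zero-mean Gaussian error added to each."""
    return [v + rng.gauss(0, scale) for v in values]

def swap_within_groups(records, swap_key, group_key, rng):
    """Swap `swap_key` between random pairs of records sharing `group_key`."""
    out = [dict(r) for r in records]  # copy so the input is left untouched
    groups = {}
    for i, r in enumerate(out):
        groups.setdefault(r[group_key], []).append(i)
    for idxs in groups.values():
        rng.shuffle(idxs)
        # Pair up neighbours in the shuffled order; an odd one out is left alone.
        for a, b in zip(idxs[::2], idxs[1::2]):
            out[a][swap_key], out[b][swap_key] = out[b][swap_key], out[a][swap_key]
    return out

# Hypothetical toy records, not real census data.
records = [
    {"age_band": "30-39", "income": 41000},
    {"age_band": "30-39", "income": 58000},
    {"age_band": "40-49", "income": 72000},
    {"age_band": "40-49", "income": 39000},
]
rng = random.Random(0)
noisy_incomes = add_noise([r["income"] for r in records], scale=1000, rng=rng)
swapped = swap_within_groups(records, "income", "age_band", rng)
```

In this sketch, swapping leaves the set of incomes within each age band unchanged while breaking the link between a particular respondent and a particular value. Neither step is a guarantee of confidentiality, which is consistent with the caution above that noise does not provide absolute protection.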

  • GAO-01-126SP Record Linkage and Privacy, OMB Report Chapters on Re-identification, Privacy Statutes, Report on Data Protection Methods.

Re-Identification

Public use datasets contain 'anonymous' microdata, information on individual people and organizations where the explicit identifiers have been stripped away. Microdata can then be transferred even in blocks of just a few individuals. Under 13 USC § 9, the Census Bureau is required to make sure that the identities cannot be "reasonably deduced." Concepts of "reasonable deduction," however, are changing quickly.

Re-identification is the process of linking anonymous data to the actual identity of an individual. Carnegie Mellon Professor Latanya Sweeney has demonstrated that anonymous data sets can often be readily re-identified. In one experiment, Sweeney, using 1990 Census data, demonstrated that individuals often have demographic values that occur infrequently. Since these values occur infrequently, they allow the re-identification of individuals in putatively anonymous datasets. Sweeney found in her report Uniqueness of Simple Demographics in the U.S. Population:

...87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides. And even at the county level, {county, gender, date of birth} are likely to uniquely identify 18% of the U.S. population. In general, few characteristics are needed to uniquely identify a person.
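The kind of uniqueness measurement quoted above can be sketched as a simple count: group records by a combination of demographic fields and see how many records are alone in their group. The function name, field names, and the four toy records below are hypothetical and are not drawn from Sweeney's data or method; this is only an illustration of the idea.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical toy population, not real census data.
people = [
    {"zip": "02139", "gender": "F", "dob": "1970-01-01"},
    {"zip": "02139", "gender": "F", "dob": "1970-01-01"},
    {"zip": "02139", "gender": "M", "dob": "1970-01-01"},
    {"zip": "02140", "gender": "F", "dob": "1982-05-17"},
]

by_three_fields = unique_fraction(people, ("zip", "gender", "dob"))  # 0.5
by_gender_only = unique_fraction(people, ("gender",))                # 0.25
```

Even in this tiny example, adding fields to the combination raises the share of people who stand alone, which is the pattern the quoted figures describe at national scale.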

Re-identification can also be enhanced through the use of commercially available or public records databases. Census data can be combined with other datasets in order to identify individuals. Some re-identification software is available commercially.
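Combining datasets in this way amounts to a join on shared demographic fields. The sketch below is a minimal illustration under stated assumptions: the function name, the field names, the "named" list standing in for a public records database, and all of the records are hypothetical, and no particular commercial product is being described.

```python
def link_records(anonymous, named, keys):
    """Attach a name to each anonymous row that matches exactly one named record."""
    index = {}
    for person in named:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in anonymous:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # only an unambiguous match re-identifies
            matches.append((candidates[0], row))
    return matches

# Hypothetical public list with names (for example, a records database).
named = [
    {"name": "Person A", "zip": "02139", "gender": "F", "dob": "1970-01-01"},
    {"name": "Person B", "zip": "02139", "gender": "M", "dob": "1970-01-01"},
    {"name": "Person C", "zip": "02140", "gender": "F", "dob": "1982-05-17"},
    {"name": "Person D", "zip": "02140", "gender": "F", "dob": "1982-05-17"},
]
# Hypothetical "anonymous" microdata with identifiers stripped.
anonymous = [
    {"zip": "02139", "gender": "F", "dob": "1970-01-01", "income": 41000},
    {"zip": "02140", "gender": "F", "dob": "1982-05-17", "income": 58000},
]

matches = link_records(anonymous, named, ("zip", "gender", "dob"))
```

Here the first anonymous row links to a single named person, while the second stays ambiguous because two named records share the same values.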

Re-identification is legal in the United States. However, some countries have attempted to address re-identification in a legal framework. Germany, for instance, recently proscribed census re-identification.

Business Influence Over the Census

For much of American history, business interests have shaped questions in order to capture marketing data from the census. For instance, the 2000 census long form asked numerous questions about home and consumption habits. As early as the late 19th century, business associations played a strong role in the creation of specific census questions. Rather than just including the original questions of race, age, and sex, the census grew to include complex socioeconomic questions. Various statistical associations, along with the Chairperson of the Board of the American Marketing Association, appoint the members of the official Census Advisory Committee of Professional Associations.

In a hearing before the House Government Reform Subcommittee on the Census in 1998, the Members openly discussed the connection between consumer marketing and census data. The information provided in the census is used to construct marketing and strategic planning for companies. Starbucks' location strategy serves as the prime example; the sites for stores are closely examined and determined according to any data available from the government. Because of the economic impact of such information, data marketers formed lobbying groups who encourage lawmakers to continue to make such information publicly available.

Consumer marketing companies use the census to evaluate much more than income, race, gender, and the location of potential customers. Companies can evaluate the workplace, leisure activity, and consumption patterns of individuals. This can be attempted through "geodemographic segmentation," which combines the population and housing census information from several categories. Using geographic, demographic and psychographic (focusing on lifestyle rather than demographic information as a basis for describing segments of data) approaches, marketing companies can use the census information to construct lifestyle profiles. The marketing companies Claritas, CACI Marketing, Mediamark Research, Inc., former R.J. Reynolds Tobacco Company, and others have used various types of census-based marketing systems.

The influence of marketing associations on the development of the census cannot be ignored. In the face of growing privacy violations, lawmakers are consistently choosing to expand the level of information available from the Census Bureau. When one or more databases are combined, individual identity can be reconstructed within a single street block. In a group of 15 individuals, it is not difficult for a marketer to identify the one person who matches the characteristics listed in census information. With various computer programs that synthesize such information, an individual can re-identify any small group of people from the information provided to marketers.

  • Kysar, Douglas A., Kids & Cul-De-Sacs: Census 2000 and the Reproduction of Consumer Culture, 87 Cornell L. Rev. 853 (2001).
  • U.S. Census Bureau, U.S. Dep't of Commerce, Charter of the Census Advisory Committee of Professional Associations (Mar. 27, 2000).
  • Department of Commerce, Bureau of the Census, Federal Register Vol. 68, No. 54, March 20, 2003.

Other Issues

Enumeration and Data Quality

The Census Bureau has been conducting an ongoing study to determine privacy attitudes surrounding the enumeration and data quality. The importance of correct enumeration and data quality has both political and social ramifications. On the societal effects, the public opinion regarding the quality of the census has a crucial impact on the number of questionnaires returned. If the general public does not see the importance of a national census, responses are likely to be low. The Census Bureau has calculated that an added cost for item non-response and misinformation must be expected in the cost of each census.

In the political arena, enumeration plays a crucial role in the construction of the Electoral College and the party outcome in elections. Enumeration is used for redrawing districts, thereby either adding or subtracting electoral votes. All parties have played a role in bargaining over the necessary quality of the national census. In particular, the number of minorities counted has caused debate. The undercounting of minorities has become a frequent problem, due both to non-response and to difficulties in racial categorization.

  • Mayer, Thomas S. Privacy and Confidentiality Research and the US Census Bureau: Recommendations Based on a Review of the Literature, Research Report Series (Survey Methodology #2002-01) Statistical Research Division, US Bureau of the Census, Feb. 7, 2002, Statistical Review of public and interviewer perceptions.
  • Anderson, Margo and Stephen E. Fienberg, Census 2000: Politics and Statistics, 32 U. Tol. L. Rev. 19 (2000).

Social Security Numbers

The use of the Social Security Number on public documents remains one of the most controversial topics in privacy regulation. Recently, the Census Bureau has engaged in a study to see whether the public will object to the collection of Social Security numbers on census forms. The Census Bureau has created a program called SPAN, the Social Security Number, Privacy Attitudes and Notification Experiment. The experiment would consist of asking 20,000 people to fill out a special census form, which would include their SSN. Meanwhile, the Census Bureau has begun to expand interagency sharing of Social Security numbers. In 1998, the Commissioner of the Social Security Administration approved the Census Bureau's request for the file of SSN applicants (also called the Numident File).

The Administrative Records Steering Committee continues to assess whether or not a public outcry would follow the use of SSNs in the Census. Their studies have recognized that there are numerous considerations, particularly due to issues of controlling data.

Microdata

Microdata is a concept derived from the public use of samples of information. Data samples are used for statistical analysis of past census reports. Microdata allows researchers to create tabulations tailored to particular questions regarding the filed information. These files include nearly all the detail originally recorded by the census enumerations. The use of microdata can construct a great variety of interrelating figures to compile a set of variables for analysis. Microdata is particularly used for historical research because the aggregate tabulations produced by the Census Bureau are often not comparable across time.

A critical issue today is how large the sample sizes are and what exactly each contains. For the period since the 1940 census, microdata are subject to confidentiality measures that limit their usefulness for some applications. The available samples for these years include no names, addresses or other potentially identifying information. To further ensure that no individuals can be identified, the Census Bureau is required to limit the information regarding residence, place of work, high incomes, and several other variables. By changing the size and variables in new samples, the microdata could show completely different statistical results.
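Two common ways of limiting variables such as high incomes and place of residence are capping values above a threshold and reporting geography at a coarser level. The sketch below assumes those two techniques purely for illustration; the function names, the cap, and the ZIP-style truncation are hypothetical, and the text above does not say which specific methods are applied to the public samples.

```python
def top_code(incomes, cap):
    """Replace every income above `cap` with `cap` itself."""
    return [min(x, cap) for x in incomes]

def coarsen_zip(zip5, keep=3):
    """Keep only the first `keep` characters of a ZIP code and mask the rest."""
    return zip5[:keep] + "X" * (len(zip5) - keep)

# Hypothetical values, not real census data.
capped = top_code([30000, 85000, 250000], cap=100000)
area = coarsen_zip("02139")
```

Both steps trade detail for protection: a researcher can no longer distinguish among the highest earners or among neighborhoods within the coarser area, which is one way such measures limit the usefulness of the data for some applications.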

Rather than looking at larger statistics of cities or large towns, individuals can now research statistics according to a Census Bureau program called TIGER, Topologically Integrated Geographic Encoding and Referencing system. This system creates custom reports for any area defined, including a small neighborhood. The Public Use Microdata Sample files would then allow users to see the actual census questionnaires, albeit without the actual name or address. If one combines the two programs, the likelihood of identifying individuals is high.

Cases

  • Dep't of Commerce v. US House of Representatives, 525 U.S. 316 (1999). The Supreme Court held that the current language of Title 13 required that statistical sampling could not be used for the apportionment of seats in the House of Representatives.
  • St. Regis Paper Co. v. US, 368 US 208 (1962). The Supreme Court held that the company-retained copies of census reports could be subpoenaed and used against the reporting company in legal proceedings (so that the paper company was required to submit copies of its economic census forms to the FTC). According to some scholars, this lowered the business community's confidence in the reporting system and jeopardized the Federal statistical system.
  • US v. Little, 321 F. Supp. 388 (D. Del. 1971). The court found that information obtained by the census is strictly confidential under 13 USC § 9 and may not be used other than for statistical reporting and may never be disclosed in any manner so as to identify any person who has answered the questions.
  • US v. Bethlehem Steel Corp., 21 FRD 568 (D.N.Y. 1958). This case concerned an antitrust action against a steel corporation in which the US Department of Justice sought Census reports from the Department of Commerce for the investigation. The court stated that the protection of privacy of census information is so clear and compelling that there is no basis for making census records available in such actions.
  • US v. Moriarity, 106 F. 886, 1901 U.S. App. Lexis 3634 (S.D.N.Y. 1901).
  • In re FTC Corporate Patterns Report Litigation, 432 F. Supp. 291 (D.D.C. 1977). The court decided that Congress intended to protect "copies" of the Census report and not general information relating to Census results. The case was determined from assessments of the legislative history of 13 USC § 9(a).

Resources