Question: Historians wanting to use data from U.S. censuses collected in the precomputer age faced the daunting task of poring over reels of handwritten records on microfilm, arranged in geographic order. The Public Use Microdata Samples (PUMS) were constructed by taking samples of the records and typing those records into the computer. Ruggles describes the PUMS construction for the 1940 census:

The population schedules of the 1940 census are preserved on 4,576 microfilm reels. Each census page contains information on forty individuals. Two lines on each page were designated as "sample lines" by the Census Bureau: the individuals falling on those lines, 5 percent of the population, were asked a set of supplemental questions that appear at the bottom of the census page.
Two of every five census pages were systematically selected for examination. On each selected census page, one of the two designated sample lines was then randomly selected. Data-entry personnel then counted the size of the sample unit containing the targeted sample line. Units size six or smaller were included in the sample in inverse proportion to their size. Thus, every one-person unit was included in the sample, every second two-person unit, every third three-person unit, and so on. Units with seven or more persons were included with a probability of 1-in-7: every seventh household of size seven or more was selected for the sample. (1995, 44)
a Explain why this is a cluster sample. What are the psu's? The ssu's?
b What effect do you think the clustering will have on estimates of race? Age? Occupation?
c Construct a table for the probability of selection for persons in one-person units, two-person units, and so on.
d What happens if you estimate the mean age of the population by the average age of all persons in the sample? What estimator should you use?
e Do you think that taking a systematic sample was a good idea for this sample? Why, or why not?
f Does this method provide a representative sample of households? Why, or why not?
g What type of sample is taken of the individuals with supplementary information? Explain.
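
As an aid to reading the quoted procedure, the stated sampling rates can be written out as a minimal Python sketch. The names `PAGE_RATE`, `LINE_RATE`, and `retention_rate` are hypothetical and chosen only for illustration. The sketch encodes just the rates given in the quotation. It does not model the chance that the targeted sample line falls within a particular unit, so by itself it is not the selection probability asked for in part c.

```python
from fractions import Fraction

# Rates stated in the quoted description (the names are illustrative assumptions).
PAGE_RATE = Fraction(2, 5)  # two of every five census pages were examined
LINE_RATE = Fraction(1, 2)  # one of the two designated sample lines was chosen


def retention_rate(unit_size: int) -> Fraction:
    """Rate at which a unit containing the targeted sample line is kept.

    Units of size 1 to 6 are kept at rate 1/size; units of size 7 or more
    are kept at rate 1/7, as in the quoted passage.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be a positive integer")
    return Fraction(1, min(unit_size, 7))


if __name__ == "__main__":
    # Print the retention rate for unit sizes 1 through 8.
    for size in range(1, 9):
        print(size, retention_rate(size))
```

Exact fractions are used rather than floating-point numbers so that the rates can be multiplied and compared without rounding.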