Potential jurors. In Example 28.3 the indicator variable for year (x2 = 0 for 1998 and x2 = 1 for 2000) was used to combine the two separate regression models from Example 28.1 into one multiple regression model. Suppose that instead of x2 we use an indicator variable x3 that reverses the two years, so that x3 = 1 for 1998 and x3 = 0 for 2000. The mean reporting percent is µ = β0 + β1x1 + β2x3, where x1 is the code for the reporting date (the value on the x axis in Figure 28.1) and x3 is an indicator variable to identify the year (different symbols in Figure 28.1). Statistical software now gives the estimated regression model as ŷ = 94.9 - 0.717x1 - 17.8x3.
(a) Substitute the two values of the indicator variable into the estimated regression equation to obtain a least-squares line for each year.
(b) How do your estimated regression lines in part (a) compare with the estimated regression lines provided for each year in Example 28.3?
(c) Will the regression standard error change when this new indicator variable is used? Explain.
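One way to check your work on parts (a) to (c) is to compare the predictions from the two codings numerically. The sketch below is only an illustration: the function names are hypothetical, and the Example 28.3 coefficients (77.1, -0.717, 17.8) are an assumption inferred from the intercepts and slope quoted in that example rather than copied from software output.

```python
def model_x2(x1, x2):
    # Example 28.3 coding: x2 = 0 for 1998, x2 = 1 for 2000.
    # Coefficients are inferred from the text (assumption).
    return 77.1 - 0.717 * x1 + 17.8 * x2


def model_x3(x1, x3):
    # Coding in this exercise: x3 = 1 for 1998, x3 = 0 for 2000.
    return 94.9 - 0.717 * x1 - 17.8 * x3


# Largest disagreement between the two models over all 26 reporting
# dates and both years.
max_gap = 0.0
for x1 in range(1, 27):
    for year in (1998, 2000):
        x2 = 1 if year == 2000 else 0
        x3 = 1 - x2  # the reversed indicator
        gap = abs(model_x2(x1, x2) - model_x3(x1, x3))
        max_gap = max(max_gap, gap)

print("largest difference in predictions:", max_gap)
```

If the largest difference is zero up to rounding, the two codings describe the same pair of lines, which bears directly on what happens to the residuals in part (c).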
Example 28.3:
Example 28.2 introduced the regression model
µ = β0 + β1x1 + β2x2
for predicting reporting percent y from reporting date x1 and year x2. Statistical software gives the least-squares estimate of this model as
ŷ = 77.1 - 0.717x1 + 17.8x2
By substituting the two values of the indicator variable into this estimated regression equation, we can obtain a least-squares line for each year. The predicted reporting percents are
ŷ = 94.9 - 0.717x1 for 2000 (x2 = 1)
and
ŷ = 77.1 - 0.717x1 for 1998 (x2 = 0).
Comparing these estimated regression equations with the two separate regression lines obtained in Example 28.1, we see that the intercept parameters are very close to one another (95.571 is close to 94.9, and 76.426 is close to 77.1) for both years. The big change, as intended, is that the slope -0.717 is now the same for both lines. In other words, the estimated change in mean reporting percent for a one-unit change in the reporting date is now the same for both models, -0.717. A closer look reveals that -0.717 is the average of the two slope estimates (-0.765 and -0.668) obtained in Example 28.1. Finally, the regression standard error s = 6.709 indicates the size of the "typical" error. Thus, for a particular reporting date, we would expect approximately 95% of the reporting percents to be within 2 × 6.709 = 13.418 of their mean.
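The substitution and the arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not software output: the helper names are hypothetical, and the coefficients 77.1, -0.717, and 17.8 are an assumption inferred from the two intercepts (77.1 and 94.9) and the common slope quoted above.

```python
# Coefficients inferred from the text (assumption): intercept for 1998,
# common slope, and the shift added when x2 = 1 (year 2000).
B0, B1, B2 = 77.1, -0.717, 17.8


def predicted_percent(x1, x2):
    """Predicted reporting percent for date code x1 and year indicator x2."""
    return B0 + B1 * x1 + B2 * x2


def line_for_year(x2):
    """(intercept, slope) of the line obtained by fixing the indicator x2."""
    return (B0 + B2 * x2, B1)


line_1998 = line_for_year(0)  # x2 = 0
line_2000 = line_for_year(1)  # x2 = 1

# Arithmetic quoted in the paragraph above.
avg_slope = (-0.765 + -0.668) / 2  # average of the two separate slopes
margin = 2 * 6.709                 # twice the regression standard error

print("1998 line (intercept, slope):", line_1998)
print("2000 line (intercept, slope):", line_2000)
print("average of separate slopes:", round(avg_slope, 4))
print("2s margin:", round(margin, 3))
```

Fixing the indicator only shifts the intercept; the slope is shared by both lines, which is exactly the restriction this model imposes.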
Example 28.1:
STATE: Tom Shields, jury commissioner for the Franklin County Municipal Court in Columbus, Ohio, is responsible for making sure that the judges have enough potential jurors to conduct jury trials. Only a small percent of cases go to trial, but potential jurors must be available to serve on short notice. Jury duty for this court is two weeks long, so Tom must bring together a new group of potential jurors twenty-six times a year. Random sampling methods are used to obtain a sample of registered voters in Franklin County every two weeks, and these individuals are sent a summons to appear for jury duty. Not all of the voters who receive a summons actually appear for jury duty. Table 28.1 shows the percent of individuals who reported for jury duty after receiving a summons for two years, 1998 and 2000. The reporting dates vary slightly from year to year, so they are coded in order from 1, the first group to report in January, to 26, the last group to report in December. New efforts were made to increase participation rates in 2000. Is there evidence that these efforts were successful?