Question: (a) Assume that one of the regressors in a two-regressor model has been misspecified, and a logarithmic term should be used rather than a linear term. If the values of that regressor in the data set range from 8 to 14, would a partial residual plot likely indicate the need for the logarithmic term? (Hint: Assume that n = 25 and that the regressor values are equally spaced. What is the correlation between the linear term and the logarithmic term?) Should we really be concerned whether any plot identifies or fails to identify the need for the term? In particular, recall the rule-of-thumb from Mosteller and Tukey (1977) that was given in Section 2.3.2.
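The correlation mentioned in the hint can be checked numerically. The following is a minimal sketch, assuming n = 25 equally spaced regressor values as the hint suggests; the helper names `equally_spaced`, `pearson`, and `linear_log_correlation` are illustrative and are not taken from the text.

```python
import math


def equally_spaced(lo, hi, n):
    """Return n equally spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_log_correlation(lo, hi, n=25):
    """Correlation between X and ln(X) for n equally spaced X in [lo, hi]."""
    x = equally_spaced(lo, hi, n)
    log_x = [math.log(v) for v in x]
    return pearson(x, log_x)


# Part (a): regressor values range from 8 to 14 (n = 25 assumed, per the hint).
r_narrow = linear_log_correlation(8, 14)
print(f"corr(X, ln X) for X in [8, 14]: {r_narrow:.4f}")
```

Because ln(X) is nearly linear over a range as short as 8 to 14, the printed correlation should be very close to 1. The same function can be called with a different upper bound to see how the correlation changes as the range widens.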
(b) Now assume that the regressor values range from 8 to 200, and n stays the same. Again compute the correlation coefficient. Explain why a residuals plot, standardized residuals plot, or detrended added variable plot will not provide the appropriate signal. The following data were constructed using the equation Y = 4 + 3X1 + 4 ln(X2) + ε.