Deb and Trivedi (1997) and Zeileis et al. (2008) examine the relationship between the number of physician office visits for a person (ofp) and a set of explanatory variables for individuals on Medicare. Their data are contained in the file dt.csv. The explanatory variables are number of hospital stays (hosp), number of chronic conditions (numchron), gender (gender; male = 1, female = 0), number of years of education (school), and private insurance (privins; yes = 1, no = 0). Two additional explanatory variables given by the authors are denoted as health_excellent and health_poor in the data file. These are self-perceived health status indicators (yes = 1, no = 0); they cannot both be 1, and both equal to 0 indicates "average" health. Using these data, complete the following:
(a) Estimate the Poisson regression model to predict the number of physician office visits. Use all of the explanatory variables in a linear form without any transformations.
(b) Interpret the effect that each explanatory variable has on the number of physician office visits.
(c) Compare the number of zero-visit counts in the data to the number predicted by the model and comment. Can you think of a possible explanation for why there are so many zeroes in the data?
(d) Estimate the zero-inflated Poisson regression model to predict the number of physician office visits. Use all of the explanatory variables in a linear form without any transformations for the log(µi) part of the model and no explanatory variables in the πi part of the model. Interpret the model fit results.
(e) Complete part (d) again, but now use all of the explanatory variables in a linear form to estimate πi. Interpret the model fit results and compare this model to the previous ZIP model using a likelihood ratio test (LRT).
(f) Examine how well each model estimates the number of 0 counts.
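
For parts (c), (e), and (f), the quantities being compared can be sketched as follows. Under a Poisson model the estimated probability of a zero for observation i is exp(-µi), and under a ZIP model it is πi + (1 - πi)exp(-µi); summing these over all observations gives the expected number of zeros. The LRT statistic is -2 times the difference in maximized log-likelihoods between the reduced and full models. This is a minimal sketch, not a solution: the function names and the fitted values below are hypothetical and are not estimates from dt.csv. In practice, the fitted means and probabilities would come from whatever software is used to fit the models.

```python
import math


def expected_zeros_poisson(mu):
    """Expected number of zero counts under a Poisson model.

    mu: iterable of fitted means, one per observation.
    P(Y_i = 0) = exp(-mu_i), summed over observations.
    """
    return sum(math.exp(-m) for m in mu)


def expected_zeros_zip(mu, pi):
    """Expected number of zero counts under a zero-inflated Poisson model.

    mu: fitted Poisson means; pi: fitted zero-inflation probabilities.
    P(Y_i = 0) = pi_i + (1 - pi_i) * exp(-mu_i), summed over observations.
    """
    return sum(p + (1.0 - p) * math.exp(-m) for m, p in zip(mu, pi))


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: -2 * (log L_reduced - log L_full)."""
    return -2.0 * (loglik_reduced - loglik_full)


# Hypothetical values for illustration only (NOT estimates from dt.csv).
observed_counts = [0, 4, 9, 0]
mu_hat = [2.0, 5.5, 8.0, 3.2]
pi_hat = [0.15, 0.10, 0.05, 0.20]

observed_zeros = sum(1 for y in observed_counts if y == 0)
print("Observed zeros:          ", observed_zeros)
print("Expected zeros (Poisson):", expected_zeros_poisson(mu_hat))
print("Expected zeros (ZIP):    ", expected_zeros_zip(mu_hat, pi_hat))
```

For the comparison in part (e), the LRT statistic would be referred to a chi-square distribution with degrees of freedom equal to the number of additional parameters in the πi part of the larger model.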