Suppose we have data from a health survey conducted in 2010. The data were obtained from a random sample of 2,342 persons.
A binary (dichotomous) logistic regression analysis was conducted as follows:
Dependent variable (Y): self-reported general health (1 if not healthy, 0 if healthy)
Independent variables:
  Gender (1 if female, 0 if male)
  Age (in years)
  Race/ethnicity (binary variables for "Non-Hispanic Black", "Hispanic", and "Others"; the reference category is "Non-Hispanic White")
Form of the equation:

$$\log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

where p is the probability of being self-reportedly unhealthy and the X_j's (j = 1, 2, ..., k) are the independent variables. The ages in the sample range from 25 to 84. The quantity log(p/(1-p)) is also called the log odds. (Note: the logarithm used in class is the natural logarithm. If you use Excel, make sure to use the LN() function instead of the LOG() function, which uses base 10 by default.)
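Because the log odds are a natural logarithm, the odds and the probability can be recovered by exponentiating and then solving for p. A short derivation, where η is shorthand introduced here (not the course's notation) for the right-hand side α + β1X1 + ... + βkXk and log is the natural logarithm:

```latex
\begin{aligned}
\eta &= \log\left(\frac{p}{1-p}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \\
\frac{p}{1-p} &= e^{\eta} \\
p &= e^{\eta}(1 - p) \;\Rightarrow\; p\,(1 + e^{\eta}) = e^{\eta} \;\Rightarrow\; p = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}
\end{aligned}
```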
The computer output is given below. The column labeled "Beta" shows the estimated regression coefficients, i.e., α and the β's in the equation above. (The betas for the reference categories, "Male" and "Non-Hispanic White", can be interpreted as being fixed at zero.)
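The way the "Beta" column plugs into the equation, with reference categories contributing zero, can be sketched in Python. This is a minimal sketch under stated assumptions: the dictionary keys ("female", "age", "black", "hispanic", "others") are hypothetical names chosen for illustration, and only the Age (0.026) and Hispanic (1.314) estimates are quoted in this exercise, so the intercept α and the remaining betas have to be copied from the output table.

```python
import math
from typing import Mapping


def log_odds(alpha: float, betas: Mapping[str, float], x: Mapping[str, float]) -> float:
    """Linear predictor alpha + sum_j beta_j * X_j (the log odds).

    Reference categories ("Male", "Non-Hispanic White") have no entry in `betas`,
    which is equivalent to fixing their beta at zero. Any variable missing from
    `x` is treated as 0.
    """
    return alpha + sum(beta * x.get(name, 0.0) for name, beta in betas.items())


def odds(eta: float) -> float:
    """Odds p / (1 - p) = exp(log odds); natural exponential, matching ln."""
    return math.exp(eta)


def probability(eta: float) -> float:
    """p = exp(eta) / (1 + exp(eta)), written as 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Only the two coefficients quoted in the exercise (hypothetical key names).
# Add the gender, Non-Hispanic Black and Others betas from the output when
# scoring someone whose corresponding dummies are not all zero.
quoted_betas = {"age": 0.026, "hispanic": 1.314}

# A 30-year-old Hispanic man: female = 0, black = 0, others = 0,
# so the betas for those variables drop out of his linear predictor.
hispanic_man_30 = {"female": 0, "age": 30, "black": 0, "hispanic": 1, "others": 0}
```

With `alpha_hat` standing for the intercept read from the output (a placeholder name), `eta = log_odds(alpha_hat, quoted_betas, hispanic_man_30)` gives his log odds, and `odds(eta)` and `probability(eta)` give the other two quantities. Leaving reference categories out of the dictionary, rather than storing explicit zeros, mirrors how the output omits them.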
1. Suppose that the data set includes a 30-year-old Hispanic man. What are his log odds of being self-reportedly unhealthy according to this regression equation?
2. What are his odds of being self-reportedly unhealthy according to this regression equation?
3. What is the probability that he is self-reportedly unhealthy, according to this regression equation?
4. The above table shows that exp(beta) for "Hispanic" is 3.722 (= exp(1.314)). What does this number mean? (Hint: Note that the reference category for Race/Ethnicity is "Non-Hispanic White".)
5. The above table shows that exp(beta) for "Age" is 1.026 (= exp(0.026)). What does this number mean?
6. Let us consider the odds (or log odds) of being self-reportedly unhealthy for Non-Hispanic Whites and for Non-Hispanic Blacks, adjusted for gender and age. Are they statistically significantly different at the 0.05 level by a two-sided test? Why or why not?
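The reasoning behind questions 4 to 6 can be illustrated with a small Python sketch, offered as an aid rather than as the course's official method. It rests on two facts: for any fixed values of the other X's, raising X_j by one unit multiplies the odds by exp(β_j), so exp(β) is an odds ratio; and because "Non-Hispanic White" is the reference category, the "Non-Hispanic Black" coefficient is itself the gender- and age-adjusted log odds ratio of Blacks versus Whites, so question 6 amounts to a two-sided Wald test of that coefficient. The function names are hypothetical, and the Black coefficient and its standard error are not quoted in this text, so they must come from the output. (exp(1.314) is about 3.721; the 3.722 in the table presumably reflects rounding of the displayed coefficient.)

```python
import math
from statistics import NormalDist

BETA_HISPANIC = 1.314  # quoted in question 4
BETA_AGE = 0.026       # quoted in question 5


def odds_ratio(beta: float, delta_x: float = 1.0) -> float:
    """Multiplicative change in the odds when X_j rises by delta_x, other X's fixed.

    exp(eta + beta * delta_x) / exp(eta) = exp(beta * delta_x), whatever eta is.
    """
    return math.exp(beta * delta_x)


def wald_test(beta: float, se: float, level: float = 0.05):
    """Two-sided Wald z test of H0: beta = 0 for a single logistic coefficient.

    Returns z, the p-value, the (1 - level) confidence interval for exp(beta),
    and whether H0 is rejected at `level`.
    """
    std_normal = NormalDist()
    z = beta / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1.0 - level / 2.0)  # about 1.96 for level = 0.05
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return z, p_value, ci, p_value < level


# Hispanic vs Non-Hispanic White, same gender and age: about 3.72
hispanic_vs_white = odds_ratio(BETA_HISPANIC)
# One additional year of age, same gender and race/ethnicity: about 1.026
one_more_year = odds_ratio(BETA_AGE)
# Ten additional years compound multiplicatively: exp(0.26), not 1 + 10 * 0.026
ten_more_years = odds_ratio(BETA_AGE, 10)
```

For question 6, `wald_test(beta_black, se_black)` (placeholder names for the estimate and standard error in the output) rejects H0 at the 0.05 level exactly when p < 0.05, which is the same as the 95% confidence interval for exp(beta) excluding 1. Some packages report the Wald chi-square instead; it equals z² with 1 degree of freedom and gives the same p-value. The age interpretation applies within the sample's age range of 25 to 84.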