The purpose of this assignment is to develop models that can be used to predict one specific variable: frequency of going to a fast food restaurant (FFR), q2, which we shall treat as interval scale (even though it is, technically, merely ordinal), with values ranging from 1 to 5. (I.e., q2 is back to its original form on the questionnaire.)
A reasonable strategy is to first run a (stepwise) regression for each noted “block of variables,” including as (eligible) independent variables the questions in that block. Particularly relevant “blocks” might be thought of as (1) Product attributes/benefits, q16 - q26; (2) Restaurant scripts - ideas in your mind about fast food restaurants, q69 - q84; (3) Usage/occasion, q85 - q102; (4) Demographics, q136 - q148. Then note, for each block, which variables are statistically significant[1], and the R-squared value of the model using those variables. Call this “phase 1.”
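As an illustration of what a phase 1 run does for one block, here is a minimal sketch of a forward/backward stepwise procedure based on p-values. It is not the procedure of any particular statistics package: the function names `fit_ols` and `stepwise`, the thresholds (enter at p < 0.05, remove at p > 0.10), and the synthetic data with columns x1 to x5 are all assumptions made for the example. Your software's stepwise settings may differ.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept.

    Returns (coefficients, two-sided p-values, R-squared);
    index 0 of the first two arrays is the intercept.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - A.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), df)
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, p_values, r2


def stepwise(X, y, names, p_enter=0.05, p_remove=0.10):
    """Hypothetical stepwise selection; thresholds are assumed, not prescribed."""
    selected = []
    for _ in range(2 * X.shape[1] + 1):  # guard against endless cycling
        changed = False
        # Forward step: add the candidate with the smallest p-value, if small enough.
        best_p, best_j = 1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p = fit_ols(X[:, selected + [j]], y)[1][-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_j is not None and best_p < p_enter:
            selected.append(best_j)
            changed = True
        # Backward step: drop the weakest variable if it is no longer significant.
        if selected:
            p_vals = fit_ols(X[:, selected], y)[1][1:]
            worst = int(np.argmax(p_vals))
            if p_vals[worst] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return [names[j] for j in selected]


# Synthetic stand-in for one block of variables (not the survey data).
rng = np.random.default_rng(0)
n = 402
names = ["x1", "x2", "x3", "x4", "x5"]
X = rng.normal(size=(n, 5))
y = 3.0 + 0.8 * X[:, 0] - 0.6 * X[:, 2] + rng.normal(size=n)

winners = stepwise(X, y, names)
_, _, r2 = fit_ols(X[:, [names.index(v) for v in winners]], y)
print("survivors:", winners, "R-squared:", round(r2, 3))
```

The variables returned by `stepwise` are the ones that remain at the end of the process for that block, and `r2` is the R-squared of the model that uses only them.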
Q1) What is the R-squared for each of the four blocks of variables? Which block of variables is the most predictive of q2?
Next (call this “phase 2”), you should run a stepwise regression using the variables that were identified in phase 1 as being important (i.e., “survived” phase 1 stepwise – the “winners” from each block). This is your “final model.”
Q2) For your final model, explain the impact of each independent variable on the dependent variable in terms of the variable’s coefficient; do not focus on the actual numerical value, but, rather, on the sign of the coefficient. A typical answer for the variable, Income, would be something like, “Assuming other variables in the model are held constant, if a person has a higher income, he/she will visit FFR’s more (less) often.” Also, along with this explanation, indicate whether the sign of the coefficient makes sense to you and why.
Q3) a) In your final model, suppose that each of the independent variables takes on a value of its respective mean (average) in the dataset of n = 402. What do you predict as the value for q2?
Q3) b) Find a 90% confidence interval for the prediction in Q3a) for a hypothetical “average” individual person with that set of values for the independent variables.
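The arithmetic behind Q3 can be sketched as follows. With an intercept in the model, the prediction at the means of the independent variables equals the sample mean of the dependent variable, and the leverage at that point is 1/n. The sketch below assumes that Q3b) asks for an interval for an individual person, so the standard error includes the extra "1 +" term; if the interval for the mean response is intended instead, that term would be dropped. The function name `predict_at_means` and the synthetic data are assumptions for the example.

```python
import numpy as np
from scipy import stats


def predict_at_means(X, y, level=0.90):
    """Predict y at the column means of X and return an interval for an individual."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df = n - k - 1                                # residual degrees of freedom
    s = np.sqrt(resid @ resid / df)               # standard error of the estimate
    x0 = np.concatenate([[1.0], X.mean(axis=0)])  # the "average" respondent
    y_hat = float(x0 @ beta)
    leverage = float(x0 @ np.linalg.inv(A.T @ A) @ x0)  # equals 1/n at the means
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
    half_width = t_crit * s * np.sqrt(1 + leverage)     # "1 +" is for an individual
    return y_hat, (y_hat - half_width, y_hat + half_width), leverage


# Synthetic stand-in for the final model's data (not the survey data).
rng = np.random.default_rng(1)
n_obs = 402
X_demo = rng.normal(size=(n_obs, 3))
y_demo = 2.5 + 0.5 * X_demo[:, 0] - 0.3 * X_demo[:, 1] + rng.normal(scale=0.8, size=n_obs)

y_hat, (low, high), lev = predict_at_means(X_demo, y_demo)
print(round(y_hat, 3), round(low, 3), round(high, 3))
```

Because the leverage at the means is only 1/n, the interval is essentially the prediction plus or minus the t critical value times the standard error of the estimate.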
In addition to building your phase 2, “final” model, there is a particular interest in the relationship of the dependent variable (q2) to the importance the respondent gives to “popularity with children,” q18, and “the number of children 15 and under,” q138.
Q4) For each of the 2 variables, q18 and q138, is it in the final model? If it is, describe its exact relationship to q2 after adjusting for other variables in the final model[2]. If it isn’t, is it because the variable is redundant or because it is irrelevant? (Please make sure I can tell whether your answer is “redundant” or “irrelevant” by typing a boldface redundant or a boldface irrelevant somewhere in your answer.) Also, provide your reasoning for your conclusion.
Technical note: Some of the questions toward the end of the data set (i.e., questionnaire) are nominal scale variables with more than two categories: specifically, q144 and q145. These need to be converted to dummy variable form. (If there are only two categories to begin with [e.g., Male, Female], then the variable is “already in dummy variable form,” and does not need conversion.)
At the end of the database, various dummy variables have been constructed for your convenience. The category of the housing variable, Duplex, which had a very low frequency, was combined with one of the other categories, Apartment, resulting in the variable having only 2 categories (and, correspondingly, one variable).
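For reference, the conversion described in this technical note can be sketched as below. The dummy variables are already provided in the database, so this only illustrates the idea. The category labels ("A", "B", "C", "House") and the column name `housing` are hypothetical, since the actual codes are not listed here; only the merging of Duplex into Apartment comes from the note above.

```python
import pandas as pd

# Hypothetical toy data: category labels are made up for the illustration.
df = pd.DataFrame({
    "q144": ["A", "B", "C", "A", "B"],
    "housing": ["House", "Apartment", "Duplex", "House", "Apartment"],
})

# A nominal variable with 3 categories becomes 2 dummy variables;
# the dropped first category ("A") serves as the baseline.
dummies = pd.get_dummies(df["q144"], prefix="q144", drop_first=True, dtype=int)

# Merge the low-frequency category into Apartment, leaving 2 categories,
# which then need only a single 0/1 variable.
df["housing"] = df["housing"].replace({"Duplex": "Apartment"})
df["housing_apartment"] = (df["housing"] == "Apartment").astype(int)

print(dummies)
print(df["housing_apartment"].tolist())
```

A variable with c categories needs c - 1 dummy variables, which is why the merged housing variable ends up as one column.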
[1] I.e., remain in the stepwise regression at the end of the stepwise process.
[2] Note: its “relationship” to q2 concerns its coefficient/slope (not something to do with R or R-squared).