Problem
I. State the measurement level (nominal, ordinal, interval, ratio) of each variable in the data set, along with the appropriate measures of central tendency (mode, median, mean) and dispersion (range, variance/standard deviation).
II. For each of the questions a-c below, select the appropriate test and state:
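As an illustration of how the measurement level restricts the usable summary measures, here is a minimal sketch using only Python's standard library. The helper name summarize and the sample values are hypothetical and are not taken from the data set. Treating the range as meaningful for ordinal data is one common textbook convention, not the only one.

```python
import statistics

def summarize(values, level):
    """Return only the summary measures that are meaningful at `level`."""
    out = {"mode": statistics.mode(values)}            # valid at every level
    if level in ("ordinal", "interval", "ratio"):
        out["median"] = statistics.median(values)      # needs an ordering
        out["range"] = (min(values), max(values))
    if level in ("interval", "ratio"):
        out["mean"] = statistics.mean(values)          # needs meaningful differences
        out["variance"] = statistics.variance(values)  # sample variance (n - 1)
        out["stdev"] = statistics.stdev(values)
    return out

# Made-up values, used only so the example runs.
region = ["southeast", "northwest", "southeast", "southwest", "northeast"]
bmi = [22.5, 31.0, 27.5, 27.5, 35.0]

region_summary = summarize(region, "nominal")
bmi_summary = summarize(bmi, "ratio")
print(region_summary)
print(bmi_summary)
```

For a nominal variable the function reports the mode only, since neither an ordering nor arithmetic on the category labels is defined.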
i. What is the null hypothesis?
ii. Did you reject the null? Why or why not?
iii. What is your conclusion?
a. Is there a relationship between being a smoker and the region a person lives in?
b. Are smokers charged more by insurers relative to non-smokers?
c. Is BMI different for males and females?
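One plausible way to set up the three tests is sketched below with NumPy and SciPy. It assumes a chi-square test of independence for two categorical variables, a one-sided Welch t-test for the directional question, and a two-sided Welch t-test for the non-directional one. Other choices (for example a pooled-variance t-test or a nonparametric test) may be equally defensible. All counts and samples are invented, and the 0.05 significance level is an assumption.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05                      # assumed significance level
rng = np.random.default_rng(0)

# (a) Two categorical variables: chi-square test of independence.
# Rows: smoker yes / no; columns: four regions. Counts are invented.
counts = np.array([[30, 25, 40, 28],
                   [120, 130, 125, 122]])
chi2, p_a, dof, expected = stats.chi2_contingency(counts)

# (b) Numeric outcome, two groups, directional question.
# H0: mean charge of smokers <= mean charge of non-smokers.
smoker_charges = rng.normal(30000, 10000, size=80)      # simulated
nonsmoker_charges = rng.normal(10000, 6000, size=300)   # simulated
t_b, p_b = stats.ttest_ind(smoker_charges, nonsmoker_charges,
                           equal_var=False, alternative="greater")

# (c) Numeric outcome, two groups, non-directional question.
# H0: mean BMI is the same for males and females.
bmi_male = rng.normal(30, 6, size=200)                  # simulated
bmi_female = rng.normal(30, 6, size=200)                # simulated
t_c, p_c = stats.ttest_ind(bmi_male, bmi_female, equal_var=False)

for label, p in [("a", p_a), ("b", p_b), ("c", p_c)]:
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"({label}) p = {p:.4f}: {decision}")
```

In each case the null hypothesis is rejected only when the p-value falls below the chosen significance level; the conclusion should then be restated in terms of the original question.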
III. Use a linear regression model to capture the relationship between insurance charges and relevant explanatory variables.
• Briefly explain the rationale behind your model specification. Why did you include the variables you selected? Consider whether the model needs interaction effects or nonlinear transformations of the variables.
• Comment on the overall fit of the model.
• Interpret the coefficients. What do we learn about the factors explaining the variation in insurance charges?
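A minimal sketch of a regression with an interaction term, fitted by ordinary least squares with NumPy, is given below. The specification (charges on BMI, a smoker dummy, and their product) is only one hypothetical choice among many. The data are simulated from an invented process so that the example runs, and the coefficients carry no information about the real data set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated predictors (invented, not the assignment's data).
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)        # 1 = smoker, 0 = non-smoker

# Invented data-generating process with a BMI x smoker interaction.
charges = (5000 + 100 * bmi + 2000 * smoker + 800 * bmi * smoker
           + rng.normal(0, 3000, size=n))

# Design matrix: intercept, bmi, smoker dummy, interaction.
X = np.column_stack([np.ones(n), bmi, smoker, bmi * smoker])
beta, *_ = np.linalg.lstsq(X, charges, rcond=None)

# Overall fit: R^2 and adjusted R^2.
fitted = X @ beta
ss_res = np.sum((charges - fitted) ** 2)
ss_tot = np.sum((charges - charges.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
k = X.shape[1] - 1                          # number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

names = ["intercept", "bmi", "smoker", "bmi:smoker"]
for name, b in zip(names, beta):
    print(f"{name:>12}: {b:10.2f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```

With an interaction in the model, the BMI slope is beta[1] for non-smokers and beta[1] + beta[3] for smokers, so the main-effect coefficients should not be interpreted in isolation.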