Assignment
A reminder about Academic Integrity from the syllabus:
Violations of the Academic Integrity Policy will not be tolerated. I am available to help you with graded assignments (within certain limits, e.g., I cannot tell you whether an answer is correct before you submit an assignment), but you are not allowed to work with anyone else on an assignment.
Are all violations of the Academic Integrity Policy. The above is not meant to be an exhaustive list of possible violations, so if you have any question about whether something is permissible, I strongly encourage you to check with me ahead of time.
Part (a)
Use the "Bus Maintenance.xls" data set and run a multiple regression with "Maintenance cost per month" as the dependent variable on the following independent variables: "Age," "Miles per month," and "Diesel dummy."
You must submit your actual Excel file with the output as part of the assignment.
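For readers who want to see what the regression in Part (a) computes, here is a minimal sketch in Python using only the standard library. It does not read the actual data set: the data are synthetic, the coefficients used to generate them (100, 20, 40, 50) are made up, and "Miles per month" is assumed to be measured in thousands of miles. The helper names `solve` and `ols` are hypothetical. The Excel output remains the required deliverable.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients (intercept first) from X'X b = X'y."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic stand-in for the bus data (all numbers are assumptions).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    age = rng.uniform(0, 15)        # years
    miles_k = rng.uniform(1, 5)     # thousands of miles per month
    diesel = rng.choice([0, 1])     # 1 = diesel engine, 0 = gasoline
    cost = 100 + 20 * age + 40 * miles_k + 50 * diesel + rng.gauss(0, 10)
    X.append([age, miles_k, diesel])
    y.append(cost)

coef = ols(X, y)
for name, b in zip(["Intercept", "Age", "Miles per month", "Diesel dummy"], coef):
    print(f"{name}: {b:.3f}")
```

The printed estimates should land close to the made-up coefficients used to generate the data, which is a quick sanity check that the fit behaves as expected.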
Part (b)
How can we tell from the regression output that we reject the null hypothesis that all slope coefficients are zero? Then explain what that means in language that could be understood by a high school senior.
Part (c)
Interpret the estimated value of the coefficient on the "Age" variable, i.e., explain what the number means in this regression.
Part (d)
Interpret the estimated value of the coefficient on the "Miles per month" variable, i.e., explain what the number means in this regression.
Part (e)
Interpret the estimated value of the coefficient on the "Diesel dummy" variable, i.e., explain what the number means in this regression.
Part (f)
Is the estimate of the coefficient on the "Diesel dummy" variable statistically significant? Please answer "yes" or "no." Then explain what that means in language that could be understood by a high school senior.
Part (g)
Consider a new "Gasoline dummy" variable that is "1" when a bus has a gasoline engine and "0" when it has a diesel engine. Suppose that we then run a new multiple regression with "Maintenance cost per month" as the dependent variable on the following independent variables: "Age," "Miles per month," and "Gasoline dummy." (That is, it is the same regression as in Part (a), but with the "Gasoline dummy" variable instead of the "Diesel dummy.") What would the estimated value of the coefficient on the "Gasoline dummy" be?
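One way to build intuition for Part (g) is to recode the dummy on made-up data and compare the two fits side by side. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the bus data set), "Miles per month" is assumed to be in thousands of miles, and the helpers `solve` and `ols` are hypothetical names for a plain least-squares fit.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients (intercept first) from X'X b = X'y."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic data; every number here is an assumption for illustration.
rng = random.Random(1)
X_diesel, X_gas, y = [], [], []
for _ in range(200):
    age = rng.uniform(0, 15)
    miles_k = rng.uniform(1, 5)
    diesel = rng.choice([0, 1])
    gasoline = 1 - diesel  # the recoded dummy described in Part (g)
    y.append(100 + 20 * age + 40 * miles_k + 50 * diesel + rng.gauss(0, 10))
    X_diesel.append([age, miles_k, diesel])
    X_gas.append([age, miles_k, gasoline])

b_diesel = ols(X_diesel, y)
b_gas = ols(X_gas, y)
labels = ["Intercept", "Age", "Miles per month", "Dummy"]
for name, bd, bg in zip(labels, b_diesel, b_gas):
    print(f"{name:16s} diesel-coded: {bd:9.3f}   gasoline-coded: {bg:9.3f}")
```

Comparing the two columns of output row by row shows which estimates change under the recoding and which do not.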
Part (h)
Consider a new variable called "State code" that indicates in which state a bus is operated and is constructed as follows: The 50 U.S. states are sorted alphabetically and coded from 1 to 50 so that Alabama would be coded as "1," Alaska as "2," Arizona as "3," and so on all the way to "50" for Wyoming. Suppose that we then run a new multiple regression with "Maintenance cost per month" as the dependent variable on the following independent variables: "Age," "Miles per month," "Diesel dummy," and the newly created "State code" variable. (That is, it is the same regression as in Part (a), but with the "State code" variable added.) Would the (unadjusted) R-Squared increase or decrease? Then explain whether that is something that is desirable or whether there is a problem with what happens to R-Squared.
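The question in Part (h) can be explored numerically by fitting the regression with and without an arbitrary numeric code and comparing the fit statistics. The sketch below makes several assumptions: the data are synthetic, the "State code" is drawn at random from 1 to 50 and has no true relationship with cost, "Miles per month" is in thousands of miles, and the helper names (`solve`, `ols`, `r_squared`, `adjusted_r_squared`) are hypothetical. The adjusted R-Squared uses the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k slope coefficients.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients (intercept first) from X'X b = X'y."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)

def r_squared(X, y):
    """Unadjusted R-Squared: 1 - SS_residual / SS_total."""
    b = ols(X, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R-Squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Synthetic data; the state code is pure noise by construction.
rng = random.Random(2)
X_without, X_with, y = [], [], []
for _ in range(200):
    age = rng.uniform(0, 15)
    miles_k = rng.uniform(1, 5)
    diesel = rng.choice([0, 1])
    state = rng.randint(1, 50)
    y.append(100 + 20 * age + 40 * miles_k + 50 * diesel + rng.gauss(0, 10))
    X_without.append([age, miles_k, diesel])
    X_with.append([age, miles_k, diesel, state])

n = len(y)
r2_without = r_squared(X_without, y)
r2_with = r_squared(X_with, y)
print(f"R2 without state code: {r2_without:.6f}")
print(f"R2 with state code:    {r2_with:.6f}")
print(f"Adj. R2 without: {adjusted_r_squared(r2_without, n, 3):.6f}")
print(f"Adj. R2 with:    {adjusted_r_squared(r2_with, n, 4):.6f}")
```

Rerunning with different seeds and comparing the unadjusted and adjusted figures is a useful way to see how each statistic responds to a regressor that carries no real information.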
Attachment: bus_maintenance.rar