The Project -
This project concerns a practical problem: buying a used car. The calling price of a used car varies with its year of production, its mileage, and its specific kind (brand and body type, e.g. Honda Civic Sedan), as well as with some random factors.
The purpose of this project is to examine the relationship between the mean calling price E(y) (the price asked for by the owner) of a specific kind of car and the following independent variables:
1. X1 (quantitative): The number of years since production; e.g., if a car was produced in 2010, then X1 = 2017 - 2010 = 7.
2. X2 (quantitative): The original price of the car when it was new.
3. X3 (quantitative): The current mileage of the car.
4. X4 (qualitative): Title status (clean vs. not clean).
Choose a specific type of car (for example, Honda Civic Sedan or Chevrolet Malibu) and collect sample data with sample size n ≥ 30 (you may decide how many observations to include, but the number must be at least 30). Make sure your data contain all of the quantitative and qualitative variables listed above.
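One possible way to lay out the sample in R is one row per listing, with X1 computed from the production year and X4 stored as a factor. The sketch below is only an illustration: the column names, the 2005 to 2016 year range, and all values are simulated placeholders (not real listings), and the calling price Y is left empty for you to fill in from your own data.

```r
# Hypothetical data layout: every value here is SIMULATED, and the column
# names are assumptions. Replace them with the listings you collect.
set.seed(1)
n <- 40                                          # assignment requires n >= 30
year <- sample(2005:2016, n, replace = TRUE)     # production year of each listing

car_data <- data.frame(
  X1 = 2017 - year,                              # years since production (2017 reference year)
  X2 = round(runif(n, 18000, 24000)),            # original price when new
  X3 = round(runif(n, 5000, 150000)),            # current mileage
  X4 = factor(sample(c("Clean", "NotClean"), n, replace = TRUE, prob = c(0.8, 0.2)),
              levels = c("Clean", "NotClean")),  # title status; "Clean" is the baseline level
  Y  = NA_real_                                  # calling (asking) price: record from each listing
)

stopifnot(nrow(car_data) >= 30)                  # sample-size requirement
str(car_data)
# A 2010 car gets X1 = 2017 - 2010 = 7, matching the example in the text.
```

Storing X4 as a factor lets lm() create the dummy variable for "not clean" automatically, with "Clean" as the reference level.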
The objectives of this project are as follows:
1. Hypothesize a model relating the calling price to the predictors (consider interaction effects if necessary).
2. Run a variable selection procedure (stepwise regression or the all-possible-regressions selection procedure) to choose the most important x's.
3. For the x's selected in step 2, fit the regression model you proposed in step 1. Conduct t-tests on the important β's, compare adjusted R² values, and compare 2s values.
4. Propose and fit other candidate models. Determine the best model for E(y) using the nested-model F-test (hint: use the anova() function in R for the nested F-test).
5. Based on the best model selected in step 4, perform residual analysis to check the assumptions on ε (whether the ε's are independent and N(0, σ²) distributed). (Hint: for the normality assumption, use both a Q-Q plot and residual plots; code will be provided in later chapters.)
6. If you detect any violation of the assumptions on ε, remedy your model and redo steps 1 through 5.
7. Assess the adequacy of the best model by checking that the global F-test is significant, adjusted R² is high, and the 2s value is small.
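The seven objectives map fairly directly onto base R functions. The sketch below runs them end to end on simulated placeholder data; the data-generating coefficients, the 0.05 cut-off, the choice of all two-way interactions as the competing model, and the log transform as the remedy are all assumptions for illustration, not the required method. Note that base R's step() selects terms by AIC rather than by t-test p-values; an all-possible-regressions search is available in add-on packages such as leaps (regsubsets()).

```r
# Sketch of objectives 1-7 in base R. The data are SIMULATED placeholders with
# hypothetical coefficients; replace `car_data` with your collected sample.
set.seed(2017)
n <- 40
car_data <- data.frame(
  X1 = sample(1:12, n, replace = TRUE),          # years since production
  X2 = round(runif(n, 18000, 24000)),            # original price when new
  X3 = round(runif(n, 5000, 150000)),            # current mileage
  X4 = factor(sample(rep(c("Clean", "NotClean"), c(30, 10))),
              levels = c("Clean", "NotClean"))   # title status
)
car_data$Y <- round(car_data$X2 * exp(-0.08 * car_data$X1 - 1.5e-6 * car_data$X3 -
                    0.15 * (car_data$X4 == "NotClean") + rnorm(n, 0, 0.05)))

# Helper: adjusted R^2 and 2s, where s is the residual standard error.
fit_stats <- function(fit) {
  s <- summary(fit)
  c(adj_r2 = s$adj.r.squared, two_s = 2 * s$sigma)
}

# Objectives 1-2: start from the intercept-only model and let step() add or
# drop the four first-order terms (selection by AIC).
fit_step <- step(lm(Y ~ 1, data = car_data),
                 scope = ~ X1 + X2 + X3 + X4,
                 direction = "both", trace = 0)

# Objective 3: t-tests on individual betas, adjusted R^2, and 2s.
print(summary(fit_step)$coefficients)
print(fit_stats(fit_step))

# Objective 4: a competing model with all two-way interactions among the
# selected terms; the nested F-test asks whether the extra betas are all zero.
fit_int <- update(fit_step, . ~ .^2)
nested  <- anova(fit_step, fit_int)
print(nested)
best <- if (isTRUE(nested[2, "Pr(>F)"] < 0.05)) fit_int else fit_step

# Objective 5: residual analysis for normality and constant variance.
res <- resid(best)
op <- par(mfrow = c(1, 2))
qqnorm(res); qqline(res)
plot(fitted(best), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)
print(shapiro.test(res))

# Objective 6: one common remedy (assumed here) is to model log(Y) instead of Y.
# R^2 and 2s of fit_log are on the log scale, so compare them with care.
fit_log <- update(best, log(.) ~ .)
print(shapiro.test(resid(fit_log)))

# Objective 7: global F-test p-value for whichever model you finally keep.
f <- summary(best)$fstatistic
global_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
cat("Global F p-value:", global_p, "\n")
```

With real data, read the Q-Q plot and residual-versus-fitted plot yourself rather than relying only on the Shapiro-Wilk p-value, and rerun the model comparison on the transformed response if you adopt the remedy.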
Format of Your Work -
Your work should be clear and easy to understand and should follow this format:
1. Statement of the Problem: State your research question, that is, what your study is about and its purpose (about 100 words).
2. The Data: Specify how you collected the data, and summarize your sample using the descriptive statistics methods covered in the course.
(1) The following table must be included.
(2) Scatter plots: X1 versus Y; X2 versus Y; X3 versus Y.
Histogram: the histogram of the calling price Y.
3. The Models: Specify the hypothesized models you want to apply. In this part, you are expected to complete the first four objectives stated above. Hint: the first model you fit might not be ideal; you may need to improve it by selecting variables, changing the order of the model, considering interaction effects, etc. Compare all the models you fitted and explain why your chosen model is the best, using the nested-model F-test, t-tests on the important β's, adjusted R², and 2s values.
4. Assumption Check: In this part, complete objective 5 stated above and write down your conclusion.
5. Model Remedy: In this part, apply a transformation to y or x so that the model assumptions are satisfied, and write down the new model and your conclusion.
6. Model Adequacy: In this part, complete the last objective stated above.
7. Conclusion: Give a brief summary of your study.
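For part 2 (The Data), the required scatter plots and histogram, together with basic numerical summaries, can be produced in a few lines of base R. The sketch below uses simulated placeholder data with assumed column names X1, X2, X3, X4, and Y; it does not reproduce the required table, which you still need to include separately.

```r
# Sketch for part 2 (The Data) using SIMULATED placeholder values and
# hypothetical column names; substitute your own data frame.
set.seed(7)
n <- 35
car_data <- data.frame(
  X1 = sample(1:12, n, replace = TRUE),          # years since production
  X2 = round(runif(n, 18000, 24000)),            # original price when new
  X3 = round(runif(n, 5000, 150000)),            # current mileage
  X4 = factor(sample(c("Clean", "NotClean"), n, replace = TRUE, prob = c(0.8, 0.2)),
              levels = c("Clean", "NotClean"))   # title status
)
car_data$Y <- round(car_data$X2 * exp(-0.08 * car_data$X1 - 1.5e-6 * car_data$X3 +
                                      rnorm(n, 0, 0.05)))

# Numerical summaries for the quantitative variables, counts for the qualitative one.
num_vars <- c("X1", "X2", "X3", "Y")
print(summary(car_data[num_vars]))
sds <- sapply(car_data[num_vars], sd)
print(sds)
print(table(car_data$X4))

# Required plots: X1, X2, X3 versus Y, plus a histogram of Y.
op <- par(mfrow = c(2, 2))
for (v in c("X1", "X2", "X3")) {
  plot(car_data[[v]], car_data$Y, xlab = v, ylab = "Calling price Y",
       main = paste(v, "versus Y"))
}
hist(car_data$Y, main = "Histogram of calling price Y", xlab = "Y")
par(op)
```

Placing the four plots in one 2 x 2 panel keeps them on a single page of the PDF report.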
Others -
This project consists of 7 parts (see Format of Your Work).
All analysis must be done in R. You must collect your data and write your code yourself. A project report that does not include both the analysis and the R code will not be graded. Submit your work as a PDF file containing your analysis and R code.
Attachment: Assignment File.rar