STATISTICAL ANALYSIS PROJECT
This project leads you through a statistical analysis of used car price data. The data for this project was obtained from the car sales website www.carsales.com.au between 4 and 11 January 2016 (inclusive).
Project Data
The data for this project can be accessed from the MySCU site for MAT10251 in Project under Assessment.
The data set provided contains 10 randomly chosen samples of size 125.
To obtain your data
(1) Click on the 'Project Data' file. This will download an Excel file.
(2) Select the 5 columns (Year to Price) of data for the sample specified by the last digit of your student ID number.
(3) Copy this into a new Excel file.
There are 10 sample data sets, each of 5 columns (Year to Price).
Your sample number matches the last digit of your SCU student ID number. For example, if your student ID number ends in 1, your sample is Sample 1 and you will be analysing used car data for Mazda 3 cars for sale in Queensland, found in columns G to K.
Project Situation
Your statistical analysis of used car price data is to enable you to answer questions from a relative or friend who is seeking to buy a used car of the make and model specified by your sample and has asked you for information and advice. Your relative or friend is restricting their search to the state they are living in, which is also specified by the sample. For example, if your student ID number ends in 0, your sample is Sample 0, so your friend or relative is seeking to buy a Mazda 3 in New South Wales.
In each part of the project you are required to analyse your sample data in response to given questions and provide a written answer. You can assume that each written answer is a part of a letter or email to your friend or relative.
Data Analysis Project - Part A
Part A Question
Your friend or relative has asked you for information on the price of three and four year old cars of the make and model and in the state specified by your sample. In particular, he/she is interested in the minimum and maximum price, the average price and an estimated price range for a three or four year old used car.
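As a rough illustration of the summary figures asked for above, the following Python sketch computes the minimum, maximum and mean price, together with one possible "estimated price range". The project itself expects Excel output, so this is only a cross-check. The prices are made-up placeholder values, not project data, and using the first to third quartile as the price range is an assumption; the function name `summarise_prices` is hypothetical.

```python
import statistics

def summarise_prices(prices):
    """Return basic summary statistics for a list of prices."""
    q1, _median, q3 = statistics.quantiles(prices, n=4)  # quartile cut points
    return {
        "min": min(prices),
        "max": max(prices),
        "mean": statistics.mean(prices),
        "typical_range": (q1, q3),  # assumption: middle 50% used as the price range
    }

# Placeholder prices in dollars (hypothetical, NOT from the project data set).
example_prices = [15990, 16500, 17250, 17990, 18400, 18990, 19500, 21000]
summary = summarise_prices(example_prices)
print(summary)
```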
Data Analysis Project - Part B
Purpose: To
- obtain feedback on your submission in Part A and to gain experience in self-evaluation of submitted work,
- apply your knowledge of statistical inference to answer questions about used car prices by analysing the data and communicating the results.
Tasks
Task 1 - Part A Self-Marking
1) Open your saved copy of your submission for Part A.
2) Replace the Part A coversheets (three pages) with the Part B coversheets (first four pages).
3) Rename and save this file as - "Family Name_First Name_Part_B_Campus".
4) Use the solution template and marking guide provided to mark your submission for Part A. Enter recommended marks on the self-marking sheet for Part A, page 3 of the file in 3) above.
5) Write a short (approximately 200 words) reflection on your submission and marking of Part A. In particular:
- consider the good aspects of your submission, what did you do well,
- identify where you made mistakes, and how you would avoid them in the future,
- consider what you learnt from submitting and marking Part A.
This is to be entered in the space at the bottom of the self-marking sheet for Part A.
Task 2 - Part B Appendix - Statistical Inference
The following statistical tasks should appear as appendices to your written answer. This should include all necessary steps and appropriate Excel, or equivalent, output.
These appendices should come after your written answer within your single word document for Part B.
In preparing your appendices you may use one of the following formats:
- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your word document.
Statistical Inference
Choose a level of significance for any hypothesis tests and a level of confidence for any confidence intervals. Enter these values on page 2 of the Part B coversheets along with the sample number from Part A.
Question 1 - Your relative or friend asks you for an estimate of the average price of a three or four year old car of the specified make and model in the state specified by your sample.
To provide this estimate use Price data for 2012 and 2013 used cars and an appropriate statistical inference technique to answer the following question.
What is the mean price of a three or four year old car of the specified make and model in the specified state?
Note: the required data for 2012 and 2013 cars is in the first rows of your sample.
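One technique that could fit this question is a t-based confidence interval for the mean. The sketch below shows the arithmetic in Python as a cross-check on Excel output. It assumes the critical value is looked up separately (for example from a t-table, or with Excel's T.INV.2T function) and passed in as `t_critical`. The prices are made-up placeholders, and the function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(prices, t_critical):
    """Return (lower, upper) for a t-based confidence interval for the mean.

    t_critical is the two-tailed critical value for n - 1 degrees of freedom,
    looked up separately (for example from a t-table).
    """
    n = len(prices)
    mean = statistics.mean(prices)
    std_error = statistics.stdev(prices) / math.sqrt(n)  # sample std dev / sqrt(n)
    margin = t_critical * std_error
    return mean - margin, mean + margin

# Placeholder prices (hypothetical). With n = 5 there are 4 degrees of freedom,
# and the two-tailed 95% critical value from a t-table is about 2.776.
example_prices = [17000, 18000, 19000, 20000, 21000]
lower, upper = mean_confidence_interval(example_prices, t_critical=2.776)
print(f"95% CI for mean price: ${lower:,.0f} to ${upper:,.0f}")
```

Passing the critical value in as a parameter keeps the sketch within the Python standard library and leaves the choice of confidence level with you.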
Question 2 - Your relative or friend would prefer to purchase a car with a manual transmission and wishes to know if this will limit their choice.
To provide a justified answer to this question, use the Transmission data (where A = Automatic transmission, M = Manual transmission) for all cars in your sample and an appropriate statistical inference technique to answer the following question:
Do more than 30% of cars, of the specified make and model, for sale in the specified state have manual transmission?
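A question of this form is often handled with a one-sided z-test for a single proportion, testing H0: p = 0.30 against H1: p > 0.30. The following Python sketch illustrates that calculation under the usual large-sample normal approximation. The count of 50 manual cars is a made-up placeholder, and the function name `proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """One-sided z-test of H0: p = p0 against H1: p > p0."""
    p_hat = successes / n
    std_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return p_hat, z, p_value

# Placeholder counts (hypothetical): 50 manual cars in a sample of 125.
p_hat, z, p_value = proportion_z_test(successes=50, n=125, p0=0.30)
print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

With real data, the number of manual cars could be obtained by counting the "M" entries in the Transmission column. The p-value would then be compared with your chosen level of significance.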
Task 3 - Part B Written Answer - Letter or Email
For each question present the results of your calculations, with your interpretation and conclusion, as part of a letter or email to your friend or relative.
Use the instructions given on page five of the Part B coversheets.
This should be one to three pages and 200 to 400 words.
It should be submitted as a Word file with Excel output included.
Make sure you:
- Introduce the question and put it in context.
- Answer the question in non-statistical language.
- Present the results of your intervals or tests without unnecessary statistical jargon.
- Include conclusions which answer the given questions.
Data Analysis Project - Part C
Task 1 - Part C Appendix - Statistical Inference and Regression and Correlation
The following statistical tasks should appear as appendices to your written answer. This should include all necessary steps and appropriate Excel, or equivalent, output.
These appendices should come after your written answer within your single word document for Part C.
In preparing your appendices you may use one of the following formats:
- Word with Excel output added.
- Handwritten with Excel output added. This will then need to be scanned and added to your word document.
Choose a level of significance for any hypothesis test and a level of confidence for any confidence interval. Enter these values on page 2 of the Part C cover sheets along with the sample number from Part A.
Use your sample and appropriate statistical inference and regression and correlation techniques to answer the following questions.
Question 1 Statistical Inference
Your relative or friend asks you if used car prices are generally higher for cars with automatic transmission than those with manual.
Use Price and Transmission data (where A = Automatic transmission, M = Manual transmission) for all cars in your sample and an appropriate statistical inference technique to answer the following question:
On average, is the price of cars of the specified make and model for sale in the specified state higher for those with automatic transmission than for those with manual transmission?
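One way to approach a comparison of two means is a two-sample t-test. The sketch below computes the test statistic using Welch's unequal-variance version, which is a choice made here for illustration; a pooled-variance test may be what your course materials use. The (transmission, price) pairs are made-up placeholders, and the function name `welch_t_statistic` is hypothetical.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """Return (t, df) for Welch's two-sample t-test of mean(a) - mean(b)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a) / n_a  # squared standard error, group a
    var_b = statistics.variance(group_b) / n_b  # squared standard error, group b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df

# Placeholder data (hypothetical): (transmission, price) pairs.
cars = [("A", 20000), ("M", 18000), ("A", 22000),
        ("M", 19000), ("A", 24000), ("M", 20000)]
auto_prices = [price for trans, price in cars if trans == "A"]
manual_prices = [price for trans, price in cars if trans == "M"]
t_stat, df = welch_t_statistic(auto_prices, manual_prices)
print(f"t = {t_stat:.3f} with about {df:.1f} degrees of freedom")
```

Because the question is one-sided (automatic higher than manual), the statistic would be compared with an upper-tail critical value from a t-table at your chosen level of significance.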
Question 2 Simple Linear Regression model
Your friend or relative asks you how the value of the car that they decide to purchase will depreciate in value.
Use Age (independent variable) and Price (dependent variable) to model the relationship between age of a used car and its price.
Then, to answer how the car that your friend or relative decides to purchase will depreciate in value, explore this relationship by:
1. Plotting the data with a scatter plot.
2. Calculating the least squares regression line, correlation coefficient and coefficient of determination.
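The calculations in step 2 can be sketched in Python from the standard least squares formulas, which may be useful for checking Excel output. The ages and prices are made-up placeholders, the function name `simple_linear_regression` is hypothetical, and the scatter plot from step 1 is not included here.

```python
import math

def simple_linear_regression(x, y):
    """Return (slope, intercept, r, r_squared) from the least squares formulas."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return slope, intercept, r, r ** 2

# Placeholder data (hypothetical): age in years, price in dollars.
ages = [1, 2, 3, 4, 5]
prices = [24000, 21500, 20000, 17500, 16000]
slope, intercept, r, r_squared = simple_linear_regression(ages, prices)
print(f"Price = {intercept:.0f} + ({slope:.0f}) x Age, r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For these placeholder values the slope is negative, which would be read as the estimated drop in price for each additional year of age.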
Question 3 Multiple Linear Regression model
Your relative or friend now wants to know what other factors may have an influence on price.
To explore this, add Kilometers and Transmission as additional independent variables to the regression model developed in Question 2. Then explore the relationship between these variables by:
1. Calculating the multiple regression equation, multiple correlation coefficient, and coefficient of multiple determination.
2. Using appropriate tests to determine which independent variables make a significant contribution to the regression model.
Hence, determine which independent variables to include in your model.
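The two steps above can be sketched in Python using the normal equations. This is an illustrative cross-check, not a substitute for the Excel regression output. It assumes Transmission is recoded as a dummy variable (1 = automatic, 0 = manual) and that kilometres and prices are expressed in thousands. All data values are made-up placeholders, and the function names `invert` and `multiple_regression` are hypothetical.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def multiple_regression(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Returns (coefficients, t_statistics, r_squared).
    """
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    xtx_inv = invert(xtx)
    coefs = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    residual_variance = sse / (n - k)  # n - k residual degrees of freedom
    t_stats = [c / math.sqrt(residual_variance * xtx_inv[j][j])
               for j, c in enumerate(coefs)]
    return coefs, t_stats, 1 - sse / sst

# Placeholder data (hypothetical): (age in years, thousands of km, automatic? 1/0).
cars = [(1, 15, 1), (2, 22, 0), (3, 48, 1), (4, 51, 1),
        (5, 80, 0), (6, 79, 1), (7, 110, 0), (8, 121, 0)]
prices = [24.5, 21.0, 20.2, 18.9, 14.8, 15.5, 11.0, 10.2]  # thousands of dollars
coefs, t_stats, r_squared = multiple_regression(cars, prices)
for name, c, t in zip(["Intercept", "Age", "Kilometers", "Automatic"], coefs, t_stats):
    print(f"{name:>10}: coefficient = {c:8.3f}, t = {t:6.2f}")
print(f"R^2 = {r_squared:.3f}")
```

Each t statistic would be compared with a critical value from a t-table with n - k degrees of freedom (here k counts the intercept) to judge whether that variable makes a significant contribution. The multiple correlation coefficient is the square root of R^2.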
Task 2 - Part C Written Answer - Letter or Email
For each question present the results of your calculations, with your interpretation and conclusion, as part of a letter or email to your friend or relative.
Use the instructions given on pages four and five of the Part C coversheets.
This should be 500 to 900 words and three to seven pages.
It should be submitted as a Word file with Excel output embedded.
Make sure you:
- Introduce each question and put it in context.
- Answer the questions in non-statistical language.
- Present the result of your procedures, intervals and/or tests without unnecessary statistical jargon.
- Include conclusions which answer the given questions.
In particular, for Question 2:
- Explain the choice of independent and dependent variables.
- Include your graph.
- From your scatter plot discuss any apparent relationship between age and price. Comment on the strength, shape and sign of the relationship.
- Interpret the gradient and vertical intercept of the simple linear regression equation.
- Discuss and interpret the values of the correlation coefficient and coefficient of determination. In particular, are these values consistent with your graph?
- Mention any concerns you may have about the validity of your results due to a non-linear relationship, extreme values, etc.
- Provide an answer on how the value of the car that your friend or relative decides to purchase will depreciate.
In particular, for Question 3:
- Interpret the values of the multiple regression coefficients. Compare these with the corresponding values in the simple linear regression model.
- Discuss and interpret the values of the multiple correlation coefficient and coefficient of multiple determination. In particular, compare these with the corresponding values for the simple linear regression model.
- Include and justify a recommendation on which independent variables to include in your model.