MIS770 - Foundation Skills in Data Analysis Assignment


Foundation Skills in Data Analysis Assignment - Automotive CO2 Emissions Analysis 

Learning Outcomes -

  • Critical thinking: evaluating information using critical and analytical thinking and judgment.
  • Manipulate and summarise data that accurately represents real world problems.
  • Interpret and appraise statistical output to assist in real-world decision making.

Overview -

The purpose of this assignment is to investigate a dataset that will enable you to answer the questions posed in a Memorandum (see Memorandum section below). To answer the memorandum questions, you'll need to analyse the given dataset, interpret the results, and then draw appropriate conclusions. The aims of the assignment are to:

  • Provide you with some examples of the application of data analysis
  • Test your understanding of the material in the relevant topics
  • Test your ability to analyse and interpret your results
  • Test your ability to effectively communicate the results of your analysis to others

Scenario -

You play the role of Mira Hetnal in the Ministry of Transport's Research and Analysis Department and you have been asked to respond to a Memorandum from Selina Wang, the Chief Data Analyst. To assist you in answering Selina's questions, she has provided you with a dataset called Motor_Vehicles.xlsx.

For the purposes of this assignment, the dataset relates to a random sample of new Canadian Motor Vehicles whose CO2 Emissions were tested during 2015.

The specific questions Selina has for Mira are in the following memorandum.

Memorandum

Date: 4th January, 2017

To: Mira Hetnal, Research and Analysis Department

From: Selina Wang, Chief Data Analyst

Subject:  Analysis of Automotive CO2 Emissions Data

Dear Mira,

Can you please carry out an analysis of the recent Automotive CO2 Emissions Data (contained in the file Motor_Vehicles.xlsx) and prepare a Memorandum reply to me containing answers to the following questions.  In your Memorandum, please use plain language as I will provide your reply directly to people who do not necessarily understand statistical jargon.

My specific questions are:

Q1. An Overall View of CO2 Emissions

Can you provide me with an overall summary of the variable CO2 Emissions just by itself?

Q2. Relationships with CO2 Emissions

Does there appear to be any relationship between the CO2 Emissions and the type of Fuel used?

Q3. Confidence Intervals

(a) Can you estimate the level of CO2 Emissions for all 4 Cylinder, 6 Cylinder and 8 Cylinder vehicles? Does there appear to be any difference?

(b) Also, can you estimate the proportion of all vehicles that have 4 Cylinders, 6 Cylinders and 8 Cylinders?

Q4. Hypothesis Tests

Last month a national newspaper published an article stating the Federal Government was investigating a proposal to restrict CO2 Emissions for new vehicles to no more than 350 grams per kilometre. The same article suggested that this would remove at least 5% of the largest polluting vehicles off the road. Are you able to confirm if the sample data we have (i.e. Motor_Vehicles.xlsx) supports this claim?

Q5. Simple Regression

(a) I don't know a lot about vehicle CO2 Emissions, but I would think that the larger an engine was, the greater the CO2 Emissions. Would you therefore create a regression model that shows how well the size of an engine (in litres) explains the variation in CO2 Emissions?

(b) Would you then use your model to predict the CO2 Emissions of a vehicle that had an engine size of 1000 cc (i.e. 1 litre)? Do you have any concerns about this prediction?

Q6. Appropriate Sample Size

Finally, I am concerned that the sample of 1082 cars that we have in this study is far too large and we could easily get the same results with a much smaller sample. Therefore, if we wanted to undertake a future study, what would an appropriate sample size be if we wanted to:

(a) estimate the proportion of vehicles whose CO2 Emissions were no more than 350 grams per kilometre to within 3% with a high level of confidence, and

(b) accurately estimate the overall combined fuel consumption (i.e. Fuel_Both) to within 0.5 L/100 km.

Basically, how many cars would we need to include in next year's survey to satisfy both requirements?

Regards, Selina
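
For Q6, the standard sample-size formulas for a proportion and for a mean can be sketched as below. This is a rough illustration only. It assumes a 95% confidence level, reads "within 3%" as a margin of 3 percentage points, uses the conservative planning value p = 0.5, and uses a made-up placeholder for the standard deviation of Fuel_Both (the real value would come from the sample). The function names are illustrative and are not part of the assignment files.

```python
from math import ceil
from statistics import NormalDist


def z_value(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_proportion(margin, p=0.5, confidence=0.95):
    """Smallest n that estimates a proportion to within +/- margin."""
    z = z_value(confidence)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n that estimates a mean to within +/- margin."""
    z = z_value(confidence)
    return ceil((z * sigma / margin) ** 2)


# Q6(a): proportion to within 3 percentage points, conservative p = 0.5.
n_prop = sample_size_proportion(0.03)

# Q6(b): mean to within 0.5 L/100 km. sigma = 3.0 is a placeholder only;
# replace it with the sample standard deviation of Fuel_Both.
n_mean = sample_size_mean(sigma=3.0, margin=0.5)

# Both requirements are satisfied by the larger of the two sample sizes.
print(max(n_prop, n_mean))
```

Taking the maximum reflects the final sentence of Q6: a single survey has to be large enough for the more demanding of the two requirements.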

Memorandum Requirements

  • Your Memorandum should be no longer than 2000 words. There is no need to include a Table of Contents, Charts and Tables, or Appendices in the Memorandum. The Charts/Graphics and Tables you create are to be placed only in the Data Analysis file, i.e. the Excel spreadsheet
  • Suggested Word formatting for the Memorandum: single-line spacing; no smaller than 10-point font; page margins approx. 25 mm; and good use of white space
  • Your Memorandum must have a cover sheet containing your particulars and Unit details
  • Set out the Memorandum in the same order as in the originating Memorandum from Selina, with each section (question) clearly marked
  • Use plain language and keep your explanations succinct. Avoid the use of technical or statistical jargon. As a guide to the meaning of "Plain Language", imagine you are explaining your findings to a person without any statistical training (e.g. someone who has not studied this unit). What type of language would you use in this case?
  • Marks will be lost if you use unexplained technical terms, irrelevant material, or have poor presentation/organisation
  • All Microsoft Excel output associated with each question in the Memorandum is to be placed in the corresponding tab in the file Motor_Vehicles.xlsx

Data Analysis Instructions/Guidelines

To prepare a reply to Selina's Memorandum, you will need to examine and analyse the dataset Motor_Vehicles.xlsx thoroughly.

Selina has asked several questions and your Data Analysis output (i.e. your charts/tables/graphs) should be structured such that you answer each question on the separate tab/worksheet provided in your Excel document. There are also two extra tabs in Motor_Vehicles.xlsx called CI and HT and you can use the various templates contained in these tabs in your "Confidence Interval" and "Hypothesis" answers.

In order to effectively answer the questions, your Data Analysis output needs to be appropriate. Accordingly, you'll need to establish which of the following techniques are applicable for any given question:

  • Summary Measures (e.g. Descriptive Statistics, including outlier detection)
  • Comparative Summary Measures (i.e. Descriptive Statistics for multiple values of a variable)
  • Suitable tables (such as a Frequency Distribution) and charts or graphics (such as Histograms, Box Plots, Pie Charts, Bar/Column Charts) that illustrate other important features of a variable more clearly
  • Cross Tabulations (sometimes called Contingency Tables), used to establish the relationships (dependencies) between two variables (see Additional Materials under Topic 3 - Creating Cross Tabulations in Excel using Pivot Tables)
  • Confidence Intervals. You can assume that a 95% confidence level is appropriate. We use Confidence Intervals when we have no idea about the population parameter we are investigating. Additionally, we would use Confidence Intervals if we are asked for an estimate. You can use the relevant Excel templates provided in the dataset and copy them to the applicable question tab
  • Hypothesis Tests. You can assume that a 5% level of significance is appropriate. We use Hypothesis Tests when we are testing a Claim, a Theory or a Standard. You can use the relevant Excel templates provided in the dataset and copy them to the applicable question tab
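
The last two techniques in the list can be sketched outside Excel as follows. This is a minimal illustration, not the method used by the CI and HT templates: it uses the normal (z) approximation, which is reasonable only for large samples, whereas the templates may use the t distribution for means. The function names and any numbers passed to them are hypothetical and are not taken from Motor_Vehicles.xlsx.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, confidence=0.95):
    """Approximate confidence interval for a population mean (large n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return centre - half_width, centre + half_width


def proportion_ci(successes, n, confidence=0.95):
    """Approximate confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def proportion_z_test(successes, n, p0):
    """One-sample z test of a proportion against the claimed value p0.

    Returns the test statistic and the upper-tail p-value, i.e. the
    evidence that the true proportion is greater than p0.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


# Illustrative call with invented counts: 50 "successes" in 100 trials.
low, high = proportion_ci(50, 100)
print(f"95% CI for the proportion: {low:.3f} to {high:.3f}")
```

Which tail of the test is relevant depends on how the claim is written as null and alternative hypotheses, so the upper-tail p-value returned here is one choice among several and should be matched to the question being tested.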

