Problem Description:
You are consultants working with an online real estate appraiser, onthehouse.com.au. In order to better calibrate their models to predict housing prices, your supervisor has asked your group to develop a model to appraise the price of homes in a capital city of Australia based on characteristics of the home and the surrounding neighbourhood. In economics, this is commonly called a "Hedonic Regression."
You will use descriptive statistics, inferential statistics and your knowledge of multiple linear regression to complete this task.
Data for 100 single-family homes in an Australian capital city are given in the Excel file Thursday.xlsx. For each home, the file lists the sale price (in $000s; the dependent variable) and several characteristics of the home and its neighbourhood (the independent variables).
Here is a table describing the variables in the data set:
Variable | Definition
-------- | ----------
Price | Price of the sold single-family home, in $000s
Bed | Number of bedrooms in the house
Dis | Distance to the nearest CBD, in kilometres
Floor | Area of the home, in square metres
School | State ranking of the nearby public secondary school; varies from 0 to 100 points
Train | Dummy variable indicating whether a train station is located within 500 metres
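Assuming the regression in Part G uses all five characteristics in the table as regressors, the hedonic model being estimated can be written as follows (the symbols beta and epsilon are notation introduced here, not taken from the data file):

```latex
\text{Price}_i = \beta_0 + \beta_1\,\text{Bed}_i + \beta_2\,\text{Dis}_i + \beta_3\,\text{Floor}_i + \beta_4\,\text{School}_i + \beta_5\,\text{Train}_i + \varepsilon_i, \qquad i = 1, \dots, 100
```

Each slope beta_j is the expected change in Price (in $000s) for a one-unit increase in that variable, holding the other variables constant; beta_5 is the expected price difference between homes with and without a train station within 500 metres.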
Required:
A. Calculate the descriptive statistics from the data and display them in a table. Comment on the central tendency, variability and shape of housing price and of two other variables.
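Excel's Data Analysis > Descriptive Statistics tool produces this table directly. As a cross-check, the sketch below computes a subset of the same measures for one column; the function name `describe` is hypothetical, and the skewness uses the adjusted formula documented for Excel's SKEW function.

```python
import statistics

def describe(values):
    """Central tendency, variability and shape measures for one variable."""
    xs = sorted(values)
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor), like STDEV.S
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3), as in Excel SKEW.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(xs),
        "std_dev": sd,
        "cv": sd / mean,  # coefficient of variation: spread relative to the mean
        "min": xs[0],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "skewness": skew,  # > 0 suggests a right (positive) skew
    }
```

A mean well above the median together with positive skewness points to a right-skewed distribution, which is common for house prices.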
B. Draw a graph that displays the relative share of homes by number of bedrooms in the sample.
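The chart itself (for example a pie or bar chart) is drawn in Excel from a frequency table of the Bed column. This sketch, with a hypothetical function name, computes the relative frequencies such a chart would display.

```python
from collections import Counter

def bedroom_shares(bedrooms):
    """Relative frequency of each bedroom count, ordered by number of bedrooms."""
    counts = Counter(bedrooms)
    n = len(bedrooms)
    return {beds: counts[beds] / n for beds in sorted(counts)}
```

The shares always sum to 1, which is what makes a pie chart a natural choice here.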
C. Create a box-and-whisker plot of home prices and describe the shape of the distribution. Is there evidence of outliers in the data?
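A box-and-whisker plot is built from the five-number summary, and a common rule flags outliers as points more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below assumes that rule and the inclusive quartile method, which should match Excel's QUARTILE.INC; other quartile conventions can shift the fences slightly.

```python
import statistics

def box_plot_summary(values):
    """Five-number summary plus 1.5 x IQR fences for flagging outliers."""
    xs = sorted(values)
    # 'inclusive' interpolates at position (n - 1) * p, as QUARTILE.INC does.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "min": xs[0], "q1": q1, "median": median, "q3": q3, "max": xs[-1],
        "iqr": iqr, "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }
```

A median closer to Q1 than to Q3, or a longer upper whisker, indicates right skew.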
D. What is the probability that a house both sells for more than $580,000 and is more than 8 kilometres from the CBD? Is price statistically independent of distance? Use a contingency table.
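Because Price is recorded in $000s, "over $580,000" corresponds to Price > 580. The sketch below (hypothetical function name, strict inequalities assumed) builds the 2x2 contingency table and compares the joint probability with the product of the marginal probabilities, which is the textbook definition of statistical independence.

```python
def contingency(prices, distances, price_cut=580, dist_cut=8):
    """2x2 table of (Price > price_cut, Dis > dist_cut) counts and related probabilities."""
    n = len(prices)
    table = {(p, d): 0 for p in (True, False) for d in (True, False)}
    for price, dis in zip(prices, distances):
        table[(price > price_cut, dis > dist_cut)] += 1
    p_joint = table[(True, True)] / n
    p_high = (table[(True, True)] + table[(True, False)]) / n  # marginal P(Price > cut)
    p_far = (table[(True, True)] + table[(False, True)]) / n   # marginal P(Dis > cut)
    return {
        "table": table,
        "p_joint": p_joint,
        "p_price_high": p_high,
        "p_far": p_far,
        # Independence requires P(A and B) = P(A) * P(B).
        "independent": abs(p_joint - p_high * p_far) < 1e-9,
    }
```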
E. Estimate the 99% confidence interval for the population mean housing price.
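With the population standard deviation unknown, the interval is the sample mean plus or minus t times s divided by the square root of n. This sketch leaves the critical value as an input, which could come from Excel's =T.INV.2T(0.01, n - 1); the function name is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, t_crit):
    """Two-sided interval x_bar +/- t_crit * s / sqrt(n) for the population mean."""
    n = len(values)
    x_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    margin = t_crit * se
    return x_bar - margin, x_bar + margin
```

For the housing data the result is in $000s, so multiply the bounds by 1,000 to report dollars.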
F. Your supervisor recently stated that the mean housing price is obviously greater than $600,000, which was the average price of housing sold last year. Test this claim at the 1% level of significance.
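Since the supervisor's claim is the alternative, this is a right-tailed test of H0: mu <= 600 against H1: mu > 600 (in $000s). The sketch assumes the critical value is supplied, for example from Excel's =T.INV(0.99, n - 1); the matching p-value would be =T.DIST.RT(t, n - 1). The function name is hypothetical.

```python
import math
import statistics

def upper_tail_t_test(values, mu0, t_crit):
    """Right-tailed one-sample t test; returns (t statistic, reject H0?)."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - mu0) / se
    return t_stat, t_stat > t_crit
```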
G. Run a multiple linear regression using the data and show the output from Excel.
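Excel's Data Analysis > Regression tool produces the required output. To cross-check the coefficients, standard errors and R squared, the sketch below solves the ordinary least squares normal equations (X'X)b = X'y in plain Python. The function names and the row layout (one list of regressor values per home, for example Bed, Dis, Floor, School, Train) are assumptions.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular X'X (perfect multicollinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """OLS with an intercept: coefficients, standard errors, R^2 and residuals."""
    n = len(y)
    x = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    p = len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * v for bj, v in zip(beta, row)) for yi, row in zip(y, x)]
    sse = sum(e * e for e in resid)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s2 = sse / (n - p)  # residual variance estimate
    # Standard error of b_j = sqrt(s2 * [(X'X)^-1]_jj); column j found by solving against e_j.
    se = []
    for j in range(p):
        e_j = [1.0 if i == j else 0.0 for i in range(p)]
        se.append((s2 * solve(xtx, e_j)[j]) ** 0.5)
    return {"coef": beta, "se": se, "r2": 1 - sse / sst, "resid": resid}
```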
H. Is the coefficient estimate for the number of bedrooms statistically different from zero at the 5% level of significance? Set up the correct hypothesis test using the output from Part G, and apply both the critical value and the p-value approach. Interpret this slope coefficient estimate.
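The test is H0: beta_Bed = 0 against H1: beta_Bed != 0, with t = coefficient / standard error and n - k - 1 degrees of freedom (94 here, assuming all five regressors are used). The sketch below takes the coefficient, standard error and P-value from Excel's regression output and a critical value such as =T.INV.2T(0.05, 94); the function name is hypothetical.

```python
def coefficient_t_test(coef, se, t_crit, p_value, alpha=0.05):
    """Two-tailed test of one slope; returns t and both rejection decisions."""
    t_stat = coef / se
    reject_by_critical_value = abs(t_stat) > t_crit
    reject_by_p_value = p_value < alpha
    return t_stat, reject_by_critical_value, reject_by_p_value
```

The two decisions should always agree; a mismatch usually means a wrong critical value or a misread P-value.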
I. Interpret the remaining slope coefficient estimates. Comment on whether their signs are as you would expect.
J. Interpret the value of the adjusted R². Is the overall model statistically significant at the 1% level of significance? Use the p-value approach.
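Adjusted R² penalises R² for the number of regressors k, and the overall F test checks H0: all slopes equal zero. The sketch below recomputes both from R², n and k; in Excel the p-value for the F test is reported as "Significance F" (equivalently =F.DIST.RT(F, k, n - k - 1)), and H0 is rejected at the 1% level if it is below 0.01. The function name is hypothetical.

```python
def overall_fit(r2, n, k):
    """Adjusted R^2 and the overall F statistic for a regression with k regressors."""
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return adj_r2, f_stat
```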
K. Do the results suggest that the data satisfy the assumptions of linear regression: linearity, normality of the errors, and homoscedasticity of the errors? Support your answer with scatter diagrams, normal probability plots and/or histograms, and explain.
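A normal probability plot of the residuals pairs each ordered residual with the standard normal quantile expected at its rank; points lying close to a straight line suggest approximately normal errors. This sketch assumes the plotting positions (i - 0.5) / n, one common convention among several; the function name is hypothetical. For homoscedasticity, the residuals would instead be plotted against the fitted values or each regressor, looking for a constant spread.

```python
from statistics import NormalDist

def normal_probability_points(residuals):
    """(theoretical z quantile, ordered residual) pairs for a normal probability plot."""
    ordered = sorted(residuals)
    n = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))
```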
L. Based on the regression results, is it likely that other factors have influenced housing prices? If so, give a couple of possible examples and indicate whether including them would likely change the regression results.
M. If a community housing organisation asked for information on the characteristics of housing for Aboriginal and Torres Strait Islander households, explain whether cluster sampling within the CBD would provide an accurate representation of these households. (Note: this question does not use the data.)
Attachment: Thursday.xlsx