Create a log file that contains ONLY code and output you want graded; use the "log on" and "log off" commands to exclude extraneous work. At the end, print out the log and format it so your name is shown on the first page and all lines are displayed neatly, e.g., not wrapped around to a 2nd row because the margins are too wide. Appearance and neatness count, and answers must be clearly marked for full credit. Explain answers in writing where appropriate, preferably in a Word document.
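For reference, the logging workflow described above might look roughly like the following in a do-file. This is only a sketch: the file name "ps_log.txt" and the line width of 80 are placeholders, not requirements of the assignment.

```stata
* Sketch only: "ps_log.txt" is a placeholder file name.
set linesize 80                          // cap line width so printed lines are less likely to wrap
log using "ps_log.txt", text replace     // start a plain-text log
describe                                 // work to be graded goes here
log off                                  // pause logging
* ... scratch work that should not appear in the log ...
log on                                   // resume logging
log close                                // finish the log
```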
a) Go to "assessments.milwaukee.gov". Download the sales data for the year 2010 and import the data into Stata. You must show the steps taken in Stata (i.e., the code from the output window).
Hint: This can be done using the "import" command. You may also have to open the file in Excel first and re-save it as a .xls file before loading it into Stata.
Run the "describe" command, and show the output indicating 1,512 observations and 19 variables (with names such as "Address" and "Year_Built").
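As an illustration of the hint, an import from a re-saved Excel file could be sketched as follows. The file name "sales2010.xls" is a placeholder, and the sketch assumes the first row of the sheet holds the variable names.

```stata
* Sketch only: "sales2010.xls" is a placeholder file name, not taken from the assignment.
import excel using "sales2010.xls", firstrow clear   // firstrow: read variable names from row 1
describe                                             // reports the number of observations and variables
```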
b) Look at the distributions of (and/or summary statistics on) the following variables: Fin_sqft, Bdrms, Fbath, Hbath, Lotsize, Sale_price. If you're trying to estimate a hedonic pricing model for residential houses, identify at least 2 distinct and specific reasons to disqualify some of the 1,512 observations from your sample. (4 points)
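One possible way to inspect these variables is sketched below. Which commands to use is a judgment call; the sketch assumes the variable names are spelled exactly as listed above.

```stata
* Sketch only: assumes the variable names match the list in part (b).
summarize Fin_sqft Bdrms Fbath Hbath Lotsize Sale_price, detail   // percentiles, min, max, mean
tabulate Bdrms                                                    // frequency table for a count variable
histogram Sale_price                                              // visual check of the distribution
```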
c) Generate the following variables: log(Fin_sqft), log(Lotsize) and log(Sale_price). Why might it not be a good idea to take the log of Hbath? Also generate a variable equal to the age of the house.
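A minimal sketch of the variable generation follows. The new variable names (lfin_sqft, llotsize, lsale_price, age) are made up for illustration, and the age definition assumes that age is measured as of the 2010 sale year and that Year_Built holds a four-digit year. Note that Stata's ln() returns a missing value for zero or negative arguments.

```stata
* Sketch only: the new variable names and the age definition are assumptions.
generate lfin_sqft   = ln(Fin_sqft)
generate llotsize    = ln(Lotsize)
generate lsale_price = ln(Sale_price)
* Assumes age is measured as of the 2010 sale year.
generate age = 2010 - Year_Built
* Count observations where at least one of the logs is missing
count if missing(lfin_sqft, llotsize, lsale_price)
```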
d) If you're trying to estimate a hedonic pricing model for residential houses, i.e.,
$$\log(\text{SalePrice}) = \beta_0 + \beta_1\,\text{Finsqft} + \beta_2\,\text{Bdrms} + \beta_3\,\text{Fbath} + \beta_4\,\text{Hbath} + \beta_5\log(\text{Lotsize}) + \beta_6\,\text{Age} + u, \tag{1}$$
how many valid observations do you have? Briefly explain why you will have exactly 6 fewer observations to use in estimating model (2) below.
$${} = \beta_0 + \beta_1\,\text{Finsqft} + \beta_2\,\text{Bdrms} + \beta_3\,\text{Fbath} + \beta_4\,\text{Hbath} + \beta_5\log(\text{Lotsize}) + \beta_6\,\text{Age} + u,$$
e) Estimate the model for residential properties. Report the results, interpret the coefficient estimates quantitatively, and discuss their statistical significance. Also identify at least one "unexpected" result and discuss where it may have originated.
f) In real estate, you often hear the cliche "location, location, location". Add indicator variables for the house's aldermanic district to the regression model (2). Perform a test based on this to determine whether "location matters" in the model specification.
Note: To get credit for part (f), state a null and alternative hypothesis and then test them by calculating the appropriate statistic(s) and state whether or not you reject the null.
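As a reminder of the general form, a joint test that a group of added indicator variables is irrelevant is usually set up as an F test of exclusion restrictions. The notation below is an assumption for illustration: $\gamma_1, \dots, \gamma_q$ denote the coefficients on the $q$ district indicators included, $SSR_r$ and $SSR_{ur}$ are the sums of squared residuals from the model without and with the indicators (estimated on the same sample), $n$ is the number of observations, and $k$ is the number of regressors in the unrestricted model.

```latex
H_0:\ \gamma_1 = \gamma_2 = \dots = \gamma_q = 0
\qquad \text{vs.} \qquad
H_1:\ \gamma_j \neq 0 \ \text{for at least one } j

F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \ \sim\ F_{q,\; n-k-1} \quad \text{under } H_0
```

The null is rejected at a given significance level when $F$ exceeds the corresponding critical value of the $F_{q,\,n-k-1}$ distribution.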
g) Perform the Breusch-Pagan test for heteroskedasticity on the model in part (f) and state your conclusion. Note: for this part, post-estimation commands beginning with estat are off limits.
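For orientation, the textbook version of the Breusch-Pagan test can be computed by hand from an auxiliary regression. The symbols here are assumptions for illustration: $\hat{u}$ is the OLS residual from the model being tested, $x_1, \dots, x_k$ are its regressors, and $R^2_{\hat{u}^2}$ is the R-squared from the auxiliary regression.

```latex
\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v,
\qquad
H_0:\ \delta_1 = \dots = \delta_k = 0 \ \ (\text{homoskedasticity})

LM = n \cdot R^2_{\hat{u}^2} \ \overset{a}{\sim}\ \chi^2_k,
\qquad
F = \frac{R^2_{\hat{u}^2}/k}{\bigl(1 - R^2_{\hat{u}^2}\bigr)/(n - k - 1)} \ \sim\ F_{k,\; n-k-1}
```

Either statistic can be used; a large value relative to its reference distribution is evidence against homoskedasticity.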