Assignment
Instructions: Please print out and complete the following assignment writing your answers clearly and showing your work directly on the assignment. Please keep a log of your work in STATA and print out and attach all of your results. Use a highlighter to highlight all of your commands in STATA (this will make it easier for the graders to see your work). Follow directions carefully (underlining or circling where indicated in your STATA output).
Capital Bikeshare is a service in the Washington, D.C. area that lets individuals rent bicycles for short trips: a rider picks up a bike at "Point A" and returns it at "Point B." Using data collected from this service, we want to examine the relationship between the number of trips made per day and the daily low temperature. The dataset covers a single month (January); the column "date" identifies each day of the month.
a. Run the following regression in STATA:
trips = β0 + β1*lowtemp + ε
b. Report the coefficient and p-value on lowtemp. Is this statistically significant?
c. Using STATA, plot the residuals (attach to this assignment). What general conclusions might you draw from looking at this plot?
d. Perform a Durbin-Watson test at the .05 level to test for positive serial correlation. Be sure to report the lower and upper critical values and the test statistic. What is your decision, and what does it mean?
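As a cross-check on the Stata output for part d, the Durbin-Watson statistic can be computed by hand from the time-ordered residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below is in Python rather than Stata, uses made-up residuals (not the bike-share data), and calls the decision helper with placeholder critical values; the real lower and upper limits must be looked up in a Durbin-Watson table for your sample size and number of regressors.

```python
def durbin_watson(residuals):
    """Return d = sum((e_t - e_{t-1})^2) / sum(e_t^2) for time-ordered residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def dw_decision(d, d_lower, d_upper):
    """One-sided test of H0: no positive serial correlation."""
    if d < d_lower:
        return "reject H0"          # evidence of positive serial correlation
    if d > d_upper:
        return "do not reject H0"   # no evidence of positive serial correlation
    return "inconclusive"           # d falls between the two limits


# Hypothetical residuals for illustration only, NOT the assignment data.
example_residuals = [1.0, 2.0, 1.5, 0.5, -1.0, -2.0, -1.0, 0.5]
d = durbin_watson(example_residuals)

# Placeholder limits; replace with the tabulated values for your n and k.
print(d, dw_decision(d, d_lower=1.0, d_upper=1.5))
```

A value of d near 2 suggests no first-order serial correlation, while values well below 2 point toward positive serial correlation.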
Suppose we have the following data on the income and consumption of non-self-employed homeowners (data from Ando & Modigliani, "The Permanent Income and Life Cycle Hypotheses of Saving Behavior: Comparisons and Tests"):
Income Bracket ($) | Average Income ($) | Average Consumption ($)
0-999              | 556                | 2760
1000-1999          | 1622               | 1930
2000-2999          | 2664               | 2740
3000-3999          | 3587               | 3515
4000-4999          | 4535               | 4350
5000-5999          | 5538               | 5320
6000-7499          | 6585               | 6250
7500-9999          | 8582               | 7460
10000-above        | 14033              | 11500
a. Run a regression to explain average consumption as a function of average income. Show your work in STATA and report your results below:
               | Coefficient | Std. Error | t | P > |t|
Average income |             |            |   |
b. Use the Park test to test the residuals from the equation you ran in part a for heteroscedasticity, using average income as the potential proportionality factor Z (use α = .05). Report the coefficient and p-value on the proportionality factor. What do you conclude?
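The Park test in part b amounts to two regressions: first fit the original equation and save the residuals e, then regress ln(e^2) on ln(Z) and check whether the slope on ln(Z) is significantly different from zero. The following is a minimal sketch of those steps in Python rather than Stata; the helper names and the demo data are hypothetical and are not the income and consumption figures from the table.

```python
import math


def ols_simple(x, y):
    """Fit y = b0 + b1*x by OLS; return (b0, b1, se_b1, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # error variance estimate
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, se_b1, residuals


def park_test(x, y, z):
    """Regress ln(e^2) on ln(Z); return (slope on ln Z, its t-statistic).

    Assumes every z is positive and no residual is exactly zero.
    """
    _, _, _, residuals = ols_simple(x, y)
    ln_e2 = [math.log(e ** 2) for e in residuals]
    ln_z = [math.log(zi) for zi in z]
    _, slope, se_slope, _ = ols_simple(ln_z, ln_e2)
    return slope, slope / se_slope


# Hypothetical data for illustration only; here Z is the regressor itself.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_demo = [2.1, 3.8, 6.5, 7.4, 10.9, 11.2, 15.0, 14.9]
slope, t_stat = park_test(x_demo, y_demo, x_demo)
print(slope, t_stat)
```

The t-statistic has n - 2 degrees of freedom; compare it with the two-sided critical value at the chosen significance level, or read the p-value from the Stata output of the second regression.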
c. Run a White test at the 5-percent level to test for heteroscedasticity. Report the test statistic and critical value, then provide your conclusion.
d. Formulate the White test regression that is used and write it down here. Then run the regression in STATA. Use the output to show how STATA calculates the White test statistic (write out the numbers).
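For reference, when the original equation has a single explanatory variable there is no cross-product term, so one common way to write the White auxiliary regression and its test statistic is sketched below. The notation is an assumption, not taken from the assignment: e is the residual from part a, X is average income, u is the auxiliary error, n is the number of observations, and R^2 is taken from the auxiliary regression.

```latex
e_i^2 = \alpha_0 + \alpha_1 X_i + \alpha_2 X_i^2 + u_i,
\qquad
nR^2 \;\overset{a}{\sim}\; \chi^2_{2}
\quad \text{under the null of homoskedasticity}
```

The degrees of freedom equal the number of slope coefficients in the auxiliary regression, which is two in this single-regressor case.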
e. In this example, the ranges of the income brackets are not constant, so the variables are means of ranges of differing widths. Therefore, it would seem reasonable to think that different range widths might produce different variances for the error term, making heteroscedasticity even more likely.
Re-run the regression from part a using heteroskedasticity-corrected standard errors. Report your results below:
               | Coefficient | Std. Error | t | P > |t|
Average income |             |            |   |
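Heteroskedasticity-corrected standard errors leave the coefficient estimate unchanged and only replace the standard error, so the t-statistic and p-value are what move between part a and part e. The sketch below, in Python rather than Stata, computes the slope with both the conventional standard error and an HC1-style robust standard error (the White estimator scaled by n/(n - 2), which is my understanding of what Stata's vce(robust) option reports for a one-regressor model). The function name and data are hypothetical.

```python
import math


def slope_with_robust_se(x, y):
    """Fit y = b0 + b1*x; return (b1, conventional SE, HC1 robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Conventional SE assumes one common error variance for all observations.
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)

    # Robust SE weights each squared residual by its own squared deviation in x.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, residuals))
    se_robust = math.sqrt((n / (n - 2)) * meat / sxx ** 2)
    return b1, se_conventional, se_robust


# Hypothetical data for illustration only, NOT the table above.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.0, 4.1, 5.9, 8.0]
b1, se_conv, se_rob = slope_with_robust_se(x_demo, y_demo)
print(b1, se_conv, se_rob)
```

The robust standard error can be larger or smaller than the conventional one, depending on how the squared residuals line up with the deviations of x from its mean.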
You are looking at data on the determinants of housing prices. An example dataset can be found in the Excel spreadsheet House_price.xls, which contains the following variables:
price = house price, $1000s
lotsize = size of lot in square feet
sqrft = size of house in square feet
We want "price" to be our dependent variable and the other variables to be our independent variables; however, we believe that heteroskedasticity may be a concern because of the large range in housing prices. Why would taking the natural log of our variables (creating a double log form) be a possible solution to our problem? Explain. [Hint: If you are not sure, try taking the natural log of the price column to see how it changes the values].
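The hint can be tried outside the spreadsheet as well. The short Python sketch below uses made-up prices (not the values in House_price.xls) to show two things the natural log does: it compresses the spread between the smallest and largest values, and it turns equal percentage changes into equal absolute changes regardless of the price level.

```python
import math

# Hypothetical house prices in $1000s, for illustration only.
prices = [85.0, 150.0, 300.0, 725.0]
log_prices = [math.log(p) for p in prices]


def spread(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


# The raw spread is in the hundreds; the log spread is only a few units.
print(spread(prices), spread(log_prices))

# A 10 percent price increase adds the same amount to ln(price) at every level.
log_increments = [math.log(1.10 * p) - math.log(p) for p in prices]
print(log_increments)
```

Because errors that scale with the level of price become roughly constant-sized in logs, the double-log form often reduces the variation in error variance across observations.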