Assignment:
Q1 Which of the following is true about the probability values (i.e., the P>|t| column) on a Stata printout, with regard to the estimated coefficients?
1. They tell you the significance level at which you can just reject the null hypothesis that the coefficient has no effect (assuming a two-sided test), which is the same as 1 minus the level of confidence with which you can reject the null hypothesis.
2. They tell you the probability that the coefficients understate the true value.
3. They tell you the importance of each variable on the dependent variable.
4. None of the above
Q2 Suppose the following is the true population model of the effect of exercise on weight in pounds: weight = β0 + β1 minutes of exercise + β2 height + β3 Female + ε. The average person weighs 170 lbs, exercises 10 minutes per day, and is 65 inches tall. If your estimate of β1 is -1.5, how do you interpret the estimate of β1?
1. Holding height and gender constant, if one person exercises ten more minutes per day as compared to another person, the first person would be expected to weigh 15 fewer pounds as compared to the second person.
2. Holding height and gender constant, if one person exercises ten more minutes per day as compared to another person, the first person would be expected to weigh 1.5 fewer pounds as compared to the second person.
3. Holding height and gender constant, if one person exercises ten more minutes per day as compared to another person, the first person would be expected to weigh 1.5 more pounds as compared to the second person.
4. Holding height and gender constant, if one person exercises ten more minutes per day as compared to another person, the first person would be expected to weigh 15 more pounds as compared to the second person.
Q3 Let's say an economist ran a regression of wage ($/hr) on education (in years) and experience (in years) and got the following sample regression function: wagehat = 10.50 + 0.5 ed + 0.75 exp. For someone who has 12 years of education and has worked for 10 years, what would we predict their wage to be?
1. $10.50
2. $24.00
3. $16.50
4. None of the above.
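As a minimal sketch of how a sample regression function is used for prediction, the snippet below plugs values into the equation from Q3. The function name `predict_wage` and the example worker (16 years of education, 5 years of experience) are illustrative assumptions, deliberately different from the person described in the question.

```python
def predict_wage(ed_years: float, exp_years: float) -> float:
    """Predicted hourly wage from wagehat = 10.50 + 0.5*ed + 0.75*exp."""
    return 10.50 + 0.5 * ed_years + 0.75 * exp_years


# Hypothetical worker (not the one in the question):
# 16 years of education and 5 years of experience.
example_wage = predict_wage(16, 5)
print(f"Predicted wage: ${example_wage:.2f} per hour")
```

The same function can be called with any education and experience values to check a prediction by hand.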
Q4 Let's say you regress sales_price on gross_square_feet using only two observed houses (that is, you have only two observations in your data set). What will be the value of R-squared?
1. 1
2. 0
3. .5
4. Not enough information to answer the question.
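For readers who want to experiment, here is a rough sketch of how R-squared is computed for a one-regressor OLS fit, written in plain Python rather than Stata. The helper names and the four data points are made up for illustration; replacing them with any two distinct observations lets you explore the case the question asks about.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y):
    """R-squared = 1 - (sum of squared residuals) / (total sum of squares)."""
    intercept, slope = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_resid / ss_total


# Hypothetical houses: gross square feet and sale price (not from the quiz data).
sqft = [1000.0, 1500.0, 2000.0, 2500.0]
price = [300000.0, 380000.0, 520000.0, 560000.0]
r2 = r_squared(sqft, price)
print(round(r2, 4))
```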
Q5 If you haven't already done so, using the "QuizIIDataset.csv" data file, create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Also, if you haven't already done so, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB=sqrt((69.1691 *( latitude-40.748441))^2+ (52.5179*( longitude- -73.985664))^2), where the given lat/long coordinates are for the Empire State Building. Next, run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and DistESB. You must use robust standard errors, which correct your standard errors for heteroskedasticity. Based on the regression output, which of the following is true?
1. The coefficient estimate for gross_square_feet is about 6.7 times greater than the standard error (in absolute value).
2. The coefficient estimate for gross_square_feet is small, therefore, we conclude it has no effect on housing prices.
3. We cannot reject the null hypothesis that gross_square_feet does not affect housing prices.
4. The coefficient estimate for gross_square_feet is about 6.3 times greater than the standard error (in absolute value).
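The two derived variables used in this and later questions can be sketched as follows. This is a plain-Python illustration of the formulas as stated, not the Stata commands themselves; the function names are assumptions, and lnPPSF is read here as the natural log of price per square foot, log(sale_price / gross_square_feet).

```python
import math

# Coordinates of the Empire State Building, as given in the question.
ESB_LAT = 40.748441
ESB_LON = -73.985664


def dist_esb(latitude: float, longitude: float) -> float:
    """Approximate distance in miles to the Empire State Building."""
    north_south = 69.1691 * (latitude - ESB_LAT)   # miles per degree of latitude
    east_west = 52.5179 * (longitude - ESB_LON)    # miles per degree of longitude
    return math.sqrt(north_south ** 2 + east_west ** 2)


def ln_ppsf(sale_price: float, gross_square_feet: float) -> float:
    """Natural log of price per square foot (assumed reading of lnPPSF)."""
    return math.log(sale_price / gross_square_feet)


# Hypothetical property, not taken from the quiz data set.
print(dist_esb(40.70, -73.95))
print(ln_ppsf(650000.0, 1300.0))
```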
Q6 On a Stata printout, the standard errors (Std. Error) column tells you which of the following:
1. The estimated standard deviations of the population coefficients.
2. The estimated standard deviation of the residuals.
3. The estimated probability that we can just reject the null hypothesis for each coefficient.
4. The estimated standard deviations for the estimated coefficients.
Q7 The following table is a regression of the price of a house (in thousands of dollars) on the number of bedrooms, the size of the lot (in square feet), and the square footage of the house itself.
Dependent Variable: PRICE
Method: Least Squares
Included observations: 88
Variable | Coefficient | Std. Error | t-Statistic | Prob.
C | -21.77031 | 29.47504 | -0.738601 | 0.4622
BEDROOMS | 13.85252 | 9.010145 | 1.537436 | 0.1279
LOTSIZE | 0.002068 | 0.000642 | 3.220096 | 0.0018
SQRFT | 0.122778 | 0.013237 | 9.275093 | 0.0000
R-squared | 0.672362 | Mean dependent var | 293.5460
Adjusted R-squared | 0.660661 | S.D. dependent var | 102.7134
S.E. of regression | 59.83348 | Akaike info criterion | 11.06540
Sum squared resid | 300723.8 | Schwarz criterion | 11.17800
Based on the above regression, which of the following is true?
1. The level of significance for which we can just reject the null that the number of bedrooms has no effect on the price is 0.1279.
2. We can reject the null hypothesis that the number of bedrooms does not affect the price with 99% confidence.
3. The number of bedrooms definitely determines the price of the house.
4. We reject the null hypothesis of a zero coefficient on the bedroom variable with a 10% level of significance.
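The t-Statistic column in the table is the coefficient divided by its standard error. The short sketch below recomputes it from the printed Coefficient and Std. Error values; the dictionary layout and variable names are illustrative assumptions, and small differences from the printed t-statistics can arise because the table shows rounded figures.

```python
# (coefficient, standard error) pairs copied from the regression table.
rows = {
    "C": (-21.77031, 29.47504),
    "BEDROOMS": (13.85252, 9.010145),
    "LOTSIZE": (0.002068, 0.000642),
    "SQRFT": (0.122778, 0.013237),
}

# t-statistic for H0: coefficient = 0 is coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in rows.items()}

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:.4f}")
```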
Q8 For this question, you need to figure out how to merge the "QuizIIDatasetSandyAddon.csv" data set with the "QuizIIDataset.csv" data file (hint: use nycid as the key and merge one to one). After you merge the two data sets, create (if you haven't done so already) a new variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and surgeheight, where the surgeheight variable tells you how many feet the Sandy storm surge rose at each house in the sample. After you run the regression, answer the following question.
Based on the regression, we can conclude which of the following:
1. A 10% rise in the storm surge led to about a 2.4% rise in housing prices all else equal.
2. A one foot rise in the storm surge led to about a 2.4% drop in housing prices, all else equal.
3. The storm surge had no statistically significant effect on housing prices.
4. None of the above.
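To illustrate what a one-to-one merge on nycid means, here is a small plain-Python sketch. It is not the Stata merge command: the function name, the tiny in-memory rows, and the choice to keep only matched rows are all assumptions made for the example.

```python
def merge_one_to_one(master, addon, key="nycid"):
    """Join two lists of dict rows on `key`; the key must be unique in each list."""

    def index_by_key(rows):
        indexed = {}
        for row in rows:
            if row[key] in indexed:
                # A repeated key means the merge is not one-to-one.
                raise ValueError(f"duplicate {key}: {row[key]}")
            indexed[row[key]] = row
        return indexed

    master_idx = index_by_key(master)
    addon_idx = index_by_key(addon)
    # Keep only rows whose key appears in both data sets (an assumption here).
    return [{**master_idx[k], **addon_idx[k]} for k in master_idx if k in addon_idx]


# Hypothetical rows standing in for the two CSV files.
sales = [
    {"nycid": 1, "sale_price": 500000.0, "gross_square_feet": 1000.0},
    {"nycid": 2, "sale_price": 720000.0, "gross_square_feet": 1800.0},
]
sandy = [
    {"nycid": 2, "surgeheight": 3.5},
    {"nycid": 1, "surgeheight": 0.0},
]
merged = merge_one_to_one(sales, sandy)
print(merged)
```

Raising an error on a duplicate key mirrors the point of a one-to-one merge: each nycid should identify exactly one row in each file.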
Q9 In Stata, import the "QuizIIDataset.csv" data file. Create a new variable called lnPPSF, which is equal to the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, and gross_square_feet. The year variable is the year the sale took place; year_built is the year the house was constructed.
Based on the Stata printout, which of the following is true:
1. On average, prices increased about 13% each year.
2. On average, prices increased about $13,000 each year.
3. We cannot reject the null hypothesis of no price increase over the period.
4. According to the regression, prices fell during the period.
Based on the regression table in Q7, how do we interpret the coefficient for BEDROOMS?
1. For each additional bedroom, the housing price goes up by $13.85.
2. For each additional bedroom, the housing price goes up by 13.85%
3. For each additional bedroom, housing price goes up by $13,852.
4. None of the above.
Q10 If you haven't already done so, using the "QuizIIDataset.csv" data file, create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Also, if you haven't already done so, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB=sqrt((69.1691 *( latitude-40.748441))^2+ (52.5179*( longitude- -73.985664))^2), where the given lat/long coordinates are for the Empire State Building.
Run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and DistESB, and answer the following question.
According to the Stata printout:
1. There is no effect of distance from the Empire State Building on housing prices.
2. Each additional mile further away from the Empire State Building shows that housing prices increase, on average, by about 4.4%.
3. Each additional mile away from the Empire State Building shows that housing prices decrease, on average, by about 4.4%.
4. Each additional mile away from the Empire State Building shows that housing prices decrease, on average, by about 0.044%.
5. Each additional mile away from the Empire State Building shows that housing prices decrease, on average, by about $0.044.
Q11 Let's say you run a regression of sales_price on gross_square_feet, and you generate the residuals from this regression. Then let's say you regress gross_square_feet on the residuals from the first regression. What would the value of R-squared be?
There is no way to determine this information.
Which of the following is not one of the classical/standard OLS assumptions?
1. The expected value of the error term is equal to zero
2. The variance of the errors is a function of the size of the exogenous variables.
3. No two of the explanatory variables are perfect linear combinations of each other.
4. The dependent variable is determined by a linear function of the population parameters.
Q12 Using the "QuizIIDataset.csv" data file, if you haven't already done so, create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Next, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB=sqrt((69.1691 *( latitude-40.748441))^2+ (52.5179*( longitude- -73.985664))^2), where the given lat/long coordinates are for the Empire State Building. After you do that, answer the following question.
What is the correlation coefficient between the lnPPSF and DistESB?
1. -.0234751
2. 0.0778
3. 0.0010
4. -0.0778
5. None of the above.
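For reference, the correlation coefficient asked about here is the Pearson correlation. A minimal sketch of the calculation in plain Python follows; the function name and the four data points are invented for illustration and will not reproduce the value from the quiz data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values of DistESB (miles) and lnPPSF, not from the quiz data.
dist = [0.5, 2.0, 4.0, 8.0]
lnppsf = [6.9, 6.5, 6.2, 5.8]
r = pearson_r(dist, lnppsf)
print(round(r, 4))
```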
Q13 If you have not already done it, for this question, you need to figure out how to merge the "QuizIIDatasetSandyAddon.csv" data set with the "QuizIIDataset.csv" data file (hint: use nycid as the key and merge one to one). After you merge the two data sets, create (if you have not done so already) a new variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and surgeheight, where the surgeheight variable tells you how many feet the Sandy storm surge rose at each house in the sample.
After you run a regression, test to see if there is heteroskedasticity with regard to the residuals and then choose the correct answer below.
1. We cannot reject the null of no heteroskedasticity.
2. We can reject the null hypothesis of no heteroskedasticity with greater than 99% confidence.
3. The evidence suggests that we need not concern ourselves with heteroskedasticity.
4. We can reject the null hypothesis of no heteroskedasticity with just 70% confidence.
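The idea behind a heteroskedasticity test can be sketched as follows: regress the squared residuals on the explanatory variable and see whether the fit explains anything. The code below is a simplified illustration under stated assumptions, not Stata's implementation: it uses a single regressor, the Koenker (n times R-squared) form of the Breusch-Pagan statistic, and invented data, so its numbers will not match the quiz regression.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R-squared of the one-regressor least-squares fit of y on x."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_resid / ss_total


def koenker_bp_test(x, y):
    """LM statistic n*R^2 from regressing squared residuals on x, with its p-value."""
    intercept, slope = fit_line(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    # Guard against a tiny negative value caused by floating-point rounding.
    lm_stat = max(0.0, len(x) * r_squared(x, squared_resid))
    # Chi-square survival function with 1 degree of freedom (one regressor).
    p_value = math.erfc(math.sqrt(lm_stat / 2.0))
    return lm_stat, p_value


# Hypothetical data whose error spread grows with x.
x_vals = [float(i) for i in range(1, 11)]
y_vals = [2.0 * xi + (0.3 * xi if i % 2 == 0 else -0.3 * xi)
          for i, xi in enumerate(x_vals)]
lm, p = koenker_bp_test(x_vals, y_vals)
print(f"LM = {lm:.3f}, p-value = {p:.4f}")
```

A small p-value is evidence against the null hypothesis of constant error variance; with several regressors the degrees of freedom and the p-value formula would differ from this one-regressor sketch.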