Read each question carefully before drafting an answer. When asked to perform a calculation, show all work to receive partial or full credit.
PART I
Using data7-24, you wish to explore whether the sale price of houses is affected by location. Specifically, what (if any) is the impact of the house being in the city of Coto de Caza? Clearly articulate any transformations of the variables you find useful, the testable hypothesis, any econometric tests, and your conclusions.
a. Specify and estimate a bivariate model of your choosing. Make sure to motivate why you have specified the model and present the results of the model.
b. Specify and estimate a multivariate model of your choosing. Explain why you have specified the model the way you have. Motivate the multivariate analysis (why might one want to use multivariate analysis?) and compare and contrast the results with the bivariate analysis.
c. Are there any further tests you should conduct or alterations you should make to the analysis? Why? Conduct the test(s) and make any necessary corrections to your model. Is there any impact on your conclusions?
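A minimal sketch of the workflow Part I asks for, run on simulated data because the actual column names in data7-24 are not given here. The names `price`, `sqft`, and `coto` are hypothetical stand-ins, and the log transformation is only one possible choice. The Breusch-Pagan statistic (Koenker's n times R-squared form) is computed by hand so that the example needs only base R.

```r
# Simulated stand-in for data7-24; all column names are hypothetical.
set.seed(1)
n <- 200
homes <- data.frame(
  sqft = runif(n, 1000, 4000),
  coto = rbinom(n, 1, 0.3)   # 1 if the house is in Coto de Caza, else 0
)
homes$price <- exp(11 + 0.0004 * homes$sqft + 0.15 * homes$coto +
                   rnorm(n, sd = 0.2))

# (a) Bivariate model: log price on the location dummy only.
m_bi <- lm(log(price) ~ coto, data = homes)

# (b) Multivariate model: add a control that may be correlated with location.
m_multi <- lm(log(price) ~ coto + sqft, data = homes)

# (c) Breusch-Pagan (Koenker) test for heteroskedasticity, by hand:
# regress squared residuals on the regressors; LM = n * R^2 ~ chi-square(2).
homes$e2 <- resid(m_multi)^2
aux <- lm(e2 ~ coto + sqft, data = homes)
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 2, lower.tail = FALSE)

print(coef(summary(m_bi)))
print(coef(summary(m_multi)))
cat("BP statistic:", bp_stat, " p-value:", bp_pval, "\n")
```

Comparing the `coto` coefficient across the two fits shows how much of the raw location gap is accounted for by the added control. A small Breusch-Pagan p-value would suggest reporting heteroskedasticity-robust standard errors.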
PART II
The dataset, Derby, contains winning times for the Kentucky Derby since its inception.
According to a cursory review of the literature, winning time can be, for the most part, explained by the high temperature on race day, whether or not it has rained in the past 24 hours, and whether or not the winning horse is a male. Also, based upon your research, you know that females and geldings have won the race in 1876, 1882, 1888, 1914, 1915, 1918, 1980, 1988, 1920, 1929, 2003, and 2009.
(You might want to use the function ifelse for this. Remember that & means and, and | means or. Also, you can press Enter after &, |, or == and R knows that you are continuing the same line of code on the next line. For example:
data7_24$bdrms34<-ifelse(data7_24$bedrms==3|
data7_24$bedrms==4, 1, 0)
Returns the value of 1 if there are 3 or 4 bedrooms, and 0 otherwise.)
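For a longer list of values, `%in%` is a compact alternative to chaining `==` with `|`. The sketch below flags the years listed above. The data frame `derby` and its column `year` are hypothetical names, since the actual column names in the Derby dataset are not shown here.

```r
# Hypothetical stand-in for the Derby data: one row per race year.
derby <- data.frame(year = 1875:2014)

# Years in which a female or gelding won, as listed in the question.
not_male_years <- c(1876, 1882, 1888, 1914, 1915, 1918, 1920, 1929,
                    1980, 1988, 2003, 2009)

# male = 0 in the listed years, 1 otherwise.
derby$male <- ifelse(derby$year %in% not_male_years, 0, 1)

# Count of races in each group.
print(table(derby$male))
```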
a. Build an appropriate model and test whether there is empirical evidence to support the contention that the average winning time for male horses differs from that for females and geldings.
b. Examine the stationarity of the winning time series. What can you conclude regarding the stationarity of the series?
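One common way to examine stationarity is a Dickey-Fuller regression. The sketch below runs it by hand in base R on a simulated random walk; `y` is a hypothetical stand-in for the winning-time column. The 5% critical value of about -2.86 applies to the version with an intercept and no trend in large samples. Packaged tests (for example `adf.test` in the tseries package) also add lagged differences and a trend term, so their statistics and critical values will differ.

```r
# Simulated random walk as a stand-in for the winning-time series.
set.seed(42)
y <- cumsum(rnorm(140))

# Dickey-Fuller regression: diff(y)_t = a + rho * y_(t-1) + e_t.
dy    <- diff(y)
y_lag <- head(y, -1)
df_reg <- lm(dy ~ y_lag)

# t-statistic on the lagged level; H0: rho = 0 (unit root).
df_stat <- coef(summary(df_reg))["y_lag", "t value"]

# Approximate large-sample 5% critical value (intercept, no trend).
crit_5 <- -2.86
cat("DF statistic:", df_stat, "\n")
cat("Reject unit root at 5%:", df_stat < crit_5, "\n")
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t table, because under the null the t-ratio does not follow a t distribution.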
For the following questions, examine the period 1945 to 2014 only.
To do this you will need to take a subset of your data using the subset function in R.
The format for this is:
Newdata<-subset(olddata, parameters)
Ex. Newdata<-subset(data7_24, bedrooms<=3)
Would return a subset of the data in which all homes had 3 or fewer bedrooms.
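Applied to the Derby question, the same pattern might look like the sketch below. The column names `year` and `time` (winning time in seconds) are assumptions, and the data are simulated only so that the example runs on its own.

```r
# Hypothetical stand-in: `year` and `time` are assumed column names.
set.seed(7)
derby <- data.frame(year = 1875:2014)
derby$time <- 122 + rnorm(nrow(derby))

# Keep only the races from 1945 through 2014.
derby_post <- subset(derby, year >= 1945 & year <= 2014)

print(range(derby_post$year))
print(mean(derby_post$time))   # average winning time in the subset
print(sd(derby_post$time))     # sample standard deviation
```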
In a recent article, "Thoroughbreds are running as fast as they can," the author states: "Since 1945, the time it takes thoroughbreds to run around the 1.25-mile track has averaged 2:02.25, and no winning race has deviated by more than 3 seconds from this long-term average."
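To check the quoted claim numerically, the stated average has to be on the same scale as the data. Assuming winning times are stored in seconds (an assumption; the Derby file may use another unit), 2:02.25 is 2 × 60 + 2.25 = 122.25 seconds, and the claim says every winning time lies within 3 seconds of that value. The vector `times` below is made up for illustration.

```r
# Convert 2:02.25 (minutes:seconds) to seconds.
claimed_mean <- 2 * 60 + 2.25

# Made-up winning times in seconds, for illustration only.
times <- c(121.4, 122.0, 119.4, 124.2, 122.6)

# Largest absolute deviation from the claimed long-run average.
max_dev <- max(abs(times - claimed_mean))

# TRUE if no race deviates by more than 3 seconds.
claim_holds <- max_dev <= 3
cat("Max deviation:", max_dev, " claim holds:", claim_holds, "\n")
```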
c. What is the average winning time since 1945? What is the standard deviation?
d. Do you find support for the author's above conclusion?
e. Is there evidence to support the claim that the winning time series is stationary after 1945? Is it necessary to transform the series?
f. Transform the series as necessary, then examine the hypothesis that, when controlling for other variables, rain on race day slows the winning time.
g. What can you conclude about the impact of rain on the winning time?
h. Another argument is that it is not the high temperature that affects winning time, but the difference between high and low temperatures on race day. Is there empirical evidence to support this argument?
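The rain hypothesis in (f) is one-sided (rain slows the race, so its coefficient on winning time would be positive), while `summary()` reports two-sided p-values. Below is a sketch of the one-sided calculation and of the temperature-spread variable from (h), on simulated data. The names `time`, `rain`, `high`, and `low` are hypothetical, and no differencing is applied because the simulated series is stationary by construction.

```r
# Simulated data; all column names are hypothetical.
set.seed(3)
n <- 70
d <- data.frame(
  rain = rbinom(n, 1, 0.4),   # 1 if it rained, else 0
  high = rnorm(n, 72, 8)      # high temperature on race day
)
d$low  <- d$high - runif(n, 10, 25)
d$time <- 122 + 0.8 * d$rain - 0.02 * d$high + rnorm(n, sd = 0.7)

# Temperature spread for (h): high minus low.
d$spread <- d$high - d$low

fit <- lm(time ~ rain + high + spread, data = d)

# One-sided test, H0: beta_rain <= 0 against H1: beta_rain > 0.
t_rain <- coef(summary(fit))["rain", "t value"]
p_one  <- pt(t_rain, df = df.residual(fit), lower.tail = FALSE)
cat("t =", t_rain, " one-sided p =", p_one, "\n")
```

Including both `high` and `spread` lets the two temperature explanations compete in one regression; whether that is the right specification for the real data is a modelling choice.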
A final argument is that winning times are fairly stable across time, that is, if you know the winning time in 2013, you can impute the winning time in 2014.
i. Using your model in Part II (f) as a starting point, expand the model to include the first lag of winning time as an explanatory variable.
j. What can you now conclude about the impact of rain on the winning time?
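Adding the first lag requires shifting the series by one period. A minimal base R sketch, assuming the rows are sorted by year with no gaps; `time` and `rain` are hypothetical column names and the data are simulated.

```r
# Simulated data; column names are hypothetical.
set.seed(11)
n <- 70
d <- data.frame(year = 1945:2014, rain = rbinom(n, 1, 0.4))
d$time <- 122 + 0.8 * d$rain + rnorm(n, sd = 0.7)

# First lag: the previous year's winning time (NA for the first row).
d$time_lag1 <- c(NA, head(d$time, -1))

# lm() drops the row with the missing lag under the default na.action.
fit_lag <- lm(time ~ time_lag1 + rain, data = d)
print(coef(summary(fit_lag)))
```

With a lagged dependent variable in the model, the `rain` coefficient measures the immediate effect only, so it is not directly comparable to the coefficient from the model without the lag.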
PART III
Consider a model for new capital investment in a particular industry, where the cross-section observations are at the county level and there are $T$ years of data for each county. The variable $\mathit{tax}_{it}$ is the measure of the marginal tax rate on capital in county $i$ at time $t$, and $\mathit{disaster}_{it}$ is a dummy variable equal to one if there is a significant natural disaster in county $i$ at time $t$ (for example, a major flood or hurricane). The variables $x_{jit}$ are other factors affecting capital investment, and $a_i$ and $c_t$ represent the unobservable individual-specific and time-specific effects.
$$\log(\mathit{invest}_{it}) = \beta_1 \mathit{tax}_{it} + \beta_2 \mathit{disaster}_{it} + \theta_t + \sum_{j=1}^{k} \gamma_j x_{jit} + v_{it}$$
$$v_{it} = a_i + c_t + u_{it}$$
a. Why is allowing for individual and time specific effects in the above equation important?
b. Briefly explain what kinds of effects might be captured in $c_t$.
c. Briefly explain what kinds of effects might be captured in $a_i$.
d. Using the nc_crime data set, you want to examine whether tax revenue per capita influences the crime rate, measured as crimes committed per person. Estimate the two-way fixed effects model in logarithms, controlling for the probability of arrest, the probability of conviction, the probability of receiving a prison sentence, police per capita, and population density. What is the impact of a 1% change in tax revenue per capita on the crime rate?
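A two-way fixed effects model can be estimated in base R by adding factor dummies for the unit and the period (the least-squares dummy variable approach), and in a log-log specification the slope is an elasticity. The sketch below uses a simulated panel. The names `county`, `year`, `crime`, and `taxpc` are hypothetical and need not match the columns in nc_crime, and the other controls listed in (d) are omitted for brevity. The plm package offers the same estimator through `plm(..., effect = "twoways", model = "within")`.

```r
# Simulated balanced panel; all names and values are hypothetical.
set.seed(5)
panel <- expand.grid(county = 1:30, year = 1981:1987)
a_i <- rnorm(30)[panel$county]         # county-specific effects
c_t <- rnorm(7)[panel$year - 1980]     # year-specific effects
panel$taxpc <- exp(3 + 0.3 * a_i + rnorm(nrow(panel), sd = 0.3))
panel$crime <- exp(-3.5 + 0.2 * log(panel$taxpc) + a_i + c_t +
                   rnorm(nrow(panel), sd = 0.1))

# Two-way fixed effects via county and year dummies (LSDV).
fe <- lm(log(crime) ~ log(taxpc) + factor(county) + factor(year),
         data = panel)

# Elasticity: approximate % change in crime for a 1% change in taxpc.
elasticity <- unname(coef(fe)["log(taxpc)"])
cat("Estimated elasticity:", elasticity, "\n")
```

In this simulation the true elasticity is 0.2 by construction, so the estimate should land near that value. With real data, the coefficient on the logged tax variable is read the same way: a 1% change in tax revenue per capita is associated with roughly that many percent change in the crime rate, holding the fixed effects and controls constant.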
Attachment: Data.zip