1. J. Longley used the data in sheet "#1" to assess the computational accuracy of least-squares estimates in several computer programs in 1967, but now they are used to illustrate other econometric problems. The data are from 1947 - 1962 and are defined as follows:
Y = Number of people employed (thousands)
X1 = GNP implicit price deflator
X2 = GNP (millions of dollars)
X3 = Number of people unemployed (thousands)
X4 = Number of people in the armed forces
X5 = Noninstitutionalized population over 14 years of age
X6 = Year, equal to 1 in 1947, 2 in 1948, etc.
a) Run a regression of Y on X1 to X6 using OLS.
b) Explain what problem you suspect there is with the regression or data simply by analyzing the regression results.
c) Explain how and why you would address the issue from part b. If you estimate a new model, comment on the estimates generated and why they are an improvement over the model from part a.
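One diagnostic that may be useful when thinking about parts b and c is the variance inflation factor (VIF) of each regressor, obtained from an auxiliary regression of that regressor on all the others. The following is a minimal Python/NumPy sketch under stated assumptions: the function name `vif` and the simulated variables `x1`, `x2`, `x3` are hypothetical, and the data are synthetic, not the Longley data in sheet "#1".

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x k, no constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of column j on a constant and the other columns.
        Z = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic illustration only: x2 is almost an exact linear function of x1,
# while x3 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # large values for x1 and x2, a value near 1 for x3
```

A common rule of thumb treats a VIF above roughly 10 as a sign that a regressor is close to a linear combination of the others, though the threshold is a convention rather than a formal test.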
2. Suppose you estimated the following model: Yi = β0 + β1 Xi + ui. Furthermore, suppose you suspected that the variance of ui was equal to σ^2 Xi^4. Explain how you would obtain internally valid estimates of β1 if this were the case.
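One possible approach to question 2 is weighted least squares. If the variance of ui is σ^2 Xi^4, its standard deviation is proportional to Xi^2, so dividing the whole equation by Xi^2 gives a transformed error with constant variance σ^2. The sketch below illustrates that transformation in Python/NumPy. The function name `wls_for_x4_variance`, the simulated sample, and the "true" values 2 and 3 are assumptions made for the example only.

```python
import numpy as np

def wls_for_x4_variance(x, y):
    """WLS for y = b0 + b1*x + u when Var(u) is proportional to x**4.

    Dividing by x**2 gives  y/x^2 = b0*(1/x^2) + b1*(1/x) + u/x^2,
    which has a homoskedastic error and no separate intercept.
    """
    w = 1.0 / x**2
    Z = np.column_stack([w, w * x])          # columns: 1/x^2 and 1/x
    coef, *_ = np.linalg.lstsq(Z, w * y, rcond=None)
    return coef                              # [b0_hat, b1_hat]

# Synthetic data with the assumed variance structure (sigma = 1).
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 5.0, size=5000)
u = rng.normal(size=5000) * x**2             # sd of u is x^2, so Var(u) = x^4
y = 2.0 + 3.0 * x + u                        # hypothetical true b0 = 2, b1 = 3

b0_hat, b1_hat = wls_for_x4_variance(x, y)
print(b0_hat, b1_hat)
```

Note that in the transformed regression the coefficient on 1/Xi is β1 and the coefficient on 1/Xi^2 is β0, so the roles of "slope" and "intercept" are easy to mix up when reading the output.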
3. Sheet "#3" has data on some characteristics of the wine industry in Australia for the years 1955 - 1974. The data (not given in logs) are defined as follows:
Q = real per capita consumption of wine
Pw = price of wine relative to CPI
Pb = price of beer relative to CPI
Y = real per capita disposable income
A = real per capita advertising expenditure
S = index of storage costs
Consider the following demand and supply equations for wine:
ln Q = α0 + α1 ln Pw + α2 ln Pb + α3 ln Y + α4 ln A + u (Demand)
ln Q = β0 + β1 ln Pw + β2 ln S + v (Supply)
a) Estimate the above equations using OLS. What is the problem with these estimates?
b) List the exogenous and endogenous variables in both equations and then explain whether each equation is under, over or exactly identified.
c) If the OLS estimates for the demand and supply curve parameters above are problematic, explain and use some other technique to estimate these parameters. Are your new estimates "better" than your estimates from part a? Explain.
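One technique that could be used in part c is two-stage least squares (2SLS). The sketch below shows the mechanics in Python/NumPy on a simulated supply and demand system. Everything in it is an assumption made for illustration: the function name `two_stage_least_squares`, the variables `p`, `q`, `inc`, `s`, and the coefficient values are hypothetical and are not the wine data in sheet "#3".

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS estimate of y = X b + e.

    X: regressors (constant, endogenous and included exogenous variables).
    Z: instruments (constant and ALL exogenous variables in the system).
    """
    # Stage 1: project every regressor on the instruments.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Stage 2: regress y on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Simulated market: demand  q = 1.0 - 1.0 p + 0.8 inc + u
#                   supply  q = 0.5 + 1.0 p - 0.7 s   + v
rng = np.random.default_rng(2)
n = 20000
inc = rng.normal(size=n)                 # demand shifter (exogenous)
s = rng.normal(size=n)                   # supply shifter (exogenous)
u = rng.normal(scale=0.5, size=n)
v = rng.normal(scale=0.5, size=n)
p = 0.25 + 0.4 * inc + 0.35 * s + (u - v) / 2.0   # equilibrium price
q = 1.0 - 1.0 * p + 0.8 * inc + u                 # equilibrium quantity

X_d = np.column_stack([np.ones(n), p, inc])       # demand regressors
Z = np.column_stack([np.ones(n), inc, s])         # all exogenous variables

beta_ols, *_ = np.linalg.lstsq(X_d, q, rcond=None)
beta_2sls = two_stage_least_squares(q, X_d, Z)
print("OLS price coefficient: ", beta_ols[1])
print("2SLS price coefficient:", beta_2sls[1])
```

In this simulation the price is correlated with the demand error, so the OLS price coefficient is pulled away from the true value of -1.0, while the 2SLS estimate, which uses the excluded supply shifter as an instrument, should land close to it. The standard errors from a hand-rolled second stage are not valid as printed by a plain OLS routine, so in practice a dedicated 2SLS command is preferable.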
4. Sheet "#4" contains data on the copper industry, defined as follows:
C = 12-month average U.S. domestic price of copper (cents per pound)
G = Annual GNP ($, billions)
I = 12-month average index of industrial production
L = 12-month average London Metal Exchange price of copper (pounds sterling)
H = Number of housing starts per year (thousands of units)
A = 12-month average price of aluminum (cents per pound)
a) Estimate the following model and interpret the results:
ln C = β0 + β1 ln I + β2 ln L + β3 ln H + β4 ln A + u
b) Analyze the residual plots and comment on the possibility of autocorrelation in the disturbance term.
c) Test for autocorrelation using the Durbin-Watson d statistic.
d) Explain how you would correct for the presence of first order autocorrelation (but you do not need to implement your recommendation).
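For reference, the Durbin-Watson statistic in part c is d = Σ(e_t - e_{t-1})^2 / Σ e_t^2, computed from the OLS residuals e_t, and one standard correction for first order autocorrelation quasi-differences each variable as x_t - ρ x_{t-1}. The sketch below shows both calculations in Python/NumPy. The function names `durbin_watson` and `quasi_difference` and the simulated AR(1) series are assumptions made for illustration, not the copper data in sheet "#4".

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d: sum of squared first differences over sum of squares."""
    resid = np.asarray(resid, dtype=float)
    diff = np.diff(resid)
    return (diff @ diff) / (resid @ resid)

def quasi_difference(x, rho):
    """Return x_t - rho * x_{t-1} for t = 2..T (the first observation is dropped)."""
    x = np.asarray(x, dtype=float)
    return x[1:] - rho * x[:-1]

# Simulated residuals following an AR(1) process with rho = 0.8.
rng = np.random.default_rng(4)
eps = rng.normal(size=500)
e = np.empty(500)
e[0] = eps[0]
for t in range(1, 500):
    e[t] = 0.8 * e[t - 1] + eps[t]

d_sim = durbin_watson(e)
rho_hat = 1.0 - d_sim / 2.0      # common approximation: d is roughly 2(1 - rho)
print(d_sim, rho_hat)
```

A value of d near 2 is consistent with no first order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The formal test still requires comparing d with the tabulated lower and upper bounds for the sample size and number of regressors.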
5. Sheet "#5" contains investment data on four companies for the years 1935 - 1954, taken from a famous study of investment theory by Y. Grunfeld. The four companies are:
GE = General Electric (Firm 1)
GM = General Motors (Firm 2)
US = U.S. Steel (Firm 3)
West = Westinghouse (Firm 4)
The data are defined as follows:
Y = Real Gross Investment (millions of $)
X1 = Last year's Real Value of the Firm (Value of Shares - Outstanding Debt)
X2 = Last year's Capital Stock
The theory of investment proposed and tested by Grunfeld is:
Yit = β0 + β1X1it + β2X2it + uit (1)
where i stands for the ith firm and t for the tth time period.
a) Briefly explain the economic logic behind Grunfeld's model and the expected signs of the coefficients in (1).
b) Run OLS on the pooled data.
c) Explain the possible problem(s) with the regression from part b.
d) The data are not quite ready for use as panel data. Create one variable for time and one for the company (with the numeric labels shown above) so that you can run a fixed effects regression using Stata. Once you have created those variables, you need to declare your data set to be panel data, which you can do with the "longitudinal/panel data → Setup" option in the "Statistics" menu.
e) Run a fixed effects regression on the data from part d. (Find the linear model option in the longitudinal/panel data menu and be sure to specify "FE" for fixed effects after entering the dependent and independent variables in the dialogue box. If you have trouble figuring out how to run this regression, please feel free to text or call me at 718-208-8606!)
f) If you have not done so already in part c, explain one example of a fixed effect that the model in part e corrects for.
g) Compare your estimates from parts b and e. Which model is better?
h) Explain one possible time fixed effect and how you would go about estimating the model with the inclusion of time fixed effects (but you do *NOT* need to actually estimate this model).
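To make the mechanics behind parts d and e concrete, the sketch below builds firm and year identifiers for a balanced panel of 4 firms over 1935 - 1954 and then computes the fixed effects ("within") estimator by subtracting each firm's mean from every variable before running OLS. It is a Python/NumPy illustration under stated assumptions, not a substitute for the Stata steps: the function name `within_transform`, the simulated regressors, the firm intercepts, and the slopes 0.1 and 0.3 are all hypothetical, and none of the numbers come from the Grunfeld data in sheet "#5".

```python
import numpy as np

def within_transform(v, groups):
    """Subtract each group's mean from v (the 'within' transformation)."""
    v = np.asarray(v, dtype=float)
    out = v.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= v[mask].mean(axis=0)
    return out

# Panel identifiers: firms 1..4, each observed in 1935..1954 (20 years).
firm = np.repeat(np.arange(1, 5), 20)
year = np.tile(np.arange(1935, 1955), 4)

# Synthetic data with firm-specific intercepts (the fixed effects).
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
alpha = np.array([10.0, -5.0, 0.0, 3.0])[firm - 1]
y = alpha + X @ np.array([0.1, 0.3]) + rng.normal(scale=0.1, size=80)

# Fixed effects estimator: OLS on firm-demeaned data (no constant needed).
beta_fe, *_ = np.linalg.lstsq(
    within_transform(X, firm), within_transform(y, firm), rcond=None
)
print(beta_fe)
```

Demeaning by firm removes anything that is constant within a firm over time, which is why the firm intercepts drop out. Time fixed effects could be handled in a similar spirit, for example by adding a dummy variable for each year to the regression.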