Formulas written on one side of an 8 x 11 sheet. You may also use the statistical tables handed out in class, but you may not have anything written on them. You may use a calculator, but you may not program any formulas in it. Finally, you may also want to use a ruler. Write all your answers in a blue book. For all calculations, show your work, and keep all numbers to the third or fourth decimal. Justify all answers, but note that one or two lines per answer are usually enough.
Problem 1
There are two stocks in the stock market, stock A and stock B. If bought today, each would cost $10. Market analysts have constructed the following table of joint outcomes, with associated joint probabilities. An outcome is defined as the profit made on the stock at the end of one year (note in particular the profit can be negative, in which case it is a loss).
Joint probabilities of the outcomes for stock A (rows) and stock B (columns), profit in 1 year:

|        | B = -4 | B = 6 |
| A = -5 | 0.4    | 0.1   |
| A = 5  | 0.2    | 0.3   |
Denote by A the random variable that represents the profit of stock A in one year, and by B the random variable that represents the profit of stock B in one year. The table above tells you, for example, that Prob(A = -5, B = -4) = 0.4.
a. Calculate the marginal probability that stock B will have a profit of 6 in one year. Express your answer with the appropriate notation.
b. Calculate the covariance between random variable A and random variable B. Express your answer with the appropriate notation.
Hints: 1) This will be much easier if you begin by proving that the expected values of both A and B are zero, that is, E[A]=E[B]=0. 2) Be careful with signs!
c. Based on your answer to part b., can you say whether the random variables A and B are independent? Why or why not? Note: this can be answered without any calculations!
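The mechanics behind parts a. and b. can be sketched in a few lines of Python. This is only an illustrative sketch, not a model answer: the dictionary layout and the helper names (`marginal`, `expectation`, `covariance`) are assumptions made for the example, and the probabilities are copied from the joint table above.

```python
# Joint distribution from the table above: keys are (a, b) profit pairs.
joint = {(-5, -4): 0.4, (-5, 6): 0.1, (5, -4): 0.2, (5, 6): 0.3}

def marginal(joint, index, value):
    """Sum the joint probabilities of all outcomes whose coordinate `index` equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def expectation(joint, func):
    """Expected value of func(a, b) under the joint distribution."""
    return sum(p * func(a, b) for (a, b), p in joint.items())

def covariance(joint):
    """Cov(A, B) = E[(A - E[A]) * (B - E[B])]."""
    mean_a = expectation(joint, lambda a, b: a)
    mean_b = expectation(joint, lambda a, b: b)
    return expectation(joint, lambda a, b: (a - mean_a) * (b - mean_b))

prob_b_6 = marginal(joint, 1, 6)  # Prob(B = 6): sum over both outcomes of A
mean_a = expectation(joint, lambda a, b: a)
mean_b = expectation(joint, lambda a, b: b)
cov_ab = covariance(joint)
```

Because both means are zero here, the covariance reduces to E[AB], which is why the first hint simplifies the hand calculation.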
Problem 2
Note: the key to getting an accurate answer for this problem is to be organized and careful. You may want to draw a table with each row being an observation. There, you would put residuals, predicted values, or whatever you need. It is also important to keep track of positive and negative signs.
A dataset contains information about 4 investors. For each investor we know the investor's number of years of experience in the stock market (variable $X_i$) and the investor's net worth in thousands of dollars (variable $Y_i$). The data are shown in the table below.
| Observation number | $X_i$ | $Y_i$ |
| 1                  | 0     | 23    |
| 2                  | 8     | -93   |
| 3                  | 12    | 30    |
| 4                  | 20    | 40    |
Consider the model $Y_i = \beta_0 + \beta_1 X_i + u_i$, and assume that the three Least Squares Assumptions in the textbook hold. An OLS regression using the data above yields $\hat{\beta}_1 = 2$. It will simplify your calculations to note that $\bar{Y} = 0$.
a. Would you expect $\hat{\beta}_1$ to be positive or negative? Explain, using economic reasoning.
b. Calculate the OLS estimate for $\beta_0$, that is, $\hat{\beta}_0$.
c. Sara has 15 years of experience in the stock market. According to the model, what is Sara’s predicted net worth?
d. What is the interpretation of the regression estimate $\hat{\beta}_1 = 2$? Be specific. In particular, your answer must be quantitative and have some economic content. (Example of unrelated sentences that fit these two conditions: "Firm A makes $10 in profits; consumer B has $3 in consumer surplus; etc.")
e. Calculate the predicted values for all four observations.
f. Calculate the Explained Sum of Squares (ESS).
g. Calculate the Total Sum of Squares (TSS).
h. Calculate the $R^2$ of the regression above.
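The organized, row-by-row bookkeeping recommended in the note can be mirrored in code. The following is a minimal sketch using the standard single-regressor OLS formulas and the four observations from the table; the function name `ols_simple` and the variable names are assumptions introduced for illustration, not part of the exam.

```python
# Data from the table: years of experience (x) and net worth in $1000s (y).
x = [0, 8, 12, 20]
y = [23, -93, 30, 40]

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = ols_simple(x, y)
y_bar = sum(y) / len(y)

fitted = [b0 + b1 * xi for xi in x]              # predicted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]
ess = sum((fi - y_bar) ** 2 for fi in fitted)    # Explained Sum of Squares
tss = sum((yi - y_bar) ** 2 for yi in y)         # Total Sum of Squares
r2 = ess / tss
pred_15 = b0 + b1 * 15                           # prediction at 15 years of experience
```

Each list corresponds to one column of the table suggested in the note, so a sign error in any row is easy to spot by comparing against the hand calculation.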
Problem 3
Note: the large-sample critical values for the two-sided t-statistic at the 10%, 5%, and 1% significance levels are, respectively: 1.64, 1.96, 2.58.
Data on average hourly wages in dollars per hour (avgwage) were regressed on education level in years of schooling (educyrs) and a constant. The population model is $avgwage_i = \beta_0 + \beta_1\, educyrs_i + u_i$ and the results are as follows. Assume that the three least squares assumptions from the textbook hold.
$$\widehat{avgwage} = \underset{(1.09)}{5.34} + \underset{(0.705)}{1.589}\, educyrs, \quad R^2 = 0.0182, \quad n = 250$$
a. Calculate the p-value for the two-sided test:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Can you reject the null hypothesis at the 1% significance level? Use the p-value to justify your answer.
b. Construct the 95% confidence interval for $\beta_1$.
c. Is $\hat{\beta}$ significantly different from 5.00 at the 10% significance level? Use a t-statistic (not a p-value) to justify your answer.
d. (Two-line essay) Why do you think the $R^2$ of this regression is so small? (Note that different answers are possible. What is important is to have a clear, concise, and correct economic argument.)
e. (Three-line essay) Are the errors in this model more likely to be homoskedastic or heteroskedastic? Why? (Again, what is most important is a clear, concise, and correct economic argument.)
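The large-sample testing steps used in this problem (t-statistic, two-sided p-value, confidence interval) can be sketched as follows. This is an illustrative sketch under the standard normal approximation; the helper names are assumptions made for the example, and the slope estimate and its standard error are taken from the regression output above. The p-value uses the identity 2(1 - Φ(|t|)) = erfc(|t| / √2), which avoids the need for statistical tables.

```python
import math

def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def two_sided_p_value(t):
    """Large-sample two-sided p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))

def confidence_interval(estimate, std_error, critical_value=1.96):
    """Interval estimate +/- critical_value * SE (1.96 gives the 95% level)."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width

# Slope estimate and standard error from the regression output above.
slope, slope_se = 1.589, 0.705

t_slope = t_statistic(slope, slope_se)           # test of H0: beta_1 = 0
p_slope = two_sided_p_value(t_slope)
ci_low, ci_high = confidence_interval(slope, slope_se)
```

The same `t_statistic` helper handles a test against any other hypothesized value by passing it as the third argument; the resulting statistic is then compared in absolute value with the critical values listed in the note.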
Problem 4
A researcher has data on the average daily output of a number of workers. The dependent variable is output per worker, in dollars (variable OUTPUT). The researcher regresses this on a number of independent variables, with results as shown below.
| REGRESSION CHARACTERISTICS | ANOVA TABLE | SS       |
| Observations: 420          | Regression  | 1209.45  |
| Dependent Variable: OUTPUT | Residual    | 45678.12 |
|                            | Total       | 46887.57 |
| Independent Variable | Independent Variable Definition | Coefficients | Standard Errors |
| Constant | | -3.51 | 1.27 |
| Effort | Each worker's effort level (0-100 scale), measured by an objective evaluator. | 1.03 | 0.32 |
| Hours | Average number of hours worked by each worker | 4.82 | 1.67 |
| IQ | The worker's IQ, measured through an IQ test | 0.541 | 0.878 |
| Educ | Education of the worker (in number of years) | 2.48 | 0.97 |
a. Calculate the $\bar{R}^2$ of the regression (not the $R^2$!).
b. An omitted variable in this regression is the incentives to production that each worker gets from his or her firm: variable “Incentives,” measured in cents received per dollar of the worker’s output. Do you think this variable may cause omitted variable bias? Explain.
c. For the omitted variable in part b. (Incentives), assume that it does cause omitted variable bias. Use a graph to explain whether the predicted sign of the bias is positive or negative.
d. Consider the regression $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$. Suppose that the data has multicollinearity, in that $X_{1i} = 2X_{2i} - 5X_{3i}$. In your own words, explain why it is impossible to get an estimate for this regression.
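e. Suppose that you run the regression $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$. The formula for the variance of the OLS estimator is as follows, where the notation is the same as in the textbook.
$$\sigma^2_{\hat{\beta}_1} = \frac{1}{n} \left( \frac{1}{1 - \rho^2_{X_1, X_2}} \right) \frac{\sigma^2_u}{\sigma^2_{X_1}}$$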
Which element of this formula represents the fact that this estimator is consistent? Briefly justify your answer.
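Two of the calculations in this problem lend themselves to a short numerical sketch. The first function computes the adjusted R-squared from the ANOVA table using the usual textbook definition, 1 - [(n - 1) / (n - k - 1)] (SSR / TSS), with k = 4 regressors (Effort, Hours, IQ, Educ). The second evaluates the variance formula from part e., assuming the form (1/n) · 1/(1 - ρ²) · σ²_u / σ²_X1. The function names and the numerical inputs to `var_beta1` are hypothetical values chosen only for illustration; they do not come from the exam.

```python
def adjusted_r2(ssr, tss, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS, with k regressors."""
    return 1 - (n - 1) / (n - k - 1) * ssr / tss

# Residual and total sums of squares from the ANOVA table; n = 420, k = 4.
adj_r2 = adjusted_r2(ssr=45678.12, tss=46887.57, n=420, k=4)
plain_r2 = 1 - 45678.12 / 46887.57  # ordinary R^2, for comparison

def var_beta1(n, rho, sigma_u2, sigma_x1_2):
    """Variance of beta_1-hat with two regressors (form assumed in the lead-in)."""
    return (1 / n) * (1 / (1 - rho ** 2)) * (sigma_u2 / sigma_x1_2)

# Hypothetical inputs: only n changes across the three evaluations.
variances = [var_beta1(n, rho=0.5, sigma_u2=1.0, sigma_x1_2=1.0)
             for n in (100, 400, 1600)]
```

Holding the other terms fixed, each fourfold increase in the sample size cuts the variance to a quarter, which is the behavior to look for when thinking about part e. The adjusted R-squared is always slightly below the ordinary R-squared because of the degrees-of-freedom correction.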