Regression Nuts and Bolts
Consider the following income and consumption data (each measured in $1,000s) from 10 randomly sampled households. Copy the data into excel.
a) For income and consumption: at the bottom of each column calculate the mean (=AVERAGE),the variance (=VAR), and the standard deviation (=STDEV) for each variable.
b) For x=income, create a column which contains (xi - x ¯ ), the deviations from the mean. What is the mean of this column? Explain. Excel Hint: suppose the first data element is in cell c1 and mean is in cell c25. When you write the formula for the difference from the mean write it as c1-c$25, not c1-c25.
c) Create another column that has these terms squared:(xi - x ¯ )2 . Then use the column to manually calculate the variance and get the same answer as in question 1. Again, as in (3a) divide by n-1 instead of n when calculating the variance. Write the formula that you used.
d) Now let y=consumption, and create a column with the values of (yi y ¯), and yet another column with the products, (xi - x ¯ ) (yi - y ¯ ). Using the data in this column, manually calculate the sample covariance between income and consumption. As with the sample estimate of the variance, divide by n-1 instead of n. State which formula you used. (If you check your work using the covar function in Excel, note that Excel divides by n when calculating the covariance but n-1 when calculating the variance. but when calculating the covariance it divides by n.6) Using the means, variances and covariances calculated above, find the following:
a) The slope coefficient β1. Note: to get the right β1 you must ensure that the covariance and variance have the same denominator (i.e. both n-1).
b) The intercept: β0
c) A column of predicted values: y^
d) A column of residuals:u^
Verify that the fitted values have the same mean as the actual y-values, and that the residuals have mean zero.
Show that the variance of the fitted values plus the variance of the residuals adds up to the variance of y. Find the R-squared.