We now turn to multiple regression and, in particular, the three-variable regression model.
The population regression function is:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
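To make the setup concrete, here is a minimal simulation sketch in Python with NumPy (the notes name no software, so the language and the parameter values are assumptions chosen for illustration). It draws data from this population regression function and recovers the coefficients by least squares when $X_1$ and $X_2$ are not collinear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
beta0, beta1, beta2 = 1.0, 2.0, -0.5   # illustrative "true" parameters

X1 = rng.normal(size=n)
X2 = rng.normal(size=n)                # drawn independently of X1: no collinearity
eps = rng.normal(size=n)
Y = beta0 + beta1 * X1 + beta2 * X2 + eps

# OLS on the design matrix [1, X1, X2]
X = np.column_stack([np.ones(n), X1, X2])
betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(betahat)   # estimates close to [1.0, 2.0, -0.5]
```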
Consider what happens if $X_{2i} = 1 + 2X_{1i}$.
$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \\
    &= \beta_0 + \beta_1 X_{1i} + \beta_2 (1 + 2X_{1i}) + \varepsilon_i \\
    &= \beta_0 + \beta_1 X_{1i} + \beta_2 + 2\beta_2 X_{1i} + \varepsilon_i \\
    &= (\beta_0 + \beta_2) + (\beta_1 + 2\beta_2) X_{1i} + \varepsilon_i \\
    &= \beta_0^0 + \beta_1^0 X_{1i} + \varepsilon_i
\end{aligned}
$$
You can see that we're only getting two parameters ($\beta_0^0$ and $\beta_1^0$) back even though the original model specification had three ($\beta_0$, $\beta_1$, and $\beta_2$).
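Continuing the sketch above, a few lines show the identification failure numerically: with $X_{2i} = 1 + 2X_{1i}$ the design matrix is rank-deficient, and only the two composite parameters are pinned down by the data. (That `lstsq` returns the minimum-norm solution is a NumPy detail, not part of the regression theory.)

```python
X2c = 1 + 2 * X1                       # X2 perfectly collinear with X1
Yc = beta0 + beta1 * X1 + beta2 * X2c + eps

Xc = np.column_stack([np.ones(n), X1, X2c])
bhat, _, rank, _ = np.linalg.lstsq(Xc, Yc, rcond=None)
print(rank)    # 2, not 3: the design matrix is rank-deficient
print(bhat)    # just one of infinitely many least-squares solutions

# Only the two composite parameters are identified:
print(bhat[0] + bhat[2])       # ~ beta0 + beta2   (= beta_0^0 = 0.5)
print(bhat[1] + 2 * bhat[2])   # ~ beta1 + 2*beta2 (= beta_1^0 = 1.0)
```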
There are multiple values of $\beta_0$, $\beta_1$, and $\beta_2$ which will solve the two equations:
$$\beta_1^0 = \beta_1 + 2\beta_2$$
$$\beta_0^0 = \beta_0 + \beta_2$$
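The non-uniqueness is easy to verify directly. A tiny sketch, reusing the illustrative parameter values from above (the names `beta0_comp` and `beta1_comp` are hypothetical labels for $\beta_0^0$ and $\beta_1^0$): for any arbitrary choice of $\beta_2$, we can back out a $\beta_0$ and $\beta_1$ that satisfy both equations exactly.

```python
beta1_comp = beta1 + 2 * beta2   # beta_1^0 = 1.0 (slope composite)
beta0_comp = beta0 + beta2       # beta_0^0 = 0.5 (intercept composite)

# Any value of beta2 yields a (beta0, beta1) pair solving both equations,
# so the system has infinitely many solutions.
for b2 in (-1.0, 0.0, 3.0):
    b0 = beta0_comp - b2
    b1 = beta1_comp - 2 * b2
    assert b0 + b2 == beta0_comp and b1 + 2 * b2 == beta1_comp
    print(f"beta0={b0}, beta1={b1}, beta2={b2}")
```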
The bottom line is that we cannot estimate the separate influence of $X_1$ and $X_2$ on $Y$.
What we have been discussing so far is really perfect multicollinearity.
In the case where we have two independent variables, this occurs when the correlation, $r$, between the two variables is $\pm 1$.
In the case where we have more than two independent variables, this occurs when the independent variables are not linearly independent, i.e., at least one of them is an exact linear combination of the others, so your matrix of independent variables is not of full column rank.
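A quick numerical check of this rank condition, continuing the NumPy sketch above, is to compare the rank of the design matrix to its number of columns:

```python
print(np.linalg.matrix_rank(X))     # 3: full column rank, all coefficients identified
print(np.linalg.matrix_rank(Xc))    # 2: rank-deficient -> perfect multicollinearity
print(np.corrcoef(X1, X2c)[0, 1])   # ~1.0: X1 and X2c are perfectly correlated
```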