1. Introduction (Background and Purpose)
The data used in this paper concern the salaries and other characteristics of all faculty at a small Midwestern college. They were collected in the early 1980s for presentation in legal proceedings in which salary discrimination against women was at issue. All persons in the data hold tenured or tenure-track positions (temporary faculty are not included). The data were collected from personnel files and consist of the quantities described in Section 2.1 of this paper.
The outcome of the proceedings was a finding that salary is not significantly influenced by a person's sex. When I read this in the book, I was very surprised, since on average a woman normally earns less than a man in the same position (which is definitely not a good thing!). This made me curious about which other factors influence salary, which is why I chose this dataset for the final term paper in my economics course.
2. Multiple regression
2.1 Data source and data description - (State explicitly where you downloaded the data set for this project. Summarize the variables in a table that includes descriptive statistics and the measurement unit of each variable.)
Variable | Description | Unit | Mean | S.D. |
 | | | | |
 | | | | |
2.2 Model specification - (The model here should be the best model; see Section 5.2 below.)
2.3 Result report - (Estimated equation, interpretation of the coefficients, significance of the explanatory variables, and an explanation of the meaning of the R-squared and F value shown in the Stata output)
2.4 Conclusion -
3. Omitted variable bias
3.1 Drop a variable - Drop one explanatory variable from the original multiple regression and report the estimation results. Explain the changes in the estimated coefficients (why they change a lot or only a little, and whether the bias is upward or downward).
3.2 Omitted variable - Name a variable excluded from your original model and state its possible impact on the estimated coefficients.
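As a reference point for the direction of the bias, the standard two-regressor case can be written out as below. This is only a sketch: the symbols (x1 as the kept variable, x2 as the dropped or omitted one, tildes for the short regression, hats for the full regression) are notation assumed here and are not taken from the data set.

```latex
\begin{align*}
\text{Full model:}\quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
\text{Short regression (} x_2 \text{ dropped):}\quad & \tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1 \\
\text{Auxiliary regression of } x_2 \text{ on } x_1\text{:}\quad & \tilde{x}_2 = \tilde\delta_0 + \tilde\delta_1 x_1 \\
\text{Algebraic link:}\quad & \tilde\beta_1 = \hat\beta_1 + \hat\beta_2\,\tilde\delta_1 \\
\text{Expected value (conditional on the sample } x\text{):}\quad & E(\tilde\beta_1) = \beta_1 + \beta_2\,\tilde\delta_1
\end{align*}
```

Under these assumptions the bias is the product of beta_2 and delta_1: upward when the two have the same sign, downward when the signs differ, and zero when the omitted variable either has no effect on y or is uncorrelated with x1 in the sample. A coefficient that barely moves after dropping a variable is consistent with one of those two factors being close to zero.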
4. t test, F test and LM test
4.1 t test (type A) - Test the significance of one independent variable with the t test, the F test (in terms of SSR), the F test (in terms of R2) and the LM test.
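The four statistics can be checked against each other by hand. The sketch below uses only the Python standard library and made-up numbers (not the salary data); the helper names `invert` and `ols` and the variables x1, x2, y are hypothetical. It tests the exclusion of x2, and for a single restriction the squared t statistic should coincide with both forms of F, while LM is n times the R-squared from regressing the restricted residuals on all regressors.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols(X, y):
    """Return (coefficients, residuals, SSR, (X'X)^-1) for OLS of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid, sum(e * e for e in resid), inv

# Made-up illustrative data (NOT the salary data set).
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [5.1, 6.0, 9.8, 9.1, 13.2, 17.9, 14.5, 19.0, 20.2, 20.9, 23.8, 27.5]
n = len(y)

X_u = [[1.0, a, b] for a, b in zip(x1, x2)]  # unrestricted: constant, x1, x2
X_r = [[1.0, a] for a in x1]                 # restricted under H0: coefficient on x2 is 0

beta_u, _, ssr_u, inv_u = ols(X_u, y)
_, resid_r, ssr_r, _ = ols(X_r, y)
df = n - len(X_u[0])                         # n - k - 1
ybar = sum(y) / n
sst = sum((v - ybar) ** 2 for v in y)
r2_u, r2_r = 1 - ssr_u / sst, 1 - ssr_r / sst

# t statistic for the coefficient on x2 (index 2 in the unrestricted model)
t_stat = beta_u[2] / math.sqrt(ssr_u / df * inv_u[2][2])
# F statistic with q = 1 restriction, SSR form and R-squared form
f_ssr = (ssr_r - ssr_u) / (ssr_u / df)
f_r2 = (r2_u - r2_r) / ((1 - r2_u) / df)
# LM statistic: n * R^2 from regressing the restricted residuals on ALL regressors.
# The restricted residuals have mean zero, so their total sum of squares is sum(e^2).
_, _, ssr_aux, _ = ols(X_u, resid_r)
lm = n * (1 - ssr_aux / sum(e * e for e in resid_r))

print("t^2 =", t_stat ** 2, " F(SSR) =", f_ssr, " F(R2) =", f_r2, " LM =", lm)
```

The t and F statistics are compared with t and F critical values with n - k - 1 denominator degrees of freedom, while LM is compared with a chi-square critical value with one degree of freedom, so the LM number itself will generally differ from F even when the test conclusions agree.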
4.2 t test (type B) - Test H0 : βx1 = a, where a is a value you choose other than zero. After specifying the hypothesis, state its meaning in your own words. The value a should be meaningful; avoid an arbitrary number such as βx1 = 10000.
4.3 F test - Test a multiple (joint) hypothesis other than the joint exclusion of all independent variables.
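For reference, the test statistics behind 4.2 and 4.3 can be written as follows. This is a sketch under the classical linear model assumptions; the subscripts r and ur (restricted and unrestricted model) and q (number of restrictions) are notation assumed here.

```latex
\begin{align*}
H_0: \beta_{x_1} = a: \quad & t = \frac{\hat\beta_{x_1} - a}{\mathrm{se}(\hat\beta_{x_1})} \sim t_{n-k-1} \\
H_0: q \text{ exclusion restrictions}: \quad & F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
  = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)} \sim F_{q,\,n-k-1}
\end{align*}
```

The R-squared form is only valid when the restricted and unrestricted models have the same dependent variable, which holds for exclusion restrictions; for other linear restrictions the SSR form is the safer choice.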
5. Functional forms
5.1 Try different functional forms - For at least 3 different functional forms (such as log-log, log-level, level-log, quadratic, interaction terms and inverse terms), report the regression outcomes and interpret the estimated coefficients of each model.
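The usual textbook interpretations for several of these forms are summarized below as a reminder. The single-regressor notation is an assumption made for brevity, and the percentage interpretations are approximations that work best for small changes.

```latex
\begin{align*}
\text{log-log:}\quad & \ln y = \beta_0 + \beta_1 \ln x + u, & \%\Delta y &\approx \beta_1\,\%\Delta x \\
\text{log-level:}\quad & \ln y = \beta_0 + \beta_1 x + u, & \%\Delta y &\approx 100\,\beta_1\,\Delta x \\
\text{level-log:}\quad & y = \beta_0 + \beta_1 \ln x + u, & \Delta y &\approx (\beta_1/100)\,\%\Delta x \\
\text{quadratic:}\quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + u, & \partial y/\partial x &= \beta_1 + 2\beta_2 x \\
\text{interaction:}\quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u, & \partial y/\partial x_1 &= \beta_1 + \beta_3 x_2
\end{align*}
```

In the quadratic and interaction forms the marginal effect depends on the value at which it is evaluated, so an interpretation should state that value (for example the sample mean).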
5.2 Determine the best model - Choose the best model among the models above and give reasons for the choice; "I think" is not sufficient.
6. Prediction
Choose a reasonable set of particular values of independent variables, say (x1 = c1, . . . , xk = ck), and then give the estimate of E(y|x1 = c1, . . . , xk = ck) and its 95% CI.
Values of ci should be meaningful. For example, if x1 is years of education, you cannot just set c1 = 100, c2 = 100, c3 = 100 because 100 years of education is impossible and meaningless.
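One common way to obtain both the estimate and its standard error is to re-center the regressors at the chosen values, as sketched below. The symbol theta_0 for the conditional mean is notation assumed here.

```latex
\begin{align*}
\hat\theta_0 &= \hat\beta_0 + \hat\beta_1 c_1 + \dots + \hat\beta_k c_k \\
y &= \theta_0 + \beta_1 (x_1 - c_1) + \dots + \beta_k (x_k - c_k) + u \\
\text{95\% CI:}\quad & \hat\theta_0 \pm t_{0.975,\,n-k-1}\cdot \mathrm{se}(\hat\theta_0)
\end{align*}
```

Regressing y on the re-centered variables gives an intercept equal to the estimate of E(y|x1 = c1, . . . , xk = ck), and the reported standard error of that intercept is the one needed for the interval. This is a confidence interval for the conditional mean, not a prediction interval for an individual outcome, which would be wider.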
7. Heteroskedasticity
7.1 Perform BP test - (hint: estat hettest)
7.2 Report the estimated equation with heteroskedasticity-robust standard errors (hint: reg y x, robust)
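To see what the two steps compute, the sketch below works through them by hand with the Python standard library on made-up data (not the salary data); the helper names `invert` and `ols` are hypothetical. It uses the n times R-squared (LM) form of the Breusch-Pagan test and the HC1 scaling n/(n-k) for the robust variance, which is assumed here to correspond to Stata's robust option. The default variant reported by estat hettest may differ from the n times R-squared form, so the numbers need not match Stata's output exactly.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def ols(X, y):
    """Return (coefficients, residuals, SSR, (X'X)^-1) for OLS of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid, sum(e * e for e in resid), inv

# Made-up data whose error spread grows with x (NOT the salary data set).
x = list(range(1, 21))
y = [1.0 + 2.0 * xi + (0.5 * xi if xi % 2 == 0 else -0.5 * xi) for xi in x]
X = [[1.0, float(xi)] for xi in x]
n, k = len(y), len(X[0])

beta, resid, ssr, inv = ols(X, y)

# Breusch-Pagan (LM form): regress squared residuals on the regressors, LM = n * R^2.
u2 = [e * e for e in resid]
_, _, ssr_aux, _ = ols(X, u2)
u2bar = sum(u2) / n
r2_aux = 1 - ssr_aux / sum((v - u2bar) ** 2 for v in u2)
lm_bp = n * r2_aux  # compare with the chi-square(1) 5% critical value, 3.84

# Usual OLS standard errors: sigma^2 * (X'X)^-1
se_usual = [math.sqrt(ssr / (n - k) * inv[j][j]) for j in range(k)]

# Robust (HC1) variance: n/(n-k) * (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1
meat = [[sum(e * e * r[i] * r[j] for r, e in zip(X, resid)) for j in range(k)]
        for i in range(k)]
left = [[sum(inv[i][a] * meat[a][j] for a in range(k)) for j in range(k)]
        for i in range(k)]
v_rob = [[n / (n - k) * sum(left[i][a] * inv[a][j] for a in range(k)) for j in range(k)]
         for i in range(k)]
se_robust = [math.sqrt(v_rob[j][j]) for j in range(k)]

print("BP LM statistic:", lm_bp)
print("coefficients:", beta)
print("usual SE:", se_usual)
print("robust SE:", se_robust)
```

The coefficient estimates are the same with and without the robust option; only the standard errors, and therefore the t statistics and confidence intervals, change.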