1. At this point we have covered most of the techniques except linear regression. Please give me your proposed research question and tell me which data set you plan to use for your project (no need to send data).
2. Book questions 7.16, 7.72, 7.78, 7.80, 7.94, 7.103, 7.128, 8.84, 8.86, 8.90, 8.92, 8.94, 8.104. For questions that provide data, please use Stata to perform the t-tests.
3. Demonstrate the convergence of a t distribution to a normal distribution using Stata graphics. This should follow the code from previous homeworks with only minor changes. I hope you can alter the code yourself, but I will help in class if you can’t.
I. Generate 50,000 observations from a t distribution (for example, gen t03 = rt(3)) for each of the following degrees of freedom: 3, 9, 27, and 81. Also generate 50,000 observations from a standard normal distribution.
II. Determine the two-tailed 95% critical values of the t distribution for each degree of freedom and compare them to the z critical value at the same significance level (1.96).
III. Make a kernel density plot that overlays all of the distributions.
(a) Based on this plot, at what degrees of freedom do you feel the normal approximation of a t is “good enough,” and why?
(b) Why would a normal distribution be a poor approximation for a t distribution with 3 degrees of freedom? Make a normal probability plot (pnorm t03) and interpret it as part of your answer.
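The steps in question 3 can be sketched in a single do-file along the following lines. This is only a starting point under stated assumptions, not the code from the previous homeworks: the seed value, the variable names other than t03, the plotting range of -5 to 5, and the legend labels are all choices made here for illustration.

```stata
* Sketch for question 3 (run as a do-file; /// continues a line)
clear
set seed 12345              // arbitrary seed, assumed here for reproducibility
set obs 50000

* I. Draws from t distributions and from a standard normal
gen t03 = rt(3)
gen t09 = rt(9)
gen t27 = rt(27)
gen t81 = rt(81)
gen z   = rnormal()

* II. Two-tailed 95% critical values (2.5% in each tail)
foreach df of numlist 3 9 27 81 {
    display "df = `df': t critical value = " %6.3f invttail(`df', 0.025)
}
display "normal: z critical value = " %6.3f invnormal(0.975)

* III. Overlaid kernel densities, evaluated on a common range
twoway (kdensity t03, range(-5 5)) (kdensity t09, range(-5 5)) ///
       (kdensity t27, range(-5 5)) (kdensity t81, range(-5 5)) ///
       (kdensity z,   range(-5 5)),                            ///
       legend(order(1 "t(3)" 2 "t(9)" 3 "t(27)" 4 "t(81)" 5 "N(0,1)"))

* (b) Normal probability plot for the 3 degrees of freedom draws
pnorm t03
```

Restricting the evaluation range with range() keeps the few extreme draws from t(3) from stretching the horizontal axis, while the density itself is still estimated from all 50,000 observations.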
4. There are lots of difficult problems in the book based on sample size and power calculations. How about we let Stata do the work instead?
(a) Using Stata, determine the necessary sample size for a study of the difference in test scores between two independent samples. Based on previous research, we expect the difference between samples to be about 10 points. The standard deviation for the first group is about 15 points and the second group’s standard deviation is about 17. Determine the sample size needed using the code below:
sampsi 0 10, sd1(15) sd2(17) power(.8) alpha(.07)
What is the sample size needed for each group?
(b) Using the same code as above, consider the case where the standard deviations are equal and both 15. What are the new sample sizes needed? Why did this change?
(c) Now let’s use the same setup to calculate power. Let’s assume we have the same difference of 10, s1 = 15, s2 = 17, and group sample sizes of n1 = 50, n2 = 23. Calculate the power level and interpret.
sampsi 0 10, sd1(15) sd2(17) n1(50) n2(23)
(d) Now change both sample sizes to 31, then recalculate and interpret. Does the smaller total sample (62 < 73) have more or less power? Please explain why this occurs.
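To see what drives the sampsi results in question 4, the textbook normal-approximation formulas can be evaluated by hand with Stata scalars. The sketch below assumes those formulas: per-group n = (sd1^2 + sd2^2)(z_{1-alpha/2} + z_{power})^2 / difference^2 for equal group sizes, and power = Phi(difference/SE - z_{1-alpha/2}) with SE = sqrt(sd1^2/n1 + sd2^2/n2). The scalar names are made up for illustration, and the results should be close to, but are not guaranteed to match, what sampsi prints. In Stata 13 and later the power twomeans command supersedes sampsi, and it may give slightly different answers because it does not use the same approximation by default.

```stata
* Hand check of the normal-approximation formulas (scalar names are illustrative)
scalar s_delta = 10
scalar s_sd1   = 15
scalar s_sd2   = 17

* (a) Sample size per group for power 0.80 and alpha 0.07, equal group sizes
scalar s_zsum = invnormal(1 - 0.07/2) + invnormal(0.80)
scalar s_n    = (s_sd1^2 + s_sd2^2) * s_zsum^2 / s_delta^2
display "Approximate n per group: " ceil(s_n)

* (c) Power with n1 = 50 and n2 = 23 at sampsi's default alpha of 0.05
scalar se_c = sqrt(s_sd1^2/50 + s_sd2^2/23)
display "SE of the difference (50, 23): " %6.3f se_c
display "Approximate power:             " %6.3f normal(s_delta/se_c - invnormal(0.975))

* (d) Power with n1 = n2 = 31
scalar se_d = sqrt(s_sd1^2/31 + s_sd2^2/31)
display "SE of the difference (31, 31): " %6.3f se_d
display "Approximate power:             " %6.3f normal(s_delta/se_d - invnormal(0.975))
```

Comparing the two standard errors is a useful way into part (d), since power depends on the sample sizes only through the standard error of the difference.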
5. For this question, you will need to download the Current Population Survey (CPS) data from the Census Bureau. This is so that you learn how to merge data files in Stata and also deal with some real, frequently used econometric data.
Step 1: Download the CPS data and dictionaries for October and November of 2015 from
https://www.nber.org/data/cps_basic.html and https://www.nber.org/data/cps_basic_progs.html, respectively.
Step 2: Read the data into Stata and merge based on the household ID (hrhhid) using Stata’s merge command. Report how many cases merge correctly. (The CPS is a rotational panel, so you will not be able to merge every observation.)
Step 3: Generate a new variable that indicates whether household income has increased from the previous month, and test the hypothesis that household income in the two months is not equal. Check all the assumptions for the model.
Step 4: Out of curiosity, what happened to variables with missing values? Were they included in the analysis? What is the danger caused by missing values?
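One possible shape for Steps 2 through 4 is sketched below. Several things here are assumptions rather than part of the assignment: the file names cps_oct2015.dta and cps_nov2015.dta are hypothetical placeholders for whatever the NBER dictionaries produce, hefaminc is assumed to be the family income variable (check the data dictionary), and keeping one record per hrhhid is a simplification that assumes hrhhid alone identifies a household. If the identifier is not unique at the household level, additional key variables would be needed.

```stata
* Sketch for question 5 (run as a do-file so the tempfile macro persists)
* File names are hypothetical placeholders
use cps_oct2015.dta, clear
keep hrhhid hefaminc                 // hefaminc: assumed income variable
bysort hrhhid: keep if _n == 1       // assumption: one record per household ID
rename hefaminc inc_oct
tempfile oct
save `oct'

use cps_nov2015.dta, clear
keep hrhhid hefaminc
bysort hrhhid: keep if _n == 1
rename hefaminc inc_nov

* Step 2: merge and report how many households matched
merge 1:1 hrhhid using `oct'
tab _merge                           // _merge == 3 marks matched households
keep if _merge == 3

* Step 3: indicator for an increase, then a paired t-test
* Stata treats missing as larger than any number, so exclude missing values
gen byte increased = inc_nov > inc_oct if !missing(inc_nov, inc_oct)
tab increased, missing
ttest inc_nov == inc_oct

* Step 4: how many matched households have a missing income value?
count if missing(inc_nov, inc_oct)
```

If the raw files record missing or out-of-universe responses with a numeric code rather than Stata's missing value, those codes would have to be converted first (for example with mvdecode), otherwise they enter the comparison and the t-test as if they were real incomes.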