Statistical Methods for Research Workers
Second Exam
Instructions: Answer every part of each of the four questions on this exam. Show all of your work. You may earn partial credit for incorrect or incomplete answers if you show work that demonstrates appropriate statistical reasoning or the application of an appropriate statistical method. Incorrect responses that do not show the reasoning behind the solution will receive no credit. This exam constitutes 20% of the final course grade.
When you review lecture materials and work on ordinary course assignments you are encouraged to participate in study groups, but this is an exam and it is expected that the work you turn in is completely your own. You may use your textbook, lecture materials, previous assignments, and other textbooks or web materials, and any computer software of your choice to complete this exam, but you are not allowed to discuss this exam with any other person except Joe Papio. Data sets for this exam are posted in the exams folder on the Blackboard course page. You may use the JMP software package, or any other software of your choice, to help answer the questions on this exam.
Your answers to the exam are due by 11:59 pm, Sunday, June 26, 2016. To receive full credit for this exam your solutions should be submitted on Blackboard as a pdf attachment or Word document. Copy relevant graphs and summary tables from the JMP output into appropriate parts of your solution. Part of your grade will depend on how well you communicate your conclusions. Be careful to use statistical language in appropriate ways. Your responses should be thorough, concise, and well written. By showing work, such as a formula for a test statistic or an indication of which statistical method was used, you may receive partial credit for answers even if numerical values are incorrect. Incorrect numerical solutions reported without showing work will receive no credit. Longer responses will not necessarily receive more credit. Submitting unfocused and rambling responses and long lists of irrelevant computer output may demonstrate a lack of understanding of the statistical concepts covered in this course and may not receive full credit, even if they contain all pieces of a correct response. Put your name on your solutions.
Question 1:
Data on 200 men and 200 women were obtained from representative random samples of young men and women who applied for employment at a large U.S. corporation in 2012. During the application process, each applicant was given a Reading comprehension test and a Mathematics reasoning test, along with several other exercises, which were used to generate an overall competency score. A review committee used these scores along with other factors when considering the applicants for employment. All of the applicants were between the ages of 18 and 25. The data file is posted as exam2_employment_competency.csv attached to the exam in Blackboard. You may read the file into JMP, or any other software package of your choice, to answer the following questions. The data file contains information on the following five variables:
- Subject: subject identification number
- Gender: coded 1 for females and 2 for males.
- Reading: score on a reading comprehension test
- Mathematics: score on a mathematics reasoning test
- Competency: an overall assessment of employment competency obtained from a complex combination of the scores on the reading and mathematics tests and the scores on several other tests of reasoning, communication, organization, and social skills.
(a) Do these data provide evidence that women perform better at Reading tasks than men? Set up appropriate null and alternate hypotheses to answer this question, report a formula and a value for your test statistic, and clearly indicate how you reached your conclusion. Use an α = .05 significance level, and state your conclusion in the context of this study.
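Null hypothesis: mean Reading score for women = mean Reading score for men. Alternative hypothesis: mean Reading score for women > mean Reading score for men.
t-statistic = 4.38 with p-value < .0001.
The p-value is well below the 0.05 significance level, so we reject the null hypothesis. There is enough evidence to conclude that, among applicants like those sampled, women score higher on average on the Reading test than men.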
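The question asks for a formula as well as a value. As a minimal sketch, assuming the unpooled (Welch) form of the two-sample t statistic, which the fractional DF in the JMP output for part (b) suggests was used, the statistic is the difference in sample means divided by the square root of the sum of each sample variance over its sample size. The function name and the small data lists below are hypothetical and serve only to illustrate the calculation; they are not the exam data.

```python
import math
from statistics import mean, variance

def welch_t(sample_1, sample_2):
    """Return (t, df) for the unpooled two-sample t statistic, sample_1 minus sample_2."""
    n1, n2 = len(sample_1), len(sample_2)
    # Squared standard error contributed by each group: s^2 / n
    v1 = variance(sample_1) / n1
    v2 = variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical scores for illustration only; these are NOT the exam data.
women_reading = [52, 55, 61, 58, 49, 63]
men_reading = [50, 47, 56, 53, 45, 51]
t_stat, df = welch_t(women_reading, men_reading)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

With the real data, the two lists would be the Reading scores for Gender = 1 and Gender = 2, and a positive t would favor the one-sided alternative that women score higher.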
(b) Construct a 99 percent confidence interval for the difference between mean Mathematics performance for women and mean Mathematics performance for men. Interpret your confidence interval in the context of the employment competence tests. Using an α = .01 significance level, would you conclude that there is a significant difference between mean Mathematics performance for women and mean Mathematics performance for men? Justify your answer.
Statistic | Value
Difference | -1.0050
Std Err Dif | 0.6047
Upper CL Dif | 0.5601
Lower CL Dif | -2.5701
Confidence | 0.99
t Ratio | -1.662
DF | 397.1905
Prob > |t| | 0.0973
Prob > t | 0.9513
Prob < t | 0.0487*
99% C.I. for difference between mean mathematics performance for women and men = (-2.57, 0.56)
We are 99% confident that the difference in mean mathematics performance between women and men falls within the interval (-2.57, 0.56). Because this interval contains zero, we fail to reject the null hypothesis at the 0.01 significance level: the data do not show a significant difference between the mean mathematics performance of women and men. The two-sided p-value of 0.0973, which exceeds 0.01, leads to the same conclusion.
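The reported JMP values can be checked against one another with a few lines of arithmetic. The sketch below uses only the numbers shown in the output above; the variable names are my own. It confirms that the t ratio is the difference divided by its standard error, that the interval is centered on the difference, and it backs out the critical value implied by the interval's half-width, which should sit slightly above the normal critical value because a t distribution with about 397 degrees of freedom was used.

```python
from statistics import NormalDist

# Values copied from the JMP output above
diff = -1.0050
se = 0.6047
lower, upper = -2.5701, 0.5601
p_two_sided, p_lower_tail = 0.0973, 0.0487

t_ratio = diff / se                    # should reproduce the reported t Ratio
midpoint = (lower + upper) / 2         # interval should be centered on the difference
half_width = (upper - lower) / 2
implied_crit = half_width / se         # critical value implied by the reported interval
z_crit = NormalDist().inv_cdf(0.995)   # normal benchmark for a 99% two-sided interval

print(f"t ratio = {t_ratio:.3f}")
print(f"midpoint = {midpoint:.4f}")
print(f"implied critical value = {implied_crit:.4f}, normal benchmark = {z_crit:.4f}")
print(f"2 x lower-tail p = {2 * p_lower_tail:.4f} vs reported two-sided p = {p_two_sided}")
```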
(c) For parts 1(a)-1(b) of this question, state the conditions that must be checked to validate the use of the statistical methods that you employed. What evidence can you provide regarding whether these conditions are satisfied?
In parts 1(a) and 1(b) I used the two-sample t-test for a difference in means.
Randomization condition: This has been met. The data come from representative random samples of men and women who applied to a large U.S. corporation.
10% condition: It is reasonable to assume that 200 men and 200 women are less than 10% of the population of men and women who applied to work at a large U.S. corporation in 2012.
Nearly normal condition: This condition has been met by both groups, as can be seen in the normal quantile plots. The plots show some stacking of scores outside of the boundaries, but the distributions are close enough to meet the nearly normal condition, and with 200 observations per group the t procedures are robust to mild departures from normality.
Independent groups condition: This condition has been met, because the scores of the men and the scores of the women do not depend on one another.
Question 2.
The data file, exam2_student_survey.csv, consists of responses of a random sample of 60 students who earned undergraduate degrees from the social science programs at a large public university located in the southern United States. The headings at the top of the columns refer to the following variables:
- subject = subject identification number
- gender = gender (female, male)
- age = age in years
- hsgpa = high school GPA (on a four-point scale)
- cogpa = college GPA (on a four-point scale)
- hiv = number of people you know who have died from AIDS or who are HIV-positive
- pa = political affiliation (D = Democrat, R = Republican, I = independent)
- pi = political ideology (1 = very liberal, 2 = liberal, 3 = slightly liberal, 4 = moderate, 5 = slightly conservative, 6 = conservative, 7 = very conservative)
- re = how often you attend religious services (0 = never, 1 = occasionally, 2 = most weeks, 3 = every week)
- ab = opinion about whether abortion should be legal in the first three months of pregnancy (yes, no)
- aa = support affirmative action (yes, no)
(a) Do students tend to have higher grade point averages in college than they did in high school? Explain how you arrived at your conclusion, and indicate what evidence you used to determine whether any necessary assumptions might be violated.
(b) Is there evidence of a linear relationship between high school grade point average and college grade point average? If so, is it a strong relationship? Explain carefully how you arrived at your conclusion, making full use of available information provided by the output.
(c) Find the least squares estimate of the regression line for predicting college GPA from high school GPA. Construct and interpret a 95 percent confidence interval for the slope of the population regression line for all students who have earned undergraduate degrees from the social science programs at this university.
(d) Comment on how well conditions for using a linear regression model appear to be satisfied. You may use residual plots to help with this assessment.
(e) Construct a 90 percent prediction interval for the undergraduate college GPA for a student with a high school GPA of 3.25. Give an interpretation of this interval.
(f) Is there a significant association between attitudes regarding abortion and support for affirmative action? In your answer, be sure to clearly state the null hypothesis and the alternate hypothesis, describe how your test statistic is calculated, and check conditions for using your testing procedure. If you cannot demonstrate that all conditions are satisfied, state your concerns but proceed with your test and report the value of the test statistic, its degrees of freedom, and the corresponding p-value. Using a 0.05 level of significance, clearly state your conclusion in the context of the study. If you find a significant association, describe it.
Question 3.
A June 8, 2015 ABC/Washington Post Poll, "Poll Marks a Love/Hate View of the Affordable Care Act," summarized the results of a survey of US adults on support for the Affordable Care Act. A pdf with full results, charts, and tables is available at: https://www.langerresearch.com/uploads/1169a3ACA.pdf and has been posted with this exam.
The story reads in part as:
Public support for Obamacare tied its all-time low in the latest ABC News/Washington Post poll - even as most Americans say the Supreme Court should not block federal subsidies at the heart of the health care law.
With the high court set to rule on the latest challenge to the ACA, the poll reflects the public's split view of the law - criticism of its insurance mandate, yet support for extended coverage.
Overall, just 39 percent support the law, down 10 percentage points in a little more than a year to match the record low from three years ago as the Supreme Court debated the constitutionality of the individual mandate. A majority, 54 percent, opposes Obamacare, a scant 3 points shy of the high in late 2013 after the botched rollout of healthcare.gov.
METHODOLOGY - This ABC News/Washington Post poll was conducted by landline and cellphone telephone May 28-31, 2015, in English and Spanish, among a random national sample of 1,001 adults. Results have a margin of sampling error of 3.5 points for the full sample, including design effects. Partisan divisions are 30-22-36, Democrats-Republicans-Independents.
Additional details from the poll, including the following group breakdowns, can be found at:
https://www.washingtonpost.com/page/2010-2019/WashingtonPost/2015/06/08/National-Politics/Polling/release_398.xml
Overall, do you support the federal law that made changes to the health care system?
All 39%, Democrats 64%, Republicans 19%, Independents 35%
Overall, do you oppose the federal law that made changes to the health care system?
All 54%, Democrats 30%, Republicans 78%, Independents 56%
No Opinion
All 7%, Democrats 6%, Republicans 3%, Independents 9%
Use this information and any other information in the article to answer the following questions:
(a) Construct 95% confidence intervals for the percent of all US adults who support the healthcare law, the percent of Democrats who support the healthcare law, Republicans who support the healthcare law, and Independents who support the healthcare law. What assumptions are necessary for these results to be valid?
(b) Conduct an appropriate statistical test to see whether there are any differences in the percent of the populations of Democrats, Republicans, and Independents who oppose the healthcare law. Clearly state your null and alternative hypotheses. Describe how the value of the test statistic is calculated, check conditions, and report the value of your test statistic, its degrees of freedom and the corresponding p-value. Interpret your results in the context of this study.
(c) With respect to estimating the proportion of all US adults who support the healthcare law, check to see if the reported margin of sampling error of 3.5 percentage points for all respondents is correct. Also, calculate the margin of sampling error for Democrats, Republicans, and Independents. Show all necessary calculations and explain what you conclude from these findings. In constructing your answer, feel free to use the information available at https://abcnews.go.com/PollingUnit/story?id=5984818&page=1#.UU9KJzdTk2g, which is the explanation of sampling error provided by ABC News.
(d) ABC News is planning to do another survey in August, 2015, as part of their ongoing effort to track the current percentage of US adults who support the healthcare law. They want to obtain an estimate with a margin of sampling error that is no larger than 1.5 percentage points. How many adults do they need to include in their survey? Show how you arrived at your answer.
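Parts (c) and (d) both rest on the usual large-sample margin of error for a proportion, z * sqrt(deff * p(1 - p) / n). The sketch below is one way to set up that arithmetic, not an official solution. It assumes the conservative value p = 0.5, treats the design effect (deff) as a simple multiplier on the variance, and backs out the design effect that would reproduce the reported 3.5 points. The function names are hypothetical. Subgroup sample sizes could be approximated from the 30-22-36 partisan split of the 1,001 respondents and passed to the same function.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def margin_of_error(n, p=0.5, deff=1.0):
    """Large-sample 95% margin of error for a proportion (as a fraction, not percent)."""
    return Z95 * math.sqrt(deff * p * (1 - p) / n)

def required_n(margin, p=0.5, deff=1.0):
    """Smallest n whose 95% margin of error does not exceed `margin`."""
    return math.ceil(deff * p * (1 - p) * (Z95 / margin) ** 2)

# Full sample, treated as a simple random sample (no design effect)
srs_moe = margin_of_error(1001)

# Design effect that would reproduce the reported 3.5 points (an inferred value,
# not one stated in the article)
implied_deff = (0.035 / srs_moe) ** 2

# Sample size for a 1.5-point margin, without and with that design effect
n_srs = required_n(0.015)
n_deff = required_n(0.015, deff=implied_deff)

print(f"SRS margin for n = 1001: {100 * srs_moe:.2f} points")
print(f"implied design effect: {implied_deff:.3f}")
print(f"n for 1.5 points: {n_srs} (SRS), {n_deff} (with implied design effect)")
```

Whether to plan with p = 0.5 or with the observed 39 percent, and whether to carry the design effect forward, are judgment calls that the written answer should state explicitly.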
Question 4.
A study published in the New England Journal of Medicine examined the relationship between the presence of schizophrenia and the volume of the left hippocampus region of the brain. To do this, the researchers identified 15 pairs of monozygotic twins from the U.S. and Canada for which one twin was affected by schizophrenia and the other twin was not affected by schizophrenia. The researchers used magnetic resonance imaging to measure the volume (in cm³) of the left hippocampus region of the brain for every person in the study. The data are displayed in the following table.
Volume (cm³) of Left Hippocampus Region of the Brain
Twin Pair ID Number | Affected by Schizophrenia | Unaffected by Schizophrenia
1 | 1.94 | 1.27
2 | 1.44 | 1.63
3 | 1.56 | 1.47
4 | 1.58 | 1.39
5 | 2.06 | 1.93
6 | 1.66 | 1.26
7 | 1.75 | 1.71
8 | 1.77 | 1.67
9 | 1.78 | 1.28
10 | 1.92 | 1.85
11 | 1.25 | 1.02
12 | 1.93 | 1.34
13 | 2.04 | 2.02
14 | 1.62 | 1.59
15 | 2.08 | 1.97
These data have also been posted in the file exam2_schizophrenia.csv in the Data Folder of the course Blackboard page and in the Exam folder where this exam is posted.
(a) Is this an example of an experiment or an observational study? Explain.
(b) If the information is provided, identify the Who, What, When, Where, Why and hoW.
(c) Use the data to determine if the mean volume of the left hippocampus region of the brain is different for those affected with schizophrenia and those unaffected by schizophrenia. Clearly state your null and alternative hypotheses, check conditions for inference, report the value for your test statistic and a corresponding p-value, and clearly state your conclusion. Use an significance level.
(d) Comment on the advantages and disadvantages of using twins in this study.
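Because each affected twin is matched with an unaffected co-twin, one natural approach to part (c) is a paired t-test on the within-pair differences. The sketch below assumes that approach and uses the volumes from the table above. The variable names are my own, and the differences are taken as affected minus unaffected.

```python
import math
from statistics import mean, stdev

# Left hippocampus volumes (cm^3) from the table, in twin-pair order 1..15
affected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77,
            1.78, 1.92, 1.25, 1.93, 2.04, 1.62, 2.08]
unaffected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67,
              1.28, 1.85, 1.02, 1.34, 2.02, 1.59, 1.97]

# Within-pair differences: affected minus unaffected
diffs = [a - u for a, u in zip(affected, unaffected)]

n = len(diffs)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se_diff            # paired t statistic
df = n - 1

print(f"mean difference = {mean_diff:.4f} cm^3")
print(f"standard error  = {se_diff:.4f}")
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The t statistic would then be compared with a t distribution on n - 1 degrees of freedom to obtain a two-sided p-value. The nearly normal condition applies to the 15 differences, not to the two columns separately.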