Summary Statistics and Hypothesis Testing Problem Set
Useful Commands for this Problem Set
The underlined portion of each command indicates the abbreviation that can be used for that command. You should also refer to the guide from the Stata tutorial as you work through this problem set.
codebook: Provides a summary of a variable, including the mean, standard deviation, number of unique values, etc.
summarize: Provides summary statistics for a variable.
ttest: Performs a t-test of the mean.
Part I - Note: Part I does not require the use of Stata.
Variable   | Definition
-----------+------------------------------------------------------------
Hurricaned | Coded 1 if a FL county was hit by a hurricane, 0 otherwise.
Bush04     | The % of the vote for Bush in the 2004 presidential election in FL counties (among the 2 major parties).
. sum bush04

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      bush04 |         67          .6    .1079281         .3        .78
1. Using the Stata output presented above, provide the following summary statistics. If you perform a calculation using the statistics provided in the Stata output, please show your work.
- What is the mean of the bush04 variable ($\bar{X}$)?
- What is the variance of the bush04 variable ($s_X^2$)?
- What share of the vote did Bush win in the county where his vote share was highest?
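As a reminder for the variance item, the output reports the standard deviation rather than the variance. The relation below is the standard definition, written with $n$ for the number of observations (Obs) and $X_i$ for the bush04 value in county $i$; these symbols are introduced here for illustration and do not appear in the output.

```latex
s_X^2 = \left(s_X\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2
```

In other words, squaring the value in the Std. Dev. column gives the sample variance.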
2. Research by political scientists Larry Bartels and Chris Achen suggests that voting can be less rational than previously thought. They examine whether voters punish incumbents for events that could not in any way be the fault of the incumbent or the incumbent's party. For instance, they studied the effect of the 1916 shark attacks in New Jersey on the vote for Wilson in that year's presidential election. Their findings suggested that the shark attacks had a negative impact on the vote for Wilson: the counties that experienced the attacks failed to support the incumbent in that election. Voters appeared to blame the incumbent for something entirely out of his control. Bartels and Achen examined this hypothesis in other contexts as well, noting that voters tend to blame incumbents for droughts and other natural events outside the government's influence.
Do voters blame incumbents for situations the government has no control over? To test this hypothesis, this question focuses on the impact of hurricanes on the 2004 presidential election in Florida at the county level. That year, Florida experienced four back-to-back hurricanes in the weeks and months before the election. Did voters blame the incumbent president, Bush? The dependent variable is Bush's share of the two-party vote in the 2004 presidential election in Florida.
The variable Hurricaned equals 1 if a FL county was hit by a hurricane (hurricane-hit) and 0 otherwise (no-hurricane). Notice that 4 counties were not hit by a hurricane.
- What is the mean bush04 vote share among counties that were not hit by a hurricane ($\bar{X}_{\text{no-hurricane}}$)?
- What is the mean bush04 vote share among hurricane-hit counties ($\bar{X}_{\text{hurricane-hit}}$)?
- What is the difference between these means ($\bar{X}_{\text{no-hurricane}} - \bar{X}_{\text{hurricane-hit}}$)?
- What is the standard error of the difference between the means?
- State the null hypothesis and the alternative hypothesis for testing whether the mean vote for Bush in '04 in counties that were not hit by hurricanes is the same as the mean vote for Bush in hurricane-hit counties.
- Calculate the t-statistic you would use to test the null hypothesis.
- Test the null hypothesis at the 5% significance level. Do you reject or fail to reject the null hypothesis?
- Calculate a 95% confidence interval for the difference between the means (show your work).
- Does this evidence support the argument that voters punish the incumbent for situations outside their control (e.g. natural disasters)? In your explanation, consider whether this analysis excludes any important variables.
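The quantities requested above fit together as follows. This is a sketch of the pooled (equal-variance) two-sample t-test, which is what Stata's ttest command with the by() option reports unless the unequal option is specified; whether the course expects the pooled or the unequal-variance form is an assumption to check against the Stata tutorial guide. The subscripts $n$ (no-hurricane) and $h$ (hurricane-hit) and the symbols $s_p$ and $t^{*}$ are shorthand introduced here.

```latex
\begin{align*}
s_p^2 &= \frac{(n_n - 1)\,s_n^2 + (n_h - 1)\,s_h^2}{n_n + n_h - 2} \\
SE\left(\bar{X}_n - \bar{X}_h\right) &= s_p \sqrt{\frac{1}{n_n} + \frac{1}{n_h}} \\
t &= \frac{\left(\bar{X}_n - \bar{X}_h\right) - 0}{SE\left(\bar{X}_n - \bar{X}_h\right)},
  \qquad df = n_n + n_h - 2 \\
95\%\ \text{CI} &= \left(\bar{X}_n - \bar{X}_h\right) \pm t^{*}_{0.025,\,df} \cdot SE\left(\bar{X}_n - \bar{X}_h\right)
\end{align*}
```

Under the null hypothesis the difference in means is 0, which is why 0 is subtracted in the numerator. The null is rejected at the 5% level when $|t|$ exceeds the critical value $t^{*}_{0.025,\,df}$, which is equivalent to the 95% confidence interval excluding 0.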
Part II - In this section you will use Stata to conduct an analysis on the relationship between having a college education and support for Trump in the 2016 presidential election.
This past year, it was widely reported that Trump's favorability was highest among those without a college education. Do education levels explain Trump's popularity in the recent election? Using the cnn2016election.dta dataset, determine whether there is a difference in support for Trump among those with and without a college education.
3. To accomplish this task, you will first need to create a new variable for college education with two categories, where 1 contains all individuals with at least some college education, and 0 contains all those without any college education. Name the new variable "colleduc."
- State the null and alternative hypotheses for this test.
- What is the t-statistic? Show how this t-statistic was calculated.
- What is the result of the hypothesis test?
- Do you believe this analysis provides a satisfactory answer to the question of whether education levels explain support for Trump in '16? Why or why not?
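The recoding and test described in question 3 could be sketched in Stata as follows. The variable names educ and trump, and the value codes used for educ, are hypothetical placeholders; the actual names and codes in cnn2016election.dta should be checked first with describe and codebook.

```stata
* Sketch only: educ and trump are hypothetical variable names.
use cnn2016election.dta, clear
describe
codebook educ

* Hypothetical coding: 1-2 = no college, 3 or higher = at least some college.
generate colleduc = .
replace colleduc = 0 if inlist(educ, 1, 2)
replace colleduc = 1 if educ >= 3 & !missing(educ)
label variable colleduc "At least some college education"

* Check the recode against the original variable, including missing values.
tabulate educ colleduc, missing

* Two-sample t-test of mean Trump support by college education.
ttest trump, by(colleduc)
```

The !missing(educ) condition matters because Stata treats missing numeric values as larger than any number, so educ >= 3 alone would code missing respondents as college educated.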
4. In Stata, repeat the t-test above but, instead of comparing college-educated and non-college-educated respondents, compare males and females. To do this, create a new variable named "gender" (from the variable "sex"), where male equals 0 and female equals 1.
- What is the t-statistic reported in the Stata output?
- Is there a statistically significant difference in support for Trump between men and women?
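A similar sketch applies to question 4. It assumes that sex is stored as a numeric variable coded 1 for male and 2 for female, and it reuses the hypothetical outcome name trump; both assumptions should be verified with codebook before running the recode.

```stata
* Sketch only: assumes sex is numeric with 1 = male and 2 = female.
codebook sex

generate gender = .
replace gender = 0 if sex == 1
replace gender = 1 if sex == 2
label define genderlbl 0 "Male" 1 "Female"
label values gender genderlbl

* Confirm the recode, then compare mean Trump support by gender.
tabulate sex gender, missing
ttest trump, by(gender)
```

If sex turns out to be a string variable, the conditions would need to compare against the stored text instead (for example, sex == "Male").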