Assignment: Comparing Means From Two or Fewer Groups
This assignment requires that you extract data from the course database using MySQL Workbench and evaluate the data for analysis using RStudio. In the space below each problem, include the MySQL scripts or R scripts and the respective output that was generated to solve the problem.
You previously conducted an analysis to investigate nursing home clinical outcomes. In that analysis you found that the average rate of high risk long stay residents with pressure ulcers in MN nursing homes was lower than the national benchmark. You also found no difference in the average percentages between MN and WI. To determine whether nursing homes in MN show similar results for short stay residents, you have been tasked with conducting a similar analysis on this resident population. Specifically, you are to determine whether the average percentage of short stay residents with pressure ulcers that are new or worsened in MN differs from the national benchmark. You were also asked to evaluate whether differences exist between MN and WI.
1. Use the course database to extract the following attributes from the respective tables. Limit the results to rows where msr_cd is equal to the measure for the percent of short stay residents with pressure ulcers that are new or worsened. For your answer to this problem, show the MySQL script you used for your query.
Table Name | Attribute name
nursing_home_provider_info | provnum
nursing_home_provider_info | state
nursing_home_quality_measures | msr_descr
nursing_home_quality_measures | measure_score_3qtr_avg (rename this column to avg_value)
HINT: Remember to join the nursing_home_provider_info and nursing_home_quality_measures tables based on the attribute that exists in both tables! Also, don't forget the WHERE clause! You should have 10,130 rows returned!
2. Export the results of the MySQL query to a CSV file and save the file as shortstay.csv. Import this file into RStudio and attach the data. Next, create a new dataframe called mnwidata that limits the shortstay dataset to observations where the state is equal to MN or WI. Attach the mnwidata dataframe.
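The import-and-subset step in problem 2 might look roughly like the sketch below. So that it runs on its own, it first writes a small made-up file to a temporary folder; the rows, the `toy` data frame, and `csv_path` are hypothetical stand-ins, and with the real export you would point `read.csv()` at your own shortstay.csv instead.

```r
# Hypothetical stand-in for the MySQL export; every value here is made up.
toy <- data.frame(
  provnum   = c("P001", "P002", "P003", "P004", "P005"),
  state     = c("MN", "WI", "IA", "MN", "WI"),
  msr_descr = "placeholder measure description",
  avg_value = c(0.8, 1.9, 1.2, 0.4, 2.3)
)
csv_path <- file.path(tempdir(), "shortstay.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Import the CSV, then keep only the MN and WI observations.
shortstay <- read.csv(csv_path, stringsAsFactors = FALSE)
mnwidata  <- subset(shortstay, state %in% c("MN", "WI"))

# attach() lets columns be referenced by bare name, as the problem asks.
attach(mnwidata)
state_counts <- table(state)
print(state_counts)
detach(mnwidata)
```

If shortstay is still attached when mnwidata is attached, both contain columns named state and avg_value, and the most recently attached data frame takes precedence. Detaching shortstay first avoids that ambiguity.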
3. Explain the distribution of the avg_value. Demonstrate the method that you used to determine the distribution of the data.
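One possible way to examine a distribution for problem 3 is to combine summary statistics, a histogram, a normal Q-Q plot, and a Shapiro-Wilk test. The sketch below uses simulated right-skewed values as a hypothetical stand-in for avg_value; it does not say anything about what the real data look like.

```r
# Simulated, right-skewed stand-in for avg_value (not real course data).
set.seed(1)
avg_value <- rlnorm(200, meanlog = 0, sdlog = 0.8)

# A mean well above the median hints at right skew.
print(summary(avg_value))

# Visual checks: histogram and normal Q-Q plot.
hist(avg_value,
     main = "Distribution of avg_value (simulated)",
     xlab = "Percent of short stay residents with pressure ulcers")
qqnorm(avg_value)
qqline(avg_value)

# Shapiro-Wilk test: a small p-value suggests departure from normality.
sw <- shapiro.test(avg_value)
print(sw)
```

Note that `shapiro.test()` only accepts between 3 and 5000 values, so it would need a subset (such as the MN and WI rows) rather than all 10,130 rows.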
4. Explain how the distribution of the data can impact the results of a t-test. If necessary, demonstrate the method you used to ensure the data are normally distributed.
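If the values turn out to be right-skewed, one common remedy is a log transformation; whether it is needed for the real data is for you to judge. The sketch below again uses simulated values as a hypothetical stand-in and compares the Shapiro-Wilk W statistic before and after the transformation (W closer to 1 indicates a closer match to a normal distribution).

```r
# Simulated, right-skewed stand-in for avg_value (not real course data).
set.seed(1)
avg_value <- rlnorm(200, meanlog = 0, sdlog = 0.8)

# Natural-log transform. log(0) is -Inf, so any zero values in real data
# would need to be handled before this step.
log_value <- log(avg_value)

# Compare the Shapiro-Wilk W statistic on the raw and the log scale.
w_raw <- unname(shapiro.test(avg_value)$statistic)
w_log <- unname(shapiro.test(log_value)$statistic)
print(c(raw = w_raw, log = w_log))

hist(log_value,
     main = "Distribution of log(avg_value) (simulated)",
     xlab = "log(avg_value)")
```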
5. Specify a null and alternative hypothesis to determine if there is a difference in the average percentage of short stay residents with pressure ulcers that are new or worsened in MN nursing homes compared to the national benchmark of 1.5%.
6. Use a statistical test to determine if the average percent of short stay residents with pressure ulcers that are new or worsened in MN nursing homes is different from the national benchmark of 1.5%.
a. Include the R scripts and R output from your statistical test.
b. Offer an interpretation of the p-value and specify whether you should reject or fail to reject the null hypothesis.
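A one-sample t-test against the 1.5% benchmark could be run along the lines below. The `mn_value` vector is simulated and purely hypothetical, and the 0.05 significance level is an assumption, since the assignment does not state one.

```r
# Simulated stand-in for the MN avg_value column (not real course data).
set.seed(2)
mn_value <- rlnorm(60, meanlog = log(1.0), sdlog = 0.5)

# Two-sided one-sample t-test of H0: mean = 1.5.
# If log-transformed values were used instead, the comparison value would be
# log(1.5), and the test would then concern the mean of the logs.
res <- t.test(mn_value, mu = 1.5)
print(res)

alpha <- 0.05  # assumed significance level
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
print(decision)
```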
7. Specify a null and alternative hypothesis to determine if there is a difference in the average percentage of short stay residents with pressure ulcers that are new or worsened in MN nursing homes compared to WI nursing homes.
8. Use a statistical test to determine if the average percent of short stay residents with pressure ulcers that are new or worsened in MN nursing homes is different from that in WI nursing homes.
a. Include the R scripts and R output from your statistical test.
b. Offer an interpretation of the p-value and specify whether you should reject or fail to reject the null hypothesis.
c. Specify the mean percentage (not the log of the mean) of short stay residents with pressure ulcers that are new or worsened for MN and WI nursing homes.
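For the MN versus WI comparison, a two-sample t-test with the formula interface is one option. The sketch below assumes the test is run on log-transformed values and that the group means are then reported on the original percentage scale; the data frame is simulated and hypothetical.

```r
# Simulated stand-in for mnwidata (not real course data).
set.seed(3)
mnwidata <- data.frame(
  state     = rep(c("MN", "WI"), each = 50),
  avg_value = c(rlnorm(50, log(1.0), 0.5), rlnorm(50, log(1.2), 0.5))
)
mnwidata$log_value <- log(mnwidata$avg_value)

# Two-sample t-test on the log scale; t.test() uses the Welch
# (unequal-variance) version unless var.equal = TRUE is given.
res <- t.test(log_value ~ state, data = mnwidata)
print(res)

# Means on the original percentage scale, one per state.
group_means <- tapply(mnwidata$avg_value, mnwidata$state, mean)
print(group_means)
```

The estimates printed by `t.test()` here are means of the logged values, which is why the per-state means are computed separately from the untransformed column.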
9. Create a boxplot to compare the avg_value between MN and WI. Include axis labels and a title.
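A base R boxplot with a title and axis labels might be sketched as follows. The data are simulated and hypothetical, and the label wording is only a suggestion.

```r
# Simulated stand-in for mnwidata (not real course data).
set.seed(4)
mnwidata <- data.frame(
  state     = rep(c("MN", "WI"), each = 50),
  avg_value = c(rlnorm(50, log(1.0), 0.5), rlnorm(50, log(1.2), 0.5))
)

# One box per state; boxplot() invisibly returns the plotted statistics.
bp <- boxplot(avg_value ~ state, data = mnwidata,
              main = "Short stay pressure ulcers: MN vs. WI",
              xlab = "State",
              ylab = "Percent of residents (avg_value)")
```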
10. Create a barplot with error bars to compare the avg_value between MN and WI. Include axis labels and a title.
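Base R has no built-in error-bar function, so one common approach is to draw the bars with `barplot()` and add the error bars with `arrows()`. The sketch below assumes the bars extend one standard error above and below each mean, which the assignment does not specify; the data are simulated and hypothetical.

```r
# Simulated stand-in for mnwidata (not real course data).
set.seed(5)
mnwidata <- data.frame(
  state     = rep(c("MN", "WI"), each = 50),
  avg_value = c(rlnorm(50, log(1.0), 0.5), rlnorm(50, log(1.2), 0.5))
)

# Per-state mean and standard error of the mean.
means <- tapply(mnwidata$avg_value, mnwidata$state, mean)
ses   <- tapply(mnwidata$avg_value, mnwidata$state,
                function(x) sd(x) / sqrt(length(x)))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(means,
                ylim = c(0, max(means + ses) * 1.1),
                main = "Mean avg_value by state (error bars: +/- 1 SE)",
                xlab = "State",
                ylab = "Mean percent of residents (avg_value)")

# Flat-ended arrows in both directions serve as the error bars.
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.1)
```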