Assignment - Probabilities and Power Analysis
PART ONE: PROBABILITIES
Suppose that 30 persons who ate lunch at a company picnic developed salmonella infections (i.e. food poisoning). We are interested in finding out what food at the picnic was associated with becoming ill. To investigate this, we ask the 30 individuals who became ill (the cases) and another 80 persons who did not become ill (the controls) what they ate. Of note, eating a particular food is considered an "event". We recorded the food items offered that we thought were most likely to be the source of the infection. This is not a collectively exhaustive list of all food items, just those foods we considered highly suspect. Also, some people may have consumed more than one food item, so not all of the events are mutually exclusive.
Tables 1.1 (page 2) and 1.2 (page 3) show the results of our data collection for cases and controls, respectively. Questions 1-3 ask you to calculate probabilities based on the food consumed by cases, while Questions 4-6 ask you to calculate probabilities based on the food consumed by controls.
Refer to Chapter 4 in Dawson and Trapp (2004) for definitions and rules related to probabilities. In particular, you will need to refer to the Addition and Multiplication rules for calculating the probability of a combination of events. Look for cue words in the questions to help you decide which rule to apply. As a final note, pay close attention to which food items each question refers to when calculating the probability. Not all questions are the same.
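As a quick reminder of how the two rules work, here is a minimal Python sketch. The function names and the probabilities are hypothetical and are not taken from Tables 1.1 or 1.2; use the definitions in Dawson and Trapp (2004) as the authority.

```python
def prob_either(p_a, p_b, p_both=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events P(A and B) is 0, so the rule
    reduces to P(A) + P(B).
    """
    return p_a + p_b - p_both


def prob_both(p_a, p_b_given_a):
    """Multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


# Hypothetical probabilities for illustration only.
p_soup = 0.40              # P(ate soup)
p_salad = 0.25             # P(ate salad); assume no one ate both soup and salad
p_bread_given_soup = 0.50  # P(ate bread | ate soup)

# "Either ... or" with mutually exclusive events: addition rule.
either = round(prob_either(p_soup, p_salad), 2)

# "Both ... and" with a conditional probability: multiplication rule.
both = round(prob_both(p_soup, p_bread_given_soup), 2)

print(either, both)
```

Cue words such as "either/or" usually point to the Addition rule, while "both/and" together with a conditional probability usually points to the Multiplication rule.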
1) Calculate the probability of eating each food for those who became ill (i.e. the 30 cases). In Table 1.1 (below), report the probabilities (as proportions not percentages) and show 2 decimal places.
Table 1.1: Probabilities of eating different foods for cases.
Food Item | Number of Cases who ate that food | Proportion (or probability) who ate that food |
Chicken | 18 | |
Hamburger | 10 | |
Hot Dog | 12 | |
Chicken Salad | 15 | |
Seafood Salad | 8 | |
Potato Salad | 16 | |
Pasta Salad | 8 | |
Deviled Eggs | 10 | |
Coleslaw | 6 | |
* Eating a particular food is considered an event
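The proportion for each food is the number of people in the group who ate it divided by the total number of people in that group. A small sketch with made-up counts (a hypothetical group of 40 people, not the data in this assignment) shows the arithmetic and the two-decimal formatting:

```python
# Hypothetical counts for a group of 40 people; not the data in this assignment.
group_size = 40
counts = {"Soup": 16, "Bread": 10, "Fruit": 6}

# Proportion = number who ate the food / number of people in the group,
# reported as a proportion (not a percentage).
proportions = {food: count / group_size for food, count in counts.items()}

# Show each proportion with 2 decimal places.
for food, p in proportions.items():
    print(f"{food}: {p:.2f}")
```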
Use the results from Table 1.1 to help you calculate the following probabilities for cases. Report proportions out to two decimal places.
2) What is the probability that a case ate either a seafood salad or chicken salad if no one ate both?
3) If the conditional probability of eating pasta salad given that someone ate chicken is equal to 0.20 for cases, what is the probability that a case ate both pasta salad and chicken?
4) Calculate the probability of eating each food for those who did not become ill (i.e. the 80 controls). In Table 1.2 (below), report the probabilities (as proportions not percentages) and show 2 decimal places.
Table 1.2: Probabilities of eating different foods for controls.
Food Item | Number of Controls who ate that food | Proportion (or probability) who ate that food |
Chicken | 20 | |
Hamburger | 30 | |
Hot Dog | 30 | |
Chicken Salad | 16 | |
Seafood Salad | 20 | |
Potato Salad | 16 | |
Pasta Salad | 22 | |
Deviled Eggs | 27 | |
Coleslaw | 16 | |
* Eating a particular food is considered an event
Use the results from Table 1.2 to help you calculate the following probabilities for controls. Report proportions out to two decimal places.
5) What is the probability that a control ate either a hamburger or hot dog if no one ate both?
6) If the conditional probability of eating potato salad given that someone ate a hamburger is equal to 0.25 for controls, what is the probability that a control ate both potato salad and a hamburger?
PART TWO: POWER ANALYSIS
Nutritionists at George Washington University want to compare two different diets for a group of diabetic patients. Investigators plan to test the null hypothesis that the mean difference in blood glucose (mg/dL) for patients following Diet 1 will be the same as that for patients following Diet 2. The research hypothesis states that the mean difference in blood glucose will be different between the two diet groups. Investigators plan to draw their random sample of diabetic patients from the Washington DC area. Recruited patients will be randomly assigned to one of the two diets. A fasting blood glucose test will be conducted on each patient at the beginning of the study and again 8 weeks later.
The biostatistician on the project wants to conduct a power analysis to determine the sample size needed to detect group differences of 6 to 10 mg/dL. The standard deviation of the blood glucose distribution for Diet Group 1 is reported to be 11.5 mg/dL; the standard deviation of the blood glucose distribution for Diet Group 2 is reported to be 8.8 mg/dL. The biostatistician wants to estimate the number of subjects needed in each group (assuming equal-sized groups) and decides to run an analysis for a two-sample t-test at a significance level of 0.05 for a two-tailed test. In order to create a thorough recommendation for the study team, the analysis is run at four levels of power (80%, 85%, 90% and 95%).
Follow the instructions provided on Blackboard to complete Questions 1 and 2. A link to the online power calculator is also available on Blackboard.
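For orientation, one common definition of Cohen's d for two equal-sized groups divides the difference between the group means by a pooled standard deviation. The sketch below assumes that definition and uses hypothetical inputs; the function names are illustrative, and the Blackboard instructions or the online calculator may define the effect size differently, in which case follow those.

```python
import math


def pooled_sd(sd1, sd2):
    """Pooled SD for two equal-sized groups: sqrt((sd1^2 + sd2^2) / 2)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d: absolute difference in means divided by the pooled SD."""
    return abs(mean1 - mean2) / pooled_sd(sd1, sd2)


# Hypothetical inputs for illustration; not the values in this assignment.
d = cohens_d(mean1=5.0, mean2=12.0, sd1=10.0, sd2=14.0)
print(round(d, 2))
```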
1) Calculate the effect sizes and fill in the results in Table 2.1 (below) accordingly.
Table 2.1: Effect size calculations for two-group comparison
Mean Difference in Blood Glucose for Diet Group 1 (SD = 11.5 mg/dL) | Mean Difference in Blood Glucose for Diet Group 2 (SD = 8.8 mg/dL) | Effect Size (Cohen's d) |
4 mg/dL | 10 mg/dL | |
3 mg/dL | 10 mg/dL | |
2 mg/dL | 10 mg/dL | |
1 mg/dL | 10 mg/dL | |
0 mg/dL | 10 mg/dL | |
2) Carry over the effect sizes that you calculated in Table 2.1 and put them in the first column of Table 2.2 (below). Calculate the sample size estimations for each effect size and level of statistical power. Fill in the results accordingly.
Table 2.2: Sample size estimations for two-group comparison*
| Statistical Power |
Effect Size | 80% | 85% | 90% | 95% |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
* Numbers within each cell represent the sample size per group
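To see roughly how effect size and power drive the per-group sample size, the sketch below uses the standard normal approximation n = 2 (z_(1 - alpha/2) + z_power)^2 / d^2 for a two-tailed, two-sample comparison. This is an approximation under stated assumptions, not the method used by the online calculator; a t-based calculator will typically return slightly larger values, so report the calculator's numbers in Table 2.2. The function name and the effect size of 0.5 are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.05):
    """Approximate n per group for a two-tailed, two-sample comparison.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2,
    rounded up to the next whole person.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect size for illustration; not a value from this assignment.
for power in (0.80, 0.85, 0.90, 0.95):
    print(f"power={power:.2f}: n per group = {n_per_group(0.5, power)}")
```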
3) Refer to Table 2.2 and describe what patterns you see in the sample size values. What happens to sample size as you read across a row (i.e. as statistical power changes)? What happens as you read down a column (i.e. as effect sizes change)?
4) Choose two cells from Table 2.2 (from two different columns and two different rows) and interpret those values. Remember that the purpose of this power analysis is to provide an estimate for the sample size of the study. So in your interpretation, it is the number in the cell that reflects how many people would need to be recruited per group given the power and the effect size (column and row respectively).