1. There are 17 people in a doctor's waiting room, 5 of whom have the flu. If one person is chosen at random, find the probability that this person has the flu (yes, this is really easy).
2. A spinner lands at random in one of three equal areas: white, garnet, and black. We spin it twice.
a. Write out the sample space.
b. Let X = number of times the spinner lands on garnet in the two spins. Find P( X = 1 ), the probability of landing on garnet exactly one time.
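Illustration only: the enumeration idea in problem 2 can be sketched in Python. This sketch assumes a hypothetical two-color spinner ("red" and "blue", equal areas), not the three-color spinner above, and the variable names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical spinner with two equal areas (NOT the three-color spinner above).
colors = ["red", "blue"]

# Sample space for two spins: every ordered pair of outcomes.
sample_space = list(product(colors, repeat=2))

# Outcomes with exactly one "red"; all outcomes are equally likely,
# so the probability is (favorable outcomes) / (total outcomes).
favorable = [pair for pair in sample_space if pair.count("red") == 1]
p_exactly_one_red = Fraction(len(favorable), len(sample_space))

print(sample_space)
print(p_exactly_one_red)
```

The same pattern should carry over to any spinner with equally likely areas by changing the `colors` list.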
3. In a lottery, suppose you select 5 numbers out of 30.
a. In how many ways can this be done, if the order in which they are selected is not important?
b. There is only one set of winning numbers. What is the probability you win?
4. Suppose 3 genes are to be chosen from 9. In how many ways can this be done, if the order in which they are selected is important?
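Illustration only: problems 3 and 4 contrast selections where order is ignored (combinations) with selections where order matters (permutations). A minimal Python sketch, assuming the hypothetical values n = 6 and k = 2 rather than the numbers in the problems:

```python
import math

n, k = 6, 2  # hypothetical: choose 2 items from 6 (not the problem's numbers)

unordered = math.comb(n, k)  # order ignored: n! / (k! (n-k)!)
ordered = math.perm(n, k)    # order matters: n! / (n-k)!

# Each unordered selection can be arranged in k! different orders.
assert ordered == unordered * math.factorial(k)

print(unordered, ordered)
```

`math.comb` and `math.perm` require Python 3.8 or later.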
5. Fifty lakes in South Carolina are studied. Of the 30 man-made lakes, 12 are polluted. Of the 20 natural lakes, 5 are polluted.
a. "By hand", draw a two-way table to display these results. Let the type of lake define the rows, and whether or not the lake is polluted define the columns.
b. If one lake is randomly selected from this group, find the following probabilities:
i. P( natural )
ii. P( man-made and polluted )
iii. P( man-made or polluted )
iv. P( not polluted | natural )
v. P( man-made | polluted )
c. Are the events of being man-made and being polluted independent? Give a numeric justification for your answer.
d. If two lakes are randomly selected (without replacement) from this group, find the probability that at least one is polluted (hint: use the complement rule).
e. If three lakes are randomly selected (without replacement) from this group, find the probability that all three are natural.
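Illustration only: the rules used in problem 5 (addition rule, conditional probability, the independence check, and the complement rule for sampling without replacement) can be sketched in Python. The counts and labels below are hypothetical and are not the lake data; `prob` is a helper made up for this example.

```python
from fractions import Fraction

# Hypothetical two-way table (NOT the lake data): rows A/B, columns yes/no.
counts = {
    ("A", "yes"): 10, ("A", "no"): 30,
    ("B", "yes"): 20, ("B", "no"): 40,
}
total = sum(counts.values())

def prob(event):
    """Probability that one randomly chosen unit satisfies event(row, col)."""
    return Fraction(sum(n for cell, n in counts.items() if event(*cell)), total)

p_a = prob(lambda r, c: r == "A")
p_yes = prob(lambda r, c: c == "yes")
p_a_and_yes = prob(lambda r, c: r == "A" and c == "yes")

p_a_or_yes = p_a + p_yes - p_a_and_yes      # addition rule
p_yes_given_a = p_a_and_yes / p_a           # conditional probability
independent = (p_a_and_yes == p_a * p_yes)  # product-rule check

# Complement rule, two draws without replacement:
# P(at least one "yes") = 1 - P(both draws are "no").
n_no = sum(n for (r, c), n in counts.items() if c == "no")
p_at_least_one_yes = 1 - Fraction(n_no, total) * Fraction(n_no - 1, total - 1)

print(p_a_or_yes, p_yes_given_a, independent, p_at_least_one_yes)
```

Exact fractions are used here so the independence check is not affected by floating-point rounding.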
6. Use SAS to calculate the two-way table for problem #5. Start by creating a data set with three variables: type, polluted, and count. You will have four lines of data; on each line type should be 'manmade' or 'natural', and polluted should be 'no' or 'yes'. Count is the number of observations in that cell of the 2x2 table. Then use PROC FREQ to obtain the two-way table.
a. Turn in the SAS code you used and resulting printout.
b. On the printout, highlight the answers to parts (b-ii) and (b-iv) of problem #5 above.
7. Suppose 7% of construction workers have been exposed to asbestos. We randomly sample 4 construction workers (assume they are independent). Find the probability that none have been exposed to asbestos.
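Illustration only: when individuals are independent, the probability that none of them has a characteristic is a product of identical terms. A minimal Python sketch, assuming a hypothetical rate of 10% and a sample of 3 (not the values in problem 7):

```python
p_exposed = 0.10  # hypothetical rate, NOT the 7% in the problem
n = 3             # hypothetical sample size

# Independence lets us multiply: P(none) = (1 - p)^n.
p_none = (1 - p_exposed) ** n

# Complement rule: P(at least one) = 1 - P(none).
p_at_least_one = 1 - p_none

print(round(p_none, 4), round(p_at_least_one, 4))
```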
8. The results of a screening test for prostate cancer are given in the table below. Find the sensitivity and specificity of the screening test.
[Table: screening test result by prostate cancer status (Yes / No); the cell counts are missing from this copy.]
9. A free screening test for diabetes is offered at a health fair. Seventy-four people have a positive screen; of these, 45 are later determined to have diabetes. Sixty-eight people have a negative screen; of these, 1 is later determined to have diabetes. Using these data, find the sensitivity and specificity of the screening test.
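Illustration only: sensitivity and specificity are each computed within a disease-status column of the 2x2 table. The Python sketch below uses hypothetical counts (not those of problems 8 or 9), and the function name is made up for the example.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the four cell counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: not diseased, test positive  tn: not diseased, test negative
    """
    se = tp / (tp + fn)  # P(test positive | diseased)
    sp = tn / (tn + fp)  # P(test negative | not diseased)
    return se, sp

# Hypothetical counts, NOT taken from the problems above.
se, sp = sensitivity_specificity(tp=40, fn=10, fp=30, tn=120)
print(se, sp)
```

Note that the denominators are the numbers of diseased and non-diseased people, not the numbers of positive and negative screens.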
10. Suppose a screening test for liver cancer has sensitivity Se = 0.89 and specificity Sp = 0.97.
a. Find the PPV and NPV for a population where 2% of the people have the disease.
b. Find the PPV if the prevalence in the population is 0.1, and interpret it in the context of this setting.
c. Explain why the PPV is higher in one of the two parts above than in the other. In doing so, use two of the following four terms: "true positives", "true negatives", "false positives", and "false negatives".
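Illustration only: PPV and NPV follow from Se, Sp, and the prevalence by Bayes' rule. The Python sketch below assumes hypothetical values (Se = 0.90, Sp = 0.95, prevalences of 0.05 and 0.20), not the values in problem 10; the function name is made up for the example.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) using Bayes' rule on population proportions."""
    tp = se * prevalence              # true positives
    fn = (1 - se) * prevalence        # false negatives
    tn = sp * (1 - prevalence)        # true negatives
    fp = (1 - sp) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)              # P(diseased | test positive)
    npv = tn / (tn + fn)              # P(not diseased | test negative)
    return ppv, npv

# Hypothetical test characteristics and prevalences (NOT the problem's values).
ppv_low, npv_low = predictive_values(se=0.90, sp=0.95, prevalence=0.05)
ppv_high, npv_high = predictive_values(se=0.90, sp=0.95, prevalence=0.20)

print(round(ppv_low, 3), round(npv_low, 3))
print(round(ppv_high, 3), round(npv_high, 3))
```

Running the function at two prevalences is one way to see how the predictive values move while Se and Sp stay fixed.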
11. A cohort study looks at diet and obesity in children. It followed 249 children whose diets were high in fat and another 338 whose diets were not. After 10 years, each child was classified as obese or non-obese: 95 of those with a high-fat diet were obese, as were 43 of those whose diets were not high in fat. Calculate the relative risk, and interpret it in the context of this problem.
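Illustration only: the relative risk is the risk in the exposed group divided by the risk in the unexposed group. A minimal Python sketch with hypothetical counts (not the study's numbers) and a made-up function name:

```python
from fractions import Fraction

def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk among the exposed divided by risk among the unexposed."""
    risk_exposed = Fraction(cases_exposed, n_exposed)
    risk_unexposed = Fraction(cases_unexposed, n_unexposed)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
rr = relative_risk(30, 100, 10, 100)
print(rr)
```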
12. A case-control study looks at traffic accident fatalities and seat belt usage. Of the 87 accidents in which there was a fatality, 23 of the drivers were wearing a seat belt. Of the 215 accidents in which there was not a fatality, 160 of the drivers were wearing a seat belt.
a. Draw a 2x2 table, with fatality/no fatality defining the columns and seat belt use defining the rows. Make sure the upper-left cell is the "fatality/no seat belt" combination; this means that we are treating a fatality as the "disease" and not wearing a seat belt as the "exposure".
b. Calculate and interpret the odds ratio, in the context of this problem. Interpret this first in the strict sense of an odds ratio from a case-control study, and then again as an estimate of the relative risk.
c. Use SAS to obtain a 2x2 table and the odds ratio. Again make sure the upper-left cell is the "fatality/no seat belt" combination. Turn in the result.
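Illustration only: with the "disease/exposure" combination in the upper-left cell, the odds ratio is the cross-product ratio of the 2x2 table. The Python sketch below uses hypothetical counts (not the seat belt data), and the cell labels a, b, c, d are a naming assumption for this example.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table laid out as

                    disease   no disease
        exposed        a          b
        unexposed      c          d
    """
    return Fraction(a * d, b * c)

# Hypothetical counts, NOT the seat belt data.
or_hat = odds_ratio(a=20, b=10, c=30, d=60)

# Same value as the odds of exposure among cases over the odds among controls.
assert or_hat == Fraction(20, 30) / Fraction(10, 60)

print(or_hat)
```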
13. Consider a situation where we look at people who do not smoke. Suppose 20% of such people are exposed to second-hand smoke. Of those exposed, 40% eventually have some sort of breathing problem. Of those not exposed, 10% eventually have some sort of breathing problem. Draw a tree diagram to demonstrate this, and calculate the probabilities of the four possible endpoints.
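Illustration only: each endpoint of a two-stage tree diagram is the product of the probabilities along its branches. A minimal Python sketch, assuming hypothetical values (30% exposed; 50% and 20% with the outcome among exposed and unexposed), not the values in problem 13:

```python
from fractions import Fraction

# Hypothetical branch probabilities (NOT the problem's values).
p_exposed = Fraction(3, 10)
p_outcome_given_exposed = Fraction(1, 2)
p_outcome_given_unexposed = Fraction(1, 5)

# Multiply along each path of the tree to get the endpoint probabilities.
endpoints = {
    ("exposed", "outcome"): p_exposed * p_outcome_given_exposed,
    ("exposed", "no outcome"): p_exposed * (1 - p_outcome_given_exposed),
    ("not exposed", "outcome"): (1 - p_exposed) * p_outcome_given_unexposed,
    ("not exposed", "no outcome"): (1 - p_exposed) * (1 - p_outcome_given_unexposed),
}

# The four endpoints cover every possibility, so they must sum to 1.
assert sum(endpoints.values()) == 1

for path, p in endpoints.items():
    print(path, p)
```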