CASE STUDY: Are the Fish Safe to Eat?
Assessing Which Factors Affect Mercury Levels in Fish in Maine Lakes.
Mercury is a heavy metal that occurs naturally in the environment in several forms (elemental, organic, inorganic) and is also released through human activity (burning of fossil fuels and incineration of waste). In particular, large amounts of mercury are found in some fish.
Mercury enters fish in two ways: (i) direct absorption of mercury from water and (ii) by eating other organisms which contain mercury. Human consumption of mercury is known to lead to neurological and physical disorders. A study was carried out in Maine to investigate which characteristics of lakes are associated with higher levels of mercury in the fish in the lakes. Data were collected on 120 lakes.
The data are posted on the course website. The variables in the dataset are:
- Lake: name of the lake.
- Mercury: the mercury level in the fish in parts per million (ppm).
- Dam: whether a dam is present or not. This is an indicator variable which is 0 if there is no functional dam present (so all water flow is natural) and is 1 if there is a man-made dam in the drainage area of the lake.
- Type: each lake is classified as 1. 'oligotrophic' (sustains fish based on its vegetation and oxygen), 2. 'eutrophic' (has few fish), or 3. 'mesotrophic' (in between oligotrophic and eutrophic).
Some questions of interest are:
A. Most states consider a mercury level of more than 0.5 ppm to be high enough to take action (issuing a health advisory, clean-up methods, etc.). Are mercury levels in Maine high enough to be of concern?
B. The industries that benefit from dams are concerned that environmentalists will claim that high levels of mercury in fish are related to the presence of dams. Does the data support this claim?
C. Do mercury levels vary by lake type? If so, specifically how?
D. Do mercury levels vary by lake type differently for lakes with dams than for lakes without dams?
Assignment Questions:
It would be helpful to examine the data and do some preliminary analysis / descriptive statistics before answering these questions. (You do not have to hand in this preliminary work but you may use it when commenting on justification of methods). Treat your work as an exploratory analysis, in which case Type II errors are typically of more concern than Type I errors. So use a significance level of 0.1. For each of the following questions, include the necessary output/plots/numbers to answer the questions and give a complete commentary on the results:
1. Answer Question of Interest A. Name the statistical method(s) you are using and justify your choice. Specify the parameter or test of interest and define your notation. Display a relevant confidence interval and/or test and then make a conclusion in practical terms.
2. Create a new variable called "DamType" that combines the type of lake and whether or not a dam is present. That is, create 6 categories, one for each combination of lake type and dam status. Create three sets of side-by-side box plots to compare mercury levels: (1) between lakes that have a dam and lakes that do not, (2) between different types of lakes, (3) between the 6 combinations of type and dam status. Do there appear to be differences? Elaborate. Are there any outlier lakes? If so, name them.
(To code "DamType" in R, create a new vector that will hold category names for the combined type and dam status. Then loop through the vectors for Dam and Type, use an if statement to check for each combination, and assign a category name for each combination. Use category names that will easily identify the combinations. Similar code was used in the Spock Conspiracy Trial to create the variable "judge1").
3. Use a two-sample t-test to investigate Question of Interest B. If you found outlier(s) in Q2, then do the analysis both with and without the outlier(s) and give the conclusions. Are the outlier(s) influential?
4. Use a One-Way ANOVA to investigate Question of Interest B. What relationship should exist between the test statistic from this question and from Q3? If you found outlier(s) in Q2, then do the analysis both with and without the outlier(s) and give the conclusions. Are the outlier(s) influential?
5. Use a One-Way ANOVA to investigate Question of Interest C. If you found outlier(s) in Q2, then do the analysis both with and without the outlier(s) and give the conclusions. Are the outlier(s) influential? If you conclude that there are differences in the mean mercury levels between the types of lakes, conduct an appropriate post-hoc analysis to determine where the differences lie. Give a practical conclusion.
6. Use a One-Way ANOVA to investigate Question of Interest D. If you found outlier(s) in Q2, then do the analysis both with and without the outlier(s) and give the conclusions. Are the outlier(s) influential? If you conclude that there are differences in the mean mercury levels between the six categories of combined type and dam status, conduct an appropriate post-hoc analysis to determine where the differences lie. Give a practical conclusion.
7. Assess the validity of the model assumptions required for Q6 above. Display the relevant diagnostic plots / numerical values and comment with reference to them.
8. Suppose you were to answer Question of Interest D using a Two-Way ANOVA instead. Without carrying out the analysis in R, what conclusion would you expect and why? (Do not do the analysis or include output.)
9. The mercury level in each lake was found by combining flesh from a few fish caught in the lake and then analyzing the mixed sample. The number of fish per lake ranged from 2 to 5 and hence different lakes had different numbers of fish sampled. Is this a concern? If so, explain specifically why and suggest an alternative approach for doing the analysis (but do not actually redo the analysis). If not, explain why.
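Note on Q2: the loop-and-if approach described in the hint for coding DamType might look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame `toy` and its values are made up (they are not from data_mercury.xlsx), the category labels such as "NoDam.Oligo" are hypothetical choices, and the column names Mercury, Dam and Type are assumed to match the variable list above. The Spock Conspiracy Trial code referred to in the hint is not reproduced here.

```r
# Toy data for illustration only; these values are invented,
# NOT taken from data_mercury.xlsx.
toy <- data.frame(
  Mercury = c(0.42, 0.61, 0.35, 0.80, 0.27, 0.55),
  Dam     = c(0, 1, 0, 1, 0, 1),   # 0 = no dam, 1 = dam
  Type    = c(1, 1, 2, 2, 3, 3)    # 1 = oligotrophic, 2 = eutrophic, 3 = mesotrophic
)

# New vector that will hold the combined category names.
DamType <- character(nrow(toy))

# Loop through Dam and Type and check each of the 6 combinations.
for (i in seq_len(nrow(toy))) {
  if (toy$Dam[i] == 0 && toy$Type[i] == 1) {
    DamType[i] <- "NoDam.Oligo"
  } else if (toy$Dam[i] == 0 && toy$Type[i] == 2) {
    DamType[i] <- "NoDam.Eutro"
  } else if (toy$Dam[i] == 0 && toy$Type[i] == 3) {
    DamType[i] <- "NoDam.Meso"
  } else if (toy$Dam[i] == 1 && toy$Type[i] == 1) {
    DamType[i] <- "Dam.Oligo"
  } else if (toy$Dam[i] == 1 && toy$Type[i] == 2) {
    DamType[i] <- "Dam.Eutro"
  } else if (toy$Dam[i] == 1 && toy$Type[i] == 3) {
    DamType[i] <- "Dam.Meso"
  }
}

# Store the result as a factor so that plotting and ANOVA
# functions treat it as a categorical variable.
toy$DamType <- factor(DamType)

# Vectorised cross-check (same hypothetical labels, no loop).
DamType2 <- paste(ifelse(toy$Dam == 1, "Dam", "NoDam"),
                  c("Oligo", "Eutro", "Meso")[toy$Type],
                  sep = ".")
```

With the real data, the same pattern would be applied to the Dam and Type columns of the imported data set, and a call such as boxplot(Mercury ~ DamType, data = toy) would then give the side-by-side box plots for the six categories.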
Attachment: data_mercury.xlsx