Question 1
In an Australian city last year, the most popular organised sports for children were swimming and outdoor soccer: 19% of children participated in swimming and 13% participated in outdoor soccer. Suppose participation in the two sports (swimming and outdoor soccer) was independent.
[a] What was the proportion of children who did not participate in swimming?
[b] What was the proportion of children who participated in neither swimming nor outdoor soccer?
[c] What was the proportion of children who participated in exactly one of the two sports (swimming only or outdoor soccer only, but not both)?
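The arithmetic behind parts [a] to [c] can be sketched as follows. This is a minimal sketch, not an official solution: it assumes that "independent" means participation in one sport is statistically independent of participation in the other, so that joint proportions are products of the individual ones. The variable names are illustrative only.

```python
# Participation proportions stated in Question 1.
p_swim = 0.19
p_soccer = 0.13

# [a] Complement rule: P(not swimming) = 1 - P(swimming).
p_not_swim = 1 - p_swim

# [b] Assuming independence, P(neither) = P(not swimming) * P(not soccer).
p_neither = (1 - p_swim) * (1 - p_soccer)

# [c] Exactly one sport: swimming without soccer, or soccer without swimming.
# The two cases are mutually exclusive, so their proportions add.
p_exactly_one = p_swim * (1 - p_soccer) + (1 - p_swim) * p_soccer

print(round(p_not_swim, 4), round(p_neither, 4), round(p_exactly_one, 4))
```

As a check, the proportions for "neither", "exactly one" and "both" (p_swim * p_soccer under the same assumption) should sum to 1.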
Question 2
We have data on the lean body mass and resting metabolic rate for 14 women who are subjects in a dieting study. Lean body mass, given in kilograms, is a person's weight excluding all fat. Metabolic rate, given in calories burned per 24 hours, is the rate at which the body consumes energy. The dataset is in the Excel file called 'Metabolic' on Moodle.
[a] Construct a histogram to study the metabolic rate variable, and provide some brief comments describing the histogram.
[b] Find the correlation coefficient between the two variables. Provide some brief comments.
[c] Create a scatterplot that shows how metabolic rate depends on body mass. Comment on this scatterplot briefly.
[d] Find the least-squares regression line for predicting metabolic rate from body mass. Add this line to your scatterplot. Comment on the regression briefly.
[e] Identify one outlier in the Y direction and one outlier in the X direction. Remove them from the dataset. After removing these two outliers, find the new correlation between the two variables. Draw a new scatterplot for metabolic rate on body mass. Comment on how the correlation and the scatterplot changed after the two outliers were removed, and why.
[f] Fit a new regression for predicting metabolic rate from body mass using the dataset from [e], where the two outliers were removed. Add this regression line onto the scatterplot you created in [e]. Which subject has a particularly high metabolic rate value and which subject has a particularly low metabolic rate value relative to the pattern for the remaining subjects after the two outliers have been removed?
[g] Check the regression assumptions for the regression you fit in [f].
[h] Compare the two regression equations using their R-squared values, and identify which one is better. Why?
[i] Using the better regression equation that you identified in [h], predict the metabolic rate for a woman with a lean body mass of 45 kilograms.
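The quantities asked for in parts [b], [d], [h] and [i] can be computed as in the sketch below, which uses plain Python. It is a sketch under stated assumptions, not the expected solution: the function names are hypothetical, and the x and y lists are toy placeholder numbers, NOT the values in the 'Metabolic' file, which would need to be loaded separately.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """R-squared = 1 - SSE/SST for the fitted line."""
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

# Toy placeholder data (assumption for illustration; replace with the
# body mass and metabolic rate columns from the real dataset).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
intercept, slope = least_squares(x, y)
r2 = r_squared(x, y, intercept, slope)

# Prediction at a new x value, as in part [i] (x_new is a placeholder here).
x_new = 3.5
y_pred = intercept + slope * x_new
print(r, intercept, slope, r2, y_pred)
```

For a regression with a single predictor, R-squared equals the square of the correlation coefficient, which gives a quick consistency check between parts [b] and [h]. Residuals (observed minus fitted values) from the same fit are the usual starting point for the assumption checks in part [g].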