In this question you will explore and summarize the cereals.xls dataset.
(a) For each of the 16 variables in the dataset, state whether the variable is categorical (qualitative) or numerical (quantitative). For categorical variables, state whether the variable is nominal or ordinal.
(b) Create a table that summarizes the mean, median, min, max, standard deviation, and coefficient of variation for each of the quantitative variables.
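As a hint for part (b), the sketch below shows one way to compute these six statistics for a single variable. It assumes Python with only the standard library and uses the sample standard deviation (n - 1 denominator); the function name `summarize` and the toy values are hypothetical and are not taken from cereals.xls. The coefficient of variation is the standard deviation divided by the mean.

```python
import statistics

def summarize(values):
    """Return mean, median, min, max, sample standard deviation, and CV."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": sd,
        # CV = sd / mean is unitless, so it is comparable across variables
        "cv": sd / mean if mean != 0 else float("nan"),
    }

# Toy values for illustration only; NOT taken from cereals.xls
toy = {"calories": [70, 110, 120, 100, 150], "sodium": [0, 130, 200, 290, 180]}
for name, values in toy.items():
    print(name, summarize(values))
```

Applying `summarize` to each quantitative column and stacking the results row by row gives the requested table.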
(c) Examine the summary statistics for the eight variables that reflect the nutritional value of the cereals (calories through potassium). Use the coefficient of variation to find the three nutritional variables with the largest variability (we cannot use the standard deviation because the variables have different units). Generate histograms and boxplots for these three variables.
(i) Describe the distributions. Are the variables skewed?
(ii) Are there any outliers for these variables?
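For the skewness and outlier questions in part (c), a numeric check can supplement the plots. The sketch below assumes the common boxplot convention of flagging points more than 1.5 × IQR beyond the quartiles, and uses a mean-versus-median comparison as a rough skew heuristic only. The function names and toy values are hypothetical and are not taken from cereals.xls; quartile definitions differ between tools, so results may not match a given package exactly.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR boxplot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def skew_hint(values):
    """Rough heuristic: mean above the median suggests right skew."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"

# Toy values for illustration only; NOT taken from cereals.xls
toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(toy), skew_hint(toy))
```

The heuristic can mislead for unusual distributions, so the histogram should remain the primary evidence for skew.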
(d) Plot a side-by-side boxplot comparing the calories in hot versus cold cereals. What does this plot show us?
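A side-by-side boxplot draws one five-number summary (min, Q1, median, Q3, max) per group, so computing those summaries directly can help when interpreting the plot in part (d). The sketch below assumes each record is a dictionary; the keys `type` and `calories`, the labels `hot` and `cold`, and all values are hypothetical placeholders rather than the actual column names or data in cereals.xls.

```python
import statistics
from collections import defaultdict

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def summary_by_group(rows, group_key, value_key):
    """Group rows by group_key and summarize value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: five_number_summary(v) for g, v in groups.items()}

# Toy rows for illustration only; NOT taken from cereals.xls
rows = [
    {"type": "cold", "calories": 100}, {"type": "cold", "calories": 110},
    {"type": "cold", "calories": 120}, {"type": "cold", "calories": 130},
    {"type": "cold", "calories": 140}, {"type": "hot", "calories": 90},
    {"type": "hot", "calories": 100}, {"type": "hot", "calories": 110},
]
print(summary_by_group(rows, "type", "calories"))
```

Comparing the medians and the box widths (Q3 minus Q1) between the two groups mirrors what the eye reads off the plot.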
(e) What type of plot would you use to compare calories to rating? What type of plot would you use to compare type to rating? Generate the plots and describe the relationship between the variables.
(f) Generate the correlation matrix for the quantitative variables.
(i) Which variable has the strongest positive correlation with calories? Which variable has the strongest negative correlation with calories?
(ii) What percentage of the variance in these three variables would we capture if we replace the variables with thei