For this assignment you will continue to use data derived from Capital Bikeshare trip records from 2011 and 2012, this time analysing patterns in daily numbers of rentals by casual users.
References and Data Sources:
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Fanaee-T, H. & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence, pp. 1-15. Springer Berlin Heidelberg.
Data file for this assignment:
The data file for this assignment is called daily.sas7bdat and contains daily counts of bike rentals for 2011 and 2012, derived from Capital Bikeshare trip history data, with additional weather and seasonal information. The data was downloaded from the UCI Machine Learning Repository. Variables in that file are as follows:
Assignment tasks:
Question 1
Carry out a one-way analysis of variance relating casual to weekday. Use contrasts to test at least one a priori hypothesis of your choice. Examine and comment on the residuals. Also carry out appropriate post-hoc comparisons and discuss your results.
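As a starting point, one possible shape for this analysis in SAS is sketched below. It is not a model answer. The libref name bike, the folder path, the output data set name q1_resid and the weekday coding (0 = Sunday through 6 = Saturday) are all assumptions, so check them against the variable list before relying on the contrast coefficients.

```sas
/* Assumption: the libref "bike" and the path are placeholders for wherever
   daily.sas7bdat is stored on your system. */
libname bike "/path/to/data";

ods graphics on;

proc glm data=bike.daily plots=diagnostics;
    class weekday;
    model casual = weekday;

    /* Assumed coding: weekday 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
       Example a priori contrast: weekend days versus Monday to Friday.
       Coefficients follow the sorted class levels and sum to zero. */
    contrast 'Weekend vs Mon-Fri' weekday 5 -2 -2 -2 -2 -2 5;

    /* Post-hoc comparisons (Tukey) and Levene's test for equal variances */
    means weekday / tukey cldiff hovtest=levene;

    /* Save residuals and fitted values for further checking */
    output out=q1_resid r=resid p=pred;
run;
quit;

/* Normality checks on the saved residuals */
proc univariate data=q1_resid normal;
    var resid;
    histogram resid / normal;
    qqplot resid / normal(mu=est sigma=est);
run;
```

The contrast shown is only one example of an a priori hypothesis; any set of coefficients that sums to zero and reflects a hypothesis chosen before looking at the data would serve.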
Question 2
Use SAS to perform a one-way ANCOVA relating casual to weekday with atemp as a covariate, including appropriate post-hoc comparisons:
• Confirm that there is a linear relationship between the response variable and the covariate (a scatterplot and a correlation coefficient plus a comment will suffice);
• Check the two additional ANCOVA assumptions (report and comment only on the parts of the output most directly relevant to condition checking):
  o Independence of the covariate and the treatment effect (perform a one-way ANOVA of atemp on weekday; there should be no statistically significant difference between weekdays);
  o Equality of slopes (add the weekday-by-atemp interaction term to the model and check its significance);
• Report and briefly discuss your results.
Technical note: Make sure you obtain and examine Type III Sums of Squares (ss3). Also obtain estimates of ‘least squares means’ (lsmeans), which are the treatment means adjusted for the covariate.
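The steps listed for this question could be laid out roughly as follows. This is a sketch under assumptions rather than a prescribed solution: the libref bike and the folder path are placeholders, and the Tukey adjustment is just one reasonable choice of post-hoc method.

```sas
/* Assumption: the libref "bike" and the path are placeholders. */
libname bike "/path/to/data";

/* Linearity: scatterplot with a fitted line, plus the correlation coefficient */
proc sgplot data=bike.daily;
    scatter x=atemp y=casual;
    reg x=atemp y=casual / nomarkers;
run;

proc corr data=bike.daily pearson;
    var casual atemp;
run;

/* Independence of covariate and treatment:
   one-way ANOVA of the covariate on the treatment factor */
proc glm data=bike.daily;
    class weekday;
    model atemp = weekday;
run;
quit;

/* Equality of slopes: check the significance of the interaction term */
proc glm data=bike.daily;
    class weekday;
    model casual = weekday atemp weekday*atemp / ss3;
run;
quit;

/* ANCOVA with Type III sums of squares and covariate-adjusted means */
proc glm data=bike.daily;
    class weekday;
    model casual = weekday atemp / ss3 solution;
    lsmeans weekday / pdiff adjust=tukey cl;
run;
quit;
```

If the interaction term in the equality-of-slopes model is significant, the final ANCOVA model is not appropriate as written, and that should be noted in the discussion.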
Question 3
(a) Carry out a one-way analysis of variance relating casual to season. Use contrasts to test at least one a priori hypothesis of your choice. Also carry out appropriate post-hoc comparisons and discuss your results.
(b) Extend your analysis in part (a) to test whether there is evidence of interaction between season and the type of day (working day vs weekend or public holiday). Carry out appropriate post-hoc comparisons and discuss your results.
(c) The distribution of the number of casual users by season is not Normal, so a Kruskal-Wallis test may be more appropriate for relating casual to season. Carry out this test and, for the post-hoc analysis, compare summer with each of the other seasons. Discuss your results and compare them with those in part (a).
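Parts (b) and (c) might be approached along the lines of the sketch below. Several names here are assumptions, not taken from this handout: the libref bike and its path, the variable name workingday for the type of day, the coding of summer as season = 2, and the helper macro summer_vs, which is purely illustrative. Check the variable list before using any of them.

```sas
/* Assumption: the libref "bike" and the path are placeholders. */
libname bike "/path/to/data";

/* Part (b): two-way ANOVA with interaction.
   Assumption: the type-of-day variable is called workingday. */
proc glm data=bike.daily;
    class season workingday;
    model casual = season workingday season*workingday / ss3;
    /* Post-hoc comparisons of cell means, plus tests of the
       type-of-day effect within each season */
    lsmeans season*workingday / pdiff adjust=tukey slice=season;
run;
quit;

/* Part (c): Kruskal-Wallis test across all seasons */
proc npar1way data=bike.daily wilcoxon;
    class season;
    var casual;
run;

/* Post-hoc: summer versus each other season, one pair at a time.
   Assumption: summer is coded as season = 2.
   summer_vs is a hypothetical helper macro, not part of SAS. */
%macro summer_vs(other);
    proc npar1way data=bike.daily wilcoxon;
        where season in (2, &other);
        class season;
        var casual;
    run;
%mend summer_vs;

%summer_vs(1)
%summer_vs(3)
%summer_vs(4)
```

Because three pairwise tests are run, some adjustment for multiple comparisons is advisable; one simple option is a Bonferroni correction, comparing each p-value with 0.05/3.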
Question 4
Write a summary of your findings from Questions 1 to 3. Keep the technical details of the analyses that led you to these conclusions to the absolute minimum. Rather, focus on practical significance and present your findings in non-specialist terms. One page will be sufficient.