Biostatistics written project:
For your final project, you may choose any type of data set, either one that you collect yourself or one that you find published elsewhere, about a real or hypothetical case study. The project should cover design, analysis, and interpretation.
Process of statistics:
Use the following process:
A. Design of experiment:
This includes knowing what an adequate population is, whether the population sampled is appropriate to the question, and also what the necessary sample sizes are.
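As a rough illustration of what "necessary sample sizes" can mean in practice, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power})σ/δ)² per group. The function name and the example values for the difference to detect (`delta`) and the standard deviation (`sigma`) are assumptions made for illustration, not requirements of the project; other designs call for other formulas.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means.

    Uses the normal approximation for a two-sided test with equal group
    sizes and a common standard deviation (all simplifying assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)                            # round up to whole subjects


# Hypothetical planning values: detect a 5-unit difference when SD is 10.
n_per_group = sample_size_two_means(delta=5.0, sigma=10.0)
print(n_per_group)
```

A calculation based on the t distribution would give a slightly larger answer, so this should be read as a lower bound for planning.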
B. Sampling:
For sampling, we need to know whether it was random (implying the use of a random number generator), haphazard (whoever you happened to grab), or nonrandom (biased toward a certain result).
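A minimal sketch of random sampling with a random number generator is shown below. The sampling frame of 500 subject IDs, the sample size of 30, and the seed value are all hypothetical; the point is only that the generator, not the investigator, decides who is selected.

```python
import random

# Hypothetical sampling frame: subject IDs 1 through 500.
population_ids = list(range(1, 501))

# A fixed seed (an arbitrary choice) lets the draw be reproduced and documented.
rng = random.Random(2024)

# Simple random sample without replacement.
sample_ids = sorted(rng.sample(population_ids, k=30))
print(sample_ids)
```

Recording the seed and the sampling frame in the written report lets a reader confirm that the sample was random rather than haphazard.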
C. Data collection:
In data collection, we need to be certain that the data were recorded accurately and consistently, particularly when multiple observers were involved or when collection spanned different time frames.
D. Data presentation:
Graphical representations of the data need to be clear, and results that could mislead or that are difficult to understand need to be presented with particular care.
E. Descriptive statistics:
This includes range (highest to lowest value obtained) and central tendency (mean, median, mode).
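The descriptive statistics named above can be computed with Python's standard library, as in the sketch below. The measurements are made-up values used only to show the calculations.

```python
import statistics

# Hypothetical measurements (for illustration only).
values = [4.1, 5.0, 5.0, 5.6, 6.2, 6.8, 7.3]

summary = {
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),   # highest minus lowest value
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),      # most frequent value
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```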
F. Statistical analyses:
We need to know what procedures to apply, and how many to apply.
G. Presentation of results:
How are results presented in a written presentation?
Guidelines for the various stages of investigation:
Keep in mind that this project is about statistical thinking much more than about statistical computation. Statistical thinking is required at all stages of an investigation into data, including:
1. Planning the Study:
• Here, we must understand the problem, then formulate it in statistical terms, being sure to clarify the scientific objectives carefully. This requires an extensive search of the literature and, above all, forces us to ask many questions.
• Once we understand the problem, we can plan the data collection. We certainly want to balance the effort expended (and expense incurred) in collecting data with our efforts to analyze them. How the data are collected is crucial to determining our eventual analysis. Mistakes made here can rarely be overcome with nifty “analytical” work down the line.
2. Collecting the Data:
• Data cannot be gathered both haphazardly and well, and poorly gathered data are often worse than no data at all. Gathering data will take 3 times as long as it “should” if you are working as part of a very experienced team and have been through similar processes multiple times. Otherwise, it will take at least 5 times as long. Early timelines are always wildly optimistic, especially since HIPAA.
• After data collection, we need to process the data into a suitable form for analysis. This is another time sink – but a crucial one. Some sort of initial data analysis is mandatory, where we look at the data through graphical and numerical summaries, hunting for problems in data cleaning, outliers, surprising values of variables and combinations of variables, etc. Should you find a statistical programmer who is eager to do and very capable of doing this kind of work, hold onto them as if your life depends on it.
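One possible form of the initial data analysis described above is a quick numerical screen for missing entries and surprising values. The sketch below flags values outside the common 1.5 × IQR fences; the function name, the choice of rule, and the body-weight data (including the suspicious 620.0) are assumptions made for illustration. A flagged value is a prompt to go back to the source records, not an automatic deletion.

```python
import statistics


def screen_values(values):
    """Count missing entries and flag values outside the 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)

    # Quartiles of the observed values (default "exclusive" method).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    flagged = [v for v in present if v < low or v > high]
    return {"n_missing": n_missing, "fences": (low, high), "flagged": flagged}


# Hypothetical body weights in kg; None marks a missing entry.
weights_kg = [61.2, 58.9, None, 64.0, 59.5, 620.0, 62.3, 60.1, None, 63.4]
report = screen_values(weights_kg)
print(report)
```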
3. Analyzing the Results:
• Having assessed the data’s structure and quality, through checks of coding, typing and editing, and a thorough data cleaning, we then describe the data, identifying interesting features. Often, such a summary reveals much useful information, and guides further work. Statistical work is iterative. One look at data teaches us what to look at next.
• In selecting and carrying out more detailed analyses, we often assume a model structure and use it to test hypotheses. Our primary obligation is to investigate the plausibility of the assumptions of our model, usually through graphical and numerical summaries of the data and of model residuals. We then consider refinements of the model in light of our investigations in an attempt to find the best description of the phenomenon of interest.
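To make the idea of checking model assumptions through residuals concrete, the sketch below fits a straight line by ordinary least squares and lists the residuals next to the fitted values. The dose and response numbers are invented, and a straight-line model is only one of many structures we might assume. In practice we would also plot the residuals against the fitted values and look for curvature or changing spread.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical dose and response values (for illustration only).
dose = [1, 2, 3, 4, 5, 6]
response = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

slope, intercept = fit_line(dose, response)
fitted = [intercept + slope * xi for xi in dose]
residuals = [yi - fi for yi, fi in zip(response, fitted)]

# Residuals that trend with the fitted values suggest the model is missing structure.
for fi, ri in zip(fitted, residuals):
    print(f"fitted = {fi:6.2f}   residual = {ri:+.2f}")
```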
4. Interpreting the Analyses:
• Having formulated a sensible model for the data, fit it, and checked the quality of fit, we can move on to interpreting the results. Often, this takes the form of comparing findings with prior results and (perhaps) acquiring further data to confirm findings.
5. Presenting the Study:
• Finally, we need to present the conclusions of the study effectively. Statisticians have much to say about useful presentation of data and inferential results. The key point is to let the data speak for themselves, convincingly.