Background
The Health Information National Trends Survey (HINTS) is a national survey uniquely dedicated to learning how people find, use, and understand health information. An information sheet about the study is available on the Blackboard site. We will use the data for some simple statistical analysis.
Data
You will use the HINTS 4 (Cycle 4) data to answer the assignment questions. A random sample of 1500 respondents was taken to simplify data handling. The data are available on Blackboard in .RData format (hints4_cycle4_sample1500.RData). Download and save the data file to your computer, then load it into Rcmdr (Data => Load dataset...).
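If you prefer the R console to the Rcmdr menu, the loading step can be sketched as follows. This is a minimal sketch that assumes the file has been saved in the current working directory; the helper name load_rdata is hypothetical, and the name of the data frame stored inside the file is not stated here, so the code prints whatever names it finds.

```r
# Hypothetical helper: load an .RData file and return its objects as a named list.
load_rdata <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the names of the restored objects
  mget(object_names, envir = env)
}

# Assumption: the file sits in the current working directory.
data_file <- "hints4_cycle4_sample1500.RData"
if (file.exists(data_file)) {
  objects <- load_rdata(data_file)
  print(names(objects))              # name(s) of the stored object(s)
  str(objects[[1]], list.len = 10)   # quick look at the first few variables
}
```

Loading into a separate environment avoids silently overwriting objects already in your workspace.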
Do NOT print the codebook.
Note that the data have not been recoded or changed in any way: the SPSS-format data were saved as a text (.csv) file and then simply read into R. This means that R automatically made some variables factors if they had only a small number of values. The variable names match those in the codebook. You may find it easier to remove variables that you don't need, to make data handling and the analysis menus more manageable.
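Checking which variables are factors and keeping only the variables you need might look like the sketch below. The data frame toy is a made-up stand-in for the loaded data (its values are invented for illustration), and keep_vars is a hypothetical name; substitute the variables you actually need from the codebook.

```r
# Toy stand-in for the loaded data; the real data set has many more variables.
toy <- data.frame(
  BMI = c(22.4, 31.0, 27.3),
  Vegetables = c("None", "1 to 2 cups", "None"),
  SelfAge = c(34, 58, 71),
  stringsAsFactors = TRUE
)

# See which variables R is treating as factors.
print(sapply(toy, is.factor))

# Keep only the variables needed for the analysis (names as in the codebook).
keep_vars <- c("BMI", "Vegetables")
small <- toy[, intersect(keep_vars, names(toy)), drop = FALSE]
str(small)
```

Using intersect() means a misspelt variable name is dropped quietly rather than causing an error, so check the names of the result.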
Finally, you may see references to analysis using sample weights. While weighting is common in complex surveys, statistical analyses should not be weighted for this assignment.
Assignment Questions
The answers to questions 1-3 must include:
- what data you used (inclusion / exclusion),
- the statistical measures you selected and why they are appropriate,
- an appropriate presentation of the data in a graph and / or table,
- statements about assumption testing, and
- an appropriate conclusion based on a statistical test, with justification in text.
Report the p-value to 2 or 3 decimal places (p=0.02, p<0.001, p=0.008 for example) and other values to 2 decimal places. There will be penalties for going over specified word and page limits.
Analysis questions using HINTS data
These analyses require using several variables from the HINTS 4 Cycle 4 data set and assessing relationships between them. In particular, you will test the relationship between BMI and vegetable consumption, treating each both as a continuous variable and as a categorical variable. Since each variable of interest is provided in only one of these forms (BMI as continuous, vegetable consumption as categorical), the other form will need to be constructed.
Variable names are in brackets and their details may be found in the codebook.
1. For all adults (18 and over), BMI categories are defined as follows: underweight is BMI < 18.5; healthy weight is 18.5 <= BMI < 25; overweight is 25 <= BMI < 30; obese is BMI >= 30. Construct a categorical variable with these categories from the continuous variable BMI (BMI). The number of cups of vegetables consumed per day is available in the HINTS dataset. Use the categorical variable (Vegetables), dropping the last category '4 or more cups', to look at the relationship between the BMI and vegetable categories defined above. (Note that we drop the last vegetable category so that we can use the data as a continuous variable in Question 3.)
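The construction step in Question 1 can be sketched with cut() and droplevels(). This is a sketch on made-up values, not the HINTS data: the data frame toy and the variable name BMI_Cat are hypothetical, and only vegetable labels quoted in this document are used. Setting right = FALSE makes each interval closed on the left, matching the definitions above (for example, a BMI of exactly 25 is overweight).

```r
# Made-up values for illustration only.
toy <- data.frame(
  BMI = c(17.9, 18.5, 24.9, 25.0, 29.9, 30.0, 41.2),
  Vegetables = factor(c("None", "1 to 2 cups", "None", "4 or more cups",
                        "1 to 2 cups", "None", "1 to 2 cups"))
)

# BMI categories: <18.5, [18.5, 25), [25, 30), >= 30.
toy$BMI_Cat <- cut(
  toy$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Healthy weight", "Overweight", "Obese"),
  right = FALSE
)

# Drop respondents in the last vegetable category, then drop the unused level.
sub <- toy[!is.na(toy$Vegetables) & toy$Vegetables != "4 or more cups", ]
sub$Vegetables <- droplevels(sub$Vegetables)

# Cross-tabulate the two categorical variables.
print(table(sub$BMI_Cat, sub$Vegetables))
```

Without droplevels() the empty '4 or more cups' category would still appear as a column of zeros in tables and plots.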
2. Consider BMI as a continuous variable (BMI) and the number of cups of vegetables consumed per day as a categorical variable (Vegetables). Is average BMI (kg/m²) different between vegetable consumption categories?
3. Consider body mass index (BMI) as a continuous variable. While we technically have only vegetable consumption categories (Vegetables) and not individual values measured precisely, we wish to treat vegetable consumption as continuous. Construct a new continuous variable (Vege_Contin) where every value is set to the midpoint of its category range. For instance, "None" would be set to 0 and "1 to 2 cups" would be set to 1.5. Use the variables BMI and Vege_Contin to examine the relationship between continuous BMI and continuous vegetable consumption. Discuss the appropriateness of this approach and interpret your results. Does this analysis add any new information to that obtained in Questions 1 and 2?
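One way to build Vege_Contin is a lookup from category label to midpoint, sketched below on made-up values. Only the two mappings given above ("None" and "1 to 2 cups") are filled in; the remaining labels and their midpoints are an assumption left for you to add from the codebook. The name midpoints is hypothetical.

```r
# Midpoints by category label. Only the two examples from the question are shown;
# extend this vector with the remaining categories from the codebook.
midpoints <- c("None" = 0, "1 to 2 cups" = 1.5)

# Made-up values for illustration only.
veg <- factor(c("None", "1 to 2 cups", "1 to 2 cups", NA, "None"))

# Look up each label; missing values stay missing.
Vege_Contin <- unname(midpoints[as.character(veg)])
print(Vege_Contin)

# Sanity check: every non-missing category received a midpoint.
stopifnot(all(is.na(veg) | !is.na(Vege_Contin)))
```

A label that is absent from the lookup produces NA, so the final check fails early if a category has been overlooked.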
4. Write 200 words describing your overall conclusions from the above analyses in Questions 1-3. Comment on the findings. 10%
Data analysis plan
5. Write an analysis plan to examine the relationships between several variables and the minutes per week of at least moderate intensity exercise (WeeklyMinutesModerateExercise) among the HINTS 4 Cycle 4 participants. The variables are age, both as a continuous variable (SelfAge) and in five categories (AgeGrpB); sex (GenderC); having health-related apps on your tablet or smartphone (TabletSmartPh_HealthApps); apps that helped you achieve a health-related goal, such as quitting smoking (HealthApps_AchieveGoal); and apps that helped you make a decision about how to treat an illness or condition (HealthApps_MakeDecision). Finally, include a brief analysis plan for a new variable, ModExerciseCat, constructed from WeeklyMinutesModerateExercise and categorised as either the top 20% of the distribution (high level of moderate exercise) or the rest (other levels of moderate exercise).
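The ModExerciseCat construction can be sketched with quantile(), shown here on made-up minutes rather than the HINTS data. Treating values strictly above the 80th percentile as "High" is an assumption; with many tied values (for example, a large number of respondents reporting the same minutes) the top group may not contain exactly 20% of respondents, which is worth addressing in the plan.

```r
# Made-up weekly minutes for illustration only.
minutes <- c(0, 30, 60, 90, 120, 150, 180, 240, 300, 600)

# 80th percentile of the observed distribution (missing values ignored).
cutoff <- quantile(minutes, probs = 0.8, na.rm = TRUE)

# Assumption: "High" means strictly above the 80th percentile.
ModExerciseCat <- factor(
  ifelse(minutes > cutoff, "High", "Other"),
  levels = c("Other", "High")
)

# Check the proportion that landed in each group.
print(prop.table(table(ModExerciseCat)))
```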