Lab 1: Using SPSS, Graphing, Basic Descriptive Statistics, z-scores, & t-tests
TURN IN:
- A Word document containing only your answers to the questions below, including appropriately labeled and formatted supporting tables and graphs ~120 points
- Copies of your annotated SPSS syntax and output ~10 points
a) Annotating syntax means adding notes and comments, for yourself and others, that SPSS does not read as code. In SPSS you do this in the syntax window. Click where you want to add your comment, begin it with an asterisk (*), and end it with a period (.). If the comment is entered correctly, it will turn gray. Annotate your code so that I can easily identify which question each block of code addresses.
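As a minimal sketch of what an annotated block might look like (the variable name season is an assumption; use whatever name you gave the variable):

```spss
* Part I, question 3a: counts and percentages for the categorical variables.
FREQUENCIES VARIABLES=season.
```

The comment line starts with an asterisk and ends with a period, so SPSS skips it and runs only the FREQUENCIES command.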
Part I: Entering and analyzing a small data set (30 points)
Radial Growth (RGR) of the Lichen R. geographicum in 17 Successive 3-Month Periods in North Wales in Relation to Eight Climatic Variables
RGR | Temp_max | Air Frosts | Rain Days | Rainfall | Temp_min | Ground Frosts | Sunshine hours | Wind Speed | Season
0.04 | 7.6 | 8 | 59 | 207.7 | 6.2 | 16 | 609 | 8.6 | 1
0.58 | 20.1 | 0 | 38 | 306 | 11.6 | 0 | 237.3 | 8.1 | 2
0.15 | 10.6 | 3 | 47 | 317.7 | 5.5 | 33 | 181.9 | 10.7 | 3
0.18 | 11.6 | 17 | 51 | 194.5 | 1.9 | 49 | 171.6 | 11.8 | 4
0.07 | 8.3 | 8 | 24 | 97.8 | 6.1 | 33 | 619 | 8.1 | 1
0.37 | 19.6 | 0 | 41 | 287.4 | 11.4 | 1 | 287.4 | 8.7 | 2
0.33 | 19.4 | 3 | 68 | 457.7 | 6 | 25 | 186.9 | 9.3 | 3
0.17 | 6.8 | 43 | 44 | 175.8 | 0.26 | 57 | 276.8 | 8.7 | 4
0.10 | 13.9 | 2 | 48 | 295.6 | 6.9 | 19 | 488.5 | 10 | 1
0.29 | 17.9 | 0 | 63 | 328 | 12 | 0 | 318.4 | 10 | 2
0.16 | 10.6 | 17 | 52 | 318.8 | 4.2 | 41 | 200.6 | 8.7 | 3
0.35 | 19.8 | 34 | 49 | 233.4 | 0.87 | 52 | 217.4 | 8.4 | 4
0.18 | 14.5 | 2 | 48 | 197.8 | 7 | 16 | 521.2 | 8.5 | 1
0.14 | 18 | 0 | 42 | 231.1 | 11.3 | 3 | 495.5 | 7.5 | 2
0.22 | 10.9 | 13 | 57 | 463.9 | 5 | 26 | 223 | 8.6 | 3
0.20 | 8.9 | 8 | 72 | 349.8 | 4 | 17 | 199.2 | 12.4 | 4
0.34 | 17.9 | 2 | 31 | 140.3 | 7.2 | 20 | 220.3 | 6.7 | 1
1. Enter the data into SPSS:
Give each variable a label and set its type in the Variable View window. Set the number of decimal places appropriately for each variable. For the Season variable, create value labels such that 1 = spring, 2 = summer, 3 = fall, and 4 = winter. (Hint: You should be able to copy and paste columns of data into SPSS; remember to exclude the variable names when copying.)
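The same settings can also be applied from the syntax window. The following is only a sketch: the variable names rgr and season, the label text, and the display formats are assumptions chosen for illustration.

```spss
* Part I, question 1: variable labels, value labels, and display formats.
* The names rgr and season are assumed; substitute your own variable names.
VARIABLE LABELS rgr 'Radial growth (RGR)' /season 'Season'.
VALUE LABELS season 1 'Spring' 2 'Summer' 3 'Fall' 4 'Winter'.
FORMATS rgr (F4.2) /season (F1.0).
```

Extend the VARIABLE LABELS and FORMATS lists to cover the remaining climatic variables.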
2. Create a variable that indicates which observations fall above the mean RGR and which ones fall below it. Do the same for Rainfall. Be sure to give your new variables names, labels, value labels, and set the level of measurement. (10 pts)
(Hint: In SPSS use Transform → Recode into Different Variables).
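One possible syntax equivalent of the recode is sketched below. The names rgr and rgr_above are hypothetical, and the cutoff 0.50 is a placeholder only; replace it with the mean that DESCRIPTIVES reports for your data.

```spss
* Part I, question 2: flag observations above or below the mean RGR.
* Step 1: obtain the sample mean.
DESCRIPTIVES VARIABLES=rgr /STATISTICS=MEAN.
* Step 2: recode around the mean; 0.50 is a placeholder, not the actual mean.
RECODE rgr (LOWEST THRU 0.50=0) (0.50 THRU HIGHEST=1) INTO rgr_above.
VARIABLE LABELS rgr_above 'RGR relative to the sample mean'.
VALUE LABELS rgr_above 0 'At or below mean' 1 'Above mean'.
VARIABLE LEVEL rgr_above (NOMINAL).
EXECUTE.
```

RECODE applies the first specification that matches, so a value exactly equal to the cutoff is coded 0. Repeat the same pattern for Rainfall.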
3. Use the data to answer the following questions:
a. What are the counts and percentages for each of the categorical variables? (5 pts)
b. Based on Season and the new rainfall variable, during which season does it rain the most? Explain the basis for your choice. (3 pts)
c. Create a bar graph that depicts RGR by season. Add a title to the graph, then copy the graph and paste it into the document. (5 pts) There are a few ways you can answer this question. You only need to include 1 figure.
d. Create a histogram of RGR overall and another paneled by the new rainfall variable. Add a title to each graph, then copy the graphs and paste them into the document. (5 pts)
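The tables and graphs requested in question 3 could be produced with syntax along these lines. This is a sketch under assumptions: rgr, season, and rain_above are hypothetical variable names, and the titles are placeholders.

```spss
* Part I, question 3a: counts and percentages for the categorical variables.
FREQUENCIES VARIABLES=season rain_above.
* Question 3b: season by the recoded rainfall variable, with row percentages.
CROSSTABS /TABLES=season BY rain_above /CELLS=COUNT ROW.
* Question 3c: mean RGR by season as a simple bar chart.
GRAPH /BAR(SIMPLE)=MEAN(rgr) BY season /TITLE='Mean RGR by Season'.
* Question 3d: overall histogram, then one paneled in rows by rainfall group.
GRAPH /HISTOGRAM=rgr /TITLE='Distribution of RGR'.
GRAPH /HISTOGRAM=rgr /PANEL ROWVAR=rain_above ROWOP=CROSS
  /TITLE='Distribution of RGR by Rainfall Group'.
```

The bar chart of mean RGR is only one of the several acceptable ways to depict RGR by season.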
4. Paste a screenshot of the Variable View into the document. (2 pts)
Part II: Computing and comparing descriptive statistics (25 points)
Using the Calcium.sav data set available on Blackboard, answer the following questions:
5. Compute descriptive statistics for Alkaline Phosphatase, Calcium, and Inorganic Phosphorus.
a. Compare the mean, median, and mode in words for each variable. (5 pts)
b. Create a histogram of Calcium paneled in rows by gender and a boxplot of Alkaline Phosphatase by gender. Edit each graph to have an appropriate title and labels, then paste the graphs into this document. (5 pts)
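A sketch of syntax for the descriptive statistics and graphs above follows. The variable names alkphos, cammol, phosmmol, and sex are assumptions; check the Variable View of Calcium.sav for the actual names.

```spss
* Part II, part a: mean, median, and mode without the full frequency tables.
FREQUENCIES VARIABLES=alkphos cammol phosmmol
  /FORMAT=NOTABLE /STATISTICS=MEAN MEDIAN MODE.
* Part II, part b: histogram paneled in rows by gender.
GRAPH /HISTOGRAM=cammol /PANEL ROWVAR=sex ROWOP=CROSS
  /TITLE='Calcium by Gender'.
* Part II, part b: boxplot of alkaline phosphatase by gender.
EXAMINE VARIABLES=alkphos BY sex /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.
```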
6. Compare characteristics between males and females in the Calcium data set and fill in the table below. For continuous variables, report the mean and standard deviation. For categorical variables, report the percent in each category. (Hint: Use Crosstabs for categorical variables and Compare Means for continuous variables.) Then write a paragraph describing the differences between the men and women in this sample based on your analysis. (15 pts)
Table 1: Characteristics of the patients in Calcium.sav
Characteristic | Male (n=) | Female (n=)
 | Mean (sd) or N (%) | Mean (sd) or N (%)
Age (years) | |
Alkaline Phosphatase (IU/Liter) | |
Calcium (mmol/L) | |
Inorganic Phosphorus (mmol/L) | |
Lab | |
Metpath | |
Deyor | |
St. Elizabeth's | |
CB Rouche | |
YOH | |
Horizon | |
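The Crosstabs and Compare Means hint for Table 1 might translate into syntax like the following. The variable names age, alkphos, cammol, phosmmol, lab, and sex are assumptions about Calcium.sav.

```spss
* Table 1, continuous characteristics: mean, standard deviation, and n by gender.
MEANS TABLES=age alkphos cammol phosmmol BY sex /CELLS=MEAN STDDEV COUNT.
* Table 1, categorical characteristic: lab by gender with column percentages.
CROSSTABS /TABLES=lab BY sex /CELLS=COUNT COLUMN.
```

Column percentages are requested so that each lab's share is computed within males and within females separately.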
Part III: z-scores and t-tests (65 points)
Use the bodyfat2.sav file from Blackboard to answer the following questions. This data set contains body measurements and body fat percentages calculated using two different methods for a random sample of 252 college graduates.
- Make histograms of adiposity, fatpct1, and fatpct2. Superimpose a normal curve on each. Do any of these distributions appear approximately normal? Describe how each distribution differs from the normal distribution (make sure to discuss skew and kurtosis). Include the histograms with appropriate titles and labeling in your response. (5 pts)
- Generate a simulated, normally distributed caloric intake value for each case in the file. To do this, use Transform → Compute Variable and enter the following in the dialog box:
Variable name: Calories
= RV.NORMAL(1500,200)
These instructions generate a random sample from a normal distribution with a mean of 1500 and a standard deviation of 200.
a. Make a histogram of Calories and superimpose a normal curve on it. (2pts)
In the following questions, "your sample" refers to the Calories variable you created in the data set. If it were exactly normally distributed, it would have the mean and standard deviation you requested. (Hint: You will NOT use SPSS to answer b, c, and d directly.)
b. If Calories is exactly normally distributed, what percentage of cases should have values between 1350 and 1600? What percentage of cases in your sample have Calories in this range? (2 pts)
c. What percentage of cases would you expect to have Calories of 1800 or more if Calories is exactly normally distributed? What percentage of cases in your sample have Calories greater than or equal to 1800? (2 pts)
d. What percentage of cases in your sample have Calories less than 1200? What would you expect if the distribution is exactly normal? (2 pts)
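The Compute Variable step and the histogram for the simulated Calories variable can also be run from the syntax window. A minimal sketch, with a placeholder title:

```spss
* Part III: simulate caloric intake from a normal distribution.
* RV.NORMAL(1500,200) draws from a normal with mean 1500 and SD 200.
COMPUTE Calories = RV.NORMAL(1500,200).
EXECUTE.
* Part a: histogram of Calories with a normal curve superimposed.
GRAPH /HISTOGRAM(NORMAL)=Calories /TITLE='Simulated Caloric Intake'.
```

Because the values are random draws, each run produces a slightly different sample, so your sample percentages will not match anyone else's exactly.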
- Compute standard scores for the Calories variable. (Hint: Use Analyze → Descriptive Statistics → Descriptives and check the "Save standardized values as variables" box. This creates a new variable in your data set called zcalories, which holds the z-score of Calories for each subject.)
- Make a histogram of the standard scores with a normal curve superimposed. (1 pt)
- What is the mean of this distribution? The standard deviation? Is this what you expected? Why? (2 pts)
- Using the z-score table, what percentage of the cases would you expect to have standard scores between -1 and 1? Between 0 and 1.5? Greater than 2? Less than -2? (4 pts)
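The standardization hint corresponds to the /SAVE subcommand of DESCRIPTIVES. A sketch, assuming the simulated variable is named Calories so that the saved variable is named ZCalories:

```spss
* Save z-scores for Calories; SPSS names the new variable by prefixing Z.
DESCRIPTIVES VARIABLES=Calories /SAVE.
* Histogram of the standardized values with a normal curve superimposed.
GRAPH /HISTOGRAM(NORMAL)=ZCalories /TITLE='Standardized Caloric Intake'.
* Mean and standard deviation of the standardized variable.
DESCRIPTIVES VARIABLES=ZCalories /STATISTICS=MEAN STDDEV.
```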
One-sample t-test
- Test the hypothesis that, for all college graduates, the population mean body fat percentage is 20 when using method 1.
a. What is the null hypothesis you want to test? What is the alternative hypothesis?
b. Test the hypotheses and write a brief summary of your conclusions. (Hint: You have ONE sample.)
c. What is the 95% confidence interval for the average body fat percentage using method 1? (Hint: Use the Explore procedure.) How does it differ from the 95% confidence interval of the difference reported in the one-sample t-test output?
d. Based on the 95% confidence interval for the mean, can you reject the null hypothesis that the average population value for body fat percentage using method 1 is 18%? Explain.
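The one-sample test and the Explore confidence interval could be requested as follows. This sketch assumes that fatpct1 is the method 1 variable.

```spss
* One-sample t-test of fatpct1 against a hypothesized mean of 20.
T-TEST /TESTVAL=20 /VARIABLES=fatpct1 /CRITERIA=CI(.95).
* 95% confidence interval for the mean from the Explore procedure.
EXAMINE VARIABLES=fatpct1 /PLOT=NONE /STATISTICS=DESCRIPTIVES /CINTERVAL 95.
```

The t-test output reports an interval for the mean difference from the test value, while Explore reports an interval for the mean itself.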
Independent samples t-test
- We can use the body fat data to test hypotheses about differences in body fat percentages based on gender and age.
- Describe the age and gender composition of those in the sample. Be sure to look at age by gender as well.
d. Test the null hypothesis that men and women have the same body fat percentage using method 1. List your null and alternative hypotheses and summarize the results. Can we assume equality of variance for the two groups? Why or why not?
e. Select only those 50 years old or older. (Hint: Use Data → Select Cases → "If condition is satisfied", click If, and enter the condition age > 49.9.) Test the null hypothesis that the average body fat percentage is the same for men and women 50 years old or older. List your null and alternative hypotheses and summarize the results. Can we assume equality of variance for the two groups? Why or why not?
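A sketch of the case selection and the independent-samples test in syntax form follows. The variable names age and gender and the group codes 0 and 1 are assumptions; check the value labels in bodyfat2.sav before running it.

```spss
* Keep only cases aged 50 or older by filtering on a temporary indicator.
USE ALL.
COMPUTE filter_$ = (age > 49.9).
FILTER BY filter_$.
EXECUTE.
* Independent-samples t-test of fatpct1 by gender (codes 0 and 1 assumed).
T-TEST GROUPS=gender(0 1) /VARIABLES=fatpct1.
* Turn the filter off so later analyses use the full sample again.
FILTER OFF.
USE ALL.
```

The output includes Levene's test, which bears on the equality-of-variance question.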
Paired samples t-test
- Use Compute to create a new variable that is the difference between the two body fat percent variables (fatpct1 and fatpct2). (Hint: Use Transform → Compute Variable.)
a. Make a histogram of the difference.
b. Make a Q-Q plot of the differences. How does the distribution differ from the normal distribution? (Hint: Use Analyze → Descriptive Statistics → Explore; under the "Plots" button, check the "Normality plots with tests" box.)
c. Perform a paired t-test using the Paired-Samples T-Test procedure. Write a brief summary of your results. Be sure to state your null and alternative hypotheses. (Hint: Use the original variables)
d. Run a one-sample t-test on the difference variable. What is the difference you are testing for? Write a brief summary of your results. Be sure to state your null and alternative hypotheses.
e. Compare the results of the two tests. In what ways are the two tests different?
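The steps in this section can be sketched in syntax as shown below. The name fat_diff is hypothetical, and the direction of subtraction (fatpct1 minus fatpct2) is an assumption.

```spss
* Difference between the two body fat measurements.
COMPUTE fat_diff = fatpct1 - fatpct2.
EXECUTE.
* Part a: histogram of the difference.
GRAPH /HISTOGRAM=fat_diff /TITLE='Difference in Body Fat Percentage'.
* Part b: normal Q-Q plots and normality tests from Explore.
EXAMINE VARIABLES=fat_diff /PLOT=NPPLOT.
* Part c: paired-samples t-test on the original variables.
T-TEST PAIRS=fatpct1 WITH fatpct2 (PAIRED).
* Part d: one-sample t-test of the difference against 0.
T-TEST /TESTVAL=0 /VARIABLES=fat_diff.
```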