1. Use the REACTIONTIME data set. We want to look at the relationship between reaction time and IQ, looking only at visual stimulus (use a WHERE statement to limit the results).
a. Plot the data, with reaction time on the vertical axis. Turn in the plot, and briefly comment on what you see.
b. Obtain the Pearson correlation. Do not turn in the printout, but report the correlation and the corresponding p-value.
c. A colleague sees these results and concludes that people with higher IQs react more quickly to a visual stimulus than people with lower IQs. Do you agree with this conclusion? Why or why not?
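One possible setup for parts (a) and (b) is sketched below. It is not the required solution. The data set name reactiontime and the variable name iq are assumptions (check them with PROC CONTENTS); the names reaction and stimulus and the value 'visual' are taken from the GLM output shown in problem 4.

```sas
/* Sketch only: data set name REACTIONTIME and variable IQ are assumed names. */
proc plot data=reactiontime;
  where stimulus = 'visual';    /* keep only the visual-stimulus observations */
  plot reaction*iq;             /* vertical*horizontal: reaction time on the vertical axis */
run;
quit;

proc corr data=reactiontime pearson;
  where stimulus = 'visual';
  var reaction iq;              /* prints the correlation and its p-value */
run;
```

Character comparisons in a WHERE statement are case-sensitive, so the quoted value has to match the way the stimulus level is stored in the data set.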
2. Waist-to-hip ratio (WHR) was measured on 8 men just before they entered a weight loss program (Time 1) and again 6 months after the program (Time 2). The results are below.
Subject     1     2     3     4     5     6     7     8
Time 1   1.03  0.99  1.18  0.80  1.02  1.05  1.06  0.82
Time 2   1.01  1.02  1.12  0.78  1.06  1.00  1.08  0.76
/* WHR data from the table above: one row per subject, Time 1 value then Time 2 value */
data two;
  input time1 time2;
  datalines;
1.03 1.01
0.99 1.02
1.18 1.12
0.80 0.78
1.02 1.06
1.05 1.00
1.06 1.08
0.82 0.76
;
run;
a. Use PROC PLOT to plot the data, with the Time 2 WHR on the vertical axis. Turn in the plot, and briefly comment on what you see.
b. Calculate Spearman's correlation "by hand". Since there are no ties in either variable, use the easier method based on the rank differences d_i for this calculation.
c. Use PROC CORR to obtain Pearson's and Spearman's correlations. Turn in the printout.
d. Use PROC REG to analyze the data, treating Time 2 WHR as the dependent variable (thus, you are predicting Time 2 WHR from Time 1 WHR). Use the P, R, CLM, and CLI options to get predicted values, residuals, confidence intervals, and prediction intervals, respectively. Turn in the printout.
e. Use the prediction equation to predict the Time 2 WHR when the Time 1 WHR is 1.07.
f. On the printout, highlight the following:
- the largest absolute residual (the point where observed and predicted values differ the most)
- a 95% confidence interval for mean Time 2 WHR when the Time 1 WHR is 1.18
- a 95% prediction interval for an individual Time 2 WHR when the Time 1 WHR is 1.05
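For part (b), the shortcut formula for Spearman's correlation when there are no ties is written out below. The rank notation is only illustrative; use whatever notation your course notes use.

```latex
\[
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)},
\qquad
d_i = \operatorname{rank}(\text{time1}_i) - \operatorname{rank}(\text{time2}_i),
\qquad
n = 8 .
\]
```

For parts (a), (c), and (d), a minimal sketch of the three procedure calls follows. It assumes the data set two created by the DATA step in this problem and uses only the procedures and options named in the assignment.

```sas
/* Sketch only: uses the data set TWO from the DATA step in this problem. */
proc plot data=two;
  plot time2*time1;                   /* Time 2 WHR on the vertical axis */
run;
quit;

proc corr data=two pearson spearman;  /* both correlations in one call */
  var time1 time2;
run;

proc reg data=two;
  model time2 = time1 / p r clm cli;  /* predicted values, residuals, CIs for the mean, prediction intervals */
run;
quit;
```

The PROC CORR result for Spearman's correlation gives a check on the hand calculation in part (b).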
3. Again use the REACTIONTIME data set. Now we want to predict reaction time from age, looking only at tactile stimulus (again use a WHERE statement to limit the results).
a. Using PROC PLOT, plot the data, with the correct variable on the vertical axis. Turn in the plot, and briefly comment on what you can see from it.
b. Use PROC REG, with only the CLB option, to analyze the data. Turn in the printout.
c. Give an estimate of the variance, s².
d. Give and interpret the R² of this analysis.
e. Interpret what the estimated slope means. Be as specific to this problem as possible.
f. Give a 95% confidence interval for the slope.
g. Give the t₀ test statistic and corresponding p-value used to test whether the slope is zero, then interpret the result of this test. Use α = 0.05.
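A sketch of how parts (a) and (b) might be set up is given below. The data set name reactiontime is an assumption; the variable names reaction, age, and stimulus and the value 'tactile' are taken from the GLM output shown in problem 4.

```sas
/* Sketch only: data set name REACTIONTIME is an assumed name. */
proc plot data=reactiontime;
  where stimulus = 'tactile';     /* keep only the tactile-stimulus observations */
  plot reaction*age;              /* response variable on the vertical axis */
run;
quit;

proc reg data=reactiontime;
  where stimulus = 'tactile';
  model reaction = age / clb;     /* CLB adds confidence limits for the parameter estimates */
run;
quit;
```

In the PROC REG printout, the estimate of the variance is the Error mean square (the square of Root MSE), R² is labeled R-Square, and the t Value for age is the slope estimate divided by its standard error. CLB gives 95% limits unless a different ALPHA= value is requested.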
4. I ran an ANACOVA model on the REACTIONTIME data set, predicting reaction time from three independent variables: stimulus type, sex, and age. Edited results are below.
The GLM Procedure

Class Level Information
Class       Levels    Values
stimulus         3    auditory tactile visual
sex              2    female male

Dependent Variable: reaction

                                 Sum of
Source               DF         Squares     Mean Square    F Value    Pr > F
Model                 4     1050811.148      262702.787      49.45    <.0001
Error               258     1370500.579        5312.018
Corrected Total     262     2421311.726

R-Square    Coeff Var    Root MSE    reaction Mean
0.433984     15.23198    72.88359         478.4905

Source               DF       Type I SS     Mean Square    F Value    Pr > F
stimulus              2     133829.0964      66914.5482      12.60    <.0001
sex                   1      13119.2442      13119.2442       2.47    0.1173
age                   1     903862.8072     903862.8072     170.15    <.0001

Source               DF     Type III SS     Mean Square    F Value    Pr > F
stimulus              2     206241.4252     103120.7126      19.41    <.0001
sex                   1        480.4163        480.4163       0.09    0.7639
age                   1     903862.8072     903862.8072     170.15    <.0001

                                            Standard
Parameter                 Estimate             Error    t Value    Pr > |t|
Intercept              266.6040715 B     16.14383997      16.51      <.0001
stimulus auditory       36.0504131 B     10.85996401       3.32      0.0010
stimulus tactile        68.0453367 B     11.02424621       6.17      <.0001
stimulus visual          0.0000000 B               .          .           .
sex female               2.7490637 B      9.14125209       0.30      0.7639
sex male                 0.0000000 B               .          .           .
age                      3.5157761        0.26952541      13.04      <.0001

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

                  reaction      LSMEAN
stimulus            LSMEAN      Number
auditory        484.403034           1
tactile         516.397958           2
visual          448.352621           3

Least Squares Means for effect stimulus
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: reaction

i/j            1         2         3
1                   0.0204    0.0030
2         0.0204              <.0001
3         0.0030    <.0001
a. Give the p-value for the test of a relationship between reaction time and age (adjusting for the other variables in the model), then interpret the test result. Also give the estimated slope of this relationship.
b. For each of the other two independent variables, give the p-value for the test of a relationship between reaction time and that variable (adjusting for the other variables in the model), then interpret the test result. When necessary, complete the interpretation by making use of the least squares means information.
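For reference, a call of roughly the following shape could produce output like that shown above. This is a sketch, not the code actually used: the data set name reactiontime and the exact options are assumptions inferred from the printout.

```sas
/* Sketch only: data set name and options are assumptions inferred from the output. */
proc glm data=reactiontime;
  class stimulus sex;                            /* categorical predictors */
  model reaction = stimulus sex age / solution;  /* SOLUTION prints the parameter estimates */
  lsmeans stimulus / pdiff adjust=tukey;         /* adjusted pairwise p-values for the LS means */
run;
quit;
```

In the printout, the last level of each class variable (visual, male) has an estimate of zero and serves as the reference level. The two stimulus estimates are therefore differences from visual, and they agree with the differences in the least squares means: 484.403034 - 448.352621 = 36.050413 and 516.397958 - 448.352621 = 68.045337.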
5. A logistic regression is used to model the probability of having lung cancer, using smoking status (yes or no, where no is the referent level) and age as independent variables. Edited results of this analysis are below.
The LOGISTIC Procedure

Model Information
Response Variable              LungCa
Number of Response Levels      2
Number of Observations Read    327
Number of Observations Used    327

Response Profile
Ordered                   Total
  Value    LungCa     Frequency
      1    yes              122
      2    no               205

Probability modeled is LungCa='yes'.

Model Fit Statistics
                              Intercept
               Intercept            and
Criterion           Only     Covariates
AIC              434.019        404.410
SC               437.809        415.780
-2 Log L         432.019        398.410

Testing Global Null Hypothesis: BETA=0
Test                  Chi-Square    DF    Pr > ChiSq
Likelihood Ratio         33.6087     2        <.0001
Score                    32.6373     2        <.0001
Wald                     30.1337     2        <.0001

Type 3 Analysis of Effects
                      Wald
Effect      DF    Chi-Square    Pr > ChiSq
smoker       1       22.3236        <.0001
age          1       10.3185        0.0013

Analysis of Maximum Likelihood Estimates
                                   Standard          Wald
Parameter       DF    Estimate        Error    Chi-Square    Pr > ChiSq
Intercept        1     -3.0705       0.6647       21.3417        <.0001
smoker yes       1      1.1440       0.2421       22.3236        <.0001
age              1      0.0329       0.0103       10.3185        0.0013

Odds Ratio Estimates
                         Point           95% Wald
Effect                Estimate       Confidence Limits
smoker yes vs no         3.139       1.953       5.046
age                      1.033       1.013       1.054
a. Give and interpret the odds ratio for each independent variable (it might be easiest to interpret these as estimates of relative risks).
b. For each variable, give the CI for the odds ratio, state whether the CI includes 1, and explain what this implies.
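As a reminder of where the odds ratios come from, each one is the exponentiated coefficient from the Analysis of Maximum Likelihood Estimates table, and the Wald limits are the exponentiated limits for the coefficient. The worked values below use the displayed (rounded) estimates and the normal quantile 1.96, so they match the printout only up to rounding.

```latex
\[
\widehat{OR}_{\text{smoker}} = e^{1.1440} \approx 3.139,
\qquad
\widehat{OR}_{\text{age}} = e^{0.0329} \approx 1.033 \ \text{(per one-year increase in age)}
\]
\[
\text{95\% Wald CI for smoker: }
\exp\!\left(1.1440 \pm 1.96 \times 0.2421\right) \approx (1.953,\ 5.046)
\]
\[
\text{Odds ratio for a 10-year increase in age: } e^{10 \times 0.0329} = e^{0.329} \approx 1.39
\]
```

The age odds ratio applies to a one-unit change, so the last line shows how the same coefficient scales to a larger age difference.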
6. Survival analysis is used to model survival time (in months) after a diagnosis of pancreatic cancer. A Kaplan-Meier curve is presented for the two treatment types (Experimental or Standard). Also, a Cox proportional hazards model is used, which adjusts for sex, race, and BMI. Edited results are below.
The PHREG Procedure

Model Information
Data Set                       WORK.ONE
Dependent Variable             months
Censoring Variable             censor
Number of Observations Read    508
Number of Observations Used    508

Type 3 Tests
                       Wald
Effect       DF    Chi-Square    Pr > ChiSq
treatment     1       19.5948        <.0001
sex           1        0.0003        0.9851
race          1        7.5735        0.0059
BMI           1        5.2431        0.0220

Analysis of Maximum Likelihood Estimates
                     Parameter    Standard                                Hazard     95% Hazard Ratio
Parameter      DF     Estimate       Error    Chi-Square    Pr > ChiSq     Ratio    Confidence Limits
treatment E     1      0.48363     0.10926       19.5948        <.0001     1.622     1.309     2.009
sex F           1     -0.00205     0.10961        0.0003        0.9851     0.998     0.805     1.237
race B          1      0.29154     0.10594        7.5735        0.0059     1.338     1.088     1.647
BMI             1      0.04968     0.02170        5.2431        0.0220     1.051     1.007     1.097
a. Give the hazard ratio and corresponding confidence interval for treatment (note that Standard is the referent level). Interpret this hazard ratio, state whether its CI includes 1, and explain what this implies.
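As with the odds ratios in logistic regression, the hazard ratio in the printout is the exponentiated parameter estimate, and its confidence limits are the exponentiated Wald limits for the coefficient. The worked values below use the displayed estimates and the normal quantile 1.96, so they match the printout only up to rounding.

```latex
\[
\widehat{HR}_{\text{treatment (E vs. S)}} = e^{0.48363} \approx 1.622
\]
\[
\text{95\% CI: }
\exp\!\left(0.48363 \pm 1.96 \times 0.10926\right)
= \left(e^{0.26948},\ e^{0.69778}\right)
\approx (1.309,\ 2.009)
\]
```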