Mat5212 biostatistics assignment perform the mann-whitney


BIOSTATISTICS ASSIGNMENT-

QUESTION 1- In total, 34 students sat the test. Let $x_i$ be student $i$'s assignment mark and $y_i$ be the same student's test mark. Below is the summary information needed to perform the regression.

$\sum_{i=1}^{34} x_i = 1064.5$, $\sum_{i=1}^{34} y_i = 1043.5$, $\sum_{i=1}^{34} x_i^2 = 37{,}921.75$, $\sum_{i=1}^{34} y_i^2 = 35{,}487.25$ and $\sum_{i=1}^{34} x_i y_i = 35{,}654.75$

(a) Calculate the sums of squares $S_{xx}$, $S_{yy}$ and $S_{xy}$.
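A minimal Python sketch of part (a), assuming the standard computational formulas $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$, $S_{yy} = \sum y_i^2 - (\sum y_i)^2/n$ and $S_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$. The variable names are illustrative only.

```python
# Summary sums from Question 1 (n = 34 students)
n = 34
sum_x, sum_y = 1064.5, 1043.5
sum_x2, sum_y2, sum_xy = 37921.75, 35487.25, 35654.75

# Corrected sums of squares and cross-products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

print(f"Sxx = {sxx:.4f}, Syy = {syy:.4f}, Sxy = {sxy:.4f}")
```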

(b) Calculate the correlation coefficient and the coefficient of determination. How much of the variability in test results is explained by the assignment marks?
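One way to check the arithmetic for part (b) is sketched below, assuming the usual Pearson formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. The square $r^2$ is then the proportion of the variability in test marks explained by assignment marks.

```python
import math

n = 34
sum_x, sum_y = 1064.5, 1043.5
sum_x2, sum_y2, sum_xy = 37921.75, 35487.25, 35654.75

# Corrected sums of squares and cross-products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

# Pearson correlation and coefficient of determination
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.4f} ({100 * r_squared:.1f}% of test-mark variability)")
```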

(c) Find the least-squares estimates for the regression of assignment marks onto test scores (test mark as the response, assignment mark as the predictor) and write down the fitted equation.
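A sketch of part (c), assuming the test mark is the response $y$ and the assignment mark is the predictor $x$ (the reading implied by parts (e) and (f)), with the textbook estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$.

```python
n = 34
sum_x, sum_y = 1064.5, 1043.5
sum_x2, sum_xy = 37921.75, 35654.75

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

# Least-squares slope and intercept for test mark (y) on assignment mark (x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```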

(d) Perform one hypothesis test of whether $\beta_0 = 0$ and another of whether $\beta_1 = 0$ (you may use any of the three methods covered in class).
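The sketch below shows one possible approach to part (d), the t-test, assuming the textbook standard errors $\mathrm{se}(b_1) = s/\sqrt{S_{xx}}$ and $\mathrm{se}(b_0) = s\sqrt{1/n + \bar{x}^2/S_{xx}}$ with $s^2 = (S_{yy} - b_1 S_{xy})/(n-2)$. The two-sided critical value $t_{0.025,32} \approx 2.037$ is hard-coded from a t table (an assumption to check against the table used in class), because the Python standard library has no t distribution.

```python
import math

n = 34
sum_x, sum_y = 1064.5, 1043.5
sum_x2, sum_y2, sum_xy = 37921.75, 35487.25, 35654.75

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation: s^2 = SSE / (n - 2)
sse = syy - b1 * sxy
s = math.sqrt(sse / (n - 2))

# Standard errors and t statistics under H0: beta = 0
se_b1 = s / math.sqrt(sxx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

t_crit = 2.037  # assumed table value of t_{0.025, 32}
for name, t in (("beta0", t_b0), ("beta1", t_b1)):
    verdict = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    print(f"{name}: t = {t:.3f} -> {verdict}")
```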

(e) What is the 95% confidence interval for the expected grade on the test for a student who scored 40 on the assignment?

(f) What is the 95% prediction interval for the grade on the test for a student who scored 40 on the assignment? Comment on this result.
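Parts (e) and (f) differ only in the standard error: the confidence interval for the mean response uses $s\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}$, while the prediction interval for a single student adds 1 under the square root, which is why it is much wider. A sketch, assuming the test mark is the response and $t_{0.025,32} \approx 2.037$ from a t table:

```python
import math

n = 34
sum_x, sum_y = 1064.5, 1043.5
sum_x2, sum_y2, sum_xy = 37921.75, 35487.25, 35654.75

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt((syy - b1 * sxy) / (n - 2))

x0 = 40.0
y_hat = b0 + b1 * x0
t_crit = 2.037  # assumed table value of t_{0.025, 32}

# Leverage-style term shared by both intervals
h0 = 1 / n + (x0 - x_bar) ** 2 / sxx
half_ci = t_crit * s * math.sqrt(h0)      # mean response
half_pi = t_crit * s * math.sqrt(1 + h0)  # single new observation

ci = (y_hat - half_ci, y_hat + half_ci)
pi = (y_hat - half_pi, y_hat + half_pi)
print(f"y_hat = {y_hat:.2f}; 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}); 95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```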

(g) To estimate how the students would do on the exam, the lecturer looked at last year's exam and test results and found that they had a correlation of r = 1.05. What can you conclude from this analysis?

QUESTION 2- The percentage of body fat is usually measured by a process called hydrostatic weighing, in which the person is weighed while submerged in water and the submerged weight is compared with their normally measured body weight. This is expensive and not always practical. The percentage of body fat of 78 males from Scotland aged between 18 and 30 years was measured along with their age (years), body mass (kg) and height (cm). A regression of age, height and body mass onto the percentage of body fat was carried out to develop a model for predicting body fat percentage from easier-to-obtain measurements. The SPSS output of this regression is shown below.

Coefficients(a)

Model 1     | Unstandardized B | Std. Error | Standardized Beta | t      | Sig.
(Constant)  | 19.957           | 10.638     |                   | 1.876  | .065
Age         | .120             | .052       | .252              | 2.308  | .024
Body Mass   | .131             | .057       | .281              | 2.288  | .025
Height      | -.119            | .065       | -.227             | -1.837 | .070

a. Dependent Variable: %Body Fat

ANOVA(a)

Model 1     | Sum of Squares | df | Mean Square | F     | Sig.
Regression  | 183.395        | 3  | 61.132      | 4.763 | .004(b)
Residual    | 949.683        | 74 | 12.834      |       |
Total       | 1133.078       | 77 |             |       |

a. Dependent Variable: %Body Fat
b. Predictors: (Constant), Height, Age, Body Mass

Model Summary(b)

Model | R       | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .402(a) | .162     | .128              | 3.5824

a. Predictors: (Constant), Height, Age, Body Mass
b. Dependent Variable: %Body Fat

(a) What is the multiple correlation coefficient and the coefficient of multiple determination? Explain what each of these terms mean, and comment on their value.
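For part (a), the SPSS values can be reproduced from the ANOVA table, assuming the usual definitions $R^2 = SS_{\text{Regression}}/SS_{\text{Total}}$, $R = \sqrt{R^2}$ and adjusted $R^2 = 1 - (1-R^2)(n-1)/(n-p-1)$ with $n = 78$ and $p = 3$ predictors:

```python
import math

# Sums of squares from the SPSS ANOVA table
ss_reg, ss_total = 183.395, 1133.078
n, p = 78, 3  # sample size and number of predictors

r_squared = ss_reg / ss_total      # coefficient of multiple determination
multiple_r = math.sqrt(r_squared)  # multiple correlation coefficient
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(f"R = {multiple_r:.3f}, R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```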

(b) What is the equation of the line of best fit relating the body fat percentage with the other variables?

(c) Predict the body fat percentage of a 32-year-old male of height 181 cm and body mass 104 kg.
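A sketch for parts (b) and (c) that plugs the unstandardized coefficients (the B column) into the fitted equation; `predict_body_fat` is a hypothetical helper name. Note that a 32-year-old lies just outside the 18 to 30 age range of the sample, so this prediction is a mild extrapolation.

```python
# Unstandardized coefficients (B) from the SPSS Coefficients table
b0, b_age, b_mass, b_height = 19.957, 0.120, 0.131, -0.119


def predict_body_fat(age_years: float, mass_kg: float, height_cm: float) -> float:
    """Fitted %body fat from the three-predictor model (hypothetical helper)."""
    return b0 + b_age * age_years + b_mass * mass_kg + b_height * height_cm


# Part (c): 32-year-old male, 181 cm tall, 104 kg
print(f"{predict_body_fat(32, 104, 181):.2f}%")
```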

(d) Examine the ANOVA table in the SPSS output. What are the null and alternative hypotheses corresponding to the results of this ANOVA table? What can you conclude based on the results shown in this table?
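For part (d), the overall F statistic is the ratio of mean squares, $F = MS_{\text{Regression}}/MS_{\text{Residual}}$, testing $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ against the alternative that at least one slope is non-zero. The sketch below rebuilds F from the sums of squares and degrees of freedom; rather than computing an F-distribution p-value (not available in the Python standard library), it uses the Sig. value reported by SPSS, at an assumed 5% level.

```python
# Entries from the SPSS ANOVA table
ss_reg, df_reg = 183.395, 3
ss_res, df_res = 949.683, 74
reported_sig = 0.004  # SPSS "Sig." column, i.e. the p-value of the F test
alpha = 0.05

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res
print(f"F({df_reg}, {df_res}) = {f_stat:.3f}, p = {reported_sig}")
print("reject H0: all slopes are zero" if reported_sig < alpha else "fail to reject H0")
```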

(e) Which of the variables are significant predictors of body fat percentage? Explain your reasoning.
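For part (e), a sketch of the decision rule, assuming a 5% significance level and the two-sided Sig. values reported by SPSS. The ratio B / Std. Error is recomputed only as a rough check on the t column, since the table entries are rounded.

```python
# (B, Std. Error, Sig.) for each predictor, from the SPSS Coefficients table
coefficients = {
    "Age": (0.120, 0.052, 0.024),
    "Body Mass": (0.131, 0.057, 0.025),
    "Height": (-0.119, 0.065, 0.070),
}
alpha = 0.05

significant = []
for name, (b, se, sig) in coefficients.items():
    t = b / se  # approximately reproduces the t column (inputs are rounded)
    if sig < alpha:
        significant.append(name)
    label = "significant" if sig < alpha else "not significant"
    print(f"{name}: t ~ {t:.2f}, p = {sig:.3f} -> {label}")
```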

(f) Look at the residual plots below. Explain what each of them means and comment on the validity of the assumptions of multiple linear regression in this case.

[Figure: residual plots for the body fat regression]

QUESTION 3- A birth is considered premature if the baby is born before 37 weeks. It is believed that smoking while pregnant can lead to a premature birth. Below is summary data from a study done on births in one particular hospital.

                              | Born < 37 weeks | Born ≥ 37 weeks
Smoked throughout             | 36              | 370
Never smoked during pregnancy | 168             | 3396

(a) Set up a 2 × 2 contingency table with the observed and expected values for each of the four categories as shown in the lectures.
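A sketch for part (a), assuming the usual rule that the expected count in each cell under independence is (row total × column total) / grand total:

```python
# Observed counts: rows = smoking status, columns = (born < 37 weeks, born >= 37 weeks)
observed = [
    [36, 370],    # smoked throughout
    [168, 3396],  # never smoked during pregnancy
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```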

(b) Determine at the 5% level of significance (α = 0.05) whether there is an association between premature births and smoking while pregnant. For full marks state the null and alternative hypotheses being tested, calculate the test statistic and the degrees of freedom, find the p-value, and interpret the results in terms of the hypotheses.
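For part (b), a sketch of the Pearson chi-square test without Yates' continuity correction (whether the lectures apply the correction is an assumption to check). With 1 degree of freedom the statistic is the square of a standard normal variable, so its upper-tail p-value is $\operatorname{erfc}(\sqrt{\chi^2/2})$ and no statistics library is needed.

```python
import math

observed = [[36, 370], [168, 3396]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (O - E)^2 / E, no continuity correction
chi2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (observed[i][j] - e) ** 2 / e
dof = (2 - 1) * (2 - 1)

# For 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.5f}")
```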

QUESTION 4- We saw a twins study of smoking-discordant monozygotic twins. Students who performed an independent t-test found no significant difference in the means at either the 1% (α = 0.01) or the 5% (α = 0.05) significance level, whereas students who performed a dependent (paired) t-test found that the mean difference was significant at the 5% but not the 1% level. The data are repeated below.

Non-Smoking Twin | Smoking Twin |   | Non-Smoking Twin | Smoking Twin
52.2             | 65.5         |   | 25.5             | 63.2
16.0             | 47.1         |   | 19.6             | 36.4
37.1             | 37.0         |   | 22.9             | 12.6
13.3             | 25.8         |   | 52.1             | 54.3
27.6             | 48.9         |   | 28.8             | 43.1
15.9             | 7.8          |   | 10.0             | 20.3

Assume we had reason to believe that the data may not be normally distributed. Instead, we will repeat both approaches using the equivalent non-parametric tests on the medians of the two groups.

(a) Perform the Mann-Whitney U-test on the two groups of data and determine whether the difference in the median values is significant at the 5% and 1% levels.
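A sketch of part (a), treating the two groups as independent samples (as in the independent t-test). It reads the pairs across the twins table, ranks all 24 values together (ties would receive average ranks), and applies the large-sample normal approximation to U without continuity correction; for $n_1 = n_2 = 12$ the exact critical values from the Mann-Whitney table used in class remain the authoritative check. `average_ranks` is a hypothetical helper.

```python
import math

# (non-smoking twin, smoking twin) pairs, read across the twins table
pairs = [
    (52.2, 65.5), (16.0, 47.1), (37.1, 37.0), (13.3, 25.8), (27.6, 48.9), (15.9, 7.8),
    (25.5, 63.2), (19.6, 36.4), (22.9, 12.6), (52.1, 54.3), (28.8, 43.1), (10.0, 20.3),
]
non_smoking = [a for a, _ in pairs]
smoking = [b for _, b in pairs]


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


n1, n2 = len(non_smoking), len(smoking)
ranks = average_ranks(non_smoking + smoking)
r1 = sum(ranks[:n1])  # rank sum of the non-smoking group
u1 = r1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1
u = min(u1, u2)

# Large-sample normal approximation (no tie or continuity correction)
mu = n1 * n2 / 2
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu) / sigma
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"U = {u:g}, z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```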

(b) Perform the Wilcoxon signed-rank test for the paired samples and determine whether the difference in the median values is significant at the 5% and 1% levels.
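A sketch of part (b) for the paired design: differences are taken within each twin pair (smoking minus non-smoking), zero differences are dropped, absolute differences are ranked with ties averaged, and the smaller signed-rank sum W is assessed with the normal approximation. Rounding each difference to one decimal place is an implementation detail so that equal magnitudes (the two differences of 10.3) are recognised as ties despite floating-point error. The exact critical values from a Wilcoxon signed-rank table for n = 12 remain the authoritative check; `average_ranks` is a hypothetical helper.

```python
import math

# (non-smoking twin, smoking twin) pairs, read across the twins table
pairs = [
    (52.2, 65.5), (16.0, 47.1), (37.1, 37.0), (13.3, 25.8), (27.6, 48.9), (15.9, 7.8),
    (25.5, 63.2), (19.6, 36.4), (22.9, 12.6), (52.1, 54.3), (28.8, 43.1), (10.0, 20.3),
]


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


# Within-pair differences, rounded to the data's one-decimal precision
diffs = [round(smoker - non_smoker, 1) for non_smoker, smoker in pairs]
diffs = [d for d in diffs if d != 0]  # zero differences carry no sign and are dropped

ranks = average_ranks([abs(d) for d in diffs])
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
w = min(w_plus, w_minus)

# Large-sample normal approximation to W (no tie or continuity correction)
n = len(diffs)
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w - mu) / sigma
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```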
