Discuss the prostate dataset in faraway package, Basic Statistics

Assignment:

Prepare a short report with relevant output, your comments, and answers to the questions (this does not need to be exhaustive or polished, but should contain enough to show that you completed all tasks and analyses).

1. Consider the prostate dataset in the faraway package. Let lpsa be the outcome and treat all other variables as predictors.

(a) Fit the full model. Comment on the fit and discuss which variables seem to be significant or not.

(b) Use the regsubsets function (from the leaps package) in R to compute the RSS for each of the best models of a particular size. Plot these best RSS values against the number of betas in the model. Comment on what you see.

(c) Use the RSS values to compute the AIC, BIC, adjusted R², and Mallows's C_p. Plot these values, comment on the models that they choose, and discuss how similar/different they are.

(d) Use the step function in R to apply stepwise regression with the AIC. Compare with the previous model you selected using AIC.

(e) Refit the model with the lowest BIC. Comment on the fit and discuss which variables seem to be significant. Contrast this with what you saw in (a).
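For part (c), the four criteria can each be written as a function of the RSS alone. The sketch below is a language-neutral illustration in Python, not the R workflow the assignment asks for. It assumes the common linear-model forms AIC = n log(RSS/n) + 2p and BIC = n log(RSS/n) + p log(n), with p counting the intercept, and C_p = RSS/sigma² + 2p - n with sigma² estimated from the full model. R's own functions may add constants that do not change the ranking. The function name and every number (n, TSS, the RSS values) are hypothetical and are not taken from the prostate data.

```python
import math

def selection_criteria(rss, p, n, tss, sigma2_full):
    """Criteria for one candidate model (AIC/BIC up to additive constants).

    rss: residual sum of squares, p: number of betas including the intercept,
    n: sample size, tss: total sum of squares, sigma2_full: error variance
    estimated from the full model.
    """
    log_term = n * math.log(rss / n)
    return {
        "aic": log_term + 2 * p,
        "bic": log_term + p * math.log(n),
        "adj_r2": 1 - (rss / (n - p)) / (tss / (n - 1)),
        "cp": rss / sigma2_full + 2 * p - n,
    }

# Hypothetical inputs: best RSS for each model size (NOT from the prostate data).
n = 50
tss = 100.0
rss_by_p = {2: 60.0, 3: 45.0, 4: 44.0, 5: 43.8}

p_full = max(rss_by_p)
sigma2_full = rss_by_p[p_full] / (n - p_full)

table = {p: selection_criteria(rss, p, n, tss, sigma2_full)
         for p, rss in rss_by_p.items()}

# Smaller is better for AIC, BIC and C_p; larger is better for adjusted R^2.
best_aic = min(table, key=lambda p: table[p]["aic"])
best_bic = min(table, key=lambda p: table[p]["bic"])
best_cp = min(table, key=lambda p: table[p]["cp"])
best_adj_r2 = max(table, key=lambda p: table[p]["adj_r2"])

print(best_aic, best_bic, best_cp, best_adj_r2)
```

With these made-up numbers the criteria do not all agree, which is the kind of difference part (c) asks you to discuss: adjusted R² penalizes extra terms more lightly than AIC, BIC, or C_p.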

2. Consider the prostate dataset in the faraway package. Let lpsa be the outcome and treat all other variables as predictors.

(a) Fit the full model (as with the previous assignment). You do not need to comment on the fit or parameters.

(b) Compute the Ridge estimate which has the lowest GCV. Use 100 equally spaced lambdas between 0 and 10. Comment on any differences between the resulting estimate and the one from (a).

i. Fit the model using glmnet and plot the coefficients as a function of the penalty, comment on any patterns you see.

ii. Fit the model using cv.glmnet and plot the standard errors as a function of the penalty. Create a second "zoomed in" plot using ylim, so that one can clearly see where the minimum is reached.

iii. Fit the model using glmnet but take λ to be lambda.min from cv.glmnet. Do the same thing, but now using lambda.1se. Compare the estimates with each other as well as with (a) and (b).

iv. Take the predictors selected by lasso with "lambda.1se" as the tuning parameter and refit the model using lm. Comment on any differences with (a).
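The two tuning values named in item iii follow a simple rule: lambda.min is the penalty with the smallest mean cross-validated error, and lambda.1se is the largest penalty whose mean error is within one standard error of that minimum. The sketch below restates that rule in plain Python so the difference between the two choices is easy to see. It does not call glmnet, the function name is hypothetical, and the error curve is invented for illustration.

```python
def lambda_min_and_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambdas, cv_mean and cv_se are equal-length lists: the penalty values,
    the mean CV error at each penalty, and the standard error of that mean.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # One-standard-error threshold, measured at the minimizing penalty.
    threshold = cv_mean[i_min] + cv_se[i_min]
    within = [lambdas[i] for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    # The largest qualifying penalty gives the most heavily shrunk model.
    return lambdas[i_min], max(within)

# Hypothetical CV results (NOT output from cv.glmnet on the prostate data).
lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
cv_mean = [0.90, 0.62, 0.55, 0.56, 0.58]
cv_se = [0.08, 0.07, 0.08, 0.08, 0.09]

lam_min, lam_1se = lambda_min_and_1se(lambdas, cv_mean, cv_se)
print(lam_min, lam_1se)
```

Because lambda.1se is never smaller than lambda.min, the lasso fit at lambda.1se keeps at most as much signal and typically fewer nonzero coefficients, which is why item iv refits with lm on that smaller set of predictors.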

3. Consider the data from playoffs.csv. In this data set, ten years' worth of baseball seasons are summarized (1995-2004). We are interested in the relationship between the number of playoff appearances and the population size of the team's market (in millions).

(a) Compute a logistic regression with playoff appearance as the outcome and population as the predictor. Interpret the beta for population in terms of the odds.

(b) Construct a new categorical predictor with levels "small", "medium", and "large", where "small" denotes a market with a population under 3 million and a "large" market is over 5 million.

(c) Refit your logistic regression, but with the categorical predictor above (don't include the old population variable). Set the baseline level as "large". Summarize the fit of the model.

(d) Using the output above, compute the following:

i. the odds that a team from a large market makes the playoffs,

ii. the odds that a team from a small market makes the playoffs,

iii. the odds ratio between a medium market team and small market team making the playoffs. Interpret the ratio in terms of the odds.
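The quantities in parts (b) and (d) follow directly from the coding of the categorical predictor. With "large" as the baseline, the intercept is the log-odds for a large market, each other coefficient is a log-odds difference from "large", and exponentiating turns log-odds into odds. The Python sketch below illustrates that arithmetic. The function names and the coefficient values are hypothetical, not estimates from playoffs.csv, and a population of exactly 3 or 5 million is assumed to count as "medium".

```python
import math

def market_size(population_millions):
    """Bin a market population (in millions) into the three levels of (b)."""
    if population_millions < 3:
        return "small"
    if population_millions > 5:
        return "large"
    return "medium"  # assumption: boundary values 3 and 5 fall here

def odds_summary(b0, b_medium, b_small):
    """Odds implied by a logistic fit with 'large' as the baseline level.

    b0 is the intercept; b_medium and b_small are the coefficients of the
    medium and small indicator variables.
    """
    return {
        "odds_large": math.exp(b0),
        "odds_small": math.exp(b0 + b_small),
        # The intercept cancels in the ratio of medium odds to small odds.
        "or_medium_vs_small": math.exp(b_medium - b_small),
    }

# Hypothetical coefficients (NOT fitted values from playoffs.csv).
summary = odds_summary(b0=-0.5, b_medium=-0.4, b_small=-1.0)
print(market_size(2.1), market_size(4.0), market_size(8.7))
print(summary)
```

The same exponentiation answers part (a): for a continuous population predictor with coefficient beta, exp(beta) is the factor by which the odds of a playoff appearance are multiplied for each additional million people in the market.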

Attachment: Playoff Appearances.rar

Reference No:- TGS02072704