1. Facts about correlation.
Answer the following questions about correlation (r).
What is (are) the strongest value(s) the correlation can ever be? __________
If there is no relationship, r is equal to __________.
The correlation coefficient ranges from ________ to _______.
If the points fall in an almost perfect, negative linear pattern, r is close to: _____
If the points fall in an almost perfect, positive linear pattern, r is close to: _____
2. Relationship between Height and Weight.
Data has been collected on 219 STAT 200 students. Weight is measured in pounds and Height in inches. Below are some
descriptive statistics of Weight and Height.
Then a linear regression was performed on height and weight. The output looks as follows:
In Minitab:
Write the regression equation based on the output.
What is the response variable (dependent variable) and what is the predictor (independent variable)?
Based on the equation, what is the slope? Please interpret the slope as the change in Y per unit change in X in the context of
the variables used in this problem. Be specific: For each additional unit in (X variable), we expect that (Y variable) will (increase
or decrease) by ______.
Based on the coefficient output (remember, the constant is the y-intercept, NOT the slope), what is the test of the slope for this regression equation?
That is, provide the null and alternative hypotheses, the test statistic (t value), and the p-value of the test, and state your decision and conclusion.
Assume a student is 65 inches tall. Is it possible to predict their weight based on this analysis? (We should only use the
regression equation to predict Y within the range of X values in the data set. Is 65 within the range of the X values?)
If so, please estimate their weight using the regression equation.
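The range check described above can be sketched in Python. This is only an illustration: the intercept, slope, and height range below are hypothetical placeholders, not values from the Minitab output, so substitute the numbers from the actual regression output and descriptive statistics.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Return the predicted Y for x, or None if x lies outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        # Extrapolation: the data give no support for the fitted line here.
        return None
    return intercept + slope * x


# Hypothetical placeholder values (NOT taken from the Minitab output).
B0, B1 = -200.0, 5.0                   # assumed intercept and slope
HEIGHT_MIN, HEIGHT_MAX = 50.0, 80.0    # assumed range of observed heights

print(predict_within_range(65, B0, B1, HEIGHT_MIN, HEIGHT_MAX))
print(predict_within_range(90, B0, B1, HEIGHT_MIN, HEIGHT_MAX))
```

Returning None for an out-of-range height makes the "do not extrapolate" rule explicit instead of silently producing a number.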
What do the Fitted (predicted) values and Residuals represent?
For example, there is one record in the data set with height = 54 and weight = 110.
Please use these values to explain what the fitted value and the residual are (calculate the fitted value, then calculate the residual, using the values given to you).
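As a rough sketch of the arithmetic: the fitted value is the Y the regression line predicts at a given X, and the residual is the observed Y minus that fitted value. The intercept and slope in the code are hypothetical placeholders (the Minitab coefficients are not reproduced here); only height = 54 and weight = 110 come from the problem.

```python
def fitted_and_residual(x, y_observed, intercept, slope):
    """Return (fitted value, residual) for one observation."""
    fitted = intercept + slope * x    # value on the regression line at x
    residual = y_observed - fitted    # observed minus fitted
    return fitted, residual


# Hypothetical coefficients (NOT from the Minitab output); replace with real ones.
fitted, residual = fitted_and_residual(54, 110, intercept=-200.0, slope=5.0)
print(fitted, residual)
```

A positive residual means the observed weight lies above the line; a negative residual means it lies below.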
How do you interpret R-square (20.72%), and how can the correlation be calculated from it?
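In simple linear regression (one predictor), the correlation is the square root of R-square, carrying the same sign as the slope. A minimal sketch of that calculation follows; the function name is made up for illustration, and the slope's sign must be read from the regression output, which is not reproduced here.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Correlation for a one-predictor regression: sqrt(R^2) with the slope's sign."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


# R-square = 20.72% is from the problem; pass the slope from the actual output
# in place of the placeholder 1.0 so that the sign is correct.
print(round(abs(correlation_from_r_squared(0.2072, 1.0)), 4))
```

This shortcut applies only when there is a single predictor; with several predictors, R-square no longer corresponds to one pairwise correlation.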
3. Relationship between Weight and Gas Mileage in Automobiles.
Data has been collected on 25 vehicles of various models and makes. Weight is measured in pounds and Gas Mileage is
measured in MPG (miles per gallon). Below are some descriptive statistics of Weight and Mileage.
Descriptive Statistics: Weight, Mileage (Gas Mileage)
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
weight    25   0   4038      196    982     2460   3319    3863   4797     6400
mileage   25   0  24.56     1.18   5.92    17.00  19.50   23.00  29.00    38.00
Open the dataset Weight_Mileage found in the Datasets folder in ANGEL.
In order to decide whether to use a Regression Model to see if there is any relationship between the weight of a vehicle and
the gas mileage for that vehicle, we must see if there is a linear relationship between the variables.
It does not make sense to use a Regression Model if the variables do not have a linear relationship between them.
a. Create a scatter plot of the measurements by selecting Mileage for the y-axis (response) and Weight for the x-axis (predictor).
Describe the relationship between Mileage and Weight. Is the relationship a linear relationship?
If the scatterplot indicates a linear relationship, is the relationship positive or negative? Copy and paste your scatterplot below:
Minitab: Graph > Scatter Plot > Simple > y = Mileage, x = Weight
b. Using software, find the correlation between Mileage and Weight.
The correlation also indicates the strength of a linear relationship. Provide the correlation value and the correlation p-value.
Does this agree with our findings in the scatterplot? Is the correlation statistically significant (is the correlation found in our
sample just by chance or is there enough evidence to conclude that the linear relationship found in the sample is due to an
underlying linear relationship between the variables in the population)?
Minitab: Regression > Correlation > variables = mileage, weight
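For readers without Minitab, the correlation and its test statistic can be sketched in plain Python. The numbers below are made-up illustrative values, not the Weight_Mileage dataset, and the helper names are hypothetical. The two-sided p-value comes from a t distribution with n - 2 degrees of freedom, which Minitab reports directly; this sketch stops at the t statistic.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Made-up illustrative values (NOT the Weight_Mileage dataset).
weight = [2500, 3000, 3500, 4000, 4500, 5000]
mileage = [36, 31, 29, 24, 22, 18]

r = pearson_r(weight, mileage)
t = correlation_t_statistic(r, len(weight))
print(round(r, 3), round(t, 2))
```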
c. Perform a linear regression with Mileage as the Response (dependent variable) and Weight as the Predictor (independent variable).
Minitab: Regression > Simple > choose your response and predictor variables >
do not choose any "options" > under Graphs, check "residual plots" (you will only need to copy and paste the "residuals vs fits" and
the "normal probability plot of residuals" later in the problem). Copy and paste your output below (except plots):
i) What is the regression equation?
ii) What is the R-square value (see Model Summary)?
iii) What is the slope coefficient, the slope coefficient t value and its p-value? What does this indicate?
iv) Copy and paste your "residuals vs fits" plot and indicate whether you believe the constant variance assumption is valid or not valid, and why. (We have a very small data set, so it is unlikely to satisfy the assumptions exactly, but are the residuals approximately in a horizontal band around 0, with roughly equal spread on either side of 0?)
v) Copy and paste your "normal probability plot of residuals" and indicate whether you believe the assumption of normality is valid or not valid, and why. (Are most of the residuals approximately aligned along the diagonal line?)
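The quantities asked for in parts i) through iii), and the fitted values and residuals behind the plots in iv) and v), can be sketched with ordinary least squares in plain Python. This is an illustrative sketch, not Minitab's implementation; the function name and the data are made up, and the data are not the Weight_Mileage dataset.

```python
import math


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x with basic summaries."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    sse = sum(e ** 2 for e in residuals)          # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)      # total variation
    r_squared = 1 - sse / sst
    # Standard error of the slope; assumes the fit is not perfect (sse > 0).
    se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "t_slope": slope / se_slope,  # tests H0: slope = 0, df = n - 2
        "fitted": fitted,
        "residuals": residuals,
    }


# Made-up illustrative values (NOT the Weight_Mileage dataset).
weight = [2500, 3000, 3500, 4000, 4500, 5000]
mileage = [36, 31, 29, 24, 22, 18]

fit = simple_linear_regression(weight, mileage)
print(f"mileage = {fit['intercept']:.2f} + ({fit['slope']:.5f}) * weight")
print(f"R-square = {fit['r_squared']:.4f}, slope t = {fit['t_slope']:.2f}")
```

Plotting fit["residuals"] against fit["fitted"] gives the "residuals vs fits" view used to judge constant variance.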