Question 1:
The production manager of American Tool and Castings Company is conducting a study of the relationship between the number of alloy caps milled on a lathe and the distance from specification of the outside cap diameters. The lathe uses a sharp steel cutting tool in a milling process to cut and shape raw alloy bars into caps. The lathe tool turns at high speed while cutting into the alloy, in essence cutting the alloy down to size and shaping it into a round cap. A similar lathe tool cuts the inside of the cap. The caps are later fitted with interior gaskets and permanently sealed onto airtight canisters. After the steel cutting tool has been used repeatedly, it begins to wear and therefore cuts a larger outside cap diameter than desired. If the outside cap diameter is too large, the cap can't be properly affixed and sealed to the canister. The production manager would like to build a model to estimate and predict how many caps a tool can mill before it wears down too much and begins milling caps that are too large in diameter to be usable. Each cap costs approximately $400 to mill, so defective caps are expensive. The main variable of interest (y) is the "distance from specification" of the outside cap diameter.
To conduct the study, 62 lathe tools were randomly sampled. Each lathe operator keeps a record of the number of caps milled by each particular tool. Each cap milled is measured to see how close its outside diameter is to specification. According to the specification, each cap should be 6 inches in diameter. For example, the measure in the first record is 0.36, meaning that cap was 0.36 inches larger than specification. When each tool was sampled, the number of caps milled by the tool was recorded, along with the distance from specification of the diameter of the last cap milled by that tool. The data for each sampled cutting tool, including the distance from specification of the outside diameter of the last cap it milled, are in the spreadsheet labeled American.
1. Scatter plot
Construct a scatter plot revealing the relationship between the number of caps milled by a tool and the distance from specification of the outside cap diameter. Make sure the x variable (number of caps milled) is on the x-axis and the y variable (distance from specification) is on the y-axis. Move the chart so that it starts in cell E3. Do not resize the chart beyond the red shaded region.
2. Correlation
Using a built-in Excel function in cell F22, calculate the correlation (r) between the number of caps milled by a tool and the distance from specification of the outside cap diameter.
In cell F23, indicate the strength of the linear relationship as very strong, relatively strong, very weak, relatively weak, or no relationship.
In cell F24, indicate if the relationship is positive or negative.
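To check the value that CORREL returns in F22, the sketch below computes the sample (Pearson) correlation directly from its definition, r = Sxy / sqrt(Sxx * Syy). The lists `nm` and `ds` are hypothetical illustrative values, not the American data, and the function names are made up for this example. Only the sign is coded (for F24); the wording for F23 depends on the strength cutoffs your course uses, so none are assumed here.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation: the quantity Excel's CORREL returns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def direction(r):
    """Direction of the linear relationship, as asked for in F24."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical values for illustration only (NOT the American data set).
nm = [10, 20, 30, 40, 50]            # number of caps milled (x)
ds = [0.05, 0.09, 0.16, 0.19, 0.26]  # distance from specification (y)
r = pearson_r(nm, ds)
print(round(r, 3), direction(r))
```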
3. Anchoring the output in cell P3, generate the regression output. Make sure you select an appropriate "Residual Plot" and place the residual plot in the designated area near cell E32.
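The core numbers in the regression output (intercept, slope, the slope's standard error and t stat, and R Square) come from ordinary least squares. A minimal sketch of that computation, using hypothetical data rather than the American spreadsheet and a made-up function name `simple_regression`:

```python
import math


def simple_regression(x, y):
    """Ordinary least squares for y = a + b*x.

    Returns the intercept, slope, slope standard error, slope t stat,
    R Square, and residual degrees of freedom (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - 2))     # standard error of the estimate
    se_b = s_e / math.sqrt(sxx)        # standard error of the slope
    return {
        "intercept": a,
        "slope": b,
        "se_slope": se_b,
        "t_slope": b / se_b,           # t stat for H0: slope = 0
        "r_square": 1 - sse / sst,
        "df": n - 2,
    }


# Hypothetical values for illustration only (NOT the American data set).
nm = [10, 20, 30, 40, 50]
ds = [0.05, 0.09, 0.16, 0.19, 0.26]
fit = simple_regression(nm, ds)
print(fit)
```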
4. Output
In cells J23 and J24, enter the value of the intercept and slope (respectively) by referencing the appropriate cells in the regression output.
In cell K24, enter the value of the t test statistic for testing the slope significance by referencing the appropriate cell from the regression output.
In cell L24, enter the p-value regarding the slope significance by referencing the appropriate cell from the regression output.
In cell M24, indicate with the word "Yes" or "No" whether the slope coefficient is significant. Assume α = 0.01.
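Excel's output already reports the slope's two-sided p-value (the same quantity as T.DIST.2T(ABS(t), n - 2)). To reproduce it outside Excel without a statistics library, one option, assumed here rather than taken from the assignment, is to integrate the Student t density numerically. The sketch below uses Simpson's rule and then applies the α = 0.01 rule for M24; the function names and the example t stat are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value P(|T| >= |t_stat|) via Simpson's rule on [0, |t|].

    Uses p = 1 - 2 * integral_0^|t| pdf, which is fine for a Yes/No decision
    but loses relative precision for extremely small p-values.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def slope_significant(p_value, alpha=0.01):
    """'Yes' if the slope p-value is below alpha, matching the entry in M24."""
    return "Yes" if p_value < alpha else "No"


# Hypothetical t stat; with 62 tools the slope test has 62 - 2 = 60 df.
p = two_sided_p(3.1, 60)
print(round(p, 4), slope_significant(p))
```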
5. In cell F29, provide the predictive power (a.k.a. the coefficient of determination) of the model by referencing the appropriate cell from the regression output.
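As a cross-check on F29: R Square in the Regression Statistics table is the share of the variation in y explained by the fitted line, computed from the ANOVA table's sums of squares (Regression SS = SSR, Residual SS = SSE, Total SS = SST). With a single x variable it also equals the square of the correlation r from F22:

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
% With one x variable, R^2 is the squared correlation:
R^2 = r^2
```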
6. In cell J29, write the prediction equation relating NM (number of caps milled) to DS (distance from specification) using the intercept and slope values. This is a text input that starts with a number, so you must start the input with a space to make Excel treat it as text. For example, if a = 4 and b = 10, enter 4 + 10(NM), placing a space before the value 4.
7. Cell E32 should contain the residual plot. Keep the plot within the red shaded area.
In cell F48, comment on the assumption of linearity as interpreted using this residual plot.
In cell F49, comment on the assumption of constant variance as interpreted using this residual plot.
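Excel's residual plot graphs each residual e = y - (a + b*x) against x. The sketch below, again on hypothetical values and with an illustrative function name, computes those residuals from the least-squares line:

```python
def residuals(x, y):
    """Residuals y_i - (a + b*x_i) from the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical values for illustration only (NOT the American data set).
nm = [10, 20, 30, 40, 50]
ds = [0.05, 0.09, 0.16, 0.19, 0.26]
res = residuals(nm, ds)
for xi, ei in zip(nm, res):
    print(xi, round(ei, 4))          # the (x, residual) points Excel plots
```

A horizontal band of residuals around zero with no curved pattern supports linearity, and a band of roughly even width (no funnel shape) supports constant variance.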
8. Prediction and Residual
In cell F53, predict the distance from specification of a cap milled by a tool when the cap is the 20th cap to be milled.
In cells F54 and F55, calculate the lower and upper values for the range of definition for this data set.
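The prediction in F53 is the fitted line evaluated at x = 20. For F54 and F55, the sketch assumes the usual meaning of the range of definition: the smallest and largest x values in the sample, outside of which a prediction is an extrapolation. The coefficients and x values below are hypothetical placeholders; in the workbook you would reference the regression output and the NM column instead.

```python
def predict(intercept, slope, x0):
    """Point prediction y-hat = a + b * x0 from the fitted line."""
    return intercept + slope * x0


def range_of_definition(x):
    """Smallest and largest observed x values (assumed meaning of the range)."""
    return min(x), max(x)


# Hypothetical coefficients and x values (NOT results from the American data).
a, b = -0.006, 0.0052
nm = [10, 20, 30, 40, 50]

y_hat = predict(a, b, 20)            # predicted distance from spec for the 20th cap
low, high = range_of_definition(nm)
in_range = low <= 20 <= high         # False would mean extrapolating
print(y_hat, low, high, in_range)
```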
9. Prediction Interval
Using the table in cells J52:K53 as the Prediction Data Set and StatTools, calculate the lower and upper limits of a 95% prediction interval for the DS of a cap that is the 20th cap milled. Anchor your StatTools Regression output in cell A1 of the Regression worksheet. Place the values in cells J58 and K58 by referencing the appropriate cells in the StatTools output. Note that this will shift the columns of your worksheet.
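The limits StatTools reports should follow the standard prediction interval for an individual value, y-hat0 ± t(α/2, n - 2) * s_e * sqrt(1 + 1/n + (x0 - x̄)^2 / Σ(xi - x̄)^2); this sketch assumes that formula, so compare it against your StatTools output. Because the Python standard library has no t quantile function, the critical value here is found by bisection on a numerically integrated t density, an assumed stand-in for a statistics library. Data and function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_central_area(t, df, steps=2000):
    """P(-t <= T <= t) for t >= 0, via Simpson's rule on [0, t] (steps is even)."""
    if t == 0:
        return 0.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(conf, df):
    """Two-sided critical value: the t with P(|T| <= t) = conf, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_central_area(hi, df) < conf:
        hi *= 2                        # widen until the root is bracketed
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(x, y, x0, conf=0.95):
    """Prediction interval for a single new y at x0 from a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))     # standard error of the estimate
    y_hat = a + b * x0
    half_width = (t_critical(conf, n - 2) * s_e
                  * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx))
    return y_hat - half_width, y_hat + half_width


# Hypothetical values for illustration only (NOT the American data set).
nm = [10, 20, 30, 40, 50]
ds = [0.05, 0.09, 0.16, 0.19, 0.26]
lower, upper = prediction_interval(nm, ds, 20)
print(round(lower, 4), round(upper, 4))
```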
Question 2:
A mental health agency measured the self-esteem scores of randomly selected individuals with disabilities who were involved in some work activity within the past year. The spreadsheet named Self Esteem provides the data, including each individual's self-esteem measure (y), years of education (YrsEdu), age, months worked in the last year (MonWork), marital status dummy variables (MS2, MS3, MS4) indicating whether the individual is single, married, separated, or divorced, and a support level (SL) dummy variable indicating whether job support (counseling, etc.) was provided directly (1) or indirectly (0). Regarding marital status, if the individual is single, all MS indicators are 0, while MS2 = 1 indicates married, MS3 = 1 indicates separated, and MS4 = 1 indicates divorced.
In cell N4, use Excel's "Correlation" Data Analysis tool to construct a correlation matrix for all the variables. Note that the categories in columns I and J should not be included since the data are already represented as dummy variables in columns E through H.
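Excel's Correlation tool fills in the lower triangle of a matrix of pairwise Pearson correlations. The sketch below builds the same layout from named columns; the column names and values are hypothetical stand-ins, not the Self Esteem data.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower-triangular correlation matrix, laid out like Excel's Correlation tool.

    `columns` maps each variable name to its list of values. The result maps
    (row name, column name) to r for each cell on or below the diagonal.
    """
    names = list(columns)
    matrix = {}
    for i, row in enumerate(names):
        for col in names[: i + 1]:
            matrix[(row, col)] = pearson_r(columns[row], columns[col])
    return names, matrix


# Hypothetical columns for illustration only (NOT the Self Esteem data).
data = {
    "y": [3, 5, 4, 6, 8],
    "YrsEdu": [10, 12, 11, 14, 16],
    "Age": [50, 40, 45, 35, 30],
}
names, matrix = correlation_matrix(data)
for row in names:
    print(row, [round(matrix[(row, col)], 2) for col in names[: names.index(row) + 1]])
```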
Considering the correlation between self-esteem and each x variable, identify the three variables that, based on correlation with y alone, are the best candidates for inclusion in the model. Shade the cells containing those correlation values in yellow. Ignore any multicollinearity concerns for this part.
Considering the correlation between each pair of x variables, identify the variables that could cause multicollinearity problems if included in the model. Shade the cells containing those correlation values in green.
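The two screening steps above can be sketched as: rank the x variables by |r| with y, then flag x-variable pairs whose |r| is large. The 0.7 cutoff is an assumption for illustration only, since the assignment does not give one, and the correlation values are hypothetical, not results from the Self Esteem data.

```python
def top_predictors(corr, y_name, k=3):
    """The k x variables with the largest |r| with y (largest first)."""
    scores = {}
    for (a, b), r in corr.items():
        if a != b and y_name in (a, b):
            scores[b if a == y_name else a] = r
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]


def collinear_pairs(corr, y_name, threshold=0.7):
    """x-variable pairs with |r| >= threshold (threshold is an assumed cutoff)."""
    flagged = [(a, b, r) for (a, b), r in corr.items()
               if a != b and y_name not in (a, b) and abs(r) >= threshold]
    return sorted(flagged, key=lambda item: -abs(item[2]))


# Hypothetical lower-triangle correlations, keyed (row, column) as in Excel's layout.
corr = {
    ("YrsEdu", "y"): 0.45, ("Age", "y"): -0.30,
    ("MonWork", "y"): 0.52, ("SL", "y"): 0.10,
    ("Age", "YrsEdu"): 0.15, ("MonWork", "YrsEdu"): 0.20,
    ("MonWork", "Age"): -0.78, ("SL", "YrsEdu"): 0.05,
    ("SL", "Age"): 0.12, ("SL", "MonWork"): 0.33,
}
print(top_predictors(corr, "y"))
print(collinear_pairs(corr, "y"))
```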
Based on your conclusions in the two previous parts (correlation with self-esteem and correlation between pairs of x variables), shade in red the names of any variables that should not be included in the initial model because of possible multicollinearity problems.
With cell N19 as the upper left-hand corner of the output, fit the full regression model (do not include a residual plot).
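Excel's Regression tool estimates the full model by least squares. The sketch below shows one way to do the underlying computation, via the normal equations (X'X)b = X'y, which is fine for a handful of variables but numerically less robust than the QR-based methods statistical packages typically use. Function names and data are hypothetical. Including a dummy for every category (for example, a fourth "single" indicator alongside MS2 to MS4) would make X'X singular, which is why single is the all-zeros baseline.

```python
import math


def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfectly collinear x variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(x_columns, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by the normal equations (X'X) b = X'y.

    Returns the coefficients (intercept first) and their t stats,
    t_j = b_j / sqrt(s^2 * [(X'X)^-1]_jj), with s^2 = SSE / (n - k - 1).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in x_columns] for i in range(n)]  # 1s for the intercept
    p = len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    sse = sum((y[r] - sum(coef[i] * X[r][i] for i in range(p))) ** 2 for r in range(n))
    s2 = sse / (n - p)
    t_stats = [coef[i] / math.sqrt(s2 * xtx_inv[i][i]) for i in range(p)]
    return coef, t_stats


# Hypothetical data for illustration only (NOT the Self Esteem data).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9.1, 7.9, 18.9, 18.1, 29.0, 28.0]
coef, t_stats = ols([x1, x2], y)
print([round(c, 4) for c in coef], [round(t, 1) for t in t_stats])
```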
Considering the full regression output, shade (in yellow) the name of any x variable that appears significant and should remain in the model. Also shade its t stat and p-value. Consider the p-value small if it is less than 0.05.
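The selection rule in this step keeps every x variable whose p-value is below 0.05 and ignores the intercept row. A minimal sketch, with hypothetical p-values standing in for the P-value column of the full regression output:

```python
def significant_variables(p_values, alpha=0.05):
    """Names of x variables whose p-value is below alpha (the intercept is skipped)."""
    return [name for name, p in p_values.items()
            if name != "Intercept" and p < alpha]


# Hypothetical p-values for illustration only; read the real ones from the
# P-value column of the full regression output.
p_values = {"Intercept": 0.001, "YrsEdu": 0.012, "Age": 0.430,
            "MonWork": 0.048, "SL": 0.260}
keep = significant_variables(p_values)
print(keep)
```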
Partial Regression Model: With cell N51 as the upper left-hand corner of the output, fit the model including only the x variable(s) found to be significant in the previous part (do not include a residual plot).
Question 3:
A bank must prepare for a discrimination suit filed on behalf of female employees who claim that females are paid less than male employees. The bank manager sampled employee files to see whether he could build a useful model for predicting salary as a function of gender and other characteristics. For each employee, the data include salary (y, in thousands of dollars), years of experience (YrsExp), years of prior experience (YrsPrior), and Gender. The data are in the spreadsheet named Bank.
1. Since Gender is a categorical variable, construct the appropriate dummy variable in column E to indicate gender as female = 1 and male = 0. You must use an "IF" statement in the appropriate cell(s) to indicate the correct dummy value based on gender.
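The IF formula amounts to the mapping below. This sketch assumes the Gender column stores the text Female or Male; adjust the labels if the spreadsheet codes gender differently. In Excel the same idea is a formula of the form =IF(<Gender cell>="Female",1,0), where the cell reference is left for you to fill in.

```python
def gender_dummy(gender):
    """1 for female, 0 for male; assumes the labels 'Female' and 'Male'."""
    label = gender.strip().lower()
    if label == "female":
        return 1
    if label == "male":
        return 0
    raise ValueError(f"unexpected gender label: {gender!r}")


# Hypothetical labels for illustration only (NOT the Bank data).
genders = ["Female", "Male", "Female"]
dummies = [gender_dummy(g) for g in genders]
print(dummies)
```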
2. With cell H7 as the upper left-hand corner of the output, fit the full model (do not include a residual plot).
3. Based on the regression output from part 2, shade (in yellow) the name of any x variable that appears significant and should remain in the model. Also shade its t stat and p-value.