Question 1
Recent trends show that the yield of your company's flagship product is declining.
You are uncertain whether the supplier of a key raw material is to blame, or whether the decline is due to a change in your process conditions. You begin by investigating the raw material supplier.
The data available has:
• Number of observations = 24
• Number of variables = 6 + 1 designation of process outcome
• Data set: raw-material-characterization
• Description: 3 of the 6 measurements are size values for the plastic pellets, while the other 3 are the outputs from thermogravimetric analysis (TGA), differential scanning calorimetry (DSC) and thermomechanical analysis (TMA), measured in a laboratory.
These 6 measurements are thought to adequately characterize the raw material. Also provided is a designation Adequate or Poor that reflects the process engineer's opinion of the yield from that lot of materials.
Import the data, and set the Outcome variable as a secondary identifier for each observation, as shown in the illustration below.
The observation's primary identifier is its batch number.
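If you work in a scripting environment rather than a point-and-click package, one way to mimic the primary and secondary identifiers is to key each observation on a (batch, outcome) pair. The sketch below uses only Python's standard library. The column names `Lot` and `Outcome`, the measurement column names, the helper `load_observations` and the two data rows are all placeholders assumed for illustration; they are not taken from the real data set.

```python
import csv
import io

# Hypothetical excerpt: column names and values are placeholders,
# NOT the contents of the real raw-material-characterization file.
SAMPLE_CSV = """Lot,Outcome,Size5,Size10,Size15,TGA,DSC,TMA
B001,Adequate,13.8,9.2,41.2,787.3,18.0,65.0
B002,Poor,11.2,5.8,27.6,772.2,17.7,68.8
"""


def load_observations(handle):
    """Return {(lot, outcome): [measurements as floats]} from a CSV handle."""
    observations = {}
    for row in csv.DictReader(handle):
        # Primary identifier (batch) and secondary identifier (outcome).
        key = (row.pop("Lot"), row.pop("Outcome"))
        # Everything left in the row is a measurement column.
        observations[key] = [float(value) for value in row.values()]
    return observations


observations = load_observations(io.StringIO(SAMPLE_CSV))
for (lot, outcome), values in observations.items():
    print(lot, outcome, values)
```

To use a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the two identifier column names to match the file's header.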
1. Build a latent variable model for all observations and use auto-fit to determine the number of components.
If your software does not have an auto-fit feature (cross-validation), then use a Pareto plot of the eigenvalues to decide on the number of components.
2. Interpret components 1, 2 and 3 separately (using the loadings bar plot).
3. Now plot the score plot for components 1 and 2, and colour code the score plot with the Outcome variable.
Interpret why observations with Poor outcome are at their locations in the score plot (use a contribution plot).
4. What would be your recommendations to your manager to get more of your batches classified as Adequate rather than Poor?
5. Now build a model only on the observations marked as Adequate in the Outcome variable.
6. Re-interpret the loadings plot for components 1 and 2.
Is there a substantial difference between this new loadings plot and the previous one?
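For readers without dedicated multivariate software, the mechanics behind the steps above can be sketched in plain Python. This is a minimal illustration, not the method any particular package uses: it autoscales the data, fits a principal component model with the NIPALS algorithm (one common way to fit such models), reports the fraction of variance explained per component (the quantity a Pareto plot of eigenvalues displays, up to a constant), computes score contributions for one observation, and refits on the Adequate rows only. The function names are hypothetical, and the data are synthetic random numbers with an arbitrary shift between the two groups, so the printed values say nothing about the real lots.

```python
import math
import random


def autoscale(X):
    """Mean-centre each column and scale it to unit variance."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(k)]
    return [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in X]


def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """Return score vectors, loading vectors and R2, one entry per component."""
    n, k = len(X), len(X[0])
    E = [row[:] for row in X]  # residual matrix, deflated after each component
    total_ss = sum(v * v for row in E for v in row)
    T, P, r2 = [], [], []
    for _component in range(n_components):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(k), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]  # loadings have unit length
            t_new = [sum(E[i][j] * p[j] for j in range(k)) for i in range(n)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff < tol * sum(v * v for v in t):
                break
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        T.append(t)
        P.append(p)
        r2.append(sum(v * v for v in t) / total_ss)
    return T, P, r2


def score_contributions(x_scaled, p):
    """Per-variable contribution to one observation's score: x_k * p_k."""
    return [xk * pk for xk, pk in zip(x_scaled, p)]


# Synthetic stand-in data: 24 lots, 3 "size" and 3 "thermal" columns.
# The shift applied to the Poor group is arbitrary, purely for illustration.
random.seed(0)
outcomes = ["Adequate"] * 16 + ["Poor"] * 8
X_raw = []
for outcome in outcomes:
    size = random.gauss(0.0, 1.0) - (1.5 if outcome == "Poor" else 0.0)
    thermal = random.gauss(0.0, 1.0)
    X_raw.append([size + random.gauss(0.0, 0.3) for _ in range(3)]
                 + [thermal + random.gauss(0.0, 0.8) for _ in range(3)])

# Steps 1 and 2: model on all observations, then inspect R2 and loadings.
X = autoscale(X_raw)
T, P, r2 = nipals_pca(X, n_components=3)
print("R2 per component:", [round(v, 3) for v in r2])
print("Loadings, component 1:", [round(v, 2) for v in P[0]])

# Step 3: contributions of each variable to t1 for the first Poor lot.
i_poor = outcomes.index("Poor")
contrib = score_contributions(X[i_poor], P[0])
print("Contributions to t1:", [round(v, 2) for v in contrib])

# Steps 5 and 6: refit on the Adequate rows only and compare loadings.
X_adequate = autoscale([row for row, o in zip(X_raw, outcomes) if o == "Adequate"])
T_a, P_a, r2_a = nipals_pca(X_adequate, n_components=2)
print("Loadings (Adequate only), component 1:", [round(v, 2) for v in P_a[0]])
```

When comparing the two sets of loadings, keep in mind that the sign of a component is arbitrary: a loading vector and its negative describe the same direction, so compare the pattern of the bars rather than their signs.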