To study how male lions interact in their natural habitat, a group of scientists decided to compare 500 recorded (x, y) locations of two male lions. As the team's data scientist, you are tasked with analyzing these patterns. Unsure how to proceed, you decide to apply an unsupervised dimensionality reduction technique: run a PCA on the complete data matrix, pooling the points from both lions.
1. Determine the direction of the first principal component. How well are the projected (x, y) locations separated between the two lions? (Recalling your undergraduate statistics, you can quantify the separation with the t-statistic from a two-sample t-test.)
2. Another scientist (who is not very good at math!) argues that you should use the second component instead. You know this is wrong but decide to go ahead and try it anyway. How well are the projections onto the second component separated between the two lions? Is your non-math colleague correct in saying that the second component is better than the first?
3. Now a third non-math colleague says that you should use the first component, but computed on the original data matrix rather than the demeaned (centered) data matrix. Is he right?
We need to use MATLAB.
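One way the three questions could be set up in MATLAB is sketched below. This is a minimal sketch under stated assumptions, not a definitive solution: the recorded locations are not included here, so `lion1` and `lion2` are synthetic placeholders, and the count of 500 points per lion is an assumption. The variable names (`isOne`, `tstat`, `scores`, `u1`) are illustrative. The PCA is done by hand with `eig` on the covariance matrix, and the two-sample t-statistic is written out in its unequal-variance form, so no toolbox is needed.

```matlab
% PLACEHOLDER data (assumption): synthetic stand-ins for the two lions' tracks.
% Replace lion1 and lion2 with the real n-by-2 matrices of (x, y) locations.
rng(0);                                    % reproducible placeholder draw
n = 500;                                   % assumed number of points per lion
lion1 = [3*randn(n,1), 0.5*randn(n,1)] + [10, 21];
lion2 = [3*randn(n,1), 0.5*randn(n,1)] + [10, 19];

X     = [lion1; lion2];                    % pooled data matrix, 2n-by-2
isOne = [true(n,1); false(n,1)];           % labels, used only to evaluate separation

% Two-sample t-statistic (unequal-variance form), written out by hand.
tstat = @(a, b) (mean(a) - mean(b)) / sqrt(var(a)/numel(a) + var(b)/numel(b));

% Questions 1 and 2: PCA on the centered data via the covariance eigenvectors.
Xc = X - mean(X, 1);                       % implicit expansion, R2016b or later
[V, D] = eig(cov(Xc));
[~, order] = sort(diag(D), 'descend');     % largest variance first
V = V(:, order);                           % columns: PC1 and PC2 directions
scores = Xc * V;                           % projections onto PC1 and PC2

t1 = tstat(scores(isOne, 1), scores(~isOne, 1));
t2 = tstat(scores(isOne, 2), scores(~isOne, 2));

% Question 3: leading direction of the uncentered data, from X'*X instead of cov.
[Vu, Du] = eig((X' * X) / size(X, 1));
[~, iu] = max(diag(Du));
u1 = Vu(:, iu);                            % first component without demeaning
tu = tstat(X(isOne, :) * u1, X(~isOne, :) * u1);

fprintf('PC1 direction: [%.3f, %.3f]\n', V(1,1), V(2,1));
fprintf('t (PC1) = %.2f, t (PC2) = %.2f, t (uncentered PC1) = %.2f\n', t1, t2, tu);
```

The sign of an eigenvector is arbitrary, so the sign of each t-statistic is too; compare magnitudes. If the Statistics and Machine Learning Toolbox is available, `pca` and `ttest2` could replace the hand-written steps. Any conclusion about which component separates the lions better has to come from running this on the real locations, not from the placeholder data.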