Application Background
Computer vision and image analysis are becoming increasingly important in sports analytics, the science of analyzing and modeling the processes underlying sporting events. Sports with high media coverage create a demand for systematic review and objective evaluation of the performance of individual athletes as well as of teams. In almost all sports, management and coaches use statistics and categorized video material to support their strategies.
Question 1
(data understanding and preprocessing). Download and extract the data. Consider the training data trainInput.csv and the corresponding labels trainTarget.csv. The i-th row of trainInput.csv contains the features of the i-th training pattern, and the class label of the i-th pattern is given in the i-th row of trainTarget.csv. Plot a histogram showing the probability distribution of the classes in the training data.
Deliverables: description of software used; single histogram plot
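A minimal sketch of the class-distribution histogram in Python with NumPy and matplotlib. It assumes that trainTarget.csv holds one integer label per row with no header; the function names and the output file name are illustrative, not part of the assignment. Because the classes are discrete, a bar chart of the relative frequency of each label serves as the histogram.

```python
import numpy as np


def class_distribution(labels):
    """Return the sorted distinct class labels and their relative frequencies."""
    labels = np.asarray(labels).ravel()
    classes, counts = np.unique(labels, return_counts=True)
    return classes, counts / counts.sum()


def plot_class_distribution(labels, path="class_histogram.png"):
    """Save a bar chart of the empirical class probabilities to `path`."""
    # Imported here so class_distribution() can be used without matplotlib.
    import matplotlib.pyplot as plt

    classes, probs = class_distribution(labels)
    fig, ax = plt.subplots()
    ax.bar([str(c) for c in classes], probs)
    ax.set_xlabel("class label")
    ax.set_ylabel("relative frequency")
    ax.set_title("Class distribution in the training data")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Usage, under the file-format assumption above: `labels = np.loadtxt("trainTarget.csv", delimiter=",", dtype=int)` followed by `plot_class_distribution(labels)`. The relative frequencies sum to 1, so the bar heights can be read directly as estimated class probabilities.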
Question 2
(principal component analysis). Perform a principal component analysis of the training data trainInput.csv. Plot the eigenspectrum (see Figure 12.4 in [1] for an example). How many components are necessary to "explain 90% of the variance"? Visualize the data with a scatter plot of the data projected onto the first two principal components.
Use different colors for the different classes in the plot.
Deliverables: description of software used; plot of the eigenspectrum; the number of components necessary to explain 90% of the variance; scatter plot of the data projected onto the first two principal components, with different colors indicating the 7 different classes
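One way to carry out the PCA, sketched with NumPy only: compute the sample covariance matrix of the training data, eigendecompose it, and sort the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_D in descending order (D is the number of features). The number of components needed to explain 90% of the variance is then the smallest M with (λ_1 + ... + λ_M) / (λ_1 + ... + λ_D) ≥ 0.9. The function names are hypothetical, the CSV layout (comma-separated, no header) is an assumption, and the features are not standardized here; whether to standardize them first is a separate modeling choice.

```python
import numpy as np


def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns (eigvals, eigvecs, mean): eigenvalues in descending order, the
    matching unit eigenvectors as columns of eigvecs, and the feature means.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # features are columns; np.cov centers internally
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals[::-1], eigvecs[:, ::-1], mean


def n_components_for_variance(eigvals, threshold=0.9):
    """Smallest M such that the M largest eigenvalues explain >= threshold of the variance."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, threshold) + 1)


def project(X, eigvecs, mean, k=2):
    """Coordinates of the centered data on the first k principal components."""
    return (np.asarray(X, dtype=float) - mean) @ eigvecs[:, :k]


def plot_pca(eigvals, Z, labels, path="pca.png"):
    """Eigenspectrum (left) and 2-D projection colored by class (right)."""
    import matplotlib.pyplot as plt  # only needed for plotting

    labels = np.asarray(labels).ravel()
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
    ax1.plot(np.arange(1, len(eigvals) + 1), eigvals, marker="o")
    ax1.set_xlabel("component index i")
    ax1.set_ylabel("eigenvalue")
    ax1.set_title("Eigenspectrum")
    for c in np.unique(labels):
        mask = labels == c
        # each scatter call takes the next color from the default color cycle
        ax2.scatter(Z[mask, 0], Z[mask, 1], s=5, label=f"class {c}")
    ax2.set_xlabel("PC 1")
    ax2.set_ylabel("PC 2")
    ax2.legend(markerscale=3)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Usage: `X = np.loadtxt("trainInput.csv", delimiter=",")`, `eigvals, eigvecs, mean = pca(X)`, `print(n_components_for_variance(eigvals))`, then `plot_pca(eigvals, project(X, eigvecs, mean), labels)`. Eigenvectors are only determined up to sign, so the scatter plot may appear mirrored compared with the output of other PCA tools; the structure of the plot is the same.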
Question 3
Perform 7-means clustering of trainInput.csv (feel free to experiment with the number of clusters). Then project the cluster centers onto the first two principal components of the training data, and visualize the clusters by adding the projected cluster centers to the plot from the previous exercise. Briefly discuss the results: did you get meaningful clusters?
Deliverables: description of software used; one plot with cluster centers and data points; short discussion of results
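A NumPy-only sketch of 7-means clustering with Lloyd's algorithm (alternately assign each point to its nearest center and move each center to the mean of its assigned points), together with the projection of the centers. It assumes the PCA mean vector `mean` and the eigenvector matrix `eigvecs` (columns sorted by decreasing eigenvalue) from the previous exercise are available; the function names, the initialization from k randomly chosen distinct data points, and the seed are illustrative choices. A library implementation such as scikit-learn's `KMeans`, which can restart from several initializations, would be a reasonable alternative.

```python
import numpy as np


def _nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row of X."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, evaluated for all pairs at once
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)


def kmeans(X, k=7, n_iter=300, seed=0):
    """Lloyd's algorithm, initialized with k distinct data points drawn at random."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        assign = _nearest_center(X, centers)
        # move each center to the mean of its points; keep it if the cluster is empty
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, _nearest_center(X, centers)


def project_centers(centers, eigvecs, mean, n_components=2):
    """Project cluster centers onto the leading principal components of the training data."""
    return (np.asarray(centers, dtype=float) - mean) @ eigvecs[:, :n_components]


def plot_clusters(Z, labels, Zc, path="kmeans_pca.png"):
    """Data on PC 1/PC 2 colored by class, with projected cluster centers as black crosses."""
    import matplotlib.pyplot as plt  # only needed for plotting

    labels = np.asarray(labels).ravel()
    fig, ax = plt.subplots(figsize=(6, 5))
    for c in np.unique(labels):
        mask = labels == c
        ax.scatter(Z[mask, 0], Z[mask, 1], s=5, label=f"class {c}")
    ax.scatter(Zc[:, 0], Zc[:, 1], c="black", marker="x", s=120,
               linewidths=2, label="cluster centers")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend(markerscale=2)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Usage: `centers, clusters = kmeans(X, k=7)`, `Zc = project_centers(centers, eigvecs, mean)`, then `plot_clusters((X - mean) @ eigvecs[:, :2], labels, Zc)`. Because the projection is affine, the projected center of a cluster coincides with the mean of that cluster's projected points; clusters that are well separated in the full feature space can still overlap in the two-dimensional view. Lloyd's algorithm only reaches a local optimum that depends on the initialization, so comparing a few seeds (and values of k) by their within-cluster sum of squares helps when judging whether the clusters are meaningful.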