Discussion Post: Intro to Data Mining
Consider the mean of a cluster of objects from a binary transaction data set.
i. What are the minimum and maximum values of the components of the mean?
ii. What is the interpretation of components of the cluster mean?
iii. Which components most accurately characterize the objects in the cluster?
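To make the discussion setup concrete, the following is a minimal sketch of computing a cluster mean over binary transaction data. The toy cluster, the item columns, and the helper name cluster_mean are hypothetical and serve only as an illustration; they do not come from the course material.

```python
# Hypothetical toy cluster: 5 transactions (rows) over 4 items (columns).
# A 1 means the item appears in the transaction; a 0 means it does not.
cluster = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]


def cluster_mean(transactions):
    """Return the component-wise mean of a list of equal-length binary rows."""
    n = len(transactions)
    # zip(*transactions) iterates over columns, i.e. one item at a time.
    return [sum(column) / n for column in zip(*transactions)]


mean = cluster_mean(cluster)
# Each component is the fraction of transactions in the cluster that
# contain the corresponding item.
for item_index, value in enumerate(mean):
    print(f"item {item_index}: {value:.1f}")
```

In this toy cluster the first item appears in every transaction and the second in none, so the corresponding components of the mean sit at the two ends of the possible range.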
Homework:
Answer the following questions point by point.
i. For sparse data, discuss why considering only the presence of non-zero values might give a more accurate view of the objects than considering the actual magnitudes of values. When would such an approach not be desirable?
ii. Describe the change in the time complexity of K-means as the number of clusters to be found increases.
iii. Discuss the advantages and disadvantages of treating clustering as an optimization problem. Among other factors, consider efficiency, non-determinism, and whether an optimization-based approach captures all types of clusterings that are of interest.
iv. What are the time and space complexities of fuzzy c-means? Of self-organizing maps (SOM)? How do these complexities compare to those of K-means?
v. Explain the difference between likelihood and probability.
vi. Give an example of a set of clusters in which merging based on the closeness of clusters leads to a more natural set of clusters than merging based on the strength of connection (interconnectedness) of clusters.
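As a starting point for question ii, the sketch below counts the distance evaluations performed in a single K-means assignment step for two different numbers of centroids. The points, the centroids, and the function names are hypothetical. The count covers one assignment step only, so it says nothing about how many iterations are needed to converge.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_points(points, centroids):
    """Run one K-means assignment step.

    Returns the index of the nearest centroid for each point, together
    with the number of point-to-centroid distance evaluations performed.
    """
    labels = []
    evaluations = 0
    for p in points:
        best_index, best_dist = 0, float("inf")
        for k, c in enumerate(centroids):
            d = squared_distance(p, c)
            evaluations += 1
            if d < best_dist:
                best_index, best_dist = k, d
        labels.append(best_index)
    return labels, evaluations


# Hypothetical data: six 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0), (10.0, 1.0)]

_, evals_k2 = assign_points(points, [(0.0, 0.0), (10.0, 0.0)])
labels_k3, evals_k3 = assign_points(points, [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)])

print(evals_k2, evals_k3)
```

With n points and K centroids, this brute-force assignment step evaluates n × K distances, which is the term to reason about when K grows.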
Format your homework according to the following formatting requirements:
i. The answer should be typed in Times New Roman font (size 12), double-spaced, with one-inch margins on all sides.
ii. Include a cover page containing the title of the homework, the student's name, the course title, and the date. The cover page is not included in the required page length.
iii. Include a reference page. Citations and references must follow APA format. The reference page is not included in the required page length.