Assignment: Data Mining and Data Warehousing
Question 1
a) What are outliers? List four applications of outlier detection.
b) What are the challenges of outlier detection?
Question 2
a) How does PAM (k-medoids) form clusters? How does DBSCAN form clusters?
b) Assume you apply DBSCAN to the same dataset, but the examples in the dataset are sorted differently. Will DBSCAN always return the same clustering for different orderings of the same dataset? Give reasons for your answer.
Question 3
Using geodesic distance on the graph G in the given figure, calculate the following:
i. Eccentricity of each vertex
ii. Radius
iii. Diameter
iv. Peripheral vertices

Question 4
Why is constraint-based clustering often necessary? Explain the terms "hard constraint" and "soft constraint".