Introduction to Data Mining Problems
Use R to devise a book recommendation system for the data uploaded to Blackboard. In particular, develop a system that can recommend up to three books for an arbitrary user whose name can be entered into R after sourcing your code. Develop the system using both of the following approaches:
(a) User-based collaborative filtering approach. Use Euclidean distance, Manhattan distance, correlation, and cosine similarity as the similarity measures. What problems (if any) do you run into?
(b) Item-based collaborative filtering approach. Use the adjusted cosine similarity measure discussed in class. How does this approach compare to the user-based approach?
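The user-based procedure in part (a) can be sketched as below. The sketch is in Python rather than the R the assignment requires, and it assumes the ratings are held as a nested dictionary {user: {book: rating}} in which unrated books are simply absent; the toy data and every function name are hypothetical. Each measure is computed only over the books both users have rated, "correlation" is taken to mean Pearson's r, and the two distances are turned into similarities with 1/(1 + d), an assumed convention that may differ from the one used in class.

```python
import math

# Hypothetical toy ratings (NOT the Blackboard data): {user: {book: rating}}.
# A book missing from a user's dict means that user has not rated it.
RATINGS = {
    "Ann":  {"Dune": 5, "Emma": 3, "Ulysses": 4, "Beloved": 1},
    "Ben":  {"Dune": 4, "Emma": 2, "Beloved": 2, "Hamlet": 5},
    "Cara": {"Emma": 5, "Ulysses": 2, "Hamlet": 1, "Walden": 4},
    "Dev":  {"Dune": 5, "Ulysses": 5, "Walden": 3, "Hamlet": 4},
}


def co_rated(u, v):
    """Paired rating lists over the books both users have rated."""
    common = sorted(set(u) & set(v))
    return [u[b] for b in common], [v[b] for b in common]


def euclidean_sim(u, v):
    x, y = co_rated(u, v)
    if not x:
        return 0.0  # no co-rated books: treat as unrelated
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)  # map distance 0..inf to similarity 1..0


def manhattan_sim(u, v):
    x, y = co_rated(u, v)
    if not x:
        return 0.0
    d = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 / (1.0 + d)


def pearson_sim(u, v):
    x, y = co_rated(u, v)
    n = len(x)
    if n < 2:
        return 0.0  # correlation needs at least two co-rated books
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant rater makes correlation undefined
    return cov / (sx * sy)


def cosine_sim(u, v):
    x, y = co_rated(u, v)
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        return 0.0  # no co-rated books (or all-zero ratings)
    return sum(a * b for a, b in zip(x, y)) / norm


SIMILARITIES = {"euclidean": euclidean_sim, "manhattan": manhattan_sim,
                "correlation": pearson_sim, "cosine": cosine_sim}


def recommend_user_based(ratings, user, measure="correlation", k=3, n=3):
    """Up to n books `user` has not rated, scored by a similarity-weighted
    average of the ratings given by the k most similar other users."""
    if user not in ratings:
        raise KeyError(f"unknown user: {user!r}")
    sim = SIMILARITIES[measure]
    neighbours = sorted(((sim(ratings[user], ratings[o]), o)
                         for o in ratings if o != user), reverse=True)[:k]
    totals, weights = {}, {}
    for s, other in neighbours:
        if s <= 0:
            continue  # skip users with no positive similarity
        for book, r in ratings[other].items():
            if book not in ratings[user]:
                totals[book] = totals.get(book, 0.0) + s * r
                weights[book] = weights.get(book, 0.0) + s
    predicted = {b: totals[b] / weights[b] for b in totals}
    return sorted(predicted, key=lambda b: (-predicted[b], b))[:n]
```

For example, `recommend_user_based(RATINGS, "Ann", "euclidean")` returns `['Hamlet', 'Walden']`, while the correlation measure returns only `['Hamlet']`: with so few co-rated books, Cara's correlation with Ann is exactly -1 and Dev's is undefined (his two overlapping ratings are identical), so only Ben contributes.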
To load the data into R, use the read.csv function (i.e., read.csv(filename, header=TRUE)). For further details on its syntax, type ?read.csv into the R console.
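For comparison with read.csv(filename, header=TRUE), which treats the first row as column names, the loading step might look like the following Python sketch. The file layout is an assumption, since the Blackboard data are not shown here: one row per user, the user's name in the first column, one column per book, and a blank cell where a book is unrated (R's read.csv reads such blank numeric cells as NA). The function name load_ratings and the sample rows are hypothetical.

```python
import csv
import io


def load_ratings(f):
    """Read an assumed wide ratings table (header row of book titles, first
    column = user name) into {user: {book: rating}}, skipping blank cells."""
    reader = csv.reader(f)
    header = next(reader)
    books = header[1:]
    ratings = {}
    for row in reader:
        if not row:
            continue  # ignore empty lines
        user, cells = row[0], row[1:]
        ratings[user] = {b: float(c) for b, c in zip(books, cells) if c.strip()}
    return ratings


# Hypothetical sample in the assumed layout; with a real file, pass
# open(filename, newline="") instead of io.StringIO.
SAMPLE = """User,Dune,Emma,Hamlet
Ann,5,3,
Ben,4,,5
"""
ratings = load_ratings(io.StringIO(SAMPLE))
print(ratings)  # → {'Ann': {'Dune': 5.0, 'Emma': 3.0}, 'Ben': {'Dune': 4.0, 'Hamlet': 5.0}}
```

Storing only the rated books per user keeps missing ratings out of every similarity computation, which plays the role that NA handling plays in an R data frame.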
Write your programs as functions that take a user's name as an argument, so that any user's name can be entered at the R prompt.
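Part (b) can be written in the same function-per-user style. This Python sketch (the assignment itself calls for R) assumes ratings are held as a nested dictionary {user: {book: rating}} with unrated books absent; the toy data and function names are hypothetical. Adjusted cosine subtracts each user's mean rating before taking the cosine between two books' ratings over the users who rated both. Predicting a score from only the positively similar books the user has rated is an assumption of this sketch, not necessarily the variant discussed in class.

```python
import math

# Hypothetical toy ratings (NOT the Blackboard data): {user: {book: rating}}.
RATINGS = {
    "Ann":  {"Dune": 5, "Emma": 3, "Ulysses": 4, "Beloved": 1},
    "Ben":  {"Dune": 4, "Emma": 2, "Beloved": 2, "Hamlet": 5},
    "Cara": {"Emma": 5, "Ulysses": 2, "Hamlet": 1, "Walden": 4},
    "Dev":  {"Dune": 5, "Ulysses": 5, "Walden": 3, "Hamlet": 4},
}


def adjusted_cosine(ratings, means, i, j):
    """Adjusted cosine similarity of books i and j: cosine over the users who
    rated both, after subtracting each user's mean rating."""
    num = den_i = den_j = 0.0
    for user, r in ratings.items():
        if i in r and j in r:
            di, dj = r[i] - means[user], r[j] - means[user]
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-raters, or no variation around the user means
    return num / math.sqrt(den_i * den_j)


def recommend_item_based(ratings, user, n=3):
    """Up to n books `user` has not rated, each scored by a weighted average
    of the user's own ratings of positively similar books."""
    if user not in ratings:
        raise KeyError(f"unknown user: {user!r}")
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    mine = ratings[user]
    all_books = {b for r in ratings.values() for b in r}
    predicted = {}
    for i in all_books - set(mine):
        num = den = 0.0
        for j, r_uj in mine.items():
            s = adjusted_cosine(ratings, means, i, j)
            if s > 0:  # assumption: ignore negatively similar books
                num += s * r_uj
                den += s
        if den > 0:
            predicted[i] = num / den
    return sorted(predicted, key=lambda b: (-predicted[b], b))[:n]
```

With this toy data, `recommend_item_based(RATINGS, "Ann")` returns `['Hamlet', 'Walden']`; Walden's score rests on a single co-rater (Cara), which makes its similarity to Emma exactly 1.0. Because the item-item similarities do not depend on the target user, they could be computed once and cached; the sketch recomputes them for clarity.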
(c) What are some general problems with both approaches? Conceptually speaking, how can these issues be ameliorated?