Description
This assignment will provide you with practice using arrays.
Your job is to build a simple Recommender System, similar to the one that Netflix (an online movie-rental service) uses to recommend movies to customers. The basic idea is to find out which movies a user likes, and then recommend other movies that the user might also like.
On Oct. 2, 2006, Netflix announced a challenge to programmers everywhere to find a better way to predict which movies to recommend to users. They offered a prize of $1 million to anyone who could beat the prediction accuracy of Netflix's own technique by 10%. A team of programmers eventually claimed the prize in 2009. You can read about the challenge on Wikipedia, or on Netflix's own page about it. This assignment is a simplified version of the kinds of recommendation techniques used by Netflix, Amazon, and others.
Helper code: The MovieFileHelper class
For this assignment, you will need the class MovieFileHelper. The code is available here. This class includes two methods that you will need to use for your assignment, but YOU DO NOT NEED TO UNDERSTAND HOW THEY WORK. The code for these methods involves several things you have not learned yet. Feel free to look at them and try to understand them, but it is not necessary to understand their code to do this assignment.
You will also need to download these two files, movies.txt and ratings.txt. The first one stores the names of 20 Hollywood movies. The second one stores movie ratings from 30 (fictional) users, who gave each of these 20 movies a rating between 0 and 5 (5 being the best), or -1 if they haven't seen it.
The first method that is useful to you in MovieFileHelper is loadMovieNames(String filename). When you call this method with the String "movies.txt", it will load all of the movie names from that file into an array of Strings, and return that array.
The second method for you to use in MovieFileHelper is loadRatings(String filename). When you call this method with the String "ratings.txt", it will load all of the ratings from that file into a 2-D array of doubles, and return that array.
Your Task
Create a Java file called Recommender.java. Your program should behave as follows:
• Using the helper class and text files described above, load the 20 movie names and the movie ratings from 30 people into two arrays in memory.
• Ask the user to enter a rating (between 1 and 5, or -1 if they haven't seen it) for each movie.
• Create a method that determines a score for each of the 30 people, which represents how similar that person's tastes are to the current user's tastes. Store these similarity scores in an array of 30 doubles. Each similarity score should be between 0 and 1.
• Create an array that represents recommended ratings for the user. There should be 20 numbers in this array, one for each movie. The number for each movie should be the average of the movie's ratings, out of all 30, that are greater than 0 (only include ratings from users who have actually seen the movie). However, it should be a weighted average: people who are more similar to the current user should have a higher weight than people who are less similar.
• Display the names of the three top-ranked movies (according to the recommended ratings from the previous step) that the user has not seen yet.
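Two of the steps above, asking for the user's ratings and picking the three best unseen movies, could look roughly like the sketch below. The method names readUserRatings and topThreeUnseen are hypothetical, and the sketch assumes the user's ratings are kept in a double[] with -1 meaning "not seen" and that the recommended ratings have already been computed. The movie names and numbers in main are made up and are not from movies.txt.

```java
import java.util.Arrays;
import java.util.Scanner;

class TopThreeSketch {

    // Asks for one rating per movie, reading integers from the given Scanner.
    // Assumption: any value other than -1 or 1..5 is rejected and the user is asked again.
    // Non-integer input is not handled here (nextInt would throw).
    static double[] readUserRatings(Scanner in, String[] movieNames) {
        double[] ratings = new double[movieNames.length];
        for (int m = 0; m < movieNames.length; m++) {
            while (true) {
                System.out.print("Rate \"" + movieNames[m] + "\" (1-5, or -1 if not seen): ");
                int r = in.nextInt();
                if (r == -1 || (r >= 1 && r <= 5)) {
                    ratings[m] = r;
                    break;
                }
                System.out.println("Please enter -1 or a number from 1 to 5.");
            }
        }
        return ratings;
    }

    // Returns the indices of the (up to) three movies with the highest recommended
    // rating among the movies the user has not seen (user rating of -1).
    static int[] topThreeUnseen(double[] recommended, double[] userRatings) {
        boolean[] used = new boolean[recommended.length];
        int[] top = new int[3];
        int count = 0;
        for (int k = 0; k < 3; k++) {
            int best = -1;
            for (int m = 0; m < recommended.length; m++) {
                if (userRatings[m] == -1 && !used[m]
                        && (best == -1 || recommended[m] > recommended[best])) {
                    best = m;
                }
            }
            if (best == -1) {
                break; // fewer than three unseen movies are left
            }
            used[best] = true;
            top[count++] = best;
        }
        return Arrays.copyOf(top, count);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in data, not taken from movies.txt or ratings.txt.
        String[] names = {"Movie A", "Movie B", "Movie C", "Movie D", "Movie E"};
        Scanner in = new Scanner("5 -1 3 -1 -1"); // simulated keyboard input
        double[] user = readUserRatings(in, names);
        double[] recommended = {4.0, 3.5, 2.0, 4.2, 1.0};
        System.out.println();
        for (int m : topThreeUnseen(recommended, user)) {
            System.out.println(names[m]);
        }
    }
}
```

Picking the best remaining movie three times is simpler than sorting and is fast enough for 20 movies; when two movies tie, the one that appears first wins. In your own program you would pass new Scanner(System.in) instead of a fixed string.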
Suggestions and hints
Don't try to program everything all at once. Do it in parts, by writing some methods that accomplish part of the whole assignment. Write some println commands that show what's going on in memory after you call a method that you've just written, and run your program to make sure that the new method is working correctly. Repeat this for each new method you write.
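For the println checks suggested above, note that printing an array variable directly shows only a type-and-hash string, not its contents. A small sketch using the standard java.util.Arrays helpers (the variable names and values are made up for illustration):

```java
import java.util.Arrays;

class DebugPrint {
    public static void main(String[] args) {
        double[] scores = {0.5, 0.25, 1.0};
        double[][] grid = {{4, -1}, {2, 5}};

        // Printing an array directly shows only a type/hash string such as "[D@1b6d3586".
        System.out.println(scores);
        // Arrays.toString prints the contents of a 1-D array.
        System.out.println(Arrays.toString(scores));   // [0.5, 0.25, 1.0]
        // Arrays.deepToString prints the contents of a 2-D array, one inner array per row.
        System.out.println(Arrays.deepToString(grid)); // [[4.0, -1.0], [2.0, 5.0]]
    }
}
```

Printing the similarity array or the recommended-ratings array this way right after computing it is a quick check that each new method works.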
You can come up with your own way of judging how similar 2 people's ratings are. One suggestion is to compute what's called "cosine similarity":
• for person 1, compute the square of each rating for the movies they have seen, add these up, and then take the square root. Store the result in a variable called p1. For example, if person 1 saw 3 movies and rated them 4, 4, and 2, then p1 = sqrt(4*4 + 4*4 + 2*2) = sqrt(36) = 6.
• do the same for person 2, and store the result in a variable called p2.
• for each movie that both people have seen, compute the product of their two ratings. Add up all of these products, and store the result in a variable called both. For example, if person 1 and person 2 both saw movies 7 and 14 (out of 20), person 1 rated movie 7 as 4 and movie 14 as 2, and person 2 rated movie 7 as 2 and movie 14 as 3, then both = 4*2 + 2*3 = 14.
• The cosine similarity score between person 1 and person 2 is (both / (p1 * p2)).
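The four steps above can be sketched as a Java method. This is one possible version, not the required one. It assumes each person's ratings are a double[] with -1 for unseen movies, that a rating counts as "seen" when it is greater than 0 (matching the rule used for the recommended ratings), and that row i of the array returned by loadRatings holds person i's ratings. The method names cosineSimilarity and similarities are hypothetical.

```java
class Similarity {

    // Cosine similarity, following the hint steps: p1, p2, and both.
    // Assumption: only ratings greater than 0 count as "seen".
    static double cosineSimilarity(double[] r1, double[] r2) {
        double sumSq1 = 0, sumSq2 = 0, both = 0;
        for (int m = 0; m < r1.length; m++) {
            if (r1[m] > 0) sumSq1 += r1[m] * r1[m];                // feeds p1
            if (r2[m] > 0) sumSq2 += r2[m] * r2[m];                // feeds p2
            if (r1[m] > 0 && r2[m] > 0) both += r1[m] * r2[m];     // movies both have seen
        }
        double p1 = Math.sqrt(sumSq1);
        double p2 = Math.sqrt(sumSq2);
        // Assumption: someone who has seen no movies gets a score of 0 (avoids dividing by 0).
        if (p1 == 0 || p2 == 0) return 0;
        return both / (p1 * p2);
    }

    // One similarity score per person in the data set.
    // Assumption: ratings[i] is the array of movie ratings for person i.
    static double[] similarities(double[][] ratings, double[] user) {
        double[] sims = new double[ratings.length];
        for (int i = 0; i < ratings.length; i++) {
            sims[i] = cosineSimilarity(user, ratings[i]);
        }
        return sims;
    }

    public static void main(String[] args) {
        // person1 matches the p1 example (ratings 4, 4, 2, so p1 = 6); person2 is made up.
        double[] person1 = {4, 4, 2, -1};
        double[] person2 = {-1, 4, 2, 1};
        System.out.println(cosineSimilarity(person1, person2));
    }
}
```

Because every rating that enters the sums is positive, both is never negative and never exceeds p1 * p2, so each score falls between 0 and 1 as the task requires.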
The mathematical formula for a weighted average, where there are N numbers stored in an array called a, and N corresponding weights stored in an array called w, goes like this:
weighted_average(N, a, w) = (a1*w1 + a2*w2 + ... + aN*wN) / (w1 + ... + wN)
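Applied to this assignment, the weights are the similarity scores and the numbers being averaged are one movie's ratings. A minimal sketch, assuming ratings[i][m] is person i's rating of movie m and sims[i] is person i's similarity score; the method names weightedAverage and recommendedRatings, and the small data set in main, are hypothetical.

```java
import java.util.Arrays;

class WeightedRecommend {

    // weighted_average(N, a, w) = (a1*w1 + ... + aN*wN) / (w1 + ... + wN)
    static double weightedAverage(double[] a, double[] w) {
        double num = 0, den = 0;
        for (int i = 0; i < a.length; i++) {
            num += a[i] * w[i];
            den += w[i];
        }
        return den == 0 ? 0 : num / den; // assumption: 0 when every weight is 0
    }

    // Recommended rating for each movie: a weighted average over the people who rated it > 0.
    static double[] recommendedRatings(double[][] ratings, double[] sims) {
        int people = ratings.length;
        int movies = ratings[0].length;
        double[] rec = new double[movies];
        for (int m = 0; m < movies; m++) {
            double[] a = new double[people];
            double[] w = new double[people];
            for (int i = 0; i < people; i++) {
                a[i] = ratings[i][m];
                // People who have not seen movie m get weight 0, so they drop out entirely.
                w[i] = ratings[i][m] > 0 ? sims[i] : 0;
            }
            rec[m] = weightedAverage(a, w);
        }
        return rec;
    }

    public static void main(String[] args) {
        // Hypothetical data: 3 people x 2 movies, with made-up similarity scores.
        double[][] ratings = {{5, -1}, {3, 4}, {-1, 2}};
        double[] sims = {1.0, 0.5, 0.5};
        System.out.println(Arrays.toString(recommendedRatings(ratings, sims)));
    }
}
```

Instead of building a shorter array for each movie, the sketch gives a weight of 0 to anyone who has not seen the movie, which removes that person from both the top and the bottom of the formula. For movie 1 in main, for example, this gives (4*0.5 + 2*0.5) / (0.5 + 0.5) = 3.0.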