Programming in Java
This assignment will provide you with practice using arrays.
Your job is to build a simple Recommender System, similar to the one that Netflix (an online movie-rental service) uses to recommend movies to customers. The basic idea is to find out some movies that a user likes, and then recommend other movies that the user might also like.
On Oct. 2, 2006, Netflix announced a challenge to programmers everywhere to come up with a better way of figuring out how to predict what movies to recommend to users. They offered a prize of $1 million to anyone who could beat their own technique by 10% in prediction accuracy. A team of programmers eventually claimed the prize in 2009. You can read about the challenge on Wikipedia, or Netflix's own page about it. This assignment is a simplified version of the kinds of recommendation techniques used by Netflix, Amazon, and others.
Helper code: The MovieFileHelper class
For this assignment, you will need the class MovieFileHelper. The code is available here. This class includes two methods that you will need to use for your assignment, but YOU DO NOT NEED TO UNDERSTAND HOW THEY WORK. The code for these methods involves several things you have not learned yet. Feel free to read the code and try to follow it, but you do not need to understand it to do this assignment.
You will also need to download these two files, movies.txt and ratings.txt. The first one stores the names of 20 Hollywood movies. The second one stores movie ratings from 30 (fictional) users, who rated these 20 movies between 0 and 5 (5 being the best), or -1 if they haven't seen it.
The first method that is useful to you in MovieFileHelper is loadMovieNames(String filename). When you call this method with the String "movies.txt", it will load all of the movie names from that file into an array of Strings, and return that array.
The second method for you to use in MovieFileHelper is loadRatings(String filename). When you call this method with the String "ratings.txt", it will load all of the ratings from that file into a 2-D array of doubles, and return that array.
Your Task
Create a Java file called Recommender.java. Your program should behave as follows:
Using the helper class and text files described above, load the 20 movie names and the movie ratings from 30 people into two arrays in memory.
Ask the user to enter a rating (between 1 and 5, or -1 if they haven't seen it) for each movie.
Create a method that determines a score for each of the 30 people, which represents how similar that person's tastes are to the current user's tastes. Store these similarity scores in an array of 30 doubles. Each similarity score should be between 0 and 1.
Create an array that represents recommended ratings for the user. There should be 20 numbers in this array, one for each movie. The higher the number, the more strongly your program thinks the user will like the movie. The number should be the average over all 30 ratings for the movie that are greater than 0 (only include ratings for users who have actually seen the movie). However, it should be a weighted average: people who are more similar to the current user should have a higher weight than people who are less similar.
Display the name of the top movie (according to the recommended ratings from the previous step) that the user has not yet seen.
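The last step above can be sketched as a single pass over the recommended ratings. This is only one possible approach, not a required design: the class name TopRecommendation, the method name bestUnseenMovie, and the choice to return -1 when the user has seen every movie are all assumptions made for illustration.

```java
public class TopRecommendation {

    /**
     * Returns the index of the highest-scoring movie that the user has not seen,
     * or -1 if the user has seen every movie (an assumed convention).
     * userRatings[m] is -1 when the user has not seen movie m.
     */
    static int bestUnseenMovie(double[] userRatings, double[] recommended) {
        int best = -1;
        for (int m = 0; m < recommended.length; m++) {
            boolean unseen = userRatings[m] == -1;
            if (unseen && (best == -1 || recommended[m] > recommended[best])) {
                best = m;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder data; in the assignment the names come from movies.txt.
        String[] movieNames = {"Movie A", "Movie B", "Movie C", "Movie D"};
        double[] userRatings = {5, -1, -1, 3};
        double[] recommended = {4.9, 3.2, 4.1, 4.5};

        int best = bestUnseenMovie(userRatings, recommended);
        if (best == -1) {
            System.out.println("You have already seen every movie.");
        } else {
            System.out.println("Recommended: " + movieNames[best]);
        }
    }
}
```

With this sample data, Movie A has the highest score but is skipped because the user has already rated it, so the program recommends Movie C.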
Suggestions and hints
Don't try to program everything all at once. Do it in parts, by writing some methods that accomplish part of the whole assignment. Write some println commands that show what's going on in memory after you call a method that you've just written, and run your program to make sure that the new method is working correctly. Repeat this for each new method you write.
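As an illustration of this check-as-you-go approach, here is a minimal sketch of one small method plus a println that shows what it produced. The class name RatingInputCheck, the method name readRatings, and the placeholder movie names are assumptions for this example. The main method feeds the Scanner canned input so the check is repeatable; in the real program you would pass new Scanner(System.in) instead.

```java
import java.util.Arrays;
import java.util.Scanner;

public class RatingInputCheck {

    /** Asks for one rating per movie; accepts -1 (not seen) or a number from 1 to 5. */
    static double[] readRatings(Scanner in, String[] movieNames) {
        double[] ratings = new double[movieNames.length];
        for (int i = 0; i < movieNames.length; i++) {
            double rating;
            do {
                System.out.print("Rating for " + movieNames[i] + " (1-5, or -1 if not seen): ");
                while (!in.hasNextDouble()) {
                    in.next(); // discard input that is not a number
                }
                rating = in.nextDouble();
            } while (rating != -1 && (rating < 1 || rating > 5)); // ask again if out of range
            ratings[i] = rating;
        }
        return ratings;
    }

    public static void main(String[] args) {
        String[] movieNames = {"Movie A", "Movie B", "Movie C"}; // placeholder names
        Scanner in = new Scanner("4 -1 9 3"); // canned input; the 9 is rejected
        double[] ratings = readRatings(in, movieNames);

        // Show what is in memory after the call, to confirm the method works.
        System.out.println();
        System.out.println(Arrays.toString(ratings)); // prints [4.0, -1.0, 3.0]
    }
}
```

Once the printed array matches what you typed, you can remove or comment out the println and move on to the next method.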
Calculating Similarity
Your program is going to try to decide whether or not you might like a movie that you haven't yet seen. It's going to come up with a score, which represents the likelihood that you'll enjoy it. If people who have tastes similar to yours seem to like it, the program will assign that movie a higher score. The question is: how do you determine similarity?
You can come up with your own way of judging how similar two people's ratings are. One suggestion is to compute what's called cosine similarity:
For person 1, compute the square of each rating for the movies they have seen, add these squares up, and take the square root of the sum. Store the result in a variable called p1. For example, if person 1 saw 3 movies and rated them 4, 4, and 2, then p1 = sqrt(4*4 + 4*4 + 2*2) = sqrt(36) = 6.
Do the same for person 2, and store the result in a variable called p2.
For each movie that both people have seen, compute the product of their two ratings. Add up all of these products, and store the result in a variable called both. For example, if person 1 and person 2 both saw movies 7 and 14 (out of 20), and person 1 rated movie 7 as 4 and movie 14 as 2, and person 2 rated movie 7 as 2 and movie 14 as 3, then both = 4*2 + 2*3 = 14.
The cosine similarity score between person 1 and person 2 is (both / (p1 * p2)).
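The steps above can be sketched in Java as follows. This is one possible way to organize the calculation, not the only one: the class and method names are assumptions for illustration, and returning 0 when either person has no usable ratings is an assumed convention to avoid dividing by zero. The variable names p1, p2, and both match the description above.

```java
public class CosineSimilarity {

    /** Square root of the sum of squared ratings, skipping unseen movies (-1). */
    static double magnitude(double[] ratings) {
        double sumOfSquares = 0.0;
        for (double rating : ratings) {
            if (rating >= 0) { // -1 means "not seen"
                sumOfSquares += rating * rating;
            }
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine similarity between two people's ratings of the same list of movies. */
    static double similarity(double[] person1, double[] person2) {
        double p1 = magnitude(person1);
        double p2 = magnitude(person2);
        if (p1 == 0.0 || p2 == 0.0) {
            return 0.0; // assumed convention: no ratings means no evidence of similarity
        }
        double both = 0.0;
        for (int m = 0; m < person1.length; m++) {
            if (person1[m] >= 0 && person2[m] >= 0) { // both people have seen movie m
                both += person1[m] * person2[m];
            }
        }
        return both / (p1 * p2);
    }

    public static void main(String[] args) {
        double[] person1 = {4, 4, 2, -1};
        System.out.println(magnitude(person1)); // prints 6.0, as in the example above

        double[] person2 = {2, -1, 3, 5};
        System.out.println(similarity(person1, person2));
    }
}
```

Because all ratings are non-negative, the result always falls between 0 and 1, and two people with identical ratings get a score of 1.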
Making a Recommendation
Think: the goal is to come up with a recommendation for a movie that the user might like. If you and I have similar preferences, and there's a movie that I haven't seen that you like, chances are good that I might like it. Suppose there's another person, whose tastes kind of match mine, who also happens to like the movie. We now have even more evidence that I might like the movie. (Because my tastes only kind of match the other person's, it only lends a little bit of weight to the decision.)
Imagine, now, that I don't just have information from a couple of friends upon which to base a recommendation. In the case of this assignment, we have information from 30 people about their preferences. We put all of that information together to form a single score for each movie, which can be calculated as a weighted average of all of the ratings of the other users. We assign more weight to the ratings of people whose preferences are similar to ours, and less weight to the ratings of people whose preferences are dissimilar.
We calculate a score for each movie, and the one we recommend is the one with the highest score.
The mathematical formula for a weighted average, where there are N numbers stored in an array called a, and N corresponding weights stored in an array called w, goes like this:
weighted_average(N, a, w) = (a1*w1 + a2*w2 + ... + aN*wN) / (w1 + ... + wN)
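A minimal Java sketch of this formula follows, along with a variant that applies it to one movie while skipping people who have not seen it. Several things here are assumptions for illustration: the class and method names, the ratings[person][movie] index order (check how loadRatings actually arranges its array), and returning 0 when the weights add up to 0.

```java
public class WeightedAverage {

    /** Weighted average of the numbers in a, using the matching weights in w. */
    static double weightedAverage(double[] a, double[] w) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < a.length; i++) {
            weightedSum += a[i] * w[i];
            weightTotal += w[i];
        }
        // Assumed convention: with no weight at all, report 0 instead of dividing by zero.
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    /**
     * Recommended rating for one movie: the weighted average of the ratings
     * greater than 0, weighted by each person's similarity to the current user.
     * Assumes ratings[person][movie]; swap the indexes if your array is arranged
     * the other way around.
     */
    static double predictedRating(double[][] ratings, double[] similarity, int movie) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int person = 0; person < ratings.length; person++) {
            double rating = ratings[person][movie];
            if (rating > 0) { // skip people who have not seen this movie
                weightedSum += rating * similarity[person];
                weightTotal += similarity[person];
            }
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Three people, two movies; -1 means "not seen".
        double[][] ratings = {{5, -1}, {1, 4}, {-1, 2}};
        double[] similarity = {0.9, 0.1, 0.5};

        // Movie 0: (5*0.9 + 1*0.1) / (0.9 + 0.1), roughly 4.6
        System.out.println(predictedRating(ratings, similarity, 0));
        // Movie 1: (4*0.1 + 2*0.5) / (0.1 + 0.5), roughly 2.33
        System.out.println(predictedRating(ratings, similarity, 1));
    }
}
```

For movie 0, the very similar person's rating of 5 dominates the dissimilar person's rating of 1, so the result is much closer to 5 than the plain average of 3 would be.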