
Question:

You’ve probably all experienced the power of recommendation systems. Every time you start up Netflix or Spotify or Amazon or any one of many other online merchants, you are provided with a list of things (movies, songs, books, etc.) that the company thinks you might enjoy. The recommendations are compiled in many different ways, using many different sources of data, but one important and common approach is that of collaborative filtering. This method uses data collected from many users to see how "similar" you are to other users, where "similar" can be defined in terms of the angle between two vectors. Then the recommendation algorithm recommends items to you based on what other "similar" users have liked.
In this exercise we look at a simple example of how to calculate the similarity between users. Suppose we ask four students (S1, S2, S3, S4) to rate four different lecturers (James, Rachel, Duncan, and a Princess) by giving them each a grade between 1 and 5. Each student only gives a grade for lecturers they have personally seen and listened to; otherwise they just leave the form blank.

a. First we manipulate the data just a little bit to make it a more accurate representation. Calculate the average of each row (i.e., of the scores of each student) and subtract this average from each entry. (Don’t do anything to the blank entries.) This will give you a row that has an average of 0, with some entries negative and some positive, and other entries still blank.
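Step a can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a blank entry is stored as `None`, the function name `center_row` is ours, and the grades below are made up rather than taken from the exercise's table.

```python
from typing import Optional


def center_row(scores: list[Optional[float]]) -> list[Optional[float]]:
    """Subtract the mean of the non-blank entries; blanks (None) stay blank."""
    given = [s for s in scores if s is not None]
    if not given:
        # A student who graded nobody: there is no average to subtract.
        return list(scores)
    mean = sum(given) / len(given)
    return [None if s is None else s - mean for s in scores]


# Hypothetical grades for one student (NOT the exercise's data).
example = [5, 2, None, 2]
print(center_row(example))  # [2.0, -1.0, None, -1.0]
```

Note that the average is taken over the three grades actually given, not over all four columns.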

b. Now put a 0 into each blank entry, and, for each student, put all their entries into a vector. (For example, in the table above, the vector for S3 would be (1, 0, −1, 0). Of course, when you do it the entries will be different as you will have subtracted the row average.) This will give you four student vectors.
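A minimal sketch of step b, again assuming blanks are stored as `None` (our convention, not the exercise's). The helper name `to_vector` and the input row are hypothetical.

```python
def to_vector(centered_row):
    """Replace each blank (None) with 0.0 and return a plain list of floats."""
    return [0.0 if s is None else float(s) for s in centered_row]


# A made-up, already-centred row with one blank entry.
print(to_vector([2.0, -1.0, None, -1.0]))  # [2.0, -1.0, 0.0, -1.0]
```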

c. Why do you do nothing to the blank entries in the first step (part a.), but then set them all to zero in the second step (part b.)?

d. Now normalise each student vector.
This normalisation is to cancel out the inconvenient fact that some students always like all their lecturers while other students always dislike all of them. We don’t want this to mess up our recommendation algorithm.
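Normalising a vector means dividing it by its length, so that the result has length 1. The sketch below shows one way to do this in Python; the function name `normalise` and the sample vector are our own, and leaving an all-zero vector unchanged is an assumption made here because such a vector has no direction.

```python
import math


def normalise(v):
    """Return v scaled to unit length (Euclidean norm)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        # Assumption: an all-zero vector cannot be normalised, so return a copy.
        return list(v)
    return [x / length for x in v]


# Made-up student vector, not taken from the exercise's table.
unit = normalise([2.0, -1.0, 0.0, -1.0])
print(unit)
print(math.sqrt(sum(x * x for x in unit)))  # length is 1, up to rounding
```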

e. Finally, the similarity between two students is defined to be the angle between their vectors. Which two students are most similar? Which two students are the least similar?
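The angle θ between two vectors u and v satisfies cos θ = (u · v) / (|u| |v|); for unit vectors this reduces to the dot product alone. A hedged Python sketch follows, in which the function name `angle_between` and the test vectors are ours, not the exercise's.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Simple made-up vectors to check the function.
print(angle_between([1.0, 0.0], [0.0, 1.0]))   # perpendicular: about 90 degrees
print(angle_between([1.0, 0.0], [-1.0, 0.0]))  # opposite: about 180 degrees
```

Applying such a function to each of the six pairs of student vectors gives the angles needed to compare the students.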
