Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Building Recommender Systems for Movie Rating Prediction In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently 2 5

Building Recommender Systems for Movie Rating Prediction
In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently 25 million user-movie ratings. Since the entire data is too big, we use a 1 million ratings subset MovieLens 1M, and we reformatted the data to make it more convenient to use.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time
from sklearn.model_selection import train_test_split
from scipy.sparse import coo_matrix, csr_matrix
from scipy.spatial.distance import jaccard, cosine
from pytest import approx
from sklearn.metrics.pairwise import cosine_similarity
MV_users = pd.read_csv('data/users.csv')
MV_movies = pd.read_csv('data/movies.csv')
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')
from collections import namedtuple
Data = namedtuple('Data',['users','movies','train','test'])
data = Data(MV_users, MV_movies, train, test)
Starter codes
Now, we will be building a recommender system which has various techniques to predict ratings. The class RecSys has baseline prediction methods (such as predicting everything to 3 or to average rating of each user) and other utility functions. class ContentBased and class Collaborative inherit class RecSys and further add methods calculating item-item similarity matrix. You will be completing those functions using what we learned about content-based filtering and collaborative filtering.
RecSys's rating_matrix method converts the (user id, movie id, rating) triplet from the train data (train data's ratings are known) into a utility matrix for 6040 users and 3883 movies.
Here, we create the utility matrix as a dense matrix (numpy.array) format for convenience. But in a real world data where hundreds of millions of users and items may exist, we won't be able to create the utility matrix in a dense matrix format (For those who are curious why, try measuring the dense matrix self.Mr using .nbytes()). In that case, we may use sparse matrix operations as much as possible and distributed file systems and distributed computing will be needed. Fortunately, our data is small enough to fit in a laptop/pc memory. Also, we will use numpy and scipy.sparse, which allow significantly faster calculations than calculating on pandas.DataFrame object.
In the rating_matrix method, pay attention to the index mapping as user IDs and movie IDs are not the same as array index.
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, lil_matrix
from scipy.spatial.distance import squareform, pdist
class RecSys():

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Relational Database And SQL

Authors: Lucy Scott

3rd Edition

1087899699, 978-1087899695

More Books

Students also viewed these Databases questions

Question

10. Describe the relationship between communication and power.

Answered: 1 week ago