Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Building Recommender Systems for Movie Rating Prediction In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently 2 5
Building Recommender Systems for Movie Rating Prediction
In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently million usermovie ratings. Since the entire data is too big, we use a million ratings subset MovieLens M and we reformatted the data to make it more convenient to use.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import time
from sklearn.modelselection import traintestsplit
from scipy.sparse import coomatrix, csrmatrix
from scipy.spatial.distance import jaccard, cosine
from pytest import approx
from sklearn.metrics.pairwise import cosinesimilarity
MVusers pdreadcsvdatauserscsv
MVmovies pdreadcsvdatamoviescsv
train pdreadcsvdatatraincsv
test pdreadcsvdatatestcsv
from collections import namedtuple
Data namedtupleDatausers'movies','train','test'
data DataMVusers, MVmovies, train, test
Starter codes
Now, we will be building a recommender system which has various techniques to predict ratings. The class RecSys has baseline prediction methods such as predicting everything to or to average rating of each user and other utility functions. class ContentBased and class Collaborative inherit class RecSys and further add methods calculating itemitem similarity matrix. You will be completing those functions using what we learned about contentbased filtering and collaborative filtering.
RecSys's ratingmatrix method converts the user id movie id rating triplet from the train data train data's ratings are known into a utility matrix for users and movies.
Here, we create the utility matrix as a dense matrix numpyarray format for convenience. But in a real world data where hundreds of millions of users and items may exist, we won't be able to create the utility matrix in a dense matrix format For those who are curious why, try measuring the dense matrix self.Mr using nbytes In that case, we may use sparse matrix operations as much as possible and distributed file systems and distributed computing will be needed. Fortunately, our data is small enough to fit in a laptoppc memory. Also, we will use numpy and scipy.sparse, which allow significantly faster calculations than calculating on pandas.DataFrame object.
In the ratingmatrix method, pay attention to the index mapping as user IDs and movie IDs are not the same as array index.
import numpy as np
import pandas as pd
from scipy.sparse import csrmatrix, lilmatrix
from scipy.spatial.distance import squareform, pdist
class RecSys:
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started