Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Aug 30, 2024

Building Recommender Systems for Movie Rating Prediction In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently 2 5

Building Recommender Systems for Movie Rating Prediction

In this assignment, we will build a recommender systems that predict movie ratings. MovieLense has currently

25

million user

-

movie ratings. Since the entire data is too big, we use a

1

million ratings subset MovieLens

1

M

,

and we reformatted the data to make it more convenient to use.

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

import time

from sklearn.model

_

selection import train

_

test

_

split

from scipy.sparse import coo

_

matrix, csr

_

matrix

from scipy.spatial.distance import jaccard, cosine

from pytest import approx

from sklearn.metrics.pairwise import cosine

_

similarity

MV

_

users

=

pd

.

read

_

csv

('

data

/

users

.

csv

')

MV

_

movies

=

pd

.

read

_

csv

('

data

/

movies

.

csv

')

train

=

pd

.

read

_

csv

('

data

/

train

.

csv

')

test

=

pd

.

read

_

csv

('

data

/

test

.

csv

')

from collections import namedtuple

Data

=

namedtuple

('

Data

', ['

users

',

'movies','train','test'

])

data

=

Data

(

MV

_

users, MV

_

movies, train, test

)

Starter codes

Now, we will be building a recommender system which has various techniques to predict ratings. The class RecSys has baseline prediction methods

(

such as predicting everything to

3

or to average rating of each user

)

and other utility functions. class ContentBased and class Collaborative inherit class RecSys and further add methods calculating item

-

item similarity matrix. You will be completing those functions using what we learned about content

-

based filtering and collaborative filtering.

RecSys's rating

_

matrix method converts the

(

user id

,

movie id

,

rating

)

triplet from the train data

(

train data's ratings are known

)

into a utility matrix for

6040

users and

3883

movies.

Here, we create the utility matrix as a dense matrix

(

numpy

.

array

)

format for convenience. But in a real world data where hundreds of millions of users and items may exist, we won't be able to create the utility matrix in a dense matrix format

(

For those who are curious why, try measuring the dense matrix self.Mr using

.

nbytes

()) .

In that case, we may use sparse matrix operations as much as possible and distributed file systems and distributed computing will be needed. Fortunately, our data is small enough to fit in a laptop

/

pc memory. Also, we will use numpy and scipy.sparse, which allow significantly faster calculations than calculating on pandas.DataFrame object.

In the rating

_

matrix method, pay attention to the index mapping as user IDs and movie IDs are not the same as array index.

import numpy as np

import pandas as pd

from scipy.sparse import csr

_

matrix, lil

_

matrix

from scipy.spatial.distance import squareform, pdist

class RecSys

()

:

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Relational Database And SQL

Relational Database And SQL

Authors: Lucy Scott

3rd Edition

1087899699, 978-1087899695

More Books

Students also viewed these Databases questions

Question

★★★★★

Lennons Tannery Corporation reports revenues and expenses of $196,400 and $80,200, respectively, for the period. Give the remaining entries to close the books assuming the ledger reports Additional...

Answered: 1 week ago

Question

★★★★★

Stephen Patterson held an account with Suntrust Bank in Alcoa, Tennessee. Juanita Wehrmanwith whom Patterson was briefly involved in a romantic relationshipstole his debit card and used it for...

Answered: 1 week ago

Question

★★★★★

=+20.15. 1 (a) Show that, if X and Y are independent and have the standard normal density, then X/Y has the Cauchy density with u = 1.

Answered: 1 week ago

Question

★★★★★

Marvelous Music provides music lessons to student musicians. Some students pay in advance for lessons; others are billed after lessons have been provided. Advance payments are credited to an account...

Answered: 1 week ago

Question

★★★★★

BFIN-350 You need to choose befween two machines based on the following information: Machine 1 has a 4 year life, costs $322,500 with pre-tax operating costs of $64,500 per year. Machine 2 has a...

Answered: 1 week ago

Question

★★★★★

3) Instructions Required Net present value method, present value index, and analysis for a service company Continental Railroad Company is evaluating three capital investment proposais by using the...

Answered: 1 week ago

Question

★★★★★

28. Yakuza Company issued 2,500 share of P25 par value preference shares with detachable warrants. The security package sells for P105. Each warrant enables the holder to purchase two shares of P10...

Answered: 1 week ago

Question

★★★★★

Good Vibes Manufacturing is a firm with an annual net income of $20 million, revenue of $60 million and cost of goods sold of $25 million. If the balance sheet amounts show $2.5 million of inventory...

Answered: 1 week ago

Question

★★★★★

A random sample of 45 Burger King Whoppers found a sample mean weight of 120g and a sample standard deviation of 15g. Find the 90% confidence interval of the true mean weight of a Whopper. Write your...

Answered: 1 week ago

Question

★★★★★

Ignoring GST, which of the following entries correctly records the purchase of land for $300 000 financed by a $100 000 cash deposit with the balance payable via a 20-year, 6% loan? Select one: a. DR...

Answered: 1 week ago

Question

★★★★★

Sizwe conditioners a manufacturing firm based in Gauteng makes three different types of air conditioners: the ceiling type, the cassette type and the wall mounted type. Weekly sales of each type are...

Answered: 1 week ago

Question

★★★★★

7. The celebration of Buddhas birthday is not held on Christmas, but instead on: a. Fourth of July b. July 14 c. Asian Lunar New Years Day d. Hanamatsuri

Answered: 1 week ago

Question

★★★★★

2. What is the name of the dish that features black-eyed peas and rice (although sometimes collards, ham hocks, stewed tomatoes, or other items) and is served in the South, especially on New Years...

Answered: 1 week ago

Question

★★★★★

10. Describe the relationship between communication and power.

Answered: 1 week ago

Previous Question Next Question