Question


Posted on Sep 25, 2024


Can someone please help me with this Python assignment?

Build your own Python based Recommendation System

Part 1 Introduction

In this final project, you will learn how to use Python and Spark to build a usable recommendation system with proper datasets. In particular, we will first make a personalized movie recommendation system; then we will work as a team to collect data and modify the code to work with the newly created dataset. Finally, as a team, you are expected to deliver a 15-minute presentation and a report with source code, dataset, and performance evaluation.

Part 1.1 Database background

We will work with 10 million ratings from 72,000 users on 10,000 movies, collected by MovieLens. This dataset is publicly available at https://github.com/databricks/spark-training/tree/master/data/movielens. For quick testing of your code, you may want to use a smaller dataset under /movielens/medium, which contains 1 million ratings from 6000 users on 4000 movies.

We will use two files from this MovieLens dataset: ratings.dat and movies.dat. All ratings are contained in the file ratings.dat and are in the following format:

UserID::MovieID::Rating::Timestamp

Movie information is in the file movies.dat and is in the following format:

MovieID::Title::Genres

Part 1.2 Recommendation system and Collaborative filtering

Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix, in our case the user-movie rating matrix. Spark provides an excellent machine learning library called MLlib, which supports model-based collaborative filtering, in which users and products are described by a small set of latent factors that can be used to predict missing entries. In particular, we use the alternating least squares (ALS) algorithm to learn these latent factors.

Note: Background on recommendation systems and collaborative filtering is not required to make your code work, but it is worth reading if you want to know more about it.
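For anyone attempting this, here is a minimal sketch of how the ALS step described in Part 1.2 might look with PySpark's DataFrame-based API. The function names, the column names, the 80/20 train/test split, and the hyperparameter values (rank, max_iter, reg_param) are assumptions for illustration; none of them come from the assignment, and the project template may well use a different API.

```python
def parse_rating(line):
    """Parse one 'UserID::MovieID::Rating::Timestamp' line into a tuple."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return int(user_id), int(movie_id), float(rating), int(timestamp)


def train_als(ratings_path, rank=10, max_iter=10, reg_param=0.1):
    """Fit an ALS model on ratings.dat and return (model, RMSE on held-out data).

    The default hyperparameters are placeholders, not tuned values.
    """
    # Imported lazily so parse_rating can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.appName("MovieLensALS").getOrCreate()

    # Read the raw text file and turn each line into a typed row.
    rows = spark.sparkContext.textFile(ratings_path).map(parse_rating)
    ratings = spark.createDataFrame(
        rows, ["userId", "movieId", "rating", "timestamp"]
    )

    # Hold out part of the data so the model can be evaluated (assumed 80/20).
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)

    als = ALS(
        userCol="userId",
        itemCol="movieId",
        ratingCol="rating",
        rank=rank,            # number of latent factors
        maxIter=max_iter,     # number of alternating iterations
        regParam=reg_param,   # regularization strength
        coldStartStrategy="drop",  # skip users/movies unseen during training
    )
    model = als.fit(train)

    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    rmse = evaluator.evaluate(model.transform(test))
    return model, rmse
```

With a fitted model, `model.recommendForAllUsers(50)` should give the 50 highest-scoring movies per user, which matches the "top 50" the assignment asks for. The RMSE on the held-out split is one possible number to report for the performance evaluation.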
Check out http://spark.apache.org/docs/latest/ml-collaborative-filtering.html for details.

Part 1.3 Setup and run

We will be using a standalone project template for this demonstration. You should find the following items in the directory.

This data set contains 10000054 ratings and 95580 tags applied to 10681 movies by 71567 users of the online movie recommender service. The data are contained in three files: movies.dat, ratings.dat, and tags.dat. Also included are scripts for generating subsets of the data to support five-fold cross-validation of rating predictions. These datasets are publicly available at movelens.org.

Your task today: Download the dataset, configure the Python file, run the code, and watch the output. Depending on your selection, Spark might take a minute or two to train the models. When it is finished, you should see the movies specially recommended to you on the screen. Our program will list the top 50 recommendations and you can see whether they look good to you. The output should be similar to:

Movies recommended for you:
1: Silence of the Lambs, The (1991)
2: Saving Private Ryan (1998)
3: Godfather, The (1972)
4: Star Wars: Episode IV - A New Hope (1977)
5: Braveheart (1995)
6: Schindler's List (1993)
7: Shawshank Redemption, The (1994)
8: Star Wars: Episode V - The Empire Strikes Back (1980)
9: Pulp Fiction (1994)
10: Alien (
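The final step, turning predicted ratings into the numbered list shown above, can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`load_titles`, `top_recommendations`, `format_recommendations`) are hypothetical, and `predicted` is assumed to be a dict mapping each movie ID to the model's predicted rating for the current user, however the template produces it.

```python
def load_titles(lines):
    """Build a {movie_id: title} dict from 'MovieID::Title::Genres' lines."""
    titles = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        movie_id, title, _genres = line.split("::")
        titles[int(movie_id)] = title
    return titles


def top_recommendations(predicted, titles, already_rated, n=50):
    """Return the n best-scoring titles the user has not rated yet.

    predicted     -- dict of movie_id -> predicted rating (assumed input)
    titles        -- dict of movie_id -> title, e.g. from load_titles
    already_rated -- set of movie IDs the user has rated; these are skipped
    """
    candidates = [
        (score, movie_id)
        for movie_id, score in predicted.items()
        if movie_id not in already_rated and movie_id in titles
    ]
    # Highest predicted rating first.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [titles[movie_id] for _score, movie_id in candidates[:n]]


def format_recommendations(ranked_titles):
    """Render the list in the 'rank: title' layout from the assignment."""
    lines = ["Movies recommended for you:"]
    for rank, title in enumerate(ranked_titles, start=1):
        lines.append(f"{rank}: {title}")
    return "\n".join(lines)
```

Skipping movies the user has already rated is a design choice in this sketch; without it the list tends to fill up with titles the user has already seen.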