(Please run the code and take a screenshot.)

The given code is a combination of two Pig Latin scripts. The first script finds the oldest 5-star movies, and the second script finds bad movies with an average rating below 2.0. The schema and content of the metadata relation are also shown.

To run these scripts, save each one to its own file with a .pig extension and execute it with the Pig interpreter. The input data files must also exist at the HDFS paths given in the LOAD statements.

-> The first script is used to find the oldest 5-star movies:

-- Load the ratings data with a given schema
ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);

-- Load the metadata with a specified delimiter and schema
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRelease:chararray, imdbLink:chararray);

-- Extract the movie ID, title, and release time from the metadata relation
nameLookup = FOREACH metadata GENERATE movieID, movieTitle, ToUnixTime(ToDate(releaseDate, 'dd-MMM-yyyy')) AS releaseTime;

-- Group the ratings by movie ID
ratingsByMovie = GROUP ratings BY movieID;

-- Calculate the average rating for each movie
avgRatings = FOREACH ratingsByMovie GENERATE group AS movieID, AVG(ratings.rating) AS avgRating;

-- Filter the movies with an average rating greater than 4.0
fiveStarMovies = FILTER avgRatings BY avgRating > 4.0;

-- Join the five-star movies with the name lookup relation to get the movie title and release time
fiveStarsWithData = JOIN fiveStarMovies BY movieID, nameLookup BY movieID;

-- Order the movies by release time to get the oldest 5-star movies
oldestFiveStarMovies = ORDER fiveStarsWithData BY nameLookup::releaseTime;

-- Output the results
DUMP oldestFiveStarMovies;

Explanation for step 1

This script loads the ratings and the movie metadata from two files in HDFS. The ratings file is read with LOAD and an explicit schema, relying on Pig's default tab delimiter; the metadata file is read with USING PigStorage('|') because its fields are separated by pipes. FOREACH then builds nameLookup from the metadata, keeping the movie ID and title and converting the release date to a Unix timestamp with the built-in ToDate and ToUnixTime functions. The ratings are grouped by movie ID, and AVG computes each movie's average rating. FILTER keeps the movies whose average rating is greater than 4.0 (the script's definition of a 5-star movie), and JOIN combines them with nameLookup on movieID. Finally, ORDER sorts the result by release time in ascending order, so the oldest movies come first, and DUMP prints the results.
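To check the logic of the first script without a Hadoop cluster, the same steps can be sketched in plain Python on a few in-memory rows. This is only an illustration, not part of the Pig solution: the sample rows, the function names to_unix_time and oldest_five_star_movies, and the choice of UTC for the date conversion are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows standing in for u.data: (userID, movieID, rating, ratingTime)
demo_ratings = [
    (1, 10, 5, 881250949),
    (2, 10, 4, 881250950),
    (3, 20, 5, 881250951),
    (4, 20, 5, 881250952),
    (5, 30, 2, 881250953),
]

# Hypothetical rows standing in for u.item: (movieID, movieTitle, releaseDate)
demo_metadata = [
    (10, "Movie A", "01-Jan-1995"),
    (20, "Movie B", "15-Mar-1940"),
    (30, "Movie C", "01-Jan-1990"),
]


def to_unix_time(release_date):
    """Rough analogue of ToUnixTime(ToDate(x, 'dd-MMM-yyyy')); UTC is assumed."""
    parsed = datetime.strptime(release_date, "%d-%b-%Y")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def oldest_five_star_movies(ratings, metadata, threshold=4.0):
    # nameLookup: movieID -> (movieTitle, releaseTime)
    name_lookup = {
        movie_id: (title, to_unix_time(release_date))
        for movie_id, title, release_date in metadata
    }

    # GROUP ratings BY movieID
    ratings_by_movie = defaultdict(list)
    for _user_id, movie_id, rating, _rating_time in ratings:
        ratings_by_movie[movie_id].append(rating)

    # AVG per movie, then FILTER BY avgRating > threshold
    five_star = {
        movie_id: sum(values) / len(values)
        for movie_id, values in ratings_by_movie.items()
        if sum(values) / len(values) > threshold
    }

    # JOIN on movieID, then ORDER BY releaseTime (ascending, oldest first)
    joined = [
        (movie_id, avg_rating, *name_lookup[movie_id])
        for movie_id, avg_rating in five_star.items()
        if movie_id in name_lookup
    ]
    return sorted(joined, key=lambda row: row[3])


for row in oldest_five_star_movies(demo_ratings, demo_metadata):
    print(row)
```

With these sample rows, the movie whose average is exactly 2.0 is dropped by the filter, and the 1940 release sorts ahead of the 1995 one because its timestamp is smaller (negative, since it predates 1970).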

-> The second script is used to find bad movies with an average rating below 2.0:

-- Load the ratings data with a given schema
ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);

-- Load the metadata with a specified delimiter and schema
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRelease:chararray, imdbLink:chararray);

-- Extract the movie ID and title from the metadata relation
nameLookup = FOREACH metadata GENERATE movieID, movieTitle;

-- Group the ratings by movie ID
groupedRatings = GROUP ratings BY movieID;

-- Calculate the average rating and number of ratings for each movie
averageRatings = FOREACH groupedRatings GENERATE group AS movieID, AVG(ratings.rating) AS avgRating, COUNT(ratings.rating) AS numRatings;

-- Filter the movies with an average rating less than 2.0
badMovies = FILTER averageRatings BY avgRating < 2.0;

-- Join the bad movies with the name lookup relation to get

Explanation for step 2

This script loads the same ratings and metadata files as the first one. Here nameLookup keeps only the movie ID and title, since the release date is not needed. The ratings are grouped by movie ID, and for each movie AVG computes the average rating while COUNT gives the number of ratings. FILTER then keeps the movies whose average rating is less than 2.0. According to the script's final comment, the bad movies are next joined with the name lookup relation; the listing ends before that statement.
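The grouping, averaging, counting, and filtering steps of the second script can likewise be sketched in plain Python. This is a hedged illustration only: the sample rows and the function name bad_movies are hypothetical, and the sketch stops at the filter because the join statement is not shown in the script.

```python
from collections import defaultdict

# Hypothetical rows standing in for u.data: (userID, movieID, rating, ratingTime)
sample_ratings = [
    (1, 10, 1, 881250949),
    (2, 10, 2, 881250950),
    (3, 10, 1, 881250951),
    (4, 20, 4, 881250952),
    (5, 30, 2, 881250953),
    (6, 30, 2, 881250954),
]


def bad_movies(ratings, threshold=2.0):
    # GROUP ratings BY movieID
    grouped = defaultdict(list)
    for _user_id, movie_id, rating, _rating_time in ratings:
        grouped[movie_id].append(rating)

    # AVG and COUNT per movie, then FILTER BY avgRating < threshold
    rows = []
    for movie_id, values in grouped.items():
        avg_rating = sum(values) / len(values)
        num_ratings = len(values)
        if avg_rating < threshold:
            rows.append((movie_id, avg_rating, num_ratings))
    return rows


for row in bad_movies(sample_ratings):
    print(row)
```

Note that the comparison is strict, as in the Pig FILTER: a movie whose average is exactly 2.0 is not counted as bad.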
