Question
In Python3: mapreduce
Please complete the mapper.py and reducer.py
In the final section of the lab, you are given two data files in comma-separated value (CSV) format. These data files (joins/music_small/artist_term.csv and joins/music_small/track.csv) contain the same music data from the previous lab assignment on SQL and relational databases. Specifically, the file artist_term.csv contains data of the form

    ARTIST_ID, tag string

and track.csv contains data of the form

    TRACK_ID, title string, album string, year, duration, ARTIST_ID

No skeleton code is provided for this part, but feel free to adapt any code from the previous sections that you've already completed.

4.2 Aggregation queries

For the last part, implement a map-reduce program which is equivalent to the following SQL query:

    SELECT track.artist_id, max(track.year), avg(track.duration), count(artist_term.term)
    FROM track LEFT JOIN artist_term
    ON track.artist_id = artist_term.artist_id
    GROUP BY track.artist_id

That is, for each artist ID, compute the maximum year of release, average track duration, and the total number of terms matching the artist. Note: the number of terms for an artist could be zero!

Step by Step Solution
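One possible shape for the mapper and reducer logic is sketched below. It is not the lab's reference solution, and it rests on several assumptions not stated in the question: the two files are told apart by their column count (2 fields for artist_term.csv, 6 for track.csv), rows that fail to parse (for example a header row) are skipped, and the helper names `map_line`, `reduce_artist`, and `run_local` are hypothetical. The term count follows the prose description (number of terms per artist); a strict reading of the SQL would instead count joined rows, i.e. number of tracks times number of terms.

```python
import csv
from itertools import groupby


def map_line(line):
    """Mapper step: turn one CSV line into an (artist_id, tagged_value) pair."""
    fields = next(csv.reader([line]), [])
    try:
        if len(fields) == 2:
            # Assumed artist_term.csv layout: ARTIST_ID, tag string
            artist_id, term = fields
            return artist_id, ("term", term)
        if len(fields) == 6:
            # Assumed track.csv layout: TRACK_ID, title, album, year, duration, ARTIST_ID
            _track_id, _title, _album, year, duration, artist_id = fields
            return artist_id, ("track", int(year), float(duration))
    except ValueError:
        pass  # e.g. a header row whose year/duration are not numeric
    return None  # skip blank or malformed rows


def reduce_artist(artist_id, values):
    """Reducer step: aggregate every tagged value that shares one artist ID."""
    max_year, total_duration, n_tracks, n_terms = None, 0.0, 0, 0
    for value in values:
        if value[0] == "track":
            _, year, duration = value
            max_year = year if max_year is None else max(max_year, year)
            total_duration += duration
            n_tracks += 1
        else:
            n_terms += 1
    if n_tracks == 0:
        # LEFT JOIN starts from track, so artists without tracks produce no row.
        return None
    return artist_id, max_year, total_duration / n_tracks, n_terms


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in memory for small inputs."""
    pairs = [pair for pair in map(map_line, lines) if pair is not None]
    pairs.sort(key=lambda pair: pair[0])  # stands in for the shuffle phase
    results = []
    for artist_id, group in groupby(pairs, key=lambda pair: pair[0]):
        row = reduce_artist(artist_id, (value for _, value in group))
        if row is not None:
            results.append(row)
    return results


if __name__ == "__main__":
    sample = [
        "TR1,Song A,Album X,1999,200.0,AR1",
        "TR2,Song B,Album X,2005,100.0,AR1",
        "AR1,rock",
    ]
    for row in run_local(sample):
        print(row)
```

To split this into mapper.py and reducer.py, the mapper would presumably read lines from standard input and print each key and value separated by a tab, and the reducer would read the sorted lines back and call the aggregation once per artist ID. The exact input/output convention depends on the framework used earlier in the lab, which the question does not specify.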