Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

A web scraping program has been developed to extract the data and save it in TXT format in a text file. Here are the first

A web scraping program has been developed to extract the data and save it in TXT format in a text file. Here are the first five lines of the file that contains the songs shown in the screenshot:
"November 3,20198:53 PM","After A Few","Travis Denning"
"November 3,20198:49 PM","Better Life","Keith Urban"
"November 3,20198:46 PM","Mercy","Brett Young"
"November 3,20198:42 PM","What She Wants Tonight","Luke Bryan"
"November 3,20198:39 PM","Prayed For You","Matt Stell"
The files that you are given for the assignment may contain a different list of songs. If you are given multiple files, combine them into one file. There may be duplicate records.
Your program will process the data file to answer the following questions:
Breaks for commercials and other contents. After continuous play of a few songs, the station takes a break for commercials or other contents (e.g., announcements, interviews, or taking calls from listeners). Your task is to identify times at which a song is followed by a break. This can be detected by examining the time pattern (e.g ., a song appearing to
be longer than 5 minutes most likely includes time for break).
Top songs. Songs on heavy rotation will be aired multiple times on a given day. Your task is to identify these songs.
Top artists. On a given day, most artists have one song aired during the day (even though the song may be aired several times), some artists may have more than one song aired.
Your task is to use two metrics to identify top artists: (1) those who have the most distinct songs (count of distinct songs); (2) those who have the most air plays (all songs combined)
1.1 Solution Approach
Here is one approach (there are other approaches).
To identify commercial breaks:
Order the data by time
Calculate the time between two songs to obtain nominal time of all but the last song
If a songs nominal time is longer than 5 minutes, a break follows the song
We choose to use 5 minutes because most songs on radio are around 3 minutes.
To find the top songs, we need to count how many times each song gets aired. This is a simple task conceptually for each distinct song, just count its occurrences in the dataset. In practice, we need a way of keeping track of records, something that guarantees uniqueness (for song) and can scale up (from only a few songs to millions of songs). Then sort by count to find top songs.
To find top artists, we need to keep track the air plays for each artist: which song, how many times the song is air. Then we can count how many distinct songs each artist has, and we can also count how many total airplays each artist has.
image text in transcribed

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

SQL Server T-SQL Recipes

Authors: David Dye, Jason Brimhall

4th Edition

1484200616, 9781484200612

More Books

Students also viewed these Databases questions