Question

1 Approved Answer

Posted on Oct 16, 2024

MIST.3050 Practice Programming Problem 3 This assignment will allow you to use functions and lists to develop modular solutions. You should save your code in

MIST.3050 Practice Programming Problem 3 This assignment will allow you to use functions and lists to develop modular solutions. You should save your code in three files: p3mod.py, p3test.py, and p3.py. Don't forget to include comments in source code to document your code! The main tasks of the assignment are to process song play list of a radio station to determine commercial breaks, songs with heavy rotation, and artists with most songs played. These tasks are broken down to multiple steps to help you develop the solutions incrementally. You will be provided with the data in the CSV format. You will learn to use list, tuple, and dictionary effectively to process data. 1. Overview The data is obtained from Boston Country 102.5 station's website: https://country1025.com/stream/WKLBFM/ This site is dynamically updated to list recently aired songs. Below is a screenshot of first 5 songs: Recently Played Songs on Boston's Hottest Country November 3, 2019 8:53 PM After A Few - Travis Denning 8:49 PM Better Life - Keith Urban 8:46 PM Mercy - Brett Young 8:42 PM What She Wants Tonight - Luke Bryan 8:39 PM Prayed For You - Matt Stell A web scraping program has been developed to extract the data and save it in CSV format in a text file. Here are the first five lines of the file that contains the songs shown in the screenshot: "November 3, 2019 8:53 PM", "After A Few", "Travis Denning" "November 3, 2019 8:49 PM", "Better Life", "Keith Urban" 'November 3, 2019 8:46 PM", "Mercy", "Brett Young" "November 3, 2019 8:42 PM", "What She Wants Tonight", "Luke Bryan" "November 3, 2019 8:39 PM", "Prayed For You", "Matt Stell" The files that you are given for the assignment may contain a different list of songs. If you are given multiple files, combine them into one file. There may be duplicate records. Your program will process the data file to answer the following questions: Breaks for commercials and other contents. After continuous play of a few songs, the station takes a break for commercials or other contents (e.g., announcements, interviews, or taking calls from listeners). Your task is to identify times at which a song is followed by a break. This can be detected by examining the time pattern (e.g., a song appearing to be longer than 5 minutes most likely includes time for break).0 Top songs. Songs on heavy rotation will be aired multiple times on a given day. Your task is to identify these songs. 0 Top artists. On a given day, most artists have one song aired during the day (even though the song may be aired several times), some artists may have more than one song aired. Your task is to identify top artists who have the most songs aired. Pause here for a moment and think ab0ut how you w0u1d accomplish these tasks, manually or using any tool you know. Write down the steps. 1.1 Solution Approach Here is one approach (there are other approaches). To identify commercial breaks: Order the data by time Calculate the time between two songs to obtain \"nominal\" time of all but the last song If a song's nominal time is longer than 5 minutes, a break follows the song To nd the top songs, we need to count how many times each song gets aired. This is a simple task conceptually , for each distinct song, just count its occurrences in the dataset. In practice, we need a way of keeping track of records, something that guarantees uniqueness (for song) and can scale up (from only a few songs to millions of songs). Then sort by count to nd top songs. To nd top artist, we need to nd the number of songs each artist has in the data. That is, for each artist, keep track of the songs aired; then count the number of songs of each artist. Sort by count to nd top artist. 1.2 Data Structures You may have noticed that for the solution approach to work, two capabilities are important: 1) keeping track of a distinctive list of things (timestamp, song, or artist) 2) sorting A dictionary ensures that its key is unique and a list has built-in sort method. The solution approach can be implemented using these data structures. Let us explore the following data structure considerations. 1. Use dictionaries to keep track of information by time, song, and artist The data in the CSV format can be stored in a dictionary with time as the key, and a tuple of song title and artist as the value: playlist_dict: {timeh (song1,artist1) , ..., timen: (songn,artistn) ] From this dictionary, we can derive two other dictionaries for songs and artists: song_count: {songl:count1, ..., songk:countk)} artist_songs: {artistl:{songl:countl, ...}, ..., artistm:{songizcounti,...})} That is, for each song, we keep track of its number of occurrences. Since two different songs may have the same title, we can use \"songTitle-artist\" as the key to distinguish between different songs with the same title. Foreach artist, the value is another dictionary that keeps track of the air times of each song by the artist. 2. Use dictionaries to keep track of songs/artist with the same count There are often multiple songs that are aired the same number of times. Likewise, there are multiple artists who have the same number of songs aired. We can store this kind of information in dictionaries where the key is a count and the value is a list: count_songs: [countl: [sl,sZ,...] , ..., countk: [sx, sy,...]} count_artists: {countl: [al.a2,...] , ..., countk: [ax,ay,...]} 1.3 Build functions for modularization and unit testing Some of the tasks are Suitable to be implemented as functions. We will provide details about functions later. At a high level, we need functions to derive count-based dictionaries. Since the order of items in a dictionary may appear quite random, we need to derive lists om dictorinaries - lists can be sorted, which will help us identify top songs and top artists (ordered by high to low according to count). We will develop the solution by building some functions first. 2. Functionl: get_count_i terns (key_count) In p3mod.py le, create ge t_count_items function to return a dictionary with count being the key and a list being the value to gather all items with the same count. The functions can be used for both songs and artists. def get_count_items (key_count) : given a key_count dictionary (e.g., [song:count] or {artist:count}) return a count_items dictionaryte.g., {count: [items_with_same_count]} Since you are returning a new dictionary, you must rst create it (initially blank). As you iterate through the input dictionary, update the content in the return dictionary. Algorithm: create empty dictionary count_items ## {count:[list_of_items]} for each item_key in key_count: retrieve its value (which is a count) if the count (as key) exists in count_items: append the item_key to the list else: add the count:[item_key] pair to count_items dictionary return count_items dictionary After you have created the function, you want to test it to make sure it works as indented. You will include all test code in p3test.py. You can hard-code the test data in the test program (i.e., using variables to explicitly store test data in the code). The following table list three test cases. {1: ['a'l} {10: ' 'c'], 2: ['b']} 'z':100} {3: , 'y'], 100: ['z']] 3. FunctionZ: sorted_dict_content (key_val) For a given dictionary with a sortable type of key, this function returns the content of the dictionary as a list, sorted according to the key in descending order. Add this function into pemod.py. def sorted_dict_content(key_val): given a key_val dictionary with a sortable key return a list of the content sorted by the key in descending order} Algorithm: create empty list key_itemList for each key in sorted list of keys of key_val: append the keyvalue pair to key_itemList return key_itemList To obtain a sorted list of the keys in the key_val dictory, you may use the sorted method: sorted(key_val, reverse=True)## or sorted(key_val.key(), reverse=True)## After you have added the function in pemodpy, add code in petest.py to test the function. The output for each input is shown below. [['a':l]] H'z'=ll:{'Y'=l}:{'x'=1}1 {3:['x','y'], 100:['Z']} [[100:['Z}r {3=['X': 'Y'll] 4. The main program The main program will be stored in pe.py file. You will implement the program one step at a time. It needs several modules: import csv from datetime import datetime from p3mod import 4. 1 Read in CSV file and store data in a dictionary The goal is to open the data file, parse it, and store the data as entries of a dictionary playlist_dict with key being a datatime object of the timestamp, and the value being a tuple of song title and artist: ( datetime_key: (song_title, artist) } We us a datatime object because it allows us to compute time difference between two entries. Algorithm: read the file and store content in string variable playlist_csv use splitlines () method to store records in variable lines (it is list) iterate through lines to store data in playlist_dict dictionary For the last step, use the following code as the first part of the loop: for 1 in csv. reader (lines, quotechar='"', delimiter=', ', quoting=csv . QUOTE_ALL, skipinitialspace=True) : dt=datetime . strptime (1[0], ' 8B :d, .Y :1: M .p' ) The for-statement uses the CSV reader function to obtain a list of the values in each record. In this case, the variable 1 is a list of three strings: date-time, song-title, artist. The line in the body of the loop creates a datatime object using the data-time string and stores it in variable at. You only need just another line in the body to add the entry into the playlist_dict dictionary. If you complete this step correctly, the dictionary should have 344 entries and the first few entries may look like the following: {datetime . datetime (2020, 8, 13, 0, 2): ('Kinfolks', 'Sam Hunt' ) , datetime . datetime (2020, 8, 12, 23, 58) : ('One Margarita', 'Luke Bryan' ) , datetime . datetime (2020, 8, 12, 23, 54) : ('Pretty Heart', 'Parker Mccollum' ) , ... 4.2 Determine (commercial) breaks Entries in a dictionary is not ordered according to the order of each entry is added. We have to order the entries (or least the timestamps) so we can compute the difference between two adjacent timestamps. Assume t1 and t2 are two datetime objects, (t2-t1 ) . seconds evaluates to the number of seconds between t2 and t1.The following algorithm describes a possible strategy of determine breaks and their timing information. Algorithm: deltas={} # store time stamp and seconds until next entry time_with_breaks= # store timing information of breaks timelist