Question

I need help with TODO 1 in Python.

My input for TASK 1 was:

import datetime

def load_phone_calls_dict(data_dir):
    phone_calls_dict={}
    for file in os.listdir(data_dir):
        if file.startswith('phone_calls_') and file.endswith('.txt'):
            with open(file) as f:
                lines=[line.rstrip() for line in f]
            for l in lines:
                date,time,phone=l.split()
                timestamp=data+' '+time
                areacode=phone[3:6]
                format = '%b %d %Y %I:%M%p'
                datetime_var = timestamp.datetime.strptime(date_time,format)
                midnighttime=datetime.datetime(2000, 1, 1, 00, 00, 00)
                sixamtime=datetime.datetime(2000, 1, 1, 00, 00, 00)
                if(datetime_var.time()>=midnighttime.time() and datetime_var.time()

When I try to submit the error comes up:

line 23, in load_phone_calls_dict
    with open(file) as f:
         ^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'phone_calls_2020.txt'

Can you try and fix the error?
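
One likely cause of this error, offered as a sketch rather than a verified fix: os.listdir returns bare file names such as 'phone_calls_2020.txt', not paths, so open(file) looks for the file in the current working directory instead of in data_dir. Joining the directory and the name with os.path.join before opening avoids that. The helper name list_call_files below is hypothetical and not part of the assignment.

```python
import os
import tempfile


def list_call_files(data_dir):
    """Return full paths of the phone_calls_*.txt files found in data_dir."""
    paths = []
    for name in sorted(os.listdir(data_dir)):
        if name.startswith("phone_calls_") and name.endswith(".txt"):
            # os.listdir yields names only, so prepend the directory.
            paths.append(os.path.join(data_dir, name))
    return paths


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "phone_calls_2020.txt"), "w") as f:
        f.write("2020-01-01 00:00:09: +1(892)5329243\n")
    for path in list_call_files(tmp):
        with open(path) as f:  # opens data_dir/phone_calls_2020.txt
            print(f.readline().rstrip())
```

Note that the posted code also uses os without importing it, so an import os line would be needed as well.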

In this task, you are going to extend the phone calls analysis you performed in the preceding unit. This is quite a typical pattern: based on the analysis you performed earlier, you are asked to take the work further. Your goal is again to discover certain patterns in the data and capture them in your report. In the preceding module, your task was to select the lines that have phone numbers with the 412 area code and timestamps between midnight and 6am. Here, you will no longer limit your inquiries to the 412 area code, but you will still focus on the same time span. Your goal now is to report the 100 numbers that received the most calls. Additionally, you will identify all the instances where a single number was re-dialed in less than ten minutes (between midnight [included] and 6am [not included]) and create a separate report for each area code.

Inspect the Handout Files

There are twelve files used in this task: task2.py and the 11 phone_data/phone_calls_YYYY.txt files. You will be making changes to the task2.py file only. You will not make any changes to the 11 phone_data/phone_calls_YYYY.txt files. These are data files your program will read, analyze, and report on. Note that this is no longer just a sample. Hence, there is quite a bit more data than what you dealt with last time. Specifically, there are now multiple files, each having records for a single year (between 2010 and 2020). Inspect several files, and you will confirm that the format is still the same:

2020-01-01 00:00:09: +1(892)5329243
2020-01-01 00:00:57: +1(342)9444213
2020-01-01 00:02:08: +1(601)4216154
2020-01-01 00:02:25: +1(671)9773944
2020-01-01 00:02:54: +1(901)7703305
2020-01-01 00:03:05: +1(761)8231060

There is a timestamp consisting of the date (YYYY-MM-DD) and time (HH:MM:SS), followed by a phone number (+0(000)0000000). These two are separated by a colon and a space.
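
As a minimal sketch of how one line in this format could be taken apart, assuming the date and time are separated by a single space and the area code is the three digits inside the parentheses: the function names parse_line and is_between_midnight_and_6am are hypothetical and do not come from task2.py.

```python
from datetime import datetime

# Assumed from the format description: "YYYY-MM-DD HH:MM:SS".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Split one data line into (timestamp, phone_number, area_code)."""
    # The timestamp and the number are separated by a colon and a space.
    # rsplit with maxsplit=1 leaves the colons inside the time untouched.
    timestamp_str, phone_number = line.strip().rsplit(": ", 1)
    timestamp = datetime.strptime(timestamp_str, TIMESTAMP_FORMAT)
    # The area code is whatever sits between "(" and ")".
    start = phone_number.index("(") + 1
    end = phone_number.index(")")
    area_code = phone_number[start:end]
    return timestamp, phone_number, area_code


def is_between_midnight_and_6am(timestamp):
    """True for 00:00:00 (included) up to 06:00:00 (not included)."""
    return 0 <= timestamp.hour < 6


ts, number, code = parse_line("2020-01-01 00:00:09: +1(892)5329243")
print(ts, number, code, is_between_midnight_and_6am(ts))
```

Comparing the hour is enough for the time window here, because any time from 00:00:00 to 05:59:59 has an hour value below 6.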
Warning

While there is nothing that prevents you from utilizing all the available data during development, it is almost always a good idea to create a development set that only holds a small subset of the data. This enables you to test the workings of your program quickly before processing all the data. Remember that it is a good practice to run your program as often as possible. In this way, whenever something does not work as expected, you can (usually) attribute the problem easily to a recent change that you have made. Creating a development set is an art of its own. On the one hand, you would like it to be small so that a test run of your program takes less than 1 second (i.e., is instantaneous). On the other hand, you want the set to have all the complexities of the original one. Ideally, you would just replace the development set with the original at the end of the development with no changes to your code. In this task, we recommend you generate a smaller version of each file by taking the first several thousand lines. This should make the development set rather small while preserving all the necessary complexities. The task2.py file has the create_dev_set function that

Map Data Files to Convenient Python Data Structure

You probably already realized that the task here differs from the one in the preceding module in that it is not obvious how reading the files line by line could answer the questions. In order to provide the list of 100 numbers that were called the most, we need to somehow keep a record for each number. To report on the instances where a number has been called more than once within ten minutes, we need to keep a record of when the number was last called. Also, we need to separate the numbers with respect to the area codes for this second report.
While there are countless options for a convenient data structure, we suggest the following structure: [image not transcribed]

For example, consider this toy data set:

2020-01-01 00:00:09: +1(892)5329243
2020-01-02 00:03:05: +1(761)8231060
2020-05-15 00:00:10: +1(761)8231060
2020-06-05 00:01:20: +1(892)5329243
2020-08-09 00:05:10: +1(892)5329243
2020-08-30 00:01:36: +1(761)8231060
2020-09-15 00:03:18: +1(892)5329243
2020-10-01 00:06:28: +1(761)8231060
2020-12-02 00:02:55: +1(761)8231060
2020-12-15 00:02:45: +1(892)5329243

The data set cast to the envisioned data structure would look like this: [image not transcribed]

This means that there will be a dict that maps area codes (key) to a nested dict (value). Each such nested dict contains the phone numbers (key) for a specific area code. The phone numbers are each mapped to a list of timestamps (value) of the calls to the respective phone number.

Information

Before commencing the work, it may be helpful to reflect on how such a data structure could help solve the tasks. As to the problem of listing the busiest phone numbers, let us consider how to query the data structure (phone_calls_dict) for the information on how many times a specific number has been called:

    area_code = "123"
    phone_number = "+0(123)456789"
    num_calls = len(phone_calls_dict[area_code][phone_number])

Inspect the above code snippet carefully and convince yourself that the num_calls variable holds an int that corresponds to the number of times the number stored in the phone_number variable has been called.
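
To make the description concrete, here is a sketch of the toy data set written out by hand in the nested structure, followed by the call-count query and one possible way to rank the busiest numbers. The helpers ts and busiest_numbers are hypothetical names introduced for illustration, and breaking ties by phone number is an assumption, since the handout does not say how ties should be ordered.

```python
from datetime import datetime


def ts(text):
    """Shorthand for building a datetime from 'YYYY-MM-DD HH:MM:SS'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# area code -> phone number -> list of call timestamps
phone_calls_dict = {
    "892": {
        "+1(892)5329243": [
            ts("2020-01-01 00:00:09"), ts("2020-06-05 00:01:20"),
            ts("2020-08-09 00:05:10"), ts("2020-09-15 00:03:18"),
            ts("2020-12-15 00:02:45"),
        ],
    },
    "761": {
        "+1(761)8231060": [
            ts("2020-01-02 00:03:05"), ts("2020-05-15 00:00:10"),
            ts("2020-08-30 00:01:36"), ts("2020-10-01 00:06:28"),
            ts("2020-12-02 00:02:55"),
        ],
    },
}

# How many times was a specific number called?
area_code = "892"
phone_number = "+1(892)5329243"
num_calls = len(phone_calls_dict[area_code][phone_number])
print(num_calls)  # 5


def busiest_numbers(calls, n=100):
    """Return up to n (phone_number, call_count) pairs, most called first."""
    counts = []
    for numbers in calls.values():
        for number, timestamps in numbers.items():
            counts.append((number, len(timestamps)))
    # Sort by count (descending), then by number so ties are deterministic.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]


print(busiest_numbers(phone_calls_dict, n=2))
```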
As to the task of identifying the instances where a specific number has been re-dialed within less than 10 minutes (strictly less), consider the following code snippet:

    area_code = "123"
    phone_number = "+0(123)456789"
    timestamps = phone_calls_dict[area_code][phone_number]
    for i in range(0, len(timestamps) - 1):
        preceding_timestamp = timestamps[i]
        current_timestamp = timestamps[i + 1]
        time_diff = current_timestamp - preceding_timestamp

Inspect the above code snippet carefully and convince yourself that in each iteration the time_diff variable provides the kind of information one needs in order to decide whether to report the instance as a re-dial in less than 10 minutes.

Open the task2.py file in Visual Studio Code. In TODO 1, you will develop the load_phone_calls_dict function that loads data from the phone calls files into the data structure described above. The function has a single parameter (data_dir) that expects a str identifying the directory in which the data files are to be found. Given the provided data, it outputs the described nested data structure.

Suggested logic for the load_phone_calls_dict function:

1. Assign a phone_calls_dict variable with an empty dict.
2. Implement a for loop that iterates over the phone_calls_*.txt files. HINT: The os.listdir function may be helpful here.
3. For each file, open it (preferably using the with statement) and read it line by line.
4. Extract the timestamp and the phone_number from each line into separate variables. HINT: Use the split method to separate a line into the two constituents.
5. From the phone_number extract the area_code. HINT: Use slicing on the str, which can be understood as a list of characters. You want to extract the three digits enclosed in the parentheses.
6. Cast the timestamp into a datetime object. This will enable you to easily determine if the call happened after a certain hour in a day. HINT: You have done this in the preceding module.
7. Check if the phone call happened between midnight (included) and 6 am (not included).
8. If the phone call happened between midnight and 6 am, append the timestamp to the appropriate place in the phone_calls_dict. First, you should use conditional statements to check if the necessary keys are already present in the phone_calls_dict. Finally, you would (presumably) append the timestamp to the phone_calls_dict like this:

    phone_calls_dict[area_code][phone_number].append(timestamp)

Danger

Note that the above code snippet will error out if you do not create the necessary keys in the phone_calls_dict
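
The suggested logic, together with the re-dial check described earlier, might be sketched as follows. This is one possible reading of the steps and not the official solution. It assumes lines of the form "YYYY-MM-DD HH:MM:SS: +1(892)5329243", so that phone_number[3:6] is the area code. The find_redials helper is a hypothetical name, and sorting the timestamps before comparing neighbours is an added precaution, since os.listdir does not guarantee that the yearly files are read in chronological order.

```python
import os
from datetime import datetime, timedelta


def load_phone_calls_dict(data_dir):
    """Map area code -> phone number -> list of call datetimes (00:00 to 05:59:59)."""
    phone_calls_dict = {}                                      # step 1
    for name in sorted(os.listdir(data_dir)):                  # step 2
        if not (name.startswith("phone_calls_") and name.endswith(".txt")):
            continue
        # Join the directory and the file name; the bare name is not a path.
        with open(os.path.join(data_dir, name)) as f:          # step 3
            for line in f:
                line = line.strip()
                if not line:
                    continue
                timestamp_str, phone_number = line.rsplit(": ", 1)   # step 4
                area_code = phone_number[3:6]                        # step 5
                timestamp = datetime.strptime(                       # step 6
                    timestamp_str, "%Y-%m-%d %H:%M:%S")
                if not (0 <= timestamp.hour < 6):                    # step 7
                    continue
                # Step 8: create the missing keys before appending.
                if area_code not in phone_calls_dict:
                    phone_calls_dict[area_code] = {}
                if phone_number not in phone_calls_dict[area_code]:
                    phone_calls_dict[area_code][phone_number] = []
                phone_calls_dict[area_code][phone_number].append(timestamp)
    return phone_calls_dict


def find_redials(timestamps, limit=timedelta(minutes=10)):
    """Return (preceding, current) pairs that are strictly less than limit apart."""
    ordered = sorted(timestamps)
    redials = []
    for i in range(len(ordered) - 1):
        time_diff = ordered[i + 1] - ordered[i]
        if time_diff < limit:
            redials.append((ordered[i], ordered[i + 1]))
    return redials
```

A call such as load_phone_calls_dict("phone_data") would then return the nested dict, and find_redials could be applied to each number's timestamp list when building the per-area-code reports.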
