
Question



Problem Statement

Assume that you have a web server or application that appends a line to a log file every time it serves a request. Two example lines from the log file, showing the format of the input, are:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985

The fields of each log line, typically called metadata (or columns here), are listed below. The unknown1 and unknown2 columns can be ignored for this assignment. Note that content_size is an int type, i.e., a number of bytes.

host, unknown1, unknown2, timestamp, method, url, version, response_code, content_size

Prerequisites

To do the data analysis, the first step is to implement a function that reads in the provided log file and stores the data in a Pandas DataFrame.

Notes:
- Make sure the names and order of the columns follow the metadata above.
- (Hint) Storing the column names/headers in the DataFrame correctly can make the following tasks easier.
- (Hint) Avoid storing incorrect column names, missing column names, or column names in a different order.
- Using the two example log lines given above, the function should generate a Pandas DataFrame with one row per log line and the nine columns listed above. [Example DataFrame image not transcribed]

Problem A (8 pts)

Write functions that answer the following questions:

1. Total number of distinct HTTP response codes
2. Median content_size
   - Hint: use numpy to find the median
   - Note: ignore values in the content_size column that are not a number
   - If the median is a float, cast it to an integer
   - Use type casting for this; do not use round, floor, ceiling, or any other math function
3. Top N (e.g., 10) most frequent hosts
   - Note: The result should be ordered from top 1 to N
4. Top N (e.g., 10) most frequent urls
   - Note: The result should be ordered from top 1 to N
5. Top N (e.g., 5) urls that received error response codes (i.e., non-200 response codes)
   - Note: The result should be ordered from top 1 to N
6. Total number of requests with 404 responses
7. Number of unique daily (in UTC time) hosts
   - Hint: Convert the timestamp string into a datetime type in the UTC timezone
   - Note: The result should be ordered from the earliest date to the latest date
8. Average number of daily (in UTC time) requests per host
   - Hint: Convert the timestamp string into a datetime type in the UTC timezone
   - Note: The result should be ordered from the earliest date to the latest date
   - If a number is a float, cast it to an integer

Problem B (4 pts)

Implement a function that writes the answers to Problem A into a JSON file. The format should match the following example:

    {
      "get_num_of_distinct_resp_code": 1,
      "get_median_content_size": 2,
      "get_most_freq_hosts": ["/answer", "to", "q3"],
      "get_most_freq_urls": ["/answer", "to", "q4"],
      "get_top_urls_recv_err": ["/answer", "to", "q5"],
      "get_num_of_req_recv_404": 6,
      "get_num_of_unique_hosts_daily": [7, 0, 0],
      "get_avg_num_of_req_per_host_daily": [8, 0, 0]
    }
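The assignment above could be approached along the following lines. This is a minimal sketch, not a graded solution, and it rests on several assumptions that are not in the problem statement: the regular expression LINE_RE, the helper names load_log, _top, _utc_dates and write_answers, the choice to skip lines that do not match the pattern, the choice to keep every column as a string and convert only where needed, and the reading of question 8 as "requests per day divided by unique hosts per day". The function names for the eight answers are taken from the keys of the JSON example.

```python
import json
import re

import numpy as np
import pandas as pd

COLUMNS = ["host", "unknown1", "unknown2", "timestamp", "method",
           "url", "version", "response_code", "content_size"]

# One capture group per column, in metadata order (hypothetical pattern).
LINE_RE = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\S+) (\S+)')

# The two example lines from the problem statement.
SAMPLE = [
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
]

def load_log(lines):
    """Parse an iterable of log lines into a DataFrame of strings."""
    matches = (LINE_RE.fullmatch(line.strip()) for line in lines)
    rows = [m.groups() for m in matches if m]  # assumption: skip malformed lines
    return pd.DataFrame(rows, columns=COLUMNS)

def _top(series, n):
    # value_counts() sorts by descending frequency, so head(n) is top 1..N.
    return series.value_counts().head(n).index.tolist()

def _utc_dates(df):
    # Convert e.g. "01/Jul/1995:00:00:01 -0400" to a calendar date in UTC.
    ts = pd.to_datetime(df["timestamp"], format="%d/%b/%Y:%H:%M:%S %z", utc=True)
    return ts.dt.date

def get_num_of_distinct_resp_code(df):
    return int(df["response_code"].nunique())

def get_median_content_size(df):
    # Non-numeric sizes become NaN and are dropped before taking the median.
    sizes = pd.to_numeric(df["content_size"], errors="coerce").dropna()
    return int(np.median(sizes))  # type cast, not round/floor/ceil

def get_most_freq_hosts(df, n=10):
    return _top(df["host"], n)

def get_most_freq_urls(df, n=10):
    return _top(df["url"], n)

def get_top_urls_recv_err(df, n=5):
    return _top(df.loc[df["response_code"] != "200", "url"], n)

def get_num_of_req_recv_404(df):
    return int((df["response_code"] == "404").sum())

def get_num_of_unique_hosts_daily(df):
    # groupby sorts its keys, so results run from earliest to latest date.
    return df.groupby(_utc_dates(df))["host"].nunique().tolist()

def get_avg_num_of_req_per_host_daily(df):
    grouped = df.groupby(_utc_dates(df))["host"]
    return (grouped.size() / grouped.nunique()).astype(int).tolist()

def write_answers(df, path):
    """Write all Problem A answers to a JSON file (Problem B)."""
    funcs = [get_num_of_distinct_resp_code, get_median_content_size,
             get_most_freq_hosts, get_most_freq_urls, get_top_urls_recv_err,
             get_num_of_req_recv_404, get_num_of_unique_hosts_daily,
             get_avg_num_of_req_per_host_daily]
    answers = {f.__name__: f(df) for f in funcs}
    with open(path, "w") as fh:
        json.dump(answers, fh, indent=2)
    return answers
```

To run it on a real file, pass the open file object, for example load_log(open(path)) with path pointing at the provided log, since load_log accepts any iterable of lines. Real access logs often contain requests without a version field or with "-" as the size; this sketch drops the former and ignores the latter only in the median, which may or may not match what the graders expect.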
