Question

Currently, the script associated with Example 5 in Week 5 lists the unique IP addresses present in the log. However, such a list may not be sufficient without a count of the unexpected accesses to a particular resource. For this assignment you are tasked with modifying the script from Week 5 (shown below) to show the following:

A list of IP addresses in ascending order,

How many times an IP address is reported in the log, and

Whether an IP address is in a list of known IP addresses.
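One possible way to cover all three requirements is sketched below. It is only an illustration, not the expected solution: the names IP_PATTERN, summarize_ips, and sample_lines are hypothetical, and the sample log lines are made up for the demonstration. The regular expression is the one from the Week 5 script. The sketch sorts addresses by their numeric octets, on the assumption that "ascending order" means numeric rather than alphabetical order.

```python
import re
from collections import Counter

# Same pattern as the Week 5 script, split over two lines for readability
IP_PATTERN = re.compile(
    r'(\b25[0-5]|\b2[0-4][0-9]|\b[01]?[0-9][0-9]?)'
    r'(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
)

def summarize_ips(lines, known_ips):
    """Return (ip, count, is_known) tuples sorted by numeric octet value."""
    # Count every match on every line, not just the first occurrence
    counts = Counter(
        match.group()
        for line in lines
        for match in IP_PATTERN.finditer(line)
    )
    # Sort on the octets as integers so that 9.x.x.x comes before 10.x.x.x
    ordered = sorted(counts, key=lambda ip: tuple(int(part) for part in ip.split('.')))
    return [(ip, counts[ip], ip in known_ips) for ip in ordered]

# Made-up lines standing in for the log content (assumption, not real data)
sample_lines = [
    "src: /10.50.100.152:50010 dest: /10.50.100.150:50010",
    "Received block from /10.50.100.150",
    "Received block from /9.1.1.1",
]

for row in summarize_ips(sample_lines, {"10.50.100.150"}):
    print(row)
```

A Counter replaces the "append if not already in the list" step of the original script, so each address keeps a running total instead of being recorded once.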

Script Example From Week 5 Below

import os
from pathlib import Path
import re

# Getting the directory that contains the script, so all file operations will take place in that directory
script_home_dir = os.path.dirname(os.path.abspath(__file__))
sample_file = 'HDFS_2k.log'

# This list will host the content of the file
file_content = []

# Reading the file
with open(Path(script_home_dir, sample_file), 'r') as my_file:
    file_content = my_file.readlines()

# Trying to match only valid IP addresses (0-255) - Source: https://ihateregex.io/expr/ip/
digits_pattern = re.compile(r'(\b25[0-5]|\b2[0-4][0-9]|\b[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}')
print(f" Search pattern: {digits_pattern}")

ip_addresses_in_log = []

# Extracting multiple IP addresses from each line
# Working on each line to match the pattern, using a different method that can get all matches
for i, line in enumerate(file_content):
    line = line.strip()
    search_outcome = digits_pattern.finditer(line)  # This is the new way of attempting to find matches within the line
    print(f'Line {i}: {line}')
    if search_outcome is not None:
        # print(search_outcome)
        for n, ip in enumerate(search_outcome):
            ip_addr = ip.group()
            print(f'--- Match {n}: {ip_addr}')
            if ip_addr not in ip_addresses_in_log:
                ip_addresses_in_log.append(ip_addr)
    # Limiting the loop to the first few lines - Remove the if block below to run through the entire log file
    if i > 100:
        break

# ----------------------------------------
print(" Distinct IP addresses in the log:")
for ip_addr in ip_addresses_in_log:
    print(f'IP: {ip_addr}')

An example of the output is the following:

IP Address       Count   Expected
10.50.100.150    72      Yes
10.50.100.152    100     Yes
10.50.100.155    46      No

Please note that the table above is just a depiction and I am not expecting an actual table.
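Since an actual table is not required, plain aligned text is probably enough. The sketch below shows one way to print such lines with f-string width specifiers. The function name format_report and the column widths are arbitrary choices made for this illustration; the row values are the ones from the example output above.

```python
def format_report(rows):
    """Build a plain-text report from (ip, count, expected) tuples."""
    # Left-align the address, right-align the count, then Yes/No
    lines = [f"{'IP Address':<18}{'Count':>7}  Expected"]
    for ip, count, expected in rows:
        flag = 'Yes' if expected else 'No'
        lines.append(f"{ip:<18}{count:>7}  {flag}")
    return "\n".join(lines)

# Values taken from the example output in the assignment
rows = [
    ("10.50.100.150", 72, True),
    ("10.50.100.152", 100, True),
    ("10.50.100.155", 46, False),
]

print(format_report(rows))
```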

The source of the analysis should be the HDFS_2k.log file, which was used as Data File 2 for Week 5 and is attached to this assignment. The list of known IP addresses is also attached to this assignment.
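The format and name of the known-IP attachment are not given here, so the following is only a sketch under assumptions: it supposes a text file with one address per line, and the file name known_ips.txt and the function name load_known_ips are hypothetical. The demonstration writes a small temporary file so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

def load_known_ips(path):
    """Return the set of non-empty lines in the file, stripped of whitespace."""
    with open(path, 'r') as known_file:
        # A set gives fast membership checks and ignores duplicate entries
        return {line.strip() for line in known_file if line.strip()}

# Demonstration with a temporary file (hypothetical name and content)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir, 'known_ips.txt')
    demo_path.write_text("10.50.100.150\n10.50.100.152\n\n")
    known_ips = load_known_ips(demo_path)

print("10.50.100.150" in known_ips)
print("10.50.100.155" in known_ips)
```

In the real script the path would more likely be built from script_home_dir, the same way the Week 5 script locates HDFS_2k.log.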

Notes

Your scripts should be fully commented, including the author, the purpose of the script, and the date.
