Question

(PYTHON)

I am trying to convert a text file with lines like this:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245 

into a pandas data frame like this:

host                  timestamp                   method  url                  version   response_code  content_size
199.72.81.55          01/Jul/1995:00:00:01 -0400  GET     /history/apollo/     HTTP/1.0  200            6245
unicomp6.unicomp.net  01/Jul/1995:00:00:06 -0400  GET     /shuttle/countdown/  HTTP/1.0  200            3985

I am really close with this method:

df = pandas.read_csv(src_log_filepath, sep="\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)", names=["host", "timestamp", "method", "url", "version", "response_code", "content_size"])

Except that it does not separate the "url" contents from what should go into the "version" column. The url column ends up holding both values, and the version column is just NaN:

url version
/history/apollo/ HTTP/1.0 NaN

Everything else is fine. When I add "|\s(?=HTTP)" to the "sep" argument, it fixes this issue, but then the rest of the data columns get messed up: the host column and the timestamp column now both contain the IP for some reason:

Example host: 10.223.157.186 15/Jul/2009:14:58:59 -0700

Example timestamp: 10.223.157.186 GET

Somehow, adding "|\s(?=HTTP)" to "sep" causes this. With the addition, the argument looks like this:

sep="\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)|\s(?=HTTP)"

Why does this happen, and how can I separate the URL from the version without this occurring?
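
One way to investigate the question above is to run both separators through Python's re.split on the sample line and count the fields. This is only an approximation of how pandas tokenizes a line with a regex "sep", so treat it as a diagnostic sketch. The variable names and the third pattern (adjusted_sep, which consumes the closing quote and the following space together) are illustrative suggestions and are not part of the original post.

```python
import re

line = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'

# Separator from the question, written as a raw string.
original_sep = r'\s-\s-\s\[|\s(?=/)|\]\s"|"(?=\s)|\s(?=\d+)'

# Same separator with the extra alternative that splits before "HTTP".
with_http_sep = original_sep + r'|\s(?=HTTP)'

# Hypothetical variant: '"\s' consumes the closing quote AND the space after
# it, so no empty field is produced between the request and the status code.
adjusted_sep = r'\s-\s-\s\[|\s(?=/)|\]\s"|"\s|\s(?=\d+)|\s(?=HTTP)'

fields_original = re.split(original_sep, line)
fields_with_http = re.split(with_http_sep, line)
fields_adjusted = re.split(adjusted_sep, line)

for label, fields in [
    ("original", fields_original),
    ("with HTTP", fields_with_http),
    ("adjusted", fields_adjusted),
]:
    print(label, len(fields), fields)
```

On the sample line, the original separator already yields seven fields, but the fifth is an empty string: the closing quote and the space after it each match as a separate delimiter. That empty field lands in the "version" slot, which would explain the NaN. Adding "|\s(?=HTTP)" produces eight fields for seven names, and the shifted output in the question is consistent with pandas using the surplus leading field as the row index. Whether the adjusted separator holds up on the full log (for example, URLs containing spaces, or a "-" content size) is not verified here.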

(Some requirements for the assignment ask me to clean up the string before I put it into the table. That's why my regex is so weird.)
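
If the cleanup requirement allows it, an alternative to a delimiter regex is to match each whole line once with named groups, so every column is captured explicitly and stray quotes or brackets never reach the table. The sketch below is a swapped-in technique, not the method from the post; LOG_PATTERN, parse_line, and the second raw log line (reconstructed from the expected table row) are assumptions made for illustration.

```python
import re

# One named group per target column; the two "-" fields are skipped with \S+.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<response_code>\d{3}) (?P<content_size>\d+|-)\s*$'
)


def parse_line(line):
    """Return a dict of column values, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The second line is an assumed raw form of the second row in the question.
sample_lines = [
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985',
]

# Lines that fail to match are dropped here; a real script might log them.
records = [rec for rec in map(parse_line, sample_lines) if rec is not None]

for rec in records:
    print(rec)
```

Passing the resulting list of dicts to pandas.DataFrame(records) should give a frame with the seven columns in the order the groups are declared. All values are strings at that point, so response_code and content_size would still need converting if numeric columns are wanted.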
