Question
(PYTHON)
I am trying to convert a text file with lines like this:
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
into a pandas data frame like this:
host | timestamp | method | url | version | response_code | content_size |
199.72.81.55 | 01/Jul/1995:00:00:01 -0400 | GET | /history/apollo/ | HTTP/1.0 | 200 | 6245 |
unicomp6.unicomp.net | 01/Jul/1995:00:00:06 -0400 | GET | /shuttle/countdown/ | HTTP/1.0 | 200 | 3985 |
I am really close with this method:
import pandas
df = pandas.read_csv(src_log_filepath, sep=r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)", names=["host", "timestamp", "method", "url", "version", "response_code", "content_size"])
Except that it does not separate the "url" contents from what should go into the "version" column, so the url column looks like this and the version column is just NaN:
url | version |
/history/apollo/ HTTP/1.0 | NaN |
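To see where the NaN comes from, the split can be reproduced with Python's `re.split`, which is close to what pandas' Python engine does with a regex `sep` on each line. A minimal sketch using the sample line from the question:

```python
import re

# One line from the log in the question.
line = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'

# The question's separator, written as a raw string.
sep = r'\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)'

fields = re.split(sep, line)
print(len(fields))  # 7
print(fields)
# ['199.72.81.55', '01/Jul/1995:00:00:01 -0400', 'GET',
#  '/history/apollo/ HTTP/1.0', '', '200', '6245']
```

The closing quote is matched by `\"(?=\s)` and the space right after it by `\s(?=\d+)`. Two separators back to back leave an empty field, and that empty field is what ends up in the version column as NaN. The URL and version stay together because no alternative matches the space before `HTTP`.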
Everything else is fine, though. But when I add "|\s(?=HTTP)" to the "sep" argument, it fixes this issue, but then the other columns get messed up: the host column and the timestamp column now contain the IP for some reason:
Example host: 10.223.157.186 15/Jul/2009:14:58:59 -0700
Example timestamp: 10.223.157.186 GET
Somehow, adding "|\s(?=HTTP)" to "sep" causes this. With
sep=r"\s-\s-\s\[|\s(?=/)|\]\s\"|\"(?=\s)|\s(?=\d+)|\s(?=HTTP)"
the columns look like the examples above.
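This explains the shifted columns. With `\s(?=HTTP)` added, each line splits into 8 fields (the empty field before the response code is still there), but only 7 names are given. When data rows have more fields than there are names, `read_csv` uses the leading extra field as the row index, so each named column receives the value one field to its right. That is why printing the `host` column shows the IP (the index) next to the timestamp. A sketch reproducing it, assuming two sample lines rebuilt from the table above:

```python
import io

import pandas as pd

# Sample lines rebuilt from the table in the question (assumed log format).
log = (
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245\n'
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985\n'
)
names = ["host", "timestamp", "method", "url", "version", "response_code", "content_size"]

# The question's separator plus \s(?=HTTP): every line now splits into 8 fields.
bad_sep = r'\s-\s-\s\[|\s(?=/)|\]\s"|"(?=\s)|\s(?=\d+)|\s(?=HTTP)'

# engine="python" is what pandas falls back to for regex separators anyway;
# passing it explicitly just avoids the fallback warning.
df = pd.read_csv(io.StringIO(log), sep=bad_sep, names=names, engine="python")

# 8 fields but 7 names: the first field (the host) became the index,
# and every named column is shifted one field to the left.
print(df.index.tolist())         # the hosts
print(df["host"].tolist())       # actually the timestamps
print(df["timestamp"].tolist())  # actually the methods
```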
Why does this happen, and how can I separate the URL from the HTTP version without this occurring?
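One possible fix, a sketch tested only on lines shaped like the sample: keep `\s(?=HTTP)`, but replace the pair `\"(?=\s)` and `\s(?=\d+)` with a single `"\s`, so the closing quote and the space after it are consumed in one match and no empty field is produced. `good_sep` and the sample lines are illustrative, not from the question.

```python
import io

import pandas as pd

# Sample lines rebuilt from the table in the question (assumed log format).
log = (
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245\n'
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985\n'
)
names = ["host", "timestamp", "method", "url", "version", "response_code", "content_size"]

# Alternatives, tried in order at each position:
#   \s-\s-\s\[   ' - - ['  between host and timestamp
#   \]\s"        '] "'     between timestamp and method
#   \s(?=/)      space before the URL
#   \s(?=HTTP)   space before the protocol version
#   "\s          closing quote AND the following space, in one match
#   \s(?=\d)     space before the content size
good_sep = r'\s-\s-\s\[|\]\s"|\s(?=/)|\s(?=HTTP)|"\s|\s(?=\d)'

df = pd.read_csv(io.StringIO(log), sep=good_sep, names=names, engine="python")
print(df)
```

This still assumes every request line has exactly a method, a URL without spaces, and a version; a malformed request would yield a different number of fields and could shift the columns again.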
(Some requirements of the assignment ask me to clean up the string before I put it into the table. That's why my regex is so weird.)
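If the assignment allows it, a different technique from a `sep` regex is to describe the whole line with one pattern and let `Series.str.extract` build the columns from named groups, which also strips the brackets and quotes as it parses. `LOG_PATTERN` is a hypothetical name, and the pattern assumes the Common Log Format layout of the sample lines:

```python
import pandas as pd

# Sample lines rebuilt from the table in the question (assumed log format).
log = (
    '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245\n'
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985\n'
)

# Hypothetical pattern: one named group per output column, in column order.
LOG_PATTERN = (
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<response_code>\d{3}) (?P<content_size>\S+)$'
)

lines = pd.Series(log.splitlines())

# Named groups become the column names; lines that do not match come back
# as all-NaN rows, which are dropped here.
df = lines.str.extract(LOG_PATTERN).dropna(how="all")

# Extracted values are strings. In Common Log Format a '-' size means no
# size was logged, so errors="coerce" turns it into NaN.
df = df.assign(
    response_code=pd.to_numeric(df["response_code"]),
    content_size=pd.to_numeric(df["content_size"], errors="coerce"),
)
print(df)
```

Dropping non-matching lines is a design choice for the sketch; depending on the assignment you may prefer to count or inspect them instead.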