Question

USE PYTHON and can ONLY IMPORT PANDAS

Hi, I am having trouble cleaning a specific column in my dataframe, and it is keeping me from finishing my project.

I have a column that lists the hours of different schools. The data isn't clean, so the hours are written in several different formats. Below is a sample of the data in this column.

School_Hours
08:00 AM-03:00 PM
08:15 AM-03:15 PM
08:30 AM-03:30 PM
7:45 AM - 2:45 PM
7:30 AM-2:30 PM
7:45 AM - 2:45 PM
8:30 AM - 2:55 PM
08:30 AM-03:30 PM
8:30 AM-3:30 PM
08:00 AM-03:00 PM
8:00 am-3:30 pm
9:00 AM - 4:15 PM
7:00 am-3:00 pm
08:30 AM-03:30 PM
08:00 AM-03:00 PM
08:45 AM-03:45 PM
8:00 AM-3:30 PM
8:00 AM-3:30 PM
7:45 AM-2:45 PM
07:45 AM-02:45 PM
8:00 am-3:30 pm
9:00 AM - 4:08 PM
7:50 am-3:30 pm
8:00 AM - 3:30 PM
08:30 AM-03:30 PM
7:15a.m.-2:45p.m.
8:00 am-3:30 pm
9:00 AM - 4:00 Pm
08:00 AM-03:00 PM
8:00 AM - 3:13 PM
08:15 AM-03:15 PM
M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM
08:45 AM-03:45 PM
7:30 AM-3:00 PM
7:45 AM - 2:45 PM
08:00 AM-03:00 PM
7:45 AM - 3:00 PM
08:45 AM-03:45 PM
07:45 AM-02:45 PM
8:00 am-3:00 pm

I want to take this column from another dataframe called df and add it to my dataframe school_df as a new column. The new column should hold each school's start time rounded down to the hour. For example, 8:45am would be 8, and 7:30am would be 7. All blanks/nulls should be replaced with the mean of the column.

------------------------------------------------------------------------------------------------------------------------------------

This is my current script for the column:

school_df['Starting Hour'] = df['School_Hours'].str.extract("(^\d*)")
school_df['Starting Hour'] = school_df['Starting Hour'].str.replace('0', '')

These are the unique results that I get for the column:

['8', '7', '9', '', nan]
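
One possible explanation for the '' and nan entries, shown as a small sketch: the pattern ^\d* is allowed to match zero digits, so a row that starts with a letter yields an empty string, and a missing value stays nan. The "10:00 AM-05:00 PM" row is hypothetical (it is not in the sample above) and is included only to show a side effect of removing every "0".

```python
import pandas as pd

sample = pd.Series([
    "08:00 AM-03:00 PM",
    "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",
    None,
    "10:00 AM-05:00 PM",  # hypothetical row, not in the sample data
])

# ^\d* may match zero digits, so a row starting with a letter gives "".
# A missing value stays NaN.
extracted = sample.str.extract(r"(^\d*)", expand=False)

# Removing every "0" fixes "08" but also turns "10" into "1".
cleaned = extracted.str.replace("0", "", regex=False)

print(cleaned.tolist())
```

If this reading is right, the empty string comes from the row that lists days of the week before the times, not from a blank cell.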

---------------------------------------------------------------------------------------------------------------------------------------------------

I would prefer not to have to replace the 0. A script that grabs the first non-zero digit in the column would be best, but I couldn't get that to work. The expected results should be 8, 7, and 9. The nan and the empty string should be equal to the mean of the column. The column should also have an int dtype.

Thanks
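
A minimal sketch of one way to do this with pandas only, not a definitive solution. It assumes that df and school_df share the same index, that every start time is written as H:MM or HH:MM, and that the hour should be taken as written on the 12-hour clock. The small df and the empty school_df below are hypothetical stand-ins built from values in the sample. Truncating the mean to a whole hour is also an assumption, made so the column can be an int dtype.

```python
import pandas as pd

# Hypothetical stand-in data; the values are copied from the sample column,
# plus an empty string and a missing value.
df = pd.DataFrame({
    "School_Hours": [
        "08:00 AM-03:00 PM",
        "7:45 AM - 2:45 PM",
        "9:00 AM - 4:15 PM",
        "7:15a.m.-2:45p.m.",
        "M, T, W, Th: 7:45 AM-3:05 PM F: 7:45 AM-2:07 PM",
        "",
        None,
    ]
})
school_df = pd.DataFrame(index=df.index)  # assumed to share df's index

# Capture the hour of the first H:MM or HH:MM time anywhere in the string.
hours = df["School_Hours"].str.extract(r"(\d{1,2}):\d{2}", expand=False)

# to_numeric turns "08" into 8.0, so no "0" replacement is needed.
# Rows with no match (blank or missing) become NaN.
hours = pd.to_numeric(hours, errors="coerce")

# Assumption: fill with the mean truncated to a whole hour.
# int() raises ValueError if every row is NaN.
fill_value = int(hours.mean())

school_df["Starting Hour"] = hours.fillna(fill_value).astype(int)

print(school_df["Starting Hour"].unique())
```

Searching for the first time anywhere in the string, instead of anchoring at the start with ^, is what lets the row that begins with "M, T, W, Th:" produce 7 rather than an empty string.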
