Question
Introduction to Data Analytics
Python-Jupyter Lab II
Code:
import pandas as pd
import numpy as np
from checker.binder import binder; binder.bind(globals())
from intro_data_analytics.check_scrubbing import *
df = pd.read_csv('data/inu_neko_orderline.csv')
df
Question 4: Fistful of Dollars I
Validate the prices in prod_price. Remove any rows that are obviously wrong.
There are some non-numeric values in the prod_price column. How do I remove them? I see lots of approaches from reference websites, but have not found an approach that works for me.
I know I need to identify the values that are not floats and remove them.
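The identification step can be sketched as below. This is a minimal illustration, not the lab's expected solution: the small DataFrame is a made-up stand-in for the real file, and the bad entries ('abc', '$28.45') are invented for the example. It relies on pd.to_numeric with errors='coerce', which turns anything that cannot be parsed as a number into NaN.

```python
import pandas as pd

# Hypothetical stand-in for the lab's DataFrame; the real data comes from
# 'data/inu_neko_orderline.csv'. The bad values here are invented.
df = pd.DataFrame({"prod_price": ["72.99", "18.95", "abc", "$28.45", None, "9.95"]})

# Try to parse every entry as a number; unparseable entries become NaN.
parsed = pd.to_numeric(df["prod_price"], errors="coerce")

# Flag rows that held some value originally but failed to parse.
# Rows that were already missing (None/NaN) are not flagged here.
bad_mask = parsed.isna() & df["prod_price"].notna()

# Inspect the offending values before deciding what to drop.
bad_values = df.loc[bad_mask, "prod_price"].unique().tolist()
print(bad_values)
```

Looking at the distinct bad values first helps confirm that they really are junk and not, say, prices with a currency symbol that could be repaired instead of removed.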
Sample Data:
0      10300097  719638485153  1001019  2021-01-01 07:35:21.439873
1      10300093   73201504044  1001015  2021-01-01 09:33:37.499660
2      10300093  719638485153  1001015  2021-01-01 09:33:37.499660
3      10300093  441530839394  1001015  2021-01-01 09:33:37.499660
4      10300093  733426809698  1001015  2021-01-01 09:33:37.499660
...         ...           ...      ...                         ...
38619  10327860  287663658863  1022098  2021-06-30 15:37:12.821020
38620  10327960  140160459467  1022157  2021-06-30 15:45:09.872732
38621  10328009  425361189561  1022189  2021-06-30 15:57:44.295104
38622  10328089  733426809698  1022236  2021-06-30 15:59:29.801593
38623  10328109  717036112695  1011924  2021-06-30 17:30:52.205912

       trans_year  trans_month  trans_day  trans_hour  trans_quantity  \
0            2021            1          1           1               1
1            2021            1          1           1               1
2            2021            1          1           1               1
3            2021            1          1           1               2
4            2021            1          1           1               1
...           ...          ...        ...         ...             ...
38619        2021            6         30          30               1
38620        2021            6         30          30               2
38621        2021            6         30          30               2
38622        2021            6         30          30               1
38623        2021            6         30          30               1

       cust_age    cust_state  prod_price                prod_title  prod_category  \
0            20      New York       72.99                  Cat Cave        bedding
1            34      New York       18.95            Purrfect Puree          treat
2            34      New York       72.99                  Cat Cave        bedding
3            34      New York       28.45           Ball and String            toy
4            34      New York       18.95             Yum Fish-Dish           food
...         ...           ...         ...                       ...            ...
38619        25      New York        9.95       All Veggie Yummies          treat
38620        31  Pennsylvania       48.95        Snoozer Essentails        bedding
38621        53    New Jersey       15.99            Snack-em Fish          treat
38622        23     Tennessee       18.95             Yum Fish-Dish           food
38623        24  Pennsylvania       60.99               Reddy Beddy        bedding

       prod_animal_type  prod_size  total_sales
0                   cat        NaN            0
1                   cat        NaN            0
2                   cat        NaN            0
3                   cat        NaN            0
4                   cat        NaN            0
...                 ...        ...          ...
38619               dog        NaN            0
38620               dog        NaN            0
38621               cat        NaN            0
38622               cat        NaN            0
38623               dog     medium            0

[38224 rows x 17 columns]
Step by Step Solution
There are 3 Steps involved in it
Step: 1
To remove the non-numeric values in the prod_price column, you can follow these steps ...
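One common way to do the removal is sketched below. It is not the elided solution. The helper name drop_invalid_prices and the sample rows are hypothetical, and treating zero or negative prices as "obviously wrong" is an assumption about what the exercise intends. The sketch coerces the column to numbers with pd.to_numeric, drops the rows that failed to parse, then drops non-positive prices.

```python
import pandas as pd

def drop_invalid_prices(frame, column="prod_price"):
    """Return a copy of frame without rows whose price is non-numeric or <= 0.

    Hypothetical helper for illustration; the name is not part of the lab.
    """
    out = frame.copy()
    # Unparseable entries (text, stray symbols) become NaN.
    out[column] = pd.to_numeric(out[column], errors="coerce")
    # Drop rows where the price is missing or could not be parsed.
    out = out.dropna(subset=[column])
    # Assumption: a price of zero or less counts as "obviously wrong".
    out = out[out[column] > 0]
    return out.reset_index(drop=True)

# Made-up sample rows; the real data comes from 'data/inu_neko_orderline.csv'.
df = pd.DataFrame({
    "prod_title": ["Cat Cave", "Purrfect Puree", "Ball and String",
                   "Yum Fish-Dish", "All Veggie Yummies"],
    "prod_price": ["72.99", "18.95", "free", "-5", "9.95"],
})

cleaned = drop_invalid_prices(df)
print(cleaned)
print(cleaned["prod_price"].dtype)
```

In the notebook, the result would be assigned back with df = drop_invalid_prices(df). Whether the lab's checker expects the index to be reset is not stated in the question, so that step may need to be removed.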