Question

Python - I am stuck/confused with the yellow highlights on this assignment. You can see my attempts below. For instance, with the header being 'infer', does that mean header = None? And how do I filter a column in a CSV file? I have provided a small sample of the file below. And how do I create a new DataFrame? Does this mean creating a new function with def, where each value corresponds to a column that I wish to migrate? I don't understand. And multiplying the incoming value by 0.42514: are they referring to a parameter? I am so confused. Please help.

SCRIPT:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline  # For Jupyter Notebook only

#df = df.SomeFunction()
#new_df = df.SomeFunction()

all_alpha = pd.read_csv('C:\\Users\\xxxxx\\Desktop\\Notebook\\all_alpha_19.csv', header = None)

df_filter_row = df_desc_sort[df_desc_sort['Fuel'] = 'Gasoline' OR 'Diesel']

print(df_filter_row())

df_filter_col = df_filter_row[['Stnd'] = 'T3B125']

new_df = old_df[cols] .reset_index(drop=True)
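
For comparison with the attempts above, here is a minimal sketch of the read, row-filter and column-select steps. It is not the official solution. It assumes the real file is comma-delimited and, so that it runs without the file, it reads the three sample rows from the question out of an in-memory string. The names mask and filtered are just illustrative. The points it shows: header='infer' is the default and takes the column names from the first line (header=None would treat that line as data), comparisons use == rather than =, conditions are combined with & rather than OR, and a DataFrame is indexed with square brackets, not called like a function.

```python
import io
import pandas as pd

# Three rows from the sample in the question, written as CSV text so the
# sketch runs without the real file (assumption: the file is comma-delimited).
sample_csv = """Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
ACURA ILX,2.4,4,AMS-8,2WD,Gasoline,CA,L3ULEV125,California LEV-III ULEV125,KHNXV02.4KH3,small car,3,24,34,28,6,No,316
ACURA ILX,2.4,4,AMS-8,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,KHNXV02.4KH3,small car,3,24,34,28,6,No,316
ACURA MDX,3,6,AMS-7,4WD,Gasoline,CA,L3ULEV125,California LEV-III ULEV125,KHNXV03.0AH3,small SUV,3,26,27,27,6,No,330
"""

# header='infer' uses the first line as column names.
# With the real file this would be pd.read_csv('all_alpha_19.csv', header='infer').
df = pd.read_csv(io.StringIO(sample_csv), header='infer')

# Row filter: one boolean mask per condition, combined with &.
# isin() covers the "either Gasoline or Diesel" part.
mask = (df['Stnd'] == 'T3B125') & (df['Fuel'].isin(['Gasoline', 'Diesel']))
filtered = df[mask]

# "Create a new DataFrame" is just column selection with a list of names,
# not a new function.
cols = ['Model', 'Displ', 'Fuel', 'City MPG', 'Hwy MPG', 'Cmb MPG',
        'Greenhouse Gas Score']
new_df = filtered[cols].reset_index(drop=True)

print(new_df)
```

Only the second sample row has Stnd equal to T3B125, so the result here has a single row with index 0.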

SAMPLE OF THE ALPHA FILE

Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
ACURA ILX,2.4,4,AMS-8,2WD,Gasoline,CA,L3ULEV125,California LEV-III ULEV125,KHNXV02.4KH3,small car,3,24,34,28,6,No,316
ACURA ILX,2.4,4,AMS-8,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,KHNXV02.4KH3,small car,3,24,34,28,6,No,316
ACURA MDX,3,6,AMS-7,4WD,Gasoline,CA,L3ULEV125,California LEV-III ULEV125,KHNXV03.0AH3,small SUV,3,26,27,27,6,No,330

Part I - Preparing the DataFrame (ETL)

This first part of the assignment will focus on "ETL", which stands for Extract, Transform and Load. This is a key part of a data analyst's job.

Ensure the "all_alpha_19.csv" file is in the same folder as your Python script. Use the following setup code at the top of your script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline  # For Jupyter Notebook only

Next, use pd.read_csv() to load the file into a DataFrame called "df". The only argument you need is "header", which should be set to 'infer'.

We only want records that were generated using a specific fuel efficiency standard (Federal Tier 3 Bin 125) for gasoline and diesel engines. Use either column filters or a query to filter the results to include only rows where:
o 'Stnd' is equal to 'T3B125' AND
o 'Fuel' is either 'Gasoline' or 'Diesel'

Now, create a new DataFrame from the filtered data, but only include the following columns:
o 'Model': Car manufacturer and model
o 'Displ': Engine displacement (size of engine)
o 'Fuel': Type of fuel
o 'City MPG': Number of miles the car gets per gallon of fuel in the city
o 'Hwy MPG': Number of miles the car gets per gallon of fuel on the highway
o 'Cmb MPG': Combined number of miles the car gets per gallon of fuel in the city and highway
o 'Greenhouse Gas Score': Calculated score indicating the car's efficiency (higher is better)

Note: one of the easiest ways to create a new DataFrame using only some of the columns from another is to:
o Create a list of strings (called cols) where each value corresponds to a column name you wish to migrate
o Create the new DataFrame using the following syntax: new_df = old_df[cols]

You will also want to reset the index to avoid an extra un-named column in the new frame that equals the index from the old one. You can chain this to the above call with: .reset_index(drop=True)

Use the astype() function of the DataFrame to convert the three MPG columns to float.

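
The assignment allows "either column filters or a query", so here is a hedged sketch of the query route together with the astype() step. The rows are made up purely for illustration (they are not from the real file), and mpg_cols is a hypothetical helper name. Column names without spaces, such as Stnd and Fuel, can be used directly inside the query string.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from all_alpha_19.csv.
df = pd.DataFrame({
    'Model': ['CAR A', 'CAR B', 'CAR C'],
    'Fuel': ['Gasoline', 'Electricity', 'Diesel'],
    'Stnd': ['T3B125', 'T3B125', 'L3ULEV125'],
    'City MPG': [24, 100, 30],
    'Hwy MPG': [34, 90, 40],
    'Cmb MPG': [28, 95, 34],
})

# query() takes the whole filter as one string; "in" handles the either/or.
filtered = df.query("Stnd == 'T3B125' and Fuel in ['Gasoline', 'Diesel']")
new_df = filtered.reset_index(drop=True)

# astype() accepts a {column: dtype} mapping, so only the MPG columns change.
mpg_cols = ['City MPG', 'Hwy MPG', 'Cmb MPG']
new_df = new_df.astype({col: float for col in mpg_cols})

print(new_df.dtypes)
```

With these made-up rows only CAR A satisfies both conditions. If the real file contains MPG entries that are not plain numbers, astype(float) will raise an error, so it is worth inspecting those columns first.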
Next, because only a small part of the world uses miles and gallons, we are going to add three new columns ('CityKML', 'HwyKML' and 'CmbKML') to store the fuel efficiency in kilometers per liter. To do this we need a conversion function called mpg_to_kml:

def mpg_to_kml(mpg):
o This method will multiply the incoming value by 0.42514 and return the result

Now, using the assign() method of the DataFrame, add each of those new columns to the frame.
o The assign() method allows you to use a lambda or a named function to calculate the value of the new column's cells. It will automatically apply the given function to each row of the frame so you don't need to write a loop.

Finally, save the DataFrame to a CSV file called "car_data.csv".
o Use your favorite CSV editor to verify the contents (Excel, Notepad++, etc.)
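
A minimal sketch of the conversion, assign() and save steps, again on made-up numbers rather than the real data. Here mpg is simply the function's parameter: the "incoming value" is whatever gets passed in, and it is multiplied by 0.42514. The sketch writes car_data.csv to a temporary folder (out_path is an assumption made so the example does not clutter your working directory); in the assignment you would write it next to your script.

```python
import os
import tempfile
import pandas as pd


def mpg_to_kml(mpg):
    """Convert miles per gallon to kilometers per liter."""
    return mpg * 0.42514


# Made-up values for illustration only.
new_df = pd.DataFrame({
    'Model': ['CAR A', 'CAR B'],
    'City MPG': [24.0, 30.0],
    'Hwy MPG': [34.0, 40.0],
    'Cmb MPG': [28.0, 34.0],
})

# assign() passes the DataFrame to each lambda; multiplying a whole column
# by a number works element by element, so no loop is needed.
new_df = new_df.assign(
    CityKML=lambda d: mpg_to_kml(d['City MPG']),
    HwyKML=lambda d: mpg_to_kml(d['Hwy MPG']),
    CmbKML=lambda d: mpg_to_kml(d['Cmb MPG']),
)

# index=False keeps the row index out of the saved file.
out_path = os.path.join(tempfile.gettempdir(), 'car_data.csv')
new_df.to_csv(out_path, index=False)

print(new_df)
```

assign() returns a new DataFrame rather than modifying the old one in place, which is why the result is stored back into new_df.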
