Question

Posted on Sep 22, 2024

Python: I am stuck on the yellow-highlighted parts of this assignment. You can see my attempts below. For instance, the header is supposed to be 'infer'. Does that mean header = None? And how do I filter a column in a CSV file? I have provided a small sample of the file below. And how do I create a new DataFrame? Does this mean creating a new function with def, where each value corresponds to a column that I wish to migrate? I don't understand. And what does multiplying the incoming value by 0.42514 mean? Are they referring to a parameter? I am so confused. Please help.

SCRIPT:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# For Jupyter Notebook only
%matplotlib inline

#df = df.SomeFunction()
#new_df = df.SomeFunction()

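df = pd.read_csv('C:\\Users\\xxxxx\\Desktop\\Notebook\\all_alpha_19.csv', header='infer')

# Keep rows where 'Stnd' is 'T3B125' AND 'Fuel' is 'Gasoline' or 'Diesel'
df_filter_row = df[(df['Stnd'] == 'T3B125') & (df['Fuel'].isin(['Gasoline', 'Diesel']))]

print(df_filter_row)

# Columns to keep in the new DataFrame
cols = ['Model', 'Displ', 'Fuel', 'City MPG', 'Hwy MPG', 'Cmb MPG', 'Greenhouse Gas Score']

new_df = df_filter_row[cols].reset_index(drop=True)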
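On the filtering question: in pandas, `=` is assignment, so a filter condition needs `==`. Python's `or`/`and` keywords (and `OR`, which is not Python at all) cannot combine column conditions; a boolean mask uses `&` and `|` with each condition in parentheses, and `isin()` covers "either this value or that one". The assignment allows either column filters or a query, so here is a minimal sketch of both. It assumes only the Model, Fuel and Stnd values of the three rows from the file sample, which are the values the sample actually shows.

```python
import pandas as pd

# Three rows from the all_alpha_19.csv sample (only the columns needed here).
df = pd.DataFrame({
    'Model': ['ACURA ILX', 'ACURA ILX', 'ACURA MDX'],
    'Fuel': ['Gasoline', 'Gasoline', 'Gasoline'],
    'Stnd': ['L3ULEV125', 'T3B125', 'L3ULEV125'],
})

# Option 1: boolean mask. == compares, & is element-wise AND,
# and isin() replaces "'Gasoline' or 'Diesel'".
mask = (df['Stnd'] == 'T3B125') & (df['Fuel'].isin(['Gasoline', 'Diesel']))
filtered_mask = df[mask]

# Option 2: the same filter written as a query string.
filtered_query = df.query("Stnd == 'T3B125' and Fuel in ['Gasoline', 'Diesel']")

print(filtered_mask)
print(filtered_mask.equals(filtered_query))  # True
```

Both keep only the second row (index 1). Note that a query string can only refer to column names like `Stnd` directly when they are valid Python identifiers; names with spaces such as `City MPG` would need backticks inside the query.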

SAMPLE OF THE ALPHA FILE

Model | Displ | Cyl | Trans | Drive | Fuel | Cert Region | Stnd | Stnd Description | Underhood ID | Veh Class | Air Pollution Score | City MPG | Hwy MPG | Cmb MPG | Greenhouse Gas Score | SmartWay | Comb CO2
ACURA ILX | 2.4 | | AMS-8 | 2WD | Gasoline | | L3ULEV125 | California LEV-III ULEV125 | KHNXV02.4KH3 | small car | | | | | | | 316
ACURA ILX | 2.4 | | AMS-8 | 2WD | Gasoline | | T3B125 | Federal Tier 3 Bin 125 | KHNXV02.4KH3 | small car | | | | | | | 316
ACURA MDX | | | AMS-7 | 4WD | Gasoline | | L3ULEV125 | California LEV-III ULEV125 | KHNXV03.0AH3 | small SUV | | | | | | | 330
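On the header question: `header='infer'` is not the same as `header=None`. With `'infer'` (the default), pandas reads the first line of the file as the column names. With `None`, pandas assumes there is no header row, numbers the columns 0, 1, 2, ... and reads the name line as an ordinary data row. A small sketch, assuming a few columns of the sample written as CSV text, with `io.StringIO` standing in for the real file:

```python
import io
import pandas as pd

# A few columns from the file sample, written as CSV text.
csv_text = """Model,Displ,Fuel,Stnd
ACURA ILX,2.4,Gasoline,L3ULEV125
ACURA ILX,2.4,Gasoline,T3B125
"""

# header='infer' (the default): the first line becomes the column names.
df_infer = pd.read_csv(io.StringIO(csv_text), header='infer')
print(df_infer.columns.tolist())  # ['Model', 'Displ', 'Fuel', 'Stnd']
print(len(df_infer))              # 2

# header=None: columns are numbered 0..3 and the name line becomes row 0,
# which also turns the numeric Displ column into text.
df_none = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_none.columns.tolist())   # [0, 1, 2, 3]
print(len(df_none))               # 3
```

Since `all_alpha_19.csv` starts with a line of column names, `header='infer'` is what makes `df['Stnd']` and `df['Fuel']` available by name.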

Part I - Preparing the DataFrame (ETL)

This first part of the assignment focuses on "ETL", which stands for Extract, Transform and Load. This is a key part of a data analyst's job.

Ensure the "all_alpha_19.csv" file is in the same folder as your Python script. Use the following setup code at the top of your script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# For Jupyter Notebook only
%matplotlib inline

Next, use pd.read_csv() to load the file into a DataFrame called "df". The only argument you need is "header", which should be set to 'infer'.

We only want records that were generated using a specific fuel efficiency standard (Federal Tier 3 Bin 125) for gasoline and diesel engines. Use either column filters or a query to filter the results to include only rows where:
- 'Stnd' is equal to 'T3B125' AND
- 'Fuel' is either 'Gasoline' or 'Diesel'

Now, create a new DataFrame from the filtered data, but only include the following columns:
- 'Model': car manufacturer and model
- 'Displ': engine displacement (size of engine)
- 'Fuel': type of fuel
- 'City MPG': number of miles the car gets per gallon of fuel in the city
- 'Hwy MPG': number of miles the car gets per gallon of fuel on the highway
- 'Cmb MPG': combined number of miles the car gets per gallon of fuel in the city and on the highway
- 'Greenhouse Gas Score': calculated score indicating the car's efficiency (higher is better)

Note: one of the easiest ways to create a new DataFrame using only some of the columns from another is to:
- Create a list of strings (called cols) where each value corresponds to a column name you wish to migrate.
- Create the new DataFrame using the following syntax: new_df = old_df[cols]

You will also want to reset the index to avoid an extra unnamed column in the new frame that equals the index from the old one. You can chain this to the above call with: .reset_index(drop=True)

Use the astype() function of the DataFrame to convert the three MPG columns to float.
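Creating the new DataFrame does not require a `def`: indexing a DataFrame with a list of column names already returns a new DataFrame. A minimal sketch of the filter, column selection, index reset and `astype()` steps, assuming a two-row frame where Model, Displ, Fuel, Stnd and Veh Class come from the file sample, but the MPG and score values are hypothetical placeholders (stored as text on purpose, so the conversion has something to do):

```python
import pandas as pd

# Model/Displ/Fuel/Stnd/Veh Class come from the sample; the MPG and
# Greenhouse Gas Score values are HYPOTHETICAL placeholders.
df = pd.DataFrame({
    'Model': ['ACURA ILX', 'ACURA ILX'],
    'Displ': [2.4, 2.4],
    'Fuel': ['Gasoline', 'Gasoline'],
    'Stnd': ['L3ULEV125', 'T3B125'],
    'Veh Class': ['small car', 'small car'],
    'City MPG': ['20', '20'],
    'Hwy MPG': ['30', '30'],
    'Cmb MPG': ['24', '24'],
    'Greenhouse Gas Score': [5, 5],
})

# Filter first: only the T3B125 row (original index 1) survives.
filtered = df[(df['Stnd'] == 'T3B125') & (df['Fuel'].isin(['Gasoline', 'Diesel']))]

# "cols" is just a list of column names to migrate.
cols = ['Model', 'Displ', 'Fuel', 'City MPG', 'Hwy MPG', 'Cmb MPG', 'Greenhouse Gas Score']

# Indexing with the list returns a new DataFrame; reset_index(drop=True)
# renumbers the rows from 0 and discards the old index instead of adding it as a column.
new_df = filtered[cols].reset_index(drop=True)

# Convert the three MPG columns to float with one astype() call (dict of column -> dtype).
mpg_cols = ['City MPG', 'Hwy MPG', 'Cmb MPG']
new_df = new_df.astype({col: float for col in mpg_cols})

print(new_df)
print(new_df.dtypes)
```

Passing a dict to `astype()` converts only the listed columns; `new_df[mpg_cols] = new_df[mpg_cols].astype(float)` is an equivalent alternative.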
Next, because only a small part of the world uses miles and gallons, we are going to add three new columns ('CityKML', 'HwyKML' and 'CmbKML') to store the fuel efficiency in kilometers per liter. To do this we need a conversion function called mpg_to_kml:

def mpg_to_kml(mpg):
- This function will multiply the incoming value by 0.42514 (the number of kilometers per liter in one mile per US gallon) and return the result.

Now, using the assign() method of the DataFrame, add each of those new columns to the frame.
- The assign() method allows you to use a lambda or a named function to calculate the value of the new column's cells. It calls the given function once with the whole DataFrame, and because the arithmetic works on entire columns at once, every row gets its value without you writing a loop.

Finally, save the DataFrame to a CSV file called "car_data.csv".
- Use your favorite CSV editor to verify the contents (Excel, Notepad++, etc.).
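The 0.42514 is neither an income nor a parameter you pass in: it is the conversion factor from miles per US gallon to kilometers per liter (1.609344 km per mile divided by 3.785411784 L per gallon is about 0.42514). A minimal sketch of the conversion function, the `assign()` step and the CSV export, assuming `new_df` already holds float MPG columns; the MPG values below are hypothetical, and `index=False` in `to_csv()` is my own choice rather than something the assignment specifies:

```python
import pandas as pd

def mpg_to_kml(mpg):
    """Convert miles per US gallon to kilometers per liter."""
    # Works on a single number or on a whole pandas Series.
    return mpg * 0.42514

# Stand-in for new_df after astype(); the MPG values are HYPOTHETICAL.
new_df = pd.DataFrame({
    'Model': ['ACURA ILX'],
    'City MPG': [20.0],
    'Hwy MPG': [30.0],
    'Cmb MPG': [24.0],
})

# Each lambda receives the whole DataFrame (d) and returns a converted column.
new_df = new_df.assign(
    CityKML=lambda d: mpg_to_kml(d['City MPG']),
    HwyKML=lambda d: mpg_to_kml(d['Hwy MPG']),
    CmbKML=lambda d: mpg_to_kml(d['Cmb MPG']),
)

# index=False keeps the row index out of the file.
new_df.to_csv('car_data.csv', index=False)

print(new_df)
```

For example, `mpg_to_kml(10)` gives 4.2514. Without `index=False`, `to_csv()` writes the row index as an extra unnamed first column, which is the kind of column the earlier `reset_index(drop=True)` note is trying to keep out of the output.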