Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Python code for all the commented steps thank you Feature Engineering Anyone who follows the game of baseball knows that, as Major League Baseball (MLB)

image text in transcribedimage text in transcribedPython code for all the commented steps thank you

Feature Engineering Anyone who follows the game of baseball knows that, as Major League Baseball (MLB) progressed, different eras emerged where the amount of runs per game increased or decreased significantly. The dead ball era of the early 1900 s is an example of a low scoring era and the steroid era at the turn of the 21 st century is an example of a high scoring era. Hence, in this analysis, we want to exclude all the game data before the year 1900 . [11] 1 \#Filtering the years before year 1900 3 \#\#\#\# Complete your code below 4 \#\#\#\# You can filter by the value of a certain column 'col' in a dataframe 'df' as: 5 \#\#\#\# df [df[col] > some_value] 6 \#\#\#\# then you want to save the filtered results as a new dataframe "df" If you are into baseball, you will know Runs per Game (RPG) is an important factor in the game results (winning or not). Since we do not have that feature in the dataset, we are going to create that feaure from the features we have. In the dataset df, we have the runs ( df[ ' R ' ]) and games ( df [ 'G']) features. We are going to use them to create RPG. Firstly, we are doing the analysis on a yearly basis, so we need to aggregate the runs ( df[ ' R ' ] ) and games ( df [ 'G' ]) features to a yearly basis. [12] 1 \# Aggregate runs and games data to a yearly basis 2 3 \#\#\#\# complete the code below 4 \#\#\#\# Aggregation is a very important technique in dataframes 5 \#\#\#\# 'pandas' provides a good method called 'groupby() for that 6 \#\#\#\# since we want to aggregate to a yearly basis, we are going to use the feature 'yearID' 7 \#\#\#\# and combine the 'groupby()' results with 'sum()' we will get the aggregated results 8 9 \#\#\#\# create a new pandas series called 'runs_per_year', 10 \#\#\#\# which is aggregating runs ('df['R']') on a yearly basis 11 12 13 \#\#\#\# do the same for games ('df['G']') 14 15 16 \#\#\#\# We need to combine these two series into a dataframe ('rpg_df') to calculate RPG 17 \#\#\#\# We can do that by using the 'pd.concat()' method provided by 'pandas' 18 \#\#\#\# Since we want the two series are columns, we need to set the parameter 'axis=1' in 'concat()' 19 20 21 \#\#\#\# Now we can calculate RPG: which is very simple: rpg = runs_per_year/games_per_year 22 23 24 \#\#\#\# let's double check the values of 'rpg_df' by looking at its first 10 values 25

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

101 Database Exercises Text Workbook

Authors: McGraw-Hill

2nd Edition

0028007484, 978-0028007489

Students also viewed these Databases questions