Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Python code for all the commented steps thank you Feature Engineering Anyone who follows the game of baseball knows that, as Major League Baseball (MLB)
Python code for all the commented steps thank you
Feature Engineering Anyone who follows the game of baseball knows that, as Major League Baseball (MLB) progressed, different eras emerged where the amount of runs per game increased or decreased significantly. The dead ball era of the early 1900 s is an example of a low scoring era and the steroid era at the turn of the 21 st century is an example of a high scoring era. Hence, in this analysis, we want to exclude all the game data before the year 1900 . [11] 1 \#Filtering the years before year 1900 3 \#\#\#\# Complete your code below 4 \#\#\#\# You can filter by the value of a certain column 'col' in a dataframe 'df' as: 5 \#\#\#\# df [df[col] > some_value] 6 \#\#\#\# then you want to save the filtered results as a new dataframe "df" If you are into baseball, you will know Runs per Game (RPG) is an important factor in the game results (winning or not). Since we do not have that feature in the dataset, we are going to create that feaure from the features we have. In the dataset df, we have the runs ( df[ ' R ' ]) and games ( df [ 'G']) features. We are going to use them to create RPG. Firstly, we are doing the analysis on a yearly basis, so we need to aggregate the runs ( df[ ' R ' ] ) and games ( df [ 'G' ]) features to a yearly basis. [12] 1 \# Aggregate runs and games data to a yearly basis 2 3 \#\#\#\# complete the code below 4 \#\#\#\# Aggregation is a very important technique in dataframes 5 \#\#\#\# 'pandas' provides a good method called 'groupby() for that 6 \#\#\#\# since we want to aggregate to a yearly basis, we are going to use the feature 'yearID' 7 \#\#\#\# and combine the 'groupby()' results with 'sum()' we will get the aggregated results 8 9 \#\#\#\# create a new pandas series called 'runs_per_year', 10 \#\#\#\# which is aggregating runs ('df['R']') on a yearly basis 11 12 13 \#\#\#\# do the same for games ('df['G']') 14 15 16 \#\#\#\# We need to combine these two series into a dataframe ('rpg_df') to calculate RPG 17 \#\#\#\# We can do that by using the 'pd.concat()' method provided by 'pandas' 18 \#\#\#\# Since we want the two series are columns, we need to set the parameter 'axis=1' in 'concat()' 19 20 21 \#\#\#\# Now we can calculate RPG: which is very simple: rpg = runs_per_year/games_per_year 22 23 24 \#\#\#\# let's double check the values of 'rpg_df' by looking at its first 10 values 25
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started