
Posted on Jun 11, 2024

I am working on Project One for MAT243 Applied Statistics in a Jupyter notebook on Codio. I got all my code to work except Step 9. Can you help me with the code? I keep getting syntax errors. I am also confused about Step 8: which option is correct?

Project One: Data Visualization, Descriptive Statistics, Confidence Intervals

You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The coach of the team and your management have requested that you use descriptive statistics and data visualization techniques to study distributions of key performance metrics that are included in the data set. These data-driven analytics will help make key decisions to improve the performance of the team. You will use the Python programming language to perform the statistical analyses and then write a report of your findings to present to the team's management. Since the managers are not data analysts, you will need to interpret your findings and describe their practical implications.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Step 1: Data Preparation & the Assigned Team

This step uploads the data set from a CSV file. It also selects the assigned team for this analysis. Do not make any changes to the code block below.

The assigned team is the Chicago Bulls from the years 1996-1998.

Click the block of code below and hit the Run button above.

In[9]:

import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML

nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n', 'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]

# The dataframe for the assigned team is called assigned_team_df.
# The assigned team is the Chicago Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)

display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(assigned_team_df))

   game_id       year_id  fran_id  pts  opp_pts  elo_n      opp_elo_n  game_location  game_result
0  199511030CHI  1996     Bulls    105  91       1598.2924  1531.7449  H              W
1  199511040CHI  1996     Bulls    107  85       1604.3940  1458.6415  H              W
2  199511070CHI  1996     Bulls    117  108      1605.7983  1310.9349  H              W
3  199511090CLE  1996     Bulls    106  88       1618.8701  1452.8268  A              W
4  199511110CHI  1996     Bulls    110  106      1621.1591  1490.2861  H              W

printed only the first five observations...
Number of rows in the data set = 246
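To see what the Step 1 filters keep, here is a small sketch on a made-up five-row DataFrame (the values in the hypothetical `toy_df` are invented, not taken from `nbaallelo.csv`). It applies the same filters in the same order: NBA regular-season games, then the 1996-1998 window, then one franchise.

```python
import pandas as pd

# Hypothetical stand-in for nbaallelo.csv (invented values, not real data)
toy_df = pd.DataFrame({
    "lg_id": ["NBA", "NBA", "ABA", "NBA", "NBA"],
    "is_playoffs": [0, 0, 0, 1, 0],
    "year_id": [1995, 1996, 1997, 1997, 1998],
    "fran_id": ["Bulls", "Bulls", "Bulls", "Bulls", "Knicks"],
    "elo_n": [1550.0, 1600.0, 1500.0, 1700.0, 1520.0],
})

# Keep NBA regular-season games only (is_playoffs == 0)
regular = toy_df[(toy_df["lg_id"] == "NBA") & (toy_df["is_playoffs"] == 0)]

# between() includes both endpoints by default, so 1996 and 1998 are kept
in_years = regular[regular["year_id"].between(1996, 1998)]

# Keep one franchise and renumber its rows from 0
bulls = in_years[in_years["fran_id"] == "Bulls"].reset_index(drop=True)
print(bulls)
```

Only the 1996 Bulls row survives all three filters here: the ABA row and the playoff row are dropped first, the 1995 game falls outside the window, and the 1998 game belongs to another franchise.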

Step 2: Pick Your Team

In this step, you will pick your team. The range of years that you will study for your team is 2013-2015. Make the following edits to the code block below:

Replace ??TEAM?? with your choice of team from one of the following team names.
Bucks, Bulls, Cavaliers, Celtics, Clippers, Grizzlies, Hawks, Heat, Jazz, Kings, Knicks, Lakers, Magic, Mavericks, Nets, Nuggets, Pacers, Pelicans, Pistons, Raptors, Rockets, Sixers, Spurs, Suns, Thunder, Timberwolves, Trailblazers, Warriors, Wizards
Remember to enter the team name within single quotes. For example, if you picked the Suns, then ??TEAM?? should be replaced with 'Suns'.

After you are done with your edits, click the block of code below and hit the Run button above.

In[10]:

# Range of years: 2013-2015 (Note: The line below selects ALL teams within the three-year period 2013-2015. This is not your team's dataframe.)
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]

# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']=='Jazz')]
your_team_df = your_team_df.reset_index(drop=True)

display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(your_team_df))

   game_id       year_id  fran_id  pts  opp_pts  elo_n      opp_elo_n  game_location  game_result
0  201210310UTA  2013     Jazz     113  94       1542.9183  1523.3170  H              W
1  201211020NOH  2013     Jazz     86   88       1538.1984  1453.9698  A              L
2  201211030SAS  2013     Jazz     100  110      1534.7893  1686.3522  A              L
3  201211050MEM  2013     Jazz     94   103      1529.5350  1586.0551  A              L
4  201211070UTA  2013     Jazz     95   86       1535.9674  1521.1603  H              W

printed only the first five observations...
Number of rows in the data set = 246
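A misspelled team name (for example `'jazz'` instead of `'Jazz'`) does not raise an error in the Step 2 code; it silently produces a dataframe with zero rows. The helper `select_team` below is hypothetical and not part of the assigned notebook; it is a sketch of one way to catch that early, demonstrated on a made-up three-row dataframe.

```python
import pandas as pd

# Team names listed in Step 2; the string comparison on fran_id is case-sensitive
VALID_TEAMS = {
    "Bucks", "Bulls", "Cavaliers", "Celtics", "Clippers", "Grizzlies",
    "Hawks", "Heat", "Jazz", "Kings", "Knicks", "Lakers", "Magic",
    "Mavericks", "Nets", "Nuggets", "Pacers", "Pelicans", "Pistons",
    "Raptors", "Rockets", "Sixers", "Spurs", "Suns", "Thunder",
    "Timberwolves", "Trailblazers", "Warriors", "Wizards",
}


def select_team(league_df, team):
    """Return one franchise's rows (hypothetical helper), failing loudly on a bad name."""
    if team not in VALID_TEAMS:
        raise ValueError(f"{team!r} is not one of the team names listed in Step 2")
    team_df = league_df[league_df["fran_id"] == team].reset_index(drop=True)
    if team_df.empty:
        raise ValueError(f"no rows for {team!r} in this range of years")
    return team_df


# Made-up stand-in for your_years_leagues_df
toy = pd.DataFrame({"fran_id": ["Jazz", "Suns", "Jazz"], "pts": [113, 99, 86]})
jazz = select_team(toy, "Jazz")
print(len(jazz), "rows")
```

In the notebook this would read `your_team_df = select_team(your_years_leagues_df, 'Jazz')`.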

Step 3: Data Visualization: Points Scored by Your Team

The coach has requested that you provide a visual that shows the distribution of points scored by your team in the years 2013-2015. The code below provides two possible options. Pick ONE of these two plots to include in your summary report. Choose the plot that you think provides the best visual for the distribution of points scored by your team. In your summary report, you must explain why you think your visual is the best choice.

Click the block of code below and hit the Run button above.

NOTE: If the plots are not created, click the code section and hit the Run button again.

In[11]:

import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(your_team_df['pts'], bins=20)
plt.title('Histogram of points scored by Your Team in 2013 to 2015', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()
#print("")

# Scatterplot
# Recent seaborn versions reject positional x/y, so pass them as keywords
plt.title('Scatterplot of points scored by Your Team in 2013 to 2015', fontsize=18)
sns.regplot(x=your_team_df['year_id'], y=your_team_df['pts'], ci=None)
plt.show()
print("")
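A histogram groups the points totals into equal-width bins and plots how many games fall in each bin. The sketch below uses a short, made-up `pts` array (not your team's data) and NumPy's `np.histogram`, which `plt.hist` calls internally, to print the counts a 20-bin histogram would draw.

```python
import numpy as np

# Made-up points totals standing in for your_team_df['pts']
pts = np.array([113, 86, 100, 94, 95, 104, 98, 91, 110, 87])

# 20 equal-width bins from min(pts) to max(pts); the last bin includes its right edge
counts, edges = np.histogram(pts, bins=20)

print("games per bin:", counts)
print("bin width:", edges[1] - edges[0])
```

Each bar height in the plot is one entry of `counts`, so the bars always add up to the number of games.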

Step 4: Data Visualization: Points Scored by the Assigned Team

The coach has also requested that you provide a visual that shows a distribution of points scored by the Bulls from years 1996-1998. The code below provides two possible options. Pick ONE of these two plots to include in your summary report. Choose the plot that you think provides the best visual for the distribution of points scored by the assigned team. In your summary report, you will explain why you think your visual is the best choice.

Click the block of code below and hit the Run button above.

NOTE: If the plots are not created, click the code section and hit the Run button again.

In[12]:

import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], bins=20)
plt.title('Histogram of points scored by the Bulls in 1996 to 1998', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Scatterplot
# Recent seaborn versions reject positional x/y, so pass them as keywords
plt.title('Scatterplot of points scored by the Bulls in 1996 to 1998', fontsize=18)
sns.regplot(x=assigned_team_df['year_id'], y=assigned_team_df['pts'], ci=None)
plt.show()
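The `sns.regplot` scatterplot puts `year_id` on the x-axis, so with only three seasons the points stack into three vertical columns, and the line drawn through them is an ordinary least-squares fit of points on year. A minimal sketch of that kind of fit with `np.polyfit` (degree 1), using made-up points for three seasons rather than the Bulls' data:

```python
import numpy as np

# Made-up (season, points) pairs standing in for assigned_team_df
years = np.array([1996, 1996, 1997, 1997, 1998, 1998])
pts = np.array([105, 107, 101, 99, 98, 96])

# Degree-1 least-squares fit: pts ~ slope * year + intercept
slope, intercept = np.polyfit(years, pts, 1)
print("slope (points per season):", round(slope, 2))
print("fitted points in 1997:", round(slope * 1997 + intercept, 2))
```

The fitted line describes the trend across seasons; it does not show how the points are spread within a season.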

Step 5: Data Visualization: Comparing the Two Teams

Now the coach wants you to prepare one plot that provides a visual of the differences in the distribution of points scored by the assigned team and your team. The code below provides two possible visuals. Choose the plot that allows for the best comparison of the data distributions.

Click the block of code below and hit the Run button above.

NOTE: If the plots are not created, click the code section and hit the Run button again.

In[13]:

import seaborn as sns

# Side-by-side boxplots
both_teams_df = pd.concat((assigned_team_df, your_team_df))
plt.title('Boxplot to compare points distribution', fontsize=18)
sns.boxplot(x='fran_id',y='pts',data=both_teams_df)
plt.show()
print("")

# Histograms
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], 20, alpha=0.5, label='Assigned Team')
plt.hist(your_team_df['pts'], 20, alpha=0.5, label='Your Team')
plt.title('Histogram to compare points distribution', fontsize=18)
plt.xlabel('Points')
plt.legend(loc='upper right')
plt.show()
print("")
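The box in each boxplot spans the first quartile (Q1) to the third quartile (Q3), with a line at the median. The sketch below computes those numbers per franchise with pandas `groupby` and `quantile` on made-up points totals; the hypothetical `both` dataframe stands in for `both_teams_df`.

```python
import pandas as pd

# Made-up points for two franchises, standing in for both_teams_df
both = pd.DataFrame({
    "fran_id": ["Bulls"] * 5 + ["Jazz"] * 5,
    "pts": [105, 107, 117, 106, 110, 113, 86, 100, 94, 95],
})

# Q1, median and Q3 per team (linear interpolation between sorted values)
summary = both.groupby("fran_id")["pts"].quantile([0.25, 0.5, 0.75]).unstack()
# Interquartile range: the height of each box
summary["IQR"] = summary[0.75] - summary[0.25]
print(summary)
```

Comparing the medians and IQRs side by side is the numeric counterpart of comparing the two boxes.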

Step 6: Descriptive Statistics: Relative Skill of Your Team

The management of your team wants you to run descriptive statistics on the relative skill of your team from 2013-2015. In this project, you will use the variable 'elo_n' to represent the relative skill of the teams. Calculate descriptive statistics including the mean, median, variance, and standard deviation for the relative skill of your team. Make the following edits to the code block below:

Replace ??MEAN_FUNCTION?? with the name of the Python function that calculates the mean.
Replace ??MEDIAN_FUNCTION?? with the name of the Python function that calculates the median.
Replace ??VAR_FUNCTION?? with the name of the Python function that calculates the variance.
Replace ??STD_FUNCTION?? with the name of the Python function that calculates the standard deviation.

After you are done with your edits, click the block of code below and hit the Run button above.

In[14]:

print("Your Team's Relative Skill in 2013 to 2015")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
mean = your_team_df['elo_n'].mean()
median = your_team_df['elo_n'].median()
variance = your_team_df['elo_n'].var()
stdeviation = your_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))

Your Team's Relative Skill in 2013 to 2015
-------------------------------------------------------
Mean = 1462.85
Median = 1455.42
Variance = 3920.12
Standard Deviation = 62.61
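One detail worth knowing when reporting these numbers: pandas `Series.var()` and `Series.std()` default to `ddof=1` (sample statistics, dividing by n - 1), while NumPy's `np.var` and `np.std` default to `ddof=0` (dividing by n). The sketch below uses the `elo_n` values from the first five Jazz rows shown in Step 2, rounded to two decimals, to show how the two relate.

```python
import numpy as np
import pandas as pd

# elo_n from the first five Jazz rows in Step 2, rounded to two decimals
elo = pd.Series([1542.92, 1538.20, 1534.79, 1529.54, 1535.97])
n = len(elo)

sample_var = elo.var()                    # pandas default ddof=1: divide by n - 1
population_var = np.var(elo.to_numpy())   # NumPy default ddof=0: divide by n

# The two differ only by the factor n / (n - 1)
print(round(sample_var, 2), round(population_var * n / (n - 1), 2))
# The standard deviation is the square root of the variance
print(round(elo.std(), 2), round(np.sqrt(sample_var), 2))
```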

Step 7: Descriptive Statistics: Relative Skill of the Assigned Team

The management also wants you to run descriptive statistics for the relative skill of the Bulls from 1996-1998. Calculate descriptive statistics including the mean, median, variance, and standard deviation for the relative skill of the assigned team.

You are to write the code for this block yourself.

Use Step 6 to help you write this code block. Here is some information that will help you with this code block.

The dataframe for the assigned team is called assigned_team_df.
The variable 'elo_n' represents the relative skill of the teams.
Your statistics should be rounded to two decimal places.

Write your code in the code block section below. After you are done, click this block of code and hit the Run button above. Reach out to your instructor if you need more help with this step.

In[15]:

# Write your code in this code block.

print("Assigned Team's Relative Skill in 1996 to 1998")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
mean = assigned_team_df['elo_n'].mean()
median = assigned_team_df['elo_n'].median()
variance = assigned_team_df['elo_n'].var()
stdeviation = assigned_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))

Assigned Team's Relative Skill in 1996 to 1998
-------------------------------------------------------
Mean = 1739.8
Median = 1751.23
Variance = 2651.55
Standard Deviation = 51.49
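Steps 6 and 7 repeat the same four calculations on different dataframes. A helper such as `elo_summary` below (hypothetical, not part of the assignment template) is one way to keep them consistent; the demo uses the `elo_n` values from the first five Bulls rows printed in Step 1.

```python
import pandas as pd


def elo_summary(team_df, column="elo_n", decimals=2):
    """Mean, median, sample variance and sample SD of one column, rounded (hypothetical helper)."""
    values = team_df[column]
    return {
        "Mean": round(values.mean(), decimals),
        "Median": round(values.median(), decimals),
        "Variance": round(values.var(), decimals),            # ddof=1
        "Standard Deviation": round(values.std(), decimals),  # ddof=1
    }


# elo_n from the first five Bulls rows printed in Step 1
bulls_head = pd.DataFrame({"elo_n": [1598.2924, 1604.3940, 1605.7983, 1618.8701, 1621.1591]})
stats = elo_summary(bulls_head)
for name, value in stats.items():
    print(name, "=", value)
```

In the notebook, `elo_summary(assigned_team_df)` would return the Step 7 statistics as a dictionary.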

Step 8: Confidence Intervals for the Average Relative Skill of All Teams in Your Team's Years

The management wants you to calculate a 95% confidence interval for the average relative skill of all teams in 2013-2015. To construct a confidence interval, you will need the mean and standard error of the relative skill level in these years. The code block below calculates the mean and the standard deviation. Your edits will calculate the standard error and the confidence interval. Make the following edits to the code block below:

Replace ??SD_VARIABLE?? with the variable name representing the standard deviation of relative skill of all teams from your years. (Hint: the standard deviation variable is in the code block below.)
Replace ??CL?? with the confidence level of the confidence interval.
Replace ??MEAN_VARIABLE?? with the variable name representing the mean relative skill of all teams from your years. (Hint: the mean variable is in the code block below.)
Replace ??SE_VARIABLE?? with the variable name representing the standard error. (Hint: the standard error variable is in the code block below.)
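The quantities in Step 8 fit together as follows: the standard error is the standard deviation divided by the square root of n, and a 95% Normal-based interval is the mean plus or minus about 1.96 standard errors. The sketch below computes that by hand with Python's built-in `statistics.NormalDist`, so it should agree with `st.norm.interval(0.95, mean, stderr)` from the notebook. The function name `normal_ci` and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def normal_ci(mean, stdev, n, confidence=0.95):
    """Normal (z) interval for a mean: mean +/- z * stdev / sqrt(n)."""
    stderr = stdev / math.sqrt(n)                        # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 when confidence = 0.95
    return mean - z * stderr, mean + z * stderr


# Hypothetical inputs: mean 1500, standard deviation 100, n = 400 games
low, high = normal_ci(1500.0, 100.0, 400)
print(round(low, 2), round(high, 2))
```

Here the standard error is 100 / 20 = 5, so the interval is roughly 1500 plus or minus 9.80, about (1490.20, 1509.80).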

The management also wants you to calculate the probability that a team in the league has a relative skill level less than that of the team that you picked. Assuming that the relative skill of teams is Normally distributed, Python methods for a Normal distribution can be used to answer this question. The code block below uses two of these Python methods. Your task is to identify the correct Python method and report the probability.
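The two SciPy methods differ in which tail they measure: `st.norm.cdf(x, loc, scale)` is the area to the left of x, P(X < x), while `st.norm.sf(x, loc, scale)` is the survival function, the area to the right, P(X > x) = 1 - cdf. A quick sanity check with made-up numbers: a team whose mean is below the league mean must have a less-than probability below 0.5.

```python
import scipy.stats as st

# Made-up numbers: league mean/SD of elo_n and one team's mean elo_n
league_mean, league_sd = 1500.0, 100.0
team_mean = 1450.0  # below the league mean

below = st.norm.cdf(team_mean, league_mean, league_sd)  # P(X < team_mean), left tail
above = st.norm.sf(team_mean, league_mean, league_sd)   # P(X > team_mean), right tail

print(round(below, 4), round(above, 4))  # 0.3085 0.6915
```

The same reasoning applies to Step 8: the probability of a relative skill level less than your team's is the choice computed with `cdf`.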

After you are done with your edits, click the block of code below and hit the Run button above.

In[16]:

print("Confidence Interval for Average Relative Skill in the years 2013 to 2015")
print("------------------------------------------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 2013-2015
mean = your_years_leagues_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 2013-2015
stdev = your_years_leagues_df['elo_n'].std()

n = len(your_years_leagues_df)

#Confidence interval
# ---- TODO: make your edits here ----
stderr = (stdev)/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95,mean,stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")

print(" ")
print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015")
print("----------------------------------------------------------------------------------------------------------------------------------------------------------")

mean_elo_your_team = your_team_df['elo_n'].mean()

choice1 = st.norm.sf(mean_elo_your_team, mean, stdev)   # sf (survival function): P(X > x) = 1 - cdf
choice2 = st.norm.cdf(mean_elo_your_team, mean, stdev)  # cdf (cumulative distribution): P(X <= x)

# Pick the correct answer.
print("Which of the two choices is correct?")
print("Choice 1 =", round(choice1,4))
print("Choice 2 =", round(choice2,4))

Confidence Interval for Average Relative Skill in the years 2013 to 2015
------------------------------------------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (1502.0236894390478, 1507.1824625533618)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = ( 1502.02 , 1507.18 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015
----------------------------------------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct?
Choice 1 = 0.6441
Choice 2 = 0.3559

Step 9: Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team's Years

The management also wants you to calculate a 95% confidence interval for the average relative skill of all teams in the years 1996-1998. Calculate this confidence interval.

Use Step 8 to help you with this code block. Here is some information that will help you:

The dataframe for the years 1996-1998 is called assigned_years_league_df.
The variable 'elo_n' represents the relative skill of teams.
Start by calculating the mean and the standard deviation of relative skill (ELO) in years 1996-1998.
Calculate n that represents the sample size.
Calculate the standard error, which is equal to the standard deviation of Relative Skill (ELO) divided by the square root of the sample size n.
Assuming that the population standard deviation is known, use Python methods for the Normal distribution to calculate the confidence interval.
Your statistics should be rounded to two decimal places.
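The Step 9 confidence interval follows the same pattern as Step 8, with the 1996-1998 league dataframe in place of the 2013-2015 one. A minimal sketch, wrapped in a hypothetical function `league_elo_ci` and demonstrated on a made-up five-row dataframe so it runs on its own:

```python
import pandas as pd
import scipy.stats as st


def league_elo_ci(league_df, confidence=0.95, column="elo_n"):
    """Normal-based confidence interval for the mean of `column`, rounded to 2 places."""
    mean = league_df[column].mean()   # mean relative skill
    stdev = league_df[column].std()   # standard deviation (pandas default ddof=1)
    n = len(league_df)                # sample size: number of rows
    stderr = stdev / (n ** 0.5)       # standard error = sd / sqrt(n)
    low, high = st.norm.interval(confidence, mean, stderr)
    return round(low, 2), round(high, 2)


# Made-up stand-in for assigned_years_league_df (hypothetical values)
toy_league = pd.DataFrame({"elo_n": [1400.0, 1450.0, 1500.0, 1550.0, 1600.0]})
print(league_elo_ci(toy_league))
```

In the notebook, `league_elo_ci(assigned_years_league_df)` should return the rounded 95% interval for 1996-1998. If a hand-typed version of this cell raises a SyntaxError, it is worth checking that every parenthesis and quote in the long `print` calls is closed.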

The management also wants you to calculate the probability that a team had a relative skill level less than the Bulls in years 1996-1998. Assuming that the relative skill of teams is Normally distributed, calculate this probability.

Use Step 8 to help you.

Calculate the mean relative skill of the Bulls. Note that the dataframe for the Bulls is called assigned_team_df. The variable 'elo_n' represents the relative skill.
Use Python methods for a Normal distribution to calculate this probability.
The probability value should be rounded to four decimal places.
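The Step 9 probability mirrors the `st.norm.cdf` line in Step 8: evaluate the Normal cdf at the Bulls' mean relative skill, using the league mean and standard deviation for 1996-1998. A sketch, with the function name `prob_below_team` and the demo dataframes being hypothetical:

```python
import pandas as pd
import scipy.stats as st


def prob_below_team(league_df, team_df, column="elo_n"):
    """P(a league team's skill < the team's mean skill), assuming Normal skill."""
    league_mean = league_df[column].mean()
    league_sd = league_df[column].std()
    team_mean = team_df[column].mean()
    # cdf is the area to the LEFT of team_mean under N(league_mean, league_sd)
    return round(st.norm.cdf(team_mean, league_mean, league_sd), 4)


# Made-up stand-ins for assigned_years_league_df and assigned_team_df
toy_league = pd.DataFrame({"elo_n": [1400.0, 1450.0, 1500.0, 1550.0, 1600.0]})
toy_team = pd.DataFrame({"elo_n": [1560.0, 1580.0]})
print(prob_below_team(toy_league, toy_team))
```

In the notebook the call would be `prob_below_team(assigned_years_league_df, assigned_team_df)`; a team rated above the league mean gets a probability above 0.5.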