Question

1 Approved Answer

Posted on Jul 08, 2024

BACKGROUND: You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The coach of the team and your management have requested that you use descriptive statistics and data visualization techniques to study distributions of key performance metrics that are included in the data set. These data-driven analytics will help make key decisions to improve the performance of the team. You will use the Python programming language to perform the statistical analyses and then prepare a report of your findings to present for the team's management. Since the managers are not data analysts, you will need to interpret your findings and describe their practical implications.

See Steps 8 and 9 in the Python script to address the following items:

I have provided the entire script below. The last time I asked for help, my question was wasted on responses saying that more information was needed. When I asked again and provided all the information, I still got no help. I just need help understanding confidence intervals; all the other information should be irrelevant. In Step 8 I have added ****** NEED HELP HERE ****** to help you spot where I am asking for help. I have also provided the output of Step 8 of the Python code below.

STEP 8: What is the probability that a given team (Celtics) in the league has a relative skill level less than that of the team that you picked? Is it unusual that a team has a skill level less than your team?

Table 4. Confidence Interval for Average Relative Skill of Teams in Your Team's Years

Confidence Level (%) Confidence Interval

 95% ( 1502.02, 1507.18 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015
----------------------------------------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct? **************** NEED HELP HERE **************
Choice 1 = 0.6639
Choice 2 = 0.336

STEP 9: Confidence Interval for the Bulls' Years (1996 to 1998)

Discuss how your interval would be different if you had used a different confidence level.

How does this confidence interval compare with the previous one? What does this signify in terms of the average relative skill of teams in the range of years that you picked versus the average relative skill of teams in the assigned team's range of years?

Confidence Level (%) Confidence Interval

 95% ( 126.44 , 131.69 )
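To see how the confidence level changes an interval, here is a minimal sketch that starts from the 95% interval reported in Table 4 and recomputes it at 90% and 99%. The center and standard error are back-calculated from the rounded Table 4 endpoints, so they are approximations, and the variable names are only illustrative.

```python
import scipy.stats as st

# 95% interval reported in Table 4 (rounded values from the Step 8 output).
lower_95, upper_95 = 1502.02, 1507.18

# Recover the center and standard error implied by that interval.
center = (lower_95 + upper_95) / 2
z_95 = st.norm.ppf(0.975)  # two-sided 95% critical value, about 1.96
stderr = (upper_95 - lower_95) / 2 / z_95

# Recompute the interval at several confidence levels.
intervals = {
    level: st.norm.interval(level, loc=center, scale=stderr)
    for level in (0.90, 0.95, 0.99)
}
widths = {level: high - low for level, (low, high) in intervals.items()}

for level, (low, high) in intervals.items():
    print(f"{level:.0%}: ({low:.2f}, {high:.2f})  width = {widths[level]:.2f}")
```

A higher confidence level uses a larger critical value, so the interval widens around the same center; a lower level narrows it.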

PYTHON SCRIPT:

STEP 1:

import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML

nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n', 'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]

# The dataframe for the assigned team is called assigned_team_df.
# The assigned team is the Chicago Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)

display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(assigned_team_df))

STEP 2: Pick Your Team (Celtics)

# Range of years: 2013-2015
# (Note: The line below selects ALL teams within the three-year period 2013-2015. This is not your team's dataframe.)
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]

# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']=='Celtics')]
your_team_df = your_team_df.reset_index(drop=True)

display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the data set =", len(your_team_df))

Steps 3-4: The chosen data visualization was a histogram for the Celtics and the Bulls

Points Scored By Celtics (Histogram)

import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(your_team_df['pts'], bins=20)
plt.title('Histogram of points scored by Your Team in 2013 to 2015', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()
print("")

# Scatterplot
plt.title('Scatterplot of points scored by Your Team in 2013 to 2015', fontsize=18)
# Pass x and y by keyword; recent seaborn releases do not accept them positionally.
sns.regplot(x=your_team_df['year_id'], y=your_team_df['pts'], ci=None)
plt.show()

Points Scored by Bulls (Histogram)

import seaborn as sns

# Histogram
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], bins=20)
plt.title('Histogram of points scored by the Bulls in 1996 to 1998', fontsize=18)
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Scatterplot
plt.title('Scatterplot of points scored by the Bulls in 1996 to 1998', fontsize=18)
# Pass x and y by keyword; recent seaborn releases do not accept them positionally.
sns.regplot(x=assigned_team_df['year_id'], y=assigned_team_df['pts'], ci=None)
plt.show()

Step 5: Comparing Both Teams (chosen data visualization: boxplot)

import seaborn as sns

# Side-by-side boxplots
both_teams_df = pd.concat((assigned_team_df, your_team_df))
plt.title('Boxplot to compare points distribution', fontsize=18)
sns.boxplot(x='fran_id',y='pts',data=both_teams_df)
plt.show()
print("")

# Histograms
fig, ax = plt.subplots()
plt.hist(assigned_team_df['pts'], 20, alpha=0.5, label='Assigned Team')
plt.hist(your_team_df['pts'], 20, alpha=0.5, label='Your Team')
plt.title('Histogram to compare points distribution', fontsize=18)
plt.xlabel('Points')
plt.legend(loc='upper right')
plt.show()

Step 6: Descriptive Statistics: Relative Skill of Your Team

print("Your Team's Relative Skill in 2013 to 2015")
print("-------------------------------------------------------")

# ---- TODO: make your edits here ----
# Use .mean() for the average; .sum() returns the total of all ratings,
# which is the 356910.07 value reported as "Mean" in the output below.
mean = your_team_df['elo_n'].mean()
median = your_team_df['elo_n'].median()
variance = your_team_df['elo_n'].var()
stdeviation = your_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))

OUTPUT:

Your Team's Relative Skill in 2013 to 2015
-------------------------------------------------------
Mean = 356910.07
Median = 1451.99
Variance = 4246.4
Standard Deviation = 65.16

Step 7 - Descriptive Statistics - Relative Skill of the Assigned Team

# Write your code in this code block.
# The assigned team's data covers 1996 to 1998 (see Step 1), so the label uses those years.
print("Assigned Team's Relative Skills in 1996 to 1998")
print("-----------------------------------------------------------")

# Use .mean() for the average; .sum() returns the total of all ratings,
# which is the 427990.67 value reported as "Mean" in the output below.
mean = assigned_team_df['elo_n'].mean()
median = assigned_team_df['elo_n'].median()
variance = assigned_team_df['elo_n'].var()
stdeviation = assigned_team_df['elo_n'].std()

print('Mean =', round(mean,2))
print('Median =', round(median,2))
print('Variance =', round(variance,2))
print('Standard Deviation =', round(stdeviation,2))

OUTPUT:

Assigned Team's Relative Skills in 2013 to 2015
-----------------------------------------------------------
Mean = 427990.67
Median = 1751.23
Variance = 2651.55
Standard Deviation = 51.49
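Both "Mean" values printed in Steps 6 and 7 are hundreds of times larger than the corresponding medians, which is what a column total looks like rather than an average. A small sketch with made-up ratings (the four numbers below are hypothetical, not taken from the data set) shows the difference between the two pandas calls:

```python
import pandas as pd

# Hypothetical Elo ratings, used only to illustrate the two methods.
ratings = pd.Series([1400.0, 1450.0, 1500.0, 1550.0])

total = ratings.sum()     # adds every rating together
average = ratings.mean()  # total divided by the number of ratings

print("Sum  =", total)
print("Mean =", average)
```

For Elo ratings, a mean should land close to the median and well inside the range of the individual values.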

Step 8: Confidence Intervals for the Average Relative Skill of All Teams in Your Team's Years

print("Confidence Interval for Average Relative Skill in the years 2013 to 2015")
print("------------------------------------------------------------------------------------------------------------")

# Mean relative skill of all teams from the years 2013-2015
mean = your_years_leagues_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams from the years 2013-2015
stdev = your_years_leagues_df['elo_n'].std()
n = len(your_years_leagues_df)

# Confidence interval
# ---- TODO: make your edits here ----
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")

print(" ")
print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015")
print("----------------------------------------------------------------------------------------------------------------------------------------------------------")

mean_elo_your_team = your_team_df['elo_n'].mean()

choice1 = st.norm.sf(mean_elo_your_team, mean, stdev)
choice2 = st.norm.cdf(mean_elo_your_team, mean, stdev)

# Pick the correct answer.
print("Which of the two choices is correct?")
print("Choice 1 =", round(choice1,4))
print("Choice 2 =", round(choice2,4))

OUTPUT:

Confidence Interval for Average Relative Skill in the years 2013 to 2015
------------------------------------------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = (1502.0236894390478, 1507.1824625533618)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 2013 to 2015 = ( 1502.02 , 1507.18 )

**************************** NEED HELP HERE ***************************
Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 2013 to 2015
----------------------------------------------------------------------------------------------------------------------------------------------------------
Which of the two choices is correct? **************** NEED HELP HERE **************
Choice 1 = 0.6639
Choice 2 = 0.336
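For reference, the two SciPy calls behind Choice 1 and Choice 2 are complements of each other: `st.norm.cdf(x, loc, scale)` returns P(X <= x) and `st.norm.sf(x, loc, scale)` returns P(X > x) = 1 - cdf. The sketch below uses hypothetical league and team values (not the ones from the data set) purely to show which call matches "less than" and which matches "greater than".

```python
import scipy.stats as st

# Hypothetical values, chosen only for illustration.
league_mean = 1500.0  # mean relative skill across all teams
league_sd = 100.0     # standard deviation across all teams
team_mean = 1450.0    # a team rated below the league mean

# cdf: probability of a value at or below team_mean.
p_less = st.norm.cdf(team_mean, league_mean, league_sd)

# sf (survival function): probability of a value above team_mean.
p_greater = st.norm.sf(team_mean, league_mean, league_sd)

print("P(skill < team)  =", round(p_less, 4))
print("P(skill > team)  =", round(p_greater, 4))
print("Sum of the two   =", round(p_less + p_greater, 4))
```

When the team's mean is below the league mean, the "less than" probability comes out under 0.5, and the two values always add to 1. In the Step 8 script, choice1 is the `sf` call and choice2 is the `cdf` call.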

Step 9 - Confidence Intervals for the Average Relative Skill of All Teams in the Assigned Team's Years

# Write your code in this code block section
print("Confidence Intervals for the Average Relative Skill of All Teams in the Years 1996 to 1998")
print("-----------------------------------------------------------------------------------------------------------")

# Mean relative skill of all teams in the years 1996 to 1998
mean = assigned_years_league_df['elo_n'].mean()

# Standard deviation of the relative skill of all teams in the years 1996 to 1998.
# Assigning .std() to mean, or reusing stdev from Step 8, centers the interval
# on the standard deviation (about 129), as in the output below.
stdev = assigned_years_league_df['elo_n'].std()
n = len(assigned_years_league_df)

# Confidence interval
stderr = stdev/(n ** 0.5)
conf_int_95 = st.norm.interval(0.95, mean, stderr)

print("95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 =", conf_int_95)
print("95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (", round(conf_int_95[0], 2),",", round(conf_int_95[1], 2),")")

print(" ")
print("Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of the assigned team in the years 1996 to 1998")
print("----------------------------------------------------------------------------------------------------------------------------------------------------------")

mean_elo_assigned_team = assigned_team_df['elo_n'].mean()
answer1 = st.norm.cdf(mean_elo_assigned_team, mean, stdev)

# Pick the correct answer.
print("Answer = ", round(answer1,4))

OUTPUT:

Confidence Intervals for the Average Relative Skill of All Teams in the Years 1996 to 1998
-----------------------------------------------------------------------------------------------------------
95% confidence interval (unrounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = (126.4431245655568, 131.68937720511104)
95% confidence interval (rounded) for Average Relative Skill (ELO) in the years 1996 to 1998 = ( 126.44 , 131.69 )

Probability a team has Average Relative Skill LESS than the Average Relative Skill (ELO) of your team in the years 1996 to 1998
----------------------------------------------------------------------------------------------------------------------------------------------------------
Answer = 1.0
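The Step 9 interval sits near 129 while the Step 8 interval sits near 1500, which suggests the two were not computed from the same kind of statistic. One way to make the two steps directly comparable is to compute the mean, standard deviation, and sample size inside a single helper so that nothing carries over between steps. This is only a sketch: `mean_confidence_interval` is a hypothetical helper name, and the two arrays are simulated stand-ins for the league-wide `elo_n` columns.

```python
import numpy as np
import scipy.stats as st

def mean_confidence_interval(values, level=0.95):
    """Normal-based confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    stdev = values.std(ddof=1)   # sample standard deviation (pandas' default)
    stderr = stdev / np.sqrt(n)  # standard error of the mean
    return st.norm.interval(level, loc=mean, scale=stderr)

# Simulated ratings for two ranges of years (hypothetical data).
rng = np.random.default_rng(0)
era_a = rng.normal(1500, 110, size=5000)
era_b = rng.normal(1500, 130, size=5000)

ci_a = mean_confidence_interval(era_a)
ci_b = mean_confidence_interval(era_b)

print("Era A 95% CI:", tuple(round(x, 2) for x in ci_a))
print("Era B 95% CI:", tuple(round(x, 2) for x in ci_b))
```

With the real data, the same helper could be called as `mean_confidence_interval(your_years_leagues_df['elo_n'])` and `mean_confidence_interval(assigned_years_league_df['elo_n'])`, and the two intervals compared by their centers and widths.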