Question:

Merrick Stevens is a sports analyst working for ACE Sports Management, a sports agency that represents over 200 athletes. Merrick is tasked with analyzing sports-related data and developing a predictive model for the National Basketball Association (NBA). He uses the NBA data set that contains information on 30 competing NBA teams and 455 players. The player statistics are for several seasons as well as for their career. Because a player’s salary is based on his performance over multiple seasons, Merrick decides to only look at the career regular season data rather than data of a particular season.

Given the large number of predictor variables that may explain a player’s salary, Merrick decides to investigate whether principal component analysis may be advantageous as a first step in model building.


An NBA player’s salary is determined by a wide range of variables. Prior to constructing a model that can be used to predict an NBA player’s salary, data are collected for 23 possible predictor variables for 455 NBA players. These variables include physicality variables, such as a player’s age, height, and weight, as well as performance variables, such as the number of games played, the number of baskets made, the number of three pointers made, and so forth. 

High correlations between many of the predictor variables suggest that information redundancy exists in the data. In order to eliminate potential multicollinearity problems and improve model stability in the resulting salary model, dimension reduction using principal component analysis (PCA) is performed on the 23 predictor variables.
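A quick way to screen for this kind of redundancy is to list the predictor pairs whose correlation is large in absolute value. The following is a minimal sketch, assuming NumPy is available. The data are synthetic stand-ins rather than the NBA data set, and the function name `high_correlation_pairs` and the 0.8 cutoff are illustrative choices that do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a predictor matrix (NOT the NBA data):
# 455 rows, with columns 0 and 1 deliberately made nearly redundant.
n = 455
base = rng.normal(size=n)
X = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=n),  # nearly a copy of column 0
    rng.normal(size=n),               # unrelated column
])

def high_correlation_pairs(X, threshold=0.8):
    """Return (i, j, r) for column pairs whose |correlation| exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables
    pairs = []
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

pairs = high_correlation_pairs(X)
for i, j, r in pairs:
    print(f"columns {i} and {j}: r = {r:.3f}")
```

Many flagged pairs among the predictors would be one indication that a dimension reduction step such as PCA is worth trying.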

Prior to performing the PCA, the data are standardized in order to remove the impact of data scales. The PCA is not restricted with respect to the number of principal components to produce; thus, given that the analysis used 23 predictor variables, 23 principal components are estimated. Table 8.18 shows a portion of the PCA output with respect to weights and variances. The first principal component accounts for 44.1218% of the total variance, and Points (average points per game) has the largest weight for the first principal component. Referencing the Cumulative Variance % row in Table 8.18, it is found that the first seven principal components account for almost 90% of the total variance in the original data.
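The steps in this paragraph (standardize, estimate as many components as there are predictors, then read off the variance percentages) can be sketched as follows. This is a minimal illustration under stated assumptions, not the software output the text refers to: it uses NumPy's eigendecomposition on synthetic data with 6 columns instead of the 23 NBA predictors, and the function name `pca_standardized` is hypothetical.

```python
import numpy as np

def pca_standardized(X):
    """PCA on z-scored columns.

    Returns the weights (one column per component), the variance % per
    component, the cumulative variance %, and the component scores.
    """
    # Standardize each column to mean 0 and sample standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance of standardized data is the correlation matrix of X.
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    var_pct = 100 * eigvals / eigvals.sum()
    cum_pct = np.cumsum(var_pct)
    scores = Z @ eigvecs
    return eigvecs, var_pct, cum_pct, scores

# Synthetic data (NOT the NBA data): 455 rows, 6 columns driven by
# 2 latent factors plus noise, so the columns are correlated.
rng = np.random.default_rng(1)
latent = rng.normal(size=(455, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(455, 6))

weights, var_pct, cum_pct, scores = pca_standardized(X)

# Smallest number of components whose cumulative variance reaches 90%.
k = int(np.searchsorted(cum_pct, 90.0) + 1)
print("variance %:", np.round(var_pct, 2))
print("cumulative %:", np.round(cum_pct, 2))
print("components needed for 90%:", k)
```

With 6 input columns the sketch produces 6 components, mirroring how 23 predictors yield 23 components in the text. The 90% cutoff is a common rule of thumb rather than a fixed requirement.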

TABLE 8.18 Principal Component Weights and Variances 


In future analyses, it is advisable to use the first seven principal components as the predictor variables for building models that predict an NBA player’s salary. A portion of the principal component scores for the first seven principal components is displayed in Table 8.19. By using the seven principal components instead of the original 23 predictor variables, information redundancy has been removed. In addition, a large number of highly correlated predictor variables have been replaced with a smaller set of uncorrelated principal components that retain almost 90% of the information in the original data.
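To illustrate why the component scores are convenient predictors, the sketch below checks that the scores are uncorrelated and then fits an ordinary least squares model on the first few of them. It is a hedged example on synthetic data, assuming NumPy. The response `y`, the choice `k = 2`, and the use of `np.linalg.lstsq` are illustrative assumptions and are not the salary model from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic, hypothetical data (NOT the NBA data): 455 rows,
# 5 correlated predictors and a response driven by the same latent factors.
latent = rng.normal(size=(455, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.3 * rng.normal(size=(455, 5))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.5 * rng.normal(size=455)

# Standardize, then project onto the eigenvectors of the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]  # largest variance first
scores = Z @ eigvecs[:, order]

# Off-diagonal covariances of the scores vanish up to rounding error,
# which is what "uncorrelated principal components" means in practice.
score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
max_off_diag = float(np.abs(off_diag).max())

# Regress the response on an intercept plus the first k score columns.
k = 2  # arbitrary for this sketch; the text retains seven components
A = np.column_stack([np.ones(len(y)), scores[:, :k]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("largest off-diagonal covariance:", max_off_diag)
print("intercept and slopes:", np.round(coef, 3))
```

Because the score columns are uncorrelated, the multicollinearity concern raised for the original predictors does not arise among them.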

TABLE 8.19 Principal Component Scores

 


Related Book:

Business Analytics Communicating With Numbers

ISBN: 9781260785005

1st Edition

Authors: Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, Leida Chen
