Hitters data set contains information on Major League Baseball from the 1986 and 198:! seasons. Among other information, it contains 1983? annual salary of baseball players [in thousands of dollars) on opening dayr of the season. Our goal is the predict salaries of the players. This data set is available from the lSLR package. 1. Remove the observations with unknown salary information. How many observations were removed in this process? 2. Generate logtransform the salaries. Can you justifythis transformation? 3. Create a scatterplot with Hits on the yaxis and Years on the xoris using all the observations. Color code the observations using the log Salary variable. What patterns do you notice on this chart, if any? 4. Run a linear regression model of Log Salary on all the predictors using the entire dataset. Use regsubsetsO function to perform best subset selection tom the regression model. Identify the best model using BIC. 1Which predictor variables are included in this {best} model? 5. Now create a training data set consisting of So percent of the observations, and a test data set consisting of the remaining observations. 6. Generate a regression tree of log Salary using only Years and Hits variables from the training data set. 1Which players are likely to receive highest salaries according to this model? Write down the rule and elaborate on it. 3?. Now create a regession tree using all the variables in the training data set. Perform boosting on the training set with 1,000 trees for a range of values of the shrinkage parameter it. Produce a plot with different shrinkage values on the K- anis and the corresponding training set MSE on the yaxis. 8. Produce a plot with different shrinkage values on the Kariis and the corresponding test set MSE on the yaxis. 9. 1Which variables appear to be the most important predictors in the boosted model? 10. Now apply bagging to the training set. 1What is the test set MSE for this approach