Question
The goal of this lab is to build three different regression models to predict the number of wins of a Major League Baseball team.
Use the following code to load in the Teams dataset from the Lahman database. Recall that you can query the help file for a data set by running ?Teams at the console.
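The loading code the prompt refers to is not reproduced here. A minimal sketch of what it likely looks like, assuming the Lahman R package is installed (the `install.packages` line is only needed once and is left commented out):

```r
# Assumption: the lab's original loading code is not shown; this is one common way to do it.
# install.packages("Lahman")  # run once if the package is not yet installed
library(Lahman)

# Teams ships with the Lahman package; data() makes it explicit in the workspace.
data(Teams)

# Quick look at the columns used later in the lab: year, wins, runs, runs allowed.
head(Teams[, c("yearID", "teamID", "W", "R", "RA")])
```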
1. Subset the Teams data set to only include years from 2000 to present day (this is the data set that you'll use for the remainder of this lab). What are the dimensions of this filtered data set?
2. Plot the distribution of wins. Describe the shape of the distribution and compare it to your speculations from part 1 of the lab.
3. Plot the relationship between runs and wins. Describe the relationship (form, direction, strength of association, presence of outliers) and compare it to your speculations from part 1 of the lab.
4. Plot the relationship between runs allowed and wins. Describe the relationship. How does it compare to the relationship between runs and wins?
5. Fit a simple linear model to predict wins by runs and call it model_1. Write out the equation for the linear model (using the estimated coefficients) and interpret the R^2 in the context of the problem in at least one sentence.
6. What is the average number of season runs and wins? Based on the previous model, how many games would you predict a team that scored the average number of runs would win?
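The six steps above can be sketched in base R roughly as follows. This is an illustrative outline, not the lab's official solution: the name `teams_2000` is hypothetical, and it assumes the Lahman columns `yearID`, `W` (wins), `R` (runs), and `RA` (runs allowed). Actual dimensions and coefficients depend on the installed Lahman version, so none are quoted here.

```r
library(Lahman)

# Step 1: keep seasons from 2000 onward (teams_2000 is a hypothetical name).
teams_2000 <- subset(Teams, yearID >= 2000)
dim(teams_2000)  # rows and columns of the filtered data

# Step 2: distribution of wins.
hist(teams_2000$W, main = "Wins per season", xlab = "Wins")

# Steps 3 and 4: wins against runs scored and against runs allowed.
plot(W ~ R, data = teams_2000, xlab = "Runs", ylab = "Wins")
plot(W ~ RA, data = teams_2000, xlab = "Runs allowed", ylab = "Wins")

# Step 5: simple linear regression of wins on runs.
model_1 <- lm(W ~ R, data = teams_2000)
coef(model_1)               # intercept and slope for the fitted equation
summary(model_1)$r.squared  # share of the variation in wins explained by runs

# Step 6: averages, and the predicted wins for a team with average runs.
avg_runs <- mean(teams_2000$R)
avg_wins <- mean(teams_2000$W)
pred_at_avg <- predict(model_1, newdata = data.frame(R = avg_runs))
pred_at_avg
```

A useful check for step 6: a least-squares line fitted with an intercept always passes through the point of means, so the prediction at the average number of runs should equal the average number of wins.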