Question
In Questions 7 and 8, you'll look again at female heights from GaltonFamilies.
Define female_heights, a set of mother and daughter heights sampled from GaltonFamilies, as follows:
# Run only the set.seed() line that matches your R version:
# set.seed(1989) # if you are using R 3.5 or earlier
set.seed(1989, sample.kind = "Rounding") # if you are using R 3.6 or later
library(tidyverse) # provides %>%, filter(), group_by(), sample_n(), select(), rename()
library(HistData)
data("GaltonFamilies")
options(digits = 3) # report 3 significant digits
female_heights <- GaltonFamilies %>%
  filter(gender == "female") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(mother, childHeight) %>%
  rename(daughter = childHeight)
Question 7 - Fit a linear regression model predicting the mothers' heights using daughters' heights. What is the slope of the model? What is the intercept of the model?
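A minimal sketch for Question 7, assuming `female_heights` has already been created by the setup code above. Because mothers' heights are the outcome, `mother` goes on the left of `~`. The last lines cross-check the fitted coefficients against the usual least-squares formulas (slope = r * sd(y) / sd(x), intercept = mean(y) - slope * mean(x)); the names `r`, `slope` and `intercept` are illustrative only.

```r
# Assumes female_heights exists from the setup code above.
# Outcome (mother) on the left of ~, predictor (daughter) on the right.
fit <- lm(mother ~ daughter, data = female_heights)
coef(fit) # (Intercept) and the slope on daughter

# Cross-check with the closed-form least-squares estimates
r <- with(female_heights, cor(mother, daughter))
slope <- r * sd(female_heights$mother) / sd(female_heights$daughter)
intercept <- mean(female_heights$mother) - slope * mean(female_heights$daughter)
c(intercept = intercept, slope = slope)
```

Both printouts should agree up to floating-point rounding, since `lm()` computes the same least-squares line.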
Question 8 - Predict mothers' heights using the model. What is the predicted height of the first mother in the dataset? What is the actual height of the first mother in the dataset?
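One way to approach Question 8, sketched under the assumption that `female_heights` is defined as above; the block refits the Question 7 model so it runs on its own. Called without `newdata`, `predict()` on an `lm` fit returns one fitted value per row used in the fit, in the data's row order, so element 1 corresponds to the first mother.

```r
# Assumes female_heights exists from the setup code above; refit the same model
fit <- lm(mother ~ daughter, data = female_heights)

# Fitted (predicted) mothers' heights, one per row of female_heights
predicted <- predict(fit)

predicted[1]             # predicted height of the first mother
female_heights$mother[1] # actual height of the first mother
```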
We have shown that BB and singles have similar predictive power for scoring runs. Another way to compare the usefulness of these baseball metrics is by assessing how stable they are across the years. Because we have to pick players based on their previous performances, we will prefer metrics that are more stable. In these exercises, we will compare the stability of singles and BBs.
Before we get started, we want to generate two tables: one for 2002 and another for the average of the 1999-2001 seasons. We want to define per-plate-appearance statistics, keeping only players with 100 or more plate appearances (computed here as AB + BB). Here is how we create the 2002 table:
library(Lahman)
bat_02 <- Batting %>%
  filter(yearID == 2002) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
  filter(pa >= 100) %>%
  select(playerID, singles, bb)
Question 9 - Now compute a similar table but with rates computed over 1999-2001. Keep only rows from 1999-2001 where players have 100 or more plate appearances, calculate each player's single rate and BB rate per season, then calculate the average single rate (mean_singles) and average BB rate (mean_bb) per player over those three seasons.
How many players had a single rate (mean_singles) greater than 0.2 per plate appearance over 1999-2001?
How many players had a BB rate (mean_bb) greater than 0.2 per plate appearance over 1999-2001?
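A sketch of the Question 9 table that follows the steps the question lists: filter to 1999-2001, compute per-season rates, keep rows with `pa >= 100`, then average per player. It assumes the tidyverse and Lahman packages are installed; `bat_99_01`, `n_singles` and `n_bb` are illustrative names. Note that Lahman's `Batting` table has one row per player, season and stint, so a player who changed teams mid-season can contribute more than one row to the average.

```r
library(tidyverse)
library(Lahman)

# Per-season rates for 1999-2001, same definitions as bat_02,
# keeping rows with at least 100 plate appearances, then averaged per player
bat_99_01 <- Batting %>%
  filter(yearID %in% 1999:2001) %>%
  mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
  filter(pa >= 100) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))

n_singles <- sum(bat_99_01$mean_singles > 0.2) # players with mean_singles > 0.2
n_bb <- sum(bat_99_01$mean_bb > 0.2)           # players with mean_bb > 0.2
n_singles
n_bb
```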
Question 10 - Use inner_join() to combine the bat_02 table with the table of 1999-2001 rate averages you created in the previous question. What is the correlation between 2002 singles rates and 1999-2001 average singles rates? What is the correlation between 2002 BB rates and 1999-2001 average BB rates?
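For Question 10, a self-contained sketch that rebuilds both tables and joins them on `playerID`, assuming the tidyverse and Lahman packages are installed. The helper `rates()` is hypothetical, introduced only to avoid writing the rate definitions twice; it applies the same formulas and `pa >= 100` filter as the `bat_02` code above. `inner_join()` keeps only players present in both tables.

```r
library(tidyverse)
library(Lahman)

# Hypothetical helper: per-plate-appearance rates for the given seasons,
# with the same definitions and pa >= 100 filter as the bat_02 code
rates <- function(years) {
  Batting %>%
    filter(yearID %in% years) %>%
    mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
    filter(pa >= 100)
}

bat_02 <- rates(2002) %>% select(playerID, singles, bb)
bat_99_01 <- rates(1999:2001) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))

# Keep only players who appear in both tables
dat <- inner_join(bat_02, bat_99_01, by = "playerID")

r_singles <- cor(dat$singles, dat$mean_singles) # 2002 vs 1999-2001 singles rates
r_bb <- cor(dat$bb, dat$mean_bb)                # 2002 vs 1999-2001 BB rates
r_singles
r_bb
```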
Question 11 - Make scatterplots of mean_singles versus singles and mean_bb versus bb. Is either of these distributions bivariate normal?
Neither distribution is bivariate normal.
singles and mean_singles are bivariate normal, but bb and mean_bb are not.
bb and mean_bb are bivariate normal, but singles and mean_singles are not.
Both distributions are bivariate normal.
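A possible way to draw the Question 11 scatterplots with ggplot2, assuming the tidyverse and Lahman packages are installed. The block rebuilds the joined table itself through a hypothetical helper `rates()`, which reproduces the rate definitions and `pa >= 100` filter from the `bat_02` code above. As a rough visual guide, a bivariate normal pair tends to appear as an elliptical, football-shaped cloud without obvious curvature or separate clusters.

```r
library(tidyverse)
library(Lahman)

# Hypothetical helper: same rate definitions and pa >= 100 filter as bat_02
rates <- function(years) {
  Batting %>%
    filter(yearID %in% years) %>%
    mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
    filter(pa >= 100)
}

bat_02 <- rates(2002) %>% select(playerID, singles, bb)
bat_99_01 <- rates(1999:2001) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))
dat <- inner_join(bat_02, bat_99_01, by = "playerID")

# One scatterplot per metric: 2002 rate vs 1999-2001 average
p_singles <- ggplot(dat, aes(singles, mean_singles)) + geom_point(alpha = 0.5)
p_bb <- ggplot(dat, aes(bb, mean_bb)) + geom_point(alpha = 0.5)

# Explicit print() so the plots are drawn even when the script is source()d
print(p_singles)
print(p_bb)
```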
Question 12 - Fit a linear model to predict 2002 singles given 1999-2001 mean_singles. What is the coefficient of mean_singles, the slope of the fit? Fit a linear model to predict 2002 bb given 1999-2001 mean_bb. What is the coefficient of mean_bb, the slope of the fit?
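A sketch for Question 12, assuming the tidyverse and Lahman packages are installed; `rates()` is a hypothetical helper that reproduces the rate definitions and `pa >= 100` filter from the `bat_02` code above. Each model puts the 2002 rate on the left of `~` and the 1999-2001 average on the right, so the coefficient on the predictor is the requested slope.

```r
library(tidyverse)
library(Lahman)

# Hypothetical helper: same rate definitions and pa >= 100 filter as bat_02
rates <- function(years) {
  Batting %>%
    filter(yearID %in% years) %>%
    mutate(pa = AB + BB, singles = (H - X2B - X3B - HR) / pa, bb = BB / pa) %>%
    filter(pa >= 100)
}

bat_02 <- rates(2002) %>% select(playerID, singles, bb)
bat_99_01 <- rates(1999:2001) %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles), mean_bb = mean(bb))
dat <- inner_join(bat_02, bat_99_01, by = "playerID")

# 2002 rate (outcome) regressed on the 1999-2001 average (predictor)
fit_singles <- lm(singles ~ mean_singles, data = dat)
fit_bb <- lm(bb ~ mean_bb, data = dat)

coef(fit_singles)["mean_singles"] # slope for singles
coef(fit_bb)["mean_bb"]           # slope for BB
```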