Question
Assignment 2

Last name: Type down your lastname here
First name: Type down your firstname here
Student ID: 0000000
Course section: STA302H1F-Summer 2017
Due Date: June 3, 2017, 23:00

Q1 (20 pts) - Correlation and SLR.

Q1-(a) (6 pts): Find the correlation between the percentage of field goals made and the percentage of field goals made in the previous year. Is this estimated correlation significantly different from zero? Explain how this result supports the claim in The New York Times article.

Answer:

    a2 = read.csv("/Users/chauy/desktop/FieldGoals03to06.csv",header=T)
    #str(a2) # check the type of each column (variable) in the data set
    #head(a2,10) # have a look at the first 10 data lines
    # Write your R code in the following

more ....

Q1-(b) (8 pts): Carry out a simple linear regression using the variables percentage of field goals made this year and percentage of field goals made in the previous year.

Answer:

List of table | results
:----------------------------------|:-----------------:|
$R^2$ | ?
slope, $b_1$ | ?
estimate of $\sigma^2$ | ?
P-value for $H_0: \beta_0=0$ | ?
P-value for $H_0: \beta_1=0$ | ?

    # code

more ....

Q1-(c) (6 pts): Give a 95% confidence interval for the slope of the regression line in Q1-(b). Explain how the confidence interval is consistent with the conclusions of Q1-(a) and Q1-(b).

Answer:

    # code

more ....

Q2 (5 pts)

Conclusions from regression analysis are valid only if the right model was fit to the data. Why is the regression model fit in Q1-(b) not an appropriate model? In particular, you should consider how it violates the Gauss-Markov conditions. You do not need to look at plots of the residuals for this question. Instead, comment on the Gauss-Markov conditions in the context of the data being considered.

Answer:

Q3 (10 pts)

Q3-(a): In 2003, Mike Vanderjagt had the highest percentage of field goals made (100%) and Jay Feely had the lowest percentage (70.3%). For each of these two players, carry out a regression to examine the relationship between the percentage of field goals made in a year and the percentage of field goals made in the previous year. (Note that this is 2 regressions, each using only 4 data points.) What do you conclude?

Answer:

Player | Estimate of slope ($b_1$) | p-value for test with $H_0: \beta_1=0$ | estimate $\sigma^2(b_1)$
:--------------|:-----------------:|:----------------:|:--------------:|
Mike Vanderjagt | ? | ? | ?
Jay Feely | ? | ? | ?

    # Example: run a SLR for name="Jay Feely"
    DA =lm(FGt~FGtM1, data=a2[a2$Name=="Jay Feely",])
    summary(DA)

    ##
    ## Call:
    ## lm(formula = FGt ~ FGtM1, data = a2[a2$Name == "Jay Feely", ])
    ##
    ## Residuals:
    ##      17      18      19      20
    ## -6.1994 -0.9046  6.3171  0.7869
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  97.9850    51.5890   1.899    0.198
    ## FGtM1        -0.2686     0.6606  -0.407    0.724
    ##
    ## Residual standard error: 6.316 on 2 degrees of freedom
    ## Multiple R-squared:  0.07634, Adjusted R-squared:  -0.3855
    ## F-statistic: 0.1653 on 1 and 2 DF,  p-value: 0.7237

    # For MV = Mike Vanderjagt
    # For JF = Jay Feely

Q3-(b): We can test for a difference between the slopes of the regressions for Mike Vanderjagt and Jay Feely using a t-test, similar to the two-sample t-test for the difference between two means. We can estimate the difference in their slopes by $b_{1,MV} - b_{1,JF}$, where $b_{1,MV}$ and $b_{1,JF}$ are the estimated slopes for Mike Vanderjagt and Jay Feely, respectively. You also need to find an estimate of the standard deviation of $b_{1,MV} - b_{1,JF}$. Under the regression model assumptions and assuming that there is no difference in the slopes, the estimate of the difference in slopes divided by the estimate of the standard deviation of the difference will have approximately a t-distribution with 2 degrees of freedom (using Satterthwaite's approximation). What do you conclude from this t-test? (To estimate the p-value, you can use a t-table.)

Answer:

Q4 (10 pts)

R output from a multiple regression is given on the next page. This regression uses all the data, but fits 19 separate lines, one for each player. In this regression, the lines were forced to be parallel, so the coefficient of FGtM1, the percentage of field goals made in the previous year, is the same for all players.

Q4-(a): (5 points) Find the p-value for the test with the null hypothesis that the coefficient of FGtM1 is equal to 0. What do you conclude about the relationship between the percentage of field goals made this year and the percentage of field goals made the previous year?

Answer:

Q4-(b): (5 points) Explain, in words, why the test considered in part Q4-(a) is more powerful than the tests about the slopes considered in Q3-(a).

Answer:

---
title: Assignment 2
author: |
  | Last name: Type down your lastname here
  | First name: Type down your firstname here
  | Student ID: 0000000
  | Course section: STA302H1F-Summer 2017
date: 'Due Date: June 3, 2017, 23:00'
output: pdf_document
header-includes: \usepackage{color,amsmath}
fontsize: 10pt
---

# \textcolor{red}{ Q1 (20 pts) - Correlation and SLR.}

\textcolor{blue}{Q1-(a) (6 pts): Find the correlation between the percentage of field goals made and the percentage of field goals made in the previous year. Is this estimated correlation significantly different from zero? Explain how this result supports the claim in The New York Times article.}

\textbf{Answer:}

```{r,echo=T, eval=T,cache=T, message=F,warning=F}
a2 = read.csv("/Users/chauy/desktop/FieldGoals03to06.csv",header=T)
#str(a2) # check the type of each column (variable) in the data set
#head(a2,10) # have a look at the first 10 data lines
# Write your R code in the following
```

more ....

\textcolor{blue}{Q1-(b) (8 pts): Carry out a simple linear regression using the variables percentage of field goals made this year and percentage of field goals made in the previous year.}

\textbf{Answer:}

List of table | results
:----------------------------------|:-----------------:|
$R^2$ | ?
slope, $b_1$ | ?
estimate of $\sigma^2$ | ?
P-value for $H_0: \beta_0=0$ | ?
P-value for $H_0: \beta_1=0$ | ?

```{r,echo=T, eval=T,cache=T, message=F,warning=F}
# code
```

more ....

\textcolor{blue}{Q1-(c) (6 pts): Give a 95\% confidence interval for the slope of the regression line in Q1-(b). Explain how the confidence interval is consistent with the conclusions of Q1-(a) and Q1-(b).}

\textbf{Answer:}

```{r,echo=T, eval=T,cache=T, message=F,warning=F}
# code
```

more ....

\newpage

# \textcolor{red}{ Q2 (5 pts)}

\textcolor{blue}{ Conclusions from regression analysis are valid only if the right model was fit to the data. Why is the regression model fit in Q1-(b) not an appropriate model? In particular, you should consider how it violates the Gauss-Markov conditions. You do not need to look at plots of the residuals for this question. Instead, comment on the Gauss-Markov conditions in the context of the data being considered. }

\textbf{Answer:}

\newpage

# \textcolor{red}{ Q3 (10 pts)}

\textcolor{blue}{Q3-(a): In 2003, Mike Vanderjagt had the highest percentage of field goals made (100\%) and Jay Feely had the lowest percentage (70.3\%). For each of these two players, carry out a regression to examine the relationship between the percentage of field goals made in a year and the percentage of field goals made in the previous year. (Note that this is 2 regressions, each using only 4 data points.) What do you conclude? }

\textbf{Answer:}

Player | Estimate of slope ($b_1$) | p-value for test with $H_0: \beta_1=0$ | estimate $\sigma^2(b_1)$
:--------------|:-----------------:|:----------------:|:--------------:|
Mike Vanderjagt | ? | ? | ?
Jay Feely | ? | ? | ?

```{r,echo=T, eval=T,cache=T, message=F,warning=F}
# Example: run a SLR for name="Jay Feely"
DA =lm(FGt~FGtM1, data=a2[a2$Name=="Jay Feely",])
summary(DA)

# For MV = Mike Vanderjagt

# For JF = Jay Feely

```

\textcolor{blue}{Q3-(b): We can test for a difference between the slopes of the regressions for Mike Vanderjagt and Jay Feely using a t-test, similar to the two-sample t-test for the difference between two means. We can estimate the difference in their slopes by $b_{1,MV} - b_{1,JF}$, where $b_{1,MV}$ and $b_{1,JF}$ are the estimated slopes for Mike Vanderjagt and Jay Feely, respectively. You also need to find an estimate of the standard deviation of $b_{1,MV} - b_{1,JF}$. Under the regression model assumptions and assuming that there is no difference in the slopes, the estimate of the difference in slopes divided by the estimate of the standard deviation of the difference will have approximately a t-distribution with 2 degrees of freedom (using Satterthwaite's approximation). What do you conclude from this t-test? (To estimate the p-value, you can use a t-table.) }

\textbf{Answer:}

\newpage

# \textcolor{red}{ Q4 (10 pts)}

R output from a multiple regression is given on the next page. This regression uses all the data, but fits 19 separate lines, one for each player. In this regression, the lines were forced to be parallel, so the coefficient of FGtM1, the percentage of field goals made in the previous year, is the same for all players.

\textcolor{blue}{ Q4-(a): (5 points) Find the p-value for the test with the null hypothesis that the coefficient of FGtM1 is equal to 0. What do you conclude about the relationship between the percentage of field goals made this year and the percentage of field goals made the previous year? }

\textbf{Answer:}

\textcolor{blue}{ Q4-(b): (5 points) Explain, in words, why the test considered in part Q4-(a) is more powerful than the tests about the slopes considered in Q3-(a).
}

\textbf{Answer:}

Name | Yeart | Teamt | FGAt | FGt | Team(t-1) | FGAtM1 | FGtM1 | FGAtM2 | FGtM2
:----|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
Adam Vinatieri | 2003 | NE | 34 | 73.5 | NE | 30 | 90 | 0 | 0
Adam Vinatieri | 2004 | NE | 33 | 93.9 | NE | 34 | 73.5 | 30 | 90
Adam Vinatieri | 2005 | NE | 25 | 80 | NE | 33 | 93.9 | 34 | 73.5
Adam Vinatieri | 2006 | IND | 19 | 89.4 | NE | 25 | 80 | 33 | 93.9
David Akers | 2003 | PHI | 29 | 82.7 | PHI | 34 | 88.2 | 0 | 0
David Akers | 2004 | PHI | 32 | 84.3 | PHI | 29 | 82.7 | 34 | 88.2
David Akers | 2005 | PHI | 22 | 72.7 | PHI | 32 | 84.3 | 29 | 82.7
David Akers | 2006 | PHI | 12 | 83.3 | PHI | 22 | 72.7 | 32 | 84.3
Jason Elam | 2003 | DEN | 31 | 87 | DEN | 36 | 72.2 | 0 | 0
Jason Elam | 2004 | DEN | 34 | 85.2 | DEN | 31 | 87 | 36 | 72.2
Jason Elam | 2005 | DEN | 32 | 75 | DEN | 34 | 85.2 | 31 | 87
Jason Elam | 2006 | DEN | 15 | 86.6 | DEN | 32 | 75 | 34 | 85.2
Jason Hanson | 2003 | DET | 23 | 95.6 | DET | 28 | 82.1 | 0 | 0
Jason Hanson | 2004 | DET | 28 | 85.7 | DET | 23 | 95.6 | 28 | 82.1
Jason Hanson | 2005 | DET | 24 | 79.1 | DET | 28 | 85.7 | 23 | 95.6
Jason Hanson | 2006 | DET | 17 | 82.3 | DET | 24 | 79.1 | 28 | 85.7
Jay Feely | 2003 | ATL | 27 | 70.3 | ATL | 40 | 80 | 0 | 0
Jay Feely | 2004 | ATL | 23 | 78.2 | ATL | 27 | 70.3 | 40 | 80
Jay Feely | 2005 | NYG | 42 | 83.3 | ATL | 23 | 78.2 | 27 | 70.3
Jay Feely | 2006 | NYG | 17 | 76.4 | NYG | 42 | 83.3 | 23 | 78.2
Jeff Reed | 2003 | PIT | 32 | 71.8 | PIT | 19 | 89.4 | 0 | 0
Jeff Reed | 2004 | PIT | 33 | 84.8 | PIT | 32 | 71.8 | 19 | 89.4
Jeff Reed | 2005 | PIT | 29 | 82.7 | PIT | 33 | 84.8 | 32 | 71.8
Jeff Reed | 2006 | PIT | 16 | 68.7 | PIT | 29 | 82.7 | 33 | 84.8
Jeff Wilkins | 2003 | STL | 42 | 92.8 | STL | 25 | 76 | 0 | 0
Jeff Wilkins | 2004 | STL | 24 | 79.1 | STL | 42 | 92.8 | 25 | 76
Jeff Wilkins | 2005 | STL | 31 | 87 | STL | 24 | 79.1 | 42 | 92.8
Jeff Wilkins | 2006 | STL | 26 | 88.4 | STL | 31 | 87 | 24 | 79.1
John Carney | 2003 | NO | 30 | 73.3 | NO | 35 | 88.5 | 0 | 0
John Carney | 2004 | NO | 27 | 81.4 | NO | 30 | 73.3 | 35 | 88.5
John Carney | 2005 | NO | 32 | 78.1 | NO | 27 | 81.4 | 30 | 73.3
John Carney | 2006 | NO | 17 | 88.2 | NO | 32 | 78.1 | 27 | 81.4
John Hall | 2003 | WAS | 33 | 75.7 | NYJ | 31 | 77.4 | 0 | 0
John Hall | 2004 | WAS | 11 | 72.7 | WAS | 33 | 75.7 | 31 | 77.4
John Hall | 2005 | WAS | 14 | 85.7 | WAS | 11 | 72.7 | 33 | 75.7
John Hall | 2006 | WAS | 11 | 81.8 | WAS | 14 | 85.7 | 11 | 72.7
Kris Brown | 2003 | HOU | 22 | 81.8 | HOU | 24 | 70.8 | 0 | 0
Kris Brown | 2004 | HOU | 24 | 70.8 | HOU | 22 | 81.8 | 24 | 70.8
Kris Brown | 2005 | HOU | 34 | 76.4 | HOU | 24 | 70.8 | 22 | 81.8
Kris Brown | 2006 | HOU | 15 | 73.3 | HOU | 34 | 76.4 | 24 | 70.8
Matt Stover | 2003 | BAL | 38 | 86.8 | BAL | 25 | 84 | 0 | 0
Matt Stover | 2004 | BAL | 32 | 90.6 | BAL | 38 | 86.8 | 25 | 84
Matt Stover | 2005 | BAL | 34 | 88.2 | BAL | 32 | 90.6 | 38 | 86.8
Matt Stover | 2006 | BAL | 16 | 100 | BAL | 34 | 88.2 | 32 | 90.6
Mike Vanderjagt | 2003 | IND | 37 | 100 | IND | 31 | 74.1 | 0 | 0
Mike Vanderjagt | 2004 | IND | 25 | 80 | IND | 37 | 100 | 31 | 74.1
Mike Vanderjagt | 2005 | IND | 25 | 92 | IND | 25 | 80 | 37 | 100
Mike Vanderjagt | 2006 | DAL | 15 | 80 | IND | 25 | 92 | 25 | 80
Neil Rackers | 2003 | ARZ | 12 | 75 | CIN | 18 | 83.3 | 0 | 0
Neil Rackers | 2004 | ARZ | 29 | 75.8 | ARZ | 12 | 75 | 18 | 83.3
Neil Rackers | 2005 | ARZ | 42 | 95.2 | ARZ | 29 | 75.8 | 12 | 75
Neil Rackers | 2006 | ARZ | 19 | 68.4 | ARZ | 42 | 95.2 | 29 | 75.8
Olindo Mare | 2003 | MIA | 29 | 75.8 | MIA | 31 | 77.4 | 0 | 0
Olindo Mare | 2004 | MIA | 16 | 75 | MIA | 29 | 75.8 | 31 | 77.4
Olindo Mare | 2005 | MIA | 30 | 83.3 | MIA | 16 | 75 | 29 | 75.8
Olindo Mare | 2006 | MIA | 22 | 63.6 | MIA | 30 | 83.3 | 16 | 75
Phil Dawson | 2003 | CLE | 21 | 85.7 | CLE | 28 | 78.5 | 0 | 0
Phil Dawson | 2004 | CLE | 29 | 82.7 | CLE | 21 | 85.7 | 28 | 78.5
Phil Dawson | 2005 | CLE | 29 | 93.1 | CLE | 29 | 82.7 | 21 | 85.7
Phil Dawson | 2006 | CLE | 17 | 88.2 | CLE | 29 | 93.1 | 29 | 82.7
Rian Lindell | 2003 | BUF | 24 | 70.8 | SEA | 29 | 79.3 | 0 | 0
Rian Lindell | 2004 | BUF | 28 | 85.7 | BUF | 24 | 70.8 | 29 | 79.3
Rian Lindell | 2005 | BUF | 35 | 82.8 | BUF | 28 | 85.7 | 24 | 70.8
Rian Lindell | 2006 | BUF | 16 | 87.5 | BUF | 35 | 82.8 | 28 | 85.7
Ryan Longwell | 2003 | GB | 26 | 88.4 | GB | 34 | 82.3 | 0 | 0
Ryan Longwell | 2004 | GB | 28 | 85.7 | GB | 26 | 88.4 | 34 | 82.3
Ryan Longwell | 2005 | GB | 27 | 74 | GB | 28 | 85.7 | 26 | 88.4
Ryan Longwell | 2006 | MIN | 18 | 83.3 | GB | 27 | 74 | 28 | 85.7
Sebastian Janikowski | 2003 | OAK | 25 | 88 | OAK | 33 | 78.7 | 0 | 0
Sebastian Janikowski | 2004 | OAK | 28 | 89.2 | OAK | 25 | 88 | 33 | 78.7
Sebastian Janikowski | 2005 | OAK | 30 | 66.6 | OAK | 28 | 89.2 | 25 | 88
Sebastian Janikowski | 2006 | OAK | 13 | 84.6 | OAK | 30 | 66.6 | 28 | 89.2
Shayne Graham | 2003 | CIN | 25 | 88 | CAR | 18 | 72.2 | 0 | 0
Shayne Graham | 2004 | CIN | 31 | 87 | CIN | 25 | 88 | 18 | 72.2
Shayne Graham | 2005 | CIN | 32 | 87.5 | CIN | 31 | 87 | 25 | 88
Shayne Graham | 2006 | CIN | 19 | 84.2 | CIN | 32 | 87.5 | 31 | 87
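The calculations asked for in Q1 and Q3 could be sketched in R roughly as follows. This is a minimal sketch, not the course's reference solution. It assumes the `FGt` and `FGtM1` values are transcribed correctly from the attachment data above, and it rebuilds the data frame inline (instead of reading the CSV) so the block runs on its own. The names `kickers`, `q1b`, `slope_row`, `t_diff`, and `p_diff` are illustrative and not part of the assignment template. The standard deviation of the slope difference is taken as the square root of the sum of the two squared standard errors, which assumes the two players' regressions are independent.

```r
# Data transcribed from the attachment: 19 kickers x 4 seasons (2003-2006),
# in the same row order as the attachment.
kickers <- c("Adam Vinatieri", "David Akers", "Jason Elam", "Jason Hanson",
             "Jay Feely", "Jeff Reed", "Jeff Wilkins", "John Carney",
             "John Hall", "Kris Brown", "Matt Stover", "Mike Vanderjagt",
             "Neil Rackers", "Olindo Mare", "Phil Dawson", "Rian Lindell",
             "Ryan Longwell", "Sebastian Janikowski", "Shayne Graham")
a2 <- data.frame(
  Name = rep(kickers, each = 4),
  FGt = c(73.5, 93.9, 80, 89.4,  82.7, 84.3, 72.7, 83.3,  87, 85.2, 75, 86.6,
          95.6, 85.7, 79.1, 82.3,  70.3, 78.2, 83.3, 76.4,  71.8, 84.8, 82.7, 68.7,
          92.8, 79.1, 87, 88.4,  73.3, 81.4, 78.1, 88.2,  75.7, 72.7, 85.7, 81.8,
          81.8, 70.8, 76.4, 73.3,  86.8, 90.6, 88.2, 100,  100, 80, 92, 80,
          75, 75.8, 95.2, 68.4,  75.8, 75, 83.3, 63.6,  85.7, 82.7, 93.1, 88.2,
          70.8, 85.7, 82.8, 87.5,  88.4, 85.7, 74, 83.3,  88, 89.2, 66.6, 84.6,
          88, 87, 87.5, 84.2),
  FGtM1 = c(90, 73.5, 93.9, 80,  88.2, 82.7, 84.3, 72.7,  72.2, 87, 85.2, 75,
            82.1, 95.6, 85.7, 79.1,  80, 70.3, 78.2, 83.3,  89.4, 71.8, 84.8, 82.7,
            76, 92.8, 79.1, 87,  88.5, 73.3, 81.4, 78.1,  77.4, 75.7, 72.7, 85.7,
            70.8, 81.8, 70.8, 76.4,  84, 86.8, 90.6, 88.2,  74.1, 100, 80, 92,
            83.3, 75, 75.8, 95.2,  77.4, 75.8, 75, 83.3,  78.5, 85.7, 82.7, 93.1,
            79.3, 70.8, 85.7, 82.8,  82.3, 88.4, 85.7, 74,  78.7, 88, 89.2, 66.6,
            72.2, 88, 87, 87.5)
)

# Q1-(a): Pearson correlation and the test of H0: rho = 0.
ct <- cor.test(a2$FGt, a2$FGtM1)

# Q1-(b): simple linear regression of this year's percentage on last year's.
fit <- lm(FGt ~ FGtM1, data = a2)
s <- summary(fit)
q1b <- c(R2 = s$r.squared,
         slope = unname(coef(fit)["FGtM1"]),
         sigma2_hat = s$sigma^2,
         p_intercept = s$coefficients["(Intercept)", "Pr(>|t|)"],
         p_slope = s$coefficients["FGtM1", "Pr(>|t|)"])

# Q1-(c): 95% confidence interval for the slope.
ci_slope <- confint(fit, "FGtM1", level = 0.95)

# Q3-(a): one regression per kicker, four seasons each.
slope_row <- function(player) {
  m <- lm(FGt ~ FGtM1, data = a2[a2$Name == player, ])
  summary(m)$coefficients["FGtM1", c("Estimate", "Std. Error", "Pr(>|t|)")]
}
mv <- slope_row("Mike Vanderjagt")
jf <- slope_row("Jay Feely")

# Q3-(b): difference in slopes divided by its estimated standard deviation.
# Assumption: the two regressions are independent, so their variances add.
t_diff <- (mv[["Estimate"]] - jf[["Estimate"]]) /
  sqrt(mv[["Std. Error"]]^2 + jf[["Std. Error"]]^2)
p_diff <- 2 * pt(-abs(t_diff), df = 2)  # df = 2 as stated in the question

print(ct)
print(round(q1b, 4))
print(ci_slope)
print(rbind(mv, jf))
print(c(t = t_diff, p = p_diff))
```

In a simple linear regression the two-sided p-value from `cor.test` equals the p-value for the slope, so Q1-(a), Q1-(b), and Q1-(c) should lead to the same conclusion. The `jf` row can be checked against the `summary(DA)` output quoted in Q3-(a).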