

Question


Posted on Jul 11, 2024


3. The dataset anscombe.txt posted contains well-known data sets made up by Anscombe (1973). It illustrates that summary statistics, and fitted regression models, cannot describe all features about the data, while graphs often reveal untold stories. The dataset contains 4 covariate-response pairs, for variables (x^(1), Y^(1)), (x^(2), Y^(2)), (x^(3), Y^(3)), and (x^(4), Y^(4)). The variables x^(1) = x^(2) = x^(3), and their observations are given by the column x123 in the dataset. The variable x^(4) is given by the column x4. The responses Y^(1), ..., Y^(4) are given by the columns y1, ..., y4 in the dataset.

Answer the following questions without using lm() in R. Calculate using formulas.

(a) Draw a scatter plot for each covariate-response pair. For each pair, build a simple linear regression model by estimating the intercept and slope, and add the line to the corresponding scatter plot. For the purpose of comparison, make sure you define for all 4 plots the same y ranges, and the same x ranges.

(b) Find the sample correlation coefficient between each covariate-response pair, then use two methods to find the R^2. (There is a built-in function in R for the sample correlation coefficient.) Compare the 4 fitted models in terms of the estimated intercepts and slopes and the R^2. Does R^2 tell you everything about the overall fit of the model? What are other remarkable features about these covariate-response pairs? Discuss based on your graphs in (a).

(c) For the pair (x^(4), Y^(4)), there is an outlier (in fact it is also an influential point). What happens if you remove the outlier and still try to fit a simple linear regression model? Find the estimates for intercept and slope; also find R^2.
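One way to approach parts (a) to (c) without lm() is sketched below in R, using the closed-form least-squares formulas: slope = Sxy / Sxx, intercept = ybar - slope * xbar, and R^2 computed both as the squared sample correlation and as 1 - SSE/SST. This is only a sketch under stated assumptions. R's built-in `anscombe` data frame is used as a stand-in for anscombe.txt, with its columns remapped to the x123/x4/y1..y4 layout described in the question. The helper name `fit_slr` is hypothetical. The outlier in pair 4 is assumed to be the single observation with the largest x4.

```r
# Assumption: the built-in `anscombe` data frame stands in for anscombe.txt.
# Its columns (x1..x4, y1..y4) are remapped to the layout in the question.
dat <- data.frame(x123 = anscombe$x1, x4 = anscombe$x4,
                  y1 = anscombe$y1, y2 = anscombe$y2,
                  y3 = anscombe$y3, y4 = anscombe$y4)

# Hypothetical helper: simple linear regression from closed-form formulas.
fit_slr <- function(x, y) {
  sxx <- sum((x - mean(x))^2)                # sum of squares of x
  sxy <- sum((x - mean(x)) * (y - mean(y)))  # cross-product sum
  b1 <- sxy / sxx                            # slope estimate
  b0 <- mean(y) - b1 * mean(x)               # intercept estimate
  sse <- sum((y - (b0 + b1 * x))^2)          # residual sum of squares
  sst <- sum((y - mean(y))^2)                # total sum of squares
  r <- cor(x, y)                             # built-in sample correlation
  c(intercept = b0, slope = b1, r = r,
    r2_from_r = r^2,                         # method 1: squared correlation
    r2_from_ss = 1 - sse / sst)              # method 2: 1 - SSE/SST
}

xs <- list(dat$x123, dat$x123, dat$x123, dat$x4)
ys <- list(dat$y1, dat$y2, dat$y3, dat$y4)

# (b) One row per pair: intercept, slope, r, and R^2 by both methods.
fits <- t(mapply(fit_slr, xs, ys))
rownames(fits) <- paste0("pair", 1:4)
print(round(fits, 4))

# (a) Four scatter plots with shared x and y ranges, plus the fitted lines.
xlim <- range(dat$x123, dat$x4)
ylim <- range(dat$y1, dat$y2, dat$y3, dat$y4)
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(xs[[i]], ys[[i]], xlim = xlim, ylim = ylim,
       xlab = paste0("x(", i, ")"), ylab = paste0("Y(", i, ")"))
  abline(a = fits[i, "intercept"], b = fits[i, "slope"])
}
par(op)

# (c) Assumption: the outlier is the observation with the largest x4.
keep <- dat$x4 != max(dat$x4)
sxx_reduced <- sum((dat$x4[keep] - mean(dat$x4[keep]))^2)
print(sxx_reduced)  # if this is 0, the slope formula divides by zero
```

In part (c), the value of `sxx_reduced` shows whether the remaining x4 values vary at all. If they do not, the slope Sxy / Sxx is undefined, and calling `fit_slr` on the reduced data returns NaN for the slope with a warning from `cor()` about zero standard deviation.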

