Question
3. The dataset anscombe.txt posted contains well-known data sets made up by Anscombe (1973). It illustrates that summary statistics, and fitted regression models, cannot describe
3. The dataset anscombe.txt posted contains well-known data sets made up by Anscombe (1973). It illustrates that summary statistics, and fitted regression models, cannot describe all features about the data, while graphs often reveal untold stories. The dataset contains 4 covariate-response pairs, for variables (x (1), Y (1)), (x (2), Y (2)), (x (3), Y (3)), and (x (4), Y (4)). The variables x (1) = x (2) = x (3), and their observations are given by the column x123 in the dataset. The variable x (4) is given by the column x4. The responses Y (1), . . . , Y (4) are given by the columns y1, ..., y4 in the dataset. Answer the following questions without using lm() in R. Calculate using formulas. (a) Draw a scatter plot for each covariate-response pair. For each pair, build a simple linear regression model by estimating the intercept and slope and add the line to the corresponding scatter plot. For the purpose of comparison, make sure you define for all 4 plots the same y ranges, and the same x ranges. (b) Find the sample correlation coefficient between each covariate-response pair, then use two methods to find the R2 . (There is a built-in function in R for sample correlation coef- ficient.) Compare the 4 fitted models in terms of the estimated intercepts and slopes and the R2 . Does R2 tell you everything about the overall fit of the model? What are other remarkable features about these covariate-response pairs? Discuss based on your graphs in (a). (c) For the pair (x (4), Y (4)), there is an outlier (in fact it is also an influential point). What happens if you remove the outlier and still try to fit a simple linear regression model? Find the estimates for intercept and slope; also find R2 .
3. The dataset "anscombe.txt" posted contains well-known data sets made up by Anscombe (1973). It illustrates that summary statistics, and fitted regression models, cannot describe all features about the data, while graphs often reveal untold stories. The dataset contains 4 covariate-response pairs, for variables (z(1),y()). (z(2),y(2)), (x3),y(3)), and (z(4),y(). The variables (1) = n(2) = zo(3), and their observations are given by the column "x123" in the dataset. The variable (4) is given by the column "x4". The responses Y(),...,Y() are given by the columns "yl", "y4" in the dataset. Answer the following questions without using Im() in R. Calculate using formulas. (a) Draw a scatter plot for each covariate response pair. For each pair, build a simple linear regression model by estimating the intercept and slope and add the line to the corresponding scatter plot. For the purpose of comparison, make sure you define for all 4 plots the same y ranges, and the same r ranges. (b) Find the sample correlation coefficient between each covariate response pair, then use two methods to find the R. (There is a built-in function in R for sample correlation coef- ficient.) Compare the 4 fitted models in terms of the estimated intercepts and slopes and the R. Does R tell you everything about the overall fit of the model? What are other remarkable features about these covariate response pairs? Discuss based on your graphs in (a). (c) For the pair (4.Y(), there is an outlier (in fact it is also an influential point). What happens if you remove the outlier and still try to fit a simple linear regression model? Find the estimates for intercept and slope: also find R. anscombe
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started