Question
2. (14 points) In a simple linear regression, the $r^2$, or squared correlation between $x$ and $y$, is often used as a measure of how well the linear model fits the data. That is, $r^2$ measures how strong the linear relationship is between the response $y$ and the single predictor variable $x$. It is natural to ask how we can generalize this concept to the setting of a regression on multiple predictor variables $X_1, \ldots, X_p$. That is, we would like to ask how strong the linear relationship is between the response $y$ and the full set of predictors $X_1, \ldots, X_p$. The usual way that people generalize this concept is the multiple $R^2$, which we defined in class as the variance of the fitted values divided by the variance of the response:

$$R^2 = \frac{\sigma_{\hat{y}}^2}{\sigma_y^2}.$$

We mentioned in lecture that it is often referred to as the fraction or percentage of "variance explained by the regression," but we did not explain why people call it that. In fact, the multiple $R^2$ is a popular measure of how well the regression fits the data, because it has several interpretations:

1. $R^2$ is the squared correlation of the prediction vector $\hat{y}$ with the response $y$, so if it is very high, the response lines up almost exactly with the predictions, but if it is very low, the response has almost no association with the predictions. This is shown in part (e).
2. $1 - R^2$ is the variance of the residuals as a fraction of the original $\sigma_y^2$, so if $R^2$ is large, the residuals are very small compared to the variation of $y$. This is shown in part (d).
3. In the special case of OLS regression with only a single predictor variable, we can still compute $R^2$, and it is exactly the same as the squared correlation $r^2$ between $x$ and $y$. This is shown in optional part (f).

For all parts of this question, you can freely assume that any vectors like $y$, $e$, $X_j$, and so on are non-constant if we are calculating correlations (correlations with constant vectors are undefined).
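The three interpretations listed above can be checked numerically before proving them. The following is a minimal sketch, not part of the original problem: it assumes NumPy, synthetic data with made-up coefficients, and an OLS fit that includes an intercept. The helper names `var` and `fit_ols` are hypothetical.

```python
import numpy as np

def var(v):
    """Average squared deviation from the mean (divides by n, not n - 1)."""
    v = np.asarray(v, dtype=float)
    return np.mean((v - v.mean()) ** 2)

def fit_ols(X, y):
    """Return OLS fitted values for a regression of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

# Synthetic data; the coefficients below are arbitrary illustration values.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

y_hat = fit_ols(X, y)
e = y - y_hat  # residuals

# Definition: variance of fitted values over variance of the response.
r2 = var(y_hat) / var(y)
# Interpretation 1: squared correlation between fitted values and response.
corr_sq = np.corrcoef(y_hat, y)[0, 1] ** 2
# Interpretation 2: one minus the residual variance as a fraction of var(y).
one_minus = 1.0 - var(e) / var(y)

# Interpretation 3: with a single predictor, R^2 equals r^2 between x and y.
x1 = X[:, 0]
r2_single = var(fit_ols(x1, y)) / var(y)
r_xy_sq = np.corrcoef(x1, y)[0, 1] ** 2

print(r2, corr_sq, one_minus)
print(r2_single, r_xy_sq)
```

Each printed row should agree up to floating-point error. A numerical check like this is only a sanity test on one data set and does not replace the proofs requested in parts (d), (e), and (f).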
(a) (3 points) We have been using the notation $\sigma_x^2$ and $\sigma_y^2$ in class to denote the variances of the vectors $x$ and $y$, which are defined as the average squared deviations from the means $\bar{x}$ and $\bar{y}$, respectively:

$$\sigma_x^2 = \frac{1}{n} \sum_i (x_i - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{n} \sum_i (y_i - \bar{y})^2.$$

This is closely related to the idea of the variance of a random variable, which we will cover later in the course, but for this problem you do not need to think about random variables. We are only referring to this specific sum, which measures how "spread out" the coordinates of a vector are around the average. To get practice working with the variance, show that if $\hat{y}_i = a + b x_i$, then the mean is $\bar{\hat{y}} = a + b\bar{x}$ and the variance is $\sigma_{\hat{y}}^2 = b^2 \sigma_x^2$.
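Before writing the proof for part (a), it can help to confirm both identities on a small example. The sketch below assumes NumPy and uses arbitrary illustration values for the vector and for the constants `a` and `b`. The helper `var` is a hypothetical name that implements the average squared deviation defined above.

```python
import numpy as np

def var(v):
    """Average squared deviation from the mean (divides by n, not n - 1)."""
    v = np.asarray(v, dtype=float)
    return np.mean((v - v.mean()) ** 2)

# Arbitrary illustration values, not taken from the problem.
x = np.array([1.0, 2.0, 4.0, 7.0])
a, b = 3.0, -2.0

y_hat = a + b * x  # affine transformation applied coordinate by coordinate

# Mean identity: the mean of a + b*x is a + b*mean(x).
mean_ok = np.isclose(y_hat.mean(), a + b * x.mean())
# Variance identity: the shift a drops out and the scale b enters squared.
var_ok = np.isclose(var(y_hat), b**2 * var(x))

print(mean_ok, var_ok)
```

The observation behind the proof is that each deviation of the transformed vector from its mean equals $b(x_i - \bar{x})$, so the shift $a$ cancels and squaring produces the factor $b^2$.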