
Question



Activity Solution: Multiple Linear Regression

Consider three different datasets (labelled A, B, C), each of which consists of $n = 25$ measurements of $(Y, X_1, X_2)$, depicted as follows:

[Figure: plots labelled A to F; not reproduced.]

1. Clickers question: Think about how the bivariate scatterplots would appear in each case, that is, plots of $Y$ against $X_1$ and $Y$ against $X_2$.
   (a) AD, BE, CF.
   (b) AE, BD, CF.
   (c) AF, BD, CE.
   (d) AF, BE, CD.
   (e) AE, BF, CD.

2. Clickers question: Give some thought to how useful it is to look at $(Y, X_1)$ and $(Y, X_2)$ separately when trying to understand the relationship between $Y$ and $(X_1, X_2)$, and how easy or hard it is to visualize relationships amongst three variables.
   (a) Viewing the $Y$ versus $X_1$ scatterplot and the $Y$ versus $X_2$ scatterplot will generally give full understanding of the relationship between $Y$ and $(X_1, X_2)$.
   (b) It is easy to visualize the relationship between $Y$ and $(X_1, X_2)$.
   (c) Both (a) and (b).
   (d) Neither (a) nor (b).

3. Clickers question: A multiple linear regression model has the form

   $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \epsilon$

   where the error term $\epsilon$ (or "noise") is thought of as a random variable which is independent of $(X_1, \dots, X_p)$, has mean zero, and variance $\sigma^2$. With $p = 1$ explanatory variable, this reverts back to the simple linear regression met earlier. In usual regression parlance, $Y$ might be referred to as the response variable, or the outcome variable, while $(X_1, \dots, X_p)$ might be referred to as the explanatory variables or the predictors.

   Let us focus on the situation with $p = 2$ explanatory variables, so the data, which are measurements of $(Y, X_1, X_2)$ on $n$ study units, can be written as $(y_i, x_{1i}, x_{2i})$, for $i = 1, \dots, n$. Your task is to estimate the coefficients $(\beta_0, \beta_1, \beta_2)$ in the multiple linear regression model. Knowing what you do about simple linear regression (i.e., the $p = 1$ case), discuss amongst your group how the coefficients might be estimated.
   Which of the following makes most sense: take $\hat\beta_0, \hat\beta_1, \hat\beta_2$ to be the values of $(\beta_0, \beta_1, \beta_2)$ which make

   A: $\sum_{i=1}^{n} (y_i - [\beta_0 + \beta_1 x_{1i}])^2 + (y_i - [\beta_0 + \beta_2 x_{2i}])^2$
   B: $\sum_{i=1}^{n} (y_i - [\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}])^2$
   C: $\sum_{i=1}^{n} y_i^2 - [\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}]^2$

   as small as possible.

4. As a quick example, take the datasets shown in plots B and D above. Part of the R output when fitting the multiple linear regression model is as follows.

       lm(formula = y ~ x1 + x2)

       Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
       (Intercept)  0.1125249  0.0161816   6.954 5.57e-07 ***
       x1          -0.0004883  0.0865046  -0.006    0.996
       x2           0.7935812  0.0928211   8.550 1.93e-08 ***

   It is worth taking a moment to think about the estimated coefficients in relation to the plots. In particular, are you surprised that $\hat\beta_1$ is very close to zero, yet the scatterplot of $Y$ on $X_1$ shows a strong positive association? A simple linear regression model with just $X_1$ as the predictor variable would show that variable as having a significant effect on the response $Y$, yet in the presence of $X_2$ the effect of $X_1$ is relatively weak. That is, $X_1$ explains very little of the variation in $Y$ not accounted for by $X_2$.

5. Clickers question: Now we focus on how we can interpret the regression coefficients when there is more than one explanatory variable. Suppose that $Y$ is systolic blood pressure (SBP, in units of mm Hg), $X_1$ is height (in cm), and $X_2$ is weight (in kg). We are going to record $(Y, X_1, X_2)$ on a random sample of $n$ adults, and fit a multiple linear regression of the form

   $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$

   to the data. What is a good interpretation to go with our estimate of $\beta_2$?
   (a) We estimate that average SBP increases by $\hat\beta_2$ units for every one kg increase in weight.
   (b) We estimate that for people of average weight, the average SBP increases by $\hat\beta_2$ units for every one cm increase in height.
   (c) We estimate that for people of average height, the average SBP increases by $\hat\beta_2$ units for every one kg increase in weight.
   (d) We estimate that for people of common height, the average SBP increases by $\hat\beta_2$ units for every one kg increase in weight.

6. Clickers question: As another scenario, for a given city you have data on daily temperature (TMP), daily level of air pollution (POL), and daily number of Emergency Room visits for asthma-related illness (ERV). Also, say that ERV and TMP are positively associated (so if you regress ERV on TMP, the 95% confidence interval for the slope parameter is completely above zero), and similarly ERV and POL are positively associated. You find that POL and TMP are positively associated as well, so the variables are all positively associated with each other. You fit a multiple linear regression model of the form

   $\mathrm{ERV} = \beta_0 + \beta_1\,\mathrm{POL} + \beta_2\,\mathrm{TMP} + \epsilon$.

   You obtain a 95% confidence interval for $\beta_1$ with a lower endpoint that is positive, while the 95% confidence interval for $\beta_2$ crosses zero, and is quite narrow. Discuss amongst your group how to interpret these findings.
   (a) You now feel it is more plausible that higher air pollution causes more ER visits.
   (b) You now feel it is less plausible that higher temperature causes more ER visits.
   (c) Both of (a) and (b).
   (d) Neither (a) nor (b).
   (e) You now feel it is more plausible that neither higher air pollution nor higher temperature cause more ER visits.

7. What have you done in completing this activity? Why were you asked to do this? What have you learned?

   We initially explored the difficulty in identifying relationships between a response variable and two explanatory variables via scatterplots. The idea of simple linear regression encountered earlier, where there is just a single predictor variable $X$, is extended in a natural fashion to cases where there are $p$ explanatory variables $X_1, \dots, X_p$. Two examples with $p = 2$ were considered, where the response is positively correlated with both the predictor variables.
   It was recognized that in both cases one predictor variable was in effect redundant, adding practically nothing to the model when the other variable is included.
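
The redundancy described above can be reproduced with a small simulation. The following is a minimal Python sketch using only the standard library, not the activity's own data or code. It assumes a hypothetical data-generating process in which $X_1$ is a noisy copy of $X_2$ and $Y$ depends on $X_2$ alone; the helper names `solve_linear_system`, `least_squares`, and `residual_sum_of_squares`, the seed, and the noise levels are all illustrative choices. The coefficients are found by minimising the sum of squared residuals $\sum_i (y_i - [\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}])^2$ through the normal equations.

```python
import random


def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(predictors, y):
    """Return [b0, b1, ...] minimising sum_i (y_i - [b0 + b1*x1i + ...])^2."""
    design = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[r] * row[c] for row in design) for c in range(k)] for r in range(k)]
    xty = [sum(row[r] * yi for row, yi in zip(design, y)) for r in range(k)]
    return solve_linear_system(xtx, xty)


def residual_sum_of_squares(coefs, predictors, y):
    """Sum of squared differences between y and the fitted values."""
    total = 0.0
    for i, yi in enumerate(y):
        fitted = coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], predictors))
        total += (yi - fitted) ** 2
    return total


# Hypothetical data (an assumption, not the activity's datasets):
# x1 is a noisy copy of x2, and y depends on x2 only.
rng = random.Random(1)
n = 25
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [v + rng.gauss(0, 0.3) for v in x2]
y = [0.1 + 0.8 * v + rng.gauss(0, 0.1) for v in x2]

simple = least_squares([x1], y)        # y ~ x1
multiple = least_squares([x1, x2], y)  # y ~ x1 + x2

print("y ~ x1      : slope on x1 =", round(simple[1], 3))
print("y ~ x1 + x2 : slope on x1 =", round(multiple[1], 3),
      ", slope on x2 =", round(multiple[2], 3))
```

Under these assumptions, the slope on `x1` should be clearly positive when `x1` is the only predictor and much closer to zero once `x2` is included, which mirrors the pattern in the R output for plots B and D. Solving the normal equations directly is adequate for a three-coefficient illustration; for real analyses a routine such as R's `lm` is the more robust choice.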
