7.3 Assessing the Fit of a Line with R-squared

Learning Goal: Use the coefficient of determination, $r^2$, to assess the fit of a linear model.

Introduction: When we use a line to make predictions about a response variable, we consider only one other variable, the explanatory variable. For example, we used a line to predict Consumer Report ratings based on sugar content in cereals. But there may be other variables that also influence the Consumer Report ratings, such as fiber or protein. In this activity, we investigate a way to understand how much the explanatory variable (sugar) accounts for changes we see in the response variable (ratings). Specifically, $r^2$ tells us what percent of the variability in the response is explained by the changes in the explanatory variable (i.e. the regression line).

Example: Here are data from 77 breakfast cereals.

[Figure: sideways dotplot of the Consumer Report ratings. Axis: Rating.]

In the top graph we see a sideways dotplot of the Consumer Report ratings. The ratings have a fairly symmetric distribution about the mean rating of 43, with an overall range of about 60 (excluding the outlier). We want to see if sugar content impacts the variability of the ratings about the mean of 43.

[Figure: ratings grouped into bins by sugar content, with the mean rating of 43 marked. Axes: Sugars (g/serving), Rating.]

In the second graph we have added sugar as a second variable. But instead of making a scatterplot, we grouped the cereals into bins. Notice how the ratings are above the mean when there is not much sugar, but below the mean with more sugar. Also, inside the 2nd, 3rd, and 4th bins, the variability in ratings is similar; the spread is about 20 rating points in each of these bins. This suggests that sugar content explains some of the variation in ratings.

[Figure: scatterplot of ratings versus sugar content, with the mean rating of 43 marked. Axes: Sugars (g/serving), Rating.]

The 3rd graph is the scatterplot. We see a fairly strong correlation of r = -0.76. The proportion of the total variation in ratings about the mean that is explained by changes in sugar amount is $(-0.76)^2 \approx 0.58$.
In other words, 58% of the variation in ratings is explained by the sugar content. The other 42% of the variation in ratings is due to other variables, such as protein or fiber.
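The arithmetic above can be checked with a short Python sketch. It first squares the correlation reported in the example, then uses a small made-up data set (not the 77 cereals, which are not listed here) to illustrate that squaring r gives the same number as the fraction of variation about the mean that the least-squares line removes. The helper names `mean`, `correlation`, and `explained_fraction`, and the values in `sugar` and `rating`, are assumptions chosen only for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def explained_fraction(x, y):
    """Fraction of the variation in y about its mean that is
    explained by the least-squares line: 1 - SSE / SST."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Variation left over around the line (sum of squared residuals).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variation around the mean of y.
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


# Value reported in the example: r = -0.76 for rating versus sugar.
r_from_text = -0.76
print(f"{r_from_text ** 2:.2f}")  # 0.58

# Hypothetical data, invented for illustration only.
sugar = [1, 3, 6, 9, 12, 15]
rating = [68, 59, 45, 40, 33, 28]

r = correlation(sugar, rating)
print(r ** 2, explained_fraction(sugar, rating))  # the two values agree
```

Because squaring removes the sign, a correlation of -0.76 and a correlation of +0.76 would both give the same $r^2$; the direction of the relationship has to be read from r or from the scatterplot.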