There are some assumptions that must be met in order to apply the concepts of prediction we have been discussing. One is that using the prediction equation requires the underlying relationship to be linear. Therefore, if the scatterplot for an original correlation analysis is curvilinear, this procedure would not be appropriate. A second assumption is that the dots in the original scatterplot will be dispersed equally about all segments of the prediction line. This is known as homoscedasticity. The third assumption is that for any given value of X, the corresponding distribution of Y values is normally distributed. The final assumption is that the original data set of paired observations must be fairly large, usually in the hundreds.

The square of the correlation coefficient, r², indicates the proportion of the total variance in one variable that is predictable from its relationship with the other variable. In order to understand r² (the correlation coefficient squared), think for a moment about variability in a single distribution, which we studied in Chapter 5. In a single variable, we described variability with the measure of the standard deviation. We studied the variability graphically by looking at the shape of the frequency polygon. In a scatterplot, two variables are depicted graphically. Imagine a frequency polygon along the horizontal axis of the scatterplot. This shows the shape of the distribution for variable X. Imagine also a frequency polygon along the vertical axis of the scatterplot. This shows the shape of the distribution for variable Y. As we examine the relationship between the two variables, we know that some of the variability in Y must be due to the variability in X.

Let's substitute real data and think this through. Variable X represents SAT scores. Variable Y represents college GPA. These variables constitute a strong positive relationship, approximately .57. There must be many reasons why GPA in college would vary. Some of that variability is probably due to preparation for college as reflected by SAT scores.
The remaining variability might be due to such factors as whether the student works, how the adjustment is made to living away from home, and how many hours per week are devoted to studying or partying. The actual amount of variability in college GPA, variable Y, which can be explained by the variability in X, SAT scores, is reflected by the value of r². Thus we compute .57 squared = .3249, or .32. We interpret this value by saying that 32 percent of the variability of Y can be explained by the variability in X. The remaining 68% of the variability of Y would be explained by a combination of many other factors or variables, probably some of those mentioned earlier. The value of r² supplies a direct measure of the strength of the relationship.
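
The arithmetic in the SAT and GPA example can be checked with a few lines of Python. This is only a minimal sketch: the function name `variance_explained` is a hypothetical helper introduced here for illustration, and the value .57 is the approximate correlation used in the example, not a result computed from real data.

```python
def variance_explained(r):
    """Split total variance in Y into the proportion explained by X (r squared)
    and the proportion left to other factors (1 minus r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    explained = r ** 2
    unexplained = 1.0 - explained
    return explained, unexplained


# Approximate SAT-GPA correlation taken from the example in the text.
r = 0.57
explained, unexplained = variance_explained(r)

print(f"r squared = {explained:.4f}")
print(f"explained by X: {explained:.0%}")
print(f"due to other factors: {unexplained:.0%}")
```

Because the two proportions always sum to 1, a weaker correlation such as .30 would leave far more of the variability in Y to other factors (r² = .09, so 91 percent unexplained).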