
Question

1 Approved Answer


Problem Set

Student Name:

1. Describe the relationship between two variables that have a correlation coefficient value:
   a. Near -1
   b. Near 0
   c. Near 1

2. Data was collected where a weightlifter was asked to do as many repetitions as possible using different amounts of weight. Below is a table that shows how much weight was on the bar, and how many repetitions the weightlifter could do:

   Weight   Reps
   200      42
   300      27
   400      12
   500       3

   a. Calculate the correlation for this data. What does this value tell you about the relationship between these two variables?
   b. Determine the least squares regression line for this data. Interpret the values for the y-intercept and the slope within this scenario.
   c. Calculate $r^2$ for this data and describe what it represents.
   d. Using the regression line from part (b), calculate the predicted number of repetitions for this weightlifter if the weight is 400 pounds, and then calculate and interpret the residual for that weight using the data.

3. Given the linear regression equation: $y = 1.6 + 3.5x_1 - 7.9x_2 + 2.0x_3$
   a. Which variable is the response variable? How many explanatory variables are there?
   b. If $x_1 = 2$, $x_2 = 1$, and $x_3 = 5$, what is the predicted value for y?
   c. Suppose that n = 12 data points were used to construct the regression equation above, and that the standard error for the coefficient of $x_1$ is 0.419. Construct a 90% confidence interval for the coefficient of $x_1$.
   d. Using the information from part (c) and a 5% level of significance, test the claim that the coefficient of $x_1$ is different from 0. What does your conclusion mean in relation to $x_1$ predicting y?

4. Suppose a researcher is analyzing the relationship between gender and favorite type of movie out of drama, science fiction, and comedy. Here is the data using a random sample:

            Drama   Science Fiction   Comedy   Total
   Male     152     28                218      398
   Female   102     213               189      504
   Total    254     241               407      902

   Test whether gender and type of favorite movie are independent at the .05 level of significance. Show all five steps of this test.

5. Suppose you wanted to test whether M&M's made the same amount of each color. You could run a goodness of fit test to see if each color had the same proportion. Suppose you took a sample of M&M's and below is the breakdown by color:

   Color    Observed Counts
   Blue     15
   Orange   14
   Green    10
   Yellow   11
   Red       4
   Brown     6
   Total    60

   Test whether each color has the same proportion. Show all five steps of this test at the 10% level of significance.

Copyright 2011 Cengage Learning. All Rights Reserved.

Instructor's Annotated Edition
TENTH EDITION
Understandable Statistics: Concepts and Methods
Charles Henry Brase, Regis University
Corrinne Pellillo Brase, Arapahoe Community College
Australia, Brazil, Japan, Korea, Mexico, Singapore, Spain, United Kingdom, United States

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. The publisher reserves the right to remove content from this title at any time if subsequent rights restrictions require it.
For valuable information on pricing, previous editions, changes to current editions, and alternate formats, please visit www.cengage.com/highered to search by ISBN#, author, title, or keyword for materials in your areas of interest.

This book is dedicated to the memory of a great teacher, mathematician, and friend
Burton W. Jones
Professor Emeritus, University of Colorado

Understandable Statistics: Concepts and Methods, Tenth Edition
Charles Henry Brase, Corrinne Pellillo Brase

Editor in Chief: Michelle Julet
Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Senior Editorial Assistant: Shaylin Walsh
Media Editor: Andrew Coppola
Marketing Manager: Ashley Pickering
Marketing Communications Manager: Mary Anne Payumo
Content Project Manager: Jill Clark

© 2012, 2009, 2006 Brooks/Cole, Cengage Learning
ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.
For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706.
For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to permissionrequest@cengage.com.

Art Director: Linda Helcher
Senior Manufacturing Buyer: Diane Gibbons
Senior Rights Acquisition Specialist, Text: Katie Huha
Rights Acquisition Specialist, Images: Mandy Groszko
Text Permissions Editor: Sue Howard
Production Service: Elm Street Publishing Services
Library of Congress Control Number: 2009942998
Student Edition: ISBN-13: 978-0-8400-4838-7, ISBN-10: 0-8400-4838-6
Annotated Instructor's Edition: ISBN-13: 978-0-8400-5456-2, ISBN-10: 0-8400-5456-4
Cover Designer: RHDG
Cover Image: Anup Shah
Compositor: Integra Software Services, Ltd. Pvt.

Brooks/Cole
20 Channel Center Street
Boston, MA 02210
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at international.cengage.com/region
Cengage Learning products are represented in Canada by Nelson Education, Ltd.
For your course and learning solutions, visit www.cengage.com. Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com.
Printed in the United States of America
9 Correlation and Regression

9.1 Scatter Diagrams and Linear Correlation
9.2 Linear Regression and the Coefficient of Determination
9.3 Inferences for Correlation and Regression
9.4 Multiple Regression

When it is not in our power to determine what is true, we ought to follow what is most probable.
RENÉ DESCARTES

It is important to realize that statistics and probability do not deal in the realm of certainty. If there is any realm of human knowledge where genuine certainty exists, you may be sure that our statistical methods are not needed there. In most human endeavors, and in almost all of the natural world around us, the element of chance happenings cannot be avoided. When we cannot expect something with true certainty, we must rely on probability to be our guide. In this chapter, we will study regression, correlation, and forecasting. One of the tools we use is a scatter plot. René Descartes (1596-1650) was the first mathematician to systematically use rectangular coordinate plots. For this reason, such a coordinate axis is called a Cartesian axis.

For online student resources, visit the Brase/Brase, Understandable Statistics, 10th edition web site at http://www.cengage.com/statistics/brase

PREVIEW QUESTIONS
How can you use a scatter diagram to visually estimate the degree of linear correlation of two random variables? (SECTION 9.1)
How do you compute the correlation coefficient and what does it tell you about the strength of the linear relationship between two random variables? (SECTION 9.1)
What is the least-squares criterion? How do you find the equation of the least-squares line? (SECTION 9.2)
What is the coefficient of determination and what does it tell you about explained variation of y in a random sample of data pairs (x, y)? (SECTION 9.2)
How do you determine if the sample correlation coefficient is statistically significant? (SECTION 9.3)
How do you find a confidence interval for predictions based on the least-squares model? (SECTION 9.3)
How do you test the slope β of the population least-squares line? How do you construct a confidence interval for β? (SECTION 9.3)
What if you have more than two random variables? How do you construct a linear regression model for three, four, or more random variables? (SECTION 9.4)

FOCUS PROBLEM
Changing Populations and Crime Rate
Is the crime rate higher in neighborhoods where people might not know each other very well? Is there a relationship between crime rate and population change? If so, can we make predictions based on such a relationship? Is the relationship statistically significant? Is it possible to predict crime rates from population changes?

Denver is a city that has had a lot of growth and consequently a lot of population change in recent years. Sociologists studying population changes and crime rates could find a wealth of information in Denver statistics. Let x be a random variable representing percentage change in neighborhood population in the past few years, and let y be a random variable representing crime rate (crimes per 1000 population). A random sample of six Denver neighborhoods gave the following information (Source: Neighborhood Facts, The Piton Foundation). To find out more about the Piton Foundation, visit the Brase/Brase statistics site at http://www.cengage.com/statistics/brase and find the link to the Piton Foundation.

x    29     2    11    17     7     6
y   173    35   132   127    69    53

Using information presented in this chapter, you will be able to analyze the relationship between the variables x and y using the following tools.
Scatter diagram
Sample correlation coefficient and coefficient of determination
Least-squares line equation
Predictions for y using the least-squares line
Tests of population correlation coefficient and of slope of least-squares line
Confidence intervals for slope and for predictions
(See Problem 10 in the Chapter Review Problems.)

SECTION 9.1 Scatter Diagrams and Linear Correlation

FOCUS POINTS
Make a scatter diagram.
Visually estimate the location of the "best-fitting" line for a scatter diagram.
Use sample data to compute the sample correlation coefficient r.
Investigate the meaning of the correlation coefficient r.

Studies of correlation and regression of two variables usually begin with a graph of paired data values (x, y). We call such a graph a scatter diagram.

A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. We call x the explanatory variable and y the response variable.
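The definition above can be tried out without graphing software. The following is a minimal sketch, not the book's method: it places the Focus Problem data pairs on a coarse character grid, with x running left to right and y running bottom to top. The function name `text_scatter` and the grid size are assumptions made for illustration; a calculator or spreadsheet plot would normally be used instead.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Return a list of strings forming a character-grid scatter diagram.

    Hypothetical helper for illustration: x runs left to right,
    y runs bottom to top, and each data pair is marked with '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger y is higher
    return ["".join(line) for line in grid]


# Focus Problem data: x = percentage change in population, y = crime rate
x = [29, 2, 11, 17, 7, 6]
y = [173, 35, 132, 127, 69, 53]

rows = text_scatter(x, y)
print("\n".join(rows))
```

The marks rise from the lower left to the upper right, which is the visual pattern the chapter goes on to describe as positive correlation.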
By looking at a scatter diagram of data pairs, you can observe whether there seems to be a linear relationship between the x and y values.

EXAMPLE 1 Scatter diagram
Phosphorus is a chemical used in many household and industrial cleaning compounds. Unfortunately, phosphorus tends to find its way into surface water, where it can kill fish, plants, and other wetland creatures. Phosphorus-reduction programs are required by law and are monitored by the Environmental Protection Agency (EPA) (Reference: EPA Case Study 832-R-93-005). A random sample of eight sites in a California wetlands study gave the following information about phosphorus reduction in drainage water. In this study, x is a random variable that represents phosphorus concentration (in 100 mg/l) at the inlet of a passive biotreatment facility, and y is a random variable that represents total phosphorus concentration (in 100 mg/l) at the outlet of the passive biotreatment facility.

x   5.2   7.3   6.7   5.9   6.1   8.3   5.5   7.0
y   3.3   5.9   4.8   4.5   4.0   7.1   3.6   6.1

[FIGURE 9-1 Phosphorus Reduction (100 mg/l); horizontal axis: at inlet x; vertical axis: at outlet y]

(a) Make a scatter diagram for these data.
SOLUTION: Figure 9-1 shows points corresponding to the given data pairs. These plotted points constitute the scatter diagram. To make the diagram, first scan the data and decide on an appropriate scale for each axis.
Figure 9-1 shows the scatter diagram (points) along with a line segment showing the basic trend. Notice a "jump scale" on both axes.

(b) Interpretation Comment on the relationship between x and y shown in Figure 9-1.
SOLUTION: By inspecting the figure, we see that smaller values of x are associated with smaller values of y and larger values of x tend to be associated with larger values of y. Roughly speaking, the general trend seems to be reasonably well represented by an upward-sloping line segment, as shown in the diagram.

Of course, it is possible to draw many curves close to the points in Figure 9-1, but a straight line is the simplest and most widely used in elementary studies of paired data. We can draw many lines in Figure 9-1, but in some sense, the "best" line should be the one that comes closest to each of the points of the scatter diagram. To single out one line as the "best-fitting line," we must find a mathematical criterion for this line and a formula representing the line. This will be done in Section 9.2 using the method of least squares.

Another problem precedes that of finding the "best-fitting line." That is the problem of determining how well the points of the scatter diagram are suited for fitting any line. Certainly, if the points are a very poor fit to any line, there is little use in trying to find the "best" line.

If the points of a scatter diagram are located so that no line is realistically a "good" fit, we then say that the points possess no linear correlation. We see some examples of scatter diagrams for which there is no linear correlation in Figure 9-2.
[FIGURE 9-2 Scatter Diagrams with No Linear Correlation]

GUIDED EXERCISE 1 Scatter diagram
A large industrial plant has seven divisions that do the same type of work. A safety inspector visits each division of 20 workers quarterly. The number x of work-hours devoted to safety training and the number y of work-hours lost due to industry-related accidents are recorded for each separate division in Table 9-1.

TABLE 9-1 Safety Report
Division    x      y
1          10.0    80
2          19.5    65
3          30.0    68
4          45.0    55
5          50.0    35
6          65.0    10
7          80.0    12

(a) Make a scatter diagram for these pairs. Place the x values on the horizontal axis and the y values on the vertical axis.
[FIGURE 9-3 Scatter Diagram for Safety Report]

(b) As the number of hours spent on safety training increases, what happens to the number of hours lost due to industry-related accidents?
Answer: In general, as the number of hours in safety training goes up, the number of hours lost due to accidents goes down.

(c) Interpretation Does a line fit the data reasonably well?
Answer: A line fits reasonably well.

(d) Draw a line that you think "fits best."
Answer: Use a downward-sloping line that lies close to the points. Later, you will find the equation of the line that is a "best fit."

If the points seem close to a straight line, we say the linear correlation is moderate to high, depending on how close the points lie to the line. If all the points do, in fact, lie on a line, then we have perfect linear correlation. In Figure 9-4, we see some diagrams with perfect linear correlation. In statistical applications, perfect linear correlation almost never occurs.

[FIGURE 9-4 Scatter Diagrams with Moderate and Perfect Linear Correlation; panels (a) and (b): moderate linear correlation; panels (c) and (d): perfect linear correlation]

The variables x and y are said to have positive correlation if low values of x are associated with low values of y and high values of x are associated with high values of y. Figure 9-4 parts (a) and (c) show scatter diagrams in which the variables are positively correlated. On the other hand, if low values of x are associated with high values of y and high values of x are associated with low values of y, the variables are said to be negatively correlated. Figure 9-4 parts (b) and (d) show variables that are negatively correlated.

Scatter diagrams for the same data look different from one another when they are graphed using different scales. Problem 19 at the end of this section explores how changing scales affects the look of a scatter diagram.

GUIDED EXERCISE 2 Scatter diagram and linear correlation
Examine the scatter diagrams in Figure 9-5 and then answer the following questions.
[FIGURE 9-5 Scatter Diagrams; panels (a), (b), (c)]

(a) Which diagram has no linear correlation?
Answer: Figure 9-5(c) has no linear correlation. No straight-line fit should be attempted.

(b) Which has perfect linear correlation?
Answer: Figure 9-5(a) has perfect linear correlation and can be fitted exactly by a straight line.

(c) Which can be reasonably fitted by a straight line?
Answer: Figure 9-5(b) can be reasonably fitted by a straight line.

TECH NOTES
The TI-84Plus/TI-83Plus/TI-nspire calculators, Excel 2007, and Minitab all produce scatter plots.
For each technology, enter the x values in one column and the corresponding y values in another column. The displays show the data from Guided Exercise 1 regarding safety training and hours lost because of accidents. Notice that the scatter plots do not necessarily show the origin.

TI-84Plus/TI-83Plus/TI-nspire (with TI-84Plus keypad): Enter the data into two columns. Use Stat Plot and choose the first type. Use option 9: ZoomStat under Zoom. To check the scale, look at the settings displayed under Window.

Excel 2007: Enter the data into two columns. On the home screen, click the Insert tab. In the Chart Group, select Scatter and choose the first type. In the next ribbon, the Chart Layout Group offers options for including titles and axes labels. Right clicking on data points provides other options such as data labels. Changing the size of the diagram box changes the scale on the axes.

Minitab: Enter the data into two columns. Use the menu selections Stat > Regression > Fitted Line Plot. The best-fit line is automatically plotted on the scatter diagram.

[Displays: TI-84Plus/TI-83Plus/TI-nspire, Excel, and Minitab scatter plots of the Safety Report data; horizontal axis: Hours in Training; vertical axis: Hours Lost]

Sample Correlation Coefficient r
Looking at a scatter diagram to see whether a line best describes the relationship between the values of data pairs is useful.
In fact, whenever you are looking for a relationship between two variables, making a scatter diagram is a good first step.

There is a mathematical measurement that describes the strength of the linear association between two variables. This measure is the sample correlation coefficient r. The full name for r is the Pearson product-moment correlation coefficient, named in honor of the English statistician Karl Pearson (1857-1936), who is credited with formulating r.

The sample correlation coefficient r is a numerical measurement that assesses the strength of a linear relationship between two variables x and y.
1. r is a unitless measurement between -1 and 1. In symbols, $-1 \le r \le 1$. If r = 1, there is perfect positive linear correlation. If r = -1, there is perfect negative linear correlation. If r = 0, there is no linear correlation. The closer r is to 1 or -1, the better a line describes the relationship between the two variables x and y.
2. Positive values of r imply that as x increases, y tends to increase. Negative values of r imply that as x increases, y tends to decrease.
3. The value of r is the same regardless of which variable is the explanatory variable and which is the response variable. In other words, the value of r is the same for the pairs (x, y) and the corresponding pairs (y, x).
4. The value of r does not change when either variable is converted to different units.

We'll develop the defining formula for r and then give a more convenient computation formula.
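Properties 3 and 4 in the list above can be checked numerically. The sketch below is illustrative only: the helper name `pearson_r` is an assumption, it uses the standard deviation-based form of the Pearson coefficient that the text goes on to develop, and the unit conversion shown (hours to minutes) is a simple positive rescaling chosen for the example. The data are the Safety Report values from Table 9-1.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means.

    Hypothetical helper for illustration, not a library function.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Table 9-1: hours of safety training (x) and hours lost to accidents (y)
training_hours = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]
hours_lost = [80, 65, 68, 55, 35, 10, 12]

r_xy = pearson_r(training_hours, hours_lost)
r_yx = pearson_r(hours_lost, training_hours)      # property 3: swap roles

training_minutes = [60 * h for h in training_hours]
r_minutes = pearson_r(training_minutes, hours_lost)  # property 4: new units

print(r_xy, r_yx, r_minutes)
```

All three printed values agree up to floating-point rounding, and the common value is negative, matching the downward trend seen in Guided Exercise 1.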
Development of Formula for r
If there is a positive linear relation between variables x and y, then high values of x are paired with high values of y, and low values of x are paired with low values of y. [See Figure 9-6(a).] In the case of negative linear correlation, high values of x are paired with low values of y, and low values of x are paired with high values of y. This relation is pictured in Figure 9-6(b). If there is little or no linear correlation between x and y, however, then we will find both high and low x values sometimes paired with high y values and sometimes paired with low y values. This relation is shown in Figure 9-6(c).

These observations lead us to the development of the formula for the sample correlation coefficient r. Taking high to mean "above the mean," we can express the relationships pictured in Figure 9-6 by considering the products

\[ (x - \bar{x})(y - \bar{y}) \]

If both x and y are high, both factors will be positive, and the product will be positive as well. The sign of this product will depend on the relative values of x and y compared with their respective means.

$(x - \bar{x})(y - \bar{y})$
  is positive if x and y are both "high"
  is positive if x and y are both "low"
  is negative if x is "low," but y is "high"
  is negative if x is "high," but y is "low"

In the case of positive linear correlation, most of the products $(x - \bar{x})(y - \bar{y})$ will be positive, and so will the sum over all the data pairs

\[ \sum (x - \bar{x})(y - \bar{y}) \]

For negative linear correlation, the products will tend to be negative, so the sum also will be negative. On the other hand, in the case of little, if any, linear correlation, the sum will tend to be zero.

One trouble with the preceding sum is that it increases or decreases, depending on the units of x and y. Because we want r to be unitless, we standardize both x and y of a data pair by dividing each factor $(x - \bar{x})$ by the sample standard deviation $s_x$ and each factor $(y - \bar{y})$ by $s_y$.
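The sign argument and the units problem described above can be seen on data already introduced in this section. The following sketch is illustrative; the function name `cross_product_sum` is an assumption. It computes the sum of products of deviations for the Example 1 data (upward trend) and the Table 9-1 data (downward trend), then rescales x by a factor of 100 to show that the raw sum depends on units.

```python
import math


def cross_product_sum(xs, ys):
    """Sum of (x - x_bar)(y - y_bar) over all data pairs.

    Hypothetical helper for illustration.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))


# Example 1: concentration at inlet (x) and at outlet (y)
inlet = [5.2, 7.3, 6.7, 5.9, 6.1, 8.3, 5.5, 7.0]
outlet = [3.3, 5.9, 4.8, 4.5, 4.0, 7.1, 3.6, 6.1]

# Table 9-1: hours of safety training (x) and hours lost (y)
training = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]
lost = [80, 65, 68, 55, 35, 10, 12]

upward = cross_product_sum(inlet, outlet)      # positive for an upward trend
downward = cross_product_sum(training, lost)   # negative for a downward trend

# Expressing x in units 100 times smaller multiplies the sum by 100,
# which is why the factors are standardized before averaging.
rescaled = cross_product_sum([100 * x for x in inlet], outlet)

print(upward, downward, rescaled)
```

The sign of each sum matches the direction of the trend, while the size of the sum changes with the choice of units. Dividing by the standard deviations removes that dependence.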
Finally, we take an average of all the products. For technical reasons, we take the average by dividing by n - 1 instead of by n. This process leads us to the desired measurement, r.

\[ r = \frac{1}{n-1} \sum \frac{(x - \bar{x})}{s_x} \cdot \frac{(y - \bar{y})}{s_y} \qquad (1) \]

[FIGURE 9-6 Patterns for Linear Correlation]

Computation Formula for r
The defining formula for r shows how the mean and standard deviation of each variable in the data pair enter into the formulation of r. However, the defining formula is technically difficult to work with because of all the subtractions and products. A computation formula for r uses the raw data values of x and y directly.

PROCEDURE
HOW TO COMPUTE THE SAMPLE CORRELATION COEFFICIENT r

Requirements
Obtain a random sample of n data pairs (x, y). The data pairs should have a bivariate normal distribution. This means that for a fixed value of x, the y values should have a normal distribution (or at least a mound-shaped and symmetric distribution), and for a fixed y, the x values should have their own (approximately) normal distribution.

Procedure
1. Using the data pairs, compute $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, and $\sum xy$.
2. With n = sample size, $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, and $\sum xy$, you are ready to compute the sample correlation coefficient r using the computation formula

\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \qquad (2) \]

When x and y values of the data pairs are exchanged, the sample correlation coefficient r remains the same. Problem 20 explores this result.

Be careful!
The notation $\sum x^2$ means first square x and then calculate the sum, whereas $(\sum x)^2$ means first sum the x values and then square the result.

Interpretation It can be shown mathematically that r is always a number between -1 and 1 ($-1 \le r \le 1$). Table 9-2 gives a quick summary of some basic facts about r.

For most applications, you will use a calculator or computer software to compute r directly. However, to build some familiarity with the structure of the sample correlation coefficient, it is useful to do some calculations for yourself. Example 2 and Guided Exercise 3 show how to use the computation formula to compute r.

EXAMPLE 2 Computing r
Sand driven by wind creates large, beautiful dunes at the Great Sand Dunes National Monument, Colorado. Of course, the same natural forces also create large dunes in the Great Sahara and Arabia. Is there a linear correlation between wind velocity and sand drift rate? Let x be a random variable representing wind velocity (in 10 cm/sec) and let y be a random variable representing drift rate of sand (in 100 gm/cm/sec). A test site at the Great Sand Dunes National Monument gave the following information about x and y (Reference: Hydrologic, Geologic, and Biologic Research at Great Sand Dunes National Monument, Proceedings of the National Park Service Research Symposium).
TABLE 9-2 Some Facts about the Correlation Coefficient

If r is 0:
  There is no linear relation among the points of the scatter diagram.
If r is 1 or -1:
  There is a perfect linear relation between x and y values; all points lie on the least-squares line.
If r is between 0 and 1 (0 < r < 1):
  The x and y values have a positive correlation. By this, we mean that large x values are associated with large y values, and small x values are associated with small y values. As we go from left to right, the least-squares line goes up.
If r is between -1 and 0 (-1 < r < 0):
  The x and y values have a negative correlation. By this, we mean that large x values are associated with small y values, and small x values are associated with large y values. As we go from left to right, the least-squares line goes down.

x   70   115   105   82   93   125   88
y    3    45    21    7   16    62   12

(a) Construct a scatter diagram. Do you expect r to be positive?
SOLUTION: Figure 9-7 displays the scatter diagram. From the scatter diagram, it appears that as x values increase, y values also tend to increase. Therefore, r should be positive.

[FIGURE 9-7 Wind Velocity (10 cm/sec) and Drift Rate of Sand (100 gm/cm/sec); horizontal axis: wind speed x; vertical axis: drift rate y]
TABLE 9-3 Computation Table

| x | y | x² | y² | xy |
|---|---|---|---|---|
| 70 | 3 | 4900 | 9 | 210 |
| 115 | 45 | 13,225 | 2025 | 5175 |
| 105 | 21 | 11,025 | 441 | 2205 |
| 82 | 7 | 6724 | 49 | 574 |
| 93 | 16 | 8649 | 256 | 1488 |
| 125 | 62 | 15,625 | 3844 | 7750 |
| 88 | 12 | 7744 | 144 | 1056 |
| Σx = 678 | Σy = 166 | Σx² = 67,892 | Σy² = 6768 | Σxy = 18,458 |

(b) Compute r using the computation formula (formula 2).

SOLUTION: To find r, we need to compute $\Sigma x$, $\Sigma x^2$, $\Sigma y$, $\Sigma y^2$, and $\Sigma xy$. It is convenient to organize the data in a table of five columns (Table 9-3) and then sum the entries in each column. Of course, many calculators give these sums directly. Using the computation formula for r, the sums from Table 9-3, and n = 7, we have

$$r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{n\Sigma x^2 - (\Sigma x)^2}\,\sqrt{n\Sigma y^2 - (\Sigma y)^2}} \tag{2}$$

$$r = \frac{7(18{,}458) - (678)(166)}{\sqrt{7(67{,}892) - (678)^2}\,\sqrt{7(6768) - (166)^2}} \approx \frac{16{,}658}{(124.74)(140.78)} \approx 0.949$$

Note: Using a calculator to compute r directly gives 0.949 to three places after the decimal.

(c) Interpretation: What does the value of r tell you?

SOLUTION: Since r is very close to 1, we have an indication of a strong positive linear correlation between wind velocity and drift rate of sand. In other words, we expect that higher wind speeds tend to mean greater drift rates. Because r is so close to 1, the association between the variables appears to be linear.

Because it is quite a task to compute r for even seven data pairs, the use of columns as in Example 2 is extremely helpful. Your value for r should always be between −1 and 1, inclusive. Use a scatter diagram to get a rough idea of the value of r. If your computed value of r is outside the allowable range, or if it disagrees quite a bit with the scatter diagram, recheck your calculations. Be sure you distinguish between expressions such as $(\Sigma x^2)$ and $(\Sigma x)^2$. Negligible rounding errors may occur, depending on how you (or your calculator) round.

GUIDED EXERCISE 3 Computing r

In one of the Boston city parks, there has been a problem with muggings in the summer months.
A police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

| x | 10 | 15 | 16 | 1 | 4 | 6 | 18 | 12 | 14 | 7 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 5 | 2 | 1 | 9 | 7 | 8 | 1 | 5 | 3 | 6 |

(a) Construct a scatter diagram of x and y values.

Figure 9-8 shows the scatter diagram.

[FIGURE 9-8 Scatter Diagram for Number of Police Officers versus Number of Muggings. Horizontal axis: police officers x; vertical axis: muggings y.]

(b) From the scatter diagram, do you think the computed value of r will be positive, negative, or zero? Explain.

r will be negative. The general trend is that large x values are associated with small y values, and vice versa. From left to right, the least-squares line goes down.

(c) Complete Table 9-4.

TABLE 9-4

| x | y | x² | y² | xy |
|---|---|---|---|---|
| 10 | 5 | 100 | 25 | 50 |
| 15 | 2 | 225 | 4 | 30 |
| 16 | 1 | 256 | 1 | 16 |
| 1 | 9 | 1 | 81 | 9 |
| 4 | 7 | 16 | 49 | 28 |
| 6 | 8 | ___ | ___ | ___ |
| 18 | 1 | ___ | ___ | ___ |
| 12 | 5 | ___ | ___ | ___ |
| 14 | 3 | ___ | ___ | ___ |
| 7 | 6 | 49 | 36 | 42 |
| Σx = 103 | Σy = 47 | Σx² = ___ | Σy² = ___ | Σxy = ___ |

(Σx)² = ___, (Σy)² = ___

TABLE 9-5 Completion of Table 9-4

| x | y | x² | y² | xy |
|---|---|---|---|---|
| 6 | 8 | 36 | 64 | 48 |
| 18 | 1 | 324 | 1 | 18 |
| 12 | 5 | 144 | 25 | 60 |
| 14 | 3 | 196 | 9 | 42 |

Σx² = 1347, Σy² = 295, Σxy = 343, (Σx)² = 10,609, (Σy)² = 2209

(d) Compute r. Alternatively, find the value of r directly by using a calculator or computer software.

$$r = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{n\Sigma x^2 - (\Sigma x)^2}\,\sqrt{n\Sigma y^2 - (\Sigma y)^2}} = \frac{10(343) - (103)(47)}{\sqrt{10(1347) - (103)^2}\,\sqrt{10(295) - (47)^2}} \approx \frac{-1411}{(53.49)(27.22)} \approx -0.969$$

(e) Interpretation: What does the value of r tell you about the relationship between the number of police officers and the number of muggings in the park?

There is a strong negative linear relationship between the number of police officers and the number of muggings. It seems that the more officers there are in the park, the fewer the number of muggings.

LOOKING FORWARD

If the scatter diagram and the value of the sample correlation coefficient r indicate a linear relationship between the data pairs, how do we find a suitable linear equation for the data? This process, called linear regression, is presented in the next section, Section 9.2.

TECH NOTES

Most calculators that support two-variable statistics provide the value of the sample correlation coefficient r directly. Statistical software provides r, r², or both.

TI-84Plus/TI-83Plus/TI-nspire (with TI-84Plus keypad): First use CATALOG, find DiagnosticOn, and press Enter twice. Then, when you use STAT, CALC, option 8:LinReg(a + bx), the value of r will be given (data from Example 2). In the next section, we will discuss the line y = a + bx and the meaning of r².

Excel 2007: Excel gives the value of the sample correlation coefficient r in several outputs. One way to find the value of r is to click the Insert function fx. Then in the dialogue box, select Statistical for the category and Correl for the function.
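Python (a sketch, not one of the technologies the text lists): the computation formula used in Example 2 and Guided Exercise 3 can also be checked with a few lines of code. The function name `pearson_r` and the variable names below are illustrative choices, not part of the text; only the data values and formula (2) come from the section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r via the computation formula (2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)              # sum of squares, not square of sum
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Example 2: wind velocity x and drift rate of sand y
wind = [70, 115, 105, 82, 93, 125, 88]
drift = [3, 45, 21, 7, 16, 62, 12]
r_sand = pearson_r(wind, drift)

# Guided Exercise 3: police officers x and reported muggings y
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]
r_park = pearson_r(officers, muggings)

print(round(r_sand, 3))  # 0.949
print(round(r_park, 3))  # -0.969
```

The two printed values agree with the hand computations above. Keeping the five sums as separate named variables mirrors the columns of the computation tables, which makes it easy to compare intermediate results with a hand-filled table.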
Minitab: Use the menu selection Stat > Basic Statistics > Correlation.

LOOKING FORWARD

When the data are ranks (without ties) instead of measurements, the Pearson product-moment correlation coefficient can be reduced to a simpler equation called the Spearman rank correlation coefficient. This coefficient is used with nonparametric methods and is discussed in Section 11.3.

CRITICAL THINKING Cautions about Correlation

The correlation coefficient can be thought of as a measure of how well a linear model fits the data points on a scatter diagram. The closer r is to 1 or −1, the better a line "fits" the data. Values of r close to 0 indicate a poor fit to any line.

Sample correlation compared to population correlation

Usually a scatter diagram does not contain all possible data points that could be gathered. Most scatter diagrams represent only a random sample of data pairs taken from a very large population of all possible pairs. Because r is computed on the basis of a random sample of (x, y) pairs, we expect the values of r to vary from one sample to the next (much as the sample mean $\bar{x}$ varies from sample to sample). This brings up the question of the significance of r. Or, put another way, what are the chances that our random sample of data pairs indicates a high correlation when, in fact, the population's x and y values are not so strongly correlated? Right now, let's just say that the significance of r is a separate issue that will be treated in Section 9.3, where we test the population correlation coefficient $\rho$ (Greek letter rho, pronounced "row").
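The sample-to-sample variation of r described above can be seen in a small simulation. The following is only a sketch under stated assumptions: the population is hypothetical (y = 2x plus random noise), and the seed, population size, noise level, and names such as `pearson_r` and `sample_rs` are arbitrary illustrative choices, not taken from the text.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r via the computation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


random.seed(1)  # fixed seed so repeated runs give the same samples

# Hypothetical population of 1000 (x, y) pairs: linear trend plus noise.
population = []
for _ in range(1000):
    x = random.uniform(0, 10)
    y = 2 * x + random.gauss(0, 4)
    population.append((x, y))

# Draw five random samples of n = 7 data pairs and compute r for each.
sample_rs = []
for _ in range(5):
    sample = random.sample(population, 7)
    xs = [pair[0] for pair in sample]
    ys = [pair[1] for pair in sample]
    sample_rs.append(pearson_r(xs, ys))

for r in sample_rs:
    print(round(r, 3))
```

Each printed r comes from the same population, yet the values differ from sample to sample. This is the reason the significance of r has to be addressed separately.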
Problem 21 demonstrates an informal process for determining whether or not r is significant. Problem 22 explores the effect of sample size on the significance of r. Problem 24 uses the value of $\rho$ between two dependent variables to find the mean and standard deviation of a linear combination of the two variables.

r = sample correlation coefficient computed from a random sample of (x, y) data pairs.
$\rho$ = population correlation coefficient computed from all population data pairs (x, y).

There is a less formal way to address the significance of r using a table of "critical values" or "cut-off values" based on the r distribution and the number of data pairs. Problem 21 at the end of this section discusses this method.

Extrapolation

The value of the sample correlation coefficient r and the strength of the linear relationship between variables is computed based on the sample data. The situation may change for measurements larger than or smaller than the data values included in the sample. For instance, for infants, there may be a high positive correlation between age in months and weight. However, that correlation might not apply for people ages 20 to 30 years.

Causation

The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect. The fact that two variables tend to increase or decrease together does not mean that a change in one is causing a change in the other.

Lurking variables

A strong correlation between x and y is sometimes due to other (either known or unknown) variables. Such variables are called lurking variables. In ordered pairs (x, y), x is called the explanatory variable and y is called the response variable.
When r indicates a linear correlation between x and y, changes in values of y tend to respond to changes in values of x according to a linear model. A lurking variable is a variable that is neither an explanatory nor a response variable. Yet, a lurking variable may be responsible for changes in both x and y.

EXAMPLE 3 Causation and lurking variables

Over a period of years, the population of a certain town increased. It was observed that during this period the correlation between x, the number of people attending church, and y, the number of people in the city jail, was r = 0.90. Does going to church cause people to go to jail? Is there a lurking variable that might cause both variables x and y to increase?

SOLUTION: We hope church attendance does not cause people to go to jail! During this period, there was an increase in population. Therefore, it is not too surprising that both the number of people attending church and the number of people in jail increased. The high correlation between x and y is likely due to the lurking variable of population increase.

Correlation between averages

The correlation between two variables consisting of averages is usually higher than the correlation between two variables representing corresponding raw data. One reason is that the use of averages reduces the variation that exists between individual measurements (see Section 6.5 and the central limit theorem). A high correlation based on two variables consisting of averages does not necessarily imply a high correlation between two variables consisting of individual measurements. Problem 23 at the end of this section explores the correlation of averages.
VIEWPOINT Low on Credit, High on Cost!!!

How do you measure automobile insurance risk? One way is to use a little statistics and customer credit ratings. Insurers say statistics show that drivers who have a history of bad credit are more likely to be in serious car accidents. According to a high-level executive at Allstate Insurance Company, financial instability is an extremely powerful predictor of future insurance losses. In short, there seems to be a strong correlation between bad credit ratings and auto insurance claims. Consequently, insurance companies want to charge higher premiums to customers with bad credit ratings. Consumer advocates object strongly because they say bad credit does not cause automobile accidents, and more than 20 states prohibit or restrict the use of credit ratings to determine auto insurance premiums. Insurance companies respond by saying that your best defense is to pay your bills on time!

SECTION 9.1 PROBLEMS

Note: Answers may vary due to rounding.

1. Statistical Literacy When drawing a scatter diagram, along which axis is the explanatory variable placed? Along which axis is the response variable placed?

2. Statistical Literacy Suppose two variables are positively correlated. Does the response variable increase or decrease as the explanatory variable increases?

3. Statistical Literacy Suppose two variables are negatively correlated. Does the response variable increase or decrease as the explanatory variable increases?

4. Statistical Literacy Describe the relationship between two variables when the correlation coefficient r is
(a) near −1.
(b) near 0.
(c) near 1.

5. Critical Thinking: Linear Correlation Look at the following diagrams.
Does each diagram show high linear correlation, moderate or low linear correlation, or no linear correlation? [Diagrams not reproduced.]

6. Critical Thinking: Linear Correlation Look at the following diagrams. Does each diagram show high linear correlation, moderate or low linear correlation, or no linear correlation? [Diagrams not reproduced.]

7. Critical Thinking: Lurking Variables Over the past few years, there has been a strong positive correlation between the annual consumption of diet soda drinks and the number of traffic accidents.
(a) Do you think increasing consumption of diet soda drinks causes traffic accidents? Explain.
(b) What lurking variables might be causing the increase in one or both of the variables? Explain.

8. Critical Thinking: Lurking Variables Over the past decade, there has been a strong positive correlation between teacher salaries and prescription drug costs.
(a) Do you think paying teachers more causes prescription drugs to cost more? Explain.
(b) What lurking variables might be causing the increase in one or both of the variables? Explain.

9. Critical Thinking: Lurking Variables Over the past 50 years, there has been a strong negative correlation between average annual income and the record time to run 1 mile. In other words, average annual incomes have been rising while the record time to run 1 mile has been decreasing.
(a) Do you think increasing incomes cause decreasing times to run the mile? Explain.
(b) What lurking variables might be causing the increase in one or both of the variables? Explain.

10. Critical Thinking: Lurking Variables Over the past 30 years in the United States, there has been a strong negative correlation between the number of infant deaths at birth and the number of people over age 65.
(a) Is the fact that people are living longer causing a decrease in infant mortalities at birth?
(b) What lurking variables might be causing the increase in one or both of the variables? Explain.

11. Interpretation Trevor conducted a study and found that the correlation between the price of a gallon of gasoline and gasoline consumption has a linear correlation coefficient of −0.7. What does this result say about the relationship between price of gasoline and consumption? The study included gasoline prices ranging from $2.70 to $5.30 per gallon. Is it reliable to apply the results of this study to prices of gasoline higher than $5.30 per gallon? Explain.

12. Interpretation Do people who spend more time on social networking sites spend more time using Twitter? Megan conducted a study and found that the correlation between the times spent on the two activities was 0.8. What does this result say about the relationship between times spent on the two activities? If someone spends more time than average on a social networking site, can you automatically conclude that he or she spends more time than average using Twitter? Explain.

13. Veterinary Science: Shetland Ponies How much should a healthy Shetland pony weigh? Let x be the age of the pony (in months), and let y be the average weight of the pony (in kilograms). The following information is based on data taken from The Merck Veterinary Manual (a reference used in most veterinary colleges).

| x | 3 | 6 | 12 | 18 | 24 |
|---|---|---|---|---|---|
| y | 60 | 95 | 140 | 170 | 185 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or strong? positive or negative?
(c) Use a calculator to verify that $\Sigma x = 63$, $\Sigma x^2 = 1089$, $\Sigma y = 650$, $\Sigma y^2 = 95{,}350$, and $\Sigma xy = 9930$. Compute r. As x increases from 3 to 24 months, does the value of r imply that y should tend to increase or decrease? Explain.

14. Health Insurance: Administrative Cost The following data are based on information from Domestic Affairs. Let x be the average number of employees in a group health insurance plan, and let y be the average administrative cost as a percentage of claims.

| x | 3 | 7 | 15 | 35 | 75 |
|---|---|---|---|---|---|
| y | 40 | 35 | 30 | 25 | 18 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or strong? positive or negative?
(c) Use a calculator to verify that $\Sigma x = 135$, $\Sigma x^2 = 7133$, $\Sigma y = 148$, $\Sigma y^2 = 4674$, and $\Sigma xy = 3040$. Compute r. As x increases from 3 to 75, does the value of r imply that y should tend to increase or decrease? Explain.

15. Meteorology: Cyclones Can a low barometer reading be used to predict maximum wind speed of an approaching tropical cyclone? Data for this problem are based on information taken from Weatherwise (Vol. 46, No. 1), a publication of the American Meteorological Society. For a random sample of tropical cyclones, let x be the lowest pressure (in millibars) as a cyclone approaches, and let y be the maximum wind speed (in miles per hour) of the cyclone.

| x | 1004 | 975 | 992 | 935 | 985 | 932 |
|---|---|---|---|---|---|---|
| y | 40 | 100 | 65 | 145 | 80 | 150 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or strong? positive or negative?
(c) Use a calculator to verify that $\Sigma x = 5823$, $\Sigma x^2 = 5{,}655{,}779$, $\Sigma y = 580$, $\Sigma y^2 = 65{,}750$, and $\Sigma xy = 556{,}315$. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

16. Geology: Earthquakes Is the magnitude of an earthquake related to the depth below the surface at which the quake occurs? Let x be the magnitude of an earthquake (on the Richter scale), and let y be the depth (in kilometers) of the quake below the surface at the epicenter. The following is based on information taken from the National Earthquake Information Service of the U.S. Geological Survey. Additional data may be found by visiting the Brase/Brase statistics site at http://www.cengage.com/statistics/brase and finding the link to Earthquakes.

| x | 2.9 | 4.2 | 3.3 | 4.5 | 2.6 | 3.2 | 3.4 |
|---|---|---|---|---|---|---|---|
| y | 5.0 | 10.0 | 11.2 | 10.0 | 7.9 | 3.9 | 5.5 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or strong? positive or negative?
(c) Use a calculator to verify that $\Sigma x = 24.1$, $\Sigma x^2 = 85.75$, $\Sigma y = 53.5$, $\Sigma y^2 = 458.31$, and $\Sigma xy = 190.18$. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

17. Baseball: Batting Averages and Home Runs In baseball, is there a linear correlation between batting average and home run percentage? Let x represent the batting average of a professional baseball player, and let y represent the player's home run percentage (number of home runs per 100 times at bat). A random sample of n = 7 professional baseball players gave the following information (Reference: The Baseball Encyclopedia, Macmillan Publishing Company).
| x | 0.243 | 0.259 | 0.286 | 0.263 | 0.268 | 0.339 | 0.299 |
|---|---|---|---|---|---|---|---|
| y | 1.4 | 3.6 | 5.5 | 3.8 | 3.5 | 7.3 | 5.0 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or high? positive or negative?
(c) Use a calculator to verify that $\Sigma x = 1.957$, $\Sigma x^2 \approx 0.553$, $\Sigma y = 30.1$, $\Sigma y^2 = 150.15$, and $\Sigma xy \approx 8.753$. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

18. University Crime: FBI Report Do larger universities tend to have more property crime? University crime statistics are affected by a variety of factors. The surrounding community, accessibility given to outside visitors, and many other factors influence crime rates. Let x be a variable that represents student enrollment (in thousands) on a university campus, and let y be a variable that represents the number of burglaries in a year on the university campus. A random sample of n = 8 universities in California gave the following information about enrollments and annual burglary incidents (Reference: Crime in the United States, Federal Bureau of Investigation).

| x | 12.5 | 30.0 | 24.5 | 14.3 | 7.5 | 27.7 | 16.2 | 20.1 |
|---|---|---|---|---|---|---|---|---|
| y | 26 | 73 | 39 | 23 | 15 | 30 | 15 | 25 |

(a) Make a scatter diagram and draw the line you think best fits the data.
(b) Would you say the correlation is low, moderate, or high? positive or negative?
(c) Using a calculator, verify that $\Sigma x = 152.8$, $\Sigma x^2 = 3350.98$, $\Sigma y = 246$, $\Sigma y^2 = 10{,}030$, and $\Sigma xy = 5488.4$. Compute r. As x increases, does the value of r imply that y should tend to increase or decrease? Explain.

19. Expand Your Knowledge: Effect of Scale on Scatter Diagram The initial visual impact of a scatter diagram depends on the scales used on the x and y axes.
Consider the following data:

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| y | 1 | 4 | 6 | 3 | 6 | 7 |

(a) Make a scatter diagram using the same scale on both the x and y axes (i.e., make sure the unit lengths on the two axes are equal).
(b) Make a scatter diagram using a scale on the y axis that is twice as long as that on the x axis.
(c) Make a scatter diagram using a scale on the y axis that is half as long as that on the x axis.
(d) On each of the three graphs, draw the straight line that you think best fits the data points. How do the slopes (or directions) of the three lines appear to change? Note: The actual slopes will be the same; they just appear different because of the choice of scale factors.

20. Expand Your Knowledge: Effect on r of Exchanging x and y Values Examine the computation formula for r, the sample correlation coefficient [formulas (1) and (2) of this section].
(a) In the formula for r, if we exchange the symbols x and y, do we get a different result or do we get the same (equivalent) result? Explain.
(b) If we have a set of x and y data values and we exchange corresponding x and y values to get a new data set, should the sample correlation coefficient be the same for both sets of data? Explain.
(c) Compute the sample correlation coefficient r for each of the following data sets and show that r is the same for both.

First data set:

| x | 1 | 3 | 4 |
|---|---|---|---|
| y | 2 | 1 | 6 |

Second data set:

| x | 2 | 1 | 6 |
|---|---|---|---|
| y | 1 | 3 | 4 |

21. Expand Your Knowledge: Using a Table to Test $\rho$ The correlation coefficient r is a sample statistic.
What does it tell us about the value of the population correlation coefficient $\rho$ (Greek letter rho)? We will build the formal structure of hypothesis tests of $\rho$ in Section 9.3. However, there is a quick way to determine if the sample evidence based on r is strong enough to conclude that there is some population correlation between the variables. In other words, we can use the value of r to determine if $\rho \ne 0$. We do this by comparing the value $\lvert r \rvert$ to an entry in Table 9-6. The value of $\alpha$ in the table gives us the probability of concluding that $\rho \ne 0$ when, in fact, $\rho = 0$ and there is no population correlation. We have two choices for $\alpha$: $\alpha = 0.05$ or $\alpha = 0.01$.

PROCEDURE: HOW TO USE TABLE 9-6 TO TEST $\rho$

1. First compute r from a random sample of n data pairs (x, y).
2. Find the table entry in the row headed by n and the column headed by your choice of $\alpha$. Your choice of $\alpha$ is the risk you are willing to take of mistakenly concluding that $\rho \ne 0$ when, in fact, $\rho = 0$.
3. Compare $\lvert r \rvert$ to the table entry.
(a) If $\lvert r \rvert \ge$ table entry, then there is sufficient evidence to conclude that $\rho \ne 0$, and we say that r is significant. In other words, we conclude that there is some population correlation between the two variables x and y.
(b) If $\lvert r \rvert <$ table entry, then the evidence is insufficient to conclude that $\rho \ne 0$, and we say that r is not significant. We do not have enough evidence to conclude that there is any correlation between the two variables x and y.

TABLE 9-6 Critical Values for Correlation Coefficient r

| n | α = 0.05 | α = 0.01 | n | α = 0.05 | α = 0.01 | n | α = 0.05 | α = 0.01 |
|---|---|---|---|---|---|---|---|---|
| 3 | 1.00 | 1.00 | 13 | 0.53 | 0.68 | 23 | 0.41 | 0.53 |
| 4 | 0.95 | 0.99 | 14 | 0.53 | 0.66 | 24 | 0.40 | 0.52 |
| 5 | 0.88 | 0.96 | 15 | 0.51 | 0.64 | 25 | 0.40 | 0.51 |
| 6 | 0.81 | 0.92 | 16 | 0.50 | 0.61 | 26 | 0.39 | 0.50 |
| 7 | 0.75 | 0.87 | 17 | 0.48 | 0.61 | 27 | 0.38 | 0.49 |
| 8 | 0.71 | 0.83 | 18 | 0.47 | 0.59 | 28 | 0.37 | 0.48 |
| 9 | 0.67 | 0.80 | 19 | 0.46 | 0.58 | 29 | 0.37 | 0.47 |
| 10 | 0.63 | 0.76 | 20 | 0.44 | 0.56 | 30 | 0.36 | 0.46 |
| 11 | 0.60 | 0.73 | 21 | 0.43 | 0.55 | | | |
| 12 | 0.58 | 0.71 | 22 | 0.42 | 0.54 | | | |

(a) Look at Problem 13 regarding the variables x = age of a Shetland pony and y = weight of that pony.
Is the value of $\lvert r \rvert$ large enough to conclude that weight and age of Shetland ponies are correlated? Use $\alpha = 0.05$.
(b) Look at Problem 15 regarding the variables x = lowest barometric pressure as a cyclone approaches and y = maximum wind speed of the cyclone. Is the value of $\lvert r \rvert$ large enough to conclude that lowest barometric pressure and wind speed of a cyclone are correlated? Use $\alpha = 0.01$.

22. Expand Your Knowledge: Sample Size and Significance of Correlation In this problem, we use Table 9-6 to explore the significance of r based on different sample sizes. See Problem 21.
(a) Is a sample correlation coefficient r = 0.820 significant at the $\alpha = 0.01$ level based on a sample size of n = 7 data pairs? What about n = 9 data pairs?
(b) Is a sample correlation coefficient r = 0.40 significant at the $\alpha = 0.05$ level based on a sample size of n = 20 data pairs? What about n = 27 data pairs?
(c) Is it true that in order to be significant, an r value must be larger than 0.90? larger than 0.70? larger than 0.50? What does sample size have to do with the significance of r? Explain.

23. Expand Your Knowledge: Correlation of Averages Fuming because you are stuck in traffic? Roadway congestion is a costly item, in both time wasted and fuel wasted. Let x represent the average annual hours per person spent in traffic delays and let y represent the average annual gallons of fuel wasted per person in traffic delays.
A random sample of eight cities showed the following data (Reference: Statistical Abstract of the United States, 122nd Edition).

| x (hr) | 28 | 5 | 20 | 35 | 20 | 23 | 18 | 5 |
|---|---|---|---|---|---|---|---|---|
| y (gal) | 48 | 3 | 34 | 55 | 34 | 38 | 28 | 9 |

(a) Draw a scatter diagram for the data. Verify that $\Sigma x = 154$, $\Sigma x^2 = 3712$, $\Sigma y = 249$, $\Sigma y^2 = 9959$, and $\Sigma xy = 6067$. Compute r.

The data in part (a) represent average annual hours lost per person and average annual gallons of fuel wasted per person in traffic delays. Suppose that instead of using average data for different cities, you selected one person at random from each city and measured the annual number of hours lost x for that person and the annual gallons of fuel wasted y for the same person.

| x (hr) | 20 | 4 | 18 | 42 | 15 | 25 | 2 | 35 |
|---|---|---|---|---|---|---|---|---|
| y (gal) | 60 | 8 | 12 | 50 | 21 | 30 | 4 | 70 |

(b) Compute $\bar{x}$ and $\bar{y}$ for both sets of data pairs and compare the averages. Compute the sample standard deviations $s_x$ and $s_y$ for both sets of data pairs and compare the standard deviations. In which set are the standard deviations for x and y larger? Look at the defining formula for r, Equation 1. Why do smaller standard deviations $s_x$ and $s_y$ tend to increase the value of r?
(c) Make a scatter diagram for the second set of data pairs. Verify that $\Sigma x = 161$, $\Sigma x^2 = 4583$, $\Sigma y = 255$, $\Sigma y^2 = 12{,}565$, and $\Sigma xy = 7071$. Compute r.
(d) Compare r from part (a) with r from part (c). Do the data for averages have a higher correlation coefficient than the data for individual measurements? List some reasons why you think hours lost per individual and fuel wasted per individual might vary more than the same quantities averaged over all the people in a city.

24. Expand Your Knowledge: Dependent Variables In Section 5.1, we studied linear combinations of independent random variables. What happens if the variables are not independent? A lot of mathematics can be used to prove the following:
Let x and y be random variables with means $\mu_x$ and $\mu_y$, variances $\sigma_x^2$ and $\sigma_y^2$, and population correlation coefficient $\rho$ (the Greek letter rho). Let a and b be any constants and let $w = ax + by$. Then,

$$\mu_w = a\mu_x + b\mu_y$$

$$\sigma_w^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\sigma_x\sigma_y\rho$$

In this formula, $\rho$ is the population correlation coefficient, theoretically computed using the population of all (x, y) data pairs. The expression $\sigma_x\sigma_y\rho$ is called the covariance of x and y. If x and y are independent, then $\rho = 0$ and the formula for $\sigma_w^2$ reduces to the appropriate formula for independent variables (see Section 5.1). In most real-world applications, the population parameters are not known, so we use sample estimates with the understanding that our conclusions are also estimates.

Do you have to be rich to invest in bonds and real estate? No, mutual fund shares are available to you even if you aren't rich. Let x represent annual percentage return (after expenses) on the Vanguard Total Bond Index Fund, and let y represent annual percentage return on the Fidelity Real Estate Investment Fund. Over a long period of time, we have the following population estimates (based on Morningstar Mutual Fund Report).

$$\mu_x \approx 7.32 \qquad \sigma_x \approx 6.59 \qquad \mu_y \approx 13.19 \qquad \sigma_y \approx 18.56 \qquad \rho \approx 0.424$$

(a) Do you think the variables x and y are independent? Explain.
(b) Suppose you decide to put 60% of your investment in bonds and 40% in real estate. This means you will use a weighted average $w = 0.6x + 0.4y$. Estimate your expected percentage return $\mu_w$ and risk $\sigma_w$.
(c) Repeat part (b) if $w = 0.4x + 0.6y$.
(d) Compare your results in parts (b) and (c). Which investment has the higher expected return?
Which has the greater risk as measured by $\sigma_w$?

SECTION 9.2 Linear Regression and the Coefficient of Determination

FOCUS POINTS State the least-squares criteri

