
Question

1 Approved Answer

The best definition of nonparametric statistics I can think of is this: when data do not fit the assumptions of a normal distribution, a nonparametric method is called for. Simply put, nonparametric tests are used when the data cannot reasonably be represented by a parametric distribution. If you are trying to conduct a parametric test but your sample size is too small, that would likely call for a nonparametric test instead. Another reason to use a nonparametric test is when the data contain outliers and you are trying to rank and order ("rack and stack") the data. Some nonparametric tests also let you find an exact p-value from the sample, which parametric tests generally do not.

However, it is safe to assume that most people are more familiar with parametric tests, since those are typically used with larger samples, where the results are likely to be more accurate. The first nonparametric test that comes to mind is the sign test, which tests the null hypothesis that the median of a distribution is equal to a specified value. It can also be referred to as a distribution-free test. The best example I can give of a nonparametric approach is data ranking, since ranking is one of the main things nonparametric tests do, especially if the data are skewed or have outliers. The ranked data could be, for example, movie ratings or test scores summarized in a histogram. Best of luck to you all! Hope everyone passes the course!

1) Nonparametric statistics is where you don't have to assume a particular distribution; it works more like a ranking, for example when comparing two things.
2) I think it would be an appropriate choice here, because you only need an average number of offspring, and you could rank the values rather than give exact numbers, especially since you can't be out there to count every offspring a whale has.
3) If I were to choose a specific type of test, I would use the Spearman rank correlation. I chose this because I don't think you could count every offspring from a whale unless you were there every day. I feel that a ranked number would work just as well as a specific number. Then you could compare 50 years ago to the average you get now. I'm not quite sure how researchers count every offspring unless they're constantly out there watching, which might get a tad boring. I personally don't feel that global warming has affected the whales whatsoever.
4) Something I could use in my daily life might be how many upset tax clients we have versus how many are satisfied. I get tons of calls each day from people who are either griping or just peachy. We could rank the calls and figure out what we might need to change in the office.

Instructor's Annotated Edition
TENTH EDITION
Understandable Statistics: Concepts and Methods
Charles Henry Brase, Regis University
Corrinne Pellillo Brase, Arapahoe Community College

Australia • Brazil • Japan • Korea • Mexico • Singapore • Spain • United Kingdom • United States

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. The publisher reserves the right to remove content from this title at any time if subsequent rights restrictions require it.
For valuable information on pricing, previous editions, changes to current editions, and alternate formats, please visit www.cengage.com/highered to search by ISBN#, author, title, or keyword for materials in your areas of interest.

This book is dedicated to the memory of a great teacher, mathematician, and friend
Burton W. Jones
Professor Emeritus, University of Colorado

Understandable Statistics: Concepts and Methods, Tenth Edition
Charles Henry Brase, Corrinne Pellillo Brase

Editor in Chief: Michelle Julet
Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Senior Editorial Assistant: Shaylin Walsh
Media Editor: Andrew Coppola
Marketing Manager: Ashley Pickering
Marketing Communications Manager: Mary Anne Payumo
Content Project Manager: Jill Clark
Art Director: Linda Helcher
Senior Manufacturing Buyer: Diane Gibbons
Senior Rights Acquisition Specialist, Text: Katie Huha
Rights Acquisition Specialist, Images: Mandy Groszko
Text Permissions Editor: Sue Howard
Production Service: Elm Street Publishing Services
Cover Designer: RHDG
Cover Image: Anup Shah
Compositor: Integra Software Services, Ltd. Pvt.

© 2012, 2009, 2006 Brooks/Cole, Cengage Learning

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706. For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2009942998
Student Edition: ISBN-13: 978-0-8400-4838-7, ISBN-10: 0-8400-4838-6
Annotated Instructor's Edition: ISBN-13: 978-0-8400-5456-2, ISBN-10: 0-8400-5456-4

Brooks/Cole
20 Channel Center Street
Boston, MA 02210
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil and Japan. Locate your local office at international.cengage.com/region

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

For your course and learning solutions, visit www.cengage.com. Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com.

Printed in the United States of America 1 2 3 4 5 6 7 14 13 12 11 10
PART I: Inferences Using the Chi-Square Distribution
    Overview of the Chi-Square Distribution
    10.1 Chi-Square: Tests of Independence and of Homogeneity
    10.2 Chi-Square: Goodness of Fit
    10.3 Testing and Estimating a Single Variance or Standard Deviation
PART II: Inferences Using the F Distribution
    Overview of the F Distribution
    10.4 Testing Two Variances
    10.5 One-Way ANOVA: Comparing Several Sample Means
    10.6 Introduction to Two-Way ANOVA

"So what!" (Anonymous)

We have all heard the exclamation, "So what!" Philologists (people who study cultural linguistics) tell us that this expression is a shortened version of "So what is the difference!" They also tell us that there are similar popular or slang expressions about differences in all languages and cultures. Norman Rockwell (1894-1978) painted everyday people and situations. In this cover, "Girl with Black Eye," for the Saturday Evening Post (May 23, 1953), a young lady is about to have a conference with her school principal. So what! It is human nature to challenge the claim that something is better, worse, or just simply different. In this chapter, we will focus on this very human theme by studying a variety of topics regarding questions of whether or not differences exist between two population variances or among several population means.

For online student resources, visit the Brase/Brase, Understandable Statistics, 10th edition web site at http://www.cengage.com/statistics/brase.
Chi-Square and F Distributions

How do you decide if random variables are dependent or independent? (SECTION 10.1)
How do you decide if different populations share the same proportions of specified characteristics? (SECTION 10.1)
How do you decide if two distributions are not only dependent, but actually the same distribution? (SECTION 10.2)
How do you compute confidence intervals and tests for $\sigma$? (SECTION 10.3)
How do you test two variances $\sigma_1^2$ and $\sigma_2^2$? (SECTION 10.4)
What is one-way ANOVA? Where is it used? (SECTION 10.5)
What about two-way ANOVA? Where is it used? (SECTION 10.6)

FOCUS PROBLEM
Archaeology in Bandelier National Monument

Archaeologists at Washington State University did an extensive summer excavation at Burnt Mesa Pueblo in Bandelier National Monument. Their work is published in the book Bandelier Archaeological Excavation Project: Summer 1990 Excavations at Burnt Mesa Pueblo and Casa del Rito, edited by T. A. Kohler. One question the archaeologists asked was: Is raw material used by prehistoric Indians for stone tool manufacture independent of the archaeological excavation site? Two different excavation sites at Burnt Mesa Pueblo gave the information in the table below. Use a chi-square test with 5% level of significance to test the claim that raw material used for construction of stone tools and excavation site are independent. (See Problem 17 of Section 10.1.)

Stone Tool Construction Material, Burnt Mesa Pueblo

Material          Site A   Site B   Row Total
Basalt               731      584        1315
Obsidian             102       93         195
Pedernal chert       510      525        1035
Other                 85       94         179
Column Total        1428     1296        2724

PART I: INFERENCES USING THE CHI-SQUARE DISTRIBUTION

Overview of the Chi-Square Distribution

So far, we have used several probability distributions for hypothesis testing and confidence intervals, with the most frequently used being the normal distribution and the Student's t distribution. In this chapter, we will use two other probability distributions, namely, the chi-square distribution (where chi is pronounced like the first two letters in the word kite) and the F distribution. In Part I, we will see applications of the chi-square distribution, whereas in Part II, we will see some important applications of the F distribution.

Chi is a Greek letter denoted by the symbol $\chi$, so chi-square is denoted by the symbol $\chi^2$. Because the distribution is of chi-square values, the $\chi^2$ values begin at 0 and then are all positive. The graph of the $\chi^2$ distribution is not symmetrical, and like the Student's t distribution, it depends on the number of degrees of freedom. Figure 10-1 shows $\chi^2$ distributions for several degrees of freedom (d.f.).

As the degrees of freedom increase, the graph of the chi-square distribution becomes more bell-like and begins to look more and more symmetric. The mode (high point) of a chi-square distribution with $n$ degrees of freedom occurs over $n - 2$ (for $n \geq 3$).

Table 7 of Appendix II shows critical values of chi-square distributions for which a designated area falls to the right of the critical value. Table 10-1 gives an excerpt from Table 7. Notice that the row headers are degrees of freedom, and the column headers are areas in the right tail of the distribution.
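The right-tail areas that Table 7 tabulates, and the location of the mode, can also be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `chi2_right_tail` and `chi2_pdf` are illustrative choices rather than anything from the textbook, and the closed-form recurrence used here is valid only for whole-number degrees of freedom.

```python
import math

def chi2_right_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function and h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h) = erfc(sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_pdf(x, df):
    """Height of the chi-square density curve at x > 0."""
    k = df / 2.0
    return x ** (k - 1.0) * math.exp(-x / 2.0) / (2.0 ** k * math.gamma(k))

# Two entries from the Table 10-1 excerpt: the table lists right-tail areas
# of 0.995 (d.f. = 3, critical value 0.072) and 0.010 (d.f. = 4, critical value 13.28).
area_df3 = chi2_right_tail(0.072, 3)
area_df4 = chi2_right_tail(13.28, 4)
print(round(area_df3, 3), round(area_df4, 3))

# Locate the mode for d.f. = 5 on a grid of step 0.01; the text says it sits over n - 2.
grid = [i / 100 for i in range(1, 2001)]
mode_df5 = max(grid, key=lambda x: chi2_pdf(x, 5))
print(mode_df5)
```

A statistics library would normally be used for this in practice; the hand-written recurrence is shown only to keep the sketch free of dependencies.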
For instance, according to the table, for a $\chi^2$ distribution with 3 degrees of freedom, the area occurring to the right of $\chi^2 = 0.072$ is 0.995. For a $\chi^2$ distribution with 4 degrees of freedom, the area falling to the right of $\chi^2 = 13.28$ is 0.010.

In the next three sections, we will see how to apply the chi-square distribution to different applications.

FIGURE 10-1 The $\chi^2$ Distribution (curves shown for d.f. = 3, d.f. = 5, and d.f. = 10)

TABLE 10-1 Excerpt from Table 7 (Appendix II): The $\chi^2$ Distribution

         Area of the Right Tail
d.f.     0.995    0.990    0.975    ...    0.010    0.005
...
3        0.072    0.115    0.216    ...    11.34    12.84
4        0.207    0.297    0.484    ...    13.28    14.86

SECTION 10.1 Chi-Square: Tests of Independence and of Homogeneity

FOCUS POINTS
    Set up a test to investigate independence of random variables.
    Use contingency tables to compute the sample $\chi^2$ statistic.
    Find or estimate the P-value of the sample $\chi^2$ statistic and complete the test.
    Conduct a test of homogeneity of populations.

Innovative Machines Incorporated has developed two new letter arrangements for computer keyboards. The company wishes to see if there is any relationship between the arrangement of letters on the keyboard and the number of hours it takes a new typing student to learn to type at 20 words per minute. Or, from another point of view, is the time it takes a student to learn to type independent of the arrangement of the letters on a keyboard?
To answer questions of this type, we test the hypotheses

H0: Keyboard arrangement and learning times are independent.
H1: Keyboard arrangement and learning times are not independent.

In problems of this sort, we are testing the independence of two factors. The probability distribution we use to make the decision is the chi-square distribution. Recall from the overview of the chi-square distribution that chi is pronounced like the first two letters of the word kite and is a Greek letter denoted by the symbol $\chi$. Thus, chi-square is denoted by $\chi^2$.

Innovative Machines' first task is to gather data. Suppose the company took a random sample of 300 beginning typing students and randomly assigned them to learn to type on one of three keyboards. The learning times for this sample are shown in Table 10-2. These learning times are the observed frequencies O.

Table 10-2 is called a contingency table. The shaded boxes that contain observed frequencies are called cells. The row and column totals are not considered to be cells. This contingency table is of size 3 × 3 (read "three-by-three") because there are three rows of cells and three columns. When giving the size of a contingency table, we always list the number of rows first.

To determine the size of a contingency table, count the number of rows containing data and the number of columns containing data. The size is

    Number of rows × Number of columns

where the symbol "×" is read "by." The number of rows is always given first.

TABLE 10-2 Keyboard versus Time to Learn to Type at 20 wpm

Keyboard        21-40 h    41-60 h    61-80 h    Row Total
A               #1  25     #2  30     #3  25        80
B               #4  30     #5  71     #6  19       120
Standard        #7  35     #8  49     #9  16       100
Column Total        90        150         60       300 (sample size)

GUIDED EXERCISE 1 Size of contingency table

Give the sizes of the contingency tables in Figures 10-2(a) and (b). Also, count the number of cells in each table. (Remember, each pink shaded box is a cell.)

FIGURE 10-2(a) Contingency Table
FIGURE 10-2(b) Contingency Table

(a) There are two rows and four columns, so this is a 2 × 4 table. There are eight cells.
(b) Here we have three rows and two columns, so this is a 3 × 2 table with six cells.

Expected frequency E

We are testing the null hypothesis that the keyboard arrangement and the time it takes a student to learn to type are independent. We use this hypothesis to determine the expected frequency of each cell. For instance, to compute the expected frequency of cell 1 in Table 10-2, we observe that cell 1 consists of all the students in the sample who learned to type on keyboard A and who mastered the skill at the 20-words-per-minute level in 21 to 40 hours. By the assumption (null hypothesis) that the two events are independent, we use the multiplication law to obtain the probability that a student is in cell 1.
$$P(\text{cell 1}) = P(\text{keyboard A and skill in 21-40 h}) = P(\text{keyboard A}) \cdot P(\text{skill in 21-40 h})$$

Because there are 300 students in the sample and 80 used keyboard A,

$$P(\text{keyboard A}) = \frac{80}{300}$$

Also, 90 of the 300 students learned to type in 21-40 hours, so

$$P(\text{skill in 21-40 h}) = \frac{90}{300}$$

Using these two probabilities and the assumption of independence,

$$P(\text{keyboard A and skill in 21-40 h}) = \frac{80}{300} \cdot \frac{90}{300}$$

Finally, because there are 300 students in the sample, we have the expected frequency E for cell 1.

$$E = P(\text{student in cell 1}) \cdot (\text{no. of students in sample}) = \frac{80}{300} \cdot \frac{90}{300} \cdot 300 = \frac{80 \cdot 90}{300} = 24$$

We can repeat this process for each cell. However, the last step yields an easier formula for the expected frequency E.

Formula for expected frequency E

$$E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$$

Note: If the expected value is not a whole number, do not round it to the nearest whole number.

Let's use this formula in Example 1 to find the expected frequency for cell 2.

EXAMPLE 1 Expected frequency

Find the expected frequency for cell 2 of contingency Table 10-2.

SOLUTION: Cell 2 is in row 1 and column 2. The row total is 80, and the column total is 150. The size of the sample is still 300.

$$E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}} = \frac{(80)(150)}{300} = 40$$

GUIDED EXERCISE 2 Expected frequency

Table 10-3 contains the observed frequencies O and expected frequencies E for the contingency table giving keyboard arrangement and number of hours it takes a student to learn to type at 20 words per minute. Fill in the missing expected frequencies.

TABLE 10-3 Complete Contingency Table of Keyboard Arrangement and Time to Learn to Type

Keyboard        21-40 h             41-60 h             61-80 h             Row Total
A               #1 O = 25, E = 24   #2 O = 30, E = 40   #3 O = 25, E = __      80
B               #4 O = 30, E = 36   #5 O = 71, E = __   #6 O = 19, E = __     120
Standard        #7 O = 35, E = __   #8 O = 49, E = 50   #9 O = 16, E = 20     100
Column Total    90                  150                 60                    300 (sample size)

For cell 3, we have $E = \frac{(80)(60)}{300} = 16$
For cell 5, we have $E = \frac{(120)(150)}{300} = 60$
For cell 6, we have $E = \frac{(120)(60)}{300} = 24$
For cell 7, we have $E = \frac{(100)(90)}{300} = 30$

Computing the sample test statistic $\chi^2$

Now we are ready to compute the sample statistic $\chi^2$ for the typing students. The $\chi^2$ value is a measure of the sum of the differences between observed frequency O and expected frequency E in each cell. These differences are listed in Table 10-4.

TABLE 10-4 Differences Between Observed and Expected Frequencies

Cell    Observed O    Expected E    Difference (O - E)
1           25            24               1
2           30            40             -10
3           25            16               9
4           30            36              -6
5           71            60              11
6           19            24              -5
7           35            30               5
8           49            50              -1
9           16            20              -4
                               Sum of (O - E) = 0

As you can see, if we sum the differences between the observed frequencies and the expected frequencies of the cells, we get the value zero. This total certainly does not reflect the fact that there were differences between the observed and expected frequencies. To obtain a measure whose sum does reflect the magnitude of the differences, we square the differences and work with the quantities $(O - E)^2$. But instead of using the terms $(O - E)^2$, we use the values $(O - E)^2/E$.

We use this expression because a small difference between the observed and expected frequencies is not nearly as important when the expected frequency is large as it is when the expected frequency is small. For instance, for both cells 1 and 8, the squared difference $(O - E)^2$ is 1. However, this difference is more meaningful in cell 1, where the expected frequency is 24, than it is in cell 8, where the expected frequency is 50. When we divide the quantity $(O - E)^2$ by E, we take the size of the difference with respect to the size of the expected value. We use the sum of these values to form the sample statistic $\chi^2$:

Sample test statistic $\chi^2$

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where the sum is over all cells in the contingency table.

COMMENT If you look up the word irony in a dictionary, you will find one of its meanings to be "the difference between actual (or observed) results and expected results." Because irony is so prevalent in much of our human experience, it is not surprising that statisticians have incorporated a related chi-square distribution into their work.

GUIDED EXERCISE 3 Sample $\chi^2$

(a) Complete Table 10-5.

TABLE 10-5 Data of Table 10-4

Cell    O     E     O - E    (O - E)^2    (O - E)^2/E
1       25    24       1         1           0.04
2       30    40     -10       100           2.50
3       25    16       9        81           5.06
4       30    36      -6        36           1.00
5       71    60      11       121           2.02
6       19    24      -5        25           1.04
7       35    30       5        25           0.83
8       49    50     ____      ____          ____
9       16    20     ____      ____          ____
                          Sum of (O - E)^2/E = ____

The last two rows of Table 10-5 are

8       49    50      -1         1           0.02
9       16    20      -4        16           0.80
        Sum of (O - E)^2/E = total of last column = 13.31

(b) Compute the statistic $\chi^2$ for this sample.

Since $\chi^2 = \sum \frac{(O - E)^2}{E}$, then $\chi^2 = 13.31$.

Notice that when the observed frequency and the expected frequency are very close, the quantity $(O - E)^2$ is close to zero, and so the statistic $\chi^2$ is near zero. As the difference increases, the statistic $\chi^2$ also increases. To determine how large the sample statistic can be before we must reject the null hypothesis of independence, we find the P-value of the statistic in the chi-square distribution, Table 7 of Appendix II, and compare it to the specified level of significance $\alpha$. The P-value depends on the number of degrees of freedom. To test independence, the degrees of freedom d.f. are determined by the following formula.

Degrees of freedom for test of independence

$$\text{Degrees of freedom} = (\text{Number of rows} - 1)(\text{Number of columns} - 1)$$

or

$$d.f. = (R - 1)(C - 1)$$

where R = number of cell rows
      C = number of cell columns

GUIDED EXERCISE 4 Degrees of freedom

Determine the number of degrees of freedom in the example of keyboard arrangements (see Table 10-2). Recall that the contingency table had three rows and three columns.

$$d.f. = (R - 1)(C - 1) = (3 - 1)(3 - 1) = (2)(2) = 4$$

Finding the P-value for tests of independence

To test the hypothesis that the letter arrangement on a keyboard and the time it takes to learn to type at 20 words per minute are independent at the $\alpha = 0.05$ level of significance, we estimate the P-value shown in Figure 10-3 below for the sample test statistic $\chi^2 = 13.31$ (calculated in Guided Exercise 3). We then compare the P-value to the specified level of significance $\alpha$.
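The hand calculations for the keyboard data (expected frequencies, the sample $\chi^2$, the degrees of freedom, and the P-value) can be cross-checked with a short script. This is a minimal sketch using only the Python standard library, not the textbook's own method; the helper names are hypothetical, and the right-tail formula assumes whole-number degrees of freedom.

```python
import math

# Observed frequencies O from Table 10-2 (rows: keyboards A, B, Standard;
# columns: 21-40 h, 41-60 h, 61-80 h).
observed = [
    [25, 30, 25],
    [30, 71, 19],
    [35, 49, 16],
]

def expected_frequencies(table):
    """E = (row total)(column total) / (sample size) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(table, expected)
        for o, e in zip(o_row, e_row)
    )

def degrees_of_freedom(table):
    """d.f. = (R - 1)(C - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

def chi2_right_tail(x, df):
    """Right-tail area of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed)
df = degrees_of_freedom(observed)
p_value = chi2_right_tail(chi2, df)

print(expected)
print(round(chi2, 2), df, round(p_value, 4))
```

Because the script sums unrounded terms, its statistic can differ in the second decimal place from the 13.31 obtained in Guided Exercise 3, where each term was rounded before summing. The resulting P-value still falls in the same interval of Table 7.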
For tests of independence, we always use a right-tailed test on the chi-square distribution. This is because we are testing to see if the $\chi^2$ measure of the difference between the observed and expected frequencies is too large to be due to chance alone.

In Guided Exercise 4, we found that the degrees of freedom for the example of keyboard arrangements is 4. From Table 7 of Appendix II, in the row headed by d.f. = 4, we see that the sample $\chi^2 = 13.31$ falls between the entries 13.28 and 14.86.

FIGURE 10-3 P-value (right-tail area beyond the sample $\chi^2 = 13.31$)

Right-tail Area    0.010    0.005
d.f. = 4           13.28    14.86
                   Sample $\chi^2 = 13.31$ lies between these two entries.

The corresponding P-value falls between 0.005 and 0.010. From technology, we get P-value ≈ 0.0098.

    0.005 < P-value < 0.010

Since the P-value is less than the level of significance $\alpha = 0.05$, we reject the null hypothesis of independence and conclude that keyboard arrangement and learning time are not independent.

Tests of independence for two statistical variables involve a number of steps. A summary of the procedure follows.

PROCEDURE
HOW TO TEST FOR INDEPENDENCE OF TWO STATISTICAL VARIABLES

Setup
Construct a contingency table in which the rows represent one statistical variable and the columns represent the other. Obtain a random sample of observations, which are assigned to the cells described by the rows and columns. These assignments are called the observed values O from the sample.

Procedure
1. Set the level of significance $\alpha$ and use the hypotheses
   H0: The variables are independent.
   H1: The variables are not independent.
2. For each cell, compute the expected frequency E (do not round but give as a decimal number).
   $$E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$$
   Requirement: You need a sample size large enough so that, for each cell, $E \geq 5$.
   Now each cell has two numbers, the observed frequency O from the sample and the expected frequency E. Next, compute the sample chi-square test statistic
   $$\chi^2 = \sum \frac{(O - E)^2}{E} \quad \text{with degrees of freedom } d.f. = (R - 1)(C - 1)$$
   where the sum is over all cells in the contingency table and
   R = number of rows in contingency table
   C = number of columns in contingency table
3. Use the chi-square distribution (Table 7 of Appendix II) and a right-tailed test to find (or estimate) the P-value corresponding to the test statistic.
4. Conclude the test. If P-value $\leq \alpha$, then reject H0. If P-value $> \alpha$, then do not reject H0.
5. Interpret your conclusion in the context of the application.

GUIDED EXERCISE 5 Testing independence

Super Vending Machines Company is to install soda pop machines in elementary schools and high schools. The market analysts wish to know if flavor preference and school level are independent. A random sample of 200 students was taken. Their school level and soda pop preferences are given in Table 10-6. Is independence indicated at the $\alpha = 0.01$ level of significance?

STEP 1: State the null and alternate hypotheses.
STEP 2: (a) Complete the contingency Table 10-6 by filling in the required expected frequencies.
H0: School level and soda pop preference are independent. H1: School level and soda pop preference are not independent. The expected frequency Table 10-6 School Level and Soda Pop Preference Elementary School Row Total for cell 5 is Soda Pop High School (40)(80) 16 200 Kula Kola O 33 1 O 57 2 90 for cell 6 is E 36 E 54 (40)(120) 24 200 Mountain O 30 3 O 20 4 Mist E 20 E 30 for cell 7 is (20)(80) 8 200 O5 Jungle 5 O 35 6 Grape E _____ O 12 7 O 8 8 E _____ 40 E _____ Diet Pop 50 E _____ Column 80 120 20 200 Total Sample Size (b) Fill in Table 10-7 and use the table to nd the sample statistic x2. (20)(120) 12 200 Note: In this example, the expected frequencies are all whole numbers. If the expected frequency has a decimal part, such as 8.45, do not round the value to the nearest whole number; rather, give the expected frequency as the decimal number. for cell 8 is The last three rows of Table 10-7 should read as follows: Table 10-7 Computational Table for x2 OE (O E)2 (O E)2/E Cell O E OE 36 3 9 0.25 6 35 24 11 121 5.04 54 3 9 0.17 7 12 8 4 16 2.00 30 20 10 100 5.00 8 8 12 4 16 1.33 20 30 10 100 3.33 Cell O E 1 33 2 57 3 4 5 5 16 11 121 7.56 6 35 24 11 _____ _____ 7 12 8 _____ _____ _____ 8 8 12 _____ _____ _____ (c) What is the size of the contingency table? Use the number of rows and the number of columns to determine the degrees of freedom. (O E)2 (O E)2/E x2 total of last column (O E)2 24.68 E The contingency table is of size 4 2. Since there are four rows and two columns, d.f. (4 1)(2 1) 3 Continued Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
Chapter 10 CHI-SQUARE AND F DISTRIBUTIONS
GUIDED EXERCISE 5 continued
STEP 3: Use Table 7 of Appendix II to estimate the P-value of the sample statistic $\chi^2 = 24.68$ with $d.f. = 3$.
In the row for $d.f. = 3$, the entry for right-tail area 0.005 is 12.84, and the sample $\chi^2 = 24.68$ lies to the right of it. As the $\chi^2$ values increase, the area to the right decreases, so P-value $< 0.005$.
STEP 4: Conclude the test by comparing the P-value of the sample statistic to the level of significance $\alpha = 0.01$.
P-value $< 0.005 < 0.01$. Since the P-value is less than $\alpha$, we reject the null hypothesis of independence. Technology gives P-value $\approx 0.00002$.
STEP 5: Interpret the test result in the context of the application.
At the 1% level of significance, we conclude that school level and soda pop preference are dependent.
TECH NOTES
The TI-84Plus/TI-83Plus/TI-nspire calculators, Excel 2007, and Minitab all support chi-square tests of independence. In each case, the observed data are entered in the format of the contingency table.
TI-84Plus/TI-83Plus/TI-nspire (with TI-84Plus keypad): Enter the observed data into a matrix. Set the dimension of matrix [B] to match that of the matrix of observed values. Expected values will be placed in matrix [B]. Press STAT, TESTS, and select option C:$\chi^2$-Test. The output gives the sample $\chi^2$ with the P-value.
Excel 2007: Enter the table of observed values. Use the formulas of this section to compute the expected values. Enter the corresponding table of expected values. Finally, click insert function fx. Select Statistical for the category and CHITEST for the function. Excel returns the P-value of the sample $\chi^2$ value.
Minitab: Enter the contingency table of observed values. Use the menu selection Stat > Tables > Chi-Square Test. The output shows the contingency table with expected values and the sample $\chi^2$ with P-value.
Tests of Homogeneity
We've seen how to use contingency tables and the chi-square distribution to test for independence of two random variables.
The same process enables us to determine whether several populations share the same proportions of distinct categories. Such a test is called a test of homogeneity.
According to the dictionary, among the definitions of the word homogeneous are "of the same structure" and "composed of similar parts." In statistical jargon, this translates as a test of homogeneity to see if two or more populations share specified characteristics in the same proportions.
Test of homogeneity: A test of homogeneity tests the claim that different populations share the same proportions of specified characteristics.
The computational processes for conducting tests of independence and tests of homogeneity are the same. However, there are two main differences in the initial setup of the two types of tests, namely, the sampling method and the hypotheses.
Tests of independence compared to tests of homogeneity
1. Sampling method
For tests of independence, we use one random sample and observe how the sample members are distributed among distinct categories.
For tests of homogeneity, we take random samples from each different population and see how members of each population are distributed over distinct categories.
2. Hypotheses
For tests of independence,
H0: The variables are independent.
H1: The variables are not independent.
For tests of homogeneity,
H0: Each population shares respective characteristics in the same proportion.
H1: Some populations have different proportions of respective characteristics.
EXAMPLE 2 Test of homogeneity
Pets: who can resist a cute kitten or puppy? Tim is doing a research project involving pet preferences among students at his college. He took random samples of 300 female and 250 male students. Each sample member responded to the survey question "If you could own only one pet, what kind would you choose?" The possible responses were: "dog," "cat," "other pet," "no pet." The results of the study follow.

Pet Preference
Gender   Dog   Cat   Other Pet   No Pet
Female   120   132      18         30
Male     135    70      20         25

Does the same proportion of males as females prefer each type of pet? Use a 1% level of significance. We'll answer this question in several steps.
(a) First make a cluster bar graph showing the percentages of females and the percentages of males favoring each category of pet. From the graph, does it appear that the proportions are the same for males and females?
SOLUTION: The cluster graph shown in Figure 10-4 was created using Minitab. Looking at the graph, it appears that there are differences in the proportions of females and males preferring each type of pet. However, let's conduct a statistical test to verify our visual impression.
[FIGURE 10-4 Pet Preference by Gender: cluster bar graph of percentage by pet category. Dog: 40% female, 54% male. Cat: 44% female, 28% male. Other: 6% female, 8% male. No pet: 10% female, 10% male.]
(b) Is it appropriate to use a test of homogeneity?
SOLUTION: Yes, since there are separate random samples for each designated population, male and female. We also are interested in whether each population shares the same proportion of members favoring each category of pet.
(c) State the hypotheses and conclude the test by using the Minitab printout.
H0: The proportions of females and males naming each pet preference are the same.
H1: The proportions of females and males naming each pet preference are not the same.

Chi-Square Test: Dog, Cat, Other, No Pet
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
           Dog      Cat    Other   No Pet   Total
    1      120      132       18       30     300
        139.09   110.18    20.73    30.00
         2.620    4.320    0.359    0.000
    2      135       70       20       25     250
        115.91    91.82    17.27    25.00
         3.144    5.185    0.431    0.000
Total      255      202       38       55     550
Chi-Sq = 16.059, DF = 3, P-Value = 0.001

Since the P-value is less than $\alpha$, we reject H0 at the 1% level of significance.
(d) Interpret the results.
SOLUTION: It appears from the sample data that male and female students at Tim's college have different preferences when it comes to selecting a pet.
PROCEDURE
HOW TO TEST FOR HOMOGENEITY OF POPULATIONS
Setup
Obtain random samples from each of the populations. For each population, determine the number of members that share a distinct specified characteristic. Make a contingency table with the different populations as the rows (or columns) and the characteristics as the columns (or rows). The values recorded in the cells of the table are the observed values O taken from the samples.
Procedure
1. Set the level of significance and use the hypotheses
H0: The proportion of each population sharing specified characteristics is the same for all populations.
H1: The proportion of each population sharing specified characteristics is not the same for all populations.
2. Follow steps 2-5 of the procedure used to test for independence.
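Because the computations for independence and homogeneity are the same, one small routine can serve both. The following is a minimal sketch in plain Python, not the textbook's or Minitab's implementation; the function name `chi_square_contingency` is an illustrative assumption. It builds the expected counts from the row and column totals, sums the cell contributions, and returns the degrees of freedom, using the pet preference counts from Example 2.

```python
def chi_square_contingency(observed):
    """Return (chi2, dof, expected) for a contingency table given as a list of rows.

    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E = (row total)(column total) / (sample size), kept as a decimal
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # chi-square = sum over all cells of (O - E)^2 / E
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # d.f. = (R - 1)(C - 1)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Example 2 data: rows are Female, Male; columns are Dog, Cat, Other Pet, No Pet
pets = [
    [120, 132, 18, 30],
    [135, 70, 20, 25],
]
chi2, dof, expected = chi_square_contingency(pets)
print(round(chi2, 3), dof)
print([round(e, 2) for e in expected[0]])
```

Run on the Example 2 counts, the statistic and degrees of freedom should agree with the Minitab printout (Chi-Sq about 16.059 with DF 3). The routine does not check the requirement that every expected count be at least 5; that check is left to the reader.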
It is important to observe that when we reject the null hypothesis in a test of homogeneity, we don't know which proportions differ among the populations. We know only that the populations differ in some of the proportions sharing a characteristic.
Multinomial Experiments (Optional Reading)
Here are some observations that may be considered "brain teasers." In Chapters 6, 7, and 8, you studied normal approximations to binomial experiments. This concept resulted in some important statistical applications. Is it possible to extend this idea and obtain even more applications? Well, read on!
Consider a binomial experiment with n trials. The probability of success on each trial is p, and the probability of failure is $q = 1 - p$. If r is the number of successes out of n trials, then, from Chapter 5, you know that
$P(r) = \frac{n!}{r!(n - r)!} p^r q^{n - r}$
The binomial setting has just two outcomes: success or failure. What if you want to consider more than just two outcomes on each trial (for instance, the outcomes shown in a contingency table)? Well, you need a new statistical tool.
Multinomial experiments
Consider a multinomial experiment. This means that
1. The trials are independent and repeated under identical conditions.
2. The outcome on each trial falls into exactly one of $k \geq 2$ categories or cells.
3. The probability that the outcome of a single trial will fall into the ith category or cell is $p_i$ (where $i = 1, 2, \ldots, k$) and remains the same for each trial.
Furthermore, $p_1 + p_2 + \cdots + p_k = 1$.
4. Let $r_i$ be a random variable that represents the number of trials in which the outcome falls into category or cell i. If you have n trials, then $r_1 + r_2 + \cdots + r_k = n$. The multinomial probability distribution is then
$P(r_1, r_2, \ldots, r_k) = \frac{n!}{r_1! \, r_2! \cdots r_k!} \, p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k}$
How are the multinomial distribution and the binomial distribution related? For the special case $k = 2$, we use the notation $r_1 = r$, $r_2 = n - r$, $p_1 = p$, and $p_2 = q$. In this special case, the multinomial distribution becomes the binomial distribution.
There are two important tests regarding the cell probabilities of a multinomial distribution.
I. Test of Independence (Section 10.1)
In this test, the null hypothesis of independence claims that each cell probability $p_i$ will equal the product of its respective row and column probabilities. The alternate hypothesis claims that this is not so.
II. Goodness-of-Fit Test (Section 10.2)
In this test, the null hypothesis claims that each category or cell probability $p_i$ will equal a prespecified value. The alternate hypothesis claims that this is not so.
So why don't we use the multinomial probability distribution in Sections 10.1 and 10.2? The reason is that the exact calculation of probabilities associated with type I errors using the multinomial distribution is very tedious and cumbersome. Fortunately, the British statistician Karl Pearson discovered that the chi-square distribution can be used for this purpose, provided the expected value of each cell or category is at least 5.
It is Pearson's chi-square methods that are presented in Sections 10.1 and 10.2. In a sense, you have seen a similar application to statistical tests in Section 8.3, where you used the normal approximation to the binomial when np, the expected number of successes, and nq, the expected number of failures, were both at least 5.
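The claim that the multinomial distribution reduces to the binomial when there are only two categories can be checked numerically. The sketch below is illustrative only: the function names and the sample values n = 10, r = 3, p = 0.25 are assumptions chosen for the demonstration, not values from the text.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(r1, ..., rk) = n! / (r1! ... rk!) * p1^r1 * ... * pk^rk."""
    n = sum(counts)
    coefficient = factorial(n)
    for r in counts:
        # Each successive quotient is still an integer, so // is exact here
        coefficient //= factorial(r)
    return coefficient * prod(p ** r for p, r in zip(probs, counts))


def binomial_pmf(r, n, p):
    """P(r) = n! / (r! (n - r)!) * p^r * q^(n - r), with q = 1 - p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Hypothetical values: with k = 2, use r1 = r, r2 = n - r, p1 = p, p2 = q
n, r, p = 10, 3, 0.25
multi = multinomial_pmf([r, n - r], [p, 1 - p])
binom = binomial_pmf(r, n, p)
print(multi, binom)
```

The two printed probabilities should match up to floating-point rounding, which is the special case k = 2 described above.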
VIEWPOINT
Loyalty! Going, Going, Gone!
Was there a time in the past when people worked for the same company all their lives, regularly purchased the same brand names, always voted for candidates from the same political party, and loyally cheered for the same sports team? One way to look at this question is to consider tests of statistical independence. Is customer loyalty independent of company profits? Can a company maintain its productivity independent of loyal workers? Can politicians do whatever they please independent of the voters back home? Americans may be ready to act on a pent-up desire to restore a sense of loyalty in their lives. For more information, see American Demographics, Vol. 19, No. 9.
SECTION 10.1 PROBLEMS
1. Statistical Literacy In general, are chi-square distributions symmetric or skewed? If skewed, are they skewed right or left?
2. Statistical Literacy For chi-square distributions, as the number of degrees of freedom increases, does any skewness increase or decrease? Do chi-square distributions become more symmetric (and normal) as the number of degrees of freedom becomes larger and larger?
3. Statistical Literacy For chi-square tests of independence and of homogeneity, do we use a right-tailed, left-tailed, or two-tailed test?
4. Critical Thinking In general, how do the hypotheses for chi-square tests of independence differ from those for chi-square tests of homogeneity? Explain.
5. Critical Thinking Zane is interested in the proportion of people who recycle each of three distinct products: paper, plastic, electronics.
He wants to test the hypothesis that the proportion of people recycling each type of product differs by age group: 12-18 years old, 19-30 years old, 31-40 years old, over 40 years old. Describe the sampling method appropriate for a test of homogeneity regarding recycled products and age.
6. Critical Thinking Charlotte is doing a study on fraud and identity theft based both on source (checks, credit cards, debit cards, online banking/finance sites, other) and on gender of the victim. Describe the sampling method appropriate for a test of independence regarding source of fraud and gender.
7. Interpretation: Test of Homogeneity Consider Zane's study regarding products recycled and age group (see Problem 5). Suppose he found a sample $\chi^2 = 16.83$.
(a) How many degrees of freedom are used? Recall that there were 4 age groups and 3 products specified. Approximate the P-value and conclude the test at the 1% level of significance. Does it appear that the proportion of people who recycle each of the specified products differ by age group? Explain.
(b) From this study, can Zane identify how the different age groups differ regarding the proportion of those recycling the specified product? Explain.
8. Interpretation: Test of Independence Consider Charlotte's study of source of fraud/identity theft and gender (see Problem 6). She computed sample $\chi^2 = 10.2$.
(a) How many degrees of freedom are used? Recall that there were 5 sources of fraud/identity theft and, of course, 2 genders. Approximate the P-value and conclude the test at the 5% level of significance. Would it seem that gender and source of fraud/identity theft are independent?
(b) From this study, can Charlotte identify which source of fraud/identity theft is dependent with respect to gender? Explain.
For Problems 9-19, please provide the following information.
(a) What is the level of significance? State the null and alternate hypotheses.
(b) Find the value of the chi-square statistic for the sample.
Are all the expected frequencies greater than 5? What sampling distribution will you use? What are the degrees of freedom?
(c) Find or estimate the P-value of the sample test statistic.
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis of independence?
(e) Interpret your conclusion in the context of the application.
Use the expected values E to the hundredths place.
9. Psychology: Myers-Briggs The following table shows the Myers-Briggs personality preferences for a random sample of 406 people in the listed professions (Atlas of Type Tables by Macdaid, McCaulley, and Kainz). E refers to extroverted and I refers to introverted.

Personality Preference Type
Occupation                     E     I    Row Total
Clergy (all denominations)    62    45      107
M.D.                          68    94      162
Lawyer                        56    81      137
Column Total                 186   220      406

Use the chi-square test to determine if the listed occupations and personality preferences are independent at the 0.05 level of significance.
10. Psychology: Myers-Briggs The following table shows the Myers-Briggs personality preferences for a random sample of 519 people in the listed professions (Atlas of Type Tables by Macdaid, McCaulley, and Kainz). T refers to thinking and F refers to feeling.

Personality Preference Type
Occupation                     T     F    Row Total
Clergy (all denominations)    57    91      148
M.D.                          77    82      159
Lawyer                       118    94      212
Column Total                 252   267      519

Use the chi-square test to determine if the listed occupations and personality preferences are independent at the 0.01 level of significance.
11. Archaeology: Pottery The following table shows site type and type of pottery for a random sample of 628 sherds at a location in Sand Canyon Archaeological Project, Colorado (The Sand Canyon Archaeological Project, edited by Lipe).

Pottery Type
Site Type       Mesa Verde Black-on-White   McElmo Black-on-White   Mancos Black-on-White   Row Total
Mesa Top                  75                         61                      53                 189
Cliff-Talus               81                         70                      62                 213
Canyon Bench              92                         68                      66                 226
Column Total             248                        199                     181                 628

Use a chi-square test to determine if site type and pottery type are independent at the 0.01 level of significance.
12. Archaeology: Pottery The following table shows ceremonial ranking and type of pottery sherd for a random sample of 434 sherds at a location in the Sand Canyon Archaeological Project, Colorado (The Architecture of Social Integration in Prehistoric Pueblos, edited by Lipe and Hegmon). Use a chi-square test to determine if ceremonial ranking and pottery type are independent at the 0.05 level of significance.

Ceremonial Ranking   Cooking Jar Sherds   Decorated Jar Sherds (Noncooking)   Row Total
A                           86                        49                         135
B                           92                        53                         145
C                           79                        75                         154
Column Total               257                       177                         434

13. Ecology: Buffalo The following table shows age distribution and location of a random sample of 166 buffalo in Yellowstone National Park (based on information from The Bison of Yellowstone National Park, National Park Service Scientific Monograph Series).

Age            Lamar District   Nez Perce District   Firehole District   Row Total
Calf                 13                 13                   15              41
Yearling             10                 11                   12              33
Adult                34                 28                   30              92
Column Total         57                 52                   57             166

Use a chi-square test to determine if age distribution and location are independent at the 0.05 level of significance.
14. Psychology: Myers-Briggs The following table shows the Myers-Briggs personality preference and area of study for a random sample of 519 college students (Applications of the Myers-Briggs Type Indicator in Higher Education, edited by Provost and Anchors). In the table, IN refers to introvert, intuitive; EN refers to extrovert, intuitive; IS refers to introvert, sensing; and ES refers to extrovert, sensing.

Myers-Briggs Preference   Arts & Science   Business   Allied Health   Row Total
IN                              64            15           17             96
EN                              82            42           30            154
IS                              68            35           12            115
ES                              75            42           37            154
Column Total                   289           134           96            519

Use a chi-square test to determine if Myers-Briggs preference type is independent of area of study at the 0.05 level of significance.
15. Sociology: Movie Preference Mr. Acosta, a sociologist, is doing a study to see if there is a relationship between the age of a young adult (18 to 35 years old) and the type of movie preferred. A random sample of 93 young adults revealed the following data. Test whether age and type of movie preferred are independent at the 0.05 level.

Person's Age
Movie             18-23 yr   24-29 yr   30-35 yr   Row Total
Drama                 8         15         11         34
Science fiction      12         10          8         30
Comedy                9          8         12         29
Column Total         29         33         31         93

16. Sociology: Ethnic Groups After a large fund drive to help the Boston City Library, the following information was obtained from a random sample of contributors to the library fund.
Using a 1% level of significance, test the claim that the amount contributed to the library fund is independent of ethnic group.

Number of People Making Contribution
Ethnic Group    $1-50   $51-100   $101-150   $151-200   Over $200   Row Total
A                 83       62        53         35         18          251
B                 94       77        48         25         20          264
C                 78       65        51         40         32          266
D                105       89        63         54         29          340
Column Total     360      293       215        154         99         1121

17. Focus Problem: Archaeology The Focus Problem at the beginning of the chapter refers to excavations at Burnt Mesa Pueblo in Bandelier National Monument. One question the archaeologists asked was: Is raw material used by prehistoric Indians for stone tool manufacture independent of the archaeological excavation site? Two different excavation sites at Burnt Mesa Pueblo gave the information in the following table. Use a chi-square test with 5% level of significance to test the claim that raw material used for construction of stone tools and excavation site are independent.

Stone Tool Construction Material, Burnt Mesa Pueblo
Material         Site A   Site B   Row Total
Basalt             731      584      1315
Obsidian           102       93       195
Pedernal chert     510      525      1035
Other               85       94       179
Column Total      1428     1296      2724

18. Political Affiliation: Spending Two random samples were drawn from members of the U.S. Congress. One sample was taken from members who are Democrats and the other from members who are Republicans. For each sample, the number of dollars spent on federal projects in each congressperson's home district was recorded.
(i) Make a cluster bar graph showing the percentages of Congress members from each party who spent each designated amount in their respective home districts.
(ii) Use a 1% level of significance to test whether congressional members of each political party spent designated amounts in the same proportions.

Dollars Spent on Federal Projects in Home Districts
Party          Less than 5 Billion   5 to 10 Billion   More than 10 Billion   Row Total
Democratic              8                   15                  22                45
Republican             12                   19                  16                47
Column Total           20                   34                  38                92

19. Sociology: Methods of Communication Random samples of people ages 15-24 and 25-34 were asked about their preferred method of (remote) communication with friends. The respondents were asked to select one of the methods from the following list: cell phone, instant message, e-mail, other.
(i) Make a cluster bar graph showing the percentages in each age group who selected each method.
(ii) Test whether the two populations share the same proportions of preferences for each type of communication method. Use $\alpha = 0.05$.

Preferred Communication Method
Age            Cell Phone   Instant Message   E-mail   Other   Row Total
15-24              48             40             5       7        100
25-34              41             30            15      14        100
Column Total       89             70            20      21        200

SECTION 10.2 Chi-Square: Goodness of Fit
FOCUS POINTS
Set up a test to investigate how well a sample distribution fits a given distribution.
Use observed and expected frequencies to compute the sample $\chi^2$ statistic.
Find or estimate the P-value and complete the test.
Last year, the labor union bargaining agents listed five categories and asked each employee to mark the one most important to her or him. The categories and corresponding percentages of favorable responses are shown in Table 10-8. The bargaining agents need to determine if the current distribution of responses "fits" last year's distribution or if it is different.
In questions of this type, we are asking whether a population follows a specified distribution. In other words, we are testing the hypotheses
Hypotheses
H0: The population fits the given distribution.
H1: The population has a different distribution.
Computing sample $\chi^2$
We use the chi-square distribution to test "goodness-of-fit" hypotheses. Just as with tests of independence, we compute the sample statistic:
$\chi^2 = \sum \frac{(O - E)^2}{E}$ with degrees of freedom $d.f. = k - 1$
where
E = expected frequency
O = observed frequency
$\frac{(O - E)^2}{E}$ is summed for each category in the distribution
k = number of categories in the distribution
Next we use the chi-square distribution table (Table 7, Appendix II) to estimate the P-value of the sample $\chi^2$ statistic. Finally, we compare the P-value to the level of significance $\alpha$ and conclude the test.
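The table lookup and the final comparison with the level of significance can also be done numerically. The following is a minimal sketch using only the Python standard library; it is not how Table 7 or any calculator is implemented, and the names `chi_square_right_tail` and `conclude` are illustrative assumptions. It evaluates the right-tail area of the chi-square distribution with a closed-form recurrence that is adequate for the small degrees of freedom used in this chapter, then applies the rule "reject H0 if P-value is at most the level of significance."

```python
from math import erfc, exp, gamma, sqrt


def chi_square_right_tail(x, dof):
    """Area to the right of x under the chi-square curve with dof degrees of freedom.

    Hypothetical helper; intended for small dof (math.gamma overflows for very large dof).
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if dof % 2 == 0:
        a, tail = 1.0, exp(-z)          # starting value for d.f. = 2
    else:
        a, tail = 0.5, erfc(sqrt(z))    # starting value for d.f. = 1
    # Each pass raises the degrees of freedom by 2:
    # Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1)
    while a < dof / 2.0:
        tail += z ** a * exp(-z) / gamma(a + 1.0)
        a += 1.0
    return tail


def conclude(p_value, alpha):
    """Decision rule for a right-tailed chi-square test."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Values from Guided Exercise 5: sample chi-square 24.68 with d.f. = 3, alpha = 0.01
p_value = chi_square_right_tail(24.68, 3)
print(p_value, conclude(p_value, 0.01))
```

For the Guided Exercise 5 values, the computed area is close to the technology value of about 0.00002 quoted there, so the null hypothesis is rejected at the 1% level.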
TABLE 10-8 Bargaining Categories (last year)
Category                          Percentage of Favorable Responses
Vacation time                      4%
Salary                            65%
Safety regulations                13%
Health and retirement benefits    12%
Overtime policy and pay            6%

Goodness-of-fit test
In the case of a goodness-of-fit test, we use the null hypothesis to compute the expected values for the categories. Let's look at the bargaining category problem to see how this is done.
In the bargaining category problem, the two hypotheses are
H0: The present distribution of responses is the same as last year's.
H1: The present distribution of responses is different.
The null hypothesis tells us that the expected frequencies of the present response distribution should follow the percentages indicated in last year's survey. To test this hypothesis, a random sample of 500 employees was taken. If the null hypothesis is true, then there should be 4%, or 20 responses, out of the 500 rating vacation time as the most important bargaining issue. Table 10-9 gives the other expected values and all the information necessary to compute the sample statistic $\chi^2$. We see that the sample statistic is
$\chi^2 = \sum \frac{(O - E)^2}{E} = 14.15$
Larger values of the sample statistic $\chi^2$ indicate greater differences between the proposed distribution and the distribution followed by the sample. The larger the $\chi^2$ statistic, the stronger the evidence to reject the null hypothesis that the population distribution fits the given distribution. Consequently, goodness-of-fit tests are always right-tailed tests.
Type of test
For goodness-of-fit tests, we use a right-tailed test on the chi-square distribution. This is because we are testing to see if the $\chi^2$ measure of the difference between the observed and expected frequencies is too large to be due to chance alone.
TABLE 10-9 Observed and Expected Frequencies for Bargaining Categories
Category                  O     E                     (O - E)^2   (O - E)^2/E
Vacation time             30     4% of 500 = 20          100         5.00
Salary                   290    65% of 500 = 325        1225         3.77
Safety                    70    13% of 500 = 65           25         0.38
Health and retirement     70    12% of 500 = 60          100         1.67
Overtime                  40     6% of 500 = 30          100         3.33
$\sum O = 500$, $\sum E = 500$, $\sum \frac{(O - E)^2}{E} = 14.15$

To test the hypothesis that the present distribution of responses to bargaining categories is the same as last year's, we use the chi-square distribution (Table 7 of Appendix II) to estimate the P-value of the sample statistic $\chi^2 = 14.15$. To estimate the P-value, we need to know the number of degrees of freedom. In the case of a goodness-of-fit test, the degrees of freedom are found by the following formula.
Degrees of freedom for goodness-of-fit test
$d.f. = k - 1$
where k = number of categories
Notice that when we compute the expected values E, we must use the null hypothesis to compute all but the last one. To compute the last one, we can subtract the previous expected values from the sample size. For instance, for the bargaining issues, we could have found the number of responses for overtime policy by adding the other expected values and subtracting that sum from the sample size 500. We would again get an expected value of 30 responses. The degrees of freedom, then, is the number of E values that must be computed by using the null hypothesis.
For the bargaining issues, we have
$d.f. = 5 - 1 = 4$
where $k = 5$ is the number of categories.
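The arithmetic in Table 10-9 can be reproduced in a few lines. This is a minimal sketch, assuming a helper named `goodness_of_fit` (an illustrative name, not from the textbook) that takes the observed counts and the proportions claimed by the null hypothesis; the data are the bargaining counts and last year's percentages from Tables 10-8 and 10-9.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi2, dof, expected) for a goodness-of-fit test.

    Hypothetical helper: expected counts come from the null hypothesis,
    E = (sample size n)(probability assigned to category).
    """
    n = sum(observed)
    expected = [n * p for p in probabilities]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1  # d.f. = k - 1
    return chi2, dof, expected


# Order: vacation time, salary, safety, health and retirement, overtime
observed = [30, 290, 70, 70, 40]
last_year = [0.04, 0.65, 0.13, 0.12, 0.06]

chi2, dof, expected = goodness_of_fit(observed, last_year)
print([round(e) for e in expected])
print(round(chi2, 2), dof)
```

The expected counts and the statistic should agree with Table 10-9 (a sample chi-square of about 14.15 with 4 degrees of freedom). The sketch does not enforce the requirement that each expected count be at least 5.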
We now have the tools necessary to use Table 7 of Appendix II to estimate the P-value of $\chi^2 = 14.15$. Figure 10-5 shows the P-value. In Table 7, we use the row headed by $d.f. = 4$. We see that $\chi^2 = 14.15$ falls between the entries 13.28 and 14.86. Therefore, the P-value falls between the corresponding right-tail areas 0.005 and 0.010. Technology gives the P-value $\approx 0.0068$.
To test the hypothesis that the distribution of responses to bargaining issues is the same as last year's at the 1% level of significance, we compare the P-value of the statistic to $\alpha = 0.01$.
$0.005 <$ P-value $< 0.010$
We see that the P-value is less than $\alpha$, so we reject the null hypothesis that the distribution of responses to bargaining issues is the same as last year's.
Interpretation: At the 1% level of significance, we can say that the evidence supports the conclusion that this year's responses to the issues are different from last year's.
Goodness-of-fit tests involve several steps that can be summarized as follows.
[FIGURE 10-5 P-value: for d.f. = 4, right-tail area 0.010 corresponds to 13.28 and right-tail area 0.005 corresponds to 14.86; the sample $\chi^2 = 14.15$ lies between them.]
PROCEDURE
HOW TO TEST FOR GOODNESS OF FIT
Setup
First, each member of a population needs to be classified into exactly one of several different categories. Next, you need a specific (theoretical) distribution that assigns a fixed probability (or percentage) that a member of the population will fall into one of the categories. You then need a random sample of size n from the population.
Let O represent the observed number of data from the sample that fall into each category. Let E represent the expected number of data from the sample that, in theory, would fall into each category.
O = observed frequency count of a category using sample data
E = expected frequency of a category = (sample size n)(probability assigned to category)
Requirement: The sample size n should be large enough that $E \geq 5$ in each category.
Procedure
1. Set the level of significance $\alpha$ and use the hypotheses H0:
