Question

Posted on Jun 20, 2024

1 Analysis of Variance (ANOVA)

In Chapter 13 we discussed methods for testing $H_0: \mu_1 - \mu_2 = 0$ (i.e., $\mu_1 = \mu_2$), where $\mu_1$ and $\mu_2$ are the means of two different populations or the true mean responses when two different treatments are applied. Many investigations involve a comparison of more than two population or treatment means. For example, an investigation was carried out to study possible consequences of the high incidence of head injuries among soccer players ("No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players," The American Journal of Sports Medicine [2002]: 157-162). Three groups of college students (soccer athletes, non-soccer athletes, and a control group consisting of students who did not participate in intercollegiate sports) were considered in the study, and the following information on scores from the Hopkins Verbal Learning Test (which measures immediate memory recall) was given in the paper:

Group                        Soccer Athletes   Non-Soccer Athletes   Control
Sample Size                  86                95                    53
Sample Mean Score            29.90             30.94                 29.32
Sample Standard Deviation    3.73              5.14                  3.78

Let $\mu_1$, $\mu_2$, and $\mu_3$ denote the true average (i.e., population mean) scores on the Hopkins test for soccer athletes, non-soccer athletes, and a control group (students who do not participate in collegiate athletics), respectively. Do the data support the claim that $\mu_1 = \mu_2 = \mu_3$, or does it appear that at least two of the $\mu$'s are different from one another? This is an example of a single-factor analysis of variance (ANOVA) problem, in which the objective is to decide whether the means for more than two populations or treatments are identical.

2 Single-Factor ANOVA and the F Test

When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor under investigation. For example, an experiment might be carried out to compare three different methods for teaching reading (three different treatments), in which case the factor of interest would be teaching method, a qualitative factor. If the growth of fish raised in waters having different salinity levels (0%, 10%, 20%, and 30%) is of interest, the factor salinity level is quantitative.

A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means $\mu_1, \mu_2, \ldots, \mu_k$. The objective is to test

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

against

$H_a$: at least two of the $\mu$'s are different

When comparing populations, the analysis is based on independently selected random samples, one from each population. When comparing treatment means, the data typically result from an experiment and the analysis assumes random assignment of the experimental units (subjects or objects) to treatments. Whether the null hypothesis of a single-factor ANOVA should be rejected depends on how substantially the samples from the different populations or treatments differ from one another.

2.1 Notations and Assumptions

Notation in single-factor ANOVA is a natural extension of the notation used in Chapter 11 for comparing two population or treatment means.

ANOVA Notation

k = number of populations or treatments being compared

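Population or treatment             1, 2, ..., k
Population or treatment mean        $\mu_1, \mu_2, \ldots, \mu_k$
Population or treatment variance    $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$
Sample size                         $n_1, n_2, \ldots, n_k$
Sample mean                         $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
Sample variance                     $s_1^2, s_2^2, \ldots, s_k^2$

$N = n_1 + n_2 + \cdots + n_k$ (the total number of observations in the data set)

$T$ = grand total = sum of all $N$ observations = $n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k$

$\bar{\bar{x}}$ = grand mean = $T/N$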
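As a quick illustration of this notation, the following Python sketch computes N, T, and the grand mean from the soccer summary values quoted in Section 1. The function name `grand_mean` and the variable names are illustrative choices, not part of the original text.

```python
def grand_mean(sizes, means):
    """Return (N, T, grand mean) from per-group sample sizes and sample means."""
    if len(sizes) != len(means) or not sizes:
        raise ValueError("sizes and means must be non-empty and the same length")
    n_total = sum(sizes)                                     # N = n_1 + ... + n_k
    grand_total = sum(n * m for n, m in zip(sizes, means))   # T = n_1*xbar_1 + ... + n_k*xbar_k
    return n_total, grand_total, grand_total / n_total


# Summary values for the soccer, non-soccer, and control groups
sizes = [86, 95, 53]
means = [29.90, 30.94, 29.32]

N, T, xbar = grand_mean(sizes, means)
print(N, round(T, 2), round(xbar, 3))
```

Because the three sample sizes differ, the grand mean (about 30.19) is a weighted average and is not equal to the simple average of the three sample means (about 30.05).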

A decision between $H_0$ and $H_a$ is based on examining the $\bar{x}$ values to see whether observed discrepancies are small enough to be attributable simply to sampling variability or whether an alternative explanation for the differences is more plausible.

2.2 Example 15.1 An Indicator of Heart Attack Risk

The article "Could Mean Platelet Volume Be a Predictive Marker for Acute Myocardial Infarction?" (Medical Science Monitor [2005]: 387-392) described an experiment in which four groups of patients seeking treatment for chest pain were compared with respect to mean platelet volume (MPV, measured in fL). The four groups considered were based on the clinical diagnosis and were (1) noncardiac chest pain, (2) stable angina pectoris, (3) unstable angina pectoris, and (4) myocardial infarction (heart attack). The purpose of the study was to determine if the mean MPV was different for the heart attack group, because then MPV could be used as an indicator of heart attack risk and an antiplatelet treatment could be administered in a timely fashion, potentially reducing the risk of heart attack.

To carry out this study, patients seen for chest pain were divided into groups according to diagnosis. The researchers then selected a random sample of 35 from each of the resulting k = 4 groups. The researchers believed that this sampling process would result in samples that were representative of the four populations of interest and that could be regarded as if they were random samples from these four populations. Table 15.1 presents summary values given in the paper.

Table 15.1 Summary Values for MPV Data of Example 15.1

Group Number   Group Description                      Sample Size   Sample Mean   Sample Standard Deviation
1              Noncardiac chest pain                  35            10.89         0.69
2              Stable angina pectoris                 35            11.25         0.74
3              Unstable angina pectoris               35            11.37         0.91
4              Myocardial infarction (heart attack)   35            11.75         1.07

With $\mu_i$ denoting the true mean MPV for group $i$ ($i = 1, 2, 3, 4$), let's consider the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. Comparing the sample means, the mean MPV for the heart attack sample is larger than for the other three samples, but that sample also has the largest standard deviation. So it is not obvious whether $H_0$ is true or false. In situations such as this, we need a formal test procedure.

As with the inferential methods of previous chapters, the validity of the ANOVA test for $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ requires some assumptions.

2.3 Assumptions for ANOVA

1. Each of the k population or treatment response distributions is normal.

2. $\sigma_1 = \sigma_2 = \cdots = \sigma_k$ (the k normal distributions have identical standard deviations).

3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.

4. When comparing population means, k random samples are selected independently of one another. When comparing treatment means, treatments are assigned at random to subjects or objects (or subjects are assigned at random to treatments).

In practice, the test based on these assumptions works well as long as the assumptions are not too badly violated. If the sample sizes are reasonably large, normal probability plots of the data in each sample are helpful in checking the assumption of normality. Often, however, sample sizes are so small that a separate normal probability plot for each sample is of little value in checking normality.

There is a formal procedure for testing the equality of population standard deviations. Unfortunately, it is quite sensitive to even a small departure from the normality assumption, so we do not recommend its use. Instead, we suggest that the ANOVA F test (to be described subsequently) can safely be used if the largest of the sample standard deviations is at most twice the smallest one. The largest standard deviation in Example 15.1 is $s_4 = 1.07$, which is only about 1.5 times the smallest standard deviation ($s_1 = 0.69$). The test procedure is based on the following measures of variation in the data.

Definition

A measure of disparity among the sample means is the treatment sum of squares, denoted by SSTr and given by

$$\mathrm{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

A measure of variation within the k samples, called error sum of squares and denoted by SSE, is

$$\mathrm{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

Each sum of squares has an associated degrees of freedom:

treatment df = $k - 1$

error df = $N - k$

A mean square is a sum of squares divided by its df. In particular,

$$\text{mean square for treatments} = \mathrm{MSTr} = \frac{\mathrm{SSTr}}{k - 1}$$

$$\text{mean square for error} = \mathrm{MSE} = \frac{\mathrm{SSE}}{N - k}$$

The number of error degrees of freedom comes from adding the number of degrees of freedom associated with each of the sample variances:

$$(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = n_1 + n_2 + \cdots + n_k - 1 - 1 - \cdots - 1 = N - k$$

2.4 Heart Attack Calculations

Let's return to the mean platelet volume (MPV) data of Example 15.1. The grand mean $\bar{\bar{x}}$ was computed to be 11.315. Notice that because the sample sizes are all equal, the grand mean is just the average of the four sample means (this will not usually be the case when the sample sizes are unequal). With $\bar{x}_1 = 10.89$, $\bar{x}_2 = 11.25$, $\bar{x}_3 = 11.37$, $\bar{x}_4 = 11.75$, and $n_1 = n_2 = n_3 = n_4 = 35$,

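$$\begin{aligned}
\mathrm{SSTr} &= n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 \\
&= 35(10.89 - 11.315)^2 + 35(11.25 - 11.315)^2 + 35(11.37 - 11.315)^2 + 35(11.75 - 11.315)^2 \\
&= 6.322 + 0.148 + 0.106 + 6.623 \\
&= 13.199
\end{aligned}$$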

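Because $s_1 = 0.69$, $s_2 = 0.74$, $s_3 = 0.91$, and $s_4 = 1.07$,

$$\begin{aligned}
\mathrm{SSE} &= (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2 \\
&= (35 - 1)(0.69)^2 + (35 - 1)(0.74)^2 + (35 - 1)(0.91)^2 + (35 - 1)(1.07)^2 \\
&= 101.888
\end{aligned}$$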

The numbers of degrees of freedom are

treatment df = $k - 1 = 3$

error df = $N - k = 35 + 35 + 35 + 35 - 4 = 136$

from which

$$\mathrm{MSTr} = \frac{\mathrm{SSTr}}{k - 1} = \frac{13.199}{3} = 4.400$$

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{N - k} = \frac{101.888}{136} = 0.749$$
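The hand calculation above can be checked with a short Python sketch that works directly from the summary values in Table 15.1. The function name `anova_summary` and the dictionary keys are illustrative assumptions; the formulas are the SSTr, SSE, and mean square definitions given earlier. The sketch also reports the ratio of the largest to the smallest sample standard deviation, for comparison with the "at most twice" guideline.

```python
def anova_summary(sizes, means, sds):
    """Single-factor ANOVA quantities from per-group summary statistics."""
    if not (len(sizes) == len(means) == len(sds)) or len(sizes) < 2:
        raise ValueError("need matching summaries for at least two groups")
    k = len(sizes)
    n_total = sum(sizes)
    # Grand mean: weighted average of the sample means
    grand = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Treatment sum of squares: disparity among the sample means
    sstr = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    # Error sum of squares: variation within the samples
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    df_treatment, df_error = k - 1, n_total - k
    return {
        "grand_mean": grand,
        "SSTr": sstr,
        "SSE": sse,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "MSTr": sstr / df_treatment,
        "MSE": sse / df_error,
        "sd_ratio": max(sds) / min(sds),
    }


# Summary values from Table 15.1 (MPV data)
result = anova_summary(
    sizes=[35, 35, 35, 35],
    means=[10.89, 11.25, 11.37, 11.75],
    sds=[0.69, 0.74, 0.91, 1.07],
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The printed values should agree with the hand calculation up to rounding; small differences in the last digit arise because the hand calculation rounds each term before summing.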

Both MSTr and MSE are quantities whose values can be calculated once sample data are available; i.e., they are statistics. Each of these statistics varies in value from data set to data set. Both statistics MSTr and MSE have sampling distributions, and these sampling distributions have mean values.

2.5 The Single-Factor ANOVA F Test

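Null hypothesis: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

Test statistic: $F = \dfrac{\mathrm{MSTr}}{\mathrm{MSE}}$

When $H_0$ is true and the ANOVA assumptions are reasonable, F has an F distribution with $df_1 = k - 1$ and $df_2 = N - k$. $H_0$ should be rejected if p-value $\le \alpha$, where $\alpha$ is the chosen significance level.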

2.6 Heart Attack Calculations Continued

The two mean squares for the MPV data given in Example 15.1 were calculated as

$$\mathrm{MSTr} = \frac{13.199}{3} = 4.400 \quad\text{and}\quad \mathrm{MSE} = \frac{101.888}{136} = 0.749$$

The value of the F statistic is then

$$F = \frac{\mathrm{MSTr}}{\mathrm{MSE}} = \frac{4.400}{0.749} = 5.87$$

with $df_1 = k - 1 = 3$ and $df_2 = N - k = 140 - 4 = 136$. Using $df_1 = 3$ and $df_2 = 120$ (the closest value to 136 that appears in the table), Appendix Table 6 shows that 5.78 captures the tail area 0.001. Since 5.87 > 5.78, it follows that p-value = captured tail area < 0.001. The p-value is smaller than any reasonable $\alpha$, so there is compelling evidence for rejecting $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. We can conclude that the true mean MPV is not the same for all four patient populations.
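Instead of looking up the nearest row of a printed table, the tail area can be approximated numerically for the exact degrees of freedom. The sketch below is one possible approach, not the textbook's method: it uses only the Python standard library and relies on the standard identity that the upper-tail area of an F distribution equals a regularized incomplete beta function, which is evaluated here with Simpson's rule. The function name `f_upper_tail` and the `steps` parameter are illustrative assumptions, and the code assumes $df_2 \ge 2$.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=20000):
    """Approximate the upper-tail area P(F > f_value) of an F(df1, df2) distribution.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f), where I_x
    is the regularized incomplete beta function, integrated with Simpson's rule.
    Assumes f_value > 0 and df2 >= 2 so that the integrand is bounded on [0, x].
    """
    if f_value <= 0 or df1 < 1 or df2 < 2:
        raise ValueError("requires f_value > 0, df1 >= 1, df2 >= 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals

    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def beta_density(t):
        if t == 0.0:
            # t**(a-1) is 1 when a == 1 and 0 when a > 1
            return math.exp(-log_beta) if a == 1.0 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    h = x / steps
    total = beta_density(0.0) + beta_density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_density(i * h)
    return total * h / 3.0


# F statistic and degrees of freedom from the MPV example
F = 4.400 / 0.749
p_value = f_upper_tail(F, 3, 136)
print(f"F = {F:.2f}, p-value = {p_value:.5f}")
```

The computed tail area should fall below 0.001, consistent with the table-based conclusion. Where SciPy is available, `scipy.stats.f.sf(F, 3, 136)` returns the same upper-tail area.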

The question is:

Please read the Soccer Players data