Question
FIG. 4. Process B alloy composition: (a) $X^2$ chart; (b) $T^2$ chart based on log ratios; (c) $X^2$ versus $T^2$.

For this data set, step (3) in Section 4.2 yields $\nu = 4$ after rounding up to the nearest integer. Figure 5 shows the $X^2$ chart (Fig. 5(a)), the $T^2$ chart based on log-ratio data analogous to equation (16) (Fig. 5(b)) and the scatterplot of $X^2$ by $T^2$ (Fig. 5(c)). The performance of $X^2$ relative to $T^2$ is similar to that observed in the particle size distribution case study in Section 5.1, with $T^2$ identifying two out-of-control points not detected by $X^2$, and $X^2$ identifying one point not detected by $T^2$.

This data set shows signs of autocorrelation between successive daily checks. This is especially evident in Fig. 5(a). At present, there are no control charting techniques for multivariate autocorrelated data. This is an important area for further research. Such methods will inevitably require more sophisticated computational environments than methods for the independent identically distributed case. Given this, there is no reason why they could not be used for shop floor process monitoring.

7 Summary and conclusions

We have developed a modified chi-square control chart that is suitable for shopfloor monitoring of compositional process data. The two main advantages of this technique are that it requires only basic arithmetic operations and statistical functions (mean and variance), and that many technical staff will be familiar with the chi-square test from their basic statistical training. The disadvantage is that it can be less sensitive than the $T^2$ chart. We have found in case studies that this is not a major practical problem, especially in the first stages of statistical monitoring when easily detected assignable causes are relatively plentiful.
In our case studies, the availability of a simple multivariate method was the key factor in management decisions to go ahead with shopfloor statistical monitoring programs, which have since proven themselves as value-adding activities.

Acknowledgements

The author thanks his colleague Donal Krouse for reading preliminary versions of this report and making suggestions that led to significant improvements in the content and presentation. Thanks also to Precision Castparts Corp., for providing the data sets in Sections 5.1, 5.2 and 6.2, and to BHP New Zealand Steel Ltd, for providing the data set in Section 6.1.

REFERENCES

AITCHISON, J. (1986) The Statistical Analysis of Compositional Data (London, Chapman and Hall).
ANDERSON, T. W. (1958) An Introduction to Multivariate Statistical Analysis (New York, Wiley).
DUNCAN, A. J. (1950) A chi-square chart for controlling a set of percentages, Industrial Quality Control, 7, pp. 11-15.
HOLMES, D. S. & ZOOK, R. D. (1990) Using chi-square and T-square charts to interpret sieve analysis data, Powder and Bulk Engineering, February, pp. 30-36.

Let $x = (x_1, \ldots, x_k)'$ have the multinomial distribution $(n, \pi)$, where $\pi = (\pi_1, \ldots, \pi_k)'$ is a vector of positive probabilities that sum to 1. The probability mass function is

$$p(x) = \frac{n!}{x_1! \cdots x_k!}\, \pi_1^{x_1} \cdots \pi_k^{x_k}$$

for non-negative integer-valued $x_i$ with $x_1 + x_2 + \cdots + x_k = n$. Let $u = x/n$. Then

$$E(u) = \pi \quad \text{and} \quad \mathrm{Cov}(u) = \{\mathrm{Diag}(\pi) - \pi\pi'\}/n$$

As $n \to \infty$, $u$ has an asymptotic multivariate normal distribution with mean $\pi$ and the (singular) covariance given above. Let $v$ and $\phi$ be defined by

$$v_i = (u_i - \pi_i)/\pi_i^{1/2}, \quad i = 1, \ldots, k \tag{3}$$

and

$$\phi_i = \pi_i^{1/2}, \quad i = 1, \ldots, k \tag{4}$$

Then,

$$E(v) = 0 \quad \text{and} \quad \mathrm{Cov}(v) = (I - \phi\phi')/n \tag{5}$$

As $n \to \infty$, $v$ has an asymptotic multivariate normal distribution with mean 0 and the (singular) covariance of equation (5). Note that $v'v$ coincides with $X^2$ in equation (2), and that $n v'v$ is the usual chi-square statistic, i.e.

$$n v'v = \sum_{i=1}^{k} \frac{(x_i - n\pi_i)^2}{n\pi_i}$$

Note that $\phi'\phi = 1$, and let the columns of the $k \times (k-1)$ matrix $A$ comprise the remainder of an orthonormal basis. Then, we have

$$A'A = I_{k-1}, \quad AA' = I - \phi\phi' \tag{6}$$

Let $w = A'v$.
Then, from equations (5) and (6), we have

$$\mathrm{Cov}(w) = A'(I - \phi\phi')A/n = I_{k-1}/n$$

Because $\phi'v = 0$, equation (6) also implies

$$w'w = v'(I - \phi\phi')v = v'v \tag{7}$$

Because $w$ has an asymptotic multivariate normal distribution with mean 0 and covariance $I_{k-1}/n$, it follows that $n v'v = n w'w$ is asymptotically distributed as $\chi^2_{k-1}$ as $n \to \infty$. This result is the basis for the chi-square goodness-of-fit test and the standard chi-square control chart discussed in Section 1.

3 Dirichlet data

Instead of $u = x/n$ for multinomial $x$, we assume now that $u$ follows the Dirichlet distribution with parameter $\alpha$, where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$ is a vector of positive parameters. A basic reference is Kotz et al. (1982). With $n = \alpha_1 + \alpha_2 + \cdots + \alpha_k$, the density function is given by

$$p(u) = \frac{\Gamma(n)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)}\, u_1^{\alpha_1 - 1} \cdots u_k^{\alpha_k - 1}$$

for $0 < u_i < 1$ with $u_1 + u_2 + \cdots + u_k = 1$. Setting $\pi_i = \alpha_i/n$, we have

$$E(u) = \pi \quad \text{and} \quad \mathrm{Cov}(u) = \{\mathrm{Diag}(\pi) - \pi\pi'\}/(n + 1)$$

With $v$ and $\phi$ defined as in equations (3) and (4), we have

$$E(v) = 0 \quad \text{and} \quad \mathrm{Cov}(v) = (I - \phi\phi')/(n + 1) \tag{8}$$

On comparing equations (8) and (5), we see that $v$ has the same covariance structure in the Dirichlet case as in the multinomial case. We may represent Dirichlet $u$ in terms of independent gamma variables $z_1, z_2, \ldots, z_k$ as

$$u_i = \frac{z_i}{z_1 + z_2 + \cdots + z_k}$$

where $z_i$ has shape parameter $\alpha_i = n\pi_i$ and scale parameter 1. Because $z_i$ is asymptotically normal as $\alpha_i \to \infty$, the vectors $u$ and $v$ are asymptotically multivariate normal as $n \to \infty$ for fixed $\pi$.
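The gamma representation gives a direct way to simulate Dirichlet compositions and to check equation (8) numerically. Taking the trace of equation (8) gives $E(v'v) = (k - 1)/(n + 1)$, because the trace of $I - \phi\phi'$ is $k - 1$. The following is only a minimal sketch: the function names, the value $n = 200$, the seed and the number of replications are assumptions made for illustration, and $\pi$ is taken to be the Table 1 averages rescaled to proportions.

```python
import random


def dirichlet_sample(alpha, rng):
    """Draw u ~ Dirichlet(alpha) by normalizing independent gamma variables."""
    z = [rng.gammavariate(a, 1.0) for a in alpha]  # shape a, scale 1
    total = sum(z)
    return [zi / total for zi in z]


def mean_vtv(pi, n, reps, seed=0):
    """Monte Carlo estimate of E(v'v), with v_i = (u_i - pi_i) / sqrt(pi_i)."""
    rng = random.Random(seed)
    alpha = [n * p for p in pi]  # alpha_i = n * pi_i
    acc = 0.0
    for _ in range(reps):
        u = dirichlet_sample(alpha, rng)
        acc += sum((ui - p) ** 2 / p for ui, p in zip(u, pi))
    return acc / reps


# Assumed settings for illustration only.
pi = [0.0424, 0.5586, 0.2900, 0.0985, 0.0105]
n = 200
estimate = mean_vtv(pi, n, reps=5000)
theory = (len(pi) - 1) / (n + 1)
print(f"simulated E(v'v) = {estimate:.5f}, (k - 1)/(n + 1) = {theory:.5f}")
```

The two printed values should agree up to Monte Carlo error; increasing `reps` tightens the agreement.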
By the argument given in Section 2, $(n + 1)v'v$ is asymptotically distributed as $\chi^2_{k-1}$ as $n \to \infty$ for fixed $\pi$. This approximation entails

$$E(v'v) \approx (k - 1)/(n + 1)$$

from which we obtain the distributional approximation in its most useful form, i.e.

$$v'v \sim E(v'v)\,\chi^2_{k-1}/(k - 1) \tag{9}$$

Note the analogy between equation (9) and the normal-theory distribution of the sample variance, i.e.

$$s^2 \sim \sigma^2 \chi^2_{m-1}/(m - 1)$$

where $\sigma^2 = E(s^2)$ and $m$ is the sample size. Using this analogy, the easiest way to implement a control chart based on equation (9) is to use the $X^2$ values as input to a standard $s^2$ chart of sample variances (Montgomery, 1991), where $k$ is used as the sample size. Here, one would suppress the lower control limit and set the upper control limit to give the overall '$3\sigma$' false alarm rate of 0.0027.

4 General compositional data

4.1 Theory

For general compositional data, we still have $E(v) = 0$, but we must replace equation (8) with

$$\mathrm{Cov}(v) = \Sigma$$

where $\mathrm{rank}(\Sigma) = k - 1$. Because $\phi'v = 0$, we have

$$\phi'\Sigma\phi = \mathrm{Var}(\phi'v) = 0 \tag{10}$$

4.2 Method

Based on the theory given here so far, we now describe the general method for calculating chi-square control limits.

(1) Estimate the average values $\pi_1, \pi_2, \ldots, \pi_k$ with sample means from the baseline data set.
(2) Calculate $X^2$ as in equation (2) for each observation in the baseline data set.
(3) Estimate the effective degrees of freedom $\nu$ by inserting in equation (14) the sample mean and variance of the $X^2$ (= $v'v$).
(4) Round $\nu$ up to the nearest integer. This helps to compensate for possible variance-inflating effects of 'out-of-control' $X^2$ points, although $\nu$ seems to be relatively insensitive to these.
(5) Calculate the upper control limit by inputting the $X^2$ data to a standard $s^2$ charting procedure, using $\nu + 1$ as the sample size. The upper limit should be set to give the overall '$3\sigma$' false alarm rate of 0.0027. The lower control limit should be suppressed.
(6) Follow standard charting procedure for removing extreme outliers (high $X^2$ values) and recalculating the upper control limit within the $s^2$ charting procedure. Usually, it is not necessary to go back to step (1) to recalculate the $X^2$ values and $\nu$.

5 Case studies

In this section, we discuss two case studies that involve the method proposed here. In each case, a multivariate chart was the key to shop floor monitoring of compositional process data. The key factors in 'selling' the chi-square chart were the simplicity of the $X^2$ calculation in equation (2), and the familiarity of technical staff with the chi-square goodness-of-fit test, as obtained from their basic statistical training. In these examples, shopfloor monitoring led to process improvements, including the discovery and resolution of measurement problems; reductions in unplanned equipment downtime; the discovery and resolution of raw materials problems; improvements in operating procedures; and improvements in the quality of products downstream. For each example, we analyze here the baseline data set used to initiate statistical monitoring.
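Before turning to the data sets, steps (1) to (5) of the method in Section 4.2 can be sketched in Python. This is a sketch under stated assumptions rather than the author's implementation: all function names and the small baseline data set are hypothetical, the upper control limit is computed directly from the approximation in equation (13) as mean($X^2$) times the 0.9973 chi-square quantile divided by $\nu$ instead of through an $s^2$ charting routine, and the chi-square quantile comes from a hand-written series and bisection so that only the standard library is needed. Step (6) is left to the analyst.

```python
import math
import statistics


def chi2_cdf(x, df):
    """Chi-square CDF from the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= y / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection (adequate for moderate df)."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def baseline_x2(rows):
    """Steps (1)-(2): column means, then X^2 for every baseline observation."""
    k = len(rows[0])
    pi = [statistics.mean(r[i] for r in rows) for i in range(k)]
    x2 = [sum((r[i] - pi[i]) ** 2 / pi[i] for i in range(k)) for r in rows]
    return pi, x2


def effective_df(x2_values):
    """Steps (3)-(4): nu = 2 * mean^2 / variance, rounded up to an integer."""
    m = statistics.mean(x2_values)
    s2 = statistics.variance(x2_values)  # sample variance
    return math.ceil(2.0 * m * m / s2)


def upper_control_limit(x2_values, false_alarm=0.0027):
    """Step (5), assuming X^2 ~ mean * chi2_nu / nu as in equation (13)."""
    nu = effective_df(x2_values)
    m = statistics.mean(x2_values)
    return m * chi2_quantile(1.0 - false_alarm, nu) / nu, nu


# Hypothetical three-component baseline (illustrative values only).
baseline = [
    [0.20, 0.50, 0.30],
    [0.22, 0.47, 0.31],
    [0.18, 0.52, 0.30],
    [0.21, 0.49, 0.30],
    [0.19, 0.53, 0.28],
]
pi_hat, x2 = baseline_x2(baseline)
ucl, nu = upper_control_limit(x2)
print(f"nu = {nu}, upper control limit = {ucl:.6f}")
```

A new observation would then be flagged when its $X^2$, computed against `pi_hat`, exceeds `ucl`. A real baseline would contain far more than five observations.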
5.1 Particle size distribution

This data set represents daily checks on the particle size distribution of a ceramic stucco used to make shells for investment casting. There are $k = 5$ particle size ranges, with range 5 being a balance component calculated as 100 minus the sum of the other four ranges. The average values used to calculate $X^2$ in this example are given in Table 1. Figure 1 shows scatterplots of ranges 1-4. Note that range 1 is positively correlated with range 2, and range 3 is positively correlated with range 4. These positive correlations violate the multinomial/Dirichlet correlation structure reflected in equations (5) and (8), under which every pair of components is negatively correlated. Positive correlations are common in compositional process data, indicating the need for models that are more flexible than the Dirichlet approach (Aitchison, 1986). In the present context, non-Dirichlet correlation structures are the reason for the degrees-of-freedom adjustment developed in Section 4.1.

For each daily check, a technician extracts a sample of the stucco, performs a sieve analysis (see Holmes & Zook, 1990), and then enters the weight percentages [...]

TABLE 1. Average particle size distribution

Size range   1      2      3      4     5
Weight %     4.24   55.86  29.00  9.85  1.05

FIG. 1. Scatterplot of the non-balance components of the particle size distribution data.

FIG. 2. Particle size distribution data: (a) $X^2$ chart; (b) $T^2$ chart based on log ratios; (c) $X^2$ versus $T^2$.

[...] sequence of checks and additional analyses to identify the affected element(s) and assignable cause(s). The averages in Table 2 are constants in the program. The process team periodically reviews these averages and decides whether they need to be updated.

TABLE 2. Average alloy composition for Process A

Element    A      B      C      D       E      F      G      H
Weight %   0.513  0.003  0.060  18.239  0.125  4.900  0.020  0.033

Element    I      J       K      L      M      N      O      P
Weight %   2.964  52.823  0.008  0.127  0.004  0.866  0.009  19.306

For this data set, step (3) in Section 4.2 yields $\nu = 1$ after rounding up to the nearest integer. Figure 3 shows the $X^2$ chart (Fig. 3(a)), the $T^2$ chart based on log-ratio data analogous to equation (16) (Fig. 3(b)) and the scatterplot of $X^2$ by $T^2$ (Fig. 3(c)). In this case, $T^2$ identifies many out-of-control points not detected by $X^2$, although there are still a few points detected by $X^2$ but not by $T^2$.

The $X^2$-chart calculations in this case are embedded in the charge adjustment program. Computations are limited to basic arithmetic operations and storage space is limited. In particular, there is no way to perform a log-ratio transformation and insufficient capacity to store 120 extra constants (sample variances and covariances) in addition to 15 sample means. These limitations make the $X^2$ chart attractive in this application, even though Fig. 3(c) shows it to be less sensitive than $T^2$. The key point is that $X^2$ makes possible multivariate monitoring with virtually no interruption of the normal work routine.

6 Additional examples

In this section, we analyze two additional data sets that represent potential applications of our method.

6.1 Alloy composition: Process B

This data set represents the weight percentages of $k - 1 = 11$ elements and a composite balance component that form successive heats in a process that produces a different alloy from that in the preceding example. The average values used to calculate $X^2$ in this example are given in Table 3. Element L is the balance component.
For this data set, step (3) in Section 4.2 yields $\nu = 2$ after rounding up to the nearest integer. Figure 4 shows the $X^2$ chart (Fig. 4(a)), the $T^2$ chart based on log-ratio data analogous to equation (16) (Fig. 4(b)) and the scatterplot of $X^2$ by $T^2$ (Fig. 4(c)). In this case, the performance of $X^2$ is closer to that of $T^2$ than in the preceding case studies. Both charts identify heat 10 as requiring investigation for assignable cause.

6.2 Chemical milling bath composition

This data set represents the volume percentages of $k - 1 = 8$ acids determined by laboratory analysis of daily samples of a chemical milling bath. The average values used to calculate $X^2$ in this example are given in Table 4. The balance component I is water.

TABLE 3. Average alloy composition for Process B

Element    A      B      C      D      E      F
Weight %   0.416  0.303  0.760  0.035  0.011  0.021

Element    G      H      I      J      K      L
Weight %   0.024  0.002  0.050  0.008  0.029  98.342

TABLE 4. Average chemical milling bath composition

Component  A     B     C     D     E     F      G      H     I
Volume %   0.92  2.51  4.92  2.72  7.91  10.33  11.56  6.04  53.07

FIG. 3. Process A alloy composition: (a) $X^2$ chart; (b) $T^2$ chart based on log ratios; (c) $X^2$ versus $T^2$.

Using the chi-square statistic to monitor compositional process data

RUSSELL A. BOYLES, New Zealand Institute of Industrial Research and Development, Lower Hutt, New Zealand

SUMMARY We investigate the use of the chi-square control chart as a simple multivariate method for shopfloor monitoring of compositional process data.
Although this chart is usually considered to be applicable only with multinomial process data, we show that it is also valid, in a certain asymptotic sense, for compositional data that arise from the Dirichlet distribution. For general compositional data, we show that the chi-square statistic can be used for process monitoring, provided that we make a simple adjustment to the degrees of freedom in the chi-square reference distribution. This method is illustrated and compared in four examples with the $T^2$ chart based on log-ratio transformation of the data.

1 Introduction

Compositional data are measurements $u_1, u_2, \ldots, u_k$, where $k$ is the number of components, $u_i$ is the proportion associated with the $i$th component and

$$u_1 + u_2 + \cdots + u_k = 1 \tag{1}$$

Compositional data are often expressed as percentages rather than as proportions, in which case, the right-hand side of equation (1) would be 100 rather than 1. In many real compositional data sets, the fundamental property in equation (1) is compromised through the effects of measurement error. In many cases, this problem is handled by treating one of $u_1, u_2, \ldots, u_k$ as a 'balance' component which is redefined so that equation (1) holds exactly.

Compositional data are frequently encountered in the chemical and process industries. Common examples involve products or process materials that are chemical mixtures of several ingredients. Aitchison (1986) gives 40 compositional data sets that represent a wide range of industrial and scientific applications. The examples discussed in this paper represent three types of compositional data: particle size distribution, where each component is a size range; alloy composition, where each component is an element or combination of elements; and chemical milling bath composition, where one component is water and the others are acids. The measurements are weight percentages in the first two cases and volume percentages in the third case. The methodology presented applies to any type of compositional data used for process monitoring.

Correspondence: R. A. Boyles, 3099 Rosemary Lane, Lake Oswego, OR 97034, USA.
0266-4763/97/050589-14 $7.00 © 1997 Carfax Publishing Ltd

For statistically monitoring compositional data, a sensible approach would be to apply the log-ratio transformation described in Aitchison (1986) and to monitor the resulting $(k-1)$-dimensional data vectors with a standard $T^2$ chart for individual observations (Tracy et al., 1992). As described in Anderson (1958, Section 5.5) and other multivariate analysis texts, $T^2$ is uniformly the most powerful among a broad class of statistics for detecting unanticipated changes in a multivariate mean. In the control charting context, this means that the $T^2$ chart has the greatest probability of detecting assignable causes.

Even today, the computational complexity of the optimal approach described above makes it impractical in many shopfloor situations. However, in shopfloor situations that involve complex processes, a multivariate approach is appealing, because it can dramatically reduce the number of charts that require operator monitoring. This is convenient from a practical point of view, and also provides better control of the overall false alarm rate (type I error).

In this paper, we investigate a less powerful but more easily implemented multivariate procedure. We propose monitoring compositional process data with the well-known chi-square statistic, defined for our purposes as

$$X^2 = \frac{(u_1 - \pi_1)^2}{\pi_1} + \cdots + \frac{(u_k - \pi_k)^2}{\pi_k} \tag{2}$$

where $\pi_1, \ldots, \pi_k$ are the process averages. These will typically be estimated from a baseline data set.
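As a concrete illustration of equation (2), the following minimal Python sketch computes $X^2 = \sum_i (u_i - \pi_i)^2/\pi_i$ for a single observation. The function name `chi_square_statistic` and the observed composition `u_obs` are hypothetical, and the process averages are assumed to be the Table 1 weight percentages rescaled to proportions.

```python
def chi_square_statistic(u, pi):
    """X^2 = sum_i (u_i - pi_i)^2 / pi_i for proportions u and averages pi."""
    if len(u) != len(pi):
        raise ValueError("u and pi must have the same number of components")
    return sum((ui - p) ** 2 / p for ui, p in zip(u, pi))


# Process averages: Table 1 weight percentages converted to proportions.
PI_TABLE1 = [0.0424, 0.5586, 0.2900, 0.0985, 0.0105]

# Hypothetical daily check (illustrative values only, not from the paper).
u_obs = [0.0500, 0.5400, 0.3000, 0.1000, 0.0100]

x2 = chi_square_statistic(u_obs, PI_TABLE1)
print(f"X^2 = {x2:.6f}")
```

If the data are kept as percentages rather than proportions, the same function returns a value exactly 100 times larger, so the scale only needs to be used consistently between the baseline data and the monitored data.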
When the proportions $u_1, u_2, \ldots, u_k$ arise from $k$-cell multinomial sampling with cell probabilities $\pi_1, \pi_2, \ldots, \pi_k$, $X^2$ is asymptotically distributed as a multiple of $\chi^2_{k-1}$. Control charts for $X^2$ in this case are described in Duncan (1950), Marcucci (1985) and Nelson (1987).

FIG. 5. Chemical mill bath composition: (a) $X^2$ chart; (b) $T^2$ chart based on log ratios; (c) $X^2$ versus $T^2$.

As pointed out in Duncan (1950) and Holmes and Zook (1990), the $X^2$ chart based on the $\chi^2_{k-1}$ distribution is not valid for all types of compositional data. In many non-multinomial cases, it tends to give too small a value for the upper control limit, resulting in a higher than nominal false alarm rate, i.e. too many out-of-control signals produced by common-cause variation instead of assignable causes.

In Section 3, we show that the $X^2$ chart based on $\chi^2_{k-1}$ is valid as a certain type of asymptotic approximation when compositional data arise from the Dirichlet distribution. For general compositional data, we develop in Section 4 a further approximation which consists of adjusting the degrees of freedom in the chi-square reference distribution. In Section 5, the method is illustrated in case studies that involve particle size distribution and alloy composition. Additional example data sets are analyzed in Section 6. Summarizing remarks and conclusions are presented in Section 7.

2 Multinomial data

To prepare for the derivation in Section 3, we review here the standard asymptotic theory for $X^2$ in the multinomial case.
A basic reference for this section is Rao (1973).

Let $x = (x_1, x_2, \ldots, x_k)'$ have the multinomial distribution $(n, \pi)$, where $\pi = (\pi_1, \ldots, \pi_k)'$ is a vector of positive probabilities that sum to 1. The probability mass function is

$$p(x) = \frac{n!}{x_1! \cdots x_k!}\, \pi_1^{x_1} \cdots \pi_k^{x_k}$$

for non-negative integer-valued $x_i$ with $x_1 + x_2 + \cdots + x_k = n$. Let $u = x/n$. Then $E(u) = \pi$ and $\mathrm{Cov}(u) = \{\mathrm{Diag}(\pi) - \pi\pi'\}/n$.

[...]

Applying the procedure of Satterthwaite (1946), we may further approximate the right-hand side of equation (12) by $c\chi^2_\nu$ for constants $c$ and $\nu$ obtained by equating means and variances. Because $c\chi^2_\nu$ has mean $c\nu$ and variance $2c^2\nu$, we obtain our final approximation

$$v'v \sim E(v'v)\,\chi^2_\nu/\nu \tag{13}$$

where

$$\nu = \frac{2[E(v'v)]^2}{\mathrm{Var}(v'v)} \tag{14}$$

It is instructive to compare equation (12) with the standard goodness-of-fit statistic, say $G$, for testing that a vector $v$ follows a multivariate normal distribution with mean 0 and known covariance given by equation (11):

$$G = v'\Sigma^{-}v = w'\Lambda^{-1}w = \sum_{i=1}^{k-1} w_i^2/\lambda_i \tag{15}$$

Thus, $X^2$ (= $v'v$) is a weighted version of $G$. $G$ is an 'omnibus' statistic, equally powerful for detecting multivariate mean changes in any direction. On comparing equation (15) with equation (12), we see that $X^2$ will be more sensitive to changes along the first few principal components of $v$, corresponding to the largest of the $\lambda_i$ terms, and less sensitive to changes along the last few principal components of $v$, corresponding to the smallest of the $\lambda_i$ terms. The $T^2$ chart based on log-ratios shares the omnibus property of $G$. This helps to explain the examples in Sections 5 and 6, where the $T^2$ and $X^2$ charts identify differing sets of observations as 'out of control'.