Estimating σ and Improving Control Limits with RWAV for Normal and Non-normal Processes

ABSTRACT In this article, we compare the statistical properties of the $\bar{R}/d_2$, $\bar{S}/c_4$, and RWAV methods for estimating the variability of a process for quality control purposes. We investigate the effects of non-normality on the different estimators. Our results indicate that RWAV gives the best estimates of the process standard deviation for both normal and non-normal processes. We recommend constructing control charts with RWAV.


INTRODUCTION
Statistical process control (SPC) is useful for quality assurance in production and other operations. The Shewhart control charts, i.e., $\bar{x}$ charts, R charts, and S charts, are among the most popular tools of SPC. Control charts have two phases. Phase I is the chart construction phase, and Phase II is the monitoring phase. In Phase I, one needs to make sure that the underlying process is in control before formally estimating the central line and control limits. One also assumes (and verifies) that the underlying process is normal, because the sample size employed in Shewhart control chart methods is usually small (less than 30 in most situations). In Phase II, the quality analyst monitors the process for assignable causes of variation associated with unexpected changes in the process parameters, i.e., the process mean and process variability (Shewhart, 1931). Any advantage an estimator may have relates to the accuracy of the parameter estimates (Del Castillo, 1996).
For m samples of size n, the mean sample range, $\bar{R}$, and the mean sample standard deviation, $\bar{S}$, are used to estimate the process standard deviation. If the underlying process is normal, the unbiased estimates are, respectively,
$\hat{\sigma}(\bar{R}) = \bar{R}/d_2(n)$ [3]
$\hat{\sigma}(\bar{S}) = \bar{S}/c_4(n)$ [4]
where $d_2(n)$ and $c_4(n)$ are constants that depend on the sample size.
In Phase I of SPC, we distinguish within-sample variation from between-sample variation. One advantage of this distinction is that information acquired within the samples reflects the process variation when the process mean is in control and the process variation is constant. The second advantage is that both $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ measure within-sample variation, and the estimators in Phases I and II may not be, or do not have to be, the same. If necessary, we employ a different estimator for Phase I than for Phase II to obtain a better estimate of the within-sample variability.
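As a rough illustration of estimates [3] and [4], the sketch below computes both from equal-size samples. The helper names (`c4`, `sigma_from_ranges`, `sigma_from_stdevs`) are hypothetical and not from the article. The closed form used for $c_4(n)$ is the standard gamma-function expression, and the $d_2(n)$ values are the usual tabulated constants for normal data; the article's own expressions for these constants did not survive in the text, so both are assumptions here.

```python
import math
from statistics import mean, stdev

# d2(n) for normal samples, n = 2..10 (standard tabulated control-chart constants).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def c4(n: int) -> float:
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sigma_from_ranges(samples):
    """Estimate [3]: mean sample range divided by d2(n); equal sizes assumed."""
    n = len(samples[0])
    r_bar = mean(max(s) - min(s) for s in samples)
    return r_bar / D2[n]


def sigma_from_stdevs(samples):
    """Estimate [4]: mean sample standard deviation divided by c4(n)."""
    n = len(samples[0])
    s_bar = mean(stdev(s) for s in samples)
    return s_bar / c4(n)


# Hypothetical data: two samples of size 5.
samples = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
print(sigma_from_ranges(samples), sigma_from_stdevs(samples))
```

Both functions use only within-sample information, which is the point of the within-sample versus between-sample distinction made above.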
Sample sizes may change from sample to sample for a variety of reasons (Montgomery, 2005, see pp. 109-111 and 123). To accommodate variable sample sizes, Montgomery (2005, pp. 194-247) suggests several estimators applying $\bar{R}$ and $\bar{S}$. The recommended approach weights the variance of each sample and corrects the bias with the coefficient of the aggregated sample size, $f + 1 = \sum_{i=1}^{m} n_i - m + 1$. Therefore, the unbiased estimator for within-sample variability is
$\hat{\sigma}(S) = \dfrac{1}{c_4(f+1)} \sqrt{\dfrac{\sum_{i=1}^{m} (n_i - 1) S_i^2}{\sum_{i=1}^{m} (n_i - 1)}}$ [7]
where $S_i^2$ is the variance of the $i$th sample and $n_i$ is its size. A proof of [7] is in Appendix 1. When f is very large, $c_4(f+1)$ is approximately equal to unity and [7] becomes the root weighted average of variance, RWAV:
$\mathrm{RWAV} = \sqrt{\dfrac{\sum_{i=1}^{m} (n_i - 1) S_i^2}{\sum_{i=1}^{m} (n_i - 1)}}$ [8]
Whereas $\hat{\sigma}(S)$, or $\mathrm{RWAV}/c_4(f+1)$, is unbiased, we consider the slightly biased estimator RWAV in the following sections. We begin by examining, in Table 1, how far the values of $c_4(f+1)$ depart from unity. Denote the mean sample size by $\bar{n} = \sum_{i=1}^{m} n_i / m$. Hence, we have $f = (\bar{n} - 1)m$. The bottom row of Table 1 lists the reference values of $c_4(n = \bar{n})$ for comparison. As suggested by Burr (1976) and Nelson (1984), $c_4(f+1)$ is approximately one; therefore, we use RWAV (Eq. [8]) to estimate the process standard deviation in Phase I. This is convenient since it does not require one to look up or calculate the bias correction factor $c_4$.
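The following sketch shows one way RWAV and its bias-corrected form could be computed for samples of unequal size. It assumes the pooled form implied by the text, in which each sample variance is weighted by $n_i - 1$ and $f = \sum n_i - m$. The function names and the data are hypothetical, and `c4` uses the standard gamma-function expression.

```python
import math
from statistics import variance


def c4(n: int) -> float:
    """Standard bias factor c4(n), computed with log-gamma so large n is safe."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def rwav(samples) -> float:
    """Root weighted average of variance with weights (n_i - 1)."""
    f = sum(len(s) - 1 for s in samples)
    weighted = sum((len(s) - 1) * variance(s) for s in samples)
    return math.sqrt(weighted / f)


def rwav_unbiased(samples) -> float:
    """RWAV divided by c4(f + 1), where f = sum(n_i) - m."""
    f = sum(len(s) - 1 for s in samples)
    return rwav(samples) / c4(f + 1)


# Hypothetical Phase I data with unequal sample sizes.
samples = [[9.8, 10.1, 10.3, 9.9],
           [10.0, 10.4, 9.7, 10.2, 9.9],
           [10.1, 9.6, 10.0]]
print(rwav(samples), rwav_unbiased(samples))

# With n_bar = 5 and m = 30, f = 120 and the correction is already close to one.
print(c4(121))
```

The second print illustrates why the correction is often dropped: for typical Phase I settings, $c_4(f+1)$ differs from one only in the third decimal place.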
Among the many estimators proposed in Burr (1976) and Nelson (1984), all except $\mathrm{RWAV}/c_4(f+1)$ and RWAV reduce to $\bar{R}/d_2(n)$ or $\bar{S}/c_4(n)$ when sample sizes are equal. However, $\mathrm{RWAV}/c_4(f+1)$ is more efficient than $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$, since equal sample sizes are a special case. Moreover, Burr (1976) did suggest the use of $\mathrm{RWAV}/c_4(f+1)$ and RWAV when sample sizes are not equal. Later, when we focus on the quality of the estimators, we will recommend using RWAV for equal sample sizes as well.

EFFICIENCY AND ACCURACY OF ESTIMATORS
Under the usual definition of statistical relative efficiency, the more efficient estimator has the smaller mean squared error (MSE), i.e., the expected squared deviation of the sample estimate from the population parameter. For unbiased estimators, this simply means that the estimator with the smaller variance is more efficient. It is known that when the underlying processes are normal, $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ are unbiased, and $\bar{R}/d_2(n)$ is less efficient than $\bar{S}/c_4(n)$. We can further show that the relative efficiency between the two unbiased estimators, $\mathrm{RWAV}/c_4(f+1)$ and $\bar{S}/c_4(n)$, is given by [9]. For a given m, the number of samples in Phase I, the variance of $\mathrm{RWAV}/c_4(f+1)$ does not depend on how the sample sizes vary. Instead, it depends on the mean sample size $\bar{n}$, since $f = (\bar{n} - 1)m$. The ratio [9] compares the efficiency of $\mathrm{RWAV}/c_4(f+1)$ at ($\bar{n}$, m), where sample sizes vary, with that of $\bar{S}/c_4(n)$ at (n, m), where sample sizes are equal. In Table 2, the numerical values of [9] indicate that $\mathrm{RWAV}/c_4(f+1)$ is more efficient than $\bar{S}/c_4(n)$ in measuring within-sample variability (hence, it is also more efficient than $\bar{R}/d_2(n)$). The relative efficiency in [9] is more sensitive to the sample size, n, than to the number of samples, m, while the relative efficiency between $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ is not related to m. In Phase I, where n is often 4 or 5 and m varies between 30 and 50, we observe in Table 2 that $\bar{S}/c_4(n)$ is roughly 5% less efficient than $\mathrm{RWAV}/c_4(f+1)$.
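The comparison behind [9] can be approximated numerically. The sketch below is not the article's expression for [9], which is not reproduced in the text. It assumes normal data, m independent samples, and the standard results $\mathrm{Var}(S) = \sigma^2(1 - c_4^2)$ for a sample standard deviation and the analogous result with $f$ degrees of freedom for the pooled estimate. The function names are hypothetical.

```python
import math


def c4(n: int) -> float:
    """Standard bias factor c4(n) for the sample standard deviation."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def var_sbar_over_c4(n: int, m: int) -> float:
    """Var(S_bar / c4(n)) in units of sigma^2, for m samples of size n."""
    c = c4(n)
    return (1.0 - c * c) / (m * c * c)


def var_rwav_over_c4(n_bar: float, m: int) -> float:
    """Var(RWAV / c4(f + 1)) in units of sigma^2, with f = (n_bar - 1) * m."""
    f = round((n_bar - 1) * m)
    c = c4(f + 1)
    return (1.0 - c * c) / (c * c)


def variance_ratio(n: int, m: int) -> float:
    """Ratio below 1 means the pooled estimator has the smaller variance."""
    return var_rwav_over_c4(n, m) / var_sbar_over_c4(n, m)


for n in (4, 5):
    for m in (30, 50):
        print(n, m, round(variance_ratio(n, m), 3))
```

Under these assumptions the ratio for n = 5 and m = 30 comes out near 0.95, in line with the roughly 5% difference described in the text. The ratio changes more with n than with m.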
For decades, others have debated whether the mean squared error (MSE) or the mean absolute error (MAE) should be used when assessing the quality of estimates and/or forecasts (Jarrett, 1974, 1986; Hanke and Wichern, 2005; Jarrett, 1991). The MSE penalizes large errors heavily rather than penalizing all errors proportionally. MSE is most appropriate when economic circumstances require severe penalties for large errors. The use of MAE is common in industrial applications. In fact, the same logic of using MAE is also found in Burr (1976), who emphasized that the reason for using R and S in the estimation of control limits is to avoid the greater penalties for larger errors. This idea has been adopted in SPC since Shewhart (1931).
Our purpose in this study is not to debate whether MSE or MAE is more appropriate. We do note that, for a given probability distribution, there exists a relationship between the MSE and the MAE, where the MAE is the expected absolute deviation of the sample estimate from the population parameter. For example, for normally distributed errors the root mean squared error is approximately 1.25 times the MAE. For other distributions the multiplier differs from 1.25. Therefore, if one estimator has a smaller MSE than the other, it must also have a smaller MAE. In other words, the two criteria are consistent and will lead to the same decision in the comparison of estimators. In the following, we use MAE as our criterion.
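The factor of about 1.25 in the normal case is the ratio of the standard deviation to the mean absolute deviation. A short derivation for $X \sim N(\mu, \sigma^2)$, stated here as a standard result rather than as part of the article's argument:

```latex
\begin{align*}
E\,|X - \mu|
  &= 2 \int_{0}^{\infty} t \, \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left(-\frac{t^{2}}{2\sigma^{2}}\right) dt
   = \sigma \sqrt{\frac{2}{\pi}}, \\
\frac{\sigma}{E\,|X - \mu|}
  &= \sqrt{\frac{\pi}{2}} \approx 1.2533 .
\end{align*}
```

The multiplier therefore applies on the scale of the standard deviation (the square root of the MSE of an unbiased quantity), not on the scale of the variance.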
When the underlying processes are not normal, the estimators of the process standard deviation mentioned above are all biased. Hence, using the variance to assess relative efficiency, as in [9], is no longer appropriate. To assess the quality of an estimator of the parameter $\sigma$, we use the estimated MAE,
$\widehat{\mathrm{MAE}} = \dfrac{1}{k} \sum_{j=1}^{k} \left| \hat{\sigma}_j - \sigma \right|$ [10]
where k is the number of replications, each drawing m samples of size n, and $\hat{\sigma}_j$ is the estimate of $\sigma$ from the jth replication, whichever estimator is used. The process is simulated for certain probability distributions that are not necessarily normal. Since all the estimators mentioned above are biased for non-normal populations, the bias correction factor $c_4$ in $\mathrm{RWAV}/c_4(f+1)$ is no longer necessary. We argue that, for simplicity and convenience, we can use RWAV directly to estimate the process standard deviation. We will use MAE as the criterion to compare RWAV with the other estimators mentioned above in non-normal environments. We will show that RWAV is the most efficient estimator in general situations, regardless of whether the sample sizes are equal and regardless of whether the underlying process is normally distributed. A further benefit of using RWAV instead of $\mathrm{RWAV}/c_4(f+1)$ is that it avoids the need for values of $c_4(f+1)$, which are not available in common reference tables.
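A minimal sketch of the estimated MAE over k replications is given below. The helper names (`estimated_mae`, `replicate`, `mean_stdev`) and the random seed are hypothetical. The stand-in estimator is the uncorrected mean sample standard deviation, chosen only to keep the example short, and the normal generator is an arbitrary choice of process.

```python
import random
from statistics import mean, stdev


def estimated_mae(estimates, sigma):
    """Average absolute deviation of the k estimates from the true sigma."""
    return mean(abs(e - sigma) for e in estimates)


def replicate(estimator, draw, m, n, k, rng):
    """Apply `estimator` to k independent data sets of m samples of size n."""
    return [estimator([[draw(rng) for _ in range(n)] for _ in range(m)])
            for _ in range(k)]


def mean_stdev(samples):
    # Deliberately uncorrected S_bar, used only as a stand-in estimator.
    return mean(stdev(s) for s in samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = replicate(mean_stdev, lambda r: r.gauss(0.0, 1.0),
                      m=30, n=5, k=200, rng=rng)
print(estimated_mae(estimates, sigma=1.0))
```

Because `replicate` takes the estimator and the random generator as arguments, the same loop can be reused for any estimator and any simulated process, which matches the phrase "whichever estimator is used" in the definition.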
We first compare RWAV with the other estimators in a normal environment. Table 3 shows the simulation results of MAE for the unbiased $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ and the biased RWAV. It is clear that RWAV, although biased, is better than the other two in that it has a smaller MAE. Note also that $\bar{S}/c_4(n)$ has a smaller MAE than $\bar{R}/d_2(n)$ as the sample size, n, increases, which is consistent with the efficiency ordering discussed above. In the comparison, $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ are obtained from simulations with equal sample sizes. RWAV is obtained from simulations with both equal and unequal sample sizes, while keeping $\bar{n} = n$. Generally, we let 50% of the samples have size n, 25% have larger sizes, and the remaining 25% have smaller sizes.
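The comparison described above can be sketched as a small simulation. This is not the article's simulation code. It assumes that the larger and smaller samples differ from n by exactly one observation (the text does not say by how much), uses the standard $d_2$ and $c_4$ constants, and picks m = 32, k = 200, and the seed arbitrarily. All function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev, variance

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}  # tabulated d2(n)


def c4(n):
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def mixed_sizes(n, m):
    """About 50% of samples of size n, 25% of size n+1, 25% of size n-1."""
    q = m // 4
    return [n + 1] * q + [n - 1] * q + [n] * (m - 2 * q)


def one_replication(n, m, sigma, rng):
    equal = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]
    unequal = [[rng.gauss(0.0, sigma) for _ in range(ni)]
               for ni in mixed_sizes(n, m)]
    r_est = mean(max(s) - min(s) for s in equal) / D2[n]
    s_est = mean(stdev(s) for s in equal) / c4(n)
    f = sum(len(s) - 1 for s in unequal)
    rwav = math.sqrt(sum((len(s) - 1) * variance(s) for s in unequal) / f)
    return r_est, s_est, rwav


def simulate_mae(n=5, m=32, sigma=1.0, k=200, seed=1):
    """Estimated MAE of each estimator over k replications."""
    rng = random.Random(seed)
    reps = [one_replication(n, m, sigma, rng) for _ in range(k)]
    names = ("R_bar/d2", "S_bar/c4", "RWAV")
    return {name: mean(abs(rep[i] - sigma) for rep in reps)
            for i, name in enumerate(names)}


print(simulate_mae())
```

With only 200 replications the ranking of the three MAEs can vary from seed to seed, so a single run should be read as indicative rather than as a reproduction of Table 3.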

NON-NORMALITY EFFECTS
A good control chart should be robust when the underlying process follows a non-normal distribution. Burr (1967) and Schilling and Nelson (1976) reported that the $\bar{x}$ chart is robust and useful when processes are not normal. We studied the effects of non-normality of the underlying process to ascertain whether RWAV is again the most accurate estimator. By simulation, we study lognormal and gamma processes for different parameter values and compare the MAEs of RWAV, $\bar{R}/d_2(n)$, and $\bar{S}/c_4(n)$. Table 4 shows the winning estimator (marked as WINNER in the table), i.e., the one of the three with the smallest MAE, for different lognormal processes and sample sizes. The MAEs are calculated from 200 repeated simulations. Again, RWAV is simulated with both equal and unequal sample sizes while keeping $\bar{n} = n$. In reality, the underlying process may vary greatly; hence, none of the estimators is entirely unbiased. The bottom row of the table summarizes the results. From Table 4, we observe that in 81% of the cases where the underlying processes follow a lognormal distribution, RWAV is the most accurate estimator. In only 19% of the cases is $\bar{S}/c_4(n)$ the best. The results for gamma distributions (the data are not reported here) are similar. Therefore, in general, using RWAV yields more accurate control limits than the other estimators. Production processes that are not normal are often skewed and are often approximated by either a lognormal or a gamma distribution. Hence, our results support the use of RWAV. Last, we should note that Does and Schriever (1992) and De Mast and Roes (2004) also attempted similar pooling of variances to obtain the within-sample standard deviation for control charts. Our results are simpler and more straightforward but agree with their general findings.
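A lognormal version of the comparison might look like the sketch below. The parameter values (`mu = 0`, `s = 0.5`), the equal sample sizes, and the seed are arbitrary assumptions and do not correspond to specific cells of Table 4. The true standard deviation of a lognormal variable is taken from the standard formula $\sigma^2 = (e^{s^2} - 1)\,e^{2\mu + s^2}$. Function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev, variance

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}  # tabulated d2(n)


def c4(n):
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def lognormal_sigma(mu, s):
    """True standard deviation of exp(N(mu, s^2))."""
    return math.sqrt((math.exp(s * s) - 1.0) * math.exp(2.0 * mu + s * s))


def mae_lognormal(n=5, m=30, mu=0.0, s=0.5, k=200, seed=7):
    """Estimated MAE of three estimators for a lognormal process."""
    rng = random.Random(seed)
    true_sigma = lognormal_sigma(mu, s)
    errors = {"R_bar/d2": [], "S_bar/c4": [], "RWAV": []}
    for _ in range(k):
        data = [[rng.lognormvariate(mu, s) for _ in range(n)]
                for _ in range(m)]
        r_est = mean(max(x) - min(x) for x in data) / D2[n]
        s_est = mean(stdev(x) for x in data) / c4(n)
        # With equal sizes the (n_i - 1) weights cancel, so RWAV is the
        # square root of the plain average of the sample variances.
        rwav = math.sqrt(mean(variance(x) for x in data))
        errors["R_bar/d2"].append(abs(r_est - true_sigma))
        errors["S_bar/c4"].append(abs(s_est - true_sigma))
        errors["RWAV"].append(abs(rwav - true_sigma))
    return {name: mean(v) for name, v in errors.items()}


print(mae_lognormal())
```

The estimator with the smallest value in the returned dictionary plays the role of the WINNER for that parameter setting; repeating the call over a grid of `s` and `n` values would give a table of the same shape as Table 4.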

CONCLUSION
Estimation of the process standard deviation is important in SPC and is fundamental to the construction of control charts. Our aim was to find simple and straightforward estimates with which to construct the control chart in the best manner. Based on the evidence presented, we recommend the use of RWAV for both equal and unequal sample sizes. Furthermore, we justify the use of the biased RWAV for non-normal underlying processes. Compared with the other estimators, RWAV is an efficient estimator of the within-sample variability of the underlying process, whether normal or non-normal. Consequently, we recommend the use of RWAV to construct Phase I control limits. We feel that employing RWAV in Phase I leads to better construction of quality control charts. In addition, calculating RWAV is no longer a burden, and its use benefits practitioners throughout the quality control profession.
Note that RWAV is not available for Phase II. Because one checks for stability in Phase II, samples are monitored one at a time. Estimators such as $\bar{R}/d_2(n)$ and $\bar{S}/c_4(n)$ are still useful in Phase II. Last, the steps in control chart construction are as follows:
• Phase I: the central line is obtained in the usual manner (Central Line = $\bar{\bar{x}}$), and we construct the control limits for sample size n using RWAV as the estimate of $\sigma$.
• Phase II: we construct the control chart for R, and we also construct the control chart for S.
* WINNER means the estimator with the smallest MAE among the three under the given process conditions. For example, if a cell in the RWAV column is marked WINNER, RWAV possesses a smaller MAE than $\bar{S}/c_4$ and $\bar{R}/d_2$. The MAE of the WINNER is shown in the cell to its left.
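The Phase I step listed above can be sketched as follows. The article's own limit formulas are not reproduced in the text, so this sketch assumes the conventional three-sigma form for an $\bar{x}$ chart, central line $\pm\, 3\,\mathrm{RWAV}/\sqrt{n}$, and takes the central line as the grand mean of all observations. The function names and the data are hypothetical.

```python
import math
from statistics import variance


def rwav(samples) -> float:
    """Root weighted average of variance with weights (n_i - 1)."""
    f = sum(len(s) - 1 for s in samples)
    return math.sqrt(sum((len(s) - 1) * variance(s) for s in samples) / f)


def xbar_limits(samples, n):
    """Assumed three-sigma x-bar limits for a sample of size n.

    Returns (LCL, central line, UCL). The 3 * RWAV / sqrt(n) half-width is
    the conventional Shewhart form, assumed here rather than taken from
    the article.
    """
    total = sum(sum(s) for s in samples)
    count = sum(len(s) for s in samples)
    central = total / count
    half_width = 3.0 * rwav(samples) / math.sqrt(n)
    return central - half_width, central, central + half_width


# Hypothetical Phase I data with unequal sample sizes.
phase1 = [[9.8, 10.1, 10.3, 9.9],
          [10.0, 10.4, 9.7, 10.2, 9.9],
          [10.1, 9.6, 10.0]]
for n in (3, 4, 5):
    print(n, xbar_limits(phase1, n))
```

Computing the limits separately for each n reflects the variable-sample-size setting: a single RWAV estimate is shared, and only the $\sqrt{n}$ divisor changes from sample to sample.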