Testing the accuracy of employee-reported data: An inexpensive alternative approach to traditional methods

Although Information Technology (IT) solutions improve the collection and validation of operational data, Operations Managers must also rely on self-reported data from workers to make decisions. The problem with these data is that they are subject to intentional manipulation, which reduces their suitability for decision-making. Digital analysis, a method of identifying manipulated data, addresses this problem at low cost. In this paper, we demonstrate how the method can be used in real-world companies to validate self-reported data from line workers. The results of our study suggest that digital analysis can estimate the accuracy of employee-reported data in operations management, within limited contexts. These findings can lead to improved operating performance by giving practitioners a tool for excluding inaccurate information.


INTRODUCTION
In 2002, we conducted a quality audit for a manufacturing firm to discover the source of an uncharacteristic increase in the number of customer returns. The firm employs Statistical Process Control (SPC) to track the performance of its production process. After reviewing the SPC charts, we found no patterns that indicated the level of defects observed in the returned goods, which suggested a product design flaw or an error in the specifications that had been transmitted into the equipment setup and charts. However, further investigation indicated that an experienced line worker (machine operator) had fabricated product weight data on the SPC chart, which was discovered only after 10 days' production had been shipped, rejected, and returned by the customer. The operator failed to perform weight checks on production runs and randomly assigned fraudulent weights within the SPC limits in a pattern that appeared to be genuine. As a result, SPC did not detect the product defects. This single incident cost the company an estimated $300,000 (US), of which only $40,000 was recoverable.
This issue presented an interesting problem to managers. Since managers rely on employee-reported data to make decisions, how can they estimate the accuracy of the information without reinstituting the traditional quality control inspection and sampling procedures that they had worked hard to replace? Additionally, since the company's managers prided themselves on trusting employees, how could they ensure data accuracy without instilling a sense of distrust among machine operators? After all, this problem occurred with only one of the 21 operators employed at this facility. If the managers in this example had had an inexpensive tool to validate the data reported by the dishonest operator, the problem could have been identified earlier.
The purpose of this study is to provide operations managers with a tool to validate self-reported data from line workers where the reports are the only source of information, or where secondary sources are difficult or expensive to obtain. The problem with employee-reported data is that individuals have the opportunity to manipulate the results, which reduces the data's suitability for decision-making. Managers who do not have a secondary source for verifying the accuracy of employee-reported data may find decreased performance in activities and processes that rely on this information. Hence, we examined how other disciplines address this issue. We found that financial auditors commonly employ a method called digital analysis to identify suspect data. One purpose of this study is to apply this method in two companies and industries to evaluate its ability to detect fraudulent data in the context of operations. A second purpose is to extend the use of digital analysis to data types not considered in previous studies, i.e. to distributions previously considered inappropriate for digital analysis.

LITERATURE REVIEW
The literature is replete with methods for gathering and evaluating data from manufacturing processes, most notably Statistical Process Control, statistical (acceptance) sampling, and post-process quality control inspection (Deming, 1986). While powerful, these techniques rely heavily on the integrity of the workers collecting the data, which at times proves to be problematic (Hales et al., 2004). To combat fraud in self-reported data, managers rely on duplicate measurements and post-process inspections to validate information. The main problem with these approaches is that they are expensive to execute and run counter to many contemporary business practices of trusting workers and eliminating the waste of duplicate efforts (Deming, 1986). In reviewing how other disciplines detect fraudulent data, we found digital analysis.
Digital analysis refers to a technique for estimating the distributions of certain digits, 0-9, in naturally occurring data (i.e. data that are not intentionally manipulated). The premise of the technique is that naturally occurring data have different digit distributions than manipulated data. One estimates probabilities for these occurrences and then compares them to actual process data. If there are differences between the actual and estimated distributions, then the data are said to have a probability of containing systematic error or statistical bias (Nigrini, 1996a, 1998, 1999, 2000; Nigrini and Mittermaier, 1997). Other applications include detecting fraud in areas such as declaration records (Browne, 1998), and assessing the feasibility of outputs from computer simulations and logistics models (Hill, 1995, 1996, 1998). While the procedure is not applicable in all situations, it does provide an inexpensive alternative to other validation procedures such as statistical sampling or duplicate measurement. In the only Operations application found, Becker (1982) used the method to estimate the degree to which machine failure rate lists, based on Mean-Time-To-Failure calculations, had systematic error, indicating intentional manipulation or defects in measurement processes.
The literature suggests the type of data appropriate for digital analysis. The primary qualifications are that they come from large data sets with preferably large ranges, and that they be generated naturally, without pre-set limits or breakpoints and without intervention. Evidence suggests that the larger the range, the smaller the required data set. The distributions are Weibull-like in shape and hold true for populations and representative samples (Brown, 2005; Nigrini, 1996b; Hill, 1995). These assumptions require that a test for appropriateness be conducted using values known to be generated without intervention. If the observed probability distribution matches that predicted by the analysis, then the data set is considered appropriate for testing. In terms of digits, this means that each digit place in a numerical value has a distinct probability of taking each value: 1-9 for the first place digit, and 0-9 for the second and subsequent place digits.
In investigating the theoretical support for digital analysis, we discovered a principle called Benford's Law (hereafter Law). The phenomenon on which the Law is based was first discovered in 1881 by astronomer Simon Newcomb (Newcomb 1881), and then independently in 1938 by General Electric physicist Frank Benford (Benford 1938), for whom it is named. The Law states that digits of data generated under certain assumptions do not occur randomly, but in distinct patterns. For example, the so-called first digit pattern can be determined using equation 1.1, where $d_1$ is the first significant digit:

$$P(d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right), \quad d_1 = 1, 2, \ldots, 9 \tag{1.1}$$
The distribution predicted by equation 1.1 is summarized in Table 1. This model has been used to accurately predict many empirically-based distributions (Hill, 1996b). Since Benford's discovery of the phenomenon, researchers have found that probabilities can be estimated for digits occurring in the second, third, fourth, and fifth places (Nigrini, 1999). A general formula for finding these probabilities is known as the General Significant-Digit Law (Hill, 1995; Nigrini and Mittermaier, 1997). It is important to note here that systematic error arises from two sources: first, intentional manipulation of data, and second, faulty measurement instruments. In an operations context, discovering the true cause of systematic error before assigning blame is crucial.
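To make the expected distributions concrete, the following is a minimal Python sketch of how the digit probabilities could be computed. It assumes the standard statement of Benford's first-digit rule and obtains the probabilities for later digit places by summing the joint significant-digit probabilities over all possible preceding digits. The function names `digit_probability` and `expected_distribution` are illustrative and do not come from the studies cited here.

```python
import math


def digit_probability(position: int, digit: int) -> float:
    """Benford probability that the digit in `position` (1 = leading) equals `digit`."""
    if position == 1:
        # A leading significant digit can never be zero.
        if not 1 <= digit <= 9:
            return 0.0
        return math.log10(1 + 1 / digit)
    # Sum over every possible prefix of (position - 1) digits
    # whose own leading digit is nonzero.
    lo = 10 ** (position - 2)
    hi = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * prefix + digit)) for prefix in range(lo, hi))


def expected_distribution(position: int) -> dict:
    """Expected probability of each admissible digit in the given place."""
    digits = range(1, 10) if position == 1 else range(0, 10)
    return {d: digit_probability(position, d) for d in digits}


if __name__ == "__main__":
    for position in (1, 2, 3, 4):
        row = ", ".join(f"{d}: {p:.4f}" for d, p in expected_distribution(position).items())
        print(f"place {position} -> {row}")
```

Running the script prints one row of expected probabilities per digit place; each row sums to one, and the rows flatten toward 0.10 per digit as the place increases.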

METHODOLOGY
Digital Analysis involves two steps. The first step is to verify that the distribution of accurate data from a process operating under controlled conditions conforms to the expected distribution derived from equation 1.2 above. This step serves to establish the applicability of the methodology.
Second, the expected distribution is compared to actual data from operations, thus testing for manipulation or fraud. To test the applicability of digital analysis in operations, we chose to examine data gathered at two manufacturing firms. To compare observed and expected distributions we use Chi-square analysis. A similar approach was used by Nigrini (1996), Hill (1996b), and others on the basis that, since data conforming to digital analysis are not normally distributed, the use of distribution-free techniques is appropriate. However, it is important to note that Chi-square tests are only appropriate for data sets with fewer than 10,000 observations, because above this level the calculated statistic almost always exceeds the critical value (Nigrini, 2000). As a prerequisite to the publication of our findings, we agreed to protect the identity of one of the firms. The first company (Company A) is a plastics manufacturer owned, at the time of this study, by Constar Inc., a Fortune 500 company. Company B is a manufacturer of small storage units owned, at the time of this study, by MWI Inc.
The event described in the introduction occurred in Company A. The data in question pertained to the weights of plastic bottles manufactured with extrusion blow molding. As bottle weight stability is a key indicator of process performance, controlling its variation is important to assuring quality products. The company tracked bottle weights using SPC charts (X-bar charts) to monitor process performance. We propose that these charts could be checked for systematic error by using digital analysis. In addition to studying bottle weight data, we examined SPC charts used to control the length stability of wood components at Company B. This served as a benchmark: it allowed us to investigate the applicability of the digital analysis method in an additional manufacturing firm with different products and specifications. Company B manufactures components for, and assembles, small wooden storage buildings. Length stability (consistency of cutting equipment) of these components is a key indicator of process stability.
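The comparison of observed and expected digit distributions can be sketched as a Pearson Chi-square goodness-of-fit computation. The Python example below is illustrative only: it uses the first-digit distribution for brevity, the digit counts are hypothetical rather than taken from either company, and the 5% critical values are the standard table values for 8 and 9 degrees of freedom. The names `chi_square_statistic` and `rejects_conformity` are assumptions made for this sketch.

```python
import math

# Upper-tail 5% Chi-square critical values from standard tables,
# keyed by degrees of freedom (number of digit categories minus one).
CRITICAL_VALUES_05 = {8: 15.507, 9: 16.919}


def benford_first_digit() -> dict:
    """Expected first-digit probabilities, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_square_statistic(observed_counts: dict, expected_probs: dict) -> float:
    """Pearson Chi-square statistic for observed digit counts."""
    n = sum(observed_counts.get(d, 0) for d in expected_probs)
    return sum(
        (observed_counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in expected_probs.items()
    )


def rejects_conformity(observed_counts: dict, expected_probs: dict) -> bool:
    """True when the statistic exceeds the 5% critical value."""
    df = len(expected_probs) - 1
    return chi_square_statistic(observed_counts, expected_probs) > CRITICAL_VALUES_05[df]


expected = benford_first_digit()
# Hypothetical counts: one sample close to the expected pattern, one uniform.
near_expected = {d: round(1000 * p) for d, p in expected.items()}
uniform = {d: 100 for d in range(1, 10)}

print(rejects_conformity(near_expected, expected))  # False
print(rejects_conformity(uniform, expected))        # True
```

A rejection in this sketch marks a data set as suspect; it does not by itself distinguish manipulation from a faulty measuring instrument.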
Once the data were validated, Company A afforded the researchers the opportunity to study the possible benefits of implementing digital analysis in one of the company's production plants. To do so, we conducted a follow-up study to estimate the benefits of discovering systematic bias in bottle weight reports. Since news of the fraudulent operator was public knowledge, we waited six months before conducting the follow-up study. With the concurrence of the General Manager of the company, we conducted the follow-up study as an experiment. To prevent line operators from biasing the collection of baseline data, they were not initially informed of the purpose of our study. To begin the experiment, we examined data from SPC charts that were generated over the previous 10 days. Then, operators were notified that the researchers would be reviewing SPC charts to examine the accuracy of weight check data. After 10 days we compared the "before" and "after" results.

The Empirical Studies in Companies A and B
First, we tested the suitability of digital analysis for the bottle weight data in Company A. We found that the actual distribution of weights did not conform to that predicted by digital analysis. This occurred because the first two or three digits of bottle weights were nominal and dictated by customer specification. Therefore, they could not randomly take on any value between 1-9 and 0-9, respectively. For example, quart bottles are produced so that the average weight is 53.00 grams, with an accepted variation of ± 3.00 grams. This means that the process is designed to produce quart bottles weighing between 50.00 and 56.00 grams, with two decimal place sensitivity. This characteristic traditionally disqualified the data as appropriate for digital analysis because the first digit is always 5 and the second digit can vary only between 0 and 6.
However, we observed an interesting phenomenon in the digits following the decimal place.
While examining a sample of 515 quart bottles, we recognized that the third and fourth digits vary according to the pattern predicted for the third and fourth digits by digital analysis. Next, we randomly examined samples of 523 one-gallon bottles and 576 pint bottles and found a similar phenomenon in the latter digits. The sample sizes were conservatively chosen because the literature offers no useful guidelines for the minimum number of observations needed for digital analysis. Further, we found no existing study with fewer than 100 data points.
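Isolating the third and fourth digits of a recorded weight is the step that makes this kind of test possible. The sketch below shows one way to do it in Python, assuming weights are recorded to two decimal places as in the quart example (so 53.27 grams has the digit string 5327). The helper names and the sample weights are hypothetical.

```python
from collections import Counter


def significant_digits(value: float, decimals: int = 2) -> str:
    """Digits of a measurement recorded to `decimals` places, leading zeros dropped."""
    text = f"{abs(value):.{decimals}f}".replace(".", "")
    return text.lstrip("0")


def digit_at(value: float, position: int, decimals: int = 2):
    """Digit in the given place (1 = leading), or None if the value is too short."""
    digits = significant_digits(value, decimals)
    if position > len(digits):
        return None
    return int(digits[position - 1])


def digit_counts(values, position: int, decimals: int = 2) -> Counter:
    """Tally of the digit found in `position` across all measurements."""
    found = (digit_at(v, position, decimals) for v in values)
    return Counter(d for d in found if d is not None)


# Hypothetical quart-bottle weights in grams, all within 53.00 +/- 3.00.
weights = [53.27, 50.21, 55.90]
print(digit_counts(weights, 3))  # third digits: 2, 2, 9
print(digit_counts(weights, 4))  # fourth digits: 7, 1, 0
```

The resulting tallies are the observed counts that would then be compared with the expected third and fourth digit distributions.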
Based on this exploratory finding, we formulated two hypotheses for quart weights. The results were surprising in that the quart weights had the characteristics predicted by digital analysis for the third and fourth place digits. While the results provide justification for using digital analysis for further testing, they represent only a single test case involving one SKU, in one industry. Therefore, we formulated two additional hypotheses to test gallon-size bottles with weight specification 300 ± 10.00 grams, and found that the third and fourth digits conformed to the expected distributions. Last, we tested pint-size bottles with weight specification 22 ± 2.00 grams. These also conformed to the expected distributions for the third and fourth digits. These data and the statistical results are shown in Table 2. For the component lengths measured at Company B, the resulting data, hypotheses, and results of the analysis are shown in Table 3.

The Experimental Study in Company A
To estimate the benefits that digital analysis could provide to the firm, we conducted a randomized experiment. The experiment involved collecting baseline data on five line operators (chosen at random), then introducing the treatment factor (our visible supervision of weight checks), and collecting follow-up data. As part of the baseline data, we conducted digital analysis for a period of ten days, segregated by operator and bottle type. Since no minimum sample size requirement has been established in the literature, we examined the data in aggregate, by product type. This meant that there were approximately 2200 weight checks examined per bottle-type. This is four times that used in our preliminary study to qualify the bottle weight data for digital analysis.
Out of the five operators, two showed strong bias toward systematic error and one had mixed results. The other two operators did not have Chi-square values that exceeded the critical value for rejection of the null hypotheses. The results were consistent across product types, in the sense that the observed patterns were not different from the expected patterns, but inconsistent across operators. Within bottle types, we detected systematic error only when the data were segregated by operator. This indicated that the systematic error was caused by the line operators and provided strong evidence that two of the five line operators were manipulating data.
We then applied the experimental treatment by notifying the line operators that we would be observing them performing weight checks. Next, we conducted a follow-up analysis of their SPC data sheets. Since the five operators worked only on first shift, we were able to periodically observe their behavior and make our supervision obvious.
During the first day of observation, the two operators with systematic errors were extremely busy, and as such were not able to perform weight checks on time. In fact, two of the production lines were down during some weight checks. When these two operators asked us how they should complete their reports, we instructed them to perform weight checks on startup and note the time discrepancy. While the remaining three operators were also busy, they had time to complete weight checks as scheduled. This same phenomenon was repeated on days two and three.
At the end of day three we met with the General Manager and informed him that two of the line operators did not complete their weight checks during the three days of observation; however, according to their reports, they had completed them during the previous ten days. In addition, the operators asked us how to conduct weight checks when they were busy or a line was down at the time the check was to be conducted, which appeared odd to us. A question arose as to what procedure they, and the other operators, had been following prior to our experiment when production lines were down.
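Segregating the test by operator can be sketched as follows. This Python example is a simplified illustration, not the procedure used in the study: it tests only the third digit, assumes each record is an (operator, weight) pair with the weight recorded to two decimal places and having at least three significant digits, and uses the standard 5% critical value for 9 degrees of freedom. The operator labels and weights are synthetic.

```python
import math
from collections import Counter, defaultdict

CRITICAL_05_DF9 = 16.919  # standard 5% Chi-square critical value, 9 degrees of freedom


def expected_third_digit() -> dict:
    """Benford probabilities for the third significant digit (0-9)."""
    return {
        d: sum(math.log10(1 + 1 / (10 * prefix + d)) for prefix in range(10, 100))
        for d in range(10)
    }


def third_digit(weight: float) -> int:
    """Third significant digit of a weight recorded to two decimal places."""
    return int(f"{weight:.2f}".replace(".", "").lstrip("0")[2])


def chi_square_by_operator(records) -> dict:
    """Chi-square statistic of third-digit counts, computed separately per operator."""
    counts_by_operator = defaultdict(Counter)
    for operator, weight in records:
        counts_by_operator[operator][third_digit(weight)] += 1
    expected = expected_third_digit()
    results = {}
    for operator, counts in counts_by_operator.items():
        n = sum(counts.values())
        results[operator] = sum(
            (counts[d] - n * p) ** 2 / (n * p) for d, p in expected.items()
        )
    return results


# Synthetic records: op_A's third digits are spread evenly,
# op_B always writes the same value.
records = [("op_A", (5000 + i) / 100) for i in range(600)]
records += [("op_B", 53.50)] * 100

results = chi_square_by_operator(records)
flagged = sorted(op for op, stat in results.items() if stat > CRITICAL_05_DF9)
print(flagged)  # ['op_B']
```

Pooling the two synthetic operators would dilute op_B's pattern, which is why the statistic is computed on each operator's records separately.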
After some investigation, the General Manager discovered that line operators were reprimanded by supervisors when SPC charts were incomplete. The standard procedure was to have another operator conduct weight checks if a line operator was tied up. Since down production lines were extremely costly, operators were required to make restarting them a priority over all other responsibilities in their jobs. Another finding was that the two operators in question were responsible for lines which were soon to be overhauled as part of scheduled maintenance (at the end of their maintenance cycle) and thus, required a great deal of what was referred to in Constar Inc. as "hand-holding" or "baby-sitting" to keep them running. In fact, the supervisor informed us that the two lines were well beyond their maintenance cycle, but were kept in service due to unexpected demand spikes. In addition, when questioning the other three operators on why they had not assisted the two busy operators, the supervisor found an apparent personal conflict between them. To avoid airing the personal conflicts to the supervisor and receiving reprimands due to incomplete reports, the two operators were filling in numbers on SPC charts.
While we were not able to identify a priori the cause of the systematic errors in the reports from two of the operators, digital analysis revealed the bias. We thought we would improve the data accuracy by informing the operators that we were checking their work; however, we found solid reasons why the data were being manipulated. This corroborates our earlier remark that digital analysis should not be the sole basis for assigning blame, but instead provides a basis for further investigation. In addition, the one operator who had mixed results before we revealed our intention to supervise the weighing procedure also had mixed results during our observations.
We could find no reasonable cause for the indication of manipulation at some times and not at others. The operator appeared to conduct the weight measurements properly.

RESULTS AND IMPLICATIONS
Based on our findings, the company decided to change the way it conducted SPC analysis.
While disappointed with the way the operators failed to cooperate, the General Manager saw the effects of keeping equipment in service well beyond its maintenance cycle, and saw that the current system relied on perfect cooperation between line operators, which is unreasonable to expect from a human behavior perspective. To correct the problem, the company spent $2,000 (US) per production line on a system that would automatically report bottle weight results to a quality department computer as soon as the operator placed the bottles on a scale. The system would then automatically generate SPC reports. The operator, supervisors, and others could monitor these reports to detect out-of-specification conditions. This system would signal the line operator when to perform the weight checks, and would record the results directly from the scales, saving the operator approximately 45 minutes per day in paperwork. If a weight check was missed, the supervisor was notified. The supervisor would then take corrective action.
Except for occasional incidents, this new system afforded line operators the time to perform proper routine adjustments on machines and weight checks, as long as equipment was not operated outside of its maintenance cycle. Based on managerial cost estimates of past quality problems attributed to similar failures of the system, Constar estimated a savings of $300,000 (US) per month.

CONCLUSIONS AND LIMITATIONS
This study demonstrates how to conduct digital analysis, and suggests potential benefits from its use in operations management. We qualified two types of manufacturing line measures as appropriate for digital analysis and, in doing so, found it could be extended to data types previously considered inappropriate. We found no previous study applying digital analysis in this manner. In addition, we used an experiment to demonstrate how to detect systematic problems and provide cost benefits to an adopting firm.
A significant benefit of this application is that it is relatively easy and inexpensive to use, making it a preferable method even in the presence of other verification options. In addition, it can be performed without the knowledge or assistance of the line workers who generate the data.
This is difficult to accomplish with other techniques. In the example given in the introduction, digital analysis might have caught the fraudulent reports early, thus preventing a loss of customer goodwill and the scrapping of ten days' production. However, following the lead of others, we propose its use as a tool for identifying suspect data, and not as final evidence of fraud or non-systematic error. Lastly, we propose that digital analysis be extended to data containing some degree of nominal values. We suggest that the minimum criterion is that at least one digit place must contain random values that are appropriate for digital analysis.

Limitations of this Study and Digital Analysis
Although promising as an analytical tool, digital analysis does not determine fraud, but it may suggest fraud. For example, Hill (1995) warns that its reliability in detecting accounting fraud is dubious, since it produces a large number of false positives. It is logical to assume that false positives would also occur in an operations context, especially in extreme digits. However, Nigrini et al. (1998, 1999) have ambitiously promoted its use in auditing. They argue that while digital analysis is not deterministic of fraud or manipulation, it provides a solid basis for segregating suspect data (data with a high probability of manipulation) from data with an extremely low probability of manipulation. This process can be used as a tool to focus scarce firm resources. In addition, the identification of suspect data may be important to managers who rely on the information for making decisions.
Another limitation of this study is its examination of distributions of digits in extreme positions, where the patterns begin to approach randomness. This means that the higher the placement of a digit in a numerical value, the closer the distributions become to random patterns, especially at the extremes of fifth digits. There is a paucity of research validating the use of digital analysis at such extreme values with large data sets. This calls into question the veracity of using digital analysis for these extreme values and highlights the need for further empirical testing before generalizations can be made.
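How quickly the expected pattern flattens can be illustrated numerically. The Python sketch below assumes the standard Benford digit probabilities and reports, for the second through fifth digit places, the largest gap between any digit's expected probability and the uniform value of 0.10. The function names are illustrative.

```python
import math


def digit_probability(position: int, digit: int) -> float:
    """Benford probability that the digit in `position` (1 = leading) equals `digit`."""
    if position == 1:
        return math.log10(1 + 1 / digit) if digit >= 1 else 0.0
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * prefix + digit)) for prefix in range(lo, hi))


def max_deviation_from_uniform(position: int) -> float:
    """Largest absolute gap between an expected digit probability and 0.10."""
    return max(abs(digit_probability(position, d) - 0.1) for d in range(10))


deviations = {pos: max_deviation_from_uniform(pos) for pos in range(2, 6)}
for pos, dev in deviations.items():
    print(f"digit place {pos}: max deviation from uniform = {dev:.6f}")
```

The gap shrinks by roughly a factor of ten with each additional place, so a test on later digits has little expected structure to distinguish from evenly spread digits.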
As only 131 articles exist on the topic, its application is still exploratory and requires further testing. This study only examined 12 products, comprising two data types in two industries. A much larger data set is required to verify our extension of the analysis to new data types previously considered inappropriate.