A Comparison of Five Rules for Determining the Number of Components in Complex Patterns

The performance of five methods for determining the number of components to retain (Horn's parallel analysis, Velicer's MAP, Cattell's SCREE, Bartlett's test, and Kaiser's eigenvalue greater than unity) was investigated across seven systematically varied factors (sample size, number of variables, number of components, component saturation, equal or unequal numbers of variables per component, and the presence or absence of unique and complex variables). Five sample correlation matrices were generated at each of two levels of sample size from the 48 known population correlation matrices representing six levels of component pattern complexity. The performance of the parallel analysis and MAP methods was generally the best across all situations. The SCREE test was generally accurate but variable. Bartlett's test was less accurate and more variable than the SCREE test. Kaiser's method tended to severely overestimate the number of components. Recommendations concerning the conditions under which the methods are accurate, and the most effective and useful applications of combinations of methods, are discussed.


Introduction
The representation of a large set of observed variables (P) by a smaller set (m) has been identified as a common problem in the behavioral sciences (Bartlett, 1950, 1951; Van de Geer, 1971). Horst (1965) and Van de Geer (1971) discuss principal component analysis (PCA) as an approach to this problem. Another approach is common factor analysis (CFA). Although both PCA and CFA allow a large set of observed variables to be represented by a smaller set, there is disagreement concerning how to determine the number (m) of components or factors required to construct the smaller set. This study presents the results of a Monte Carlo evaluation of five methods for determining m.
Principal Component Analysis

Hotelling (1933) introduced this widely used procedure (Glass and Taylor, 1966; Kaiser, 1970). The first principal component, Y1, is defined as the weighted combination of the P observed variables which has the greatest sample variance under the constraint that the weight vector is of unit length. Each subsequent principal component Yj is similarly defined as the weighted combination with maximum variance and unit length weight vector that is orthogonal to all previous components.
The principal component solution may be viewed as an eigendecomposition of the P x P sample correlation matrix R, where R = L D^2 L', D^2 is the P x P diagonal matrix containing the eigenroots of R, and L is a P x P matrix which contains the corresponding eigenvectors. When m components are retained, R is approximated by Lm Dm^2 Lm', where Dm^2 contains the first m eigenroots and Lm contains the corresponding first m eigenvectors. Kaiser (1970) reported on the widespread use of PCA in this manner. Velicer (1974, 1976, 1977) and Velicer, Peacock and Jackson (1982) have shown that this use of PCA and CFA results in essentially equivalent solutions.
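To make the decomposition concrete, here is a minimal numerical sketch in Python with numpy. The function name and the toy matrix are illustrative assumptions, not part of the original study.

```python
# A minimal sketch of the eigendecomposition view of PCA described above.
import numpy as np

def pca_eigen(R, m):
    """Return the first m eigenroots and eigenvectors of R, largest roots first."""
    roots, vectors = np.linalg.eigh(R)   # eigh is appropriate: R is symmetric
    order = np.argsort(roots)[::-1]      # reorder from largest to smallest
    return roots[order][:m], vectors[:, order][:, :m]

# A toy 3 x 3 correlation matrix.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
d2, Lm = pca_eigen(R, m=1)
R_approx = Lm @ np.diag(d2) @ Lm.T       # the rank-m approximation Lm Dm^2 Lm'
```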

Factor Analysis
A second class of procedures, called common factor analysis (CFA), has also been employed to express a set of P variables more parsimoniously as a smaller set. The factor analytic model specifies that a P x P correlation (or covariance) matrix may be accounted for by m common and P unique factors. This model may be expressed as

R = AA' + U^2                                   (3)

where A is a P x m pattern matrix and U^2 is the P x P diagonal matrix of weights for the unique factors. U is conceived of as that part of the item score not "explained" by the common factors. It is important to note that m is frequently assumed to be known for the derivation of these procedures. Sometimes the maximum likelihood test is employed to test if the assumed number of factors is correct.
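As a quick illustration of equation (3), the sketch below builds a correlation matrix from a hypothetical pattern matrix; the loadings are invented for the example.

```python
# Illustrative check of the common factor model R = AA' + U^2.
import numpy as np

A = np.array([[0.8], [0.7], [0.6], [0.5]])   # hypothetical P x m pattern matrix
U2 = np.diag(1.0 - (A ** 2).sum(axis=1))     # unique variances on the diagonal
R = A @ A.T + U2                             # implied P x P correlation matrix
assert np.allclose(np.diag(R), 1.0)          # unit diagonal, as the model requires
```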

Selection of Techni ques
Since both CFA and PCA are available as data reduction techniques, it is important to note some differences between them.
The CFA approach requires that m, the dimension of the reduced set of variables, be known prior to the analysis.
The value of m may be determined in one of two general ways. First, some method of determining m may be applied to a PCA solution and the result then used in the factor analysis solution. A second approach uses a maximum likelihood test of significance to test the fit for different values of m. Unfortunately, many of the methods applied to the PCA solution provide different results from each other and from the maximum likelihood approach. Further, Jackson and Chan (1980) have discussed numerous difficulties with the maximum likelihood approach itself. In addition to these difficulties, some doubt has been cast upon the factor analytic model presented in equation (3).
An indeterminacy has been identified in the simultaneous estimation of A and U^2 (Schonemann and Wang, 1972; Steiger and Schonemann, 1979). This indeterminacy is inherent in the CFA procedure. In light of the difficulties associated with the requirement that m be known a priori, the indeterminacy of the factor model, the widespread use of PCA, and the general comparability of results across the two methods, this study focused on the PCA procedure.

Properti es of Ret ained Compo nents
The comparison of methods to determine the number of components to retain requires a description of the qualities desirable in a retained component. A review of the properties of principal components, linked with the goal of data summarization, provides such a description.

Number of substantial loadings. Intuitively, a parsimony application of PCA requires each retained component to contain at least two substantial loadings. Summarizing power is lost unless at least two variables are represented. Algebraic (Anderson and Rubin, 1956) and statistical (Lawley, 1940; Morrison, 1976) examinations of CFA agree that at least three variables are required before the first factor can be identified. Anderson and Rubin (1956) have further demonstrated that each subsequent identifiable factor must contain at least three non-zero loadings. At the sample level, a minimum of three significant loadings is required for factor identification. Since complex loadings satisfy this requirement, it is not necessary that P be greater than or equal to 3m.
Variance accounted for. Principal components analysis proceeds from a correlation matrix, a standardized variance-covariance matrix in which the variance of each original variable is equal to 1.0. The variance of each principal component is equal to the eigenvalue of that component. The sum of all P eigenvalues is equal to P, the number of variables. An eigenvalue of 1.0, therefore, accounts for as much variance as a single variable.
Components with eigenvalues near zero provide no summarizing power. A component with an eigenvalue greater than 1.0 provides more summarizing power than an original variable.
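These variance properties are the basis of the eigenvalue-greater-than-one rule examined later. A minimal sketch (the function name is illustrative):

```python
# Sketch of the K1 logic: the eigenroots of a correlation matrix sum to P,
# so a root above 1.0 summarizes more variance than any single variable.
import numpy as np

def k1_count(R):
    """Number of components with eigenvalues greater than 1.0."""
    roots = np.linalg.eigvalsh(R)
    assert np.isclose(roots.sum(), R.shape[0])   # the eigenroots sum to P
    return int((roots > 1.0).sum())
```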
Component reliability. Kaiser (1960) and Kaiser and Caffrey (1965) addressed the issue of component reliability. Noting that a component must be reliable to be useful, Kaiser (1960) reported that the reliability of a component will always be non-negative when the eigenvalue exceeds 1.0. It has been noted, however, that this approach to reliability includes all P variables regardless of their component loadings.
In applied usage, component scores are usually generated as an unweighted sum of those variables with substantial component loadings. Reliability estimates based only on those items contributing to the component score can be quite high even when the component eigenvalue is below 1.0. Components with three or more substantial loadings and an eigenvalue of 1.0 or greater will be referred to below as major (MJC) components and would probably be of interest to most investigators. Components which have less than three substantial loadings but an eigenvalue of 1.0 or greater, and components which have more than three substantial loadings but an eigenvalue of less than 1.0, may be of interest to some investigators and will be referred to below as minor (MNC) components. Finally, components with both less than three substantial loadings and an eigenvalue less than 1.0 should never be retained. Such components will be referred to below as trivial (TC) components. Table 1 presents these operational definitions of major, minor and trivial (MJC, MNC, TC) components.
Given these three categories of components, the performance of various rules for retaining components may be examined.
Determining the Number of Components

A number of rules have been suggested to determine the appropriate number of components to retain (Bartlett, 1950, 1951; Cattell, 1966; Crawford, 1975; Everett, 1983; Joreskog, 1962; Kaiser, 1960; Revelle and Rocklin, 1979; Veldman, 1974; Velicer, 1976). These rules often do not give the same results (Anderson, Acito and Lee, 1982; Cattell and Vogelman, 1977; Linn, 1968; Zwick and Velicer, 1982). Applied researchers are, therefore, often at a loss as to how to proceed. Conflicting research conclusions can be traced to differing methods of defining the correct number of components.
This section will describe the five methods to be evaluated in this study. The methods are: 1) the Bartlett test; 2) the eigenvalue greater than 1.0 rule; 3) the minimum average partial rule; 4) the scree test; and 5) the parallel analysis method.

Bartlett's test (BART). Following Lawley's (1940, 1941) test for maximum likelihood factor analysis, Bartlett (1950, 1951) suggested a statistical test of the null hypothesis that the remaining P - m eigenvalues are equal. BART appears sensitive to the number of subjects employed. Gorsuch (1975) argued that as the number increases, the tests of significance become more powerful and, therefore, less and less substantial differences between eigenvalues are found to be significant. This can lead to the retention of more components as a function of the number of subjects, other things being equal. In response to this, Horn and Engstrom (1979) have suggested changing the alpha level at different levels of N. It should be recalled, however, that as the sample size increases the estimates of population eigenvalues will become increasingly accurate. This increased accuracy leads to smaller differences between equal eigenvalues. It should be the case that, within reasonable ranges of sample size, this increased accuracy offsets the increased power of the Bartlett test when the population eigenvalues are actually equal. In such cases, Zwick and Velicer (1982) found the BART test to be somewhat more accurate with relatively large samples than with small samples.
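The logic of the test can be sketched as follows. The simple (N - 1) multiplier is an assumption made for illustration; Bartlett (1950, 1951) and later authors give refined correction factors, and the degrees of freedom follow the formula given later in this document.

```python
# Sketch of Bartlett's test for the equality of the last P - m eigenvalues.
# Assumes all remaining eigenroots are positive.
import numpy as np
from scipy.stats import chi2

def bartlett_remaining(roots, m, N, alpha=0.05):
    """Test H0: the last P - m population eigenvalues are equal."""
    rest = np.sort(roots)[::-1][m:]       # the P - m smallest eigenroots
    k = len(rest)
    # V compares the geometric and arithmetic means of the remaining roots.
    log_V = np.log(rest).sum() - k * np.log(rest.mean())
    statistic = -(N - 1) * log_V          # approximately chi square under H0
    df = (k - 1) * (k + 2) / 2            # i.e., (P - m - 1)(P - m + 2)/2
    return statistic, df, chi2.sf(statistic, df) < alpha
```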
Eigenvalue greater than 1.0 (K1). Perhaps the most popular, certainly the most commonly employed, method is to retain the components with eigenvalues greater than 1.0. This method is based on one of three lower bounds discussed by Guttman (1954). Kaiser (1960) argued that for the retention of a component "it is necessary and sufficient that the associated eigenvalue be greater than one" (p. 145) and that "the number of eigenvalues greater than one of the observed correlation matrix led to a number of factors corresponding almost invariably, in a great number of studies, to the number of factors which practicing psychologists were able to interpret" (p. 145). It has been noted that many users follow Kaiser (1960) and employ the K1 rule to determine the number of components rather than as a lower bound as originally presented.
Difficulties associated with this use are noted by Mote (1970) and Humphreys (1964), who argued that rotation of a greater number of components resulted in more meaningful solutions. They imply that the relatively blind use of the K1 rule, therefore, may sometimes lead to the retention of too few components.
A number of researchers (Browne, 1968; Cattell and Jaspers, 1967; Lee and Comrey, 1979; Linn, 1968; Revelle and Rocklin, 1979; Yeomans and Golder, 1982; Zwick and Velicer, 1982), however, have found that the number of components retained by this method often overestimates the known underlying component structure.

The scree test (SCREE). Cattell (1966) suggested that the plot of eigenvalues be examined for the point at which the curve straightens, on the rationale that error factors are "numerous but small.... The substantive factors will be extracted first and the smaller trivial factors will be removed later" (Gorsuch, 1974, p. 152). Cliff (1970) found it to be accurate, particularly if questionable components are included. Cattell and Jaspers (1967) found the test to be correct in 6 of 8 cases, while Cattell and Vogelmann (1977) reported the test to be accurate over 15 systematically differing analyses. Further, Cliff and Hamburger (1967) found more definite breaks with larger (N = 400 vs. N = 100) sample sizes, and Linn (1968) concurred. The scree test has also been found to be most accurate with larger samples and strong components (Zwick and Velicer, 1982).

Methods To Be Included
The correct determination of the number of components has been identified as a crucial step in the data reduction application of PCA. There continues to be general disagreement concerning the best method to accomplish this step. This study compares the performance of five decision methods on simulated data sets incorporating variables expected to influence each method. The K1 method was included because it is so widely used. The MAP method was included because of its unambiguous solution, its relation to "common factor" concepts and its good performance in a recent study. Bartlett's test, the SCREE test and the parallel analysis method were included for the reasons reviewed above.

Method of Data Generation
Studies of the effectiveness of the various decision methods may be categorized into one of two types. Historically, the more common type of study employed real data representing either new work or "classic" data sets. These studies employed some logical criteria concerning the appropriate number of components and compared the performance of the proposed decision method to the logically arrived at value (e.g., Cattell, 1966; Humphreys and Montanelli, 1975; Velicer, 1976). Such studies, in employing an arbitrary logical criterion, may have inaccurately estimated the performance of the decision method in question. More recently, studies of decision rule effectiveness have employed correlation matrices generated from component structures entirely under the control of the investigator (e.g., Anderson, Acito and Lee, 1982; Cattell and Vogelman, 1977; Tucker, Koopman and Linn, 1969; Zwick and Velicer, 1982). These studies have the advantage of a known criterion against which to measure the performance of the decision methods. The issue of generalization to real data sets is an important but separate issue which may be independently addressed in the particular way the data is generated.
This study employed an approach similar to the "middle model" of Tucker, Koopman and Linn (1969).

Procedure
The number of variables (P) to be employed was set at 36 and 72.
These values represent small and moderately large data sets and accommodate constraints imposed by the selection of the number of components to be included. Larger sets of variables have been shown to have a positive impact on MAP, BART (Zwick and Velicer, 1982) and SCREE (Cattell and Vogelman, 1982) and a negative impact on K1 (Zwick and Velicer, 1982).
The sample sizes (N) chosen were selected to reflect common, applied usage. They were set as a function of the number of variables. The lower N was set at twice the number of variables.
The higher N was set at five times the number of variables. The resulting N's were 72 and 180 in the cases including 36 variables.
When 72 variable cases were examined, N's of 144 and 360 were selected. These appear to include a representative range of sample sizes as reported in applied educational and psychological research.
Larger sample sizes have been shown to moderately improve the performance of the MAP, SCREE and K1 methods (Cattell and Vogelman, 1977; Zwick and Velicer, 1982) and to sometimes improve and sometimes weaken the accuracy of the BART method (Gorsuch, 1975; Zwick and Velicer, 1982). As described above, major components (MJC) are defined as those with three or more substantial loadings and an eigenvalue greater than 1.0.

The SCREE test was performed on computer generated plots of the eigenvalues of each of the 480 matrices. These plots were examined by two raters trained (Cattell and Vogelman, 1977) in the SCREE method. The two raters were college graduates who had majored in psychology. Although they were trained in the SCREE procedure, they were uninformed of its purpose. The raters were also naive to the exact purpose of the experiment and had no prior applied experience with the SCREE test.

The effects of the number of variables per component (Tables 6 and 7) and pattern complexity (Tables 8, 9, 10 and 11) will be summarized within each level of P and SAT. Tables 12 and 13 present the proportion of each method's estimates of MJC which deviated a set amount from the population value. This representation of the distribution of the estimates is also presented at each level of P and SAT. The entries in Table 4 represent 60 observations. Tables 4 through 11 follow essentially the same format. A detailed description will, therefore, be given only for Table 4.
Rows 5 and 6 of Table 4 present the mean differences and standard deviations for each method when P was 36, the saturation was .8 and the sample size was 72. All the methods showed improved average estimates of the criterion at this higher level of saturation. It should be noted, however, that the standard deviation of the differences increased for all levels of the BART method and, to a lesser extent, for the K1 rule as well. Rows 7 and 8 of Table 4 present the mean differences and standard deviations for each method when the sample size was increased to 180, P was 36 and the saturation was .8.
Compared to the results in rows 5 and 6, the larger sample size resulted in more accurate (d = 0.0) and consistent (sd = 0.0) estimations by the MAP and PA methods. The performance of the SCREE and K1 methods was not greatly affected.

The three levels of the BART method retained more components at the higher sample size. This led to a larger overestimation at BA and a switch from under- to overestimation at BB and BC. The standard deviations at all three levels of BART appear to have been larger at N = 180 than at N = 72.
The K1 method performed slightly better at the higher sample size at both levels of component saturation. BART retained more components at the higher level of sample size at both levels of component saturation. Table 5 parallels Table 4 with P equal to 72. It summarizes the impact of sample size at both levels of component saturation.
The MAP and PA methods were again minimally influenced by the sample size change at both levels of component saturation.
When the saturation was .5, the SCREE method showed less overestimation at the higher than at the lower sample size. This effect was not apparent when the saturation was .8.

The role of the number of variables per component is presented from a different perspective in Tables 6 and 7. MAP, PA and SCREE showed essentially no improvement at the higher level of P/MJC. The K1 and BART methods showed some improvement at the higher level of P/MJC at both levels of saturation. The range of pattern complexity affected the methods differently.
Although the methods tended to perform best at Complexity level 1, they had different worst cases. When the saturation was .5, in Table 8, the worst cases were: MAP and PA at level 5; SCREE at level 2; K1 and BART at level 4. A comparison of Tables 8 and 9 shows that the methods were generally more accurate at the higher level of saturation. Tables 10 and 11 parallel Tables 8 and 9 with P equal to 72. As was the case when P was 36, the range of complexity appears to have differentially affected the methods' performance. At a saturation of .5, in Table 10, MAP was quite accurate at levels 1, 2, 4 and 5 but underestimated erratically at levels 3 and 6. The PA method's worst cases fell at levels 2 and 4 when the saturation was .5 and at level 4 when the saturation was .8. The K1 method gave gross overestimates at all levels of complexity when the saturation was .5. It was quite accurate when the saturation was .8 at levels 1, 2 and 3. At the same saturation at levels 4, 5 and 6 the method consistently overestimated the criterion. The BART method showed a moderate range of underestimation when the saturation was .5, with the worst case appearing to be level 6. When the saturation was .8, BART performed well at levels 1 and 3, overestimated moderately at level 2 and overestimated greatly at levels 4, 5 and 6.
A general overview of the performance of the different methods may be gained by calculating the percent of times each method's estimate deviated a set amount from the criterion. Since P and saturation appear to have had the most substantial impact on the methods, the percentages were computed at each level of these variables. Deviations of greater than three were collapsed because of the attenuated range on these tables. The BART method retained more components when P was 72 than when P was 36. BART was more often accurate when the saturation was .5 than .8 when P was 72.

Discussion
The question of interest in this study was the ability of five decision methods to estimate the number of major components present in the population correlation matrices given only the generated sample matrices.
The difference between the estimated number and the defined number of major components served as the primary dependent variable in this simulation study. The standard deviation of the difference scores gave further information about each method's consistency.
Finally, the percent of correct decisions and the distribution of the deviations provided an overview of each method's performance. Major components (MJC) were defined as those having three or more substantial loadings and an eigenvalue greater than or equal to 1.0 at the population level.
Two types of minor components (MNC) were defined. It is felt that this set of patterns expands upon the formal model and incorporates cases likely to be encountered in real data analyses.
The K1 rule was found to consistently overestimate the number of major components. It never underestimated. This finding is consistent with those of Cattell and Jaspers (1967), Linn (1968), Yeomans and Golder (1982) and Zwick and Velicer (1982). At a component saturation of .5, the number retained often fell in the 1/3 to 1/2 of P range discussed above. As the number of variables increased, so did the number of components retained.
K1 retained more components when unique variables were included in the population pattern. These findings are clearly contrary to those of Humphreys (1964) and Mote (1970). The K1 rule certainly has not been supported as the best automatic choice, despite its use as the default in a number of currently available statistical packages.

The BART method's performance was the most variable of those examined. In addition to variability, the method was sensitive to a number of influences. Increases in N, P and SAT, as well as the use of conservative alpha levels and the presence of unique variables, all led to the retention of more components. The first four of these influences may be seen as affecting the statistical power of the Bartlett test. In data sets where the P - m eigenvalues were in fact equal at the population level, Zwick and Velicer (1982) found the method to be moderately accurate. In the broader range of complexity examined here, the test tended to retain both types of minor components defined above. Although examination of different alpha levels led to fewer or greater numbers of components retained, the accuracy and consistency of the method did not appear to be markedly improved by adjusting alpha levels to sample size (see Table 4).
Other factors present in this study appear to have had greater influence on the performance of BART, across alpha levels, than did sample size alone.
The Bartlett test is accurate in answering statistical questions concerning the equality of eigenvalues (Bartlett, 1950, 1951).
Researchers inclined to examine minor components, particularly early in the course of exploratory analysis, may find the method helpful.
However, the Bartlett test cannot be recommended as a general method of determining the number of major components to retain.

The SCREE method had moderate overall reliability when the mean of two trained raters was used. The correlation of the mean of those raters' decisions with an expert judge indicated fair overall agreement. Reports of rater reliability on the SCREE have ranged from very good (Cattell and Jaspers, 1967) to quite poor (Crawford and Koopman, 1979). This range may reflect either the training or the task complexity across research projects. The raters in this study appear to have shown greater agreement at higher than at lower component saturation levels.
They also appear to have shown greater agreement when there were more rather than fewer variables. Perhaps more importantly, the interrater reliability of the SCREE procedure had a fairly wide range across levels of complexity and saturation.
The moderate reliability of the SCREE method is very problematic for the applied researcher. Unreliability at this point in the analysis may well expose a study to otherwise avoidable experimenter bias. In any case, applied researchers should note that reliability questions always arise in any use of the SCREE method.
In general the SCREE method was more accurate and less variable than either the Kl or BART method. The method was more accurate and less variable at the higher level of component saturation.
Larger sample sizes also improved its accuracy when P was 72 and SAT was .5. Sample size did not appreciably affect SCREE at other levels of P or SAT. This effect of larger sample size is consistent with those reported elsewhere (Cliff and Pennell, 1967; Linn, 1968; Zwick and Velicer, 1982).

A number of methods have been suggested to determine the appropriate number of components to retain (Bartlett, 1950, 1951; Cattell, 1966; Crawford, 1975; Everett, 1983; Joreskog, 1962; Kaiser, 1960; Revelle and Rocklin, 1979; Veldman, 1974; Velicer, 1976). These methods often do not give the same results (Anderson, Acito and Lee, 1982; Cattell and Vogelman, 1977; Hakstian, Rogers and Cattell, 1982; Linn, 1968; Zwick and Velicer, 1982). The methods may be grouped into statistical, non-trivial contribution, and rotational categories (Crawford, 1975; Revelle and Rocklin, 1979; Veldman, 1974). Methods within each of these categories have something to recommend them and will be examined in turn.

Statistical Methods
Bartlett's test. Following Lawley (1940), Bartlett (1950, 1951) suggested a statistical test of the null hypothesis that the remaining eigenvalues are equal. Guttman (1954) has described why such a test might be useful. The sum of the eigenvalues must equal P. The average eigenvalue will, therefore, be 1.0. Either each eigenvalue must be equal to 1.0 or at least one will be greater, and one less, than 1.0. The distribution of the P eigenvalues about the mean is asymmetric, such that if a few are somewhat larger than unity, then a greater number will be somewhat less than unity. Tests of the essential equality of the remaining P - m eigenvalues may therefore be used to decide how many components to retain. The test statistic V, which is proportional to N, is distributed approximately as chi square with degrees of freedom equal to (P - m - 1)(P - m + 2)/2. Bartlett's test appears sensitive to the number of subjects employed. Gorsuch (1975) argued that as the number increases, the tests of significance become more powerful and, therefore, less and less substantial differences between sample eigenvalues are found to be significant. This can lead to the retention of more components as a function of the number of subjects, other things being equal. In response to this, Horn and Engstrom (1979) have suggested changing the alpha level at different levels of N.

Eigenvalue greater than 1.0 (K1). Perhaps the most popular, certainly the most commonly employed, method is to retain the components with eigenvalues greater than 1.0. Many users follow Kaiser (1960) and employ the K1 rule to determine the number of components rather than as a lower bound as originally presented. Difficulties associated with this use are noted by Mote (1970) and Humphreys (1964), who argued that rotation of a greater number of components resulted in more meaningful solutions. They imply that the relatively blind use of the K1 rule, therefore, may sometimes lead to the retention of too few components.
A number of researchers (Browne, 1968; Cattell and Jaspers, 1967; Lee and Comrey, 1979; Linn, 1968; Revelle and Rocklin, 1979; Yeomans and Golder, 1982; Zwick and Velicer, 1982), however, have found that the number of components retained by this method often overestimates the known underlying component structure. Gorsuch (1974) and Kaiser (1960) report that the number of components retained by K1 is commonly between one third and one fifth or one sixth the number of variables included in the correlation matrix.
This relationship of retained components to the number of variables is detrimental to the accurate estimation of the underlying component structure. The K1 method, although commonly used, is believed by some critics to sometimes underestimate and by many others to sometimes grossly overestimate the number of components, the latter particularly when a large number of variables (e.g., P greater than 50) is involved.
The Minimum Average Partial (MAP). Velicer (1976) has suggested a method based on the matrix of partial correlations. The average of the squared partial correlations is calculated after each of the m components has been partialed out. The minimum average squared partial correlation indicates the stopping point for this method. That is, when the average squared partial correlation reaches a minimum, the number of components partialed out is the number of components to be retained. Velicer (1976) demonstrated that the average of squared partials will continue to decrease until the residual matrix most closely resembles an identity matrix. After that point, the average squared partial will increase. Using this rule, two or more variables would be expected to have high loadings on each retained component. Velicer (1976) points out that the method is exact, can be applied with any covariance matrix, and is logically related to the concept of factors as representing more than one variable. In a recent study (Zwick and Velicer, 1982), the MAP rule was found to be more accurate in identifying a known number of components than the K1 or BART rule.
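The rule is straightforward to sketch. The version below follows the verbal description above (partial out the first m principal components, average the squared off-diagonal partial correlations, and take the minimizing m); it is an illustration, not a reproduction of Velicer's (1976) program.

```python
# Sketch of Velicer's MAP rule as described above.
import numpy as np

def map_rule(R):
    P = R.shape[0]
    roots, vectors = np.linalg.eigh(R)
    order = np.argsort(roots)[::-1]
    loadings = vectors[:, order] * np.sqrt(np.clip(roots[order], 0.0, None))
    off_diag = ~np.eye(P, dtype=bool)
    averages = []
    for m in range(P - 1):               # m = 0: nothing partialed out yet
        A = loadings[:, :m]
        C = R - A @ A.T                  # residual covariance after m components
        d = np.sqrt(np.diag(C))
        partials = C / np.outer(d, d)    # rescale residuals to correlations
        averages.append((partials[off_diag] ** 2).mean())
    return int(np.argmin(averages))      # number of components to retain
```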
Non-trivial Contribution Methods

The scree test (SCREE). Cattell (1966) suggested that the plot of eigenvalues be examined for the point at which the curve straightens, on the rationale that error factors are "numerous but small.... The substantive factors will be extracted first and the smaller trivial factors will be removed later" (Gorsuch, 1974, p. 152). The scree test has been most effective when strong components are present with little confounding due to error or unique factors. Tucker, Koopman and Linn (1969) found the scree test to be correct in 12 of 18 cases. Cliff (1970) found it to be accurate, particularly if questionable components are included. Cattell and Jaspers (1967) found the test to be correct in 6 of 8 cases, while Cattell and Vogelmann (1977) reported the test to be accurate over 15 systematically differing analyses. Further, Cliff and Hamburger (1967) found more definite breaks with larger (N = 400 vs. N = 100) sample sizes and Linn (1968) concurred in this conclusion. Zwick and Velicer (1982) found the scree test to be most accurate with larger samples and strong components. They found the scree test to be the most accurate of four methods for determining the number of components to retain across many examples of matrices of known, non-complex, structure.
Use of the scree test always involves issues of interrater reliability. Cattell and Vogelmann (1977) and Zwick and Velicer (1981) have reported good interrater reliability among naive and among expert judges. Crawford and Koopman (1979) have reported extremely low interrater reliabilities, however.

Parallel analysis (PA). A third non-trivial contribution method based upon an examination of the eigenvalues has been suggested. Parallel analysis involves a comparison of the obtained, real data eigenvalues with the eigenvalues of a correlation matrix of the same rank, based upon the same number of observations, but containing only randomly associated variables. This method is an adaptation of the K1 rule. Guttman's (1954) development of upper and lower bounds was based upon population values. Horn (1965) noted that, at the population level, the eigenvalues of a correlation matrix of randomly associated variables would all be 1.0. When samples are generated based upon such a matrix, however, the initial eigenvalues exceed 1.0 while the final eigenvalues are below 1.0.
The smaller the sample size, the more the initial eigenvalues exceed 1.0. Horn (1965) suggested that the eigenvalues of a correlation matrix of P randomly associated variables be contrasted with those of the data set in question, based on the same sample size. Components of the matrix of interest which have eigenvalues greater than those of the comparison random matrix would be retained. This approach integrates the reliability and data summarizing emphases of the population based K1 rule without ignoring the effect of sample size. Horn presented one example of PA in a PCA problem. He recommended that the comparison eigenvalues be based upon a number of generated random matrices to avoid major sampling errors in the estimates of the eigenvalues. Although there has been no published systematic examination of the PA method with PCA, Richman (personal communication, October 14, 1983) reported a series of simulation studies with the method. He found PA to be very accurate when applied to correlation matrices conforming to the formal factor analytic model. He further reported that PA led to retention of too many components when applied to correlation matrices conforming to the middle model described by Tucker, Koopman and Linn (1969). The method was more accurate in both cases at larger (N = 500) than at smaller (N = 100) sample sizes. Following Montanelli and Humphreys' (1976) general rationale, but incorporating Bartlett's (1950) presentation concerning degrees of freedom, a regression equation predicting the comparison eigenvalues was suggested as a starting point to develop a useful prediction equation for PCA.
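Horn's comparison is easy to simulate directly. The sketch below averages the eigenvalues of a number of random-data correlation matrices, as he recommended; the number of random matrices and the use of the mean (rather than, say, a percentile) are assumptions of the illustration.

```python
# Sketch of parallel analysis: retain components whose eigenvalues exceed
# those of correlation matrices built from random data of the same N and P.
import numpy as np

def parallel_analysis(R, N, n_random=100, seed=0):
    P = R.shape[0]
    rng = np.random.default_rng(seed)
    random_roots = np.empty((n_random, P))
    for i in range(n_random):
        X = rng.standard_normal((N, P))  # randomly associated variables
        C = np.corrcoef(X, rowvar=False)
        random_roots[i] = np.sort(np.linalg.eigvalsh(C))[::-1]
    threshold = random_roots.mean(axis=0)
    observed = np.sort(np.linalg.eigvalsh(R))[::-1]
    keep = 0
    for lam, bar in zip(observed, threshold):
        if lam <= bar:                   # stop at the first root that falls
            break                        # below its random counterpart
        keep += 1
    return keep
```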

Rotational Methods
A recent approach to the problem of determining the number of components to retain focuses upon the pattern of loadings which results from the rotation of differing numbers of components. Veldman (1974) and Crawford (1975) emphasized the goal of simple structure in determining the number of components to retain. They suggested that the rotational criterion used to select the best solution within the set of those available at a given value of m could also serve as a criterion to select the best solution from those available at different levels of m. That is, they suggest one should compare the rotated solutions of different numbers of components to find the one which best fits some mathematical definition of simple structure. Veldman (1974) emphasized the orthogonal varimax criterion as his choice, while Crawford (1975) presented a more general criterion adaptable to both orthogonal and oblique rotations. Revelle and Rocklin (1979) extended this approach to include the practice of using unweighted component scores and the concept of a minimized residual matrix.
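A sketch of this idea appears below: rotate the first m component loadings with varimax for each candidate m and record the simple-structure criterion. How criterion values should be compared across different values of m is exactly the judgment the cited authors treat differently, so the sketch only reports the raw values.

```python
# Sketch of the rotational approach: compute the varimax simple-structure
# criterion for rotated loadings at each candidate value of m.
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a P x m loading matrix (SVD algorithm)."""
    P, m = A.shape
    T, score = np.eye(m), 0.0
    for _ in range(max_iter):
        L = A @ T
        B = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / P)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        if s.sum() - score < tol:
            break
        score = s.sum()
    return A @ T

def criterion_by_m(R, max_m):
    """Varimax criterion (summed variance of squared loadings) for each m."""
    roots, vectors = np.linalg.eigh(R)
    order = np.argsort(roots)[::-1]
    loadings = vectors[:, order] * np.sqrt(np.clip(roots[order], 0.0, None))
    return {m: (varimax(loadings[:, :m]) ** 2).var(axis=0).sum()
            for m in range(2, max_m + 1)}
```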
Each of the rotational approaches shares the advantage that the retained number of components will provide a best approximation of simple structure. These approaches emphasize the overall component pattern rather than the properties of any one component. Anderson and Lee (1982) found the varimax criterion to be useful in image analysis approaches but not in PCA. Detailed results are presented in Tables B-17 through B-26.

The proportion of each decision method's estimates which deviated set amounts from the known population value is presented in Tables B-27 (P = 36) and B-28 (P = 72). These proportions provide a summary overview of the accuracy and variability of each decision method's performance.
Tables B-1 through B-26 follow the same general format. A detailed description is therefore given only for the first table. The impact of the various conditions upon the performance of each method is presented in detail in Tables B-1 through B-16. The most useful understanding of the impact of the conditions may be gleaned by noting the two worst cases for each method. These cases will be presented in order from the most accurate method to the least accurate.
Overall, the PA method was the most accurate. It showed its largest mean deviation (d = 1.00) from the criterion when P = 36, N = 180, SAT = .5, there were six variables per component, and complex variables, unique variables and equal numbers of variables per component were present. The largest overestimation occurred with complex variables absent, while the second largest occurred when they were present.

The resultant population R matrix (R = R* + U^2) would equal the R* matrix with 1.0's in the diagonal. Montanelli's (1975) program was then employed to generate five sample correlation matrices, at each of two levels of N, from the population matrix.
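For readers who wish to replicate the sampling step without Montanelli's (1975) program, a multivariate-normal approach such as the following would be a reasonable stand-in; it is an assumption of this sketch, not the procedure actually used.

```python
# Sketch of drawing sample correlation matrices from a known population R.
import numpy as np

def sample_correlations(R_pop, N, n_samples=5, seed=0):
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(R_pop)     # requires R_pop to be positive definite
    for _ in range(n_samples):
        X = rng.standard_normal((N, R_pop.shape[0])) @ chol.T
        yield np.corrcoef(X, rowvar=False)
```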