The Approach to Ergodicity in Monte Carlo Simulations

The approach to the ergodic limit in Monte Carlo simulations is studied using both analytic and numerical methods. With the help of a stochastic model, a metric is defined that enables the examination of a simulation in both the ergodic and non-ergodic regimes. In the non-ergodic regime, the model predicts analytically how the simulation is expected to approach ergodic behavior, and the inferred decay law of the metric allows the onset of ergodic behavior to be monitored. The metric is related to previously defined measures developed for molecular dynamics simulations, and it enables the comparison of the relative efficiencies of different Monte Carlo schemes. Applications to Lennard-Jones 13-particle clusters are shown to match the model for Metropolis, J-walking and parallel tempering based approaches. The relative efficiencies of these three Monte Carlo approaches are compared, and the decay law is shown to be useful in determining needed high temperature parameters in parallel tempering and J-walking studies of atomic clusters.


I. INTRODUCTION
A goal of Monte Carlo (MC) simulations in statistical mechanics [1] is the calculation of ensemble mean values of thermodynamic quantities. Ensemble mean values are multidimensional integrals over configuration space,

⟨U⟩ = ∫ dx U(x) P(x),

where P(x) is the probability of finding the system in the state defined by x, and the functional form of P(x) depends on the ensemble investigated. MC simulations usually generate a sampling of configuration space {x_k}_{k=1}^K by the use of a stochastic process with stationary probability P(x). The quantity U evaluated at x_k is the output of the simulation, U(x_k) = U_k, and its arithmetic mean value Ū must, in principle, approach the ensemble mean value. [1] In this paper we refer to the set of configurations generated in a Monte Carlo simulation as a time sequence, and we study the behavior of these temporal sequences {U_k} and their arithmetic means to understand better how MC simulations approach ergodic behavior. It is important to emphasize that there are two time variables to consider. The time variable k labels the separate configurations generated in a Monte Carlo walk; variations of properties with k provide information about the short-time behavior of a MC simulation. The time variable K labels the total length of the MC walk, and variations of computed properties with K provide information about the convergence of the simulation on a long time scale.
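The estimation of an ensemble mean by the arithmetic mean over a stochastic walk can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: a one-dimensional Metropolis walk with a hypothetical target density P(x) ∝ exp(−x²/2), for which ⟨x²⟩ = 1.

```python
import math
import random

def metropolis_mean(u, log_p, x0=0.0, n_steps=200_000, step=1.0, seed=0):
    """Estimate the ensemble mean <U> = ∫ dx U(x) P(x) by the arithmetic
    mean of U over a Metropolis walk with stationary density P(x).
    `log_p` is log P(x) up to an additive constant."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    total = 0.0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)       # trial move
        lp_new = log_p(x_new)
        # Metropolis acceptance test on the probability ratio
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = x_new, lp_new
        total += u(x)                              # accumulate U(x_k) = U_k
    return total / n_steps                         # arithmetic mean over the walk

# P(x) ∝ exp(-x^2/2): the ensemble mean <x^2> is exactly 1.
mean_x2 = metropolis_mean(lambda x: x * x, lambda x: -0.5 * x * x)
```

The two time variables of the text appear here as the step index k inside the loop and the total walk length `n_steps` (the K of the text).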
Given an infinite time, the stochastic walker in a MC simulation visits every allowed point in configuration space. [2] Ergodic behavior is reached when the length of the walk is sufficiently long to sample configuration space appropriately. [3] In practice, this does not mean that the space has been densely covered but that every region with non-negligible probability has been reached. In such a case we can say that the simulation is effectively ergodic or that it has reached the ergodic limit.
For a finite walk, in the event of broken ergodicity [4], phase space is effectively disconnected. The different disconnected regions (called components) are separated by barriers of zero effective probability. If a stochastic walker starts its walk in one of these regions, it may not cross the barriers within the time of the simulation. If the simulation length is increased, some barriers may become accessible for the walker and phase space is better sampled. We can conclude that a time τ exists such that, for simulation lengths shorter than τ , the walker becomes trapped in one of the phase space components. For simulation lengths much larger than τ , phase space is effectively covered by the walker.
In this study we imagine a system having more than one time scale, τ_1 ≪ τ_2 ≪ ⋯ ≪ τ_Λ. In a Monte Carlo simulation each scale comes from stochastic processes with different correlation times. [5] A precise definition of the correlation times for Monte Carlo processes is given in Section III, but for the moment we can think of these correlation times as identical to physical time scales of the system under study. To understand these time scales more fully, it is useful to focus on an example. Prototypical of systems having such disparate time scales are atomic and molecular clusters. Typical cluster potential surfaces have many local minima separated by significant energy barriers. [6-8] The local minima can be grouped into basins of similar energies, with each basin separated from other basins again by energy barriers. At short Monte Carlo times a cluster system executes small amplitude oscillations about one of its potential minima. We can think of these vibrational time scales as the shortest time scales that define a cluster system. As the simulation time is extended the system eventually hops between different local minima within the same basin. The time scale for the first hops between local minima can be considered the next shortest time scale for the simulation. At still longer Monte Carlo times, the system hops between different energy basins defining yet another time scale for the simulation. This grouping of time scales continues until the longest time scale for a given system is reached. At Monte Carlo times that are long compared to this longest time scale, the simulation is ergodic.
Consider a system with several time scales as mentioned above. If the length of the simulation is smaller than the smallest correlation time, the walker may become trapped in an effectively disconnected region, and the sampling of phase space is incomplete. As the time is increased, the memory of the initial condition decreases as the walker crosses into previously unreachable regions. These oscillations and hoppings can be modeled by a superposition of stochastic processes with different correlation times. Processes with non-zero correlation times are known as colored noise processes (as opposed to zero-correlation-time white noise processes). [5] From the study of the autocorrelation functions of a stochastic model defined using these colored noise processes, we can verify that, at a fixed run length K, there exist two different groups of processes: those that contribute to the autocorrelation function with terms that decay like 1/k (called diffusive processes), and those that contribute with terms that decay more slowly than 1/k (called non-diffusive processes). When the time of the simulation is increased, some processes that were non-diffusive at shorter run lengths start to contribute to the autocorrelation function like diffusive processes. After the walk length reaches the largest correlation time τ_Λ, all processes contribute to the autocorrelation function with terms that decay like 1/k. At this point, the simulation is in the diffusive regime and effective ergodicity has been reached. A principal goal of this work is to investigate the way in which the MC output {U_k} reaches the diffusive limit (i.e. the ergodic limit) by studying the properties of autocorrelation functions under changes of time scale, K → bK with b > 1. By time scaling it is possible to infer the decay law of the non-diffusive contributions with respect to the total simulation time K.
The functional dependence of the non-diffusive contributions on the parameter b used to scale K is determined empirically. We have found the decay law so determined to be a particularly valuable method of deciding when a simulation can be considered ergodic. Unlike previous studies [3,9-11] that have investigated only the behavior of certain autocorrelation functions in the ergodic regime, by focusing on the approach to ergodic behavior we obtain a more careful monitor of the onset of ergodicity. Once the non-diffusive contributions have decayed to a point where they are too small to be distinguished from zero to within the fluctuations of the calculation, we can say that the ergodic limit has been reached.
The autocorrelation functions we use to measure the approach to the ergodic limit are based on one of the probes of ergodicity developed by Thirumalai and coworkers, [3,9-11] often called the energy metric. The energy metric has been proposed as an alternative to other techniques [3] (like the study of Lyapunov exponents [12]) for the study of ergodic properties in molecular dynamics (MD) simulations. The metric has been used to study the relative efficiency of MC simulation methods as well. [13] The MC metric as used in the current work can easily be extended from the energy to other scalar observables of the system.
We present two key issues in this paper. First, from the knowledge of the decay law of the non-diffusive contributions to the MC metric, we infer how long a simulation must be to be considered effectively ergodic. Second, once the ergodic limit is reached, we can compare the results from different numerical algorithms to measure relative efficiencies. Because the outcomes of MC simulations are noisy, we have found it useful to separate diffusive and non-diffusive terms in the MC metric with a Fourier analysis so that we can neglect the high frequency components of the noise. This technique has given reproducible results.
To test the match between the stochastic model and actual Monte Carlo simulations, we examine the approach to ergodic behavior in simulations of Lennard-Jones clusters. Recently [14,15] we have studied the thermodynamic properties of Lennard-Jones clusters as a function of temperature using both J-walking [16] and parallel tempering methods. [17-19] Both simulation techniques require an initial high temperature that must be ergodic when Metropolis Monte Carlo methods [20] are used. If the Metropolis method does not give ergodic results at the initial high temperature, systematic errors propagate to the lower temperatures in J-walking and parallel tempering simulations, and the results can be flawed or meaningless. In most Monte Carlo simulations of clusters at finite temperatures, [21,22] the clusters are defined by enclosing the atoms within a constraining potential about the center of mass of the system. The constraining potential is necessary because clusters at finite temperatures have finite vapor pressures, and the association of any one atom with the cluster can be ill-defined. From experience [14,15,23] we have found that if the radius of the constraining potential and the initial high temperature are not both carefully chosen, it can be difficult to attain ergodicity with Metropolis methods. A key concern then is the choice of constraining radius and the choice of initial temperature. We verify the stochastic model by investigating Monte Carlo simulation results as a function of the temperature and the size of the constraining potential.
The contents of the remainder of this paper are as follows. In Section II we motivate the studies that follow by examining numerically the behavior of a set of Monte Carlo simulations of a 13-particle Lennard-Jones cluster. This cluster system is used to illustrate the results throughout this paper. In Section III we introduce the stochastic model based on a continuous time sequence. In Section IV we extend the model to discrete time sequences characteristic of actual Monte Carlo simulations. In Section V we test the discrete stochastic model with applications to Lennard-Jones clusters, and in Section VI we summarize our conclusions. Many of the key derivations needed for the developments are found in two appendices.

II. AN EXAMPLE CALCULATION
Before discussing the major developments of this work, it is useful to understand the nature of the problem we are attempting to solve by examining some numerical results on a prototypical system. We take the 13-particle Lennard-Jones cluster defined by the potential function

V = 4ε Σ_{i<j}^{N} [(σ/r_ij)^12 − (σ/r_ij)^6] + V_C,

where ε and σ are the standard Lennard-Jones energy and length parameters, N is the number of particles in the cluster (13 in the present case), r_ij is the distance between particles i and j, and V_C is the constraining potential discussed in Sec. I, which confines the particles to a sphere of radius R_c centered at X_c, the coordinate of the center of mass of the cluster. The 13-particle Lennard-Jones cluster has a complex potential surface with many minima separated by significant energy barriers, [6-8] and ergodicity problems associated with the simulation of properties of this system are well-known. [16] We now consider a Metropolis MC simulation of the average potential energy of the system in the canonical ensemble at temperature k_B T/ε = 0.393 (k_B is the Boltzmann constant). This average potential energy,

V̄_k = (1/k) Σ_{j=1}^{k} V_j,

is displayed in the upper panel of Fig. 1 as a function of the walk length k for 20 independent simulations, each initialized from a random configuration. Over the maximum time scale K of the walks, it is apparent that the potential energy averaged over each independent walk has not converged to the same result. Such unreproducible behavior is symptomatic of a simulation not yet at the ergodic limit. For an ergodic simulation, the running averages displayed in Fig. 1 must approach the same value for each walker. Using related ideas developed elsewhere, [3,9,10] the extent to which the walks approach the same limit can be measured in terms of a metric d_k defined by

d_k = (1/M) Σ_{i=1}^{M} [V̄_k^(i) − ⟨V̄_k⟩]².   (6)

In Eq. (6) M represents the number of independent walks, V̄_k^(i) is the average potential energy computed in walk i at MC time k, and ⟨V̄_k⟩ is the mean of the V̄_k^(i) over the M walks. The metric measures the energy fluctuations in the walk as a function of the walk length.
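The cluster energy just described can be evaluated with a short routine. The sketch below uses a steep r^20 constraining term about the center of mass; that particular functional form of V_C is a common choice but an assumption here, not necessarily the form used in this work.

```python
import numpy as np

EPS, SIGMA = 1.0, 1.0   # Lennard-Jones energy and length parameters (reduced units)

def lj_energy(coords, r_c=2.0 * SIGMA):
    """Total potential energy of a cluster: pairwise Lennard-Jones terms
    plus a constraining potential about the center of mass.
    The (|x_i - X_c|/R_c)^20 constraint form is an illustrative assumption."""
    n = len(coords)
    v = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])   # pair distance r_ij
            s6 = (SIGMA / r) ** 6
            v += 4.0 * EPS * (s6 * s6 - s6)             # 4ε[(σ/r)^12 - (σ/r)^6]
    x_cm = coords.mean(axis=0)                          # X_c (equal masses)
    v += EPS * np.sum((np.linalg.norm(coords - x_cm, axis=1) / r_c) ** 20)
    return v
```

As a check, a dimer at the pair-potential minimum r = 2^(1/6) σ has energy close to −ε, since the constraint term is negligible well inside R_c.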
For an ergodic simulation, the metric must decay to zero. For the 20 simulations of the 13-particle Lennard-Jones cluster, the metric as a function of k is plotted in the lower panel of Fig. 1. Rather than asymptotically approaching zero, over the short length of the walk displayed here d_k has decayed to a constant, and as discussed later in this paper, over the time scale of this simulation d_k can be qualitatively represented by the function

d_k ≃ A_K/k + B_K,

where A_K and B_K are coefficients that depend on the total walk length K. As K is increased to a time where the walk is ergodic, B_K must decay to zero. Major goals of this work are to understand how B_K decays and to use the discovered decay law to determine the onset of ergodic behavior. Our approach is to introduce first a continuous stochastic model of a simulation, followed by a discrete model more clearly linked to actual MC studies.
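The metric and its two-parameter representation can be computed directly from a set of independent energy sequences. The following is a minimal sketch (function names hypothetical); for independent identically distributed sequences the fit should return B_K near zero, the signature of a walk with no slow components.

```python
import numpy as np

def mc_metric(v):
    """MC metric d_k from M independent energy sequences.
    `v` has shape (M, K); row m is the sequence {V_k} of walk m.
    d_k is the variance, over walks, of the running average V̄_k."""
    m_walks, k_max = v.shape
    running = np.cumsum(v, axis=1) / np.arange(1, k_max + 1)   # V̄_k per walk
    return np.mean((running - running.mean(axis=0)) ** 2, axis=0)

def fit_metric(d):
    """Least-squares fit of d_k ≈ A_K/k + B_K over the whole sequence."""
    k = np.arange(1, len(d) + 1)
    design = np.column_stack([1.0 / k, np.ones_like(d)])
    (a, b), *_ = np.linalg.lstsq(design, d, rcond=None)
    return a, b
```

For uncorrelated unit-variance data, d_k should behave like (1 − 1/M)/k, so A_K ≈ 1 and B_K ≈ 0; a non-ergodic simulation shows up as a significantly nonzero B_K.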

III. STOCHASTIC MODEL
We have discussed in the introduction how the output of MC simulations can be considered to be a combination of stochastic processes with different time scales, and how the contributions of these processes to the autocorrelation function can vary when the length of the simulation is enlarged. Here we present a continuous time model for the stochastic processes that occur in a simulation. Even though a MC simulation occurs in discrete time (each MC point represents one time unit), we find that the continuous model helps clarify the ideas used in the modeling of the MC output.
In this section the ensemble mean value is used to find the expression for the autocorrelation functions of the model. Although in actual numerical calculations the ensemble mean is replaced by a mean over a finite number of independent experiments, the results obtained here give information about the limit of an infinite sample.
The stationary process used to sample configuration space is a stochastic process. We assume the output of the MC simulation can be modeled by a linear superposition of stochastic processes with different correlation times τ_ℓ ≥ 0,

U(t) = U_c + √Γ_0 ξ(t) + Σ_{ℓ=1}^{Λ} √Γ_ℓ g_ℓ(t/τ_ℓ),   (8)

where U_c is a constant, the random variable ξ(t) represents a white noise process with zero correlation time (τ_0 = 0), and the {g_ℓ(t/τ_ℓ)} are stochastic processes with correlation times τ_ℓ > 0. ξ(t) and g_ℓ(t/τ_ℓ) have units of the square root of time, and Γ_0 and Γ_ℓ are constants with units of U²/t. If U is chosen to be the x-coordinate of a particle, Γ_0 and Γ_ℓ have units of a diffusion constant.
Consequently we refer to these constants as generalized diffusion coefficients. The white noise process has the standard properties of zero mean and delta-correlated fluctuations, [5] and the remaining colored noise processes are assumed to satisfy correlation relations governed by a memory function f_ℓ, so that they represent processes with memory. Even though correlations between processes with different correlation times may be non-zero, we assume the processes to be independent.
The memory function is assumed to be a continuous function that depends only on the distance between t and t′, disregarding the time origin (stationary condition). The memory function represents the correlation between two times of the process g_ℓ, and in our model we impose a two-sided condition on it. The scope and implications of the leftmost inequality are explored in Appendix A, where we also examine the conditions f_ℓ must satisfy in order to yield contributions to the autocorrelation function that decay more weakly than 1/t. We assume that this inequality can be taken as a bound on possible maxima of f_ℓ appearing at t > 0. The rightmost inequality enables us to assume f_ℓ is normalized, f_ℓ(0) = 1.

We have identified here the time scale τ_ℓ with the correlation time of the stochastic process g_ℓ. This identification is valid provided the integral defining the correlation time converges, which implies that the behavior of f_ℓ at large t must be O(t^−(2+ε)) or smaller. In addition, by the properties of the ensemble mean value, the quadratic form ⟨[g_ℓ(t/τ_ℓ) + λ g_ℓ(t′/τ_ℓ)]²⟩ ≥ 0 must hold for all real λ [Eq. (18)]. Therefore, the discriminant of the polynomial in λ must be non-positive; consequently |f_ℓ| ≤ 1.

The ensemble mean value ⟨U⟩ is time independent, and the ensemble mean value of the noise processes is zero. Therefore, U_c must be equal to ⟨U⟩. Processes defined by Eq. (8) have two different components: uncorrelated white noise and correlated processes with correlation times τ_ℓ. Because the goal of the simulation is the calculation of the ensemble mean ⟨U⟩ by the analysis of the time series, we study the behavior of the temporal mean Ū(t), whose white-noise term reduces to a term involving W(t)/t, where W(t) is a Wiener process. [5] Equation (20) implies that the evolution of the temporal mean Ū(t) has the same structure as U, with an uncorrelated term and terms with tailed correlation functions.
The autocorrelation function κ(t, t′) of the process Ū at times t and t′ is defined as the covariance of Ū(t) and Ū(t′), where we have used Eqs. (13) and (14) to neglect terms involving processes with different correlation times.
Because we have assumed f_ℓ is a continuous function, f_ℓ attains its maximum and minimum values within any closed interval. The ℓth non-diffusive contribution to κ(t, t′) is bounded in terms of t_> = max(t, t′) and t_min, the time at which f_ℓ attains its minimum value in the closed interval [0, t_>].
There exists a t*_ℓ(t_>) ∈ [0, t_min] [24] at which the bound is attained. Using Eqs. (23) and (27) in (24), we find that for all times shorter than τ_1 the autocorrelation function is the sum of diffusive contributions (proportional to 1/t) plus non-diffusive contributions. These contributions depend implicitly on t_> through t*_ℓ(t_>). We assume that f_ℓ satisfies the conditions stated in Appendix A, so that the dependence of f_ℓ on t is weaker than 1/t (for total time scales shorter than τ_ℓ; see Appendix A).
We next consider the behavior of Eq. (28) for time scales greater than τ_1. Under the scale change t → bt such that τ_1 ≪ bt_> ≪ τ_2, the contributions to the correlation function from the process with correlation time τ_1 can be considered diffusive [in other words, by virtue of Eqs. (10) and (12), f_1/τ_1 has become a delta function]. With bt_> ≪ τ_2, the other processes preserve their old properties, and the autocorrelation function can be expressed in the rescaled form of Eq. (29); the complete derivation of Eq. (29) can be found in Appendix B. For times larger than the correlation time τ_Λ, all contributions to the autocorrelation function are diffusive, the simulation can be considered ergodic, the sampling complete, and the temporal mean is equal to the ensemble mean to within O(1/t) mean square fluctuations.
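The superposition picture can be made concrete numerically. The sketch below works in discrete time with unit generalized diffusion coefficients (an illustrative assumption) and models each colored term as an Ornstein-Uhlenbeck (AR(1)) process, whose autocorrelation decays as exp(−k/τ_ℓ) and therefore satisfies the memory-function conditions of the model.

```python
import numpy as np

def ou_sequence(tau, k_max, rng):
    """Stationary discrete Ornstein-Uhlenbeck (AR(1)) process with unit
    variance and autocorrelation exp(-k/tau): a simple colored noise."""
    phi = np.exp(-1.0 / tau)
    g = np.empty(k_max)
    g[0] = rng.normal()                             # stationary initialization
    innov = rng.normal(size=k_max) * np.sqrt(1.0 - phi * phi)
    for k in range(1, k_max):
        g[k] = phi * g[k - 1] + innov[k]
    return g

def model_output(taus, k_max, u_c=0.0, seed=0):
    """Discrete analogue of the superposition model: a constant plus
    white noise plus one colored process per correlation time in `taus`.
    Unit amplitudes for all terms are an illustrative assumption."""
    rng = np.random.default_rng(seed)
    u = u_c + rng.normal(size=k_max)                # white-noise term
    for tau in taus:
        u += ou_sequence(tau, k_max, rng)           # colored terms g_ℓ(k/τ_ℓ)
    return u
```

With two colored terms of unit variance plus unit-variance white noise, the total variance of the output is 3; runs shorter than the largest τ_ℓ sample the slow component poorly, which is exactly the trapping behavior described above.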

IV. DISCRETE TIME SEQUENCES AND THE MC METRIC
Monte Carlo simulations generate discrete sequences U_k of values of the quantity under study. Additionally, in actual calculations the ensemble of sequences is represented by a finite rather than an infinite set. In this section, the model developed in the previous section is extended to finite sets of discrete sequences. We express the M sequences as {U_k^(m)}, where the label (m) ranges from 1 to M. The exact ensemble mean value ⟨U⟩ is obtained in the limit that M becomes infinite. In analogy with the model developed in Section III, each output is assumed to have the form of Eq. (30), a discrete superposition of white and colored noise sequences. The true ensemble average ⟨U⟩ does not depend on the index m.
In the discrete case we define a metric

d_k = (1/M) Σ_{m=1}^{M} [Ū_k^(m) − ⟨Ū_k⟩]²,   (35)

where the bars represent the temporal mean value, Ū_k^(m) = (1/k) Σ_{j=1}^{k} U_j^(m), and ⟨Ū_k⟩ = (1/M) Σ_m Ū_k^(m). Observe that in the present case our finite sample of the infinite ensemble is the set of outcomes from M independent numerical experiments. The metric defined in Eq. (35) can be contrasted with alternative metrics [3,9,10] previously defined for molecular dynamics simulations. Those alternative metrics examine the fluctuations between two simulations initialized in different components of configuration space, averaged with respect to all the particles in the system. The metric we use in this work is determined using an average with respect to M independent simulations that represent a subset of the full ensemble.
Using the model presented in Eq. (30), we now develop a way to predict the behavior of the MC simulation in the non-ergodic and the ergodic regimes. We first consider the case that the total simulation time K is larger than the first correlation time τ_1 but shorter than τ_2, i.e. τ_1 ≪ K ≪ τ_2. If the number of experiments M is sufficiently large, we can neglect terms involving processes with different correlation times, as well as products of sequences belonging to different experiments. Under these assumptions we obtain Eq. (40), which preserves the form of Eq. (28). To make this statement explicit, we rewrite Eq. (40) as Eq. (41), in terms of a diffusive coefficient Γ_k and a non-diffusive contribution Υ_k.

In Appendix B we present a study of the way non-diffusive contributions become diffusive under time scale changes. If M is sufficiently large and τ_1 ≪ K ≪ τ_2, by virtue of Appendix B, Γ_k must be roughly a constant. By roughly a constant we mean a constant C plus some rapidly fluctuating function ζ_k with the following properties: (a) the mean of ζ_k vanishes and (b) |C| ≫ max_{k=1,…,K}(|ζ_k|). If K is enlarged, we expect to have a larger value of Γ_K. Υ_k is a quantity related to the memory functions f_ℓ with correlation times τ_ℓ ≫ K. In the continuous time model, the colored noise processes contribute to the autocorrelation function with terms proportional to f_ℓ(t*_ℓ(t_>)/τ_ℓ), which are weakly dependent on t (see Appendix A). We can therefore expect Υ_k to be weakly dependent on k, and for sequences of length K with M sufficiently large, we consider this quantity to be roughly a constant Υ_K plus additional random noise β_k. Then, for a given length k ≤ K, the MC metric d_k can be approximated by

d_k ≈ Γ_K/k + Υ_K + γ_k,   (46)

where γ_k = 2(ζ_k/k + β_k) represents the remaining stochastic noise from both contributions. In this approximation, Γ_K and Υ_K are the quantities that carry the long-time dependence; short-time features appear in the 1/k dependence and in the remaining noise γ_k.
If the sequences considered are increased in size by a factor of b, the metric preserves the form of Eq. (46) with coefficients Γ_bK and Υ_bK, where Υ_bK must go to zero and Γ_bK must approach a constant as b is increased. By virtue of the expected behavior of the non-diffusive contributions (see Appendix A), we propose the following expression for Υ_bK:

Υ_bK = Υ_K φ(b),   (48)

where φ(b) is a decreasing function of b. Moreover, Υ_bK is a sum of non-diffusive contributions. As presented in Appendix A, each non-diffusive contribution to the autocorrelation function has a relative variation smaller than the relative variation of the diffusive contribution, namely 1 − 1/b. If this inequality is applicable to the sum of non-diffusive contributions, we have 1 − φ(b) ≤ 1 − 1/b for all b > 1. Then φ must be either the power law

φ(b) = b^(−υ),   (52)

or the logarithmic form

φ(b) = (1 + η ln b)^(−1),   (53)

with 0 < υ < 1 and 0 < η ≤ 1. Equation (53) can be thought of as the limit of Eq. (52) when the exponent goes to zero. We know of no a priori argument to justify Eq. (48). However, as is discussed in Section V, our numerical experience has shown Eq. (48) to be obeyed in all cases we have examined.

Our goal is to develop a criterion to decide when the simulation can be considered ergodic. From the previous considerations it is clear that the ergodic limit is reached when Υ_K is indistinguishable from zero. The output from a MC simulation is usually noisy; therefore, γ_k cannot be neglected. A useful way to separate diffusive and non-diffusive contributions and to eliminate the stochastic noise from Eq. (46) is to perform a Fourier analysis of the function kd_k. Let us define the frequencies ω_n = (2π/K)n, with n = 0, 1, …, K − 1. The discrete Fourier transform of the function kd_k is the signal

Y_K(ω_n) = Σ_{k=1}^{K} k d_k e^(−iω_n k).   (54)

In general, the transform of kγ_k is negligible except at high frequencies. For small positive values of the frequency we can make the approximation cot(ω_n/2) ≃ 2/ω_n; from this approximation we obtain Eq. (55) and, from its imaginary part for positive frequencies, Eq. (56). The real part of Eq. (55) for positive frequencies gives Eq. (57). Even though simpler than Eq. (56), we have found that Eq. (57) is more sensitive to deviations of d_k from the approximation of Eq. (46); therefore, the data obtained from the real part are of poorer quality than the data obtained from the imaginary part. Equation (56) implies that for a given simulation length K, the contributions to the MC metric from the non-diffusive processes can be determined from a simple relationship involving the Fourier transform of the function kd_k at low frequencies. By increasing the length of the run K by a factor of b, it is possible to observe the dependence of Υ_bK on bK.
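The low-frequency procedure can be sketched numerically. The sketch below assumes the DFT convention of Eq. (54) and the corresponding small-frequency result Im[1/Y_K(ω_n)] ≈ −ω_n/(K Υ_K); the sign convention and the exact frequency window are illustrative assumptions rather than the paper's precise prescription.

```python
import numpy as np

def upsilon_from_metric(d):
    """Estimate the non-diffusive amplitude Υ_K from the MC metric d_k.
    Computes Y_K(ω_n) = Σ_k k d_k exp(-i ω_n k) at low frequencies and
    extracts Υ_K from the slope of Im[1/Y_K(ω_n)] versus ω_n, which is
    approximately -1/(K Υ_K) under the assumed conventions."""
    d = np.asarray(d, dtype=float)
    k_max = len(d)
    k = np.arange(1, k_max + 1)
    n = np.arange(1, min(k_max // 20, 100) + 1)     # low, nonzero frequencies
    omega = 2.0 * np.pi * n / k_max
    # DFT of k*d_k evaluated only at the chosen frequencies
    y = (k * d) @ np.exp(-1j * np.outer(k, omega))
    slope = np.polyfit(omega, (1.0 / y).imag, 1)[0]
    return -1.0 / (k_max * slope)
```

For a synthetic metric of the form d_k = Γ/k + Υ, the diffusive term contributes a constant to kd_k, which vanishes exactly at the nonzero DFT frequencies, so the routine recovers Υ alone.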

V. APPLICATIONS
The concepts developed in the previous sections are sufficiently general to be applied to any kind of MC simulation. We devote the present section to the application of the developments of this paper to the study of the Lennard-Jones 13-particle cluster in the canonical ensemble. This system has been introduced previously in Sec. II.
Some thermodynamic properties of clusters as a function of temperature exhibit rapid changes that are reminiscent of similar changes that occur for the same properties in bulk systems at phase transitions. In a bulk system a phase transition occurs at a single temperature; for clusters the rapid changes in thermodynamic properties occur over a finite temperature interval. To distinguish the temperature range where thermodynamic properties change rapidly in clusters from a true phase transition, we follow Berry et al. [25] and refer to such changes in physical properties as being associated with a phase change. A common property that has been found to be useful in monitoring these phase change intervals of temperature is the heat capacity at constant volume, [26] computed from the canonical fluctuations of the energy, where ⟨·⟩_T represents the classical canonical mean value. In this work we consider the bare Metropolis (Met), [20] J-walking (Jw), [16] and parallel tempering (PT) [17-19] approaches to Monte Carlo simulations. The free variable of all these methods is the reduced temperature k_B T/ε. In PT and Jw simulations, the highest temperature used (T_h) must be sufficiently large to ensure that Met is ergodic. [16] From experience simulating a variety of systems, we have found that T_h must also be lower than a temperature T_b at which cluster evaporation events become frequent. It is useful to think of T_b as the cluster analogue of a boiling temperature. We have found that Met is unable to sample the boiling phase change region of clusters ergodically using total time scales accessible to current simulations.
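A fluctuation estimator for C_V can be sketched as follows. The formula used here is the standard classical canonical result C_V/k_B = 3N/2 + (⟨V²⟩_T − ⟨V⟩_T²)/(k_B T)², with the 3N/2 kinetic term included as an assumption; the detailed form used in this work may differ for a constrained cluster.

```python
import numpy as np

def heat_capacity(v_samples, kt, n_particles):
    """Constant-volume heat capacity (in units of k_B) from canonical
    potential-energy samples, assuming the classical fluctuation formula
    C_V/k_B = 3N/2 + Var(V)/(k_B T)^2.  `kt` is k_B*T in energy units."""
    var_v = np.var(v_samples)                 # <V^2>_T - <V>_T^2
    return 1.5 * n_particles + var_v / (kt * kt)
```

Peaks in C_V computed this way over a temperature grid are what locate the melting and boiling phase change regions discussed below.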
For the results that follow, U_k^(m) is chosen to be the potential energy of the system; in general, U_k^(m) can be any scalar property of the system. We define a pass to be a set of single-particle MC moves taken sequentially over the 13 particles in the cluster, and we take U_k^(m) to be the potential energy at the kth pass in the mth experiment. Using Eq. (55) we can write the zero-frequency signal Y_K(0) in terms of Γ_K and Υ_K. In the non-ergodic regime, Y_K(0) grows with K, while in the ergodic regime the signal Y_K(0) approaches a constant.
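A single pass as defined above can be sketched as follows; the energy function, step size, and acceptance bookkeeping are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np

def metropolis_pass(coords, energy_fn, beta, step, rng):
    """One 'pass': a sequential single-particle Metropolis move for each
    of the N particles in `coords` (shape (N, 3), modified in place).
    Returns the potential energy after the pass."""
    v = energy_fn(coords)
    for i in range(len(coords)):
        old = coords[i].copy()
        coords[i] += rng.uniform(-step, step, size=3)   # trial displacement
        v_new = energy_fn(coords)
        if rng.random() < np.exp(min(0.0, -beta * (v_new - v))):
            v = v_new                                    # accept the move
        else:
            coords[i] = old                              # reject: restore
    return v
```

Recording the returned energy after each pass, for each of M independent experiments, produces exactly the sequences U_k^(m) analyzed by the metric.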
We begin by displaying results obtained for a calculation that has not attained ergodicity over the time scale of the simulation. We examine the 13-particle Lennard-Jones cluster with the Met algorithm, setting R_c = 4σ at a temperature of k_B T_h/ε = 0.393. The temperature is chosen to be that typically used as the initial high temperature in Jw and PT studies of LJ_13. By choosing a large constraining radius, the evaporation events are so frequent at the chosen temperature that attaining ergodicity proves to be quite difficult; we demonstrate the effect of reducing the constraining radius shortly. In Fig. 2 there are three sets of curves, indicative of the sampling of at least three different energy basins. At low values of K the curves in the lower panel differ significantly. At K ≃ 4000 the high-energy-basin curves begin to decrease in energy. For values of K larger than the data displayed in Fig. 2, the curves can be expected to coalesce with the low-energy-basin curves. It is clear that for K ≤ 10000 the simulation is not ergodic.
In PT and Jw studies it is essential that the initial high temperature walk be ergodic. Ergodicity can be attained for LJ_13 by reducing the radius of the constraining potential so that evaporation events are rare. We now present a study of Υ_K as a function of K for several values of R_c. To determine Υ_K, we have calculated the Fourier transform Y_K(ω_n) using Eq. (54) at a series of frequencies ω_n = 2πn/K, where n ranges from 1 to min(√(12bK)/20π, 100). This range of frequencies ensures that the linear approximation used in Eq. (55) is valid while including a sufficient number of points for accuracy.
[27] Using Eq. (56), we have calculated the slope of the imaginary part of 1/Y_K(ω_n) as a function of ω_n for these frequencies. The data points appearing in Fig. 3 are the mean values over twenty independent calculations of the slope of 1/Y_K(ω_n).
Starting from random configurations, we have performed 5 × 10^4 Met passes at k_B T_h/ε = 0.393. After this warm-up process, we have created sequences of sizes bK = 10^4, 2 × 10^4, 4 × 10^4, …, 64 × 10^4. The results are presented in Fig. 3 for R_c = 4σ, 3σ, 2.5σ, and 2σ. The upper panel shows Υ_bK as a function of log_2(b) for fixed K = 10^4. We have chosen to present the data using base 2 logarithms for clarity (each increase of log_2(b) by one unit represents a factor of 2 increase in scale). All the data decrease with increasing b, but only the R_c = 2σ and R_c = 2.5σ results appear to vanish to within the error bars over the time scale of the current simulation. In the lower panel we present Υ_K/Υ_bK as a function of log_2(b) for R_c = 4σ and 3σ. The decay law suggested in Eq. (48), with φ given by Eq. (53), is satisfied for both radii. We have stated that the simulation can be considered effectively ergodic when Υ_K is indistinguishable from zero. In Fig. 4 we have plotted Υ_bK and its statistical error as a function of log_2(b) for R_c = 2.5σ and 2σ. For R_c = 2σ the crossing point of Υ_bK and its error is at bK ≃ 16 × 10^4. For R_c = 2.5σ the crossing point is at bK > 64 × 10^4. We can conclude that for k_B T_h/ε = 0.393 and R_c = 2σ the simulation can be considered effectively ergodic after 16 × 10^4 Met passes.
Once a constraining radius is chosen, PT and Jw simulations require that the highest temperature T_h be chosen so that Met is ergodic. For a given R_c, the extent of ergodicity can be tested using the same metric that has been used for determining the optimum value of R_c, but by varying the temperature. For the parameters k_B T_h/ε = 0.393 and R_c = 2σ the simulation is ergodic even at very short sequence lengths; we have found that for k_B T_h/ε < 0.393 the simulations are not ergodic. To be sure that the parameters are appropriate, we have performed a short PT simulation (10^4 passes; ten PT passes consist of nine Met passes plus an exchange attempt) with 40 equally spaced temperatures in the range k_B T/ε = [0.028, 0.393] in order to obtain a first estimate of the positions of the melting and boiling temperature regions. The boiling peak in the specific heat appears to be located at a temperature higher than k_B T/ε = 0.393. Moreover, the value of C_V at k_B T/ε = 0.393 is about one-half the value of C_V at the temperature of the melting peak, k_B T_m/ε = 0.282. From these results we feel it is safe to choose R_c = 2σ and k_B T_h/ε = 0.393 for the calculations that follow.
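The exchange attempt mentioned above can be sketched as follows. The acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]) is the standard parallel tempering rule; the adjacent-pair selection and the list-based bookkeeping are illustrative assumptions.

```python
import numpy as np

def pt_exchange(configs, energies, betas, rng):
    """One parallel-tempering exchange attempt: pick a random adjacent
    pair of temperatures and swap their configurations with probability
    min(1, exp[(beta_i - beta_j)(E_i - E_j)]).  `configs` and `energies`
    are parallel lists, one entry per temperature; returns True on swap."""
    i = rng.integers(len(betas) - 1)
    j = i + 1
    delta = (betas[i] - betas[j]) * (energies[i] - energies[j])
    if rng.random() < np.exp(min(0.0, delta)):
        configs[i], configs[j] = configs[j].copy(), configs[i].copy()
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

When the hotter replica happens to hold the lower energy, delta is positive and the swap is always accepted; this is the mechanism that carries ergodic high-temperature configurations down to the low temperatures.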
We now illustrate the convergence characteristics of Υ_K when the total time scale of the calculation is increased by a factor b. We illustrate this behavior using a PT simulation of LJ_13, and we focus on results at the temperature of the melting peak in the heat capacity (k_B T_m/ε = 0.282). We choose this temperature because from experience [14,15,23] we know the statistical fluctuations are large at the melting heat-capacity maximum, and these large fluctuations make it possible to emphasize the behavior of Υ_K. We have run the PT simulation at 40 equally spaced temperatures in the range k_B T/ε = [0.028, 0.393]. The initial warm-up consists of 10^4 Met passes followed by 2 × 10^4 PT passes. Following the warm-up period, we perform simulations of 10^5, 2 × 10^5, 4 × 10^5, 8 × 10^5, 16 × 10^5, and 32 × 10^5 PT passes; in each case the initial configuration is taken to be the last configuration of the previous run. The output of the simulation is a set of sequences of the potential energy. Υ_K has been determined in the same way as in the calculation of the high-temperature parameters (presented in Figs. 3 and 4). The data points appearing in the upper panel of Fig. 5 are the mean values, over twenty independent calculations, of the slope of 1/Υ_K(ω_n). In the lower panel of Fig. 5 we have plotted log_2(Υ_K/Υ_bK) as a function of log_2(b), where K = 10^4 and b = 1, 2, 4, . . . , 32. The slope of the linear fit is the exponent υ, according to Eq. (52). At the temperature of the melting peak, υ = 0.93 ± 0.03.
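The extraction of υ can be sketched as a least-squares fit on the log-log scaling data, assuming Eq. (52) has the power-law form Υ_K/Υ_bK = b^υ implied by the text; the numerical values below are synthetic, not the paper's data.

```python
import numpy as np

def fit_power_law_exponent(b_values, upsilon_K, upsilon_bK):
    """Fit log2(Upsilon_K / Upsilon_bK) = upsilon * log2(b); return the slope."""
    x = np.log2(np.asarray(b_values, dtype=float))
    y = np.log2(upsilon_K / np.asarray(upsilon_bK, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Synthetic check: data obeying Upsilon_bK = Upsilon_K * b**(-0.93)
# should return an exponent of ~0.93.
b = np.array([1, 2, 4, 8, 16, 32])
ups_bK = 1.0 * b ** (-0.93)
print(fit_power_law_exponent(b, 1.0, ups_bK))  # ~0.93
```

In practice the fit would be repeated over the independent calculations and the spread of slopes quoted as the uncertainty, as in the ±0.03 above.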
It is of interest to perform a similar study of the behavior of Υ_K as a function of the time scaling for a Met calculation. We have taken the final configuration of the PT simulation at k_B T_m/ε = 0.282 as an initial configuration, and we have performed a simple Met simulation at that melting temperature. A graph of Υ_bK and log_2(Υ_K/Υ_bK) as a function of log_2(b) for Met is also presented in Fig. 5. From the upper panel of Fig. 5, it is evident that the Met results are not ergodic within the same scaled time as the PT results. It is also evident that the power-law exponents for Met and PT are not distinguishable. Similar studies of the power law using the Jw method also give the same exponent. Neither an increase in the number of temperatures nor a change in the distribution of temperatures in the Jw and PT simulations has any effect on the calculated exponent. We have used these results to compare the relative efficiencies of Met, Jw and PT simulations for the LJ_13 system: PT and Jw simulations can be considered ergodic when the run length is on the order of 2 × 10^5 passes, while Met simulations initialized from configurations generated in an ergodic PT study are ergodic only when the total run length is 2 × 10^6 passes or more.
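The Met updates referred to throughout can be sketched as a standard Metropolis pass for a constrained LJ cluster. This is a minimal sketch, not the authors' code: the reduced LJ units and the origin-centered constraining sphere are simplifying assumptions (published cluster work usually centers the constraint on the cluster's center of mass).

```python
import numpy as np

def lj_energy(x, eps=1.0, sigma=1.0):
    """Total Lennard-Jones potential energy of configuration x (shape (N, 3))."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    r = d[np.triu_indices(len(x), k=1)]  # unique pair distances
    return np.sum(4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6))

def met_pass(x, beta, rc, step, rng):
    """One Metropolis pass: attempt a single-particle move for each particle.

    Moves that carry a particle outside the constraining sphere of radius rc
    (centered on the origin here, for simplicity) are rejected outright.
    The full energy is recomputed per trial for clarity, at O(N^2) cost.
    """
    energy = lj_energy(x)
    for i in range(len(x)):
        trial = x.copy()
        trial[i] += rng.uniform(-step, step, size=3)
        if np.linalg.norm(trial[i]) > rc:
            continue  # outside the constraining radius: reject
        e_trial = lj_energy(trial)
        if e_trial <= energy or rng.random() < np.exp(-beta * (e_trial - energy)):
            x, energy = trial, e_trial
    return x, energy
```

A production code would additionally tune the step size to a target acceptance ratio and update only the moved particle's pair energies rather than recomputing the full sum.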
In order to compare approaches, we have calculated Γ as a function of the reduced temperature for the three methods. The comparison of diffusion coefficients from different algorithms has also been used by Andricioaei and Straub [13]. The comparison of Jw and Met with PT is presented in Fig. 6. Using Γ as a measure, the Jw and PT simulations are found to have comparable efficiencies at all calculated temperatures, while at intermediate temperatures Met is significantly less efficient. We have chosen to truncate the Jw study at k_B T/ε = 0.12; for temperatures below this value, Jw simulations require significant effort, because a large set of external distributions must be generated. Because at temperatures below k_B T/ε = 0.12 LJ_13 is dominated by structures close to the lowest-energy icosahedral isomer, we expect the Jw and PT methods to have similar efficiencies (as measured by Γ) at all temperatures.
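The efficiency comparison can be sketched as follows. This assumes, following the discussion of the slope of 1/Υ_K above, that Γ is obtained as the slope of 1/Υ_K against simulation length, so that the ratio of slopes from two methods estimates their relative efficiency; the helper and the numbers are hypothetical, not the paper's.

```python
import numpy as np

def effective_diffusion(times, upsilon):
    """Estimate Gamma as the slope of 1/Upsilon_K versus simulation length.

    For an ergodic walk the metric decays as Upsilon_K ~ 1/(Gamma * K), so
    1/Upsilon_K grows linearly with K and its slope measures the rate of
    self-averaging (an assumed convention, following the text).
    """
    slope, _intercept = np.polyfit(np.asarray(times, float),
                                   1.0 / np.asarray(upsilon, float), 1)
    return slope

# Relative efficiency of method A versus method B at one temperature,
# using synthetic metric decays with Gamma_A = 2e-3 and Gamma_B = 5e-4.
K = np.array([1e4, 2e4, 4e4, 8e4])
ups_A = 1.0 / (2.0e-3 * K)
ups_B = 1.0 / (5.0e-4 * K)
print(effective_diffusion(K, ups_A) / effective_diffusion(K, ups_B))  # ~4.0
```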

VI. CONCLUSIONS
In this paper we have presented a study of the approach to the ergodic limit in MC simulations. In all the cases examined, the behavior of the MC metric d_k can be approximated by Eq. (46), and the behavior of Υ_bK satisfies Eq. (48). Because the exponent υ is smaller than one in all the cases studied, the non-diffusive contributions to d_k are weaker (in the sense of Appendix A) than the diffusive contributions. The assumptions on which we have built the stochastic model have been verified numerically for a system having a potential surface sufficiently complex to be viewed as prototypical of a large class of many-particle systems.
The MC metric used in this work appears to be a valuable tool to study the ergodicity properties of MC simulations. The non-ergodic components of the MC metric enable the prediction of the minimum length a MC simulation must have in order to be considered ergodic. The comparison of Γ from different algorithms gives a reasonable estimate of their relative efficiencies.
From the study of the melting region of 13-particle clusters, we have found that the exponent υ depends on the nature of the potential energy function and, to a lesser extent, on the method used. We have performed calculations, not discussed in this work, in which the functional form of the potential energy is modified; these studies have shown υ to be dependent on the details of the potential. We have not found the exponent υ to be a strong function of method: although PT and Met have significantly different efficiencies as measured by their relative diffusion coefficients, υ is nearly the same for the two methods. The difference in the decay of Υ_K appears to be dominated by the coefficient in Eqs. (48) and (52) rather than by the exponent.
As discussed in the text, parallel tempering and J-walking studies of many-particle systems must include an initial high-temperature component chosen so that a Met simulation is known to be ergodic. For cluster simulations that require an external constraining potential to define the cluster, the radius of the constraining potential must be carefully chosen in order to achieve ergodic results. We have found the metric and the associated decay laws developed in this work to be a particularly valuable means of choosing these initial parameters in both parallel tempering and J-walking simulations.
We also remark that the metric introduced here may be a more sensitive probe of ergodicity than may be required in some applications. For example in previous J-walking studies [26] of the 13-particle Lennard-Jones cluster, the heat capacity curve determined with a constraining radius of 4σ is nearly indistinguishable from the curve obtained with a constraining radius of 2σ. From the results of this work, we know the initial high temperature walk is not ergodic when a constraining radius of 4σ is used. It is striking that the non-ergodicity as measured by the energy metric is not apparent in the heat capacity curve.
We have constructed a metric based on an ensemble of MC trajectories. By using an ensemble we attempt to cover a sufficient portion of configuration space so that all components are accessible. In practice only a finite subset of the full ensemble can be included, and it is always possible that components of the space are missed. In such a case Υ_K may decay to zero numerically within the sampled subspace, and this behavior may give misleading evidence that the simulation is ergodic. Because components of the space may be missed in any finite simulation, it is impossible to guarantee ergodicity; it is hoped that by using a sufficiently large ensemble of trajectories to define the metric, the possibility of missing components is minimized.

ACKNOWLEDGMENTS
We would like to thank Dr. O. Osenda for helpful comments. This work has been supported in part by the National Science Foundation under grant numbers CHE-9714970 and CDA-9724347. This research has been supported in part by the Phillips Laboratory, Air Force Material Command, USAF, through the use of the MHPCC under cooperative agreement number F29601-93-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Phillips Laboratory or the U.S. Government.

APPENDIX A: WEAK DEPENDENCE OF THE NON-DIFFUSIVE CONTRIBUTIONS
We have considered two overall time scales for a MC simulation. Properties calculated at short times (labeled k in the discrete case) provide information about each step of the MC process, and properties averaged over the total simulation time (labeled K in the discrete case) give information about the approach to ergodic behavior. When K is sufficiently short, we have both diffusive and non-diffusive contributions as a function of k. In this Appendix we explain the relative time dependence of the diffusive and non-diffusive contributions to the autocorrelation function.
It has been assumed that the autocorrelation function Eq. (28) can be expressed as the sum of diffusive terms plus non-diffusive terms. With this decomposition, the quotient κ_{nd,ℓ}(bt, bt′)/κ_{nd,ℓ}(t, t′) takes the form given in Eq. (A21). By Eq. (A8), H_ℓ(t; τ) > 0 for all t and τ, and by the Lemma the numerator is smaller than the denominator; therefore 0 < κ_{nd,ℓ}(bt, bt′)/κ_{nd,ℓ}(t, t′) < 1. In addition, all f_ℓ satisfy the Lipschitz condition [28]: for every closed interval A there exists a real positive number C_ℓ such that |f_ℓ(x) − f_ℓ(y)| ≤ C_ℓ |x − y| for all x and y in A.
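The decomposition and bound described above can be summarized schematically. This is a hedged restatement: the precise forms of Eqs. (28) and (A21) are not reproduced in this excerpt, so the display below records only the structure the text asserts.

```latex
% Schematic: diffusive + non-diffusive split of the autocorrelation (cf. Eq. (28)),
% and the contraction property of the non-diffusive kernels (cf. Eq. (A21)).
\kappa(t,t') = \kappa_{d}(t,t') + \sum_{\ell} \kappa_{nd,\ell}(t,t'),
\qquad
0 < \frac{\kappa_{nd,\ell}(bt,\,bt')}{\kappa_{nd,\ell}(t,\,t')} < 1
\quad \text{for all } b > 1 .
```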
The operations leading to Eq. (A24) are valid by the Corollary. Using the intermediate value theorem [24], the integrals can be evaluated at points t*(t) ∈ [t, bt]. Let t*_α(t) and t*_β(t) be the values at which the intermediate value theorem is satisfied in the intervals [t, bt] and [t_> − t, b(t_> − t)], respectively; the remainder can then be written in the form of Eq. (A30). By the Lipschitz condition, the terms of Eq. (A30) are bounded as in Eqs. (A32) and (A35), where C_ℓ is a suitable positive real constant. Using Eqs. (A32) and (A35) in Eq. (A30), we obtain a bound in terms of F_ℓ, a continuous and differentiable function of t. The inequality (A45) holds for any b > 1. Suppose that F_ℓ(t_<) + F_ℓ(t_>) − F_ℓ(t_> − t_<) is negative; then, for L > 2, we arrive at a contradiction with that hypothesis. Therefore
0 ≤ F_ℓ(t_<) + F_ℓ(t_>) − F_ℓ(t_> − t_<).
J. P. Neirotti, D. L. Freeman, and J. D. Doll
The results of the present appendix are valid in the limit of a complete ensemble. In our numerical experiments only partial samples of the ensemble can be considered. The memory functions that appear in our numerical calculations come from partial mean values of products of discontinuous functions (every noise process is a discontinuous function), and these memory functions are therefore discontinuous. The behavior of the non-diffusive contributions observed in our numerical experiments is nevertheless in agreement with these analytic (infinite-ensemble-limit) results. We infer that a version of the theorem may apply to discontinuous memory functions, but we have been unable to develop such a theorem.

APPENDIX B: CONSEQUENCES OF THE TIME SCALE CHANGE IN THE NON-DIFFUSIVE CONTRIBUTIONS
In this appendix we show the behavior of the function f_1 when its correlation time is changed according to τ_1 → τ_b1 = τ_1/b, with b ≫ 1, i.e. when the total simulation time is scaled to exceed the correlation time of the first colored-noise process.
We multiply the time variables by a number b such that τ_1 ≪ bt_> ≪ τ_2. The g_1 process then contributes to the autocorrelation function through the term given in Eq. (B2), where t′ = t/b and τ_b1 = τ_1/b. We want to compute this contribution both within the neighborhood t_1 = t_2 and outside that region. To do so, we can split the integral in Eq. (B2) into three parts,
(1/b^2) t t′ G_1(bt/τ_ℓ) G_1(bt′/τ_ℓ) = I_1 + I_2 + I_3.