THREE-DIMENSIONAL PASSIVE‐SOURCE REVERSE TIME MIGRATION OF CONVERT WAVES: A NEW METHOD TO IMAGE THE EARTH’S DISCONTINUITIES

At	  seismic	  discontinuities	  in	  the	  crust	  and	  mantle,	  part	  of	  the	  compressional-­‐wave energy	  converts	  to	  shear	  wave,	  and	  vice	  versa.	  These	  converted	  waves	  have	  been widely	  used	  in	  receiver	  function	  (RF)	  studies	  to	  image	  discontinuity	  structures	  in the	  Earth.	  While	  generally	  successful,	  the	  conventional	  RF	  method	  has	  its	  limitations and	  is	  suited	  mostly	  to	  flat	  or	  gently	  dipping	  structures.	  Among	  the	  efforts	  to overcome	  the	  limitations	  of	  the	  conventional	  RF	  method	  is	  the	  development	  of	  the wave-­‐theory-­‐based,	  passive-­‐source	  reverse-­‐time	  migration	  (PS-­‐RTM)	  for	  imaging complex	  seismic	  discontinuities	  and	  scatters.	  To	  date,	  PS-­‐RTM	  has	  been implemented	  only	  in	  2D	  in	  the	  Cartesian	  coordinate	  for	  local	  problems	  and	  thus	  has limited	  applicability.	  In	  Chapter	  1,	  we	  introduce	  a	  3D	  PS-­‐RTM	  approach	  in	  the spherical	  coordinate,	  which	  is	  better	  suited	  for	  regional-­‐	  and	  global-­‐scale	  seismic imaging.	  	  New	  computational	  procedures	  are	  developed	  to	  reduce	  artifacts	  and enhance	  migrated	  images,	  including	  back-­‐propagating	  the	  main	  arrival	  and	  the	  coda containing	  the	  converted	  waves	  separately,	  using	  a	  modified	  Helmholtz decomposition	  operator	  to	  separate	  the	  P	  and	  S	  modes	  in	  the	  back-­‐propagated wavefields,	  and	  applying	  an	  imaging	  condition	  that	  maintains	  a	  consistent	  polarity for	  a	  given	  velocity	  contrast.	  The	  new	  approach	  allows	  us	  to	  use	  migration	  velocity models	  with	  realistic	  velocity	  discontinuities,	  improving	  accuracy	  of	  the	  migrated images.	  We	  present	  several	  synthetic	  experiments	  to	  demonstrate	  the	  method,	  using regional	  and	  teleseismic	  sources.	  The	  results	  show	  that	  both	  regional	  and teleseismic	  sources	  can	  illuminate	  complex	  structures	  and	  this	  method	  is	  well	  suited for	  imaging	  dipping	  interfaces	  and	  sharp	  lateral	  changes	  in	  discontinuity	  structures.


Introduction
The method of stacking receiver functions at the common conversion points (CCP) (e.g., Dueker and Sheehan, 1997;Shen et al., 1998;Ryberg and Weber, 2000;Abe et al., 2011) has been widely used to image velocity discontinuity interfaces of the Earth's interior, such as the Moho discontinuity and mantle transition zone boundaries (e.g., Rondenay et al., 2000;Ai et al., 2007;Kind et al., 2012;Tauzin et al., 2013;. Stacking of multiple receiver functions over finite lateral and depth dimensions is necessary to enhance the signals of the converted waves as individual receiver functions often have low signal--to--noise ratios. Because of this spatial averaging, the CCP method assumes locally horizontal or sub--horizontal velocity discontinuity interfaces within the common conversion patch (point) used in stacking (Dueker and Sheehan, 1997) and does not account for the effects of wave diffraction and scattering caused by lateral variations in the interfaces (Chen et al., 2005a). Consequently, the CCP method is inadequate for imaging complex structures such as scatters, sharp changes in discontinuity structures, and high-angle dipping or faulted interfaces (e.g., vertical Moho offsets or slabs), as demonstrated by Sheehan et al. (2000) and Cheng et al. (2016).
Several other techniques have been developed to use converted waves recorded by seismic arrays to image crustal and mantle discontinuity structures: Bostock et al.
(2001) developed a 2D imaging method using an inverse scattering theory based on generalized radon transform; Liu and Levander (2013) used generalized transform based on Kirchhoff approximation; Pavlis (2003a, 2003b) used plane wave decomposition to transform the recorded data set into ray parameter and back--azimuth domain plane waves and migrated them separately; Ryberg and Weber (2000) introduced post--stack migration of receiver functions; Revenaugh (1995), Sheehan et al. (2000), Levander et al. (2013), and Cheng et al. (2016Cheng et al. ( , 2017 developed pre--stack Kirchhoff migration to image scatters and velocity discontinuities; Chen et al. (2005aChen et al. ( , 2005b presented a wave--equation migration method, which back--propagates the CCP stacked receiver functions with one--way phase screen propagator.
Using synthetic waveforms generated by finite--difference simulations, Shang et al.
(2012) developed a 2D passive--source reverse--time migration (PS--RTM) method to image complex structures with converted waves. Different from the conventional active--source, reverse--time migration (RTM) method in industry (Baysal et al. 1983;McMechan 1983;Chang andMcMechan 1987 and1994), which uses both the source--side (forward) and receiver--side (time--reverse, back--propagated) wavefields, the PS--RTM method uses only the receiver--side wavefield. Therefore, as in receiver function studies, the PS--RTM images the structures beneath the receivers using converted waves without a detailed knowledge of the sources. By comparing the CCP and PS--RTM results for a synthetic model with an offset in the Moho structure, Shang et al. (2012) demonstrated the advantage of this wave--equation--based migration method in imaging complex structures. Compared to Kirchhoff migration, the wave--equation--based migration method is computationally more expensive. But for complex structures, it theoretically has advantages over Kirchhoff migration as it accounts for finite--frequency wave effects and overcomes, for example, multipathing in the propagating wavefield that may affect Kirchhoff migration. However, the 2D PS--RTM method of Shang et al. (2012) has limited applicability for several reasons.
First, it can only use linear seismic arrays with earthquakes from a narrow azimuth range in the direction of the linear array. Second, since it does not account for diffracted and scattered waves outside of the 2D plane, its application is mostly suited to 2D structures perpendicular to the linear seismic array. Finally, a smoothed reference model was used by Shang et al. (2012) to back--propagate seismic records in order to minimize spurious phase conversions at the interfaces in the back--propagated wavefields (see more discussion in the Method section). This artificial smoothing of the velocity structure inevitably alters the wavefield near sharp velocity discontinuities and thus may introduce errors in the migrated images.
With the advance of computational power and deployment of dense three-component, areal seismic arrays, it is increasingly feasible and necessary to use wave--equation--based migration methods to image the Earth structure without the locally horizontal subsurface assumption and accurately account for the effects of wave propagation in realistic 3D heterogeneous structures. In this study, we extend the method of Shang et al. (2012) from 2D to 3D, using a 3D finite--difference elastic wave solver (Zhang et al., 2012) in the spherical coordinate, which is suited for simulating wave propagation at regional and global scales. Unlike Shang et al. 5 (2012), we separate the main P arrival and P--to--S converted waves in both the data (time) and image (spatial) domains. This allows us to use reference models with realistic, sharp velocity discontinuity structures. We adapt a new imaging condition that maintains a consistent polarity of the converted phases for a given velocity contrast (from low--to--high or high--to--low velocities) for sources with different azimuths and incidence angles, making it more straightforward to stack images obtained from different earthquake sources. We focus on the P--to--S converted waves in this study, though S--to--P conversions can be applied in a similar way. In the following sections, we describe the new 3D PS--RTM method and show several numerical examples to demonstrate the potentials of this method in imaging complex 3D velocity discontinuity interfaces.

Summary of the computational procedure
Our 3D PS--RTM method contains the following steps: Step 1: Separate the main P arrival and its coda containing converted S--wave energy by windowing them in the data (time) domain; Step 2: Back--propagate the time--reversed, three--component P--wave seismogram and the coda containing the converted S wave separately to reconstruct two back-propagated elastic wavefields; 6 Step 3: Apply a modified Helmholtz decomposition operator to further isolate the P mode in the back--propagated P wavefield and S mode in the back--propagated coda wavefield, respectively, in the 3D spatial domain; Step 4: Apply an imaging condition using the P and S modes to generate an image. A newly developed imaging condition is adapted to maintain a consistent polarity of the converted phases for a given velocity contrast; Step 5: Stack the images generated from different earthquake sources to suppress random noise and artifacts.

The forward and back--propagated wavefields
To explain the above procedure and illustrate the effects of our new approaches, we generate a 3D synthetic model, which contains two layers with different velocities as shown in Figure 1.1a. The velocity interface between the two layers is offset vertically and laterally, thus the model represents a simplified 3D geological structure in places with vertical and lateral Moho offsets. In later discussion, we refer to the interface at 30 km depth as the shallow Moho and that at 50 km depth as the deep Moho. The dimension of this model is 222 km × 222 km laterally and 100 km in depth. The finite--difference grid spacing is about 0.555 km, 0.555 km, and 0.5 km in the x (co--latitude), y (longitude), and z (depth) directions, respectively. The upper and lower layer S--wave velocities are 3.9 km/s and 4.4 km/s, respectively, while the P--wave velocities are proportional to the S wave velocities (Vp/Vs=1.74).
Since the model is created in the spherical coordinate, the horizontal grid spacing decreases slightly with depth (Zhang et al., 2012).
Synthetic seismograms are calculated using the elastic wave equations with a free-surface boundary condition: Here is the source--side (forward), 3--component displacement wavefield, is the density, is the external force representing the earthquake source, c is the fourth-order stiffness tensor, is the stress tensor and is the strain tensor (see details in Zhang et al., 2012). The superscript s represents the source--side variables. The subscript x, y, and z represent south, east and up directions, respectively, in the spherical coordinate. We calculate the synthetic wavefield with a 3D non--staggered-grid, finite--difference solver (Zhang et al., 2012) in the spherical coordinate with 4 th order spatial accuracy and 2 nd order temporal accuracy. The free surface condition is implemented with the stress--image method (Levander, 1988;Graves, 1996), which anti--symmetrically sets the value of the stress component at ghost points above the free surface (Zhang et al., 2012). For other boundaries, we use 12 perfectly matched layers as the absorbing boundaries (PML, Zhang and Shen, 2010).
Two types of seismic sources are used in this study: local/regional sources ( This recorded synthetic wavefield is used as our "data" to image the subsurface structure. Figure 1.2b shows an example of 3--component seismograms for a regional source represented by the green star #3 in Figure 1.2a. Figure 1.2d shows an example of 3--components seismograms of a plane wave with a back azimuth of 0 degree (due north) and an incidence angle of 27 degrees (from the vertical, the black arrow in Figure 1.2c).
In both examples, the main P arrival and its coda are color--coded and presented on the 3--component synthetic seismograms (Figures 1.2b and 1.2d). Due to the complexity of the velocity model (relative to 1D models with flat interfaces) and the differences in the wave paths, the converted S--wave arrivals have complex traveltime--distance curvatures, so the move--out and stacking procedure in the CCP method will not stack the converted waves from the vertical interfaces and sharp corners effectively (Shang et al., 2012). Furthermore, since the CCP method usually uses only the radial component receiver function for imaging purposes, it does not utilize the information provided by other components.
To construct the receiver--side elastic wavefield, we use seismograms on the evenly distributed finite--difference grids on the free surface (340 by 340 grids with about 0.555--km station spacing), reverse the 3--component seismograms in time, set them as the boundary condition on the surface of the study region (equation 1.8), then use the elastic wave solver of Zhang et al. (2012) to back--propagate the ground motion on the surface into the model interior by solving elastic wave equations: To illustrate the possible artifacts in the back--propagated wavefield and the reason for separating the main P arrival and its coda in our new approach, we first back-propagate the entire time--reversed wave train, including the main arrival (P) and its coda, as in Shang et al. (2012). and vertical interfaces are identified in the receiver--side wavefield and marked as thick black dashed lines 1, 2 and 3, respectively in Figure 1.3. By comparing the forward--and backward--propagated wavefields, we can also identify artifacts in the receiver--side wavefield that could contaminate the separation of the P and S modes and construction of the subsurface images.
These artifacts can be attributed to two causes. First when we numerically back-propagate P wave (or S wave), it also generates artificial, lower amplitude S wave (or P wave), because the corresponding stress field on the surface is not recorded Curtis, 2013a and2013b) and thus the boundary condition is incomplete. The artifact caused by incomplete boundary condition is small in amplitude compared to the direct P wave, but comparable to the converted S wave (see the thin dotted line in Figure 1.3d and the white dash line A in Figure 1.4b).
Second, when we use a reference velocity model with discontinuity interfaces to back--propagate the recorded waves, the resulting waves generate undesirable reflected and converted waves at the interfaces, which do not exist in the forward wavefield (black dash line B in Figure 1.4b). This artifact, if not removed, results in discontinuity interfaces in the RTM images that are inherited from the reference velocity model (white dash line in Figure 1.4c). Finally, when the P coda contains multi--reflected P waves, such as PpmP (P wave reflected at the free surface and the Moho), these phases will be back--propagated together with the converted S wave, complicating the interpretation.

Isolating P and S modes (Steps 1--3)
To solve the problems identified in the above section, we propose several new computational approaches in the data and image domains. In the data domain, we separate the three--component seismograms into the main P arrival and its time--windowed coda that contains the expected converted S--wave (Figures 1.2b and 1.2d).
In the examples shown, the P--wave window is from the beginning of the seismograms to 2 s after the peak of the direct P arrival; the coda wave window is from the end of the P wave window to the end of the seismograms. We taper the P and coda windows at the both ends of the seismograms with a one side Hann window (10s width Hann window) before back propagate them separately.
We propose a modified Helmholtz decomposition operator to further isolate P and S waves in the back--propagated wavefields. In the image domain, several decomposition methods have been developed (Zhang and McMechan, 2010). The commonly used method to separate the P and S waves with Helmholtz decomposition is shown in equations (1.9) and (1.10) (Morse and Feshbach, 1954;Zhang and McMechan, 2010): , , , = × , , , , (1.10) where and are the P and S modes, respectively, is the receiver--side displacement wavefield generated by integrating the velocity wavefield through time. Notice that the P mode is a scalar wavefield and S mode is a vector wavefield, and the vector direction information of P and S wave are lost. 13 In our modified Helmholtz decomposition, the P--and S--mode separation is achieved with equations (1.11) and (1.12), , , , = × ( × , , , ), where is the displacement wavefield constructed from the back--propagated main P arrival, is the displacement wavefield constructed from the back-propagated windowed coda waves that contains the converted S wave. Here both the P and S modes are vector wavefields and the particle motions are consistent with the original P and S waves. More complex Helmholtz operators preserving the original physical units, phase, amplitude and vector characteristics (Zhang and McMechan, 2010;Brytik et al., 2011) are computationally more expensive and beyond the scope of this Chapter. Since we use modified Helmholtz decomposition to isolate the S mode in the back--propagated coda wavefield, we also suppress the multiple reflections of P wave in the coda (e.g., PpmP). propagation of the entire wave train (Figures 1.3b, d and f), the artifacts are removed in the P and S modes (Figure 1.5). We note that the results of our modified Helmholtz decomposition (equations 1.11 and 1.12) resemble particle acceleration with side lobes on both sides of the main peak. For imaging purpose, it does not affect the location of the main peak, and thus it won't alter the location of the interface in the image.

Imaging (Steps 4 and 5)
With traditional Helmholtz decomposition (equations (1.9) and (1.10)), the conventional imaging condition (Duan and Sava, 2015) is: (1.14) Here n is the index of seismic sources, nsrc is the total number of sources, and i is x, y or z component. P and S are the P mode and S mode obtained with traditional To overcome these issues, we use the modified Helmholtz decomposition operator (equations 1.11 and 1.12), which preserves the particle motion direction, and apply a new imaging condition: Here n is the index of seismic sources and nsrc is the total number of sources. P and S are the P mode and S mode obtained with modified Helmholtz decomposition (equations 1.11 and 1.12). ! is the image generated with an individual source n and !"!#$ is the image after stacking images from all sources. ! is the weight of nth image which can be determined by the quality of the image (such as signal to noise ratio and maximum amplitude of the image).
Following Wang et al. (2015), we use the sign of the dot product of the P and S modes to correct the polarity of the image, as it maintains the polarity on both sides of the normal incident point for a given velocity contrast (from low--to--high velocities or high--to--low velocities). However, for incoming P waves with small incident angles, such as teleseismic arrivals, the dot product of the P and S modes is close to zero. So unlike in Wang et al. (2015), we use the product of the absolute P and S modes to preserve the magnitude of the converted phase. This imaging condition creates one image instead of three, because product of absolute P and S modes is a scalar instead of a vector (Wang et al., 2015). After that we normalize individual images with different weights based on the quality of the images, and stack the normalized images of different sources to generate our final image with equation (1.16). In the synthetic tests, we use the reciprocal of the maximum amplitude of the image as the weight. For real applications, the weights depend on the noise level and illumination of the images.

Numerical examples
We use the same velocity model introduced in the Method section (Figure 1.1a) to demonstrate the results of the 3D PS--RTM method. In this example, we run two synthetic experiments using two types of sources. In the first experiment (Case 1), we use deep regional sources in the model ( Since a plane wave is an approximation for teleseismic arrivals, this experiment represents the case of using teleseismic sources with different epicentral distances and back--azimuths. The simplification of the sources is justifiable as a source-equalization and de--convolution step can be applied to real earthquake data to remove the source effects. In Case 1, the results obtained with the new imaging condition (equations 1.15 and 1.16) and the background model in Figure  Notice that in the horizontal slice (Figure 1.8b), the vertical interface at the eastern end (positive Y direction) is not well imaged, this is because the 36 sources are located near the center, so they don't provide full illumination for the vertical structure near the edge of the model, which is a limitation caused by source distribution.
In Case 2, we use 126 plane waves with different incident angles (12--27 degrees with a 2.5--degree interval) and back--azimuths (0--360 degree with a 20--degree interval) to represent teleseismic P arrivals to generate synthetic seismograms. This case more closely resembles receiver function studies in terms of the source--station geometry. Compared to Case 1, the whole volume of our model is well illuminated (Figure 1.8c) due to a larger and more complete source aperture.
We note that the interfaces in the horizontal slice ( The computational cost of 3D PS--RTM is generally proportional to the number of earthquakes used in imaging. In our study, calculation for each earthquake needs about 2 hours wall--clock time with 40 CPU cores on a high--performance cluster at the University of Rhode Island. To calculate the wavefields of 126 teleseismic sources, we need a total of about 24 hours with 400 CPU cores. Bases on this experience, we anticipate that application of 3D PS--RTM will be feasible at regional (a few hundred kilometers to a thousand kilometers) scales in the near future, given an adequate seismic array and moderate (100s to 1000s CPU cores) high-performance computational resources.

Conclusions
We have developed a 3D PS--RTM method to image velocity discontinuities and scatters. Compared with the CCP method, this method is better suited to image complex structures (e.g., Moho offsets, sutures, subducting slabs). The new method builds on the 2D PS--RTM method of Shang et al. (2012) with several significant differences and improvements. First, we extend the method to 3D using a finite-difference wave equation solver in the spherical coordinate (Zhang et al., 2012), so it can be applied to 3D geological structures at regional and global scales. Second, we separate the main P arrival and its coda containing the converted S wave in the data domain and back propagate them separately. This new procedure doubles the memory requirement, but suppresses the artificial converted waves generated by the discontinuities in the reference velocity model in the back--propagated wavefields, allowing us to use realistic velocity models with discontinuities. The new procedure also suppresses the multiple P reflections (e.g., Ppmp) and artifacts caused by the incomplete boundary condition, though converted P--to--S phases from shallow reverberations could be falsely mapped as a deep structure, as in P wave receiver functions. We apply a modified Helmholtz decomposition operator to isolate P and S wave energy in the separate, back--propagated wavefields, then apply a new imaging condition to the P--and S--mode wavefields to generate images. The new image condition maintains the polarity on both sides of the normal incident point and thus yields a consistent polarity of the converted phases for a given velocity contrast (from low--to--high or high--to--low velocities) for sources with different azimuths and incidence angles, making it more straightforward to stack images obtained from earthquake sources from different azimuths and incidence angles. With these new data processing and computational procedures, the 3D PS--RTM can use regional seismic arrays to image 3D structures with a realistic background velocity model. Though care should be taken for vertical and high-dipping angle interfaces, for which the sense of the velocity contrast may change with the back azimuths of incoming waves. 23

Bibliography:
Abe, Y., T. Ohkura, K. Hirahara, and T. Shibutani (2011)       method. Among these methods, the cubic spline method is a simple and low--cost method that can be easily implemented. In this chapter, we test the efficacy of this interpolation method using synthetic data and then use the interpolated seismograms as the input data for the PS--RTM method.
In the following sections, we first introduce the methodology and procedures of the cubic spline method. Then we use synthetic seismic arrays with different station spacing to investigate the station spacing requirement for the interpolation method, and add random noises of different amplitudes to the synthetic data to investigate the level of tolerance of the method to random noise.

Method
In this study, we use cubic spline interpolation to interpolate seismograms recorded by sparse seismic stations onto dense numerical grids on the free surface (Zhang and Zheng 2015;Press et al. 2007). First, we align the seismograms along the direct P wave. In this procedure, seismic phases with time--distance move--out curvatures similar to that of the direct P wave are also aligned or nearly aligned. For synthetic data, the direct P arrival can be easily picked by choosing the time of maximum amplitude in the time series of vertical--component displacement seismograms.
Alternatively, this can be achieved with waveform cross--correlation. We align the seismograms by shifting them by the relative arrival times of the P wave at different stations, which can be expressed as in equation (2.1) where !" ′ is the aligned seismogram, !" is the original displacement seismogram, is time, and ! is the time difference of the direct P arrival between station i and the reference station. The subscript i represents the index of the station while subscript j represents one of the three components (vertical, north and east). For observed data, we use the STA/LTA method (Allen 1978(Allen , 1982 to pick the initial direct P-wave arrival times and then the multi--channel cross--correlation method to align the seismograms (VanDecar and Crosson 1990;Rondenay and Fischer 2003).
Based on the time shifts at the stations, a time--shift map can be constructed with the cubic spline method as in equation (2.2) where !" is the weight station i contribute to numerical grid k, which is calculated with the Matlab function "griddata" using the cubic spline method. The implementation of the cubic spline method in Matlab is based on triangulation, so !" is non--zero only at the grids within the three vertexes of the triangle (Press et al. 2007); ! is the interpolated relative arrival time of the direct P arrival at numerical grid k. N is the total number of seismic stations used.
The cubic spline method is used again to interpolate the seismograms at each time step of the time--shifted seismograms.
Here !" is the same weight number as equation (2.2). !" ′′ is the j component seismogram at grid k after interpolation.
The interpolated seismograms are shifted to the absolute arrival time using the relative arrival time map (equation 2.2) with the following equation: !" The shifted seismograms !" ! serve as our input data for the 3D PS--RTM package to image the subsurface structure.

Numerical tests
To test the cubic spline interpolation method, we create a model contains a homogeneous crust and a slab in an otherwise uniform upper mantle (Figure 2.1a).
The crust and the background mantle S--wave velocities are 3.900 km/s and 4.480 km/s, respectively; the corresponding P--wave velocities are 6.800 km/s and 8.036 km/s, respectively. The slab structure has a +8% velocity perturbation compared images based on P wave and its coda also suffer from the interference of shallow, multiple converted phases, as in P receiver functions. By using SdP (S wave converted to P wave at discontinuities) to image the discontinuities with the same method we can compare the results and identify the multiples.

Effect of station spacing
In the above synthetic test, we use the numerical grids on the surface as seismic We first apply the PS--RTM method to the seismograms recorded by the seismic array with the average station spacing of 5 km without interpolation (Figure 2.2a).
The result shows strong artifacts near the surface (Figure 2.3), which suggests that even for a slightly aliased seismic array the artifacts severely contaminate the RTM image. The reason is that the grids without stations have zero values, resulting in a discrete and discontinuous surface boundary condition for back--propagation. So, the interpolation method is necessary to reconstruct a smooth receiver side wavefield to create a continuous image of discontinuity interfaces.
Then we apply the cubic spline method (

Effect of random noise
To show the effect of random noise, we present another experiment, in which white noise is added to the 3 component seismograms, resulting in a signal to noise ratio (SNR) of 3, 10 and 20 (Figure 2.6). The SNR is defined as the peak amplitude of the direct P arrival divided by the standard deviation of the amplitude of random noise.
In the SNR=20 case, the amplitude of noise is at about the same level as the amplitude of converted waves.
We use the same interpolation method as discussed above to interpolate the seismograms recorded by 5--km--spacing seismic stations onto the numerical grids on the free surface. The RTM result using seismograms with SNR=20 (Figure 2.7c) shows that even with a noise level comparable with the converted wave signal, we can still image the deep structure in the upper mantle. However, in the shallow crust there are some artifacts due to the random noise and the lack of overlapping back-propagated waves from multiple stations to suppress the noise. Since we are interested in the upper mantle structure in this example and the slab structures are well imaged, the artifact in the crust is not an issue. For the image result using seismograms with SNR=10, in which the amplitude of white noise is about 2 times larger than that of the converted wave, the slab structure is still well imaged with more random noise in the background. The shallow structures are contaminated more by the artifacts than in the case with SNR=20. For the image result using seismograms with SNR=3, the image is totally contaminated by random noise. This suggests that we should be able to successfully image the deep structure with converted waves that have similar or slightly lower amplitudes than those of random noise. We attribute this to two factors: First, the converted waves from multiple stations converge coherently at depth, while random noise does not; and second, stacking of the images for teleseismic sources with difference back azimuths and incidence angles (or ray parameters) suppresses random noise.

Summary
In this chapter, we tested the cubic spline method in interpolating the surface wavefield to make use of sparsely and unevenly deployed seismic arrays with the 3D PS--RTM method. We first showed the RTM result without wavefield interpolation for a seismic array with about 5 km station spacing. The image is contaminated by strong artifact due to a discontinuous wavefield used in back-propagation simulation. By applying the cubic spline method to seismograms recorded by stations with different spacing, we showed that the interpolation method recovers the receiver--side wavefield as long as the station spacing is smaller than the half apparent wavelength of the converted waves we are interested in. At 49 larger station spacing, the cubic spline method fails to recover the converted wave generated at steep interfaces due to the differences in the move--out curves of the direct P arrival and the converted S wave. Though the most straightforward solution to this issue is to deploy denser seismic arrays, with limited resources, it is worth developing more sophisticated interpolation methods that account for the move--out curvatures of difference phases. The 3D PS--RTM method using data interpolated with the cubic spline method has a good tolerance of random noise. As long as the noise level is comparable with or even slightly larger than the converted wave we are interested in, we can image the discontinuities with seismic sources from different back--azimuths and ray parameters.

Introduction
In Chapters 1 and 2, we demonstrate the ability of the three--dimensional passive-source reverse--time migration (3D PS--RTM) method in imaging complex structures such as Moho sutures and subducting slabs with synthetic data. In this chapter, we present the application of the method to the Yellowstone hotspot and discuss data regularization in practice for an unevenly and sparsely deployed seismic array.
We select Yellowstone as our first application of the 3D PS--RTM for several reasons: First, the Yellowstone hotspot is one of the most studied intraplate hotspots, yet there is on--going debate regarding whether the source of the Yellowstone hotspot comes from a deep mantle plume (e.g., Humphreys et al., 2000;Christiansen et al., 2002;Yuan and Dueker, 2005;Waite et al, 2006;Smith et al., 2009;James et al., 2011;Fouch, 2012;Kincaid et al, 2013;Foulger et al., 2015;Leonard and Liu, 2016;Zhou et al., 2018;Zhou, 2018;Nelson and Grand, 2018); second, there have been previous studies of the mantle transition zone (MTZ) in the region using traditional receiver function methods (Schmandt et al., 2012;, so we can compare our RTM results with the previous results; finally, the USArray provides arguably the best coverage of any known hotspot for imaging the MTZ over a broad region. The depth of origin of hotspot volcanism is fundamental to the mantle plume hypothesis (Morgan, 1971) and has profound implications for plate tectonics (e.g., DeMets et al., 2010), mantle convection, and thermal and compositional evolution of the Earth's mantle (e.g., Olson et al, 1990). Shen et al. (1998) suggested using the geometry of the MTZ boundaries, the 410--and 660--km discontinuities in seismic velocities and density, to determine the depth of origin of mantle plumes. The 410-and 660--km discontinuities are generally thought to correspond to the two phase transformations from olivine (α--phase) to spinel (β--phase) crystal structure and from ringwoodite (γ--phase) to perovskite plus magnesiowüstite, with a positive (pressure increases with temperature) and a negative (pressure decreases with temperature) Clapeyron slope, respectively (e.g., Bina and Helffrich, 1994), though the mineral phase assemblage and associated Clapeyron slope change at very high temperature (e.g., Hirose, 2002;Ishii et al., 2018). The opposite Clapeyron slopes of the phase transformations predict that higher temperatures associated with a mantle thermal plume cause a deeper 410--km discontinuity and a shallower 660-km discontinuity within the plume conduit, and therefore a thinner MTZ thickness.
Indeed, receiver functions containing P--to--S converted waves at mantle discontinuities reveal a thinner MTZ thickness beneath Iceland, which was suggested as evidence for the lower mantle origin of the Iceland hotspot (Shen et al., 1998). A thinner MTZ is also observe beneath Hawaii (Li et al., 2000;Shen et al., 2003;Agius et al., 2018), the Galapagos hotspot (Hooft et al., 2003), and some but not all other hotspots (e.g., Owens et al., 2000;Li et al., 2003;Tauzin et al., 2008;Vinnik et al., 2012;Reed et al., 2016;Wei and Chen, 2016). In addition, several studies using underside SS reflections off the MTZ boundaries have also observed a thinned MTZ beneath some hotspots (e.g., Niu et al., 2002;Deuss, 2007;Cao et al., 2011;Yu et al., 2017). Because of the weak signals of the converted waves from the MTZ discontinuities, most of the receiver function studies used a ray--theory--based common--conversion point (CCP) stacking method (Dueker and Sheehan, 1997;Shen et al., 1998) to suppress noise and enhance signals.
Receiver functions have also been used to image the MTZ boundaries in the Yellowstone region with contradicting results. Using pre--USArray seismic data in the region, Fee and Dueker (2004)  Tauzin (2017) suggested a much broader (500--600 km in diameter) thinner transition zone beneath Yellowstone. , however, found no mantle transition zone anomaly beneath the Yellowstone hotspot. The latter three studies shared much of the same data in the Yellowstone region, so the different results are likely attributable to differences in data selection, processing, and the method of CCP stacking. The width of receiver function binning in the CCP method, for example, is a subjective choice that may affect the observation of the transition zone anomaly.
Furthermore, the complex finite--frequency wave phenomena such as multi--pathing and scattering, which may be significant in the Yellowstone region because of its highly heterogeneous three--dimensional velocity variations (e.g., Porritt et al., 2014;Schmandt and Lin, 2014), are not accounted for in previous receiver function studies.
By fully accounting for finite--frequency wave propagation in a 3D heterogenous structure, our PS--RTM method holds the promise of improved accuracy and resolution of the MTZ discontinuities. The results will help address the question of the origin of the Yellowstone hotspot. Furthermore, details of the MTZ discontinuities provide important constraints on mantle dynamics if the Yellowstone hotspot is fed by a mantle plume. For example, the lateral offset between the MTZ anomaly and the surface hotspot reflects the relative motion between the surface plate and the mantle driven by slab subduction and global mantle density distributions (Steinberger, 2000;Shen et al., 2002;Smith et al., 2009), and the magnitude and lateral dimension of the MTZ discontinuity topography and thickness anomaly reflect temperature and heat flux in the transition zone, which may be related to the plume temperature and heat flux at the core--mantle boundary (Zhong, 2006;Lin and van Keken, 2006;Leng and Zhong, 2008).
In this chapter, we first discuss the data processing procedures for selecting quality

Earthquake Data and Wavefield Regularization
To apply the 3D PS--RTM method , it is necessary to select quality earthquake arrivals, remove the source and instrument effects on the waveforms, suppress uncorrelated noise, and interpolate receiver functions at unevenly and often sparsely distributed seismic stations to evenly distributed numerical grids.

Data Selection
We use broadband earthquake signals recorded by the Transportable Array (TA) of the USArray and archived at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center. The stations used are within 38°N to 50°N, and --117°E to --105°E, a 12° by 12° area roughly centered on Yellowstone (Figure 3.1).
We requested earthquake records between May 2 nd , 2005 and Sep 5 th , 2010 with the moment magnitude (Mw) larger than 5.0 and the epicentral distance between 28° to 98°, based on the source information in the global CMT catalog Ekström et al., 2012) (Figure 3.2). The time window of the downloaded seismograms is between 100 s before and 150 s after the direct P arrival, while the theoretical direct P arrival time is calculated using the TauP Toolkit (Crotwell et al., 1999) with the one--dimensional reference earth model iasp91 (Kennett, 1991). We use the obspy toolbox (Beyreuther et al. 2010) to preprocess seismograms with the following steps: 1, Delete the blank traces; 2, Remove the instrument response and de--trend the seismic signal; 3, Discard three--component seismic traces with SNR lower than 3.0. In this study, the SNR is calculated using the vertical--component seismogram and is defined as the ratio between the maximum amplitude of the direct P arrival and the standard deviation of the trace in the time window 10 to 100 seconds before the direct P arrival; 4, Calculate the maximum amplitude of each seismogram and compare it to the median of the maximum amplitudes of all stations for each component and for each earthquake, and then remove traces with abnormally large (larger than 2 times of the median) or small maximum amplitude (smaller than 1/10 th of the median); 5, Pick the direct P arrival with the classic STA/LTA method (Withers et al., 1998) and then use the multi--channel cross correlation method (VanDecar and Crosson, 1990) to further align the direct P arrival on the vertical--component seismograms; 6, Delete traces with the P--arrival cross correlation coefficients lower than 0.5 and skip earthquakes recorded by less than 10 stations remained after the screening in the previous steps; 7, Filter the traces using a band--pass filter with a frequency band between 0.02 Hz to 0.1 Hz; 8, Trim the seismic traces using a time window of 50 s before and 100 s after the direct P arrival and taper the traces at both ends with a 5--s time window.

Deconvolution
In traditional receiver function studies, the receiver functions are calculated by deconvolving the vertical--component seismogram from the radial--and transverse-component seismograms for each station (e.g., Langston, 1979)  = ⋆ !!! , (3.1) (3.7) In this study, the critical values for (Equation 3.5) and (Equation 3.6) are 0.05 and 0.0005, respectively, and the maximum iteration number is 500. Figure 3.3 shows an example of seismograms before and after deconvolution for an earthquake on July 16 th , 2007 near the west coast of Honshu at depth of 15.1 km and with a moment magnitude of 5.74.

Principle Component Analysis
To suppress uncorrelated noise, we use a principle component analysis ( We sort the singular values for all three components and calculate their sum as the total energy of the data. Then we keep the first few principle components corresponding to the singular values whose sum is larger than 95% of the total energy (Figure 3.4b).
In the earthquake example shown in Figure 3 Overall, it takes only the first few PCA components to reach the 95% of the energy on the vertical, radial and transverse components.

Interpolation
Following the procedures in Chapter 2, we then use the cubic spline interpolation method (Zhang and Zheng, 2015) to interpolate the three--component receiver functions from sparse stations to evenly distributed numerical grids on the surface.
We first interpolate the direct P arrival times at stations to construct an arrival time map as a function of latitude and longitude using cubic spline (

Reference Velocity Model, Back--Propagation and Stacking
We use the DNA13 velocity perturbation model (Porritt et al., 2014) added to the one--dimensional velocity model PREM (Dziewonski and Anderson, 1981)  For each selected earthquake, we back--propagate the P arrival and its coda from the free surface separately, use modified Helmholtz decomposition to further separate P and S waves in the image domain and apply a zero offset cross--correlation type image condition to calculate the correlation between P and S waves to image the mantle transition zone boundaries as described in .
Due to the distribution of seismic stations recording one specific earthquake and direction of wave propagation, this earthquake can illuminate specific part of the model. To combine the images created with different earthquakes, we develop a weighted stacking scheme, in which the weight is based on the first Fresnel zone of the P arrival with the dominant frequency for each earthquake and seismic station pair.
For each earthquake e and seismic station s, we calculate the travel time ! , , from the earthquake e to the grid , , in the model, the travel time ! , , from the grid , , to the seismometer s and the travel time !" from the earthquake e to the seismometer s. If the total travel time from earthquake to grid to seismometer is larger than the travel time from earthquake to seismometer !" plus half of the dominant period T (for grids outside of the first Fresnel zone), we set the weight to zero, otherwise we set the weight to one (Equation 3.8). The travel time ! , , , ! , , and !" are calculated with the Taup toolbox (Crotwell et al., 1999). The weights of different stations for each event are combined through Equation 3.9 and then used to obtain the stacked image with Equation 3.10.

RTM Results
We obtain the RTM model for the the image appears noisy. The 660--km discontinuity can also be identified and traced continuously along most of the profiles, though it appears as a slightly weaker feature compared to the 410--km discontinuity, In addition to the 410--and 660--km discontinuities, there is a feature at 500--520 km depth, which may correspond to the 520--km discontinuity and the β to γ phase change (e.g., Shearer, 1996), and a feature at 200--260 km depth, which may be associated with reverberation from shallower mid--lithosphere discontinuity (e.g., Abt et al., 2010;Lekic and Fischer, 2014;Liu and Gao, 2018). Because of the 70 km station spacing of the USArray, the crust and lithosphere structures are not illuminated continuously and by multiple stations, which are essential for suppressing random noise. Thus we refrain from speculating whether the feature at 200--260 km depth is a reverberation from a shallow depth or not. Herein we focus our discussion on the more prominent features in the mantle transition zone, the 410--and 660--km discontinuities.
To determine the MTZ boundaries, we first choose one depth profile beneath Yellowstone (at latitude=44.667, longitude=--111.667) as the reference profile and interpolate the profile using an even--spacing depth interval of 0.5 km. We pick the depth of the 410--km discontinuity at the maximum amplitude in the profile between 390 km to 430 km and pick the depth of the 660--km discontinuity at the maximum amplitude between 640 km to 680 km. Then cross correlation is used to pick the depths of the two discontinuities at other locations within the deep range of 390--430 km and 640--680 km, respectively. We use a bootstrap method (Efron and Gong, 1983) to estimate the discontinuity depth and uncertainty. In general, the MTZ thickness is a more accurate measurement than the depth of the 410--and 660--km discontinuities (Bina and Helffrich, 2014), because of uncertainties in shallow (<400 km depth) velocity variations affect the traveltimes of the converted waves from the 410--and 660--km discontinuities and thus their apparent depths, but do not affect the transition zone thickness due to nearly identical paths of the converted waves above the 410--km discontinuity. The MTZ thickness around the Yellowstone region is 230±7 km (Figure 3.8), about 20 km thinner than the averages of the contiguous United States (250 km) and the western United States (249 km) . For reference, the magnitude of the MTZ thinning is comparable to that of Iceland (~20 km, Shen et al., 1998)  Beneath the southwest corner of our model, previous studies (e.g., Schmandt et al. 2012) have shown the presence of subducted slabs, which have lower temperatures than the surrounding mantle. We found that both the 410 and 660 discontinuities in this region are depressed (Figure 3.9), with the 410 discontinuity depressed by about 20km and the 660 discontinuity depressed by about 30km. A possible explanation of the depressed 410 discontinuity is that highly depleted mantle with high magnesium composition, for which the phase transition from olivine to wadsleyite requires a relative high pressure (Frost 2003), is entranced by the cold down going slab. Meanwhile, the depressed 660 discontinuity is mainly caused by low temperature and possibly high water content.

Effects of Earthquake Distribution
The earthquakes used in this study are unevenly distributed around the world (Figure 3.2) and can be roughly sorted into four groups based on their back-- To test the effects of earthquake distribution on the RTM result, we stack the RTM volumes for earthquakes from the four different groups (Figure 3.10). Groups 1 and 3 yield a more clearer image of the transition zone discontinuity structure than groups 2 and 4 due to more earthquakes in the former to depress noise and artifacts. Because of the back--azimuth differences among the groups, the illuminated region for each group is different, determined by the back azimuth. For example, the southeast parts of the model are well illuminated by group 1, while the northwest parts of the model are better illuminated by group 3. We note that the 660--km discontinuity is less well illuminated than the 410--km discontinuity, possibly due to a larger lateral distance between the station and the P--to--S conversion point at the 660--km discontinuity, a relatively lower sampling density per unit area on the 660-km discontinuity than on the 410--km discontinuity, and geometric spreading of back--propagated waves that results in lower amplitudes of P and converted phases at the 660--km discontinuity than at shallower depths. By expanding the station coverage to a larger area in future studies, these differences can be reduced in the interior of the model sufficiently far away from the side boundaries of the model.

Effects of the Reference Velocity Model
The reference velocity model DNA13 we used in this study incorporates teleseismic P wave, independent SH and SV waves, and surface waves from both teleseismic earthquakes and ambient noise to constrain the relative wave--speed from the crust down into the lower mantle (Porritt et al., 2014). To understanding the sensitivity of the RTM results to the reference velocity model, we carried out RTM imaging for two additional models: PREM (Dziewonski and Anderson, 1981) and PREM perturbed with twice the velocity variation of DNA13.
The 410--km discontinuity has a depression in a 200 km by 300 km area around Yellowstone for all three reference velocity models (Figure 3.8a, 3.11a and d).
However due to the differences in the velocity, the PREM model yields the largest average 410--km discontinuity, while the model with twice the DNA13 anomalies yields the shallowest average 410--km discontinuity depth. This is especially obvious near the northern edge of the RTM models. This is caused by the low velocity anomaly in the northwest of DNA13 model. For the 660--km discontinuity, a 300km by 300km uplifted area is observed for all the image results. Similarly with 410km discontinuity, the more perturbation of DNA13 we add the shallower of the mean depth of 660km discontinuity (Figure 8b, 11b and 11e). The 660--km discontinuity depth is more sensitive of reference model than 410--km discontinuity.
On the other hand, the MTZ thickness is less sensitive to the reference model ( Figure   8c, 11c and 11f). The MTZ thickness around the Yellowstone region varies within 10km using different reference models.
These tests show that the depths of the discontinuities are sensitive to the reference model used during the migration. Extra attention should be paid to the reference model to analysis the migration result and multiple velocity models should be used to test the image result if possible.

Effects of the Dominant Frequency
The dominant frequency band may affect the RTM result in term of image resolution and reliability. In this section, we compare the results using waves in two different frequency bands: 4--50 s period and 10--50 s period (hereinafter shorter--and longer-period data, respectively). We use the same data selection and wave field regularization method described in sections 3.2 and 3.3 except the frequency band used for filtering the data. We also cut the model to a maximum depth of 765 km to reduce the computational cost.
The RTM results along the latitudinal (BB') and longitudinal (EE') directions created with PCA components containing 95% of energy are shown in Figure 3.12. Because the time windows of P and P coda waves in the data domain are chosen based on the dominant frequency (1 period after the direct P wave, ). The P--wave coda separated from P using the shorter (4 s) time length contains more signals generated in the lithosphere. Compared to the images obtained with the longer-period waves (Figure 3.6), the shorter--period data and smaller grid spacing yield sharper and more detailed images. However, they also contain significant noise and artificial boundaries in the image that have shapes similar to the Fresnel zones beneath different stations. This is because the USArray with a station spacing of 70 km has significant spatial aliasing in sampling for short--period waves (Figure 3.5).
Furthermore, the Fresnel zones beneath the stations do not overlap sufficiently, particularly at shallow depths, so noise is not sufficiently suppressed and is reflected in the image result. In contrast, the RTM result using the longer--period data show smoother and clear result for the MTZ boundaries due to longer apparent wavelengths on the surface and less aliased wavefield interpretation.
Another practical matter of using the longer--period data is a much lower computational cost. To avoid numerical dispersion and instability in finite difference simulation, the grid spacing ∆ cannot exceed 1/7 th of the minimum wavelength and the time step ∆t cannot exceed ∆x/c, where c is the apparent velocity at the 82 numerical grid (Zhang et al. 2012). For this reason, to use the 4--s period data, the total grid number is about 8 times more and the time step is about 2 time more than using the 10--s period data. With the same computational resources, the shorter-period results take about 16 times more cpu hours to obtain.  (Figure 3.13). Signals generated at relatively flat discontinuities are enhanced, while those from dipping structures or scatters are smoothed out. The RTM results obtained with fewer principle components have relative smoother topography for both the shorter--and longer--period data.

3.6.Conclusions
In this chapter, we apply the 3D PS--RTM method with USArray data to image the MTZ boundaries in the Yellowstone region. The results demonstrate the ability of the PS--RTM method in imaging the mantle discontinuities. With growing deployment of dense seismic networks, there is an increasing opportunity to use the method to image complex and detailed crustal and mantle structures such as Moho topography, sutures, and subducting slabs. We caution that although the 3D PS--RTM method is a powerful technique compared to traditional receiver function imaging methods, care must be taken in data regularization procedures such as data selection, deconvolution and PCA analysis. As in the active--source RTM method used in exploration, the 3D PS--RTM requires a relatively dense seismic network to recover the wavefield on the surface. The distribution of seismic stations is always limited by costs and logistic difficulties and the recorded signals are always contaminated by natural or instrument noise. To minimize uncorrelated noise, we develop a data regularization procedure using the PCA method, and interpolate the sparsely recorded signals onto evenly distributed numerical grids on the earth surface using cubic spline interpolation. However, when choosing the components of PCA, there is a tradeoff between reserving the signal and removing the noise. Using fewer PCA components would remove more uncorrelated noise, however it would also remove the uncorrelated signal that may be caused by scattering.
The earthquakes used to image the earth structure are located mostly at subduction zones and because of the uneven distribution of the sources, the RTM result in this study does not have a good illumination in the northeast part of the model.
Furthermore, since the lateral distance between the station and conversion point is larger for the deeper discontinuity, the 410 discontinuity is better illuminated than the 660 discontinuity. The 3D PS--RTM method is also sensitive to the reference velocity model used for migration. In this study, the RTM results using 1D and 3D reference models both show a depressed 410--km discontinuity and an uplifted 660-km discontinuity, though the average depths of the discontinuities changed when using different models.
In theory, higher frequency waves would yield more detailed images, though the maximum frequency suitable in the PS--RTM depends on the station spacing and the target depth of discontinuities. As the frequency increases, the apparent horizontal wavelength on the surface decreases. When the station spacing is greater than half of the apparent horizontal wavelength on the surface, spatial aliasing occurs, 85 causing artifacts in the interpolated wavefield. To image the MTZ with the USArray data, the (10--s) longer--period waves are a more appropriate choice than the shorter--(4 s) period waves.

Acknowledgement
Data collected by the TA network were made freely available as part of the