Augmented Implicitly Restarted Lanczos Bidiagonalization Methods

Abstract. New restarted Lanczos bidiagonalization methods for the computation of a few of the largest or smallest singular values of a large matrix are presented. Restarting is carried out by augmentation of Krylov subspaces that arise naturally in the standard Lanczos bidiagonalization method. The augmenting vectors are associated with certain Ritz or harmonic Ritz vectors. Computed examples show the new methods to be competitive with available schemes.


1. Introduction.
Many problems in scientific computation require knowledge of a few of the largest or smallest singular values of a matrix and associated left and right singular vectors. These problems include the approximation of a large matrix by a matrix of low rank, the computation of the null space of a matrix, total least-squares problems (see, e.g., Björck [4, sections 4.6 and 7.6.5]), as well as the tracking of signals; see, e.g., Comon and Golub [8] for a discussion of the latter.
Let A ∈ R^{ℓ×n} be a large sparse matrix. We may assume that ℓ ≥ n, because otherwise we replace the matrix by its transpose. Let {σ_j, u_j, v_j}_{j=1}^{n} denote the singular triplets of A, enumerated so that

σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n ≥ 0, (1.1)

where

A v_j = σ_j u_j, A^T u_j = σ_j v_j, 1 ≤ j ≤ n, (1.2)

with left singular vectors u_j ∈ R^ℓ and right singular vectors v_j ∈ R^n. This paper presents new restarted Lanczos bidiagonalization methods for computing a few of the largest or smallest singular triplets. The methods compute sequences of projections of A onto judiciously chosen low-dimensional subspaces. Restarting is implemented by augmentation of Krylov subspaces that are determined similarly as in the standard Lanczos bidiagonalization method.
Application of m steps of partial Lanczos bidiagonalization to the matrix A with initial unit vector p_1 ∈ R^n yields the decompositions

A P_m = Q_m B_m, (1.3)
A^T Q_m = P_m B_m^T + r_m e_m^T, (1.4)

where P_m ∈ R^{n×m}, Q_m ∈ R^{ℓ×m}, and

P_m^T P_m = I_m, P_m e_1 = p_1, Q_m^T Q_m = I_m, r_m ∈ R^n, P_m^T r_m = 0. (1.5)

Further, the matrix

B_m := bidiag[α_1, α_2, ..., α_m; β_1, β_2, ..., β_{m−1}] ∈ R^{m×m}, (1.6)

with diagonal entries α_j and superdiagonal entries β_j, is upper bidiagonal, I_m ∈ R^{m×m} denotes the identity matrix, and e_j denotes the jth axis vector. We refer to the decompositions (1.3)-(1.4) as a partial Lanczos bidiagonalization of A and to the vector r_m in (1.4) as the residual vector; see Björck [4, section 7.6] for a recent discussion on partial Lanczos bidiagonalization. The number of bidiagonalization steps, m, is assumed to be small enough so that the decompositions (1.3)-(1.4) with the stated properties exist.
In applications of interest to us, m is not very large, and the desired singular triplets of A may not be approximated sufficiently well by a single set of decompositions (1.3)-(1.4). Several methods therefore compute a sequence of decompositions of the form (1.3)-(1.4) with m small for a sequence of initial vectors p_1; see, e.g., [5,13,14,15,16,17,28]. These methods are commonly referred to as restarted partial Lanczos bidiagonalization methods. We will comment on some of these methods below. They differ in their choice of initial vector p_1 used for the restarts and in their implementation of the restarting procedure. Ideally, we would like p_1 to be a linear combination of the right singular vectors of A associated with the desired singular values.
Sorensen [29] proposed efficient approaches for restarting the Arnoldi and the Lanczos tridiagonalization procedures. These approaches can be thought of as curtailed QR-algorithms, and, similarly to the QR-algorithms, their performance depends critically on the selection of shifts; see also [1,7,21,30] for discussions and extensions. Björck, Grimme, and Van Dooren [5] derived analogous recursion formulas for a restarted Lanczos bidiagonalization method and presented an application to the solution of ill-posed problems. Recently, Kokiopoulou, Bekas, and Gallopoulos [17] applied these recursion formulas to compute a few desired singular triplets of a large sparse matrix. The shifts are applied by "chasing the bulge" in a curtailed QR-algorithm. Other implementations of restarted Lanczos bidiagonalization are discussed in [13,14,15,16,28]. The present paper describes mathematically equivalent, but numerically more robust, implementations of the methods discussed by Kokiopoulou, Bekas, and Gallopoulos [17].
This paper is organized as follows. Section 2 reviews Lanczos bidiagonalization and introduces notation used in the remainder of the paper. Section 3 describes our implementations of restarted Lanczos bidiagonalization. Restarting is carried out by augmenting the Krylov subspaces that arise naturally in the standard Lanczos bidiagonalization method by Ritz vectors or harmonic Ritz vectors associated with desired singular triplets. A few computed examples, which compare the performance of several methods and implementations, are presented in section 4.

2. Lanczos bidiagonalization.
The following algorithm determines the partial Lanczos bidiagonalization (1.3)-(1.4) of the large sparse matrix A ∈ R^{ℓ×n}. The number of Lanczos bidiagonalization steps, m, is typically much smaller than either one of the matrix dimensions ℓ and n. Throughout this paper ||·|| denotes the Euclidean vector norm or the associated induced matrix norm. We remark that the methods described can easily be modified to apply to matrices A with complex-valued entries; the main modification required is that the matrices P_m, Q_m, and B_m in (1.3)-(1.4) are computed in complex arithmetic, with transposition replaced by conjugate transposition.

Algorithm 2.1 (partial Lanczos bidiagonalization).
Input: matrix A ∈ R^{ℓ×n}, initial unit vector p_1 ∈ R^n, number of steps m.
Output: the matrices P_m, Q_m, B_m, and the residual vector r_m in (1.3)-(1.4).
1. P_1 := p_1; q_1 := Ap_1;
2. α_1 := ||q_1||; q_1 := q_1/α_1; Q_1 := q_1;
3. for j = 1 : m
4.   r_j := A^T q_j − α_j p_j;
5.   Reorthogonalization: r_j := r_j − P_j(P_j^T r_j);
6.   if j < m, then
7.     β_j := ||r_j||; p_{j+1} := r_j/β_j; P_{j+1} := [P_j, p_{j+1}];
8.     q_{j+1} := Ap_{j+1} − β_j q_j;
9.     Reorthogonalization: q_{j+1} := q_{j+1} − Q_j(Q_j^T q_{j+1});
10.    α_{j+1} := ||q_{j+1}||; q_{j+1} := q_{j+1}/α_{j+1}; Q_{j+1} := [Q_j, q_{j+1}];
11.  endif
12. endfor

When the computations with Algorithm 2.1 are carried out in finite precision arithmetic and the columns of P_m and Q_m are not reorthogonalized, the computed columns might be far from orthogonal. We therefore reorthogonalize the columns of these matrices in lines 5 and 9 of the algorithm.
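For readers who wish to experiment, the recursions of Algorithm 2.1 can be sketched in a few lines of Python/NumPy. The function name below is hypothetical, and this is an illustrative sketch with full reorthogonalization of both P and Q, not the authors' MATLAB code:

```python
import numpy as np

def lanczos_bidiag(A, p1, m):
    """Partial Lanczos bidiagonalization with full reorthogonalization.
    Returns P (n x m), Q (l x m), upper bidiagonal B (m x m), and the
    residual r, so that A P = Q B and A^T Q = P B^T + r e_m^T."""
    l, n = A.shape
    P = np.zeros((n, m)); Q = np.zeros((l, m)); B = np.zeros((m, m))
    P[:, 0] = p1 / np.linalg.norm(p1)             # line 1: P_1 := p_1
    q = A @ P[:, 0]
    B[0, 0] = np.linalg.norm(q)                    # line 2: alpha_1
    Q[:, 0] = q / B[0, 0]
    for j in range(m):                             # line 3: for j = 1 : m
        r = A.T @ Q[:, j] - B[j, j] * P[:, j]      # line 4
        r -= P[:, :j + 1] @ (P[:, :j + 1].T @ r)   # line 5: reorthogonalize
        if j == m - 1:                             # r is the residual r_m
            return P, Q, B, r
        B[j, j + 1] = np.linalg.norm(r)            # line 7: beta_j
        P[:, j + 1] = r / B[j, j + 1]
        q = A @ P[:, j + 1] - B[j, j + 1] * Q[:, j]    # line 8
        q -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ q)       # line 9: reorthogonalize
        B[j + 1, j + 1] = np.linalg.norm(q)        # line 10: alpha_{j+1}
        Q[:, j + 1] = q / B[j + 1, j + 1]
```

The sketch maintains the defining relations (1.3)-(1.5) to machine precision for small m, which is easy to check by direct substitution.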
Several reorthogonalization strategies for the columns of the matrices P_m and Q_m are discussed in the literature. Larsen [18] found that when m is fairly large and one is interested in computing a few of the largest singular triplets of A, partial reorthogonalization gives comparable accuracy and requires less computational work than full reorthogonalization, but when a few of the smallest singular triplets are desired, often (essentially) full reorthogonalization is required to achieve high accuracy. Wu and Simon [33] report that when m is not large, full reorthogonalization should be carried out, because due to the overhead associated with partial reorthogonalization, the latter is not competitive. Moreover, Simon and Zha [28] show that when the matrix A is not very ill-conditioned, only the columns of one of the matrices P_m or Q_m need to be reorthogonalized. Reorthogonalization of the columns of P_m only can reduce the computational effort required to compute the partial Lanczos bidiagonalization (1.3)-(1.4) considerably when ℓ ≫ n. Algorithm 2.1 can easily be modified to implement the latter approach.
The Lanczos bidiagonalization algorithm (Algorithm 2.1) requires the diagonal entries α_j, 1 ≤ j ≤ m, as well as the superdiagonal entries β_j, 1 ≤ j < m, of B_m to be nonvanishing. Assume for the moment that α_j > 0 for 1 ≤ j < m, and α_m = 0. Then the vector q_m determined in line 8 of the algorithm vanishes, and the decompositions (1.3)-(1.4) can be expressed as

A P_m = Q_{m−1} B_{m−1,m}, A^T Q_{m−1} = P_m B_{m−1,m}^T,

where B_{m−1,m} denotes the leading (m − 1) × m submatrix of B_m. It follows that A is singular and that the singular values of B_{m−1,m} are singular values of A. The associated singular triplets of A can be determined from P_m, Q_{m−1}, and the singular triplets of B_{m−1,m}. The singular values of B_{m−1,m} are nonvanishing. Moreover, the right singular vector of A associated with the zero singular value can be expressed as a linear combination of the columns of P_m, e.g., by using the singular value decomposition of B_m. However, the corresponding left singular vector of A is not readily available. For simplicity, we will in the remainder of this paper assume that all α_j are positive.

It follows from Algorithm 2.1 that β_m, the last superdiagonal element of the upper bidiagonal matrix B_{m+1} ∈ R^{(m+1)×(m+1)} obtained by m + 1 steps of partial Lanczos bidiagonalization, is given by

β_m = ||r_m||, (2.1)

and therefore can be computed when the decompositions (1.3)-(1.4) are available. We may assume that β_m > 0, because otherwise the singular values of B_m are singular values of A, and the associated singular triplets of A can be determined from the singular value decomposition of B_m and the matrices P_m and Q_m. If β_m = 0 and the determined singular triplets of A include all the desired ones, then we are done; otherwise we proceed to compute additional singular triplets by the methods described in this paper.
When β_m > 0, the last column of P_{m+1} := [P_m, p_{m+1}] is given by

p_{m+1} := r_m/β_m, (2.2)

and the decomposition (1.4) can be written as

A^T Q_m = P_{m+1} B_{m,m+1}^T, (2.3)

where the matrix B_{m,m+1} ∈ R^{m×(m+1)} is obtained by appending the column β_m e_m to B_m. In particular, the matrices in the decomposition (2.3) are available after m Lanczos bidiagonalization steps. We also note that B_{m,m+1} is the leading m × (m + 1) submatrix of the matrix B_{m+1} obtained after m + 1 steps of Lanczos bidiagonalization.
We will use the connection between partial Lanczos bidiagonalization (1.3)-(1.4) and partial Lanczos tridiagonalization of the matrix A^T A. Multiplying (1.3) by A^T from the left-hand side yields

A^T A P_m = A^T Q_m B_m = P_m B_m^T B_m + r_m e_m^T B_m = P_m B_m^T B_m + α_m r_m e_m^T, (2.4)

where the last equality follows from the fact that B_m is upper bidiagonal. The matrix

T_m := B_m^T B_m (2.5)

is symmetric and tridiagonal, and the expression (2.4) is a partial Lanczos tridiagonalization of A^T A with initial vector p_1 = P_m e_1; see, e.g., [11, section 9.1.2] for a discussion on Lanczos tridiagonalization. Since T_m is tridiagonal, (2.4) shows that the columns of P_m satisfy a three-term recurrence relation; the columns form an orthonormal basis of the Krylov subspace

K_m(A^T A, p_1) := span{p_1, A^T A p_1, ..., (A^T A)^{m−1} p_1}. (2.6)

Similarly, multiplying (1.4) by A from the left-hand side yields

A A^T Q_m = Q_m B_m B_m^T + A r_m e_m^T. (2.7)

The columns of Q_m form an orthonormal basis of the Krylov subspace

K_m(A A^T, q_1) := span{q_1, A A^T q_1, ..., (A A^T)^{m−1} q_1} (2.8)

with q_1 := Ap_1. We remark that since the columns of Q_m are not, in general, orthogonal to A r_m, the decomposition (2.7) typically is not a Lanczos tridiagonalization of A A^T.

Let {σ_j^{(B)}, u_j^{(B)}, v_j^{(B)}}, 1 ≤ j ≤ m, denote the singular triplets of B_m, enumerated so that

σ_1^{(B)} ≥ σ_2^{(B)} ≥ ⋯ ≥ σ_m^{(B)} > 0, (2.9)

i.e.,

B_m v_j^{(B)} = σ_j^{(B)} u_j^{(B)}, B_m^T u_j^{(B)} = σ_j^{(B)} v_j^{(B)}, 1 ≤ j ≤ m. (2.10)

We determine approximate singular triplets {σ̃_j^{(A)}, ũ_j^{(A)}, ṽ_j^{(A)}} of A from the singular triplets of B_m by

σ̃_j^{(A)} := σ_j^{(B)}, ũ_j^{(A)} := Q_m u_j^{(B)}, ṽ_j^{(A)} := P_m v_j^{(B)}, 1 ≤ j ≤ m. (2.11)

Then, analogously to (1.2),

A ṽ_j^{(A)} = σ̃_j^{(A)} ũ_j^{(A)}, A^T ũ_j^{(A)} = σ̃_j^{(A)} ṽ_j^{(A)} + r_m e_m^T u_j^{(B)}, 1 ≤ j ≤ m. (2.12)

The equations (2.12) suggest that an approximate singular triplet {σ̃_j^{(A)}, ũ_j^{(A)}, ṽ_j^{(A)}} be accepted as a singular triplet of A when the norm of the residual error r_m e_m^T u_j^{(B)} is sufficiently small. Specifically, our numerical method accepts {σ̃_j^{(A)}, ũ_j^{(A)}, ṽ_j^{(A)}} as an approximate singular triplet of A when

β_m |e_m^T u_j^{(B)}| ≤ δ ||A|| (2.13)

for a user-specified value of δ, where we have used (2.1). The quantity ||A|| in (2.13) is easily approximated by the largest singular value σ̃_1^{(A)} available.
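The extraction of approximate triplets via (2.11) and the acceptance test (2.13) can be sketched as follows. The function name is hypothetical, and ||A|| is replaced by the largest singular value of B, as suggested in the text:

```python
import numpy as np

def ritz_triplets(P, Q, B, r, delta=1e-6):
    """Approximate singular triplets of A from the small SVD of B (cf. (2.11))
    and the acceptance test beta_m |e_m^T u_j| <= delta ||A|| (cf. (2.13)),
    with ||A|| approximated by the largest singular value of B."""
    Ub, s, Vbt = np.linalg.svd(B)
    beta_m = np.linalg.norm(r)              # cf. (2.1)
    U = Q @ Ub                              # approximate left singular vectors
    V = P @ Vbt.T                           # approximate right singular vectors
    resid = beta_m * np.abs(Ub[-1, :])      # residual norms from (2.12)
    return s, U, V, resid <= delta * s[0]
```

The residual norms returned here equal the exactly computed norms ||A^T ũ_j − σ̃_j ṽ_j||, which is the content of (2.12).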

3. Augmented Lanczos bidiagonalization methods.
It is well known that the implicitly restarted Arnoldi and Lanczos tridiagonalization methods described by Sorensen [29] can suffer from numerical instability due to propagated round-off errors. The instability can delay or prevent convergence of desired eigenvalues and eigenvectors; see Lehoucq and Sorensen [21] for a discussion and remedies. Morgan [25] showed that the implicitly restarted Arnoldi method by Sorensen [29] can be implemented by augmenting the available Krylov subspace basis by certain Ritz vectors. Such an implementation can be less sensitive to propagated round-off errors than the implementation in [29]. Recently, Wu and Simon [33] described a so-called thick-restarted Lanczos tridiagonalization method for the symmetric eigenvalue problem. The method is based on augmenting Krylov subspaces by certain Ritz vectors; it is simple to implement and is mathematically equivalent to the implicitly restarted Lanczos tridiagonalization method of Sorensen [29]. This paper presents thick-restarted Lanczos bidiagonalization methods. We remark that thick-restarting techniques have also been used in the context of the Jacobi-Davidson and Arnoldi methods for eigenvalue computations; see Stathopoulos and Saad [31] and Stathopoulos, Saad, and Wu [32] for discussions and analyses.

3.1. Augmentation by Ritz vectors.
Let the partial Lanczos bidiagonalization (1.3)-(1.4) be available, and assume that we are interested in determining the k largest singular triplets of A, where k < m. Note that, by (2.12), the approximate right singular vectors ṽ_j^{(A)} satisfy

A^T ũ_j^{(A)} − σ̃_j^{(A)} ṽ_j^{(A)} = r_m e_m^T u_j^{(B)}, 1 ≤ j ≤ m. (3.1)

The observation that the right-hand sides are parallel to r_m for all j forms the basis of the restarted Lanczos tridiagonalization method by Wu and Simon [33], as well as of the restarted Lanczos bidiagonalization method of this subsection.
Let the Ritz vectors ṽ_j^{(A)}, 1 ≤ j ≤ k, associated with the k largest Ritz values be available, assume that r_m ≠ 0, and introduce the matrix

P̃_{k+1} := [ṽ_1^{(A)}, ṽ_2^{(A)}, ..., ṽ_k^{(A)}, p_{m+1}] ∈ R^{n×(k+1)}. (3.2)

In view of (2.2), the last column of P̃_{k+1} is parallel to the residual error (3.1). It follows from (2.12) that

A ṽ_j^{(A)} = σ̃_j^{(A)} ũ_j^{(A)}, 1 ≤ j ≤ k, (3.3)

and, since the vectors ũ_j^{(A)} are orthonormal by (2.11), we may expand

A p_{m+1} = Σ_{j=1}^{k} ρ_j ũ_j^{(A)} + r̃_k, ρ_j := (ũ_j^{(A)})^T A p_{m+1}, (3.4)

where the remainder r̃_k is orthogonal to the vectors ũ_1^{(A)}, ..., ũ_k^{(A)}. The coefficients ρ_j can be evaluated inexpensively by using the right-hand side of

(ũ_j^{(A)})^T A = σ̃_j^{(A)} (ṽ_j^{(A)})^T + (e_m^T u_j^{(B)}) r_m^T, (3.5)

which, together with (2.2), yields ρ_j = β_m e_m^T u_j^{(B)}. We may assume that the vector r̃_k is nonvanishing, because otherwise we can terminate the iterations; see below. Introduce the matrices

Q̃_{k+1} := [ũ_1^{(A)}, ..., ũ_k^{(A)}, r̃_k/||r̃_k||] ∈ R^{ℓ×(k+1)},
B̃_{k+1} := [ diag[σ̃_1^{(A)}, ..., σ̃_k^{(A)}], ρ ; 0^T, ||r̃_k|| ] ∈ R^{(k+1)×(k+1)}, ρ := [ρ_1, ..., ρ_k]^T, (3.6)

so that

A P̃_{k+1} = Q̃_{k+1} B̃_{k+1}. (3.7)

Thus, B̃_{k+1} may have nonvanishing entries only on the diagonal and in the last column.
We turn to the matrix

A^T Q̃_{k+1} = [A^T ũ_1^{(A)}, ..., A^T ũ_k^{(A)}, A^T q̃_{k+1}], q̃_{k+1} := r̃_k/||r̃_k||, (3.8)

which we would like to express in terms of P̃_{k+1} and B̃_{k+1}^T. This will give an analogue of the decomposition (1.4). The first k columns of (3.8) are linear combinations of the vectors ṽ_j^{(A)} and p_{m+1}; specifically,

A^T ũ_j^{(A)} = σ̃_j^{(A)} ṽ_j^{(A)} + ρ_j p_{m+1}, 1 ≤ j ≤ k, (3.9)

where we have used (2.2). The last column of (3.8) is orthogonal to the Ritz vectors ṽ_1^{(A)}, ..., ṽ_k^{(A)}, and therefore it can be expressed as

A^T q̃_{k+1} = γ̃_1 p_{m+1} + f̃_{k+1}, f̃_{k+1} ⊥ span{ṽ_1^{(A)}, ..., ṽ_k^{(A)}, p_{m+1}}; (3.10)

since γ̃_1 = p_{m+1}^T A^T q̃_{k+1} = (A p_{m+1})^T q̃_{k+1}, it follows from (3.10) and (3.4) that γ̃_1 = ||r̃_k||. This observation, together with (3.9) and (3.10), gives the expression

A^T Q̃_{k+1} = P̃_{k+1} B̃_{k+1}^T + f̃_{k+1} e_{k+1}^T, (3.11)

which is the desired analogue of the decomposition (1.4). We remark that f̃_{k+1} can be computed from (3.10).
The similarity of the decompositions (3.7) and (3.11) to (1.3)-(1.4) suggests that it may be possible to append new columns to the matrices P̃_{k+1} and Q̃_{k+1} in a way similar to Lanczos bidiagonalization. We will show that this is indeed the case. For notational simplicity, denote the columns of P̃_{k+1} and Q̃_{k+1} by p̃_j and q̃_j, respectively, i.e., P̃_{k+1} = [p̃_1, ..., p̃_{k+1}] and Q̃_{k+1} = [q̃_1, ..., q̃_{k+1}]. We may assume that f̃_{k+1} ≠ 0, because otherwise it follows from (3.7) and (3.11) that the singular values of B̃_{k+1} are singular values of A, and we are done. Thus, let β̃_{k+1} := ||f̃_{k+1}|| and p̃_{k+2} := f̃_{k+1}/β̃_{k+1}. Then the matrix P̃_{k+2} := [P̃_{k+1}, p̃_{k+2}] has orthonormal columns. Let

α̃_{k+2} q̃_{k+2} := A p̃_{k+2} − β̃_{k+1} q̃_{k+1}, (3.12)

where α̃_{k+2} > 0 is a scaling factor, such that q̃_{k+2} is of unit length. We comment on the possibility of α̃_{k+2} vanishing below. Equation (3.11) yields

A^T q̃_{k+1} = P̃_{k+1} B̃_{k+1}^T e_{k+1} + f̃_{k+1} = ||r̃_k|| p_{m+1} + β̃_{k+1} p̃_{k+2}, (3.13)

and substituting (3.13) into (3.12) shows that q̃_{k+2} is orthogonal to the columns of Q̃_{k+1}. Let Q̃_{k+2} := [Q̃_{k+1}, q̃_{k+2}], and define B̃_{k+2} ∈ R^{(k+2)×(k+2)} by first appending the column β̃_{k+1} e_{k+1} and then the row α̃_{k+2} e_{k+2}^T to B̃_{k+1}, i.e.,

B̃_{k+2} := [ B̃_{k+1}, β̃_{k+1} e_{k+1} ; 0^T, α̃_{k+2} ]. (3.14)

It now follows from (3.7) and (3.14) that

A P̃_{k+2} = Q̃_{k+2} B̃_{k+2}. (3.15)

We turn to the derivation of a decomposition of the form (3.13) with k + 1 replaced by k + 2. Let

β̃_{k+2} p̃_{k+3} := A^T q̃_{k+2} − α̃_{k+2} p̃_{k+2}, (3.16)

where β̃_{k+2} > 0 is a scaling factor, such that p̃_{k+3} is of unit length. We may assume that a positive coefficient β̃_{k+2} exists; otherwise we are done; see below. Substituting (3.15) into (3.16) shows that p̃_{k+3} is orthogonal to the columns of P̃_{k+2}, which, together with (3.13), shows that

A^T q̃_{k+2} = P̃_{k+2} B̃_{k+2}^T e_{k+2} + β̃_{k+2} p̃_{k+3}. (3.17)

The decompositions (3.15) and (3.17) are analogous to (3.7) and (3.13). We therefore can continue in the same fashion by appending new columns to the matrices P̃_j and Q̃_j, and new rows and columns to the matrices B̃_j, for j = k + 2, k + 3, .... After a total of m − k steps, we obtain the decompositions

A P̃_m = Q̃_m B̃_m, A^T Q̃_m = P̃_m B̃_m^T + r̃_m e_m^T, (3.18)

where the matrices P̃_m and Q̃_m have orthonormal columns, P̃_m^T r̃_m = 0, and the matrix B̃_m ∈ R^{m×m} is upper triangular with nonvanishing entries only on the diagonal, in the (k + 1)st column, and on the superdiagonal in rows k + 1 through m − 1. The method now proceeds by first computing the singular value decomposition of B̃_m and then determining the k largest approximate singular triplets of A from the k largest singular triplets of B̃_m; cf. (2.11).
These triplets define new decompositions of the form (3.7) and (3.11), from which we compute new decompositions of the form (3.18). The computations proceed in this manner until sufficiently accurate approximations of the k largest singular triplets of A have been determined. An algorithm is presented in subsection 3.3 below.
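The construction of the restart matrices in (3.2)-(3.7) can be sketched in Python/NumPy. The function name is hypothetical; the inputs are the quantities of a partial Lanczos bidiagonalization (1.3)-(1.4):

```python
import numpy as np

def augment_ritz(A, P, Q, B, r, k):
    """Restart matrices built from the k largest Ritz triplets (subsection 3.1).
    Returns Ptil, Qtil, Btil with A Ptil = Qtil Btil, where Btil has nonzero
    entries only on the diagonal and in the last column (cf. (3.6)-(3.7))."""
    Ub, s, Vbt = np.linalg.svd(B)
    beta_m = np.linalg.norm(r)
    p_next = r / beta_m                        # p_{m+1}, cf. (2.2)
    Ptil = np.column_stack([P @ Vbt[:k].T, p_next])
    Uk = Q @ Ub[:, :k]                         # Ritz left vectors
    rho = beta_m * Ub[-1, :k]                  # rho_j = beta_m e_m^T u_j
    rtil = A @ p_next - Uk @ rho               # remainder, cf. (3.4)
    Qtil = np.column_stack([Uk, rtil / np.linalg.norm(rtil)])
    Btil = np.zeros((k + 1, k + 1))
    Btil[np.arange(k), np.arange(k)] = s[:k]
    Btil[:k, k] = rho
    Btil[k, k] = np.linalg.norm(rtil)
    return Ptil, Qtil, Btil
```

The identity A P̃_{k+1} = Q̃_{k+1} B̃_{k+1} and the orthonormality of the columns of both matrices can be verified by direct substitution.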
We remark that the computations are analogous for determining the k smallest singular triplets of A. The vectors ṽ_j^{(A)}, 1 ≤ j ≤ k, in (3.2) should then be replaced by the right approximate singular vectors of the k smallest available approximate singular triplets of A. These approximate singular triplets are used to define decompositions (3.7) and (3.11), from which we compute decompositions (3.18) in the same manner as described above. We are interested in the k smallest singular triplets of the matrix B̃_m in the decompositions (3.18). These triplets yield approximations of the k smallest triplets of A. The computations are continued until sufficiently accurate approximations of the k smallest singular triplets of A have been found. However, we note that when the k smallest singular triplets of A are desired, it can be advantageous to augment by harmonic Ritz vectors instead. This is discussed in subsection 3.2.
Finally, we comment on the cases when r̃_k vanishes in (3.4) and when the left-hand sides of (3.12) and (3.16) vanish. In all these cases, one can show that the singular values of the matrix B̃_j also are singular values of A and that the singular triplets of B̃_j yield singular triplets of A.
3.2. Augmentation by harmonic Ritz vectors.
When the smallest singular triplets of A are desired, it can be advantageous to determine the augmenting vectors from harmonic Ritz vectors of A^T A. The harmonic Ritz values θ̂_j associated with the partial Lanczos tridiagonalization (2.4) are the eigenvalues of the problem

(B_m^T B_m + α_m^2 β_m^2 (B_m^T B_m)^{−1} e_m e_m^T) ŵ_j = θ̂_j ŵ_j, 1 ≤ j ≤ m. (3.19)

The eigenpairs {θ̂_j, ŵ_j} of (3.19) can be computed without forming the matrix B_m^T B_m as follows. Let

w_j := B_m ŵ_j. (3.20)

Then (3.19) can be expressed as

(B_m B_m^T + β_m^2 e_m e_m^T) w_j = θ̂_j w_j, 1 ≤ j ≤ m, (3.21)

where we may choose the eigenvectors w_j to be orthonormal. Let B_{m,m+1} be the matrix in (2.3), and note that

B_{m,m+1} B_{m,m+1}^T = B_m B_m^T + β_m^2 e_m e_m^T. (3.22)

Thus, the eigenvalues of (3.21) are the squares of the singular values of B_{m,m+1}, and we let them be enumerated so that

0 < σ_1 ≤ σ_2 ≤ ⋯ ≤ σ_m. (3.23)

This enumeration differs from the one for the singular values of B_m (cf. (2.9)), because in the present subsection we are concerned with the computation of the k < m smallest singular triplets of A. Throughout this subsection, we use this simplified notation {σ_j, u_j, v_j}, 1 ≤ j ≤ m, for the singular triplets of B_{m,m+1}. The k smallest singular triplets of B_{m,m+1} determine the matrices

U_k := [u_1, ..., u_k] ∈ R^{m×k}, V_k := [v_1, ..., v_k] ∈ R^{(m+1)×k}, Σ_k := diag[σ_1, ..., σ_k] ∈ R^{k×k}, (3.24)

where U_k and V_k have orthonormal columns, and

B_{m,m+1} V_k = U_k Σ_k, B_{m,m+1}^T U_k = V_k Σ_k. (3.25)

We refer to (3.25) as a partial singular value decomposition of B_{m,m+1}. It follows from (3.22) and (3.25) that {(σ_j)^2, u_j} is an eigenpair of (3.21), and (3.20) shows that {(σ_j)^2, B_m^{−1} u_j} is an eigenpair of (3.19). Thus, the eigenpairs of (3.19) and (3.21) associated with the k smallest eigenvalues can be determined from the partial singular value decomposition (3.25). Gu and Eisenstat [12] describe how the singular value decomposition of B_{m,m+1} can be computed by updating the singular value decomposition of B_m; see also Bunch and Nielsen [6]. However, when A is large and m is small, the computational effort required for determining the singular value decompositions of B_m and B_{m,m+1} is negligible. We will therefore not dwell on the computation of (3.25).
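Taking (3.19) to be the harmonic Ritz eigenproblem (B_m^T B_m + α_m² β_m² (B_m^T B_m)^{−1} e_m e_m^T) ŵ = θ̂ ŵ, the computation of its eigenpairs via the SVD of B_{m,m+1} can be checked numerically. The helper below is a hypothetical sketch:

```python
import numpy as np

def harmonic_ritz_pairs(B, beta_m):
    """Eigenpairs {theta_j, w_hat_j} of the harmonic Ritz problem for A^T A,
    computed from the SVD of B_{m,m+1} = [B, beta_m e_m] without forming
    B^T B: theta_j = sigma_j^2 and w_hat_j = B^{-1} u_j, smallest first."""
    m = B.shape[0]
    Bmm1 = np.column_stack([B, beta_m * np.eye(m)[:, -1]])  # B_{m,m+1}
    U, s, Vt = np.linalg.svd(Bmm1)       # s holds m values, descending
    order = np.argsort(s)                 # ascending enumeration, cf. (3.23)
    return s[order] ** 2, np.linalg.solve(B, U[:, order])
```

The single triangular solve with B replaces the formation and factorization of B^T B, which is the point of working with B_{m,m+1}.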
The harmonic Ritz vector of A^T A associated with the harmonic Ritz value θ̂_j is given by

v̂_j := P_m ŵ_j; (3.26)

see, e.g., [24,27]. Morgan and Zeng [26] recently pointed out that the residual errors associated with different harmonic Ritz pairs {θ̂_j, v̂_j} are parallel. We show this result for the problem at hand, because this property is central for our augmentation method. Thus, using (2.4), (3.20), (3.21), and (3.26), we obtain

A^T A v̂_j − θ̂_j v̂_j = α_m (e_m^T ŵ_j)(r_m − α_m β_m^2 P_m (B_m^T B_m)^{−1} e_m), 1 ≤ j ≤ m, (3.27)

so that the residual errors indeed are parallel for all j. It is convenient to define the scaled residual vector

r̂_m := p_{m+1} − β_m P_m B_m^{−1} e_m, (3.28)

where we have used (2.2).

Equations (1.3) and (3.28) yield

A r̂_m = A p_{m+1} − β_m Q_m B_m B_m^{−1} e_m = A p_{m+1} − β_m q_m. (3.29)
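With r̂_m as above, the parallelism of the harmonic Ritz residuals is easy to verify numerically. The helper name is hypothetical, and the harmonic Ritz vectors are taken as v̂_j = P_m B_m^{−1} u_j up to scaling:

```python
import numpy as np

def harmonic_residual_direction(P, B, r):
    """Scaled residual vector r_hat = p_{m+1} - beta_m P B^{-1} e_m, the
    common direction of all harmonic Ritz residuals A^T A v_hat - theta v_hat."""
    beta_m = np.linalg.norm(r)
    e_m = np.zeros(B.shape[0]); e_m[-1] = 1.0
    return r / beta_m - beta_m * (P @ np.linalg.solve(B, e_m))
```

A check then forms each residual A^T A v̂_j − θ̂_j v̂_j and confirms that its component orthogonal to r̂_m is negligible.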
In the derivation of our method, we assume only that the matrix B_m is upper triangular, because B_m has this property after restarts. Below we comment on possible simplifications when B_m is upper bidiagonal. Let

Q̂_k := Q_m U_k. (3.30)

The columns of this matrix are orthonormal, and we define the Fourier coefficients

ĉ_k := Q̂_k^T A r̂_m.

The vector

α̂_{k+1} q̂_{k+1} := A r̂_m − Q̂_k ĉ_k

is orthogonal to the columns of Q̂_k, and the scaling factor α̂_{k+1} > 0 is chosen so that q̂_{k+1} is of unit length. It follows that

A [P_m B_m^{−1} U_k, r̂_m] = [Q̂_k, q̂_{k+1}] [ I_k, ĉ_k ; 0^T, α̂_{k+1} ],

and orthonormalizing the columns on the left, [P_m B_m^{−1} U_k, r̂_m] =: P̂_{k+1} R_{k+1} with R_{k+1} upper triangular, yields

A P̂_{k+1} = Q̂_{k+1} B̂_{k+1}, Q̂_{k+1} := [Q̂_k, q̂_{k+1}], B̂_{k+1} := [ I_k, ĉ_k ; 0^T, α̂_{k+1} ] R_{k+1}^{−1}. (3.31)

Thus, Q̂_{k+1} has orthonormal columns, and B̂_{k+1} is the product of two upper triangular matrices, one of which has nonzero entries only on the diagonal and in the last column. In particular, B̂_{k+1} is upper triangular. The decomposition (3.31) is the desired analogue of (3.7). When B_m is given by (1.6), one can show that α̂_{k+1} = α_{m+1}, q̂_{k+1} = q_{m+1}, and ĉ_k = 0.
We now derive an analogue of the decomposition (3.11). Let Q̂_k be given by (3.30). Then (2.3) and (3.25) yield

A^T Q̂_k = A^T Q_m U_k = P_{m+1} B_{m,m+1}^T U_k = P_{m+1} V_k Σ_k,

and it follows from the decomposition on the left-hand side of (3.25) that the columns of P_{m+1} V_k are orthonormal; therefore the columns of A^T Q̂_k are orthogonal with norms σ_1, ..., σ_k.

Then (3.40) can be written in the form

A^T Q̂_{k+1} = P̂_{k+2} B̂_{k+1,k+2}^T, B̂_{k+1,k+2} := [B̂_{k+1}, β̂_{k+1} e_{k+1}] ∈ R^{(k+1)×(k+2)}, (3.41)

which is an analogue of (2.3). Given the decompositions (3.31) and (3.40), we can proceed analogously as in subsection 3.1 to compute the decompositions

A P̂_m = Q̂_m B̂_m, A^T Q̂_m = P̂_m B̂_m^T + r̂ e_m^T, (3.42)

where P̂_m ∈ R^{n×m} has orthonormal columns with leading n × (k + 2) submatrix P̂_{k+2}, Q̂_m ∈ R^{ℓ×m} has orthonormal columns with leading ℓ × (k + 1) submatrix Q̂_{k+1}, and B̂_m ∈ R^{m×m} is upper triangular. Approximations of the k smallest singular triplets of A are then determined from the k smallest singular triplets of B̂_m, where the singular values are enumerated as in (3.23). Augmentation by harmonic Ritz vectors requires the solution of a linear system of equations with the matrix B_m, and the computed augmenting vectors may be inaccurate when B_m is severely ill-conditioned. In this case, we switch from augmentation by harmonic Ritz vectors to augmentation by Ritz vectors. Details are provided in the following subsection.

3.3. An augmented Lanczos bidiagonalization algorithm.
We describe an algorithm for the computation of a few of the largest or smallest singular triplets of a large matrix A. The algorithm is based on augmentation by Ritz vectors or harmonic Ritz vectors as described in the previous subsections. The Boolean variable harmonic determines the type of augmentation used. If harmonic is true and B_m is not too ill-conditioned, then the augmentation scheme of subsection 3.2 is applied; otherwise augmentation is carried out according to subsection 3.1.

Algorithm 3.1 (augmented Lanczos bidiagonalization).
Input: A ∈ R^{ℓ×n} or functions for evaluating matrix-vector products with the matrices A and A^T; p_1 ∈ R^n: initial unit vector; m: maximum number of bidiagonalization steps; k: number of desired singular triplets; δ: tolerance in (2.13); harmonic: Boolean variable that determines the augmentation used.
Output: approximations of the k largest or smallest singular triplets of A.
1. Compute the partial Lanczos bidiagonalization (1.3)-(1.4) by Algorithm 2.1.
2. Compute the singular value decomposition of B_m; if harmonic is true, also compute the partial singular value decomposition (3.25) of B_{m,m+1}.
3. Check convergence: if the k desired approximate singular triplets satisfy (2.13), then exit.
4. Determine augmenting vectors: if harmonic is true and B_m is not too ill-conditioned, then compute the matrices of subsection 3.2; otherwise compute the matrices of subsection 3.1. Denote the computed matrices by P, Q, and B.
5. Append m − k columns to the matrices P and Q, and m − k rows and columns to the matrix B. Denote the matrices so obtained by P_m, Q_m, and B_m, respectively. Determine a new residual vector and denote it by r_m.
6. Goto 2.
The above algorithm is a simplification of the actual computations carried out. For instance, the algorithm exits when a matrix B_m that is numerically singular has been detected, but the available decompositions of A can be used to determine singular triplets of A; see the discussion in section 1. Moreover, the number of augmented vectors used at each restart is typically larger than the number of desired singular triplets. Assume that k′ of the desired k singular triplets have been found. We then augment by k + k′ (instead of k) singular triplets, where k′ is chosen as large as possible, such that k′ ≤ k and k + k′ ≤ m − 3. The term −3 ensures that at least three orthogonalization steps can be carried out between restarts. This approach has been advocated by Lehoucq [20] in the context of the implicitly restarted Arnoldi method. It often yields faster convergence without increasing the memory requirement. Finally, our implementation enforces two-sided reorthogonalization when κ(B_m) > ε^{−1/2}, where ε denotes machine epsilon. MATLAB code is available at the authors' home pages.
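As an illustration of the overall scheme, a simplified thick-restarted loop might look as follows. This is a hypothetical sketch with Ritz augmentation only, full reorthogonalization, and none of the adaptive logic described above; it is not the authors' irlba code:

```python
import numpy as np

def irlba_largest(A, k=2, m=10, maxit=300, delta=1e-8, seed=0):
    """Thick-restarted Lanczos bidiagonalization for the k largest singular
    triplets (a simplified sketch of Algorithm 3.1)."""
    l, n = A.shape
    rng = np.random.default_rng(seed)
    p = rng.standard_normal(n); p /= np.linalg.norm(p)
    P = p[:, None]
    q = A @ p; a = np.linalg.norm(q); Q = (q / a)[:, None]
    Bmat = np.array([[a]])
    r = A.T @ Q[:, 0] - a * p
    r -= P @ (P.T @ r)
    for it in range(maxit):
        # expand the decompositions to m columns by bidiagonalization steps
        while Bmat.shape[0] < m:
            j = Bmat.shape[0]
            b = np.linalg.norm(r); pnew = r / b
            q = A @ pnew - b * Q[:, -1]
            q -= Q @ (Q.T @ q)                  # full reorthogonalization
            a = np.linalg.norm(q); qnew = q / a
            P = np.column_stack([P, pnew]); Q = np.column_stack([Q, qnew])
            Bnew = np.zeros((j + 1, j + 1)); Bnew[:j, :j] = Bmat
            Bnew[j - 1, j] = b; Bnew[j, j] = a
            Bmat = Bnew
            r = A.T @ qnew - a * pnew
            r -= P @ (P.T @ r)                  # full reorthogonalization
        # Ritz approximations and the convergence test (2.13)
        Ub, s, Vbt = np.linalg.svd(Bmat)
        beta = np.linalg.norm(r)
        if np.all(beta * np.abs(Ub[-1, :k]) <= delta * s[0]) or it == maxit - 1:
            return s[:k], Q @ Ub[:, :k], P @ Vbt[:k].T
        # thick restart: augment by the k largest Ritz vectors (subsection 3.1)
        pnext = r / beta
        Uk = Q @ Ub[:, :k]
        rho = beta * Ub[-1, :k]
        rtil = A @ pnext - Uk @ rho
        rtil -= Uk @ (Uk.T @ rtil)
        g = np.linalg.norm(rtil)
        P = np.column_stack([P @ Vbt[:k].T, pnext])
        Q = np.column_stack([Uk, rtil / g])
        Bmat = np.zeros((k + 1, k + 1))
        Bmat[np.arange(k), np.arange(k)] = s[:k]
        Bmat[:k, k] = rho; Bmat[k, k] = g
        r = A.T @ Q[:, -1] - g * pnext          # new residual, cf. (3.10)
        r -= P @ (P.T @ r)
```

Only products with A and A^T are required, and the storage is bounded by m vectors of each length, which is the practical appeal of the restarted approach.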

4. Numerical examples.
All computations were carried out using MATLAB version 6.5 R13 on a Dell 530 workstation with two 2.4 GHz (512k cache) Xeon processors and 2 GB (400 MHz) of memory running under the Windows XP operating system. Machine epsilon is ε = 2.2 · 10^{−16}.
We compare our methods, outlined by Algorithm 3.1, with methods recently proposed by Hochstenbach [14,13] and Kokiopoulou, Bekas, and Gallopoulos [17], as well as with the MATLAB internal function svds and the scheme proposed in [2]. Hochstenbach [14,13] presents a Jacobi-Davidson method. This is a powerful scheme when a good preconditioner for the linear system of equations that has to be solved is available. In our computed examples, we assume that no good preconditioner is known, and we apply the method without preconditioner. The linear system of equations is solved by the GMRES iterative method. The MATLAB implementation 1 jdsvd offers several extraction choices, such as standard, u-harmonic, v-harmonic, double-harmonic, and refined. In our numerical examples, refined extraction often gave best accuracy. This is consistent with results reported by Hochstenbach [14]. We refer to the Jacobi-Davidson method with refined extraction as jdsvd(Ref). The code jdsvd is still under development, and we used the version available to us at the time of the numerical experiments.
Our methods are mathematically, but not numerically, equivalent to the methods proposed by Kokiopoulou, Bekas, and Gallopoulos [17]. We used the MATLAB implementation irlanb by Kokiopoulou, Bekas, and Gallopoulos [17] in our comparison. The code irlanb calls Larsen's MATLAB code lanbpro [18] to compute partial Lanczos bidiagonalizations with partial reorthogonalization. irlanb is designed for computing a few of the smallest singular triplets but not for computing a few of the largest ones. The code therefore is not used in Examples 4 and 5 below. Ritz values and harmonic Ritz values can be used as shifts. We refer to irlanb with these shift selections as irlanb(R) and irlanb(H), respectively. The code irlanb is still under development, and we report results for the version available to us at the time of the numerical experiments.
The internal MATLAB function svds uses FORTRAN codes of ARPACK [22]. It calls an eigenvalue routine to compute eigenvalue-eigenvector pairs associated with positive eigenvalues of the symmetric matrix

Z := [ 0, A ; A^T, 0 ] ∈ R^{(ℓ+n)×(ℓ+n)}. (4.1)

The matrix Z has the eigenvalues ±σ_1, ±σ_2, ..., ±σ_n, as well as ℓ − n zero eigenvalues, where we as usual assume that ℓ ≥ n. The eigenvectors of Z yield both the right and left singular vectors of A. The method requires the computation of eigenpairs associated with eigenvalues near zero when determining the smallest singular triplets of A. This can be difficult when Z has many positive and negative eigenvalues of large magnitude. For this reason, the svds function uses a shift-and-invert approach for computing the smallest singular triplets. This requires factorization of the matrix Z, and therefore the function svds typically demands much more storage than the other methods in our comparison. Since svds does not yield the number of linear systems of equations that have to be solved, we report only the CPU time required.
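The spectral structure of (4.1) that underlies this approach is easy to verify numerically (illustrative sizes only):

```python
import numpy as np

# Z = [[0, A], [A^T, 0]] has eigenvalues +/- sigma_j(A) together with
# l - n zero eigenvalues (assuming l >= n); a small numerical check.
rng = np.random.default_rng(1)
l, n = 7, 4
A = rng.standard_normal((l, n))
Z = np.block([[np.zeros((l, l)), A], [A.T, np.zeros((n, n))]])
evals = np.sort(np.linalg.eigvalsh(Z))
svals = np.linalg.svd(A, compute_uv=False)
expected = np.sort(np.concatenate([svals, -svals, np.zeros(l - n)]))
```

The check also makes the difficulty mentioned above concrete: when ℓ − n is large, the eigenvalues of interest for the smallest triplets are crowded by zeros of Z.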
The MATLAB code irblsvds implements an implicitly restarted block-Lanczos method applied to the matrix (4.1) for the computation of a few singular triplets of A. The method and code are described in [2,3]. 2 The matrix Z is used only for evaluation of matrix-vector products; in particular, the matrix Z is not factored. The code irblsvds is not, in general, well suited for computing the smallest singular triplets of A when ℓ − n is large, because then Z has ℓ − n zero eigenvalues, and the code may determine these eigenvalues and associated eigenvectors instead of eigenpairs associated with tiny singular triplets of A. Unless indicated otherwise, we use the default value 3 for the block-size in our experiments with irblsvds. The largest number of consecutive block-Lanczos steps is chosen so that irblsvds has about the same storage requirement as the other codes. We note that of the methods used in the examples of this section, only the ones of the present paper are based on augmented matrix formulations.
The schemes of subsections 3.1 and 3.2 are implemented by the MATLAB code irlba. 2 The execution of irlba is determined by certain user-specified parameters; see Table 4.1. We refer to the augmentation method of subsection 3.1, based on Ritz vectors, as irlba(R) and to the augmentation method of subsection 3.2, based on harmonic Ritz vectors, as irlba(H). The scheme irlba(R) with one- and two-sided full reorthogonalization is referred to as irlba(R1) and irlba(R2), respectively. The analogous implementations of irlba(H) are denoted by irlba(H1) and irlba(H2). One-sided full reorthogonalization reorthogonalizes the columns of the smaller of the matrices P_m and Q_m. Thus, when ℓ ≥ n, the columns of P_m are reorthogonalized.
The codes irblsvds, irlanb, jdsvd, and svds allow a user to choose numerous parameters that affect their performance. Unless stated otherwise, we use the default values for the parameters. Except for the function svds, we choose the parameters that affect storage so that all codes require about the same maximum computer storage.
It is impossible to use the same starting vector for all the methods, since some routines work with A and A^T and others with the matrix Z. To make our comparison less dependent on the choice of starting vector, we record the best results for each method over five runs using default random starting vector(s) generated by each code. We use the same starting vector when the same method is applied with different parameter values. The reported number of matrix-vector products is the total number of matrix-vector product evaluations with A and A^T for irlba, irlanb, and jdsvd and with Z for irblsvds. Each code was instructed to determine only one singular triplet, the smallest one. This corresponds to the parameter values sigma = SS and k = 1 for irlba. We allowed each of the codes irlba, irlanb, jdsvd, and irblsvds about the same amount of storage as required for carrying out 20 Lanczos bidiagonalization steps. In particular, the largest number of consecutive block-Lanczos steps allowed by irblsvds was limited to 7 (with block-size 3). This corresponds to about the same storage requirement as 21 Lanczos bidiagonalization steps. We let adjust := 4 for irlba; see Table 4.1. This forces both the irlba and irlanb codes to apply the same number of bidiagonalization steps in the first restart. We chose the tolerance 10^{−6} for all codes, i.e., δ := 10^{−6} in (2.13). For the jdsvd code we report only refined extraction, because this extraction method was the fastest. Figure 4.1 displays the CPU times required for the different methods. The MATLAB svds function is seen to be fastest. This may depend on the fact that svds is not coded in MATLAB and that it is based on a shift-and-invert approach. For large problems shift-and-invert may require unacceptable amounts of memory and execution time; however, when computation and storage of factors of matrices of the form Z − τI, where τ is a scalar, is feasible, then this approach is attractive.
Among the methods that use only the matrices A and A^T for evaluating matrix-vector products, irlba is seen to be competitive.

Example 2 (zero singular value). The matrix A is generated with normally distributed entries with mean zero and variance one. The command A(:, 1) = A(:, 10) overwrites column 1 by column 10 and ensures that the matrix obtained has a zero singular value. This example illustrates that augmentation by harmonic Ritz vectors can give substantially faster convergence than augmentation by Ritz vectors. The code irlba was used with k = 1, δ = 10^{−16}, reorth = TWO, and steps = 30. Figure 4.2 shows the convergence to zero of the smallest computed singular value. The vertical axis displays the absolute error in the smallest computed singular value, i.e., since the exact smallest singular value vanishes, the magnitude of the smallest computed singular value, and the horizontal axis shows the number of restarts. The cross-over label indicates when irlba switched from augmenting with harmonic Ritz vectors to augmenting with Ritz vectors. As mentioned in subsection 3.2, augmentation by harmonic Ritz vectors requires the solution of a linear system of equations with the matrix B_m (while augmentation by Ritz vectors does not). We switch from augmentation by harmonic Ritz vectors to augmentation by Ritz vectors when the condition number of the matrix B_m is larger than the square root of the reciprocal of machine epsilon; cf. Algorithm 3.1.
We remark that computing a singular triplet with a numerically vanishing singular value is not always possible, since the approximate left singular vector ũ_j^{(A)} lives in the Krylov subspace (2.8), which is restricted to the range of A. However, in the presence of round-off errors, irlba is often able to successfully compute singular triplets associated with zero singular values.

Example 3 (smallest singular values). We consider the 1033 × 320 matrix WELL1033 and the 1850 × 712 matrix WELL1850 from the set LSQ in the Harwell-Boeing sparse matrix collection [9]. These matrices arise from surveying problems. We would like to determine the six smallest singular triplets of these matrices and select the parameters for the different codes accordingly. For instance, for irlba we let sigma := SS and k := 6. All codes are allowed the amount of storage required for 40 Lanczos bidiagonalization steps. The number of implicit QR-steps in irlanb is chosen to be 31; this choice is consistent with the default value of the parameter adjust of irlba. It forces irlba and irlanb to apply the same number of Lanczos bidiagonalization steps in the first restart. The tolerance for all methods is set to δ = 10^{−6}. irlanb with shifts chosen to be harmonic Ritz values gave better results than when Ritz values were used as shifts. We therefore report only results for the former. The restarted block-Lanczos method irblsvds used block-size 4 and was allowed to carry out at most 10 consecutive block-Lanczos steps between restarts. jdsvd gave the best results for refined extraction. Therefore we do not report results for other extraction methods. The MATLAB svds function was unable to find any of the desired singular triplets to requested accuracy for either of the matrices. irlba determined all six singular values with the fewest matrix-vector product evaluations and the least CPU time for both matrices.

Example 4 (largest singular values).
The matrices MEDLINE, CRANFIELD, and CISI are standard term-by-document test matrices and can be obtained from the Cornell SMART FTP server 3 or the TMG web page. 4 They are of size 5735 × 1033, 4563 × 1398, and 5544 × 1460, respectively. We also consider the term-by-document matrix HYPATIA 5 of size 11390 × 1265 with 109056 nonzero terms from the web server at the Department of Mathematics, University of Rhode Island. HYPATIA was created in the same manner as standard term-by-document test matrices. All test matrices use the local term frequency (i.e., the number of times a word occurs on a website) and have no global weighting or normalization. The parameter values chosen for the different methods are consistent with our desire to determine the 10 largest singular triplets for each of these matrices; for irlba we let sigma := LS and k := 10. The available storage is assumed to be large enough to simultaneously store all vectors generated during 20 consecutive steps of Lanczos bidiagonalization. The tolerance δ in the stopping criterion is 10^{−6}. Since we seek to determine the largest singular triplets, irlba uses the augmentation method of subsection 3.1. We report the results for one-sided and two-sided full reorthogonalization. The code irblsvds was used with block-size 4 and was allowed to carry out at most five consecutive block-Lanczos steps between restarts. The fastest extraction method for jdsvd was refined. Table 4.3 displays the performance of the methods and illustrates the competitiveness of irlba. The code irlanb is not part of our comparison, since it is designed for the computation of only a few of the smallest singular triplets.

Example 5 (condition number). We would like to determine the condition number κ(A) := σ_1/σ_n of a Läuchli matrix A ∈ R^{(n+1)×n}. 6 Thus, A has ones across the top row and μ on the subdiagonal; the remaining matrix entries are zero. This matrix often is used to illustrate the drawback of forming A^T A in least-squares computations; see [19].
The Läuchli matrix A is nonsingular; it has the simple singular value √(n + μ²) and the singular value μ of multiplicity n − 1, giving the condition number κ(A) = 9.490724975767860 · 10^9, and therefore A^T A is numerically singular. irlba works with the matrix A^T A only implicitly and is able to compute κ(A). Specifically, we let sigma := LS and sigma := SS in two separate runs, k := 1, and we allow the storage required for all vectors generated by 20 consecutive steps of Lanczos bidiagonalization. The tolerance δ in the stopping criterion is set to machine epsilon ε. Two-sided full reorthogonalization was employed. We augmented by Ritz vectors when computing the largest singular value and, until the matrices B_m became ill-conditioned, by harmonic Ritz vectors when determining the smallest singular value. irlba required 0.703 and 0.796 seconds of CPU time to compute the largest and smallest singular values of A, respectively, and gave the condition number 9.490724975767925 · 10^9 with a relative error of 6.83 · 10^{−15}. Thus, the fact that A^T A is numerically singular does not cause irlba problems.
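The stated singular value structure of a Läuchli matrix is easy to check numerically; the values of n and μ below are illustrative only, not those of the example:

```python
import numpy as np

# A Lauchli matrix: ones across the top row, mu on the subdiagonal.
# Its singular values are sqrt(n + mu^2) (simple) and mu (multiplicity n - 1),
# so kappa(A) = sqrt(n + mu^2) / mu.
n, mu = 5, 1e-3
A = np.vstack([np.ones((1, n)), mu * np.eye(n)])
s = np.linalg.svd(A, compute_uv=False)
```

The structure follows from A^T A = ee^T + μ²I, whose eigenvalues are n + μ² (eigenvector e, the vector of ones) and μ² with multiplicity n − 1.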