randKrylov is a MATLAB library that features randomized Krylov methods for solving large linear systems and eigenvalue problems. The functions randgmres
and randeigs
can be over 4x and over 10x faster, respectively, than the built-in analogues gmres
and eigs
, while providing the same (or even greater) accuracy and robustness.
randgmres
can provide solutions to linear systems several times faster than the built-in gmres
when the GMRES method takes many inner iterations:
if gmres
. At the same time, the RGS yields as accurate solution with a very high probability randgmres
uses a new randomized criterion for the solution certification, which proves to be more robust than the criterion used in gmres
. In particular, when gmres
reports error stagnation, randgmres
can report the stagnation much sooner, or even detect the absence of stagnation and reduce the error to a greater extent (see experiments).
randeigs
can solve nonsymmetric eigenvalue problems over 10x faster than the built-in eigs
. The reduction in computational cost is achieved primarily due to orthogonalizing the Krylov basis with the RGS process. The speedup is greater when a large Krylov basis is used. By default, randeigs
re-orthogonalizes the computed Krylov basis with a Cholesky QR step at the end of the computation. Such re-orthogonalization should have only minor cost and is employed to ensure that randeigs
provides the same solution as eigs
. For many problems this step may be unnecessary and can be ignored by specifying the corresponding flag. In this case randeigs
will provide a randomized Rayleigh-Ritz solution which can have a similar quality as the regular Rayleigh-Ritz solution from eigs
.
An additional distinguishing feature of randgmres
and randeigs
is the ability to perform the dominant operations in single precision that can halve the memory usage and speed up the computations.
x = randgmres(A,b)
attempts to solve n x n linear system Ax = b.
By default, we use unrestarted method with m = min(n/10,500) total
iterations.
x = randgmres(afun,b)
takes a function handle afun instead of matrix A,
where afun(t) outputs the matrix-vector products At.
To incorporate preconditioning with preconditioner P = (M1M2)-1,
use afun = @(x) A*(M2\(M1\x))
or afun = @(x) M2\(M1\(A*x))
.
In the following syntaxes, the matrix A can be replaced by Afun.
x = randgmres(A,b,restart)
restarts the algorithm every m = restart
iterations. If restart is n or []
then randgmres
uses unrestarted method.
x = randgmres(A,b,restart,tol)
specifies desired accuracy of the solution
measured by relative residual error. If tol is []
then we take tol = 1e-6.
x = randgmres(A,b,restart,tol,maxit)
specifies the maximum number of
outer iterations. If maxit is []
then randgmres
uses the default value
min(n/restart,10).
x = randgmres(A,b,restart,tol,maxit,x0)
specifies the initial guess.
If x0 is []
then randgmres
takes x0 as n x 1 zero vector.
x = randgmres(A,b,restart,tol,maxit,x0,thetagenfun)
specifies
parameterless function thetagenfun()
that draws a random sketching
matrix Theta, given as function handle, i.e., Theta(t) = Theta t.
By default, we take Theta as SRHT matrix with k = 2m log(n)/log(m)
rows.
x = randgmres(A,b,restart,tol,maxit,x0,thetagenfun,lssolver)
specifies
the method used for solving the sketched least-squares problems in
randgmres
. The options include '3reorth', '5reorth' (default),
'20reorth' corresponding to, repsectively, 3, 5, or 20 Richardson
iterations, or 'CG' corresponding to 20 iterations of CG, applied to the
normal equation.
x = randgmres(A,b,restart,tol,maxit,x0,thetagenfun,lssolver,lowprecision)
specifies a flag whether to compute the Krylov basis vectors in single
precision, while performing other (minor) operations in double
precision. By default, lowprecision = 0, i.e., all the operations are
performed in double precision.
[x,output] = randgmres(___)
returns the output containing the estimated
residual and stability measure at each iteration, and stagnation/convergence
lags.
D = randeigs(A)
returns 6 largest eigenvalues of a square matrix A.
D = randeigs(afun)
takes a function handle afun instead of matrix A,
where afun(t) outputs the matrix-vector products At.
In the following syntaxes, the matrix A can be replaced by afun.
D = randeigs(A,B)
solves the generalized eigenvalue problem
Ax = []
, then
randeigs solves the standard eigenvalue problem Ax =
[V,D] = randeigs(A,B)
returns a diagonal matrix D composed of the largest
eigenvalues of A, and a matrix V whose columns are the corresponding
eigenvectors.
[V,D,flag] = randeigs(A,B)
also returns a convergence flag indicating
the convergence of the eigenvalues.
randeigs(A,B,K)
returns K largest eigenvalues.
randeigs(A,B,K,method)
returns K eigenvalues. The options for method
are
same as in built-in eigs function:
'largestabs' or 'smallestabs' - largest or smallest magnitude
'largestreal' or 'smallestreal' - largest or smallest real part
'bothendsreal' - K/2 values with largest and
smallest real part, respectively
(one more from largest if K is odd)
For nonsymmetric problems, method
can also be:
'largestimag' or 'smallestimag' - largest or smallest imaginary part
'bothendsimag' - K/2 values with largest and
smallest imaginary part, respectively
(one more from largest if K is odd)
If method = sigma
(scalar), then randeigs finds the eigenvalues
closest to sigma
.
randeigs(A,B,K,sigma,name,value)
specifies additional options given by the option's name followed by its value. The variable name can be:
'Tolerance' - convergence tolerance. Default value: 1e-14.
'MaxIterations' - maximum number of iterations. Default value: 10
'SubspaceDimension' - size of Krylov subspace. Default value:
max(2*K,500).
'StartVector' - starting vector. Default value: n x 1 random vector.
'Display'- display diagnostic messages. Default value: 0.
'InvertOperator' - if method = sigma (scalar) or method = 'smallestabs',
then we take method = 'largestabs', lambda = sigma+1/lambda
and A = (A-sigma*B)^(-1) computed implicitly with Cholesky or lu
decomposition. In this case, if used, Afun must output products
with (A-sigma*B)^(-1). Default value: 1.
'SketchingMatrixFun' - specifies parameterless function thetagenfun()
that draws a random sketching matrix Theta, given as function handle,
i.e., Theta(t) = Theta*t. By default, we take Theta as SRHT matrix with
k = 2*m*log(n)/log(m) rows, where m is the subspace dimension.
'LSsolver' - method used for solving the sketched least-squares problems
The options include '3reorth', '5reorth' (default), '20reorth'
corresponding to, repsectively, 3, 5, or 20 Richardson iterations, or 'CG'
corresponding to 20 iterations of CG, applied to the normal equation.
'LowPrecision' - compute the Krylov basis vectors in single precision,
while performing other (minor) operations in double precision.
By default, LowPrecision = 0, i.e., all the operations are performed in
double precision.
'ExactProj' - augment rand. Arnoldi algorithm with Cholesky QR of the
Krylov basis matrix. Performing this step provides a classical
Rayleigh-Ritz approximation of the eigenpairs, which can be preferred to
its randomized variant due to the strong optimality guarantees. For some
problems, however, this can be an unecessary expensive computation.
Default value: 1.
Here is a runtime/robustness comparison of randgmres
with the built-in gmres
.
The tests were performed in MATLAB R2021b on a node with 192GB of RAM and 2x Cacade Lake Intel Xeon 5218 16 cores 2.4GHz processor.
The linear systems were taken from the SuiteSparse matrix collection with random right-hand-side vectors. The linear systems ML_Geer, Ga41As41H72 and vas_stokes_1M were preconditioned with ilu
factorization. For SiO2 and atmosmodd, the randgmres
function was executed in the low-precision mode (i.e., by taking lowprecision = 1
) requiring half the memory used by gmres
.
The following table reports the observed runtimes and residual errors of gmres
and randgmres
for varying inputs. It is revealed that randgmres
has provided an accurate solution up to 4x faster than gmres
.
For the ML_Geer, Ga41As41H72, and vas_stokes_1M systems, the use of a large number randgmres
was able to identify the stagnation up to 60x faster than gmres
due to the new randomized criterion.
For the ML_Geer system with gmres
reported an error stagnation with a residual error of 3x10-5, while randgmres
refined the solution to a much greater extent (in addition to being thrice as fast), which demonstrates the robustness of randgmres
.
system | size | restart | maxit | gmres time |
gmres error |
randgmres time |
randgmres error |
---|---|---|---|---|---|---|---|
ML_Geer | 1.5x106 | 1500 | 5 | 3500s | 3x10-5 | 1300s | 3.4x10-10 |
ML_Geer | 1.5x106 | 500 | 50 | 4500s | 9.8x10-10 | 2500s | 5.7x10-10 |
ML_Geer | 1.5x106 | 200 | 150 | 23300s | 9.8x10-1 | 380s | 1.1x100 |
Ga41As41H72 | 2.7x105 | 2000 | 1 | 1200s | 1.1x10-7 | 300s | 1.9x10-7 |
Ga41As41H72 | 2.7x105 | 500 | 50 | 6600s | 2.4x10-2 | 890s | 2.8x10-2 |
Ga41As41H72 | 2.7x105 | 200 | 150 | 4890s | 4.4x10-2 | 260s | 5.5x10-2 |
vas_stokes_1M | 1.1x106 | 800 | 1 | 820s | 1.1x10-12 | 340s | 6.13x10-13 |
vas_stokes_1M | 1.1x106 | 400 | 10 | 1300s | 6.5x10-13 | 770s | 4.4x10-13 |
vas_stokes_1M | 1.1x106 | 200 | 40 | 3400s | 7.7x10-3 | 290s | 9.8x10-3 |
SiO2 | 1.5x105 | 400 | 5 | 90s | 1x10-10 | 40s | 2.7x10-11 |
atmosmodd | 1.3x106 | 300 | 5 | 210s | 9.7x10-11 | 130s | 2.4x10-11 |
The table below shows the average runtimes taken by a single inner iteration in gmres
and randgmres
for varying inputs.
system | size | restart | maxit | gmres time |
randgmres time |
---|---|---|---|---|---|
ML_Geer | 1.5x106 | 1500 | 5 | 2.6s per it. | 0.78s per it. |
ML_Geer | 1.5x106 | 500 | 50 | 1.42s per it. | 0.72s per it. |
Ga41As41H72 | 2.7x105 | 2000 | 1 | 0.67s per it. | 0.17s per it. |
vas_stokes_1M | 1.1x106 | 800 | 1 | 1.03s per it. | 0.43s per it. |
vas_stokes_1M | 1.1x106 | 400 | 10 | 0.57s per it. | 0.36s per it. |
SiO2 | 1.5x105 | 400 | 5 | 0.11s per it. | 0.04s per it. |
atmosmodd | 1.3x106 | 300 | 5 | 0.36s per it. | 0.2s per it. |
Next we compare randeigs
with built-in eigs
. The tests again were executed in MATLAB R2021b on a node with 192GB of RAM and 2x Cacade Lake Intel Xeon 5218 16 cores 2.4GHz processor. The goal was to compute the lu
decomposition, therefore they were shifted using A = speye(n,n)-Ain/(lambdamax+0.1)
, with Ain denoting the input matrix of size n and lambdamax
the magnitude of the largest-magnitute eigenvalue of Ain . Then the eigenpairs of A with the largest-magnitude eigenvalues were computed (taking method = largestabs
). The matrix t2em, on the other hand, could be efficiently inverted with lu
. Consequently, its eigenpairs with the smallest eigenvalues were computed directly without shifting (taking A = Ain
and method = smallestabs
). In the experiments the options 'SubspaceDimension'
, 'MaxIterations'
, 'Tolerance'
and 'ExactProj'
were taken as m, maxiter, tol and 1.
In all the tests, randeigs
and eigs
provided the same solutions up to machine precision. Their runtimes are shown below.
system | size | K | m | maxit | tol | eigs time |
randeigs time |
---|---|---|---|---|---|---|---|
ML_Geer | 1.5x106 | 20 | 1000 | 50 | 10-10 | 151200s | 13400s |
ML_Geer | 1.5x106 | 20 | 400 | 150 | 10-10 | 66800s | 13300s |
ML_Geer | 1.5x106 | 20 | 100 | 500 | 10-10 | 26200s | -- |
vas_stokes_1M | 1.1x106 | 50 | 1500 | 25 | 10-10 | 177500s | 13000s |
vas_stokes_1M | 1.1x106 | 50 | 200 | 2000 | 10-10 | 31100s | -- |
t2em | 9.2x105 | 300 | 800 | 5 | 10-10 | 2300s | 280s |
t2em | 9.2x105 | 300 | 600 | 5 | 10-10 | 1600s | 290s |