DominantSparseEigenAD is an extension of PyTorch that handles reverse-mode automatic differentiation of dominant eigen-decomposition process.
In many researches and applications involving matrix diagonalization, typically in the context of eigenvalue problem in quantum mechanics, only a small proportion of eigenvalues (e.g., the smallest ones) and corresponding eigenvectors are of practical interest. This library provides corresponding primitives in the framework of PyTorch to automatically differentiate this process, without any direct access to the full spectrum of the matrix being diagonalized.
$ pip install DominantSparseEigenAD
Please check out the examples directory.
- Hamiltonian engineering in 1D coordinate space. A 1D potential is fitted in order to match a given ground-state wave function.
- Exact-diagonalization investigation of quantum phase transition in 1D transverse field Ising model (TFIM). Various orders of derivative of the ground-state energy and the fidelity susceptibility of TFIM are computed by differentiating through the (dominant) exact diagonalization process.
- Gradient-based optimization of the ground-state energy of TFIM using matrix product states (MPS). See symmetric.py for a simple demonstration, in which the transfer matrix, whose largest-amplitude eigenvalue and corresponding eigenvector are desired, is assumed to be symmetric. For a more complete implementation allowing for any (diagonalizable) transfer matrices, see general.py.
The library provides several torch.autograd.Function
primitives for typical use cases. Specifically, the matrix to be diagonalized can be either assumed to be (real) symmetric or not. It can also be represented either as a normal torch.Tensor
or in a sparse form, such as a function or scipy LinearOperator
.
Typically, to make use of a primitive, simply call the apply
method of it. For example, the following code
dominantsymeig = DominantSymeig.apply
will create a function named dominantsymeig
corresponding to the primitive DominantSymeig
, which can then be directly invoked in a computation process.
from DominantSparseEigenAD.symeig import DominantSymeig
dominantsymeig = DominantSymeig.apply
# Usage
dominantsymeig(A, k)
This primitive is used in the case where the matrix is assumed to be real symmetric and represented in "normal" form as a torch.Tensor
. In current version, it will use the Lanczos algorithm to return a tuple (eigval, eigvector)
of the smallest eigenvalue and corresponding eigenvector, both represented as torch.Tensor
s.
Parameters:
-
A:
torch.Tensor
- the matrix to be diagonalized. -
k:
int
- the number of Lanczos vectors requested. In typical applications,k
is far less than the dimension$N$ of the matrixA
. The choice of several hundreds fork
may be satisfactory for$N$ up to 100000. Note thatk
should never exceeds$N$ in any cases.
Only the gradient of the matrix A
will be computed when performing backward AD. The argument k
does not require computing gradients.
import DominantSparseEigenAD.symeig as symeig
symeig.setDominantSparseSymeig(A, Aadjoint_to_padjoint)
dominantsparsesymeig = symeig.DominantSparseSymeig.apply
# Usage
dominantsparsesymeig(p, k, dim)
This primitive is used in the case where the matrix is assumed to be real symmetric and represented in "sparse" form as a normal function. In current version, it will use the Lanczos algorithm to return a tuple (eigval, eigvector)
of the smallest eigenvalue and corresponding eigenvector, both represented as torch.Tensor
s. The matrix should be seen as depending on some parameters of interest.
Parameters:
-
p:
torch.Tensor
- the parameter tensor that the matrix to be diagonalized depends on. -
k:
int
- the number of Lanczos vectors requested. In typical applications,k
is far less than the dimension$N$ of the matrix. The choice of several hundreds fork
may be satisfactory for$N$ up to 100000. Note thatk
should never exceeds$N$ in any cases. -
dim:
int
- the dimension of the matrix.
Only the gradient of the parameter p
will be computed when performing backward AD. The other two arguments k
and dim
do not require computing gradients.
Important Note: To make this primitive work properly, two additional quantities, A
and Aadjoint_to_padjoint
, should be provided by users before the primitive is actually available in the running session: (See the second line of the code above)
-
A
is the matrix to be diagonalized. As noted above,A
should be represented in "sparse" form as a function, which is mathematically known as a linear map that receives a vector as input and returns another vector as output. Both the input and output vectors should betorch.Tensor
s. For example, a diagonal matrix whose diagonal elements are stored in atorch.Tensor
a
can be represented as a function below:def diagonal(v): return a * v
-
Aadjoint_to_padjoint
is another function that receives the adjoint$\overline{A}$ of the matrixA
(i.e., gradient with respect toA
of some scalar loss) as input and returns the adjoint$\overline{p}$ of the parameter(s)p
as output. This function receives two vectors of equal size represented astorch.Tensor
s,v1
andv2
, and computes the adjoint ofp
assuming that the adjoint ofA
can be written as $$ \overline{A} = v_1 v_2^T. $$ For clarity, consider two simple examples:-
A
can be written as the perturbation form: $$ A = A_0 + p A_1. $$ where the parameter$p$ in current case is a scalar. Then we have $$ \begin{align} \overline{p} &= \mathrm{Tr}\left(\overline{A}^T \frac{\partial A}{\partial p}\right) \ &= \mathrm{Tr}\left(v_2 v_1^T A_1\right) \ &= v_1^T A_1 v_2. \end{align} $$ Hence, suppose the matrix$A_1$ is coded as a functionA1
, then the functionAadjoint_to_padjoint
can be implemented as follows:def Aadjoint_to_padjoint(v1, v2): return A1(v2).matmul(v1)
-
A
can be written as $$ A = A_0 + D(\mathbf{p}). $$ where$D$ is a diagonal matrix whose diagonal elements correspond to the parameters$\mathbf{p}$ (which is a vector of size equal to the dimension of the matrixA
). SinceA
depends on$\mathbf{p}$ only through the diagonal matrix$D$ , one can similarly follow the derivation above and obtains $$ \overline{\mathbf{p}} = v_1 \circ v_2. $$ where "$\circ$" denotes the Hadamard pairwise product. The code thus reads:def Aadjoint_to_padjoint(v1, v2): return v1 * v2
-
from DominantSparseEigenAD.eig import DominantEig
dominanteig = DominantEig.apply
# Usage
dominanteig(A, k)
This primitive is used in the case where the matrix is generally non-symmetric and represented in "normal" form as a torch.Tensor
. In current version, it will use the Lanczos algorithm to return a tuple (eigval, leigvector, reigvector)
of the largest-amplitude eigenvalue and corresponding left and right eigenvector, all represented as torch.Tensor
s.
Note: There exist some gauge freedom regarding the normalization of the eigenvectors. For convenience, the conditions
Note: Since PyTorch does not have a native support of complex numbers, users of this primitive have to ensure that the desired largest-amplitude eigenvalue (hence also the corresponding eigenvectors) is real. If this is not the case, an error will be raised.
Parameters:
-
A:
torch.Tensor
- the matrix to be diagonalized. -
k:
int
- the number of Lanczos vectors requested. In typical applications,k
is far less than the dimension$N$ of the matrixA
. The choice of several hundreds fork
may be satisfactory for$N$ up to 100000. Note thatk
should never exceeds$N$ in any cases.
Only the gradient of the matrix A
will be computed when performing backward AD. The argument k
does not require computing gradients.
import DominantSparseEigenAD.eig as eig
eig.setDominantSparseEig(A, AT, Aadjoint_to_padjoint)
dominantsparseeig = eig.DominantSparseEig.apply
# Usage
dominantsparseeig(p, k)
This primitive is used in the case where the matrix is generally non-symmetric and represented in "sparse" form as a scipy LinearOperator. In current version, it will use the Lanczos algorithm to return a tuple (eigval, leigvector, reigvector)
of the largest-amplitude eigenvalue and corresponding left and right eigenvector, all represented as torch.Tensor
s. The matrix should be seen as depending on some parameters of interest.
Note: There exist some gauge freedom regarding the normalization of the eigenvectors. For convenience, the conditions
Note: Since PyTorch does not have a native support of complex numbers, users of this primitive have to ensure that the desired largest-amplitude eigenvalue (hence also the corresponding eigenvectors) is real. If this is not the case, an error will be raised.
Parameters:
-
p:
torch.Tensor
- the parameter tensor that the matrix to be diagonalized depends on. -
k:
int
- the number of Lanczos vectors requested. In typical applications,k
is far less than the dimension$N$ of the matrix. The choice of several hundreds fork
may be satisfactory for$N$ up to 100000. Note thatk
should never exceeds$N$ in any cases.
Only the gradient of the parameter p
will be computed when performing backward AD. The argument k
does not require computing gradients.
Important Note: To make this primitive work properly, three additional quantities, A
, AT
and Aadjoint_to_padjoint
, should be provided by users before the primitive is actually available in the running session: (See the second line of the code above)
-
A
,AT
are the matrix to be diagonalized and its transpose, respectively. As noted above,A
andAT
should be represented in "sparse" form as scipy LinearOperators, which receive a vector as input and returns another vector as output. -
Aadjoint_to_padjoint
is another function that receives the adjoint$\overline{A}$ of the matrixA
(i.e., gradient with respect toA
of some scalar loss) as input, and returns the adjoint$\overline{p}$ of the parameter(s)p
as output. The input should be of the form((u1, v1), (u2, v2), (u3, v3))
, i.e., a tuple of three pairs. The us and vs are all vectors represented asnp.ndarrays
and have size(N,)
, whereN
is the dimension of the matrixA
. This function computes the adjoint ofp
assuming that the adjoint ofA
can be written as $$ \overline{A} = u_1 v_1^T + u_2 v_2^T + u_3 v_3^T. $$ The final result of the adjoint$\overline{p}$ should be returned as atorch.Tensor
.See also the primitive DominantSparseSymeig for some simple examples. For a more complete application, see the VUMPS example.
The present interfaces of the dominant eigensolver primitives are unlikely to cover all needs in real applications. Some useful improvements and further extensions may be made in the following aspects:
-
Generalization of relevant results to the case of complex numbers. (possibly with some AD tools other than PyTorch)
-
Generalization to the case of multiple eigenvalues and corresponding eigenvectors.
-
Implementing reverse-mode AD of truncated SVD by following the similar spirit.
@article{PhysRevB.101.245139,
title = {Automatic differentiation of dominant eigensolver and its applications in quantum physics},
author = {Xie, Hao and Liu, Jin-Guo and Wang, Lei},
journal = {Phys. Rev. B},
volume = {101},
issue = {24},
pages = {245139},
numpages = {14},
year = {2020},
month = {Jun},
publisher = {American Physical Society},
doi = {10.1103/PhysRevB.101.245139},
url = {https://link.aps.org/doi/10.1103/PhysRevB.101.245139}
}