/DominantSparseEigenAD

Reverse-mode AD of dominant sparse eigensolver using Pytorch.

Primary LanguagePythonApache License 2.0Apache-2.0

DominantSparseEigenAD

Build Status

DominantSparseEigenAD is an extension of PyTorch that handles reverse-mode automatic differentiation of dominant eigen-decomposition process.

In many researches and applications involving matrix diagonalization, typically in the context of eigenvalue problem in quantum mechanics, only a small proportion of eigenvalues (e.g., the smallest ones) and corresponding eigenvectors are of practical interest. This library provides corresponding primitives in the framework of PyTorch to automatically differentiate this process, without any direct access to the full spectrum of the matrix being diagonalized.

Installation

$ pip install DominantSparseEigenAD

Examples

Please check out the examples directory.

Usage

The library provides several torch.autograd.Function primitives for typical use cases. Specifically, the matrix to be diagonalized can be either assumed to be (real) symmetric or not. It can also be represented either as a normal torch.Tensor or in a sparse form, such as a function or scipy LinearOperator.

Typically, to make use of a primitive, simply call the apply method of it. For example, the following code

dominantsymeig = DominantSymeig.apply

will create a function named dominantsymeig corresponding to the primitive DominantSymeig, which can then be directly invoked in a computation process.

DominantSymeig

from DominantSparseEigenAD.symeig import DominantSymeig

dominantsymeig = DominantSymeig.apply

# Usage
dominantsymeig(A, k)

This primitive is used in the case where the matrix is assumed to be real symmetric and represented in "normal" form as a torch.Tensor. In current version, it will use the Lanczos algorithm to return a tuple (eigval, eigvector) of the smallest eigenvalue and corresponding eigenvector, both represented as torch.Tensors.

Parameters:

  • A: torch.Tensor - the matrix to be diagonalized.
  • k: int - the number of Lanczos vectors requested. In typical applications, k is far less than the dimension $N$ of the matrix A. The choice of several hundreds for k may be satisfactory for $N$ up to 100000. Note that k should never exceeds $N$ in any cases.

Only the gradient of the matrix A will be computed when performing backward AD. The argument k does not require computing gradients.

DominantSparseSymeig

import DominantSparseEigenAD.symeig as symeig

symeig.setDominantSparseSymeig(A, Aadjoint_to_padjoint)
dominantsparsesymeig = symeig.DominantSparseSymeig.apply

# Usage
dominantsparsesymeig(p, k, dim)

This primitive is used in the case where the matrix is assumed to be real symmetric and represented in "sparse" form as a normal function. In current version, it will use the Lanczos algorithm to return a tuple (eigval, eigvector) of the smallest eigenvalue and corresponding eigenvector, both represented as torch.Tensors. The matrix should be seen as depending on some parameters of interest.

Parameters:

  • p: torch.Tensor - the parameter tensor that the matrix to be diagonalized depends on.
  • k: int - the number of Lanczos vectors requested. In typical applications, k is far less than the dimension $N$ of the matrix. The choice of several hundreds for k may be satisfactory for $N$ up to 100000. Note that k should never exceeds $N$ in any cases.
  • dim: int - the dimension of the matrix.

Only the gradient of the parameter p will be computed when performing backward AD. The other two arguments k and dim do not require computing gradients.

Important Note: To make this primitive work properly, two additional quantities, A and Aadjoint_to_padjoint, should be provided by users before the primitive is actually available in the running session: (See the second line of the code above)

  • A is the matrix to be diagonalized. As noted above, A should be represented in "sparse" form as a function, which is mathematically known as a linear map that receives a vector as input and returns another vector as output. Both the input and output vectors should be torch.Tensors. For example, a diagonal matrix whose diagonal elements are stored in a torch.Tensor a can be represented as a function below:

    def diagonal(v):
        return a * v
  • Aadjoint_to_padjoint is another function that receives the adjoint $\overline{A}$ of the matrix A (i.e., gradient with respect to A of some scalar loss) as input and returns the adjoint $\overline{p}$ of the parameter(s) p as output. This function receives two vectors of equal size represented as torch.Tensors, v1 and v2, and computes the adjoint of p assuming that the adjoint of A can be written as $$ \overline{A} = v_1 v_2^T. $$ For clarity, consider two simple examples:

    1. A can be written as the perturbation form: $$ A = A_0 + p A_1. $$ where the parameter $p$ in current case is a scalar. Then we have $$ \begin{align} \overline{p} &= \mathrm{Tr}\left(\overline{A}^T \frac{\partial A}{\partial p}\right) \ &= \mathrm{Tr}\left(v_2 v_1^T A_1\right) \ &= v_1^T A_1 v_2. \end{align} $$ Hence, suppose the matrix $A_1$ is coded as a function A1, then the function Aadjoint_to_padjoint can be implemented as follows:

      def Aadjoint_to_padjoint(v1, v2):
          return A1(v2).matmul(v1)
    2. A can be written as $$ A = A_0 + D(\mathbf{p}). $$ where $D$ is a diagonal matrix whose diagonal elements correspond to the parameters $\mathbf{p}$ (which is a vector of size equal to the dimension of the matrix A). Since A depends on $\mathbf{p}$ only through the diagonal matrix $D$, one can similarly follow the derivation above and obtains $$ \overline{\mathbf{p}} = v_1 \circ v_2. $$ where "$\circ$" denotes the Hadamard pairwise product. The code thus reads:

      def Aadjoint_to_padjoint(v1, v2):
          return v1 * v2

DominantEig

from DominantSparseEigenAD.eig import DominantEig

dominanteig = DominantEig.apply

# Usage
dominanteig(A, k)

This primitive is used in the case where the matrix is generally non-symmetric and represented in "normal" form as a torch.Tensor. In current version, it will use the Lanczos algorithm to return a tuple (eigval, leigvector, reigvector) of the largest-amplitude eigenvalue and corresponding left and right eigenvector, all represented as torch.Tensors.

Note: There exist some gauge freedom regarding the normalization of the eigenvectors. For convenience, the conditions $l^T r = 1$ and $r^T r = 1$ are imposed, where $l, r$ are the left and right eigenvector, respectively.

Note: Since PyTorch does not have a native support of complex numbers, users of this primitive have to ensure that the desired largest-amplitude eigenvalue (hence also the corresponding eigenvectors) is real. If this is not the case, an error will be raised.

Parameters:

  • A: torch.Tensor - the matrix to be diagonalized.
  • k: int - the number of Lanczos vectors requested. In typical applications, k is far less than the dimension $N$ of the matrix A. The choice of several hundreds for k may be satisfactory for $N$ up to 100000. Note that k should never exceeds $N$ in any cases.

Only the gradient of the matrix A will be computed when performing backward AD. The argument k does not require computing gradients.

DominantSparseEig

import DominantSparseEigenAD.eig as eig

eig.setDominantSparseEig(A, AT, Aadjoint_to_padjoint)
dominantsparseeig = eig.DominantSparseEig.apply

# Usage
dominantsparseeig(p, k)

This primitive is used in the case where the matrix is generally non-symmetric and represented in "sparse" form as a scipy LinearOperator. In current version, it will use the Lanczos algorithm to return a tuple (eigval, leigvector, reigvector) of the largest-amplitude eigenvalue and corresponding left and right eigenvector, all represented as torch.Tensors. The matrix should be seen as depending on some parameters of interest.

Note: There exist some gauge freedom regarding the normalization of the eigenvectors. For convenience, the conditions $l^T r = 1$ and $r^T r = 1$ are imposed, where $l, r$ are the left and right eigenvector, respectively.

Note: Since PyTorch does not have a native support of complex numbers, users of this primitive have to ensure that the desired largest-amplitude eigenvalue (hence also the corresponding eigenvectors) is real. If this is not the case, an error will be raised.

Parameters:

  • p: torch.Tensor - the parameter tensor that the matrix to be diagonalized depends on.
  • k: int - the number of Lanczos vectors requested. In typical applications, k is far less than the dimension $N$ of the matrix. The choice of several hundreds for k may be satisfactory for $N$ up to 100000. Note that k should never exceeds $N$ in any cases.

Only the gradient of the parameter p will be computed when performing backward AD. The argument k does not require computing gradients.

Important Note: To make this primitive work properly, three additional quantities, A, AT and Aadjoint_to_padjoint, should be provided by users before the primitive is actually available in the running session: (See the second line of the code above)

  • A, AT are the matrix to be diagonalized and its transpose, respectively. As noted above, A and AT should be represented in "sparse" form as scipy LinearOperators, which receive a vector as input and returns another vector as output.

  • Aadjoint_to_padjoint is another function that receives the adjoint $\overline{A}$ of the matrix A (i.e., gradient with respect to A of some scalar loss) as input, and returns the adjoint $\overline{p}$ of the parameter(s) p as output. The input should be of the form ((u1, v1), (u2, v2), (u3, v3)), i.e., a tuple of three pairs. The us and vs are all vectors represented as np.ndarrays and have size (N,), where N is the dimension of the matrix A. This function computes the adjoint of p assuming that the adjoint of A can be written as $$ \overline{A} = u_1 v_1^T + u_2 v_2^T + u_3 v_3^T. $$ The final result of the adjoint $\overline{p}$ should be returned as a torch.Tensor.

    See also the primitive DominantSparseSymeig for some simple examples. For a more complete application, see the VUMPS example.

Outlook

The present interfaces of the dominant eigensolver primitives are unlikely to cover all needs in real applications. Some useful improvements and further extensions may be made in the following aspects:

  • Generalization of relevant results to the case of complex numbers. (possibly with some AD tools other than PyTorch)

  • Generalization to the case of multiple eigenvalues and corresponding eigenvectors.

  • Implementing reverse-mode AD of truncated SVD by following the similar spirit.

To cite

@article{PhysRevB.101.245139,
  title = {Automatic differentiation of dominant eigensolver and its applications in quantum physics},
  author = {Xie, Hao and Liu, Jin-Guo and Wang, Lei},
  journal = {Phys. Rev. B},
  volume = {101},
  issue = {24},
  pages = {245139},
  numpages = {14},
  year = {2020},
  month = {Jun},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevB.101.245139},
  url = {https://link.aps.org/doi/10.1103/PhysRevB.101.245139}
}