/cuPolar

Newton's and Halley's Method for the Matrix Polar Decomposition using CUDA

Primary LanguageCudaMIT LicenseMIT

cuPolar - Matrix Polar Decomposition using CUDA

License: MIT GitHub Release DOI

Copyright: Maximilian Behr

License: The software is licensed under under MIT. See LICENSE for details.

cuPolar is a CUDA library implementing Newton and Halley's method for the Polar Decomposition of a nonsingular square matrix $A=QH$, where $Q$ is unitary and $H$ is hermitian positive semidefinite.

cuPolar supports real and complex, single and double precision matrices.

Available Functions

Single Precision Functions

int cupolar_sNewtonBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_sNewton(const int n, const float *A, void *d_buffer, void *h_buffer, float *Q, float *H);

int cupolar_sHalleyBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_sHalley(const int n, const float *A, void *d_buffer, void *h_buffer, float *Q, float *H);

Double Precision Functions

int cupolar_dNewtonBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_dNewton(const int n, const double *A, void *d_buffer, void *h_buffer, double *Q, double *H);

int cupolar_dHalleyBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_dHalley(const int n, const double *A, void *d_buffer, void *h_buffer, double *Q, double *H);

Complex Single Precision Functions

int cupolar_cNewtonBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_cNewton(const int n, const cuComplex *A, void *d_buffer, void *h_buffer, cuComplex *Q, cuComplex *H);

int cupolar_cHalleyBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_cHalley(const int n, const cuComplex *A, void *d_buffer, void *h_buffer, cuComplex *Q, cuComplex *H);

Complex Double Precision Functions

int cupolar_zNewtonBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_zNewton(const int n, const cuDoubleComplex *A, void *d_buffer, void *h_buffer, cuDoubleComplex *Q, cuDoubleComplex *H);

int cupolar_zHalleyBufferSize(const int n, size_t *d_bufferSize, size_t *h_bufferSize);
int cupolar_zHalley(const int n, const cuDoubleComplex *A, void *d_buffer, void *h_buffer, cuDoubleComplex *Q, cuDoubleComplex *H);

Algorithm

cuPolar implements the Newton's method with scaling as well as Halley's method with scaling for the approximation of the polar decomposition. The matrix $A$ must be square and nonsingular.

See also Algorithm 3.1

Byers, R., & Xu, H. (2008). A new scaling for Newton's iteration for the polar decomposition and its backward stability. SIAM Journal on Matrix Analysis and Applications, 30(2), 822-843.

and Equations 3.14-3.16

Nakatsukasa, Y., Bai, Z., & Gygi, F. (2010). Optimizing Halley's iteration for computing the matrix polar decomposition. SIAM Journal on Matrix Analysis and Applications, 31(5), 2700-2720.

Installation

Prerequisites:

  • CMake >= 3.23
  • CUDA >= 11.4.2
  mkdir build && cd build
  cmake ..
  make
  make install

Usage and Examples

We provide examples for all supported matrix formats:

File Data
example_cupolar_sNewton.cu real, single precision matrix
example_cupolar_dNewton.cu real, double precision matrix
example_cupolar_cNewton.cu complex, single precision matrix
example_cupolar_zNewton.cu complex, double precision matrix
example_cupolar_sHalley.cu real, single precision matrix
example_cupolar_dHalley.cu real, double precision matrix
example_cupolar_cHalley.cu complex, single precision matrix
example_cupolar_zHalley.cu complex, double precision matrix