

License: MIT

sparseconverter

Converter matrix for a range of array formats (backends) in Python, focusing on sparse arrays.

This library is targeted at projects that want to support a wide range of array formats as input, output or for calculations. Array libraries already support format detection, creation and export from and to various formats, but with different APIs, different sets of formats and different sets of supported features (dtypes, shapes, device classes, etc.).

For example, a sparse.COO array arr can be converted efficiently to cupyx.scipy.sparse.coo_matrix via cupyx.scipy.sparse.coo_matrix(arr.to_scipy_sparse()). However, both scipy.sparse.coo_matrix and cupyx.scipy.sparse.coo_matrix only support 2D arrays. On top of that, cupyx.scipy.sparse.coo_matrix only supports floating-point dtypes and bool.
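Because the SciPy and CuPy COO matrices are 2D-only, an nD sparse array has to be reshaped before it can take that route. The sketch below shows one way to do this on raw COO coordinates (one row of indices per axis, as in sparse.COO.coords), in pure Python so it runs without SciPy or CuPy. Keeping axis 0 and flattening the remaining axes in C order is an assumption for illustration, and flatten_coo_to_2d is a hypothetical helper, not part of sparseconverter's API (math.prod needs Python 3.8+).

```python
from math import prod
from typing import List, Sequence, Tuple


def flatten_coo_to_2d(
    coords: Sequence[Sequence[int]], shape: Sequence[int]
) -> Tuple[List[int], List[int], Tuple[int, int]]:
    """Hypothetical helper: map nD COO coordinates onto a 2D shape.

    Axis 0 stays the row axis; all remaining axes are flattened in
    C (row-major) order into a single column axis.
    """
    if not shape or len(coords) != len(shape):
        raise ValueError("coords must have one row per axis of a non-empty shape")
    tail = list(shape[1:])
    n_cols = prod(tail)  # prod([]) == 1, so 1D input maps to shape (n, 1)
    rows = list(coords[0])
    cols = []
    for i in range(len(rows)):
        col = 0
        for axis, size in enumerate(tail, start=1):
            col = col * size + coords[axis][i]  # Horner-style row-major index
        cols.append(col)
    return rows, cols, (shape[0], n_cols)


# A 2x3x4 array with two nonzeros at (0, 2, 3) and (1, 0, 1).
rows, cols, shape_2d = flatten_coo_to_2d([[0, 1], [2, 0], [3, 1]], (2, 3, 4))
print(rows, cols, shape_2d)  # prints [0, 1] [11, 1] (2, 12)
```

This is the same index mapping that a C-order reshape to (shape[0], -1) produces; converting back to the original nD shape needs the inverse mapping.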

This project creates a unified API for all conversions between the supported formats and takes care of details such as choosing an efficient intermediate format, reshaping and dtype conversion.
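One way to picture a converter matrix that picks efficient intermediate formats is as a weighted graph: formats are nodes, direct converters are edges with a relative cost, and the converter for a pair is the cheapest chain of edges. The sketch below assumes exactly that model, using Dijkstra's algorithm; ConverterGraph, its methods and the costs in the demo are hypothetical and not sparseconverter's actual implementation. The demo converters only record which step ran, and the format names are string placeholders.

```python
import heapq
from typing import Any, Callable, Dict, List, Tuple

Converter = Callable[[Any], Any]


class ConverterGraph:
    """Hypothetical registry of direct converters between named formats.

    get_converter() composes the cheapest chain of registered converters,
    so an intermediate format is used whenever that chain is cheaper.
    """

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, float, Converter]]] = {}

    def register(self, source: str, target: str, func: Converter, cost: float = 1.0) -> None:
        self._edges.setdefault(source, []).append((target, cost, func))

    def _cheapest_path(self, source: str, target: str) -> List[Converter]:
        # Dijkstra's algorithm over the format graph.
        counter = 0  # tie-breaker so heapq never compares the function lists
        queue: List[Tuple[float, int, str, List[Converter]]] = [(0.0, counter, source, [])]
        best: Dict[str, float] = {source: 0.0}
        while queue:
            cost, _, fmt, funcs = heapq.heappop(queue)
            if fmt == target:
                return funcs
            if cost > best.get(fmt, float("inf")):
                continue  # stale queue entry, a cheaper route was already found
            for nxt, step_cost, func in self._edges.get(fmt, []):
                new_cost = cost + step_cost
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    counter += 1
                    heapq.heappush(queue, (new_cost, counter, nxt, funcs + [func]))
        raise ValueError(f"no conversion path from {source!r} to {target!r}")

    def get_converter(self, source: str, target: str) -> Converter:
        funcs = self._cheapest_path(source, target)  # empty list means identity

        def convert(arr: Any) -> Any:
            for func in funcs:
                arr = func(arr)
            return arr

        return convert


# Toy usage: each "converter" appends the step it performed; costs are made up.
graph = ConverterGraph()
graph.register("sparse.COO", "scipy.coo", lambda steps: steps + ["COO->scipy"], cost=1.0)
graph.register("scipy.coo", "cupyx.coo", lambda steps: steps + ["scipy->cupyx"], cost=1.0)
graph.register("sparse.COO", "cupyx.coo", lambda steps: steps + ["COO->cupyx"], cost=10.0)
print(graph.get_converter("sparse.COO", "cupyx.coo")([]))  # prints ['COO->scipy', 'scipy->cupyx']
```

Resolving the path once in get_converter() and returning a plain function keeps the per-call overhead to the conversions themselves, which matters when the same converter is applied to many arrays.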

Features

  • Supports Python 3.6 - 3.10
  • Defines constants for format identifiers
  • Various sets to group formats into categories:
    • Dense vs sparse
    • CPU vs CuPy-based
    • nD vs 2D backends
  • Efficiently detect the format of an array, including support for subclasses
  • Get a converter function for a pair of formats
  • Convert to a target format
  • Find the most efficient conversion pair for a range of possible inputs and/or outputs
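Several of the features above can be illustrated with a small, stdlib-only sketch: format constants, category sets, subclass-aware detection and choosing the cheapest pair among candidate inputs and outputs. Plain list and dict stand in for array types, and every identifier, cost and function name here is hypothetical; none of it reflects sparseconverter's real constants or API. Walking type(arr).__mro__ is one assumed way to make detection subclass-aware: it checks the class itself first, then its bases in method resolution order.

```python
from collections import OrderedDict
from typing import Any, Dict, FrozenSet, Iterable, Tuple

# Hypothetical format identifiers (stand-ins for real array formats).
DENSE_LIST = "dense_list"    # list of lists
DOK_DICT = "dok_dict"        # {(row, col): value}
ORDERED_DOK = "ordered_dok"  # OrderedDict variant, registered separately

# Hypothetical category sets.
DENSE_FORMATS: FrozenSet[str] = frozenset({DENSE_LIST})
SPARSE_FORMATS: FrozenSet[str] = frozenset({DOK_DICT, ORDERED_DOK})

_TYPE_TO_FORMAT: Dict[type, str] = {
    list: DENSE_LIST,
    dict: DOK_DICT,
    OrderedDict: ORDERED_DOK,
}


def detect_format(arr: Any) -> str:
    """Return the format of the most specific registered class of arr."""
    # One dict lookup per class in the MRO, so unregistered subclasses
    # fall back to their nearest registered ancestor.
    for cls in type(arr).__mro__:
        fmt = _TYPE_TO_FORMAT.get(cls)
        if fmt is not None:
            return fmt
    raise ValueError(f"unsupported array type: {type(arr)!r}")


def cheapest_pair(
    sources: Iterable[str], targets: Iterable[str], cost: Dict[Tuple[str, str], float]
) -> Tuple[str, str]:
    """Pick the (source, target) pair with the lowest cost; absent pairs are unsupported."""
    targets = list(targets)
    candidates = [(cost[(s, t)], s, t) for s in sources for t in targets if (s, t) in cost]
    if not candidates:
        raise ValueError("no supported conversion pair")
    _, s, t = min(candidates)
    return s, t


class MyList(list):
    """A subclass that was never registered explicitly."""


print(detect_format(MyList()))              # prints dense_list
print(detect_format(OrderedDict()))         # prints ordered_dok
print(detect_format({}) in SPARSE_FORMATS)  # prints True
```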

Supported array formats

Still TODO

  • cupyx.scipy.sparse formats with dtype bool
  • PyTorch arrays
  • SciPy sparse arrays, as opposed to SciPy sparse matrices

Notes

This project is developed primarily for sparse data support in LiberTEM. For that reason it includes the backend CUDA, which denotes a NumPy array intended for execution on a CUDA device.