torch-adata

Create PyTorch Datasets from AnnData

Installation

Install from PyPI (current version: 0.0.24):

pip install torch-adata

Install the development version:

git clone https://github.com/mvinyard/torch-adata.git; cd torch-adata;
pip install -e .

The main API

The primary class is AnnDataset, a subclass of the widely used torch.utils.data.Dataset. Because it is a standard PyTorch Dataset, it can be passed to torch.utils.data.DataLoader, which provides batching, shuffling, and multiprocess data loading. This helps standardize workflows and make them reproducible.
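To make the Dataset contract concrete, here is a minimal sketch of a Dataset subclass fed to a DataLoader. ToyDataset and its random tensors are hypothetical stand-ins, not part of torch-adata; the sketch only assumes that AnnDataset behaves like any other torch.utils.data.Dataset.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in for an AnnDataset: one feature row and one label per cell."""

    def __init__(self, X: torch.Tensor, y: torch.Tensor):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        self.X = X
        self.y = y

    def __len__(self) -> int:
        # Number of samples (cells) in the dataset.
        return self.X.shape[0]

    def __getitem__(self, idx):
        # Return the features and the label for a single sample.
        return self.X[idx], self.y[idx]


torch.manual_seed(0)
toy = ToyDataset(torch.randn(100, 50), torch.randn(100, 1))

# num_workers=0 loads data in the main process; raise it to use worker processes.
loader = DataLoader(toy, batch_size=32, shuffle=True, num_workers=0)

X_batch, y_batch = next(iter(loader))
print(X_batch.shape, y_batch.shape)  # torch.Size([32, 50]) torch.Size([32, 1])
```

Any object that implements `__len__` and `__getitem__` in this way can be handed to a DataLoader, which is what makes a Dataset subclass a convenient interface for training loops.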

import anndata as a
import torch_adata

adata = a.read_h5ad("/path/to/data.h5ad")
dataset = torch_adata.AnnDataset(adata, use_key="X_pca", groupby="time", obs_keys=["affinity"])
[ torch-adata ]: AnnDataset object with 7131 samples
----------------------------------------------------
Grouped by: 'time' with attributes:
 - X (use_key = 'X_pca') torch.Size([3, 7131, 50])
 - obs: affinity: torch.Size([3, 7131, 1])
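The printed shapes suggest that groupby adds a leading group dimension, giving tensors of shape [groups, samples, features]. The sketch below shows one way such a layout could be built with plain PyTorch. The helper stack_by_group is hypothetical and is not torch-adata's implementation; it assumes every group contains the same number of rows, and how torch-adata itself handles unequal group sizes is not covered here.

```python
import torch


def stack_by_group(X: torch.Tensor, groups: list):
    """Stack rows of X into [n_groups, n_per_group, n_features].

    Hypothetical helper: assumes every group has the same number of rows.
    """
    labels = sorted(set(groups))
    blocks = []
    for label in labels:
        # Boolean mask selecting the rows that belong to this group.
        mask = torch.tensor([g == label for g in groups])
        blocks.append(X[mask])
    if len({block.shape[0] for block in blocks}) != 1:
        raise ValueError("all groups must contain the same number of rows")
    return torch.stack(blocks), labels


# Six samples with two features each, observed at three time points.
X = torch.arange(12, dtype=torch.float32).reshape(6, 2)
time = [2, 4, 6, 2, 4, 6]

X_grouped, labels = stack_by_group(X, time)
print(labels)           # [2, 4, 6]
print(X_grouped.shape)  # torch.Size([3, 2, 2])
```

With this layout, X_grouped[0] holds every sample from the first time point, so indexing along the first axis steps through time.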

An alternative approach, AnnLoader, is described by Sergei Rybakov in "Interfacing pytorch models with anndata".

For more information, please visit the documentation!

Problem? Open an issue