A pretrained single-cell gene expression language model.
- clone this repository:
```
git clone git@github.com:keiserlab/exceiver.git
```
- install the lightweight packaging tool:
```
conda install flit
```
- install this repo in a new environment:
```
flit install -s
```
- see `notebooks/example.ipynb` for loading pretrained models:
```python
from exceiver.models import Exceiver

model = Exceiver.load_from_checkpoint("../pretrained_models/exceiver/pretrained_TS_exceiver.ckpt")
```
- download the Tabula Sapiens dataset from figshare, specifically TabulaSapiens.h5ad.zip (careful: this link will likely begin a download: https://figshare.com/ndownloader/files/34702114)
- run scripts/preprocess.py:
```
python preprocess.py --ts_path /path/to/download/TabulaSapiens.h5ad \
    --out_path /path/to/preprocessed/TabulaSapiens
```
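The authoritative preprocessing steps live in scripts/preprocess.py; purely as an illustration, a common scRNA-seq preprocessing recipe (library-size normalization to counts-per-10k followed by a log1p transform) can be sketched in numpy. This is a hypothetical example and not necessarily what the script does:

```python
import numpy as np

# Hypothetical sketch of a common scRNA-seq preprocessing recipe
# (library-size normalization + log1p); the actual steps used by
# this repo are defined in scripts/preprocess.py.
counts = np.array([[2.0, 0.0, 8.0],
                   [1.0, 3.0, 6.0]])          # cells x genes, raw counts

lib_size = counts.sum(axis=1, keepdims=True)  # total counts per cell
norm = counts / lib_size * 1e4                # counts per 10k
logged = np.log1p(norm)                       # log(1 + x) transform

print(logged.shape)                           # one row per cell, one column per gene
```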
- train a model with scripts/train.py; pytorch_lightning makes distributed training easy and exposes a host of hyperparameters through the CLI:
```
python train.py --name MODELNAME \
    --data_path /path/to/preprocessed/TabulaSapiens \
    --logs path/to/model/logs \
    --frac 0.15 \
    --num_layers 1 \
    --nhead 4 \
    --query_len 128 \
    --batch_size 64 \
    --min_epochs 5 \
    --max_epochs 10 \
    --strategy ddp \
    --gpus 0,1,2,3
```
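The `--frac` flag plausibly sets the fraction of genes hidden per cell during pretraining — an assumption based on standard masked-language-model training, not something this README states. A minimal sketch of sampling such a mask:

```python
import numpy as np

# Hypothetical sketch: sample a boolean mask over genes, hiding a
# fixed fraction (cf. --frac 0.15). Assumes masked-LM-style
# pretraining; the model's real masking logic may differ.
rng = np.random.default_rng(0)
n_genes, frac = 1000, 0.15

masked_idx = rng.choice(n_genes, size=int(frac * n_genes), replace=False)
mask = np.zeros(n_genes, dtype=bool)
mask[masked_idx] = True   # True marks genes the model must predict

print(mask.sum())         # → 150 masked genes
```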