PyTorch implementation of CANAL, a universal cell-type annotation tool that continuously fine-tunes a pretrained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges.
Detailed usage of CANAL can be seen at readthedocs page
- Prepare the preprocessed scRNA-seq data:
gene_align
andnormalize
in thepreprocess
module are required to obtain AnnData objects for network inputs.- Run CANAL at the initial stage: use
CANAL_model.train
in themodel
module by settingcurrent_stage=1
. The model is initialized by the pre-trained model checkpoint on the Panglao dataset- Run CANAL at the incremental stage: use
CANAL_model.train
in themodel
module by settingcurrent_stage
≥1. The model is initialized by the model trained at previous stage- Predict cell types of the test data: use
CANAL_model.predict
in themodel
module to obtain the predicted cell types of the test data- Evaluate model performance: If true cell types of the test data is available, use
CANAL_model.evaluation
in themodel
module to evaluate the performance of current fine-tuned model
There are four examples in the Tutorial
to run CANAL:
- Tutorial 1: Preprocess the raw scRNA-seq datasets
- Tutorial 2: Run CANAL with data stream from various batches
- Tutorial 3: Run CANAL with data stream from different tissues
- Tutorial 4: Apply CANAL on test data with novel cells
lambda
: default 0.1, the strength of representation distillation loss
L
: default 1000, the size of example bank
Link | Description |
---|---|
https://drive.google.com/drive/folders/1BMf-N-k-3aCEY7CJvUcK9nZZ2UD7p3C0?usp=sharing | Datasets of the pancreas experiemnts |
https://drive.google.com/drive/folders/1CaBySV_EFAPPrlpSevEewFds5cjJxC_T?usp=sharing | Datasets of the cross-tissue experiemnts |
https://drive.google.com/drive/folders/1OGMWxR7qTWd_p21d57EyNWv5X48BNN0M?usp=sharing | Datasets of the human immune experiemnts |
The detailed information and download URL of pre-trained model checkpoint, gene2vec embedding and the Panglao dataset used for pre-training can be seen at: https://github.com/TencentAILabHealthcare/scBERT
If you have any questions, please contact: wanhui1997@pku.edu.cn