scTransSort: Transformers for intelligent annotation of cell types by gene embeddings
The repository contains the source code for the paper scTransSort. Due to github's limit on uploading file size, the trained model is over 500M, so I only uploaded the core code. If you need a trained model for your research, you can contact me by email s20070042@s.upc.edu.cn.
The neural network model is implemented using TensorFlow 2.4.0 and the code is written in python 3.6.
scTransSort accepts scRNA-seq data format: CSV
Take an example of mouse datasets (MCA, https://figshare.com/articles/dataset/MCA_DGE_Data/5435866) analyzed in the manuscript.
read mouse_Brain753_data.csv
The preprocessed files generated in this step can be found in the uploaded data/example data set.csv
In preprocessing, parameters are used:
- filetype defines file type (CSV))
- geneSelectnum selects a number of most variant genes. The default gene number is 2000
- code preprocessing.py
We take an example of an analysis in mouse_Brain753. Here we use parameters to demo purposes:
- batch-size defines batch-size of the cells for training.here we set as 64.
- epoch defines epochs in feature autoencoder, here we set as 100.
- initial_lr defines the Initial learning rate, here we set as 0.001.
- weight_decay defines the weight_decay, here we set as 1e-4.
If you want to reproduce results in the manuscript, please use default parameters.
Souce code: Github
Author email: s20070042@s.upc.edu.cn
We are continuing adding new features. Bug reports or feature requests are welcome.