DeepSEM

About

This directory contains the code and resources of the following paper:

"Modeling Gene Regulatory Networks Using Neural Network Architectures", published in Nature Computational Science (doi:10.1038/s43588-021-00099-8)

Overview of the Model

We introduce DeepSEM, a deep-learning-based approach with a novel neural network architecture that can infer gene regulatory networks (GRNs), embed scRNA-seq data, and simulate realistic scRNA-seq data by interpreting different modules of the model.

Dependencies

  • python 3.7
  • pytorch==1.2.0
  • scanpy==1.6.0
  • numpy==1.14.5
  • pandas==1.0.0
  • scikit-learn==0.23.2

All dependencies can be installed within a few minutes.

Tutorial

We provide three tutorials in the directory tutorial/ (GRN_inference_tutorial.ipynb, Embedding_tutorial.ipynb, and Simulation_tutorial.ipynb) that introduce the usage of DeepSEM and reproduce the main results of our paper.

Usage

DeepSEM takes an input data file in tsv, csv, or 10X format, or in the h5ad format provided by Scanpy (for tsv and csv, genes are in columns and cells are in rows). The output of DeepSEM depends on the task:

  • GRN inference. A tsv file containing the TF, the target, and the predicted importance of each GRN edge.
  • Embedding. An h5ad file containing the embedding generated by DeepSEM, stored in "X" of the AnnData object, and the low-dimensional representation, stored in "obsm['X_pca']".
  • Simulation. An h5ad file containing the simulation result generated by DeepSEM, stored in "X" of the AnnData object.
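As a minimal sketch of how the GRN inference output could be inspected, the snippet below parses a tab-separated edge list and ranks edges by absolute importance. The column names "TF", "Target", and "EdgeWeight" and the helper top_edges are assumptions made for illustration; check the header of the file that DeepSEM actually writes and adjust accordingly.

```python
import csv
import io


def top_edges(tsv_text, k=3, weight_col="EdgeWeight"):
    """Return the k edges with the largest absolute importance.

    The column names ("TF", "Target", weight_col) are assumed, not
    confirmed by the DeepSEM documentation.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges = [(row["TF"], row["Target"], float(row[weight_col])) for row in reader]
    # Rank by magnitude so that strongly negative weights are not pushed to the end.
    edges.sort(key=lambda edge: abs(edge[2]), reverse=True)
    return edges[:k]


# Toy edge list standing in for a real output file.
example = (
    "TF\tTarget\tEdgeWeight\n"
    "G1\tG2\t0.10\n"
    "G3\tG2\t0.75\n"
    "G1\tG4\t0.40\n"
)

for tf, target, weight in top_edges(example, k=2):
    print(tf, target, weight)
```

For a real run, read the output file with open(path).read() and pass its text to the helper.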

We also provide default hyper-parameters in main.py. Use the -h option or read Hyperparmeter.MD, which introduces the hyper-parameters and provides suggestions for tuning them.

Command to run DeepSEM

  • Gene regulatory network (GRN) inference (both cell-type-specific and cell-type-non-specific GRNs). Note that these commands run the non-ensemble version. We recommend an ensemble strategy: repeat the training process K times (K=10 in our paper) and use the average of the absolute adjacency matrices as the final prediction. Details are shown in tutorial/GRN_inference_tutorial.ipynb.
      python main.py --task celltype_GRN --data_file <scRNA-seq path> --save_name <output path> --setting test
      python main.py --task non_celltype_GRN --data_file <scRNA-seq path> --save_name <output path> --setting test
    Use --setting test to infer a GRN instead of benchmarking.
  • Embedding
      python main.py --task embedding --data_file <scRNA-seq path> --save_name <output path>
  • Simulation
      python main.py --task simulation --data_file <scRNA-seq path> --save_name <output path>
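The ensemble strategy recommended for GRN inference (averaging the absolute adjacency matrices of K independent runs) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code used in the paper: the function name ensemble_adjacency is hypothetical, and each run is assumed to yield a square gene-by-gene matrix with the same gene order. The tutorial notebook remains the reference for the actual procedure.

```python
import numpy as np


def ensemble_adjacency(matrices):
    """Average the element-wise absolute values of several adjacency matrices.

    `matrices` is an iterable of equally shaped (genes x genes) arrays,
    one per training run; the gene order is assumed to be identical.
    """
    stacked = np.stack([np.abs(np.asarray(m, dtype=float)) for m in matrices])
    return stacked.mean(axis=0)


# Two toy "runs" standing in for K trained models (K=10 in the paper).
runs = [
    np.array([[0.0, 1.0], [-1.0, 0.0]]),
    np.array([[0.0, -3.0], [1.0, 0.0]]),
]
avg = ensemble_adjacency(runs)
print(avg)
```

Taking absolute values before averaging keeps edges whose sign differs between runs from cancelling out, so the result reflects edge strength rather than direction of effect.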

Baseline methods

Some notation in the published paper is incorrect. In Equation 1, $X = W^TX + Z$ should be $X = XW^T + Z$. In addition, $H_Z = (I-W)^{-1}Z$ should be $H_Z = (I-W^T)^{-1}Z$. In Equation 5, $L = -E_{q(X)} [\log p(X|Z)] + \beta KL(q(Z|X)||p(Z)) + \alpha ||W||_1$ should be $L = -E_Z [\log p(X|Z)] + \beta KL(q(Z|X)||p(Z)) + \alpha ||W||_1$.

If you have any questions, please feel free to contact me.
Email: shuht96@gmail.com

License

DeepSEM is licensed under the MIT License.