Primary language: Jupyter Notebook. License: MIT.

scGNN2.0

About

This repository contains the source code for scGNN2.0.

Installation

Installation has been tested on Ubuntu 16.04, CentOS 7, and macOS Catalina with Python 3.8 and one NVIDIA RTX 2080 Ti GPU.

From Source

Start by cloning the source code:

git clone https://github.com/OSU-BMBL/scGNN2.0.git
cd scGNN2.0

Create a Python virtual environment with conda (https://anaconda.org/):

conda create -n scgnnEnv python=3.8 pip
conda activate scgnnEnv
pip install -r requirements.txt

To use LTMG (recommended but optional; it adds extra time to data preprocessing):

conda install r-devtools
conda install -c cyz931123 r-scgnnltmg

Quick Start

scGNN2.0 accepts the standard scRNA-seq data formats, CSV and 10X, and also provides an interface for R and Seurat users: it accepts SeuratObject and Rdata as input.

CSV format

We provide a sample dataset, Goolam, in the sampleData folder. To run scGNN2.0 with Goolam as input (the arguments are documented in scGNN_v2.py):

mkdir outputs
python -W ignore scGNN_v2.py \
    --given_cell_type_labels \
    --load_use_benchmark \
    --load_dataset_dir ./sampleData \
    --load_dataset_name Goolam \
    --output_dir ./outputs
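
When the same run has to be launched for several datasets, it can help to assemble the command in Python rather than retyping it. The following is a minimal sketch, not part of scGNN2.0: the helper name build_scgnn_command is hypothetical, and it uses only the flags that appear in the command above.

```python
import shlex


def build_scgnn_command(dataset_dir, dataset_name, output_dir):
    """Assemble the argument list for the benchmark CSV run shown above.

    Hypothetical helper: it only mirrors the flags from the README command.
    """
    return [
        "python", "-W", "ignore", "scGNN_v2.py",
        "--given_cell_type_labels",
        "--load_use_benchmark",
        "--load_dataset_dir", dataset_dir,
        "--load_dataset_name", dataset_name,
        "--output_dir", output_dir,
    ]


cmd = build_scgnn_command("./sampleData", "Goolam", "./outputs")
# Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

To actually launch the run from Python, the list can be passed to subprocess.run(cmd, check=True) from the repository root.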

10X format

As an example, take the liver cellular landscape study from the Human Cell Atlas (https://data.humancellatlas.org/). Click the download link for 'homo_sapiens.mtx.zip' on the page to get 4d6f6c96-2a83-43d8-8fe1-0f53bffd4674.homo_sapiens.mtx.zip. (A direct download link no longer appears to be provided.)

mkdir liver
cd liver
mv ~/Download/4d6f6c96-2a83-43d8-8fe1-0f53bffd4674.homo_sapiens.mtx.zip .
unzip 4d6f6c96-2a83-43d8-8fe1-0f53bffd4674.homo_sapiens.mtx.zip
cd ..

To run scGNN2.0 with 10X data:

python scGNN_v2.py --load_from_10X 4d6f6c96-2a83-43d8-8fe1-0f53bffd4674.homo_sapiens.mtx \
    --output_dir ./outputs \
    --total_epoch 31 --feature_AE_epoch 500 300 \
    --output_intermediate 
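
Before starting a long run, it can be worth checking the dimensions of the sparse matrix. The .mtx files in 10X-style directories use the Matrix Market coordinate format, where the first non-comment line gives the number of rows, columns, and non-zero entries. The sketch below reads that line from an in-memory example; the tiny matrix is illustrative only. In Cell Ranger output the rows are genes and the columns are cell barcodes, but whether the downloaded archive follows the same layout is an assumption to verify.

```python
import io


def read_mtx_shape(handle):
    """Return (n_rows, n_cols, n_nonzero) from a Matrix Market coordinate file."""
    banner = handle.readline()
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("not a Matrix Market file")
    for line in handle:
        if line.startswith("%") or not line.strip():
            continue  # skip comment and blank lines before the size line
        n_rows, n_cols, n_nonzero = (int(token) for token in line.split())
        return n_rows, n_cols, n_nonzero
    raise ValueError("missing size line")


# Tiny illustrative matrix, not real data: 3 rows, 2 columns, 4 entries.
sample = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "% comment line\n"
    "3 2 4\n"
    "1 1 5\n"
    "2 1 1\n"
    "3 2 7\n"
    "1 2 2\n"
)
shape = read_mtx_shape(sample)
print(shape)  # (3, 2, 4)
```

For a file on disk, open it in text mode and pass the handle, for example with open(path) as handle: read_mtx_shape(handle).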

SeuratObject

Seurat is a widely used R toolkit for single-cell genomics. Our program provides an interface for Seurat users.

If you already have a SeuratObject, you can export its raw counts to a .csv file:

write.table(as.matrix(GetAssayData(object = yourSeuratObject, slot = "counts")), 
        '~/counts.csv', 
        sep = ',', row.names = T, col.names = T, quote = F)

Please note that the default input matrix of scGNN2.0 is a cell (rows) by gene (columns) matrix. In Seurat, GetAssayData returns a matrix whose rows are features (genes) and whose columns are cells, so the command above writes a gene-by-cell matrix. To get a cell-by-gene matrix, transpose it with R's t() function:

write.table(t(as.matrix(GetAssayData(object = yourSeuratObject, slot = "counts"))),
            '~/counts.csv',
            sep = ',', row.names = TRUE, col.names = TRUE, quote = FALSE)

Before running scGNN2.0, please make sure your input matrix is a cell-by-gene matrix.
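
If a gene-by-cell CSV has already been exported, it can also be transposed outside R. The sketch below is a minimal, standard-library illustration, not part of scGNN2.0; the helper name transpose_counts_csv and the file names are hypothetical. It accounts for the fact that R's write.table with row.names = TRUE and col.names = TRUE writes a header that is one field shorter than the data rows. It loads the whole file into memory, so it suits small to medium matrices.

```python
import csv
import os
import tempfile


def transpose_counts_csv(src_path, dst_path):
    """Transpose a labeled CSV matrix, e.g. gene-by-cell to cell-by-gene.

    Returns the (rows, columns) of the data in the transposed file.
    """
    with open(src_path, newline="") as handle:
        rows = list(csv.reader(handle))
    header, body = rows[0], rows[1:]
    # R's write.table omits the header entry for the row-name column,
    # so pad the header with an empty corner cell when it is one short.
    if body and len(header) == len(body[0]) - 1:
        header = [""] + header
    table = [header] + body
    transposed = [list(column) for column in zip(*table)]
    with open(dst_path, "w", newline="") as handle:
        csv.writer(handle).writerows(transposed)
    return len(transposed) - 1, len(transposed[0]) - 1


# Demonstration on a tiny made-up matrix: 3 genes by 2 cells.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "genes_by_cells.csv")
    dst = os.path.join(tmp, "cells_by_genes.csv")
    with open(src, "w", newline="") as handle:
        handle.write("cell1,cell2\ngeneA,1,2\ngeneB,3,4\ngeneC,5,6\n")
    shape = transpose_counts_csv(src, dst)
    with open(dst, newline="") as handle:
        result_rows = list(csv.reader(handle))

print(shape)  # (2, 3): 2 cells by 3 genes
```

The transposed file keeps an explicit empty corner cell in its header; whether the scGNN2.0 loader expects that layout is an assumption worth checking against a small test file.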

Then run scGNN2.0 on this CSV file:

python scGNN_v2.py --load_seurat_object ~/counts.csv \
    --output_dir your_output_dir \
    --total_epoch 31 --feature_AE_epoch 500 300 \
    --output_intermediate 

Rdata format

To run scGNN2.0 on an Rdata file:

python scGNN_v2.py --load_rdata ~/data.Rdata \
    --output_dir your_output_dir \
    --total_epoch 31 --feature_AE_epoch 500 300 \
    --output_intermediate 

To generate results in Rdata format, add the --output_rdata flag:

python scGNN_v2.py --load_rdata ~/counts.Rdata \
    --output_dir your_output_dir --output_rdata

Visualization

We provide visualization code in Viz/plot_test.pynb, including a heat map, a graph, a cell-cell graph, and a Sankey diagram.

Visualization with Seurat

You can also load the cell embeddings generated by scGNN2.0 and visualize them in Seurat:

embeddings <- read.csv(file = your_embedding_path, header = TRUE, row.names = 1)
pbmc[["pca"]] <- CreateDimReducObject(embeddings = as.matrix(embeddings), key = "embedding_", assay = DefaultAssay(pbmc))
pbmc <- RunUMAP(pbmc, dims = 1:128)
DimPlot(pbmc, reduction = "umap")
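
The RunUMAP call above uses dims = 1:128, which presumes a 128-dimensional embedding. A quick way to confirm the dimensionality of an embedding file before plotting is sketched below. It is a minimal illustration, not part of scGNN2.0: the helper name embedding_shape is hypothetical, and the assumed layout (a header row, with cell names in the first column) is inferred from the read.csv(header = TRUE, row.names = 1) call.

```python
import csv
import io


def embedding_shape(handle):
    """Return (n_cells, n_dims) for an embedding CSV.

    Assumes a header row and cell names in the first column.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    n_cells = 0
    n_dims = None
    for row in reader:
        if not row:
            continue  # ignore blank lines
        dims = len(row) - 1  # the first field is the cell name
        if n_dims is None:
            n_dims = dims
        elif dims != n_dims:
            raise ValueError("rows have inconsistent numbers of dimensions")
        n_cells += 1
    return n_cells, n_dims


# Made-up example: 2 cells with a 3-dimensional embedding.
sample = io.StringIO(
    ",embedding_1,embedding_2,embedding_3\n"
    "cell1,0.1,0.2,0.3\n"
    "cell2,0.4,0.5,0.6\n"
)
n_cells, n_dims = embedding_shape(sample)
print(n_cells, n_dims)  # 2 3
```

The reported n_dims is the upper bound to use in dims = 1:n_dims when calling RunUMAP.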