This repository contains scripts to run and benchmark scRNA-seq cell cluster labeling methods. It is a companion to our paper 'Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data' (Diaz-Mejia JJ et al. (2019), preprint at F1000Research, https://doi.org/10.12688/f1000research.18490.1).

Script name | Task(s) |
---|---|
subsamples_gene_classes_and_runs_enrichment_scripts.R | Main wrapper to run and benchmark cell cluster labeling methods |


Script name | Task(s) |
---|---|
obtains_CIBERSORT_for_MatrixColumns.pl | Runs CIBERSORT using gene expression signatures and a matrix with average gene expressions per gene, per cell cluster |
obtains_GSEA_for_MatrixColumns.pl | Runs GSEA using gene expression signatures and a matrix with average gene expressions per cell cluster |
obtains_GSVA_for_MatrixColumns.R | Runs GSVA using gene expression signatures and a matrix with average gene expressions per cell cluster |
obtains_METANEIGHBOR_for_MatrixColumns.R | Runs MetaNeighborUS using gene expression signatures, a matrix with average gene expressions per cell cluster, and reference cell types. Note: modifications were made to the R library(metaneighbor) source code. Check bin/r_programs/obtains_METANEIGHBOR_for_MatrixColumns.R for details |
obtains_ORA_for_MatrixColumns.pl | Runs ORA using gene expression signatures and a matrix with average gene expressions per cell cluster |

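
Over-representation analysis (ORA) is commonly formulated as a one-sided hypergeometric test on the overlap between a signature gene set and a query gene set. The Python sketch below shows that general formulation only. It is not taken from obtains_ORA_for_MatrixColumns.pl, and how that script derives a query gene set from the average expression matrix is not described here. The function name `hypergeom_upper_tail` and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, set_size, query_size, universe_size):
    """P(X >= overlap) for a hypergeometric draw.

    universe_size -- number of genes considered in total
    set_size      -- genes in the signature gene set
    query_size    -- genes selected for one cell cluster (assumed input)
    overlap       -- genes shared by the signature and the query
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers, invented for illustration: 10 genes in the universe,
# a 4-gene signature, a 3-gene query, and 2 genes in common.
p_value = hypergeom_upper_tail(overlap=2, set_size=4, query_size=3, universe_size=10)
print(p_value)
```

A small p-value would suggest that the signature is over-represented among the query genes of that cluster.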

Script name | Task(s) |
---|---|
obtains_performance_plots_from_cluster_labelings.pl | Compiles results from cell type labeling methods and obtains ROC and PR curve plots and AUCs |
obtains_ROC_and_PR_curves_from_matrix_with_gold_standards.R | Obtains ROC and PR curve plots, ROC AUC and PR AUC values from a matrix of reference labels in column 2 and predictions in columns 3 to N |

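
To make the matrix layout above concrete (reference labels in column 2, predictions in columns 3 to N), here is a minimal Python sketch that computes one ROC AUC per prediction column. It is an illustration, not the R script's implementation. It assumes that column 1 holds an identifier, that reference labels are coded as 1 and 0, and that higher scores mean stronger predictions. The function names and the toy rows are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    positives = [score for label, score in zip(labels, scores) if label == 1]
    negatives = [score for label, score in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_per_method(rows, header):
    """rows: [identifier, reference_label, score_1, ..., score_N] as strings."""
    labels = [int(row[1]) for row in rows]
    return {
        header[col]: roc_auc(labels, [float(row[col]) for row in rows])
        for col in range(2, len(header))
    }


# Toy matrix, invented for illustration only.
header = ["pair", "gold", "methodA", "methodB"]
rows = [
    ["c1_Hepatocyte", "1", "0.9", "0.2"],
    ["c1_T_cell", "0", "0.1", "0.8"],
    ["c2_T_cell", "1", "0.8", "0.6"],
    ["c2_Hepatocyte", "0", "0.3", "0.6"],
]
aucs = auc_per_method(rows, header)
print(aucs)
```

The pairwise comparison is quadratic in the number of rows, which is fine for a sketch but not for large matrices.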

Script name | Task(s) |
---|---|
obtains_permuted_samples_from_gmt.R | Subsamples genes from gene expression signatures in the form of gene sets |
propagates_permuted_gmt_files_to_profile.R | Propagates subsampling from signatures in the form of gene sets to those in the form of gene expression profiles |

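
As an illustration of subsampling genes from gene sets, the Python sketch below keeps a random fraction of the genes in each line of a GMT file (tab-separated: set name, description, then genes). It is not the logic of obtains_permuted_samples_from_gmt.R. The function name, the `fraction` and `seed` parameters, and the rule of keeping at least one gene are assumptions made for this example.

```python
import random


def subsample_gene_sets(gmt_lines, fraction, seed=0):
    """Keep a random `fraction` of the genes in each GMT gene set.

    Sampling is without replacement; at least one gene is always kept.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    subsampled = []
    for line in gmt_lines:
        name, description, *genes = line.rstrip("\n").split("\t")
        n_keep = max(1, round(len(genes) * fraction))
        kept = sorted(rng.sample(genes, n_keep))
        subsampled.append("\t".join([name, description, *kept]))
    return subsampled


# Toy gene sets, invented for illustration only.
gmt_lines = [
    "Hepatocyte\tliver signature\tALB\tAPOA1\tTTR\tSERPINA1",
    "T_cell\tblood signature\tCD3E\tCD3D",
]
subsampled = subsample_gene_sets(gmt_lines, fraction=0.5, seed=1)
for line in subsampled:
    print(line)
```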

Script name | Task(s) |
---|---|
obtains_average_gene_expression_per_cluster.R | Obtains a matrix with average gene expressions per cell cluster from scRNA-seq data and cell cluster assignments |

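
As a rough illustration of what a matrix with average gene expressions per cell cluster contains, the Python sketch below averages a tiny, made-up expression table by cluster. It is not the code in obtains_average_gene_expression_per_cluster.R. The function name, the dictionary-based input layout, and the use of a plain arithmetic mean are assumptions.

```python
from collections import defaultdict


def average_expression_per_cluster(expression, cell_to_cluster):
    """Return {gene: {cluster: mean expression}}.

    expression      -- {gene: {cell_barcode: value}} (hypothetical layout)
    cell_to_cluster -- {cell_barcode: cluster_id}
    """
    averages = {}
    for gene, per_cell in expression.items():
        totals = defaultdict(float)
        counts = defaultdict(int)
        for cell, value in per_cell.items():
            cluster = cell_to_cluster[cell]
            totals[cluster] += value
            counts[cluster] += 1
        averages[gene] = {c: totals[c] / counts[c] for c in totals}
    return averages


# Toy data, invented for illustration only.
expression = {
    "ALB": {"cell1": 4.0, "cell2": 6.0, "cell3": 0.0},
    "CD3E": {"cell1": 0.0, "cell2": 0.0, "cell3": 3.0},
}
cell_to_cluster = {"cell1": "c1", "cell2": "c1", "cell3": "c2"}

cluster_means = average_expression_per_cluster(expression, cell_to_cluster)
print(cluster_means)
```

Each gene ends up with one value per cluster, which is the shape the labeling scripts above take as input.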
- To see the help of R scripts, run them like:
  `Rscript ~/path_to_script/script.R -h`
- To see the help of Perl scripts, make the files executable with `chmod +x script.pl` and run them like:
  `~/path_to_script/script.pl`
- Check each script's code for dependencies and further documentation.
- To install the R packages from CRAN use:
  `install.packages(c("optparse", "vioplot", "GSA", "data.table", "precrec", "ROCR", "Seurat", "dplyr", "Rserve", "e1071", "colorRamps"))`
  (`stats` is also used, but it ships with base R and needs no installation.)
  To install the R packages from Bioconductor use:
  `install.packages("BiocManager")`
  `BiocManager::install(c("preprocessCore", "GSVA", "qvalue"))`
  Used R version 3.5.1
- To install Perl script dependencies, download the `perl_modules` directory from this repository and add it to your `PERL5LIB` environment variable.
  Other Perl modules required are: `Date::Calc`, which can be installed from CPAN.
  Used Perl version 5
- The following Java programs are needed:
  - `CIBERSORT.jar` can be obtained from https://cibersort.stanford.edu/download.php
  - `gsea-3.0.jar` can be obtained from http://software.broadinstitute.org/gsea/downloads.jsp

  Used Java version 1.8.0_162
- Three scRNA-seq datasets that can be used as inputs for these scripts, from liver cells (MacParland et al, 2018), peripheral blood mononuclear cells (PBMCs) (Zheng et al, 2017) and retinal neurons (Shekhar et al, 2016), were processed, curated and deposited into Zenodo: https://doi.org/10.5281/zenodo.2575050
Version 1.0 http://doi.org/10.5281/zenodo.2583161
Version 2.0 http://doi.org/10.5281/zenodo.3350461
To report bugs or request features, please open a 'New Issue' at https://github.com/jdime/scRNAseq_cell_cluster_labeling/issues
Javier Diaz (https://github.com/jdime)