Benchmark codes used in MAESTRO paper

This repository provides the code used to evaluate scATAC-seq clustering, automatic cell-type annotation, integration between scRNA-seq and scATAC-seq using different methods, and integration using different peak-RP models. Because the evaluation dataset is large, it is hosted separately: users can download the data here and put it in the same directory as the benchmark code.

The MAESTRO paper can be found at citation.

Clustering benchmark

The data and R code can be found in the clustering_benchmark directory, which includes three types of comparisons.

clustering_benchmark_methods_simulated_data: scATAC-seq clustering using scABC, cisTopic, snapATAC, and the LSI-based method (MAESTRO) on simulated datasets generated from 10 different bulk ATAC-seq datasets. We sampled the reads at 1000, 2500, 5000, and 10000. Clustering performance was evaluated using Normalized Mutual Information (NMI) between cluster labels and original sample types.
clustering_benchmark_methods_public_data: scATAC-seq clustering using scABC, cisTopic, snapATAC, and the LSI-based method (MAESTRO) on published microfluidic cell line, HSC, and 10X Genomics PBMC datasets. For the cell line and HSC datasets, clustering accuracy was evaluated using NMI between cluster labels and original cell labels. For PBMC, clustering accuracy was evaluated using RAGI.
clustering_benchmark_peaks: scATAC-seq clustering using different peak sets: single-cell peaks only, ENCODE CCRE only, or single-cell peaks combined with ENCODE CCRE. For the cell line and HSC datasets, clustering accuracy was evaluated using NMI between cluster labels and original cell labels.
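
As an illustration of the NMI metric mentioned above, here is a minimal Python sketch that does not come from the repository, whose benchmark code is in R. It assumes NMI is normalized by the arithmetic mean of the two label entropies; the normalization actually used in the benchmark is not stated here and may differ. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same cells.

    Assumption: normalization by the arithmetic mean of the two entropies.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mi = 0.0
    for (a, b), n_ab in joint.items():
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))

    # Shannon entropy of each labeling.
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    # Both labelings put every cell in one group: treat them as identical.
    if h_a == 0.0 and h_b == 0.0:
        return 1.0
    return mi / ((h_a + h_b) / 2.0)


# Hypothetical toy example: cluster IDs versus original sample types.
clusters = [0, 0, 1, 1]
samples = ["GM12878", "GM12878", "K562", "K562"]
nmi = normalized_mutual_information(clusters, samples)
```

NMI depends only on how cells are grouped, not on the label names, so a clustering that recovers the original samples exactly scores 1 regardless of how the clusters are numbered.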

Cell-type annotation benchmark

The data and R code can be found in the annotation_benchmark directory. We benchmarked the performance of automatic cell-type annotation using SCINA, Garnett, and MAESTRO on the sorted PBMC dataset from Zheng et al. The cell types were annotated using LM22 or a simple gene signature from Garnett. Performance was evaluated using the median F1-score between true labels and annotated cell-type labels. Code for benchmarking was adopted from scRNAseq_Benchmark.
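
The median F1-score can be sketched as follows. This is an illustrative Python example, not the repository's R code. It assumes the F1-score is computed per cell type over the types present in the true labels and that the median is taken across those per-type scores. The function name and toy labels are hypothetical.

```python
from statistics import median


def median_f1(true_labels, predicted_labels):
    """Median of per-cell-type F1-scores.

    Assumption: the set of cell types is taken from the true labels.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cell_type in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == cell_type and p == cell_type)
        fp = sum(1 for t, p in pairs if t != cell_type and p == cell_type)
        fn = sum(1 for t, p in pairs if t == cell_type and p != cell_type)
        # F1 = 2TP / (2TP + FP + FN); the denominator is non-zero here
        # because every cell type in the loop occurs in the true labels.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return median(scores)


# Hypothetical toy example with three cell types.
truth = ["B", "B", "T", "T", "NK", "NK"]
annotated = ["B", "B", "T", "NK", "NK", "NK"]
score = median_f1(truth, annotated)
```

Taking the median rather than the mean keeps a single poorly annotated cell type from dominating the summary.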

Integration between scRNA-seq and scATAC-seq using different methods

The data and R code can be found in the integration_benchmark directory. We benchmarked the integration performance using the MAESTRO peak-RP model, Seurat gene activity score, snapATAC gene-body score, and Cicero score. Performance was evaluated by the distribution of cell-type label prediction scores, and by the Spearman correlation between scRNA-seq expression and scATAC-seq gene activity scores on both the top 2000 highly variable genes and all genes.
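
The Spearman correlation used for this evaluation can be sketched in Python as below. This is an illustration only, not the repository's R code. It assumes the two inputs are per-gene values in the same gene order (for example, average expression and average gene activity for one cell type), and it handles ties by average ranking. All names and numbers are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")  # undefined when one input is constant
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene values for five genes, in the same gene order.
rna_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
atac_gene_activity = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(rna_expression, atac_gene_activity)
```

Restricting the comparison to highly variable genes would amount to subsetting both lists to the same gene indices before calling `spearman`.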

Integration between scRNA-seq and scATAC-seq using different peak-RP models

The data and R code can be found in the integration_RPmodel_benchmark directory. We benchmarked the integration performance of different MAESTRO peak-RP models. Performance was evaluated by the distribution of cell-type label prediction scores, and by the Spearman correlation between scRNA-seq expression and scATAC-seq gene activity scores on the top 2000 highly variable genes.
