Generate Data for Publication Figures

Question



We need to generate several figures for the v1 maxATAC publication. We will use our pool of ENCODE data to show results for the models that can be benchmarked.

Figures that need to be finalized:

Figure: Training Data Overview

~~Cumulative training data figure for all available experiments that pass QC from ENCODE + GEO~~
Heatmap of ENCODE-derived samples used for benchmarking
Schematic overview of maxATAC

Figure: Model Performance

AUPR by number of training cell types for the best models
Performance compared to MOODS (motif scanning)
Performance compared to the average ChIP-seq signal
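For reference when building the performance panels, here is a minimal, dependency-free sketch of the AUPR computation. It assumes bin-level binary ChIP-seq labels and continuous prediction scores, and computes the step-wise area under the precision-recall curve (average precision). It is an illustration only, not necessarily how the maxATAC benchmarking code computes AUPR.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (precision, recall) at each distinct score cutoff, highest first."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AUPR is undefined without positive labels")
    precision, recall = [], []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every bin tied at this score has been counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            precision.append(tp / (tp + fp))
            recall.append(tp / total_pos)
    return precision, recall


def aupr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: sum of (recall gain * precision)."""
    precision, recall = precision_recall_points(labels, scores)
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        area += (r - prev_recall) * p
        prev_recall = r
    return area
```

Computing this once per trained model and grouping by the number of training cell types gives the values for the first panel.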

Figure: Approach

Normalization results
Test different random-region ratios: 0, 0.25, 0.5, 0.75, 1
Test the shuffle-cell-type KO with the best random-region ratio
Test the reverse-complement KO with the best random-region ratio
Test the double KO with the best random-region ratio
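A rough sketch of what the random-region ratio and reverse-complement KO experiments vary. It assumes (not confirmed here) that the ratio is the fraction of each training batch drawn from uniformly random genomic positions rather than from regions of interest (ROI), and that the reverse-complement KO simply disables reverse-complement augmentation. `sample_batch_regions` and `reverse_complement` are hypothetical helpers, not maxATAC functions.

```python
import random
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)

# Complement map for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (applied only when RC augmentation is on)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sample_batch_regions(
    roi: List[Region],
    chrom_sizes: Dict[str, int],
    batch_size: int,
    random_ratio: float,
    region_length: int,
    rng: random.Random,
) -> List[Region]:
    """Hypothetical batch builder mixing ROI with random genomic regions.

    random_ratio is the fraction of the batch drawn at random:
    0 means ROI only, 1 means random regions only.
    """
    if not 0.0 <= random_ratio <= 1.0:
        raise ValueError("random_ratio must be in [0, 1]")
    n_random = round(batch_size * random_ratio)
    # Sample ROI with replacement for the non-random part of the batch.
    batch = rng.choices(roi, k=batch_size - n_random)
    chroms = list(chrom_sizes)
    for _ in range(n_random):
        chrom = rng.choice(chroms)
        start = rng.randrange(0, chrom_sizes[chrom] - region_length)
        batch.append((chrom, start, start + region_length))
    rng.shuffle(batch)
    return batch
```

Sweeping `random_ratio` over the grid above while holding everything else fixed isolates its effect; the KO runs then fix the best ratio and toggle the augmentations off.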

Figure: maxATAC application to scATAC

Application to scATAC-seq data from ArchR

HighLoading
LowLoading

Correlation between the number of fragments and prediction AUPR
Correlation between the number of cells and prediction AUPR
Correlation between the number of cells and delta prediction AUPR
Correlation between the number of fragments and delta prediction AUPR
Correlation between the number of cells and log2 prediction AUPR
Correlation between the number of fragments and log2 prediction AUPR
Correlation between the median number of fragments per cell and delta prediction AUPR
Performance in scATAC-seq data
- Use multiple cell types in addition to GM12878
Performance in scATAC-seq data compared to motif scanning
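The correlation panels above all reduce to correlating one per-sample quantity (fragments, cells, fragments per cell) with one AUPR-derived quantity. A minimal sketch of Pearson and Spearman correlation, assuming one value per scATAC-seq sample; which coefficient the figures use is not specified in this issue.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

One design note: log2 is monotonic, so Spearman gives the same value for AUPR and log2 AUPR. The log2 panels only add information if Pearson (or a linear fit) is used.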

Figure: Comparison to chromVAR

Performance compared to chromVAR in HBTE cells
UMAP of the data
Schematic overview of experimental design

Figure: maxATAC in silico mutagenesis

Schematic overview
Example of altered TF binding prediction based on donor sample
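The core computation behind this figure can be sketched as scoring sequence substitutions with a trained model. Here `predict` is a hypothetical stand-in for a maxATAC model: the real model also takes ATAC-seq signal as input, which this sketch ignores. Variant positions are assumed to be 0-based offsets within the scored window.

```python
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[str], float]  # hypothetical: sequence -> binding score


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + alt + seq[pos + 1 :]


def mutation_effects(
    seq: str, predict: Predictor, bases: str = "ACGT"
) -> Dict[Tuple[int, str], float]:
    """Score every single-base substitution as predict(alt) - predict(ref)."""
    ref_score = predict(seq)
    effects = {}
    for pos, ref_base in enumerate(seq):
        for alt in bases:
            if alt != ref_base:
                effects[(pos, alt)] = predict(substitute(seq, pos, alt)) - ref_score
    return effects


def donor_effect(seq: str, variants: List[Tuple[int, str]], predict: Predictor) -> float:
    """Change in prediction after applying one donor's variants to the reference."""
    donor_seq = seq
    for pos, alt in variants:
        donor_seq = substitute(donor_seq, pos, alt)
    return predict(donor_seq) - predict(seq)
```

Running `donor_effect` per donor over the same window gives the kind of comparison the example panel describes.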

Figure: maxATAC model selection

Epoch selection method
Model validation on chr2 vs chr1
Thresholding for peaks
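A sketch of one possible approach to the epoch and threshold items. It assumes, without confirmation from this issue, that the epoch is chosen by the highest validation AUPR and that the peak threshold is the score cutoff that maximizes F1 on validation bins. Both helpers are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def best_epoch(val_aupr_by_epoch: Dict[int, float]) -> int:
    """Epoch with the highest validation AUPR; the earliest epoch wins ties."""
    return max(sorted(val_aupr_by_epoch), key=lambda e: val_aupr_by_epoch[e])


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 score when bins with score >= threshold are called as peaks."""
    tp = sum(1 for l, s in zip(labels, scores) if l and s >= threshold)
    fp = sum(1 for l, s in zip(labels, scores) if not l and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(
    labels: Sequence[int], scores: Sequence[float]
) -> Tuple[Optional[float], float]:
    """Scan each distinct score as a cutoff, highest first; return (threshold, F1)."""
    best_threshold: Optional[float] = None
    best_f1 = -1.0
    for t in sorted(set(scores), reverse=True):
        f1 = f1_at_threshold(labels, scores, t)
        if f1 > best_f1:  # strict ">" keeps the highest cutoff among ties
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1
```

Choosing the threshold on validation chromosomes (not the test chromosomes) keeps the benchmark AUPR figures unbiased.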