Generate Data for Publication Figures
tacazares commented
We have several figures that we want to generate for the v1 maxATAC publication. We are going to use our pool of ENCODE data to show results for models that can be benchmarked.
Figures that need to be finalized:
Figure: Training Data Overview
- Cumulative training data figure for all available experiments that pass QC from ENCODE + GEO
- Heatmap of samples that are derived from ENCODE for benchmarking
- Schematic overview of maxATAC
Figure: Model Performance
- AUPR by # of training cell types for best models
- Performance compared to MOODS (motif scanning)
- Performance compared to the average ChIP-seq signal
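For reference, a minimal sketch of how AUPR could be computed for any of the score sets above (model predictions, motif scanning scores, or average ChIP-seq signal). This is an assumption for illustration, not the maxATAC benchmarking code: `average_precision` is a hypothetical helper that uses the step-wise average precision estimate and does not treat tied scores specially.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: iterable of 0/1 ground-truth values (1 = bound region).
    scores: iterable of prediction scores, same length as labels.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1 / total_pos.
            ap += (true_pos / rank) / total_pos
    return ap
```

Scoring the same labels once per method gives directly comparable numbers. A random predictor is expected to score close to the fraction of positive labels, which is a useful baseline when reading the comparisons.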
Figure: Approach
- Normalization results
- Test different random regions ratios: 0, 0.25, 0.5, 0.75, 1
- Test shuffle cell type KO with best random regions ratio
- Test reverse complement KO with best random regions ratio
- Test double KO with best random regions ratio
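A rough sketch of how the runs listed above could be enumerated, assuming "KO" means turning the corresponding training feature off while everything else stays at the best random regions ratio. The dictionary keys and function names are hypothetical and are not maxATAC options.

```python
# Hypothetical run configurations; key names are illustrative only.
RANDOM_RATIOS = [0, 0.25, 0.5, 0.75, 1]


def ratio_sweep(ratios=RANDOM_RATIOS):
    """One run per random regions ratio, with both features left on."""
    return [
        {
            "name": f"ratio_{r}",
            "random_ratio": r,
            "shuffle_cell_type": True,
            "reverse_complement": True,
        }
        for r in ratios
    ]


def knockout_runs(best_ratio):
    """Three runs at the best ratio: each single KO, then the double KO."""
    settings = [
        ("shuffle_cell_type_KO", False, True),
        ("reverse_complement_KO", True, False),
        ("double_KO", False, False),
    ]
    return [
        {
            "name": name,
            "random_ratio": best_ratio,
            "shuffle_cell_type": shuffle,
            "reverse_complement": revcomp,
        }
        for name, shuffle, revcomp in settings
    ]
```

That gives five runs for the ratio sweep and three more once the best ratio is known.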
Figure: maxATAC application to scATAC
- Application to scATAC-seq from ArchR
- HighLoading
- LowLoading
- Correlation of number of fragments to prediction AUPR
- Correlation of number of cells to prediction AUPR
- Correlation of number of cells to delta prediction AUPR
- Correlation of number of fragments to delta prediction AUPR
- Correlation of number of cells to log2 prediction AUPR
- Correlation of number of fragments to log2 prediction AUPR
- Correlation of median number of fragments per cell to delta prediction AUPR
- Performance in scATAC-seq data
- Use multiple cell types in addition to GM12878
- Performance in scATAC-seq data compared to motif scanning
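The correlation panels above all reduce to the same calculation on different inputs. A minimal sketch, assuming Pearson correlation is the intended measure (a rank correlation may be a better fit for the final figure); `pearson` and `log2_values` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    xs = list(xs)
    ys = list(ys)
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log2_values(values):
    """log2-transform strictly positive values, e.g. AUPR before correlating."""
    return [math.log2(v) for v in values]
```

The x values would be the number of cells, the number of fragments, or the median fragments per cell for each sample, and the y values would be whichever AUPR-derived quantity the panel shows.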
Figure: Comparison to chromVAR
- Performance compared to chromVAR in HBTE cells
- UMAP of data
- Schematic overview of experimental design
Figure: maxATAC in-situ mutagenesis
- Schematic overview
- Example of altered TF binding prediction based on donor sample
Figure: maxATAC model selection
- Epoch selection method
- Model validation on chr2 vs chr1
- Thresholding for peaks