SeATAC: A tool for exploring the chromatin landscape and the role of pioneer factors
Wuming Gong, Nikita Dsouza and Daniel J. Garry
Abstract
Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) reveals chromatin accessibility across the genome. Currently no method specifically detects differential chromatin accessibility. Here, SeATAC uses a conditional variational autoencoder model to learn the latent representation of ATAC-seq V-plots and outperforms MACS2 and NucleoATAC on six separate tasks. Applying SeATAC to several pioneer factor-induced differentiation or reprogramming ATAC-seq datasets suggests that induction of these factors not only relaxes closed chromatin but also decreases the chromatin accessibility of 20% to 30% of their target sites. SeATAC is a novel tool to accurately reveal genomic regions with differential chromatin accessibility from ATAC-seq data. SeATAC is available at https://github.com/gongx030/seatac as an R package. The preprint can be found at bioRxiv. Additionally, SeATAC has been used to investigate how Etv2 shapes the chromatin landscape in MEF reprogramming and limb development.
Main Figures
Figure 1
A full V-plot spans a 640 bp genomic region (width) and 640 bp of fragment sizes (height). Each 5 x 10 block of pixels is aggregated into a single larger pixel, resulting in a 128 x 64 pixel image.
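The aggregation described in this caption can be sketched in a few lines of plain Python. This is an illustration only, not the SeATAC implementation: the function names `build_vplot` and `aggregate` are hypothetical, fragments are assumed to be (start, end) pairs, and the sketch assumes that the 5-pixel side of each block runs along the genomic axis (640 / 5 = 128) and the 10-pixel side along the fragment-size axis (640 / 10 = 64).

```python
def build_vplot(fragments, center, width=640, height=640):
    """Count fragments into a height x width grid.

    Rows index fragment size (size 1..height -> row 0..height-1);
    columns index the fragment midpoint relative to the window.
    """
    left = center - width // 2
    grid = [[0] * width for _ in range(height)]
    for start, end in fragments:
        row = (end - start) - 1
        col = (start + end) // 2 - left
        # Fragments falling outside the window are ignored.
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] += 1
    return grid


def aggregate(grid, bin_w=5, bin_h=10):
    """Sum each bin_h x bin_w block of pixels into one larger pixel."""
    height, width = len(grid), len(grid[0])
    if height % bin_h or width % bin_w:
        raise ValueError("grid dimensions must be multiples of the bin sizes")
    out = [[0] * (width // bin_w) for _ in range(height // bin_h)]
    for r in range(height):
        for c in range(width):
            out[r // bin_h][c // bin_w] += grid[r][c]
    return out


# Toy fragments around a hypothetical window center.
fragments = [(1000, 1150), (1000, 1150), (900, 1100)]
small = aggregate(build_vplot(fragments, center=1000))
print(len(small[0]), "x", len(small))  # width x height of the binned image
```

With the default sizes, the binned image has 128 columns and 64 rows, and the total fragment count is preserved by the summation.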
The violin plot shows the AUC (area under ROC) of SeATAC, NucleoATAC and MACS2 on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001.
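For readers unfamiliar with the metric, the AUC of a single sample can be computed directly from per-region scores and true labels. The sketch below uses the rank-based formulation (the fraction of positive/negative pairs in which the positive region scores higher, with ties counted as one half), which equals the area under the ROC curve. The function name `roc_auc` and the toy data are hypothetical. The sketch does not reproduce the benchmark itself.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = truly differential region, 0 = unchanged region.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of regions. It is fine for illustration, but a sort-based computation would be preferable on real data.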
The ROC curve for recovering nucleosome positions from ATAC-seq with 0.1%, 1% and 10% of the sequencing reads randomly sampled from the full dataset (GM12878).
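The down-sampling in this caption amounts to drawing a fixed fraction of reads at random without replacement. A minimal sketch follows, assuming reads are held in an in-memory list. The function name `downsample` and the seed are hypothetical, and real pipelines typically subsample alignment files with dedicated tools instead.

```python
import random


def downsample(reads, fraction, seed=0):
    """Return a random subset containing `fraction` of the reads."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = round(len(reads) * fraction)
    return rng.sample(reads, k)  # sampling without replacement


# Placeholder read identifiers standing in for a full dataset.
reads = list(range(100_000))
subsets = {f: downsample(reads, f) for f in (0.001, 0.01, 0.1)}
for f, subset in subsets.items():
    print(f, len(subset))
```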
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001.
The heatmaps show the nucleosome density of ~5,000 sampled NOR and NFR regions estimated by SeATAC and NucleoATAC on a 10% down-sampled dataset, the NucleoATAC signal on the full dataset (black), and an MNase-seq dataset on GM12878.
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001.
The Venn diagrams show the number of Etv2 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC in Etv2 induced EB differentiation.
The aggregated V-plot includes 1,626, 222 and 2,305 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2 induced EB differentiation.
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 - +1,000bp region flanking the TSS) have Etv2 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC.
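The promoter definition used in these captions (-5,000 to +1,000 bp flanking the TSS) can be made concrete with a short sketch. It assumes the window is strand-aware (upstream means lower coordinates on the + strand and higher coordinates on the - strand) and that motif positions are single coordinates. The captions state neither detail, and the function names are hypothetical.

```python
def promoter_window(tss, strand, upstream=5000, downstream=1000):
    """Return the (low, high) genomic bounds of the promoter window."""
    if strand == "+":
        return max(tss - upstream, 0), tss + downstream
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        return max(tss - downstream, 0), tss + upstream
    raise ValueError("strand must be '+' or '-'")


def motifs_in_promoter(motif_positions, tss, strand):
    """Keep the motif positions that fall inside the promoter window."""
    low, high = promoter_window(tss, strand)
    return [p for p in motif_positions if low <= p <= high]


# Toy coordinates for a hypothetical gene with its TSS at position 10,000.
motifs = [4000, 6000, 12000]
print(motifs_in_promoter(motifs, tss=10000, strand="+"))
print(motifs_in_promoter(motifs, tss=10000, strand="-"))
```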
Dot plots comparing the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC.
The barplots show the genomic distribution of Etv2 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in EB differentiation or MEF reprogramming.
The aggregated V-plot includes 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming.
The heatmaps show the Etv2, Brg1 and H3K27ac ChIP-seq signals of 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at day 2.5 EB (Brg1 and H3K27ac), 3 hours post Etv2 induction (Etv2), and 12 hours post Etv2 induction (Etv2, Brg1 and H3K27ac).
Brachyury (T) and Mycn (f) have Etv2 motifs at their promoter regions (-5,000 - +1,000bp region flanking the TSS) that become significantly less accessible during the differentiation.
The density plots show the observed (red) and corrected (green) fragment size distribution of 13 samples from a human hematopoietic differentiation ATAC-seq data (GSE96771).
The plot shows the AUC of SeATAC, NucleoATAC and MACS2 at different shift sizes (from 10 to 100) used to generate the synthetic data for evaluating task #1.
The plots show the AUC (area under ROC) of SeATAC on 523 ATAC-seq samples from 20 studies at (a) total read counts (Total QNAMEs), (b) mitochondria rate, (c) proper pair rate, (d) unmapped rate, (e) has unmapped mate rate, (f) non-redundant fraction, (g) PCR bottleneck coefficient 1, and (h) PCR bottleneck coefficient 2.
The area under ROC (AUC) of three tools, SeATAC, NucleoATAC and MACS2 on the regions over promoter region (column wise) and latent dimensions (row wise).
The aggregated V-plot includes: 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively. The heatmap color indicates the estimated read density.
The line plots show the mean H3K27ac, H3K4me1 and H3K4me3 signals of 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively.
The aggregated V-plot includes 2,776, 116 and 1,449 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2 induced MEF reprogramming.
The heatmaps show the Etv2, Brg1, H3K27ac ChIP-seq of 3,996 and 1,307 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at undifferentiated MEFs (Brg1 and H3K27ac), 1 day post-Etv2 induction (Etv2), and 7 days post-Etv2 induction (Etv2, Brg1 and H3K27ac).
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 - +1,000bp region flanking the TSS) have Ascl1 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC.
The aggregated V-plot includes 8,658, 7,687 and 7,708 Ascl1 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Ascl1 induced MEF reprogramming (undifferentiated MEFs vs. 22 days post Ascl1 induction).
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (-5,000 - +1,000bp region flanking the TSS) have OSK motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC.
The aggregated V-plot includes 5,826, 1,355 and 6,371 OSK motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of OSK induced MEF reprogramming.
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC.
The barplots show the genomic distribution of Ascl1 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming.
The aggregated V-plot includes 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming.
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming.
The V-plots show Ascl1 motifs with decreased chromatin accessibility at the promoters (-5,000 - +1,000bp region flanking the TSS) of four genes (Hmga2, Elf4, Egfr and Hes1) that are down-regulated during the Ascl1 induced MEF reprogramming.
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC.
The barplots show the genomic distribution of OSK binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming.
The aggregated V-plot includes 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming.
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming.
The barplots show the percent of genes that were down-regulated, up-regulated or not changed between undifferentiated MEFs and 7 hours post OSK induction.
Maf and Smad3 have OSK motifs that become significantly less accessible during the differentiation at their promoter region (-5,000 - +1,000bp region flanking the TSS).
Example regions with significantly increased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples. (f-j) Example regions with significantly decreased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples.