/seatac_manuscript

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

SeATAC: A tool for exploring the chromatin landscape and the role of pioneer factors

Wuming Gong, Nikita Dsouza and Daniel J. Garry

Abstract

Assay for Transposase-Accessible Chromatin with sequencing (ATAC-seq) reveals chromatin accessibility across the genome. Currently no method specifically detects differential chromatin accessibility. Here, SeATAC uses a conditional variational autoencoder model to learn the latent representation of ATAC-seq V-plots and outperforms MACS2 and NucleoATAC on six separate tasks. Applying SeATAC to several pioneer factor-induced differentiation or reprogramming ATAC-seq datasets suggests that induction of these factors not only relaxes the closed chromatin but also decreases chromatin accessibility of 20% to 30% of their target sites. SeATAC is a novel tool to accurately reveal genomic regions with differential chromatin accessibility from ATAC-seq data. SeATAC is available at https://github.com/gongx030/seatac as an R package. The preprint can be found at bioRxiv. Additionally, SeATAC has been used to investigate how Etv2 shapes the chromatin landscape in MEF reprogramming and limb development.

Main Figures

Figure 1

Figures Link
A full V-plot covers a 640 bp genomic region (width) and 640 bp of fragment sizes (height). Each 5 x 10 block of pixels is aggregated into a single larger pixel, resulting in a 128 x 64 pixel image. Figure 1a R
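The binning arithmetic in the Figure 1a caption can be sketched in a few lines of Python. This is an illustrative sketch, not SeATAC's implementation: the function name `vplot_counts`, the half-open `(start, end)` fragment convention, the use of the fragment midpoint as the x coordinate, and the assignment of the 5 bp bin to the genomic axis and the 10 bp bin to the fragment-size axis are all assumptions, chosen so that 640 / 5 = 128 columns and 640 / 10 = 64 rows.

```python
def vplot_counts(fragments, window_start, width_bp=640, height_bp=640,
                 bin_w=5, bin_h=10):
    """Count fragments into a binned V-plot grid (rows = size, cols = position).

    Hypothetical helper: `fragments` is an iterable of half-open (start, end)
    coordinates in bp; `window_start` is the left edge of the genomic window.
    """
    n_cols = width_bp // bin_w   # 640 / 5  = 128 genomic bins
    n_rows = height_bp // bin_h  # 640 / 10 = 64 fragment-size bins
    grid = [[0] * n_cols for _ in range(n_rows)]
    for start, end in fragments:
        length = end - start                      # fragment size in bp
        x = (start + end) // 2 - window_start     # midpoint offset in window
        # Skip fragments whose midpoint or size falls outside the V-plot.
        if 0 <= x < width_bp and 0 < length <= height_bp:
            grid[(length - 1) // bin_h][x // bin_w] += 1
    return grid


# Two 150 bp fragments share a bin; the third lies outside the window.
grid = vplot_counts([(100, 250), (102, 252), (700, 720)], window_start=0)
print(len(grid), len(grid[0]), grid[14][35])
```

Counting directly into the coarse grid gives the same result as building the full 640 x 640 image and then summing each 5 x 10 block, while using far less memory.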

Figure 2

Figures Link
The ROC curves for SeATAC, NucleoATAC and MACS2 with a shift size of 50 bp. Figure 2c R
The violin plot shows the AUC (area under ROC) of SeATAC, NucleoATAC and MACS2 on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. Figure 2d R
The AUC of SeATAC, NucleoATAC and MACS2 at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). Figure 2e R
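As a reminder of what the AUC values in these panels measure, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, then recomputes it after keeping only V-plots with at least a given number of reads. The function names `roc_auc` and `auc_at_cutoff`, and the inputs (one score, one binary label and one read count per V-plot), are hypothetical and are not taken from the SeATAC notebooks.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise positive/negative comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def auc_at_cutoff(scores, labels, read_counts, cutoff):
    """AUC restricted to V-plots with at least `cutoff` reads (assumed filter)."""
    keep = [i for i, c in enumerate(read_counts) if c >= cutoff]
    return roc_auc([scores[i] for i in keep], [labels[i] for i in keep])


scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
read_counts = [12, 1, 9, 15]
print(auc_at_cutoff(scores, labels, read_counts, cutoff=1))  # all four V-plots
print(auc_at_cutoff(scores, labels, read_counts, cutoff=5))  # drops the 1-read V-plot
```

The pairwise loop is quadratic in the number of examples; it is used here for clarity, and a rank-based formulation would be preferable on large datasets.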

Figure 3

Figures Link
The ROC curve for recovering nucleosome positions from ATAC-seq with 0.1%, 1% and 10% of the sequencing reads randomly sampled from the full dataset (GM12878). Figure 3a R
The heatmaps show the nucleosome density estimated by SeATAC (blue) and NucleoATAC (purple) on a 1% down-sampled dataset. Figure 3b R
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. Figure 3c R
The AUC of SeATAC and NucleoATAC at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). Figure 3d R
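The down-sampling used in these panels (0.1%, 1% and 10% of the reads) can be illustrated with a short, seeded random subsample. This is a minimal sketch under stated assumptions: the helper name `downsample` is hypothetical, the input is an in-memory list of fragments (one entry per read pair, so that mates stay together), and an exact fraction is drawn without replacement. The notebooks may well down-sample the BAM files with a different tool.

```python
import random


def downsample(fragments, fraction, seed=0):
    """Return a random subset holding `fraction` of the fragments, in order."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)            # fixed seed for reproducibility
    k = round(len(fragments) * fraction)
    keep = sorted(rng.sample(range(len(fragments)), k))
    return [fragments[i] for i in keep]  # preserve the original ordering


fragments = list(range(10000))           # stand-in for 10,000 read pairs
for fraction in (0.001, 0.01, 0.1):
    print(fraction, len(downsample(fragments, fraction)))
```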

Figure 4

Figures Link
The ROC curve for detecting nucleosome changes from ATAC-seq with 10% of the sequencing reads from the full dataset (GM12878). Figure 4a R
The raw and estimated V-plots of an NFR (chr1:113162059-113162698) and a NOR (chr2:226653061-226653700) region are shown. Figure 4b R
The heatmaps show the nucleosome density of ~5,000 sampled NOR and NFR regions estimated by SeATAC and NucleoATAC on a 10% down-sampled dataset, the NucleoATAC signal on the full dataset (black), and an MNase-seq dataset on GM12878. Figure 4c R
The violin plot shows the AUC (area under ROC) of SeATAC and NucleoATAC on 523 ATAC-seq samples from 20 studies. *** Wilcoxon rank sum test p-value < 0.001. Figure 4d R
The AUC of SeATAC and NucleoATAC at different read count cutoffs from 1 to 20 (the minimum number of reads in a V-plot). Figure 4e R

Figure 5

Figures Link
The Venn diagrams show the number of Etv2 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC in Etv2-induced MEF reprogramming. Figure 5a R
The Venn diagrams show the number of Etv2 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC in Etv2-induced EB differentiation. Figure 5b R
The aggregated V-plot includes 1,626, 222 and 2,305 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2-induced EB differentiation. Figure 5c R
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (the -5,000 to +1,000 bp region flanking the TSS) have Etv2 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. Figure 5d R
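The promoter definition used in panel d (5,000 bp upstream to 1,000 bp downstream of the TSS) depends on the gene's strand. The sketch below shows one way to derive such a window and test whether a motif overlaps it. The helper names `promoter_window` and `motif_in_promoter`, the half-open coordinate convention, and the clamping at position 0 are assumptions made for illustration; the captions do not state how the notebooks handle these details.

```python
def promoter_window(tss, strand, upstream=5000, downstream=1000):
    """Return the half-open (start, end) promoter window around a TSS."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end            # never run off the chromosome start


def motif_in_promoter(motif_start, motif_end, tss, strand):
    """True if the half-open motif interval overlaps the promoter window."""
    start, end = promoter_window(tss, strand)
    return motif_start < end and motif_end > start


print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
print(motif_in_promoter(5200, 5210, 10000, "+"))
```

The same motif at 5,200 bp counts as promoter-associated for a plus-strand gene with a TSS at 10,000 bp but not for a minus-strand gene at the same position, which is why strand must be carried along when assigning motifs to genes.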

Figure 6

Figures Link
Dot plots comparing the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC. Figure 6a R
The barplots show the genomic distribution of Etv2 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in EB differentiation or MEF reprogramming. Figure 6b R
The aggregated V-plot includes 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. Figure 6c R
The heatmaps show the Etv2, Brg1 and H3K27ac ChIP-seq signals of 3,000 and 1,623 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at day 2.5 EB (Brg1 and H3K27ac), 3 hours post Etv2 induction (Etv2), and 12 hours post Etv2 induction (Etv2, Brg1 and H3K27ac). Figure 6d R
The barplots show the percent of genes that were down-regulated, up-regulated or not changed between day 2.5 EB and 12 hours post Etv2 induction. Figure 6e R
Brachyury (T) and Mycn are significantly down-regulated during the Etv2-induced differentiation. Figure 6f R
Brachyury (T) and Mycn (panel f) have Etv2 motifs that become significantly less accessible during the differentiation at their promoter regions (the -5,000 to +1,000 bp region flanking the TSS). Figure 6g R
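The quantity plotted against chromVAR in panel a is described as the difference between the percent of TFBS with decreased and with increased accessibility. A minimal sketch of that summary statistic follows. The function names, the three call labels, and the sign convention (increased minus decreased) are assumptions; the caption does not specify the direction of the difference.

```python
from collections import Counter

CALLS = ("increased", "decreased", "unchanged")


def percent_by_call(calls):
    """Percent of binding sites assigned to each accessibility call."""
    if not calls:
        raise ValueError("no binding sites supplied")
    counts = Counter(calls)
    return {c: 100.0 * counts[c] / len(calls) for c in CALLS}


def percent_difference(calls):
    """Percent increased minus percent decreased (assumed sign convention)."""
    pct = percent_by_call(calls)
    return pct["increased"] - pct["decreased"]


# Toy data: 20 binding sites for a single hypothetical transcription factor.
calls = ["increased"] * 6 + ["decreased"] * 3 + ["unchanged"] * 11
print(percent_by_call(calls))
print(percent_difference(calls))
```

A positive value under this convention would indicate a factor whose sites mostly open; a value near zero can hide substantial opening and closing that cancel out, which is the case a per-site method is meant to expose.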

Supplementary Figures

Figure S1

Figures Link
The density plots show the observed (red) and corrected (green) fragment size distribution of 13 samples from a human hematopoietic differentiation ATAC-seq data (GSE96771). Figure S1a R

Figure S2

Figures Link
The plot shows the AUC of SeATAC, NucleoATAC and MACS2 at different shift sizes (from 10 to 100) used to generate the synthetic data for evaluating task #1. Figure S2a R

Figure S3

Figures Link
The plots show the AUC (area under ROC) of SeATAC on 523 ATAC-seq samples from 20 studies at (a) total read counts (Total QNAMEs), (b) mitochondria rate, (c) proper pair rate, (d) unmapped rate, (e) has unmapped mate rate, (f) non-redundant fraction, (g) PCR bottleneck coefficient 1, and (h) PCR bottleneck coefficient 2. Figure S3a-h R

Figure S4

Figures Link
The area under ROC (AUC) of three tools, SeATAC, NucleoATAC and MACS2 on the regions over promoter region (column wise) and latent dimensions (row wise). Figure S4b R
The area under ROC (AUC) of three tools on 17 paired RNA-seq / ATAC-seq datasets. Figure S4c R

Figure S5

Figures Link
The aggregated V-plot includes: 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively. The heatmap color indicates the estimated read density. Figure S5a R
The line plots show the mean H3K27ac, H3K4me1 and H3K4me3 signals of 728 and 1,633 NFKB1 binding sites with increased chromatin accessibility in GM12878 compared with K562 at distal and promoter regions, respectively. Figure S5b R
The mean squared error of observed and predicted histone modification signals. Figure S5d R

Figure S6

Figures Link
The aggregated V-plot includes 2,776, 116 and 1,449 Etv2 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Etv2-induced MEF reprogramming. Figure S6a R
The heatmaps show the Etv2, Brg1, H3K27ac ChIP-seq of 3,996 and 1,307 Etv2 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility at undifferentiated MEFs (Brg1 and H3K27ac), 1 day post-Etv2 induction (Etv2), and 7 days post-Etv2 induction (Etv2, Brg1 and H3K27ac). Figure S6b R
The UCSC genome browser tracks show the ATAC-seq density near the Etv2 motifs at the promoters of Brachyury (T) and Mycn. Figure S6c T, Mycn

Figure S7

Figures Link
The Venn diagrams show the number of Ascl1 motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC. Figure S7a R
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (the -5,000 to +1,000 bp region flanking the TSS) have Ascl1 motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. Figure S7b R
The aggregated V-plot includes 8,658, 7,687 and 7,708 Ascl1 motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of Ascl1 induced MEF reprogramming (undifferentiated MEFs vs. 22 days post Ascl1 induction). Figure S7c R

Figure S8

Figures Link
The Venn diagrams show the number of OSK motifs with increased chromatin accessibility identified by SeATAC, MACS2 and NucleoATAC. Figure S8a R
The barplots show the Gene Ontology (GO) terms that are significantly associated with the genes whose promoters (the -5,000 to +1,000 bp region flanking the TSS) have OSK motifs with increased chromatin accessibility, identified by SeATAC, MACS2 and NucleoATAC. Figure S8b R
The aggregated V-plot includes 5,826, 1,355 and 6,371 OSK motifs with increased chromatin accessibility identified by SeATAC only, MACS2 only and NucleoATAC only in ATAC-seq data of OSK-induced MEF reprogramming. Figure S8c R

Figure S9

Figures Link
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC. Figure S9a R
The barplots show the genomic distribution of Ascl1 binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming. Figure S9b R
The aggregated V-plot includes 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. Figure S9c R
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 24,098 and 7,071 Ascl1 binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming. Figure S9d R
The V-plots show Ascl1 motifs with decreased chromatin accessibility at the promoters (the -5,000 to +1,000 bp region flanking the TSS) of four genes (Hmga2, Elf4, Egfr and Hes1) that are down-regulated during the Ascl1-induced MEF reprogramming. Figure S9e R

Figure S10

Figures Link
The dot plots compare the changes of motif associated chromatin accessibility estimated by chromVAR (x-axis) and the difference of the percent of TFBS with decreased or increased chromatin accessibility estimated by SeATAC. Figure S10a R
The barplots show the genomic distribution of OSK binding sites with decreased (NFR->NOR) or increased (NOR->NFR) chromatin accessibility in MEF reprogramming. Figure S10b R
The aggregated V-plot includes 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during MEF reprogramming. Figure S10c R
The heatmaps show the MNase-seq, H3K27me3, H3K36me3, H3K9ac, H3K79me2, H3K4me2, H3K4me1 and P300 ChIP-seq signals in undifferentiated MEFs of 15,825 and 4,935 OSK binding sites that have increased (NOR->NFR) or decreased (NFR->NOR) chromatin accessibility during the MEF reprogramming. Figure S10d R
The barplots show the percent of genes that were down-regulated, up-regulated or not changed between undifferentiated MEFs and 7 hours post OSK induction. Figure S10e R
Maf and Smad3 are significantly down-regulated during the OSK induced MEF reprogramming. Figure S10f R
Maf and Smad3 have OSK motifs that become significantly less accessible during the differentiation at their promoter regions (the -5,000 to +1,000 bp region flanking the TSS). Figure S10g R

Figure S11

Figures Link
(a-e) Example regions with significantly increased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples. (f-j) Example regions with significantly decreased chromatin accessibility from undifferentiated MEFs to D7 Flk1+ samples. Figure S11a-j R