scATAC-seq-analysis-notes

My notes on scATAC-seq analysis

papers to read

ATAC-seq QC

protocols

An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues

Fragment length distribution

A blog post by Xi Chen

The successful construction of an ATAC library requires a proper pair of Tn5 transposase cutting events at the ends of the DNA. In nucleosome-free open chromatin regions, many Tn5 molecules can get in and chop the DNA into small pieces; around nucleosome-occupied regions, Tn5 can only access the linker regions. Therefore, in a normal ATAC-seq library, you should expect to see a sharp peak in the <100 bp region (open chromatin), a peak at ~200 bp (mono-nucleosome), and other larger peaks (multi-nucleosomes).
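A minimal sketch of how the expected fragment-length pattern could be checked, assuming you already have a list of fragment (insert) sizes in bp. The function names and the 100 bp / 300 bp cutoffs are illustrative assumptions, not values from the blog post; adjust them to your own library.

```python
from collections import Counter

# Hypothetical cutoffs (assumptions): <100 bp ~ open chromatin,
# 100-299 bp ~ mono-nucleosome, >=300 bp ~ multi-nucleosome.
def classify_fragment(length, nfr_max=100, mono_max=300):
    """Assign one fragment length (bp) to a coarse size class."""
    if length < nfr_max:
        return "nucleosome_free"
    if length < mono_max:
        return "mono_nucleosome"
    return "multi_nucleosome"


def fragment_class_fractions(lengths):
    """Return the fraction of fragments falling in each size class."""
    classes = ("nucleosome_free", "mono_nucleosome", "multi_nucleosome")
    counts = Counter(classify_fragment(n) for n in lengths)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in classes}
    return {name: counts[name] / total for name in classes}


if __name__ == "__main__":
    # Toy input; real lengths would come from the aligned fragments.
    print(fragment_class_fractions([45, 60, 80, 190, 210, 400]))
```

A library dominated by the nucleosome-free class with a visible mono-nucleosome fraction matches the pattern described above; a flat distribution would be a reason to look more closely at the library.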

GreenleafLab/NucleoATAC#18

There might be some artifact in how the aligner deals with fragments where the forward read and reverse read are exact reverse complements of each other. I know that Bowtie (1 but not 2) has some issue with those reads.
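To gauge how many such pairs a library contains, one could flag read pairs whose mates are exact reverse complements. This is a rough sketch on raw read sequences; the function names are hypothetical and it is not taken from NucleoATAC or Bowtie.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_exact_revcomp_pair(read1, read2):
    """True when read2 is exactly the reverse complement of read1,
    i.e. the fragment is no longer than the read length."""
    return read1.upper() == reverse_complement(read2).upper()


if __name__ == "__main__":
    print(is_exact_revcomp_pair("ACGTT", "AACGT"))
    print(is_exact_revcomp_pair("ACGTT", "ACGTT"))
```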

ATAC-seq

Some may notice that the peaks produced look both like peaks produced from the TF ChIP-seq pipeline as well as the histone ChIP-seq pipeline. This is intentional, as ATAC-seq data looks both like TF data (narrow peaks of signal) as well as histone data (broader regions of openness).

peak calling

--shift -100 --extsize 200 will amplify the 'cutting sites' enrichment from ATAC-seq data. So in the end, the 'peak' is where Tn5 transposase likes to attack. Although much information, such as the insertion length and the alignment of the other mate, is ignored, such a result is still usable. Especially when the short fragment population is extremely dominant, the final output won't be off by much.
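The coordinate arithmetic behind these two flags can be sketched as follows. This is a simplified model of the shift-then-extend behaviour (a negative shift moves the 5' end in the 3'->5' direction, then the read is extended 5'->3' by the extension size), not MACS2's actual code; the function name is hypothetical and the Tn5 +4/-5 offset is ignored.

```python
def shifted_window(five_prime, strand, shift=-100, extsize=200):
    """Return the (start, end) interval piled up for one read.

    five_prime: genomic coordinate of the read's 5' end (the cut site).
    strand: "+" or "-".
    """
    if strand == "+":
        # Move the 5' end upstream by |shift|, then extend rightwards.
        start = five_prime + shift
        end = start + extsize
    elif strand == "-":
        # Mirror image for the minus strand: move right, extend leftwards.
        end = five_prime - shift
        start = end - extsize
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start.
    return max(start, 0), end


if __name__ == "__main__":
    print(shifted_window(1000, "+"))
    print(shifted_window(1000, "-"))
```

With a shift of -100 and an extension of 200, both strands give a 200 bp window centred on the cut site, which is why the resulting pileup emphasises where Tn5 cuts rather than the fragment itself.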

macs2 callpeak --nomodel --keep-dup all --shift -100 --extsize 200
macs2 callpeak -f BAMPE
  • Genrich, written by John, a former labmate at Harvard FAS Informatics. Will take a look!

motif analysis

Dimension Reduction

clustering

copy-number

  • Alleloscope is a method for allele-specific copy number estimation that can be applied to single cell DNA and ATAC sequencing data (separately or in combination), allowing for integrative multi-omic analysis of allele-specific copy number and chromatin accessibility for the same cell.

footprint

nucleosome positioning

pipelines

integrate scATAC and scRNAseq

predicting ATAC peak target gene