/atacseq-heatmaps

Tutorial Code on creating sample comparison heatmaps from ATAC-Seq data

Constructing Heat Maps to Visualize Sequencing Data

Ravi Mandla
Last Updated on 04/03/2020

Heat maps are a great tool for comparing read counts from sequencing experiments across different populations. They are often used in RNA-seq experiments to visualize gene expression differences between populations (there's a great tutorial on doing so through the Galaxy Project). However, heat maps are not restricted to RNA-seq and can be used in a wide range of sequencing experiments. I recently worked on a project where I was asked to create such a heat map comparing ATAC-seq data from two distinct cell populations, and I struggled to find online resources on this kind of analysis. Additionally, I found almost no Python code for analyzing sequencing data, with most being written in R. I hope this tutorial will be useful for those in similar situations :)

Steps

The general outline for conducting such an analysis is as follows:

  1. Obtain genomic coordinates for BAM peaks
  2. Create a list of consensus sequences (sequences present in all samples), along with their read counts
  3. Use thresholds to limit the number of consensus sequences being analyzed
  4. Normalize your read counts
  5. Standardize the normalized values to z-scores
  6. Plot the data
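
Steps 3 through 6 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from tutorial.ipynb: it assumes a table of read counts per consensus sequence already exists (steps 1 and 2), uses simple counts-per-million (CPM) scaling as a stand-in for whichever normalization you prefer, and z-scores each row across samples. The function names, the `min_total` threshold, the sample names, and the counts themselves are all made up for illustration.

```python
from statistics import mean, pstdev


def filter_regions(counts, min_total=10):
    """Step 3: keep regions whose summed read count reaches a threshold."""
    return {region: row for region, row in counts.items() if sum(row) >= min_total}


def counts_per_million(counts):
    """Step 4: scale each sample (column) by its library size.

    Library sizes here are computed from the regions that survived
    filtering, which is a simplifying assumption of this sketch.
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        region: [1e6 * c / size for c, size in zip(row, library_sizes)]
        for region, row in counts.items()
    }


def row_z_scores(values):
    """Step 5: standardize each region (row) to mean 0 and standard deviation 1."""
    scored = {}
    for region, row in values.items():
        mu, sigma = mean(row), pstdev(row)
        # A region with identical values in every sample has no variance,
        # so report zeros instead of dividing by zero.
        scored[region] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in row]
    return scored


def plot_heatmap(z, sample_names, path="heatmap.png"):
    """Step 6: draw the z-scores as a heat map (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    regions = list(z)
    fig, ax = plt.subplots()
    image = ax.imshow([z[r] for r in regions], aspect="auto", cmap="RdBu_r")
    ax.set_xticks(range(len(sample_names)))
    ax.set_xticklabels(sample_names)
    ax.set_yticks(range(len(regions)))
    ax.set_yticklabels(regions)
    fig.colorbar(image, ax=ax, label="z-score")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


# Hypothetical read counts: one row per consensus region, one column per sample.
samples = ["pop_A_rep1", "pop_A_rep2", "pop_B_rep1", "pop_B_rep2"]
raw_counts = {
    "chr1:1000-1500": [120, 110, 15, 20],
    "chr1:9000-9400": [1, 0, 2, 1],
    "chr2:500-900": [30, 25, 200, 210],
}

kept = filter_regions(raw_counts, min_total=10)  # drops the low-count region
cpm = counts_per_million(kept)
z = row_z_scores(cpm)
```

Calling `plot_heatmap(z, samples)` writes the figure to `heatmap.png`. A more robust normalization, such as those covered in the resources below, could replace `counts_per_million` without changing the other steps.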

Example

I go through all of the above steps, along with a worked example using sampled data, in the Python notebook tutorial.ipynb. Check it out if you get stuck.

Resources

Here are a bunch of resources I found which helped me with this project:

The Galaxy Project's RNA-seq heatmap tutorial

edgeR User Guide

Dave Tang's edgeR Normalisation guide

Dave Tang's pheatmap guide

Kamil Slowikowski's pheatmap guide

pheatmap documentation

Zuguang Gu's ComplexHeatmap Reference Book

Normalizing RNAseq Data

DESeq2 tutorial