Heat maps are a great tool to compare the read counts of sequencing experiments in different populations, often used in RNA-seq experiments to visualize the gene expression differences between different populations (there's a great tutorial on doing through the galaxy project). However, the use of heat maps in such a way is not restricted to just RNA-seq experiments, and can be used in a wide range of sequencing experiments. I recently worked on a project where I was asked to create such a heat map comparing ATAC-seq data from two distinct cell populations, and struggled to find online resources on doing such experiments. Additionally, I found almost no python code on analyzing sequencing data, with most being written in R. I hope this tutorial will be useful for those in similar situations :)
The general outline for conducting such an analysis goes as follows...
- Obtain genomic coordinates for BAM peaks
- Create a list of consensus sequences, with sequences present in all samples, as well as their read count
- Use thresholds to limit the amount of consensus sequences one is analyzing
- Normalize your read counts
- Standardize the normal values to z-scores
- Plot the data
I go through all of the above steps, as well as a tutorial using sampled data in the tutorial.ipynb
file in Python. Check it out if you get stuck.
Here are a bunch of resources I found which helped me with this project:
The Galaxy Project's RNA-seq heatmap tutorial
Dave Tang's EdgeR Normalisation guide
Kamil Slowikowski's Pheatmap guide