Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation
Nathan Harmston*1,2,3, Elizabeth Ing-Simmons1,2,4, Ge Tan1,2, Malcolm Perry1,2, Matthias Merkenschlager2,4, Boris Lenhard*1,2,5
1 Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK.
2 Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK.
3 Program in Cardiovascular and Metabolic Disease, Duke-NUS Graduate Medical School, 8 College Road, Singapore 169857, Singapore.
4 Lymphocyte Development, MRC London Institute of Medical Sciences, London W12 0NN, UK.
5 Sars International Centre for Marine Molecular Biology, University of Bergen, N-5008 Bergen, Norway.
* Correspondence should be addressed to NH (nathan.harmston@duke-nus.edu.sg) and BL (b.lenhard@imperial.ac.uk)
Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit extreme, as yet unexplained levels of sequence conservation, and many act as long-range developmental enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have previously been described as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raise the question of how they relate to the 3D conformation of these loci. Here, we show that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs associated with high levels of noncoding conservation exhibits distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years.
All figures and results in the manuscript can be reproduced from the R Markdown scripts (.Rmd files) in this repository. The table below lists each script with the figures it produces.
Script | Main figures | Supplementary figures |
---|---|---|
plot_grbs_figure1.Rmd | Figure 1 | Figure S1 |
plot_grbs_figureS2.Rmd | | Figure S2 |
calculate_grb_tad_overlaps_human.Rmd | Figure 2 | Figure S3 |
calculate_grb_tad_overlaps_fly.Rmd | Figure 2 | Figure S3 |
calculate_pvalue_distances.Rmd | | Figure S3 |
grbs_h3k27ac.Rmd | | Figure S4 |
ContactDomains.Rmd | | Figure S5 |
plot_grbs_figure3.Rmd | Figure 3 | |
plot_grbs_figureS6.Rmd | | Figure S6 |
repeat_analysis.Rmd | Figure 4 | Figure S8 |
chromatin_colour.Rmd | Figure 4 | Figure S8 |
ctcf_analysis.Rmd | Figure 4 | Figure S8 |
dev_analysis.Rmd | Figure 4 | Figure S9 |
genome_comparison.Rmd | Figure 5 | |
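
One possible way to run the scripts in the table above is to render them with the `rmarkdown` package. The following is only a minimal sketch, not the authors' documented workflow. It assumes that `rmarkdown` is installed, that each script can find its own input data, and that the scripts do not depend on being run in a particular order; none of this is stated in this README. The helper name `render_analysis_scripts` and the `rendered` output directory are hypothetical.

```r
# Hypothetical helper: render every .Rmd analysis script found in `script_dir`.
# Assumption: each script locates its own input data and can run independently.
render_analysis_scripts <- function(script_dir = ".", output_dir = "rendered") {
  scripts <- list.files(script_dir, pattern = "\\.Rmd$", full.names = TRUE)
  if (length(scripts) == 0) {
    message("No .Rmd scripts found in ", script_dir)
    return(character(0))
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required to render the scripts.")
  }
  # Create the output directory and resolve it to an absolute path up front.
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  output_dir <- normalizePath(output_dir)
  # Render each script in a fresh environment so that objects created by one
  # analysis are not visible to the next; returns the rendered file paths.
  vapply(scripts, function(script) {
    rmarkdown::render(script, output_dir = output_dir,
                      envir = new.env(parent = globalenv()), quiet = TRUE)
  }, character(1))
}
```

From the repository root, `render_analysis_scripts(".")` would attempt to render every script, while a single figure could be regenerated with, for example, `rmarkdown::render("plot_grbs_figure1.Rmd")`.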