
Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Analysis and visualization of chromatin folding

Overview

3C-based methods, such as Hi-C, produce large amounts of raw data in the form of pairs of DNA reads from loci that are spatially close in the cell nucleus. Interaction matrices built from these read pairs have been used to study how the genome folds within the nucleus, which is one of the most fascinating problems in modern biology. Rigorous computational analysis of the paired reads has been essential to fully exploit the experimental technique and to study how the genome is folded in space. The amount of data on genome structure is currently expanding rapidly, with many Hi-C datasets available at resolutions down to 1 kb (see, for example, http://hic.umassmed.edu/welcome/welcome.php ; http://promoter.bx.psu.edu/hi-c/view.php or http://www.aidenlab.org/data.html ).

This tutorial covers the use of TADbit, a software package designed and developed to handle every dimension of Hi-C data:

  • 1D - Map paired-end sequences to generate Hi-C interaction matrices
  • 2D - Normalize matrices and identify constitutive domains (compartments, TADs)
  • 3D - Generate populations of model structures which reproduce the Hi-C interaction matrices
  • 4D - Compare samples at different time points
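
To make the step from 1D (mapped read pairs) to 2D (interaction matrix) concrete, here is a minimal sketch in plain Python. It is not TADbit code and does not use TADbit's API: the function name `build_contact_matrix`, the toy read positions, the chromosome length and the 10 kb resolution are all hypothetical and chosen only for illustration. The sketch assumes a single chromosome and read ends that are already mapped to genomic coordinates.

```python
def build_contact_matrix(read_pairs, chrom_length, resolution):
    """Count read pairs per pair of genomic bins (single chromosome).

    Illustrative helper only; this is not part of TADbit.
    """
    n_bins = (chrom_length + resolution - 1) // resolution  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // resolution, pos2 // resolution  # bin index of each read end
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal counts to keep the matrix symmetric
    return matrix


# Hypothetical mapped positions (in bp) of the two ends of each read pair
toy_pairs = [(1_200, 1_800), (500, 25_000), (12_000, 27_500), (26_000, 3_000)]
toy_matrix = build_contact_matrix(toy_pairs, chrom_length=30_000, resolution=10_000)

for row in toy_matrix:
    print(row)
```

Each cell counts how many read pairs connect two bins, so a finer resolution (smaller bins) gives a larger and sparser matrix. Real pipelines also filter and normalize these raw counts before any 2D or 3D analysis.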

Material

  • Core pipeline
  • Annex

Target Audience

The tutorial is designed for experimental researchers and bioinformaticians at the graduate and post-graduate levels who are interested in studying the spatial organization of the genome.

Technical pre-requisites

Basic Linux and Python programming skills are recommended, along with a graduate-level background in Life Sciences. All hands-on sessions will be offered at three levels of computational expertise (web platform, command-line tool and Python scripting).