
3DAROC Course, starting in 2016


3DAROC

Course description

3C-based methods, such as Hi-C, produce a huge amount of raw data in the form of pairs of DNA reads from loci that are spatially close in the cell nucleus. The interaction matrices built from these pairs have been used to study how the genome folds within the nucleus, which is one of the most fascinating problems in modern biology. Rigorous computational analysis of the paired reads has been essential to fully exploit the experimental technique and to study how the genome folds in space. The amount of available data on genome structure is growing rapidly, with many Hi-C datasets available at resolutions down to 1 kb (see for example: http://hic.umassmed.edu/welcome/welcome.php ; http://promoter.bx.psu.edu/hi-c/view.php or http://www.aidenlab.org/data.html ). In this course, participants will learn to use TADbit, a software package designed and developed to manage all the dimensionalities of Hi-C data:

  • 1D - Map paired-end sequences to generate Hi-C interaction matrices
  • 2D - Normalize matrices and identify constitutive domains (compartments, TADs)
  • 3D - Generate populations of model structures which reproduce the Hi-C interaction matrices
  • 4D - Compare samples at different time points
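
To make the step from 1D to 2D more concrete, here is a minimal sketch of how mapped read pairs can be binned into a sparse, symmetric interaction matrix. It is an illustration only and not TADbit's implementation: the function name `bin_contacts`, the toy read positions and the 100 kb resolution are all assumptions made for this example.

```python
from collections import defaultdict


def bin_contacts(read_pairs, resolution):
    """Count read pairs per pair of genomic bins (sparse symmetric matrix).

    read_pairs: iterable of (pos1, pos2) genomic coordinates in bp,
                both ends assumed to lie on the same chromosome.
    resolution: bin size in bp.
    """
    matrix = defaultdict(int)
    for pos1, pos2 in read_pairs:
        i, j = pos1 // resolution, pos2 // resolution
        # store each contact once, with the smaller bin index first
        key = (min(i, j), max(i, j))
        matrix[key] += 1
    return dict(matrix)


# hypothetical positions (in bp) of both ends of four read pairs
pairs = [(1200, 250400), (90000, 130000), (260000, 5000), (110000, 115000)]
contacts = bin_contacts(pairs, resolution=100000)
print(contacts)
```

Keeping only the upper triangle is one common way to store a symmetric matrix compactly; real Hi-C pipelines additionally filter artifactual read pairs and normalize the counts before any interpretation.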

Participants can bring specific biological questions and/or their own 3C data to analyze during the course. At the end of the course, participants will be familiar with the TADbit software, and will be able to fully analyze Hi-C data. Note: Although the TADbit software is central in this course, alternative software will be discussed for each part of the analysis.

Instructors

Marc A. Marti-Renom obtained a Ph.D. in Biophysics from the Universidad Autonoma de Barcelona, where he worked on protein folding under the supervision of B. Oliva, F.X. Aviles and M. Karplus. After that, he went to the US for postdoctoral training in protein structure modeling at the Sali Lab (Rockefeller University) as the recipient of a Burroughs Wellcome Fund fellowship. Later on, Marc was appointed Assistant Adjunct Professor at UCSF. Between 2006 and 2011, he headed the Structural Genomics Group at the CIPF in Valencia (Spain). Currently, Marc is an ICREA research professor and leads the Structural Genomics Group at the National Center for Genomic Analysis - Centre for Genomic Regulation (CNAG-CRG) in Barcelona. His group is broadly interested in how RNA, proteins and genomes organize and regulate cell fate. Finally, Marc is an Associate Editor of the PLoS Computational Biology journal and has published over 90 articles in international peer-reviewed journals.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

François Serra obtained his Degree in Biology, specializing in Physiology and Neurophysiology, his Master's Degree in Structural Genomics and Bioinformatics (Strasbourg I University, France) and his PhD in Evolutionary Genomics in the Department of Bioinformatics at the CIPF (Valencia). He is now part of the Structural Genomics team of Marc Marti-Renom at CNAG and at CRG (Barcelona). His main research interests are grounded in comparative genomics and evolution, with a special focus on the effect of evolution on the structural arrangement of genomes. He has taught MEPA and 3DMOG for GTPB, as well as similar courses at CIPF (Valencia, ES) and the Department of Genetics of the University of Cambridge (UK).

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

Marco Di Stefano obtained his Ph.D. in Biophysics from SISSA in Trieste (Italy), working on physics-based structural models of chromosomes to study the relationship between gene co-expression and gene co-localization. He currently works as a post-doctoral researcher in the Structural Genomics Group of Marc Marti-Renom at CNAG-CRG (Barcelona). His main research interest is to integrate biopolymer physics and experimental techniques, such as imaging and 3C, to characterize constitutive mechanisms of chromosome folding. He is involved in the YRM initiative (http://www.yrmr.it/drupal/) for young physicists. He has taught 3DAROC16 for GTPB.

Affiliation: Centro Nacional de Análisis Genómico (CNAG) and Center for Genomic Regulation (CRG), Barcelona, ES

Target Audience

The course is designed for experimental researchers and bioinformaticians at the graduate and post-graduate levels who are interested in studying the spatial organization of the genome.

Participants in this course are likely either to be getting involved in generating Hi-C data for chromosome structure determination, or to simply want to correctly interpret and analyse publicly available data.

Course Pre-requisites

Linux and basic Python programming skills are recommended, along with a graduate-level background in Life Sciences. All hands-on sessions will be offered at 3 levels of computational expertise (web platform, command-line tool and Python scripting).

TADbit API

This tutorial is associated with a specific version of TADbit. If you wish to reproduce exactly the results in the notebooks, you should use the version of TADbit tagged 3DAROC_2018.

To install this version do:

git clone https://github.com/3dgenomes/tadbit
cd tadbit
git checkout tags/3DAROC_2018
sudo python setup.py install
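
After running the commands above, a quick check from Python can confirm that the package is visible to the interpreter. This is a minimal sketch: `is_installed` is a hypothetical helper written for this example, and it assumes TADbit is imported under the package name `pytadbit`. It only tests that the import succeeds and does not verify which tag was installed.

```python
def is_installed(module_name):
    """Return True if the module can be imported by the current interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


# assumption: TADbit's Python package is named "pytadbit"
print("TADbit importable: %s" % is_installed("pytadbit"))
```

Run this with the same `python` executable that was used for `setup.py install`, since each interpreter keeps its own set of installed packages.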

TADbit tools

Most of the tasks of the "core pipeline" can be run directly from the command line (without any Python), using the TADbit tools. Have a look at the commands and at the metadata of the results.

For now, the TADbit tools are not included in the general documentation, as they are still "under construction". Use them carefully, and don't hesitate to report any weird behaviour you observe.

Virtual research environment

With small datasets, the TADbit core pipeline can be run through a new Virtual Research Environment (VRE), hosted by the MuG project.

This might also be the best place to try TADkit for visualizing genomes in 3D together with interaction matrices and any other genomic track.

Course Timetable

(provisional)

Day #1 Monday, May 21st
09:30 - 10:00 Welcome and introductions
10:00 - 11:00 Overview on structure determination
11:00 - 11:30 Coffee Break
11:30 - 12:30 3D modeling of genomes and genomic domains
12:30 - 14:00 Lunch Break
14:00 - 15:00 Introduction to Linux and Python: the Jupyter notebook
15:00 - 16:00 Next Generation Sequencing (NGS) and data handling
16:00 - 16:30 Tea Break
16:30 - 18:00 From raw data to Hi-C contact matrices
Day #2 Tuesday, May 22nd
09:30 - 11:00 Morning wrap-up: what have we done so far?
Multiscale Genomics: From genomes to structures
11:00 - 11:30 Coffee Break
11:30 - 12:30 Coarse-Grained DNA and Chromatin Dynamics
12:30 - 14:00 Lunch Break
14:00 - 16:00 Nucleosome positioning
16:00 - 16:30 Tea Break
16:30 - 18:00 Nucleosome Dynamics
Day #3 Wednesday, May 23rd
09:30 - 11:00 Morning wrap-up: what have we done so far?
Chromatin structure and Hi-C data
11:00 - 11:30 Coffee Break
11:30 - 12:30 Integrative modeling applied to chromatin
12:30 - 14:00 Lunch Break
14:00 - 16:00 Biological applications (I)
16:00 - 16:30 Tea Break
16:30 - 18:00 Hi-C contact matrices: filtering and normalization
Day #4 Thursday, May 24th
09:30 - 11:00 Morning wrap-up: what have we done so far?
Biological applications (II)
11:00 - 11:30 Coffee Break
11:30 - 12:30 Compartment detection and analysis
12:30 - 14:00 Lunch Break
14:00 - 16:00 Topologically Associated Domains detection and analysis
16:00 - 16:30 Tea Break
16:30 - 18:00 Comparison between experiments
Day #5 Friday, May 25th
09:30 - 11:00 Morning wrap-up: what have we done so far?
Biological applications (III)
11:00 - 11:30 Coffee Break
11:30 - 12:30 3D Modeling of real Hi-C data with TADbit (I)
12:30 - 14:00 Lunch Break
14:00 - 16:00 3D Modeling of real Hi-C data with TADbit (II)
16:00 - 16:30 Tea Break
16:30 - 18:00 Final wrap-up session

Course material

Material is grouped by day into lectures (pdf), core pipeline notebooks and annex notebooks.
Day1
Core pipeline (notebooks)
  • [Hi-C Quality check](/Notebooks/00-Hi-C quality check.ipynb)
  • Mapping
  • [Parsing mapped reads](/Notebooks/02-Parsing mapped reads.ipynb)
Annex (notebooks)
  • [Software installation](/Notebooks/A0-Preparing your computer for HiC data analysis.ipynb)
  • [Prepare reference genome](/Notebooks/A1-Preparation reference genome.ipynb)
  • [Download Hi-C experiment](/Notebooks/A2-Download published Hi-C experiments.ipynb)
Day2
Core pipeline (notebooks)
  • [Filtering reads](/Notebooks/03-Filtering mapped reads.ipynb)
  • [Normalization](Notebooks/04-Bin-filtering and normalization.ipynb)
Annex (notebooks)
  • [Compare/merge experiments](/Notebooks/A3-Compare and merge Hi-C experiments.ipynb)
Day3
Core pipeline (notebooks)
  • [Compartments and TADs](/Notebooks/05-Compartments and TADs.ipynb)
Annex (notebooks)
  • [Align and compare TADs](/Notebooks/A4-Align and compare TADs.ipynb)
Day4
Core pipeline (notebooks)
  • [Parameter optimization](/Notebooks/06a-Modeling - parameter optimization.ipynb)
  • [Model optimization](/Notebooks/06b-Modeling - model optimization.ipynb)
Annex (notebooks)
  • [Analysis of 3D models](/Notebooks/A5-Modeling - analysis of 3D models.ipynb)
Day5

Feedback

Feedback (1: not clear; 5: very clear)
Day1
  • Integrative modeling: xx
  • FASTQ/Hi-C quality check: xx
  • Mapping: xx
Day2
  • Genome organization: xx
  • 3D modeling: xx
  • Filtering/normalization: xx
Day3
  • Validation 3D models: xx
  • Compartment/TAD calling: xx
  • TADbit usage: xx
Day4
Day5