What is this repository for?
This repository contains Chreval (CHRomatin EVALuation program for heatmaps), which consists of the following executable programs.
- this is the actual dynamic programming part of the program. It calculates the optimal and suboptimal structures of chromatin and the Boltzmann distribution of the principal structures.
- this is a driver for carrying out analysis of the bed files with corresponding heat maps. For a simple example, see the directory tests/test_analyze_loops in this distribution and, for complete calculations of AB compartments and CCDs using this tool, see the directory results_in_manuscript/ in this distribution.
- generates a heatmap plot from the specified input heatmap file
- generates a heatmap file based on user-specified contacts (and weights if desired). For information on the input format, run the program with the flag -hExFile or -hExSeq
- generates a poly(A) sequence of a specified length. These sequences are used to help generate 3D structures from the *.simres files generated while running Chreval
- converts the SimRNA3.21 generated pdb output files into a single bead model (for building 3D representations of chromatin; presently, the coordinates are not rescaled).
Chreval is the main driver for calculating the free energy of observed heatmaps. Chreval is a dynamic programming method for determining the most probable chromatin structure arrangement, as well as the distribution of chromatin structure arrangements, as a function of the free energy of the various motifs found in the structure.
Analyze_loops is for handling a group of heatmap files listed in a properly formatted bed file. The format is Przemek's bed format. The program calls Chreval. You should make sure that the files listed in the bed file also exist in your directory. Please see the directories tests and results_in_manuscript for examples of using this tool.
Make_heatmap is a tool to generate heatmaps either from *.heat files or from the *.clust output of Chreval.
My_generation is a program to generate heatmaps. Entries must be given in dot bracket format. An example is provided in the command line by using the flag -hExFile.
- How to run Chreval?
To run, the minimum information you need is a heat map data file. This file contains experimental data from a source (particularly ChIA-PET but possibly Hi-C). This data (particularly ChIA-PET) is highly correlated with positions of the CTCF binding proteins and cohesin. The CTCF sites largely correspond to regions that form loops and typically regulate the chromatin.
The heat map consists of a symmetric matrix where the indices (i,j), corresponding to row and column positions, indicate points on the chromatin chain where segments i and j interact with each other. For simplicity, we assume i < j and look only at the upper triangle; the lower triangle is its mirror image across the diagonal. The intensity of a given point (i,j) is proportional to the frequency with which this particular interaction was encountered in the experimental data, so small numbers mean weak interactions and large numbers mean strong interactions.
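As a minimal sketch of the matrix layout described above (not the Chreval file reader), the following builds a small symmetric heat map with numpy and lists the interactions found in the upper triangle. The 5-segment size and the counts are made up for illustration.

```python
import numpy as np

# Hypothetical 5-segment example; the counts are invented for illustration.
n = 5
upper = np.zeros((n, n), dtype=int)
upper[0, 4] = 12  # strong interaction between segments 0 and 4
upper[1, 3] = 3   # weak interaction between segments 1 and 3

# Mirror the upper triangle (i < j) across the diagonal.
heat = upper + upper.T

# Collect interactions from the upper triangle only, strongest first.
rows, cols = np.nonzero(np.triu(heat, k=1))
contacts = sorted(
    ((int(i), int(j), int(heat[i, j])) for i, j in zip(rows, cols)),
    key=lambda c: -c[2],
)
print(contacts)
```

Reading only the upper triangle avoids counting each interaction twice.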
Once you have obtained a heat map, or have created one yourself, you can run this program with the following command line example: -f myexample.heat
Specifically, in the directory "tests", the following command can be used to calculate the example heatmap:
cd tests -f chr10_64313472_64921344_res5kb.heat
or, another example
cd tests/test_analyze_loops/eheat_files -f chr1_1890973_2316695.eheat
The file chr10_64313472_64921344_res5kb.heat has the labeling information "chrN_x_y_res5kb.heat", where N is the chromosome number, x is the starting position and y is the ending position. Further, the "res5kb" indicates the resolution of the grid that was generated. If the grid size is smaller or larger, an option can be used to change this grid size from the default (presently 5 kb). The file must end with the extension heat or the newer form "eheat" (extended heatmap file).
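The naming convention above can be checked with a short script. This is only a sketch: the regular expression and the function name `parse_heat_name` are assumptions made for illustration and are not part of Chreval.

```python
import re

# Assumed pattern for "chrN_x_y_res5kb.heat" and "chrN_x_y.eheat" style names.
NAME_RE = re.compile(
    r"^chr(?P<chrom>[0-9A-Za-z]+)_(?P<start>\d+)_(?P<end>\d+)"
    r"(?:_res(?P<res>\d+)kb)?\.(?P<ext>e?heat)$"
)


def parse_heat_name(filename):
    """Split a heatmap file name into chromosome, start, end, resolution."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized heatmap file name: {filename}")
    res = m.group("res")
    return {
        "chrom": m.group("chrom"),
        "start": int(m.group("start")),
        "end": int(m.group("end")),
        "res_kb": int(res) if res else None,  # None when no resN kb tag is present
        "ext": m.group("ext"),
    }


info = parse_heat_name("chr10_64313472_64921344_res5kb.heat")
print(info)
```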
The output directory from Chreval in the above example will be chr10_64313472_64921344_res5kb; i.e., the directory name "chrN_x_y_res5kb". This directory contains (in separate files) the top layer of suboptimal structures within some specifiable energy range from the minimum free energy (default is 10 kcal/mol) or a fractional percentage of the free energy. These files have the extension "DBN" and can be read by the 3rd party program VARNA. Additionally, Chreval is set up to provide heatmap, pairing info, and restraint files for SimRNA 3D calculations. Two additional files are chrN_x_y_res5kb_BDwt.clust, which contains a matrix with the Boltzmann probabilities for different interactions, and chrN_x_y_res5kb_summary.txt, which contains a shorthand list of the secondary structures.
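To illustrate what an energy window and Boltzmann probabilities mean here, the sketch below applies the standard formula p_i = exp(-G_i/kT) / Z to a list of free energies. It is not Chreval's implementation; the temperature (310.15 K), the function names, and the example energies are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Return normalized Boltzmann weights for free energies in kcal/mol."""
    kT = R_KCAL * temperature
    e_min = min(energies)
    # Shift by the minimum so the exponentials stay numerically well behaved.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def within_window(energies, window=10.0):
    """Keep the energies that lie within `window` kcal/mol of the minimum."""
    e_min = min(energies)
    return [e for e in energies if e - e_min <= window]


# Invented free energies (kcal/mol) for three candidate structures.
energies = [-50.0, -49.0, -35.0]
probs = boltzmann_probabilities(energies)
kept = within_window(energies)
print(probs, kept)
```

With a 10 kcal/mol window, the third structure in this invented example is discarded, and the lowest-energy structure carries most of the probability.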
There are a variety of additional options. Please run the program with -h to obtain information on the additional command line options.
- How to run analyze_loops?
Examples of its use are provided in the directories "tests/test_analyze_loops" and "results_in_manuscript" included in this distribution.
cd tests/test_analyze_loops -ff test_loops.CTCF.withAandB.annotated.bed
The program will look up files in the same directory as the *.bed file and try to compute the free energy using the object Chreval.
For more information on how to run the program, please run it with -h
- How to run other programs?
Here are some additional command line arguments of the other programs in this set.
chr1_1890973_2316695.eheat
- makes a 2D heat map of the file. Extended heatmap files contain more detailed information. See the directory "tests/test_analyze_loops/eheat_files".
chr10_64313472_64921344_res5kb.heat
- makes a heat map of the file.
- How to run
Here is an example. -seq ".ABCDE.((..)).abcde" "{.................}"
Note that it doesn't matter that one of the structures overlaps the other one.
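The following sketch shows one way such dot bracket strings could be turned into a list of contacts. It is not the parser used by the program. It assumes that brackets pair in the usual nested way and that each uppercase letter pairs with the matching lowercase letter; the function names are hypothetical and positions are counted from 0.

```python
def parse_pairs(structure):
    """Return sorted (i, j) contacts from one dot bracket string (0-based)."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    openers = set(closers.values())
    stacks = {}  # one stack of open positions per opening symbol
    pairs = []
    for pos, ch in enumerate(structure):
        if ch == ".":
            continue
        if ch in openers or ch.isupper():
            stacks.setdefault(ch, []).append(pos)
        elif ch in closers or ch.islower():
            # Assumption: a lowercase letter closes the same uppercase letter.
            key = closers.get(ch, ch.upper())
            if not stacks.get(key):
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            pairs.append((stacks[key].pop(), pos))
        else:
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed opening symbols remain")
    return sorted(pairs)


def merge_structures(structures):
    """Combine the contacts of several equal-length strings into one list."""
    if len({len(s) for s in structures}) != 1:
        raise ValueError("all structures must have the same length")
    contacts = set()
    for s in structures:
        contacts.update(parse_pairs(s))
    return sorted(contacts)


contacts = merge_structures([".ABCDE.((..)).abcde", "{.................}"])
print(contacts)
```

Because each string is parsed on its own and the contacts are merged afterwards, it does not matter that one structure spans the other.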
- How to run the SimRNA packages to obtain 3D structures from Chreval outputs?
You must download the executable version of SimRNA from the following website
Copy the "data" directory and config.dat file to a separate directory where you want to build the 3D structure.
Copy the relevant simres file to that same directory.
In data, change all the values in histograms3D_3.list to 0, except for the last four, which are set to 0.1:
cat histograms3D_3.list
./data/AA3.hist 0.0
./data/AC3.hist 0.0
./data/A_3_exvol.hist 0.1
...
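This edit can also be scripted. The sketch below assumes that each line of histograms3D_3.list holds a file path followed by a weight, as in the excerpt above; the function name and the placeholder file names are hypothetical. Keep a backup of the original file before overwriting it.

```python
def reweight_histograms(lines, keep_last=4, kept_weight="0.1"):
    """Set every weight to 0.0 except the last `keep_last`, set to kept_weight."""
    entries = [ln.split() for ln in lines if ln.strip()]
    cutoff = len(entries) - keep_last
    out = []
    for idx, fields in enumerate(entries):
        weight = kept_weight if idx >= cutoff else "0.0"
        out.append(f"{fields[0]} {weight}")  # assumed layout: "<path> <weight>"
    return out


# Shortened, partly hypothetical listing; the real file has more entries.
example = [
    "./data/AA3.hist 1.0",
    "./data/AC3.hist 1.0",
    "./data/placeholder_1.hist 1.0",
    "./data/placeholder_2.hist 1.0",
    "./data/placeholder_3.hist 1.0",
    "./data/A_3_exvol.hist 1.0",
]
new_lines = reweight_histograms(example)
print("\n".join(new_lines))
```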
In the config.dat file, change the parameter ETA_THETA_WEIGHT from 0.4 to 0.0.
cat config.dat ...
Build a poly(A) sequence of the proper length (N):
N > myseq.seq
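For reference, the same kind of sequence can be produced in a few lines of Python. This is a stand-in sketch, not the generator shipped with this package; the function name `poly_a` is hypothetical, and lowercase output is assumed.

```python
def poly_a(n):
    """Return a poly(A) sequence of length n in lowercase letters."""
    if n < 1:
        raise ValueError("sequence length must be a positive integer")
    return "a" * n


seq = poly_a(10)
print(seq)
```

Redirecting the printed sequence to a file (for example myseq.seq) gives the input expected by the SimRNA run that follows.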
Now run a replica exchange Monte Carlo simulation using SimRNA
SimRNA3 -s myseq.seq -r myseq.simres -c config.dat -E 10 -o myseq >& myseq.log &
To generate sequences, use the following
cat myseq_x_??.trafl >> myseq_x.trafl
clustering myseq_x.trafl 0.1 15.0
this generates the files
to convert the first trajectory in this file to a pdb representation:
SimRNA_trafl2pdbs myseq_x_01-000001.pdb myseq_x.trafl 1
this generates the file myseq_x_thrs15.00A_clust01-000001.pdb
Now we convert that to a single bead representation myseq_x_thrs15.00A_clust01-000001.pdb
which generates a file
This pdb file can be viewed in a recognizable form using Chimera, VMD, or PyMOL.
10 aaaaaaaaaa
SimRNA_pdboutput.pdb
- generates a PDB file formatted in a way so that (currently) the phosphate atom is treated as the main binding interaction. The particular atom can be changed in the program, and even more than one atom displayed, if actually desired.
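The idea of reducing each residue to one bead can be sketched as a filter on the PDB atom records. This is not the converter included in this package: it relies only on the standard PDB fixed-column layout (atom name in columns 13 to 16), and the function name `single_bead` is hypothetical.

```python
def single_bead(pdb_lines, atom_name="P"):
    """Keep only the ATOM/HETATM records whose atom name equals atom_name."""
    beads = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 13-16 of a PDB record (slice 12:16) hold the atom name.
        if is_atom and line[12:16].strip() == atom_name:
            beads.append(line.rstrip("\n"))
    return beads
```

Passing a different `atom_name` selects another atom as the bead, and calling the function once per atom name gives a model with more than one atom per residue.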
Version 1.0
The package can run as is, but some applications may require the installation of the following python packages, if not already installed.
matplotlib, numpy, random and argparse
Standard python 3.6 and higher
matplotlib, numpy, random and argparse; otherwise, none
- How to run tests
Some examples are provided in the directory tests.
Who do I talk to?
To consult about bugs and other issues with the code, please contact
Wayne Dawson
Laboratory of Functional and Structural Genomics
Center of New Technologies
University of Warsaw
Banacha 2C, 02-098 Warsaw
