This is a Snakemake workflow to normalize Hi-C matrices by genomic distance using HiFive.
-
Make sure you have conda installed.
-
Then create two environments to run the pipeline:
- Workflow environment:
conda env create -n hic-workflow --file conda-workflow.lst
- HiFive Py2 environment:
conda env create -n hic-hifive --file conda-hifive.lst
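Before moving on, it can help to confirm that both environments were created. The following is a minimal sketch, not part of the workflow: has_conda_env is a hypothetical helper that scans the output of conda env list, assuming the first column holds the environment name.

```shell
# Succeed if the named environment appears in `conda env list` output on stdin.
# Hypothetical helper: compares the first whitespace-separated field of each line.
has_conda_env() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Report the state of the two environments used by this workflow.
for env_name in hic-workflow hic-hifive; do
    if command -v conda >/dev/null 2>&1 && conda env list | has_conda_env "$env_name"; then
        echo "$env_name: found"
    else
        echo "$env_name: missing"
    fi
done
```

If an environment is reported as missing, rerun the matching conda env create command from the list above.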
-
Activate the workflow environment:
conda activate hic-workflow
-
Create an alias for snakemake so that it remains accessible even when the workflow environment is no longer active:
alias snakemake=$(which snakemake)
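Because $(which snakemake) is expanded at the moment the alias is defined, the alias stores an absolute path and keeps working after hic-hifive is activated. If snakemake is not on the PATH at that moment, the substitution is empty and so is the alias. The sketch below guards against that case; resolve_tool is a hypothetical helper, not part of the workflow.

```shell
# Print the absolute path of a tool found on PATH, or fail.
# Hypothetical helper: rejects aliases, functions and builtins,
# because `command -v` does not report those as file paths.
resolve_tool() {
    tool_path="$(command -v "$1")" || return 1
    case "$tool_path" in
        /*) printf '%s\n' "$tool_path" ;;
        *)  return 1 ;;
    esac
}

# Define the alias only if an absolute path could be resolved.
if snakemake_bin="$(resolve_tool snakemake)"; then
    alias snakemake="$snakemake_bin"
else
    echo "snakemake not found on PATH; is hic-workflow active?" >&2
fi
```

Note that aliases are usually expanded only in interactive shells, so this is meant for the terminal session in which the pipeline is started.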
-
Edit config.yaml to
- specify the samtools binary in samtools_bin (you can find its path by typing which samtools)
- adjust min_interactions, the minimum number of ditags between two genomic regions for them to be considered as interacting with each other (you can specify multiple thresholds, which is encouraged)
- adjust resolution, which determines the coarseness of the Hi-C map (you can specify multiple resolutions, which is encouraged)
- adjust super_resolution_factors, which smooth the matrix by taking into account the ditags of neighboring genomic regions (you can specify multiple factors, which is encouraged)
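As a quick sanity check after editing, the sketch below reports which of the four settings named above are absent from a file. missing_config_keys is a hypothetical helper, and it assumes each setting is written as a plain "key:" entry at the start of a line; the actual layout of config.yaml may differ.

```shell
# Print every expected setting that does not appear as "key:" in the given file.
# Assumption: settings are plain YAML keys, optionally indented.
missing_config_keys() {
    for key in samtools_bin min_interactions resolution super_resolution_factors; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$1" || printf '%s\n' "$key"
    done
}

# Check the real file only if it is present in the current directory.
if [ -f config.yaml ]; then
    missing_config_keys config.yaml
fi
```

No output means all four settings were found; it does not validate their values.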
-
Create dataset.cfg
- Copy template:
cp dataset.cfg.template dataset.cfg
- Fill out the settings in [global], then create an entry for each replicate/condition, each time specifying the paths for read pair 1 & 2.
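Running cp a second time would overwrite a dataset.cfg that has already been filled out. A minimal sketch that copies the template only when the target does not exist yet; copy_template_once is a hypothetical helper, not part of the workflow.

```shell
# Copy source ($1) to target ($2) unless the target already exists.
copy_template_once() {
    if [ -e "$2" ]; then
        echo "$2 already exists; leaving it untouched" >&2
    else
        cp "$1" "$2"
    fi
}

# Apply it to the workflow's template if it is in the current directory.
if [ -f dataset.cfg.template ]; then
    copy_template_once dataset.cfg.template dataset.cfg
fi
```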
-
Load HiFive environment:
conda activate hic-hifive
-
Run snakemake:
snakemake -j<number of threads you can afford>
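One way to choose the thread count is to leave one core free for the rest of the system. The sketch below is a suggestion, not a requirement of the workflow: pick_threads is a hypothetical helper, and the block only prints the commands instead of running them. The -n flag is Snakemake's dry-run option, which lists the planned jobs without executing them.

```shell
# Print the number of available cores minus one, but never less than 1.
# Falls back to getconf, then to 1, when nproc is unavailable.
pick_threads() {
    total="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "$total" -gt 1 ]; then
        echo $((total - 1))
    else
        echo 1
    fi
}

# Show a dry run first, then the real invocation with the chosen thread count.
echo "snakemake -n"
echo "snakemake -j$(pick_threads)"
```

Checking the dry-run output before the full run makes it easier to spot a misconfigured dataset.cfg or config.yaml early.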