This code scaffolds your assemblies using Hi-C data. It requires Samtools, Bedtools, pbcore and Networkx. To build the code, first run the following command:

    make

This generates two binaries, one from break_contigs.cpp and one from triangle_plot.cpp. The primary script for running the pipeline is run.py, which has the following options:

    python run.py -h
    usage: run.py [-h] -a ASSEMBLY -m MAPPING -d DIR [-b MISSASSEMBLY]

    optional arguments:
      -h, --help            show this help message and exit
      -a ASSEMBLY, --assembly ASSEMBLY
                            assembled contigs
      -m MAPPING, --mapping MAPPING
                            mapping of read to contigs in bam format
      -d DIR, --dir DIR     output directory for results
      -b MISSASSEMBLY, --missassembly MISSASSEMBLY
                            add flag to find and break misassemblies from the
                            contigs

The FASTA file containing the final scaffolds is written to your output directory as scaffold.fasta. Each step in the pipeline can also be run on its own. You can tweak the parameters in the individual files to suit your data and then run the whole pipeline.
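
As a rough sketch of a full invocation, the Python snippet below assembles the command line from the required options listed in the help output (`-a`, `-m`, `-d`) and works out where scaffold.fasta should appear. The input file names (`contigs.fasta`, `hic_to_contigs.bam`), the output directory name (`salsa_out`) and the helper function names are hypothetical placeholders, not part of this repository. The `-b` option is left out because the help text does not say what value it expects.

```python
import os
import subprocess


def build_run_command(assembly, mapping, out_dir, python="python"):
    """Return the argv list for run.py using the -a, -m and -d options."""
    return [python, "run.py", "-a", assembly, "-m", mapping, "-d", out_dir]


def expected_scaffold_path(out_dir):
    """Path where the README says the final scaffolds are written."""
    return os.path.join(out_dir, "scaffold.fasta")


def run_pipeline(assembly, mapping, out_dir):
    """Launch run.py and return the expected scaffold path.

    Assumes run.py is in the current working directory and that
    `make` has already been run to build the two binaries.
    """
    subprocess.check_call(build_run_command(assembly, mapping, out_dir))
    return expected_scaffold_path(out_dir)


if __name__ == "__main__":
    # Hypothetical inputs; replace with your own contigs and BAM file.
    cmd = build_run_command("contigs.fasta", "hic_to_contigs.bam", "salsa_out")
    print(" ".join(cmd))
```

Running the snippet only prints the command. Call `run_pipeline(...)` from the repository root to actually start the pipeline.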
## Results
We ran this tool on an assembly of NA12878 with N50 = 1.55 Mb and used 725 million Hi-C reads for scaffolding. The final scaffolds had N50 = 80 Mb. Here are the dotplots for the chromosomes: