fanagislab/EndHiC

EndHiC to Juicebox


cyycyj commented

Dear developers,

I want to express my appreciation for EndHiC, as it has proven to be an exceptionally fast software tool, surpassing the performance of other tools like 3d-dna. I am currently working on the assembly of a plant genome, which is approximately 450Mb in size with a heterozygosity level of 1.29%.

When I used EndHiC to scaffold haplotype 1 and haplotype 2, it successfully produced 15 chromosomes. However, when I scaffolded the primary contigs, I obtained only 13 chromosomes.

Now I would like to follow the pipeline for converting EndHiC results into Juicebox-compatible file formats, so that I can manually separate the chromosomes in Juicebox. However, some of the instructions are not entirely clear to me. Specifically, what do the file names "contig.fa" and "draft.fa" refer to? Could you please clarify?

Thank you for your assistance.

(4) Convert EndHiC results to Juicebox-compatible file formats, which can be viewed in Juicebox

```
##convert EndHiC .cluster file into juicebox .assembly file
cluster_to_juciebox_assembly.pl contigs.fa.len z.EndHiC.A.results.summary.cluster > draft.assembly

##index the contig sequence file
bwa index draft.fa

##generate the enzyme cutting sites file draft_MboI.txt
juicer/misc/generate_site_positions.py MboI draft draft.fa

##generate the Hi-C reads alignment file aligned/merged_nodups.txt
##prepare data: put the Hi-C reads under ./fastq/; put the contig sequence file and index files under ./references/
juicer/CPU/juicer.sh -S early -g draft -s MboI -z ./references/draft.fa -y ./draft_MboI.txt -p ./references/draft.fa.size -t 50 -D juicer/CPU

##generate the hic input file draft.hic for viewing in juicebox
3d-dna/visualize/run-assembly-visualizer.sh draft.assembly merged_nodups.txt
```

For more instructions, please refer to the help pages of Juicer and Juicebox.
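The internals of `cluster_to_juciebox_assembly.pl` are not shown in this thread, but its output target, the Juicebox `.assembly` file, has a simple layout: one `>name index length` line per contig (1-based indices), followed by one line per scaffold listing contig indices in order, negated for reverse orientation. The minimal Python sketch below illustrates that format only. It assumes the clusters have already been parsed into ordered `(contig, strand)` lists; `write_assembly` and its inputs are hypothetical, and it does not read EndHiC's `.cluster` file itself.

```python
from typing import Dict, List, Tuple


def write_assembly(contig_lengths: Dict[str, int],
                   scaffolds: List[List[Tuple[str, str]]]) -> str:
    """Build Juicebox .assembly text (hypothetical helper, illustration only).

    contig_lengths: contig name -> length; dict order defines the 1-based indices.
    scaffolds: each scaffold is an ordered list of (contig_name, strand),
               where strand is '+' (forward) or '-' (reverse).
    Contigs not placed in any scaffold are emitted as single-contig scaffolds,
    so every contig appears exactly once.
    """
    index = {name: i for i, name in enumerate(contig_lengths, start=1)}
    # Header block: one ">name index length" line per contig.
    lines = [f">{name} {index[name]} {length}"
             for name, length in contig_lengths.items()]

    placed = set()
    for scaffold in scaffolds:
        signed = []
        for name, strand in scaffold:
            if strand not in ("+", "-"):
                raise ValueError(f"bad strand {strand!r} for {name}")
            if name in placed:
                raise ValueError(f"contig {name} placed twice")
            placed.add(name)
            # Negative index = contig used in reverse orientation.
            signed.append(str(index[name] if strand == "+" else -index[name]))
        lines.append(" ".join(signed))

    # Unclustered contigs become their own one-contig scaffolds.
    for name in contig_lengths:
        if name not in placed:
            lines.append(str(index[name]))
    return "\n".join(lines) + "\n"
```

For example, `write_assembly({"ctg1": 1000, "ctg2": 500, "ctg3": 200}, [[("ctg2", "+"), ("ctg1", "-")]])` produces the three header lines, then `2 -1` for the scaffold, then `3` for the unplaced contig.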

@cyycyj Thank you for your feedback on using EndHiC. "contig.fa" and "draft.fa" refer to the same file: the original contig assembly, for example the primary contigs generated by hifiasm.
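Because both names point at the same contig FASTA, the two length tables used in the tutorial (`contigs.fa.len` for `cluster_to_juciebox_assembly.pl` and `draft.fa.size` for Juicer's `-p` option) can be derived from that one file. The sketch below assumes both are plain two-column, tab-separated `name<TAB>length` tables with the ID taken as the first word of each header; please check this against the EndHiC and Juicer documentation. The script name `fasta_len.py` is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence_id, length) for each FASTA record.

    The ID is the first whitespace-delimited token after '>'.
    Blank lines are ignored; line breaks inside a sequence do not count.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].split(maxsplit=1)[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fasta_len.py draft.fa > draft.fa.size
    with open(sys.argv[1]) as handle:
        for seq_id, seq_len in fasta_lengths(handle):
            print(f"{seq_id}\t{seq_len}")
```

Running it once on the contig FASTA and saving the output under both names keeps the two tables consistent with each other.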

cyycyj commented

Dear Prof. Fan and Wangsen,

Thank you for your help; I'll try it again. By the way, does EndHiC have a contig-breaking or error-checking function? hifiasm assembled a very long contig in my data (about 1.5 times the length of the longest chromosome; details here: https://github.com/chhylp123/hifiasm/issues/546#issue-1964635479), so perhaps it should be broken.
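This thread does not say whether EndHiC can break contigs. If a misjoin position has already been chosen by hand (for example, by inspecting the Hi-C map in Juicebox), splitting the contig before rerunning the pipeline is straightforward. The minimal sketch below does only that: it does not detect misjoins. `break_contig` and the `name_1`, `name_2`, ... naming scheme are hypothetical, and it assumes the new names do not collide with existing contig IDs.

```python
from typing import Dict, List


def break_contig(seqs: Dict[str, str], name: str,
                 breakpoints: List[int]) -> Dict[str, str]:
    """Split seqs[name] at 0-based breakpoints (hypothetical helper).

    Each breakpoint c cuts the sequence so that a new piece starts at c.
    Pieces are named name_1, name_2, ... and keep the original record's
    position in the output; all other records are copied unchanged.
    """
    seq = seqs[name]
    cuts = sorted(set(breakpoints))
    if any(not 0 < c < len(seq) for c in cuts):
        raise ValueError("breakpoints must lie strictly inside the contig")
    bounds = [0] + cuts + [len(seq)]

    out: Dict[str, str] = {}
    for other, s in seqs.items():
        if other != name:
            out[other] = s
            continue
        for i in range(len(bounds) - 1):
            out[f"{name}_{i + 1}"] = s[bounds[i]:bounds[i + 1]]
    return out
```

After breaking a contig, the FASTA has changed, so the length tables, the `bwa index` files and the restriction-site file would all need to be regenerated before running Juicer again.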

Best regards

cyycyj commented

By the way, is the Juicer pipeline in your tutorial based on Juicer 1.x? I recommend adding a tip about the `--assembly` parameter, which Juicer 2.0 needs in order to generate the merged_nodups file.