endhic to juicebox

Question

Opened this issue a year ago · 5 comments

Dear developers,

I want to express my appreciation for EndHiC: it has proven to be exceptionally fast, outperforming other tools such as 3D-DNA. I am currently working on the assembly of a plant genome of approximately 450 Mb with a heterozygosity of 1.29%.

When I used EndHiC to scaffold haplotype 1 and haplotype 2, it successfully produced 15 chromosomes for each. However, when I scaffolded the primary contigs, I obtained only 13 chromosomes.

Now, I would like to proceed with the pipeline to convert EndHiC results into Juicebox-compatible file formats and manually separate the chromosomes using Juicebox. However, I've come across some instructions that are not entirely clear to me. Specifically, I'm unsure about the terms "contig.fa" and "draft.fa." Could you please provide clarification on these terms?

Thank you for your assistance.

`(4) Convert EndHiC result to juicebox compatible file formats, which can be viewed in Juicebox

##convert EndHiC .cluster file into juicebox .assembly file
cluster_to_juciebox_assembly.pl contigs.fa.len z.EndHiC.A.results.summary.cluster > draft.assembly

##index the contig sequence file
bwa index draft.fa

##generate the enzyme cutting sites file draft_MboI.txt
juicer/misc/generate_site_positions.py MboI draft draft.fa

##generate the Hi-C reads alignment file aligned/merged_nodups.txt
##prepare data: put the HiC reads under ./fastq/; put the contig sequence file and index files under ./reference/;
juicer/CPU/juicer.sh -S early -g draft -s MboI -z ./references/draft.fa -y ./draft_MboI.txt -p ./references/draft.fa.size -t 50 -D juicer/CPU

##generate the hic input file draft.hic for viewing in juicebox
3d-dna/visualize/run-assembly-visualizer.sh draft.assembly merged_nodups.txt

For more instructions, please refer to the help pages of juicer and juicebox.`

Answer 1 · 2023-10-26T01:12:07.000Z

Wangsen, please answer these questions.

Answer 2 · 2023-10-26T11:43:47.000Z

@cyycyj Thank you for your feedback on using EndHiC. The terms "contig.fa" and "draft.fa" refer to the same file: the original contig assembly, for example the primary contigs generated by hifiasm.
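Since both names point at the same FASTA, one way to avoid keeping a second copy is a symlink, and the length table passed to juicer's -p option (draft.fa.size) can be derived from that one file. The sketch below assumes a plain two-column table of sequence name and length; contigs.fa.len presumably has the same layout, but that is an assumption to check against the EndHiC documentation. `fasta_lengths` is a hypothetical helper, not part of EndHiC or juicer.

```shell
# Hypothetical helper (not part of EndHiC or juicer): print "<name><TAB><length>"
# for every record in a FASTA file. Assumes the sequence name is the first
# whitespace-delimited word of the header line.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example usage (file names follow the thread; adjust paths as needed):
#   ln -s contigs.fa draft.fa
#   fasta_lengths draft.fa > draft.fa.size
```

Deriving every auxiliary file from the single contig FASTA keeps the names in the .assembly file, the bwa index, and the size table consistent with each other.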

Answer 3 · 2023-10-27T07:36:31.000Z

Dear Prof. Fan and Wangsen,

Thank you for your help; I'll try it one more time. By the way, does EndHiC have a contig breaking or checking function? I found that hifiasm assembled one very long contig (about 1.5 times the length of the longest chromosome; details are here: https://github.com/chhylp123/hifiasm/issues/546#issue-1964635479 ), so breaking it may be the right thing to do.

Best regards

Answer 4 · 2023-10-27T09:27:14.000Z

By the way, is the juicer pipeline in the tutorial perhaps based on juicer version 1.x? I recommend adding a tip about the `--assembly` parameter, which juicer 2.0 needs in order to generate the merged_nodups file.
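A small dry-run sketch of that tip: it prints, rather than executes, the juicer command quoted in this thread and appends `--assembly` when the juicer major version is 2 or newer. `build_juicer_cmd` is a hypothetical helper name, the paths are the ones from the tutorial, and whether `-S early` should still be passed alongside `--assembly` is best confirmed with `juicer.sh -h` for the installed version.

```shell
# Dry-run sketch: print (do not execute) the juicer command from the thread,
# appending --assembly when the juicer major version is 2 or newer.
# build_juicer_cmd is a hypothetical helper name; paths follow the tutorial.
build_juicer_cmd() {
    major="$1"
    cmd="juicer/CPU/juicer.sh -S early -g draft -s MboI -z ./references/draft.fa"
    cmd="$cmd -y ./draft_MboI.txt -p ./references/draft.fa.size -t 50 -D juicer/CPU"
    if [ "$major" -ge 2 ]; then
        cmd="$cmd --assembly"
    fi
    printf '%s\n' "$cmd"
}

# Show both variants side by side.
build_juicer_cmd 1
build_juicer_cmd 2
```

Printing the command first makes it easy to review the flags before committing to a long alignment run.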

Answer 5 · 2023-10-29T09:55:46.000Z

Breaking contigs using Hi-C data alone is not very reliable. So if the hifiasm result contains a contig error, I suggest breaking the contig based on multiple lines of evidence, such as the hifiasm graph structure, the Hi-C heatmap within the contig, etc.
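Once a breakpoint has been chosen from that evidence, the mechanical split itself is simple. The following is a minimal sketch, assuming a 1-based breakpoint coordinate supplied by the user; `break_contig` and the `_1`/`_2` name suffixes are hypothetical, and this is not an EndHiC feature. It holds each sequence in memory, which may be slow for very large contigs.

```shell
# Hypothetical helper (not an EndHiC feature): split one FASTA record into two
# at a user-chosen position. Bases 1..pos go to <name>_1, the rest to <name>_2.
# Other records keep their name and sequence; header descriptions are dropped
# and every sequence is written on a single line.
break_contig() {
    awk -v target="$2" -v pos="$3" '
        function emit() {
            if (name == "") return
            if (name == target) {
                print ">" name "_1"; print substr(seq, 1, pos)
                print ">" name "_2"; print substr(seq, pos + 1)
            } else {
                print ">" name; print seq
            }
        }
        /^>/ { emit(); name = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Example usage (contig name and coordinate are placeholders):
#   break_contig draft.fa ptg000001l 50000000 > draft.broken.fa
```

After a break, the contig names and lengths change, so the length table and the bwa index would need to be regenerated from the new FASTA before rerunning the scaffolding.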