marbl/SALSA

Can the software assemble genomes at the chromosome level?

Closed this issue · 6 comments

Hello, I found that this software only assembles the genome to the scaffold level and does not produce a chromosome-level genome. If we need a chromosome-level genome, do we need to use other software, for example 3D-DNA?
Thank you!

The genome will be scaffolded as far as the Hi-C data allows, which depends on the Hi-C input and the assembly quality. SALSA is generally conservative, and no Hi-C scaffolding tool is guaranteed to produce complete chromosomes. This is why the VGP uses SALSA followed by curation to finalize the scaffolds and add any missed joins.

Glad to hear from you!
Do you mean that if we have enough data, we can complete genome assembly at the chromosome level?

I am now using the process you recommended to prepare my Hi-C data (Arima Genomics), but the result is not ideal.
When I run step 3.A to combine the two BAM files, it reports:
"The read id's of the two files do not match up at line number 9. Files should be from the same sample and sorted in identical order."
I have tried to fix this, but it still does not run. Can you recommend another way to solve this problem, or another pipeline for preparing Hi-C reads for SALSA?

It's not a matter of "enough data". It depends on the input assembly quality, the library quality (fraction of long-distance links, fraction of cross-chromosome links, etc.), and other factors.
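To make the library-quality point concrete, here is a minimal Python sketch that computes two such fractions from a list of mapped read pairs. The function name `link_fractions`, the tuple layout, and the 10 kb cutoff for "long-distance" are illustrative assumptions, not part of SALSA or any specific QC tool. Links between different sequences only correspond to cross-chromosome links if the reads were mapped to a chromosome-level reference; against a contig assembly they also include genuine links between contigs of the same chromosome.

```python
def link_fractions(pairs, long_range_bp=10_000):
    """Summarize Hi-C read pairs given as (seq1, pos1, seq2, pos2) tuples.

    long_range_bp is an arbitrary illustrative cutoff, not a SALSA setting.
    """
    total = 0
    inter_sequence = 0  # the two mates map to different sequences
    long_range = 0      # same sequence, mates at least long_range_bp apart
    for seq1, pos1, seq2, pos2 in pairs:
        total += 1
        if seq1 != seq2:
            inter_sequence += 1
        elif abs(pos2 - pos1) >= long_range_bp:
            long_range += 1
    if total == 0:
        return {"total": 0, "inter_sequence": 0.0, "long_range": 0.0}
    return {
        "total": total,
        "inter_sequence": inter_sequence / total,
        "long_range": long_range / total,
    }
```

The input tuples could be extracted from the mapped BAM files with whatever parser you already use; the sketch deliberately leaves that step out.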

As for your mapping error, it seems your input files do not contain matching read pairs. The two files you map should contain the same reads in the same order, which in turn produces BAM files with the reads in matching order. Check that this is the case for your data.
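One way to run that check is sketched below in Python: it walks the two FASTQ files in parallel and reports the first record where the read IDs differ. It assumes standard 4-line FASTQ records, optionally gzipped, and treats a trailing `/1` or `/2` as a mate suffix. The function names are hypothetical and this is not part of SALSA or the Arima pipeline.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each record is a header
            fields = line[1:].split()  # drop the leading '@' and any comment
            read_id = fields[0] if fields else ""
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]  # strip the mate suffix
            yield read_id


def first_mismatch(r1_path, r2_path):
    """Return (record_number, id1, id2) at the first mismatch, else None.

    If one file is shorter, the missing ID is reported as None.
    """
    id_pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for record_number, (id1, id2) in enumerate(id_pairs, start=1):
        if id1 != id2:
            return record_number, id1, id2
    return None
```

Calling `first_mismatch("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (placeholder file names) returns `None` when the files are consistently paired. A non-`None` result points at the first record that is out of sync, which usually means one file was filtered, trimmed, or sorted separately from the other.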

Hello, I have used samtools merge to create two BAM files and am still trying your software. Thank you very much for your reply.
I have read your article and the introduction to the software. Our current Hi-C data is 150x and our HiFi data is 35x, so I think the amount of data is sufficient. Can your software complete a chromosome-level assembly like 3D-DNA?

SALSA produces chromosome-scale scaffolds when possible, but it tends to be conservative. It's fundamentally no different from 3D-DNA, so both produce "chromosome-scale" scaffolds, but you're welcome to try scaffolding with both and evaluating the results. I assume you've resolved the issue with your reads being incorrectly paired, so I'll close this issue.