Two ideas on optimization
Opened this issue · 1 comments
shelkmike commented
For large eukaryotic genomes, the file overlap.paf may be very large. I think VeChat can be optimized in two ways to deal with this:
- Instead of writing overlap.paf, it can write overlap.paf.gz. This can be achieved by compressing the output of fpa with " | gzip -1 >". Racon can take gzipped overlap files as input.
- It's probably worth adding a parameter that sets the minimum overlap length. If the read N50 is, for example, 20 kbp, the minimum overlap can be safely raised from the default 500 bp to, for example, 5000 bp. This would not only decrease the size of the PAF file, but would probably also speed up error correction, because short overlaps would no longer be considered.
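
To make the two ideas concrete, here is a minimal Python sketch that combines them: it drops overlaps shorter than a threshold and writes the rest gzipped at the fastest compression level. This is not VeChat's code. The function names `filter_paf` and `write_filtered_gz` and the `min_overlap` parameter are hypothetical, and taking the overlap length as the smaller of the query and target spans (PAF columns 3-4 and 8-9) is my assumption, which may differ from how fpa measures it.

```python
import gzip


def filter_paf(lines, min_overlap):
    """Yield PAF lines whose overlap spans at least min_overlap bases.

    Assumption: overlap length = min(query span, target span), computed
    from the standard PAF start/end columns (0-based fields 2, 3, 7, 8).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record, skip it
        query_span = int(fields[3]) - int(fields[2])
        target_span = int(fields[8]) - int(fields[7])
        if min(query_span, target_span) >= min_overlap:
            yield line


def write_filtered_gz(paf_lines, gz_path, min_overlap=5000):
    """Write the filtered overlaps to gz_path, gzipped at level 1."""
    # compresslevel=1 mirrors "gzip -1": fastest, least compression.
    with gzip.open(gz_path, "wt", compresslevel=1) as out:
        for line in filter_paf(paf_lines, min_overlap):
            out.write(line)
```

For example, `write_filtered_gz(sys.stdin, "overlap.paf.gz", min_overlap=5000)` could sit at the end of the overlap pipeline in place of a plain redirect to overlap.paf (this needs `import sys`).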
asan-emirsaleh commented
Hello @HaploKit! VeChat is a great tool that helps a lot in resolving some complex cases, but it is expensive in both computation and disk space. Is support for gzipped overlap files being considered?