marbl/meryl

feature request: filter reads with kmers from bam file?

kevfengler227 opened this issue · 5 comments

Would it be possible to add a feature to run meryl-lookup exclude/include on a BAM instead of a FASTA, and output BAM? This would be very useful for filtering PacBio or ONT reads in their original BAM format without going through a FASTA intermediary. Failing that, could it at least output a list of the matching reads from the FASTA instead of generating the filtered FASTA?

Thanks,
KF
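
For anyone who needs the list-of-reads variant in the meantime, here is a minimal sketch in Python. It assumes the kmers of interest fit in memory as a plain set of strings and that the reads are in an uncompressed FASTA. The names `read_fasta` and `reads_with_kmers` are hypothetical helpers written for this example; the code does not read a meryl database and is not how meryl-lookup works internally.

```python
import io

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def reads_with_kmers(handle, kmers, k):
    """Return names of reads that contain at least one kmer on either strand."""
    # Store both orientations so a single forward scan of each read is enough.
    targets = set()
    for kmer in kmers:
        kmer = kmer.upper()
        targets.add(kmer)
        targets.add(reverse_complement(kmer))
    hits = []
    for name, seq in read_fasta(handle):
        seq = seq.upper()
        if any(seq[i:i + k] in targets for i in range(len(seq) - k + 1)):
            hits.append(name)
    return hits

# Toy input (made-up sequences, for illustration only).
fasta = io.StringIO(">read1 desc\nACGTACGT\n>read2\nTTTTTTTT\n>read3\nGGGACC\n")
print(reads_with_kmers(fasta, {"GTAC", "GGTC"}, k=4))  # ['read1', 'read3']
```

The resulting names could then be passed to any tool that selects BAM records by read name, so the original BAM never has to be converted.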

To that end, does meryl-lookup find homopolymer-compressed kmers in the reads when the database is built with compressed kmers?

It appears that it does not. For removal of long reads, this could be very beneficial.
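
To illustrate why this matters, here is a small Python sketch of homopolymer compression (collapsing each run of identical bases to one base). It is only an illustration with made-up sequences and hypothetical helper names (`hpc`, `kmers`), not meryl's implementation: kmers taken from a raw read generally will not match a database built from compressed sequence unless the read is compressed the same way first.

```python
from itertools import groupby

def hpc(seq):
    """Homopolymer-compress: collapse each run of identical bases to one base."""
    return "".join(base for base, _ in groupby(seq))

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

k = 4
# Stand-in for a database built from compressed sequence.
compressed_db = kmers(hpc("AACGGTTA"), k)   # hpc gives "ACGTA"

read = "ACCCGTTTAAG"
print(hpc(read))                            # ACGTAG

# Raw read kmers share nothing with the compressed database ...
print(kmers(read, k) & compressed_db)       # set()

# ... but compressing the read first recovers the matches.
print(sorted(kmers(hpc(read), k) & compressed_db))  # ['ACGT', 'CGTA']
```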

Both are excellent suggestions, and the tools are in dire need of a refresh. We'll (hopefully) get it done late winter/early spring.

BAM support shouldn't be too hard.

Compressed kmer support needed a bit more engineering effort than I wanted to put into the current version, but will definitely be in the next version.

Thanks! Even in its current form, meryl is a godsend for identifying unique kmers from a target sequence and removing reads that contain those kmers. But these two enhancements would be awesome.

I would still be very much interested in these two enhancements. Looking forward to the next version.