wdecoster/nanolyse

feature request - NanoLyse and save?

devonorourke opened this issue · 10 comments

Hey Wouter,

I have a couple of questions about running NanoLyse in a slightly different way...

I'd imagine that the intended purpose was as follows:

  1. Start with a fresh flow cell. Run through the 1D Lambda control experiment and load that library on the fresh flow cell for a few hours. If things look fine, stop the run, wash the flow cell, and get it ready for the next library.
  2. Using that same washed flow cell, construct a second library using your typical source material, and collect your data.
  3. Use NanoLyse to filter out the Lambda reads from your second library. Smile that other folks have generated very useful tools.

Now a slight modification: let's say I wanted to know how many Lambda reads were present in my run. Would there be a way to retain the reads that map to the Lambda reference? The goal wouldn't necessarily require generating a lambda.fastq.gz file; rather, it would simply be to count the reads that mapped to the Lambda reference, to know how much Lambda DNA was carried over on the washed flow cell. I'm interested in this feature because it would help me test how well the washing is working in general.

You might also imagine that, for folks doing barcoding experiments, it would be useful to know how these proportions change if they load a single barcoded library on a flow cell first, wash it, then load a second barcoded library on the same flow cell, and so on.

Hi devonorourke,

The intended purpose was mainly to filter out the ONT lambda DNA control fragment (DCS), but your application is also a nice use case. So you would like to get the number of reads that aligned to the lambda/contaminant genome? That would definitely be feasible to add.

Cheers,
Wouter

After filtering, the output is a fastq file and the lambda reads are simply discarded, so I don't think samtools will be very useful here. I can increment a count for every lambda read identified and report that back to the user. Is that okay, @devonorourke?

Yes, and bbsplit could be used to align to multiple genomes separately. Plenty of options.

But since the output of NanoLyse is just a fastq file and not a bam you can't use samtools for counting.

NanoLyse checks whether an alignment exists between the read and the lambda genome (or a user-specified genome); if no alignment is found, it writes the read to stdout. So there is no sam/bam stage, just a does-it-map check, and no alignments are written to a file.
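For readers following along, here is a minimal sketch of that does-it-map filtering pattern with a removal counter. It is not NanoLyse's actual code. The names `read_fastq`, `make_kmer_matcher` and `filter_reads` are hypothetical, the exact k-mer lookup is only a toy stand-in for a real aligner, and the wording of the log message is an assumption.

```python
import io
import sys
from typing import Callable, Iterator, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def make_kmer_matcher(reference: str, k: int = 12) -> Callable[[str], bool]:
    """Toy stand-in for an aligner: True if the read shares an exact k-mer with the reference."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}

    def maps(read: str) -> bool:
        return any(read[i:i + k] in ref_kmers for i in range(len(read) - k + 1))

    return maps


def filter_reads(fastq_in: TextIO, out: TextIO, log: TextIO,
                 maps_to_reference: Callable[[str], bool]) -> int:
    """Write reads that do NOT map to `out`; count the discarded ones and report to `log`."""
    removed = 0
    for header, seq, sep, qual in read_fastq(fastq_in):
        if maps_to_reference(seq):
            removed += 1  # read is discarded, only counted
        else:
            out.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
    # Hypothetical message format; the count goes to the log stream, not the read output.
    log.write(f"removed {removed} reads\n")
    return removed


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"  # made-up sequence, not lambda
    fastq = (
        "@read1\nGGGGAGGCTTAACCGGCCCC\n+\n" + "I" * 20 + "\n"
        "@read2\n" + "T" * 20 + "\n+\n" + "I" * 20 + "\n"
    )
    # Kept reads go to stdout, the count goes to stderr.
    filter_reads(io.StringIO(fastq), sys.stdout, sys.stderr,
                 make_kmer_matcher(reference, k=8))
```

In this toy input, read1 shares a stretch with the made-up reference and is dropped, while read2 passes through. A real tool would replace the k-mer lookup with an actual alignment step.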

I added a counter with the number of reads removed in v0.5.0 (available on PyPI). Please let me know if you have more feedback.

Great - any need to pass an additional argument to activate the counter, or is that operational by default?
Thanks!

It's enabled by default and prints to stderr, so it will not interfere with redirecting the output of NanoLyse to a file or pipe.
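To turn the reported count into the carry-over proportion discussed at the top of the thread, something like the sketch below could work. It assumes you note the removed-read count by hand from the stderr message and count the surviving reads in the filtered, uncompressed FASTQ. The helper names `count_fastq_records` and `carryover_fraction` are hypothetical, and the numbers are made up.

```python
import io
from typing import TextIO


def count_fastq_records(handle: TextIO) -> int:
    """Count records in an uncompressed, 4-line-per-record FASTQ stream."""
    lines = sum(1 for line in handle if line.strip())
    if lines % 4:
        raise ValueError("line count is not a multiple of 4; is this a FASTQ file?")
    return lines // 4


def carryover_fraction(removed: int, kept: int) -> float:
    """Fraction of all reads in the run that were removed as lambda."""
    total = removed + kept
    if total == 0:
        raise ValueError("no reads at all")
    return removed / total


if __name__ == "__main__":
    # Made-up filtered output with three surviving reads.
    filtered = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(3)))
    kept = count_fastq_records(filtered)
    removed = 1  # value read off the stderr message by hand (assumed)
    print(f"{carryover_fraction(removed, kept):.2%} of reads were lambda")
```

For a gzipped file, opening it with `gzip.open(path, "rt")` and passing the handle to `count_fastq_records` should work the same way.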