feature request - NanoLyse and save?

Question

devonorourke opened this issue 7 years ago · 10 comments

Hey Wouter,

I have a couple of questions about trying to run the NanoLyse program in a slightly different way...

I'd imagine that the intended purpose was as follows:

Start with a fresh flow cell. Run through the 1D Lambda control experiment and load that library on the fresh flow cell for a few hours. If things look fine, stop the run, wash the flow cell, and get it ready for the next library.
Using that same washed flow cell, construct a second library using your typical source material, and collect your data.
Use NanoLyse to extract out the Lambda reads from your second library. Smile that other folks have generated very useful tools.

Now a slight modification of what's going on: let's say I wanted to know how many Lambda reads were present in my run. Would there be a way to retain the reads that map to the Lambda reference? The goal wouldn't necessarily require generating a lambda.fastq.gz file; it would simply be to count the number of reads that mapped to the lambda reference, to know how much lambda DNA was carried over on the washed flow cell. I'm interested in this feature because it would help me test how well the washing is working in general.

You might also imagine that for folks doing barcoding experiments it would be useful to know how these proportions change if they load a single barcoded library on a flow cell, wash it, then load a second barcoded library on the same flow cell, etc.

Answer 1 · 2017-12-18T05:22:54.000Z

Hi devonorourke,

The intended purpose was mainly to filter out the ONT lambda DNA control fragment (DCS), but your application is also a nice use case. So you would like to get the number of reads that aligned to the lambda/contaminant genome? That would definitely be feasible to add.

Cheers,
Wouter

Answer 2 · 2017-12-18T07:40:30.000Z

A simple samtools view piped to wc will give this if you send the results to stdout :-)

Answer 3 · 2017-12-19T07:43:14.000Z

After filtering, the output is a fastq file, and the lambda reads are just discarded. So samtools won't be very useful here, I think. I can increment a count for every lambda read identified and report that back to the user; is that okay @devonorourke?

Answer 4 · 2017-12-19T08:22:11.000Z

Hi Wout,

When you map, you usually get records for all reads, including the unmapped ones, so you only need to filter the BAM with samtools view:

-f INT   only include reads with all of the FLAGs in INT present [0]
-F INT   only include reads with none of the FLAGs in INT present [0]

With the unmapped flag (4), -F 4 keeps the mapped reads and -f 4 keeps the unmapped reads.

My own spikefilter tool allows mapping and then either keeping or filtering the lambda reads, in two separate runs (https://github.com/Nucleomics-VIB/nanopore-tools).

Best,
Stephane
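
For anyone who does have SAM output from their own mapping step, here is a minimal Python sketch of what the flag filter above decides. It is not part of NanoLyse or spikefilter; the function name and the toy records are made up for illustration. It tests the unmapped bit (0x4) of the FLAG column and skips secondary and supplementary records so that each read is counted once.

```python
UNMAPPED = 0x4          # SAM FLAG bit: segment unmapped
SECONDARY = 0x100       # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM FLAG bit: supplementary alignment


def count_mapped(sam_lines):
    """Return (mapped, unmapped) read counts from SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        # Count each read once: ignore its extra alignment records.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Toy records (hypothetical data): two reads hit lambda, one is unmapped.
example = [
    "@SQ\tSN:lambda\tLN:48502",
    "read1\t0\tlambda\t100\t60\t4M\t*\t0\t0\tACGT\t!!!!",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!",
    "read3\t16\tlambda\t200\t60\t4M\t*\t0\t0\tACGT\t!!!!",
]
print(count_mapped(example))  # (2, 1)
```

On a real BAM, the mapped count corresponds roughly to counting the records kept by -F 4, with the same caveat about secondary and supplementary alignments.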

Answer 5 · 2017-12-19T08:27:21.000Z

Yes, and bbsplit (http://seqanswers.com/forums/showthread.php?t=41288) could be used to align to multiple genomes separately. Plenty of options.

But since the output of NanoLyse is just a fastq file and not a bam you can't use samtools for counting.

Answer 6 · 2017-12-19T08:30:15.000Z

Yes, but ;-) In my script I convert the BAM to fastq and apply the filter there. Do you not map to BAM first before creating the fastq, similarly?

S

Answer 7 · 2017-12-19T08:37:13.000Z

NanoLyse checks if an alignment exists between the read and the lambda genome (or user-specified genome) and if no alignments are found it writes the read to stdout. So there is no sam/bam stage - just does-it-map without writing to a file.
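
The does-it-map idea can be sketched roughly as below. This is not NanoLyse's actual code: the function names are invented, and toy_has_alignment is a hypothetical stand-in (a plain substring test) for a real aligner call. The point is only that each read is checked in memory and passed through or dropped, with no SAM/BAM file in between.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def keep_unaligned(records, has_alignment):
    """Yield only the records whose sequence has no alignment."""
    for record in records:
        if not has_alignment(record[1]):
            yield record


def toy_has_alignment(seq):
    """HYPOTHETICAL stand-in for an aligner: 'maps' if a motif is present."""
    return "GATTACA" in seq


fastq = io.StringIO(
    "@r1\nAAGATTACAAA\n+\n!!!!!!!!!!!\n"
    "@r2\nCCCCCCCC\n+\n!!!!!!!!\n"
)
kept = list(keep_unaligned(read_fastq(fastq), toy_has_alignment))
print([record[0] for record in kept])  # ['@r2']
```

Because both functions are generators, reads stream through one at a time, which is why nothing needs to be written to disk before the filtered fastq comes out.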

Answer 8 · 2017-12-21T12:20:21.000Z

I added a counter with the number of reads removed in v0.5.0 (available on PyPI). Please let me know if you have more feedback.

Answer 9 · 2017-12-21T12:31:11.000Z

Great - any need to pass an additional argument to activate the counter, or is that operational by default?
Thanks!

Answer 10 · 2017-12-21T12:38:32.000Z

It's enabled by default and will print to stderr (and as such not interfere with redirecting the output of NanoLyse to a file or pipe).
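
To illustrate why reporting on stderr keeps the filtered output clean, here is a small Python sketch under stated assumptions. The function name, the tab-separated output, and the message wording are all hypothetical; the exact text NanoLyse prints is not shown in this thread. Kept reads go to one stream and the count goes to another, so redirecting the first never captures the second.

```python
import io
import sys


def filter_and_report(records, is_contaminant, out=sys.stdout, err=sys.stderr):
    """Write kept reads to `out`, a removal count to `err`.

    Returns (removed, total) so a carry-over fraction can be computed.
    """
    removed = total = 0
    for name, seq in records:
        total += 1
        if is_contaminant(seq):
            removed += 1
        else:
            out.write(f"{name}\t{seq}\n")
    # Hypothetical message format, for illustration only.
    err.write(f"removed {removed} reads\n")
    return removed, total


# In-memory streams stand in for stdout and stderr in this demo.
out, err = io.StringIO(), io.StringIO()
reads = [("r1", "AAGATTACAAA"), ("r2", "CCCCCCCC"), ("r3", "GATTACA")]
removed, total = filter_and_report(reads, lambda s: "GATTACA" in s, out, err)

print(out.getvalue(), end="")   # only the kept read
print(err.getvalue(), end="")   # only the count message
print(removed / total)          # fraction of reads flagged as carry-over
```

Dividing the removed count by the total number of reads gives the carry-over proportion asked about in the original question.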