how did you determine these parameters?
Huanle opened this issue · 4 comments
Hi Ryan,
I was just wondering how you determined the values of these parameters:
initial_trim_size = 10
trim_increment = 25
stdev_threshold = 20
look_forward_windows = 5
window_count_threshold = 4
in trim_signal.py?
Based on experience?
If I am going to process some direct RNA sequencing data, do I need to change these values?
Thanks.
Huanle
Just trial and error - nothing too fancy. That whole function ideally wouldn't have to exist, but some reads start with too much open pore signal, so if I don't trim it off I can miss the barcode signal. Even so, it's a messy process, and I know that it doesn't get it right each time, so training sets can have some bogus data. I'm sure that function could be improved upon. That's why when classifying a read, Deepbinner scans multiple signal windows to look further into the read.
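For anyone reading along, here is a minimal sketch of how parameters like these could drive a trimming step. It is only one plausible reading of the parameter names, not the code in trim_signal.py: the function name `find_signal_start`, the use of a population standard deviation, and the return value of `None` when no start is found are all assumptions made for illustration.

```python
from statistics import pstdev


def find_signal_start(signal,
                      initial_trim_size=10,
                      trim_increment=25,
                      stdev_threshold=20,
                      look_forward_windows=5,
                      window_count_threshold=4):
    """Hypothetical sketch: guess where flat open-pore signal ends.

    Open-pore signal is assumed to be flat (low standard deviation), while
    real read signal is assumed to be noisy (high standard deviation).
    """
    # Always skip the first few samples.
    pos = initial_trim_size
    span = trim_increment * look_forward_windows

    # Step forward one window at a time while a full look-ahead still fits.
    while pos + span <= len(signal):
        stdevs = []
        for i in range(look_forward_windows):
            start = pos + i * trim_increment
            stdevs.append(pstdev(signal[start:start + trim_increment]))

        # Stop when the next window is noisy AND enough of the upcoming
        # windows are noisy too, so one stray spike does not end trimming.
        noisy = sum(1 for s in stdevs if s > stdev_threshold)
        if stdevs[0] > stdev_threshold and noisy >= window_count_threshold:
            return pos
        pos += trim_increment

    # No convincing start found (assumed behaviour for this sketch).
    return None


# Toy example: 310 flat samples followed by a signal alternating 400/500.
toy = [800.0] * 310 + [400.0 if i % 2 == 0 else 500.0 for i in range(500)]
start = find_signal_start(toy)
trimmed = toy[start:] if start is not None else toy
```

In this reading, `stdev_threshold` depends on the scale of the raw signal, so it is the value most likely to need revisiting for a different chemistry such as direct RNA, while the window sizes mostly trade off trimming resolution against robustness to spikes.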
As a side note, are you doing barcoding with direct RNA sequencing? I see on the kit page on ONT's site it says 'Barcoding kits in development'. Are they available?
Ryan
Thanks, Ryan.
Yes. I am doing barcoding with direct RNA sequencing with the kit you pointed to.
Huanle
Okay, interesting. Something to consider: when we do barcoding with the 1D ligation kits (whole-genome DNA), I see a small fraction of reads (maybe about 1% or so) that seem to have the wrong barcode. My hypothesis is that there are unligated barcode sequences left over, and when the samples are pooled and the adapter is ligated on, some of these barcodes get ligated onto the wrong sample's DNA.
This 'barcode switching' at 1% is probably not a problem for WGS, but could maybe be an issue for transcriptomes. Remember the kerfuffle caused by Illumina barcode issues? Again, I don't know if this will happen in your data - just be wary of it as a possibility.
Ryan