comprna/RATTLE

Stranding Information Format for --rna Clustering

danphillips28 opened this issue · 1 comments

Dear Rattle developers,

I have been very excited to learn RATTLE and have enjoyed experimenting with it. However, when I first got clustering to work on my relatively large file of cDNA reads (16 GB, >10M reads), I noticed it was taking a very long time (>10 weeks). Benchmarking with files of up to 1M reads, by contrast, ran without issue and in decent time.

I have reoriented my reads using PYCHOPPER so that I can run clustering in --rna mode, which saves RATTLE from checking both strands and should make things quicker. However, after adding the stranding information, the file that previously ran through clustering now returns the classic error:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Note that this happens immediately after RATTLE has read in the file, and that I have filtered out reads shorter than 150 bp. The input file and the code are otherwise the same; the only difference is the addition of either strand=+ or strand=- to each header. I was hoping you could take a look at the formatting of my input file and let me know if anything stands out. Below is a read from my file, with hidden characters shown ($ marks the end of each line).

@115:753|8dba43f8-5a3c-4e42-85d3-9df6cd15dd90 runid=56ac9bfdb573abcf8f7248525869f8463f05c360 sampleid=Morex_PolyA_190619 read=230084 ch=231 start_time=2019-06-20T04:17:59Z strand=-$
GGGGCTTTTTCTTCAGACAGACATAGCTTGAGAGAGAGAGCGAGAGCGCGAGCTTAGAGGTAATATTTTCTGGCAGCCTCCATGATCACCTCGCCGATTGTGGCGCTGACGAGCCTGCTCGTCCCTCTCCCGCCGGGCCTCCTTCGCCGTCGTCTGCAGCGGTGGCGGCGAAGATCAAGGTCGACAAGCCCCTCGGACTCGAGGCGGCTTGACCGTCGACATCGACGCCAACGGCAGGGAAGGTCGGCAAGAAGGGTGTCTACCAGTTTGTTGACAAGTACGGCGCCAACGTCGATACGCAGCCCAATCTACACGCCAGAGGAATGGTCCGAATCTGGTGACCGCTACGCCGGTGGAACGACCGGGCTTCTTATCTGGGCCGTCACCCTCGCCGGCCTGCCTGGCAGCGGCGCCCTCCTCGTCTACAACACCAGCGCTTCCGCCGGCTAAGAAAGCATCTACCTGTAACACGGGGCCGAGTTTCTACTCTGTAATCTACGTAGCTACCATGTGTATGTATGTCACCTAACGATGCAAAGTAATTCATCCTGAACAGATGGTTTGTATGCAACGCTGTTACTATACACTTCCTACCTGGTAAATGTATAATCGGTGGACATTAAAAAAAAAAAAAAAAA$
+$
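
In case it helps narrow things down, here is a minimal Python sketch of the structural check I could run over the file before clustering. It assumes a plain four-line-per-record FASTQ and that the strand tag is a whitespace-separated `strand=+` or `strand=-` field in the header. Both are my assumptions about the expected input, not something taken from the RATTLE source, and `check_fastq` is just a hypothetical helper name.

```python
import re
from itertools import zip_longest

# Assumption: the strand tag is a whitespace-delimited field in the header line.
STRAND_TAG = re.compile(r"(?:^|\s)strand=([+-])(?:\s|$)")


def check_fastq(lines):
    """Return a list of (record_number, problem) for a 4-line-per-record FASTQ.

    `lines` is any iterable of text lines, e.g. an open file handle, so the
    file is streamed rather than loaded into memory.
    """
    it = iter(lines)
    problems = []
    # Pulling from the same iterator four times groups the lines into records.
    for rec, chunk in enumerate(zip_longest(it, it, it, it), start=1):
        if None in chunk:
            problems.append((rec, "truncated record (fewer than 4 lines)"))
            break
        if any("\r" in s for s in chunk):
            problems.append((rec, "carriage return found (Windows line endings?)"))
        header, seq, sep, qual = (s.rstrip("\r\n") for s in chunk)
        if not header.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((rec, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (rec, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
        if not STRAND_TAG.search(header):
            problems.append((rec, "no strand=+ or strand=- tag in header"))
    return problems
```

Usage would be something like `with open("reads.fq") as fh: print(check_fastq(fh)[:10])`, where `reads.fq` is a placeholder file name. An empty list only means the records are structurally consistent under the assumptions above; it does not show that RATTLE will accept the file.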

Please let me know if you have any thoughts.
Thanks,
Dan