filter strand invasion
Lixinyoung opened this issue · 4 comments
Hello!
This is a very useful tool. While reading your code, I noticed that the first step, which filters strand invasion in the function _getreads, should take strandedness into account, just as your second step (the SCAFE method) for removing strand invasion does. Something like the following:
import editdistance
strand = self.generefdf.loc[geneid]['Strand']
chrom = 'chr' + str(self.generefdf.loc[geneid]['Chromosome'])
# Drop reads whose 5'-flanking genomic sequence is within 3 edits of the invasion motif (reverse complement on '-'); upper() guards against soft-masked FASTA
if strand == '+':
    reads1_umi = [r for r in reads1_umi if editdistance.eval(fastqFile.fetch(region=chrom, start=r.reference_start-14, end=r.reference_start-1).upper(), 'TTTCTTATATGGG') > 3]
elif strand == '-':
    reads1_umi = [r for r in reads1_umi if editdistance.eval(fastqFile.fetch(region=chrom, start=r.reference_end, end=r.reference_end+13).upper(), 'CCCATATAAGAAA') > 3]
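To make the strand logic easier to check in isolation, here is a minimal, self-contained sketch that uses neither pysam nor editdistance. It assumes the genome is a plain dict mapping chromosome names to 0-based sequence strings and substitutes a hand-written Levenshtein distance for editdistance.eval. The names is_strand_invasion, reverse_complement and levenshtein are hypothetical and not part of CamoTSS; the window coordinates and the threshold of 3 are copied from the snippet above.

```python
# Hypothetical, self-contained sketch of a strand-aware strand-invasion check.
# Assumptions: genome is a dict of chromosome -> sequence string (0-based),
# and a plain Levenshtein distance stands in for editdistance.eval.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MOTIF = "TTTCTTATATGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def is_strand_invasion(genome, chrom, read_start, read_end, strand,
                       motif=MOTIF, max_dist=3):
    """Return True if the genomic flank at the read's 5' end resembles the motif.

    '+': the len(motif) bases ending 1 bp before read_start.
    '-': the len(motif) bases starting at read_end (exclusive end coordinate),
         compared with the reverse complement of the motif.
    """
    n = len(motif)
    seq = genome[chrom]
    if strand == "+":
        if read_start - n - 1 < 0:
            return False  # not enough upstream flank to test
        window = seq[read_start - n - 1:read_start - 1]
        target = motif
    elif strand == "-":
        window = seq[read_end:read_end + n]
        target = reverse_complement(motif)
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    # upper() in case the reference is soft-masked (lowercase repeats)
    return levenshtein(window.upper(), target) <= max_dist
```

A read would be kept when is_strand_invasion(...) returns False, which matches the `> 3` condition in the list comprehensions. The essential point is that on the '-' strand the window downstream of reference_end is compared with the reverse complement of the motif, not the motif itself.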
By the way, how can I use the sequence features (convolutional model) to filter false-positive clusters? I couldn't find the corresponding parameters.
Yes, thank you very much for pointing out this error! I agree with you.
For convenience, we did not build the sequence features into CamoTSS, because the existing feature model already performs well. However, we do have a CNN model pretrained on the PBMC dataset, and I will submit it to the model folder later.
Thanks!
Ruiyan
Thanks for the prompt reply. I understand :)
Hi Xinyoung,
I have submitted the pretrained sequence model to the CamoTSS/model folder, and I have also added code that applies this pretrained model to test data in notebook/use_pretrained_sequence.py.
Best regards,
Ruiyan