filter strand invasion

Question


Lixinyoung opened this issue 10 months ago · 4 comments

Hello!
This is a very important tool. While reading your code, I noticed that the first strand-invasion filtering step in function_getreads should take strandedness into account, just as your second step (the SCAFE method) does. For example:

strand = self.generefdf.loc[geneid]['Strand']
chrom = 'chr' + str(self.generefdf.loc[geneid]['Chromosome'])
if strand == '+':
    # '+' strand: drop reads whose upstream genomic flank resembles the TSO tail (edit distance <= 3)
    reads1_umi = [r for r in reads1_umi if editdistance.eval(fastqFile.fetch(reference=chrom, start=r.reference_start-14, end=r.reference_start-1), 'TTTCTTATATGGG') > 3]
elif strand == '-':
    # '-' strand: compare the flank after the alignment end with the reverse complement of the TSO tail
    reads1_umi = [r for r in reads1_umi if editdistance.eval(fastqFile.fetch(reference=chrom, start=r.reference_end, end=r.reference_end+13), 'CCCATATAAGAAA') > 3]
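The strand-aware check above can be sketched in plain Python, without pysam or editdistance, to make the window logic easy to test. This is a minimal sketch under stated assumptions: the chromosome is a plain string, read coordinates are 0-based and end-exclusive (like pysam's `reference_start`/`reference_end`), the window offsets are copied from the snippet above, and a hand-written Levenshtein distance stands in for `editdistance.eval`. The helper names (`flank`, `passes_filter`, `levenshtein`) are hypothetical and not part of CamoTSS.

```python
# Hypothetical sketch of a strand-aware strand-invasion filter (not CamoTSS code).
TSO_TAIL = "TTTCTTATATGGG"  # TSO tail sequence used in the snippet above ('+' strand)

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def flank(chrom_seq: str, read_start: int, read_end: int, strand: str) -> str:
    """Genomic window next to the read's 5' end, using the snippet's offsets."""
    if strand == "+":
        lo, hi = read_start - 14, read_start - 1
    elif strand == "-":
        lo, hi = read_end, read_end + 13
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    # Clamp so reads near the chromosome start don't wrap around via negative slicing.
    return chrom_seq[max(lo, 0):max(hi, 0)]


def passes_filter(chrom_seq: str, read_start: int, read_end: int,
                  strand: str, max_dist: int = 3) -> bool:
    """Keep the read only if its flank is NOT within max_dist edits of the TSO tail."""
    target = TSO_TAIL if strand == "+" else revcomp(TSO_TAIL)
    window = flank(chrom_seq, read_start, read_end, strand)
    return levenshtein(window, target) > max_dist


if __name__ == "__main__":
    # '+' strand: TSO tail at [10, 23), one spacer base, read starts at 24 -> filtered out.
    plus_seq = "ACGTACGTAC" + TSO_TAIL + "C" + "GATTACAGATTACA"
    print(passes_filter(plus_seq, 24, 38, "+"))
    # '-' strand: read at [0, 14), reverse-complemented tail right after it -> filtered out.
    minus_seq = "GATTACAGATTACA" + revcomp(TSO_TAIL) + "TTTT"
    print(passes_filter(minus_seq, 0, 14, "-"))
```

Keeping the offsets in one `flank` helper makes the asymmetry between the two strands explicit, so any change to the window only has to be made in one place.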

Answer 1 · 2023-12-23T14:20:56.000Z

By the way, how can I use the sequence features (convolution model) to filter false-positive clusters? I couldn't find the corresponding parameters.

Answer 2 · 2023-12-24T08:45:32.000Z

Yes, thank you very much for pointing out this error! I agree with you.
For convenience, we did not embed the sequence features into CamoTSS, because this feature model already performs well. However, we do have a CNN model pretrained on the PBMC dataset, and I will submit it to the model folder later.
Thanks!
Ruiyan

Answer 3 · 2023-12-24T09:04:37.000Z

Thanks for the prompt reply. I understand :)

Answer 4 · 2023-12-24T09:14:32.000Z

Hi xinyoung,

I have submitted the pretrained sequence model to the CamoTSS/model folder and added code that applies this pretrained model to test data in notebook/use_pretrained_sequence.py.

Best regards,
Ruiyan