xjtu-omics/msisensor-pro

pro scan context_length = 9 failed, shorter `-c` induces systematic errors

ruolin opened this issue · 4 comments

Hello, I am using msisensor-pro to scan homopolymer sites. When I increase the context length from 5 (the default) to 9, the homopolymer A results are dropped from *_all. With a context length of 8, however, everything is fine. I raised it to 9 because using a short context length has an intrinsic problem. If you are interested, we can discuss that problem, but it is a separate topic.
The commands I use (msisensor-pro version 1.2.0):

msisensor-pro scan  -d $HG19 -o hg19_hp.tsv -l 8 -c 9 -p 1
msisensor-pro pro -d hg19_hp.tsv -t $BAM -c 1 -x 1 -b 4 -o regular -e $BED  -i 0.1 -l 5

This is the problem with using a short context to scan for a pattern, in this case a 5-mer (the default). When the same context occurs more than once in a read, it no longer identifies a unique position in that read.
[Screenshot attached: Screen Shot 2021-04-21 at 1 14 07 PM]
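To make the ambiguity concrete, here is a minimal sketch that counts how often a context occurs in a read. The function name `count_occurrences` and the example read are made up for illustration; this is not how msisensor-pro itself matches flanks.

```cpp
#include <cstddef>
#include <string>

// Count (possibly overlapping) occurrences of `context` in `read`.
// A count above 1 means the context alone cannot pinpoint one position.
inline std::size_t count_occurrences(const std::string& read,
                                     const std::string& context) {
    if (context.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = read.find(context);
    while (pos != std::string::npos) {
        ++count;
        pos = read.find(context, pos + 1);  // step by 1 to allow overlaps
    }
    return count;
}
```

For the made-up read `ACGTTGCAGAAAAAAAACCTGATTGCAGAAAAACC`, the 5-mer `TGCAG` occurs twice (once before each A run), while the 9-mer `ACGTTGCAG` occurs only once, so the longer context identifies a single homopolymer.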

I just checked the code. It seems that the context length cannot be larger than 8, since you use bit16_t to store the context (presumably at 2 bits per base, so 16 bits hold at most 8 bases).

bit16_t flankH = 0;
bit16_t flankT = 0;
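A minimal sketch of why 16 bits would cap the context at 8 bases, assuming a 2-bit-per-base encoding. The `bit16_t` alias, the A/C/G/T code assignment, and the helpers `encode_base` and `pack_context` are hypothetical stand-ins, not the actual msisensor-pro implementation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the bit16_t type in the snippet above.
using bit16_t = std::uint16_t;

// Assumed encoding: A=0, C=1, G=2, T=3 (2 bits per base).
inline unsigned encode_base(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'; any other character is treated as T here
    }
}

// Pack a context by shifting in 2 bits per base.
// Only 16 bits are kept, so with more than 8 bases the earliest
// bases are shifted out and silently lost.
inline bit16_t pack_context(const std::string& context) {
    bit16_t packed = 0;
    for (char base : context) {
        packed = static_cast<bit16_t>((packed << 2) | encode_base(base));
    }
    return packed;
}
```

Under these assumptions, `pack_context("CACGTACGT")` and `pack_context("GACGTACGT")` give the same value as `pack_context("ACGTACGT")`: the ninth (leftmost) base does not fit. Supporting longer contexts would need a wider integer type, for example 32 bits for up to 16 bases.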

Thanks, I will update this in the next version, and you are welcome to open a pull request!

The scan model currently has difficulty handling this kind of complex region in the genome, where a short context occurs more than once in a read, but this has little effect on MSI detection. If you have some ideas, I would be very glad to discuss them with you!