specificity values > 1 for certain kmers
vineetbansal opened this issue · 1 comments
vineetbansal commented
When using the following kmer file:
id,sequence,pam,chromosome,position,sense
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCT,NGG,unknown,0,+
with guidescan enumerate (version v2.1.6) against hg38_noalt.index (mismatches 3 and alt-pam NAG), we get the following sam line:
AAGACTGTGCGCTAATCTCT_1 0 unknown 0 100 23M * 0 0 AAGACTGTGCGCTAATCTCTNGG * k0:i:1 k1:i:0 k2:i:0 k3:i:3 of:H:c53bf70d00000000000000000000000092ef3a47ffffffff010000000000000092ef3a47ffffffff020000000000000092ef3a47ffffffff9ba7545c000000009c2c699e000000002afe3c2000000000030000000000000092ef3a47ffffffff sp:f:2.391802
or the following csv lines (succinct mode):
id,sequence,match_chrm,match_position,match_strand,match_distance,specificity
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr1,234306480,+,0,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr9,12562870,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr19,3281519,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr3,49718166,+,3,2.391802
There's clearly something wrong here, since the reported specificity (2.391802) is greater than 1.
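For anyone who wants to scan their own output for the same symptom, here is a minimal sketch in Python that flags rows of the succinct csv whose specificity exceeds 1. The helper name `rows_with_invalid_specificity` and the upper bound of 1.0 are my own assumptions for illustration, not part of guidescan; the sample rows are the ones pasted above.

```python
import csv
import io

# Succinct-mode csv output copied from this issue.
SAMPLE = """id,sequence,match_chrm,match_position,match_strand,match_distance,specificity
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr1,234306480,+,0,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr9,12562870,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr19,3281519,+,3,2.391802
AAGACTGTGCGCTAATCTCT_1,AAGACTGTGCGCTAATCTCTNGG,chr3,49718166,+,3,2.391802
"""


def rows_with_invalid_specificity(csv_text, upper=1.0):
    """Return the csv rows whose specificity is above `upper`.

    `upper` defaults to 1.0 on the assumption (made in this issue) that
    a valid specificity never exceeds 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["specificity"]) > upper]


bad = rows_with_invalid_specificity(SAMPLE)
for row in bad:
    print(row["id"], row["match_chrm"], row["match_position"], row["specificity"])
```

To run it on a real file, read the file contents into a string and pass that in place of `SAMPLE`.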
vineetbansal commented
The actual sequence found in the fna using grep is AAGACTGTGCGCTAATCTCTTAG (i.e. with the alt-pam), indicating that the match reported in both the csv/sam cases is incorrect (the NGG was automatically added). All such detected cases of specificity > 1 seem to be with matches that have the NAG PAM.
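The PAM comparison described above can be sketched as a small Python check. This is only an illustration of the observation, not guidescan's matching logic: the function name `matches_pam` is hypothetical, and it assumes `N` is the only wildcard that appears in the PAM pattern.

```python
def matches_pam(site, pam):
    """Return True if the last len(pam) bases of `site` match `pam`.

    Assumption: `N` in the pattern matches any base; every other
    character must match exactly (case-insensitive).
    """
    tail = site[-len(pam):].upper()
    if len(tail) != len(pam):
        return False
    return all(p == "N" or p == b for p, b in zip(pam.upper(), tail))


# Sequence found in the fna with grep, as reported in this issue.
genomic = "AAGACTGTGCGCTAATCTCTTAG"

print(matches_pam(genomic, "NGG"))  # False: the site ends in TAG
print(matches_pam(genomic, "NAG"))  # True: it matches the alt-pam
```

So the genomic site only satisfies the NAG pattern, while the reported match string carries NGG.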