Unexpected PAM truncation
RussBainer opened this issue · 2 comments
Hi JP and team, I'm trying to make a new CrisprNuclease object based on an enzyme that has been shown to have a more permissive PAM sequence, which I initially tried to encode by specifying more PAMs and weights. When I did this, I found that the PAMs appear to be internally capped at 4:
> pams
[1] "(3/3)ACC" "(3/3)CCC" "(3/3)TCC" "(3/3)GCC" "(3/3)ACA" "(3/3)CCA" "(3/3)TCA" "(3/3)GCA" "(3/3)ACG" "(3/3)CCG" "(3/3)TCG" "(3/3)GCG"
[13] "(3/3)ACT" "(3/3)CCT" "(3/3)TCT" "(3/3)GCT"
> pw
[1] 0.40 0.40 0.40 0.40 0.43 0.43 0.43 0.43 0.32 0.32 0.32 0.32 0.30 0.30 0.30 0.30
>
> eNme2c <- CrisprNuclease("eNme2c",
+ targetType="DNA",
+ pams=pams,
+ weights=pw,
+ metadata=list(description="eNme2c nuclease, Cas9 variant from Neisseria meningitidis"),
+ pam_side="3prime",
+ spacer_length=20)
>
> pams(eNme2c)
DNAStringSet object of length 4:
width seq names
[1] 3 ACA ACA
[2] 3 CCA CCA
[3] 3 TCA TCA
[4] 3 GCA GCA
This does not happen when I make a simple Nuclease object, but it is introduced when I turn that into a CrisprNuclease:
> flarg <- Nuclease('Flarg', 'DNA', motifs = pams, weights = pw)
> motifs(flarg)
DNAStringSet object of length 16:
width seq
[1] 3 ACC
[2] 3 CCC
[3] 3 TCC
[4] 3 GCC
[5] 3 ACA
... ... ...
[12] 3 GCG
[13] 3 ACT
[14] 3 CCT
[15] 3 TCT
[16] 3 GCT
> flarg.cn <- new("CrisprNuclease", flarg, pam_side="3prime", spacer_length = as.integer(20))
> pams(flarg.cn)
DNAStringSet object of length 4:
width seq names
[1] 3 ACA ACA
[2] 3 CCA CCA
[3] 3 TCA TCA
[4] 3 GCA GCA
I personally have a workaround for this use case, but I thought I would raise it in case this isn't the functionality you want.
Thanks again for this awesome toolset!
@RussBainer Try with primary=FALSE
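For readers following along, here is a minimal sketch of that suggestion. It rebuilds the object from the original report and calls pams() with and without primary=FALSE. It assumes crisprBase is installed; the statement that the second call lists all 16 PAMs is an expectation based on this thread, not verified output.

```r
library(crisprBase)

# PAMs and weights reconstructed from the report above
# (the "(3/3)" prefix is kept exactly as it was printed there).
bases <- c("A", "C", "T", "G")
pams <- paste0("(3/3)", c(paste0(bases, "CC"), paste0(bases, "CA"),
                          paste0(bases, "CG"), paste0(bases, "CT")))
pw <- rep(c(0.40, 0.43, 0.32, 0.30), each = 4)

eNme2c <- CrisprNuclease("eNme2c",
                         targetType = "DNA",
                         pams = pams,
                         weights = pw,
                         metadata = list(description = "eNme2c nuclease, Cas9 variant from Neisseria meningitidis"),
                         pam_side = "3prime",
                         spacer_length = 20)

# Default call: only the primary PAMs (the four NCA PAMs in the report).
pams(eNme2c)

# With primary=FALSE, the secondary PAMs should be listed as well.
pams(eNme2c, primary = FALSE)
```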
@Jfortin1 thanks for the pointer and sorry to be slow responding. After your tip I understand the tooling better and realize that the objects are working as intended. Thanks!
In case others reach this page: the pams() function returns only the most likely PAM sequences by default (pass primary=FALSE to get the rest), and secondary sequences are not included in the findSpacers() call unless the canonical=FALSE flag is set, which you can discover if you carefully RTFM :-). These defaults make sense to me, but they led to confusion when I was designing sequences for a nuclease with multiple high-probability PAMs.
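The four PAMs returned by default are consistent with a simple rule: keep only the PAMs whose weight equals the maximum weight. The base R sketch below illustrates that rule on the weights from the report. It is an assumption inferred from the output in this thread, not a copy of the package's internals, and the variable names are illustrative.

```r
# PAMs and weights as given in the original report.
bases <- c("A", "C", "T", "G")
pams <- c(paste0(bases, "CC"), paste0(bases, "CA"),
          paste0(bases, "CG"), paste0(bases, "CT"))
pw <- rep(c(0.40, 0.43, 0.32, 0.30), each = 4)

# Assumed rule: "primary" PAMs are those sharing the highest weight.
primary <- pams[pw == max(pw)]
secondary <- setdiff(pams, primary)

print(primary)            # the four NCA PAMs, weight 0.43
print(length(secondary))  # the remaining 12 PAMs
```

Under this rule, weights of 0.40 and 0.43 are treated very differently even though they are close, which is why a nuclease with several high-probability PAMs can look "truncated" at first glance.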
Thanks as always for the toolset!