Unable to parse protein fasta file.
srynobio opened this issue · 2 comments
srynobio commented
I'm guessing your original design was focused on DNA sequences. Do you plan on supporting amino-acid-encoded FASTA files?
panic: error when parsing seq: NP_000572.2 (seq: invalid Protein letter: U)
goroutine 1 [running]:
main.digestFasta(0x7ffeefbffb17, 0x2b, 0x18)
/Users/srynobio1/go/src/github.com/srynobio/vmccl/vmccl.go:80 +0x38c
main.main()
/Users/srynobio1/go/src/github.com/srynobio/vmccl/vmccl.go:28 +0x531
shenwei356 commented
U seems to be a new ambiguity code? Oh, I didn't see it here.
Here is the alphabet this package uses.
Can you paste the protein sequence here?
1 mcaarlaaaa aaaqsvyafs arplaggepv slgslrgkvl lienvaslUg ttvrdytqmn
61 elqrrlgprg lvvlgfpcnq fghqenakne eilnslkyvr pgggfepnfm lfekcevnga
121 gahplfaflr ealpapsdda talmtdpkli twspvcrndv awnfekflvg pdgvplrrys
181 rrfqtidiep dieallsqgp sca
shenwei356 commented
I see, it's selenocysteine (U). I have also added pyrrolysine (O) to the protein alphabet.
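
For anyone who wants to check a sequence before handing it to a parser, here is a minimal standard-library sketch of an alphabet check that accepts U and O. It is not this package's implementation: the `proteinLetters` set and the `validateProtein` helper are hypothetical names chosen for illustration, and the exact letters the package accepts may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// proteinLetters is an assumed alphabet for illustration only:
// the 20 standard amino acids, the ambiguity codes B, J, Z and X,
// selenocysteine (U), pyrrolysine (O), plus '*' (stop) and '-' (gap).
const proteinLetters = "ACDEFGHIKLMNPQRSTVWY" + "BJZX" + "UO" + "*-"

// validateProtein returns an error naming the first letter that is not in
// proteinLetters. The check is case-insensitive; the reported position is a
// 1-based byte offset into the sequence.
func validateProtein(seq string) error {
	for i, r := range strings.ToUpper(seq) {
		if !strings.ContainsRune(proteinLetters, r) {
			return fmt.Errorf("invalid protein letter %q at position %d", r, i+1)
		}
	}
	return nil
}

func main() {
	// Fragment of the sequence pasted above, with spaces removed.
	// It contains the selenocysteine residue (U).
	seq := "lienvaslUgttvrdytqmn"
	if err := validateProtein(seq); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ok")
}
```

Returning an error instead of panicking lets the caller decide whether to skip the record or stop the run.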