shenwei356/bio

Unable to parse protein fasta file.

srynobio opened this issue · 2 comments

I'm guessing your original design was focused on DNA sequences. Do you plan on supporting amino-acid-encoded FASTA files?

panic: error when parsing seq: NP_000572.2 (seq: invalid Protein letter: U)

goroutine 1 [running]:
main.digestFasta(0x7ffeefbffb17, 0x2b, 0x18)
	/Users/srynobio1/go/src/github.com/srynobio/vmccl/vmccl.go:80 +0x38c
main.main()
	/Users/srynobio1/go/src/github.com/srynobio/vmccl/vmccl.go:28 +0x531

Is U a new ambiguity code? I didn't see it listed here.

Here is the alphabet this package uses.

Can you paste the protein sequence here?

        1 mcaarlaaaa aaaqsvyafs arplaggepv slgslrgkvl lienvaslUg ttvrdytqmn
       61 elqrrlgprg lvvlgfpcnq fghqenakne eilnslkyvr pgggfepnfm lfekcevnga
      121 gahplfaflr ealpapsdda talmtdpkli twspvcrndv awnfekflvg pdgvplrrys
      181 rrfqtidiep dieallsqgp sca

I see, it's selenocysteine (U). I have added it, along with pyrrolysine (O), to the protein alphabet.
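For anyone hitting the same error, here is a rough, self-contained sketch of the kind of alphabet check that produces "invalid Protein letter", with U and O included. The `proteinLetters` constant and the `firstInvalidLetter` helper are hypothetical names made up for illustration; they are not this package's API, and the letter set is an assumption rather than a copy of the package's actual alphabet.

```go
package main

import (
	"fmt"
	"unicode"
)

// proteinLetters is a hypothetical alphabet for illustration only; it is not
// copied from the package. It holds the 20 standard amino acids, the
// ambiguity codes B, J, Z and X, selenocysteine (U), pyrrolysine (O),
// plus '*' (stop) and '-' (gap).
const proteinLetters = "ACDEFGHIKLMNPQRSTVWYBJZXUO*-"

// firstInvalidLetter returns the first rune of s that is not in
// proteinLetters (compared case-insensitively). ok is true when every
// letter is valid, in which case bad is 0.
func firstInvalidLetter(s string) (bad rune, ok bool) {
	valid := make(map[rune]bool, len(proteinLetters))
	for _, r := range proteinLetters {
		valid[r] = true
	}
	for _, r := range s {
		if !valid[unicode.ToUpper(r)] {
			return r, false
		}
	}
	return 0, true
}

func main() {
	// Fragment of the sequence from this issue, containing selenocysteine (U).
	if bad, ok := firstInvalidLetter("lienvaslUgttvrdytqmn"); ok {
		fmt.Println("valid protein sequence")
	} else {
		fmt.Printf("invalid Protein letter: %c\n", bad)
	}
}
```

Dropping U from `proteinLetters` in this sketch makes the same fragment fail on that letter, which mirrors the error reported at the top of this issue.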