Couldn't import a protein sequence and display it correctly

Question

mega-bisharp opened this issue a year ago · 7 comments

My protein sequence from NCBI:

AXN76052.1
MRCMSELVVFKANELAISRYDLTEHETKLILCCVALLNPTIENPTRKERTVSFTYNQYAQMMNISRENAYGVLAKATRELMTRTVEIRNPLVKGFEIFQWTNYAKFSSEKLELVFSEEILPYLFQLKKFIKYNLEHVKSFENKYSMRIYEWLLKELTQKKTHKANIEISLDEFKFMLMLENNYHEFKRLNQWVLKPISKDLNTYSNMKLVVDKRGRPTDTLIFQVELDRQMDLVTELENNQIKMNGDKIPTTITSDSHLHNGLRKTLHDALTAKIQLTSFEAKFLSDMQSKYDLNGSFSWLTQKQRTTLENILAKYGRI
result:

problem:

I think the sequence type here should be "PROTEIN", not "DNA".

@tnrich

Answer 1 · 2023-08-22T15:19:08.000Z

@mega-bisharp can you please attach the file as a ZIP file here? Thanks!

Answer 2 · 2023-08-23T01:12:15.000Z

> @mega-bisharp can you please attach the file as a ZIP file here? Thanks!

Here are my protein sequences. Thank you for your reply!
seqdump.zip
@tnrich

Answer 3 · 2023-08-23T19:54:53.000Z

@mega-bisharp that file doesn't have a file extension that would indicate that it is a protein. We would need to guess based on the sequence content, which can sometimes be risky.
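To illustrate why guessing from content is risky, here is a minimal Python sketch of one possible heuristic. It is not the code OVE or bio-parsers uses; `guess_sequence_type` and `NUCLEOTIDE_LETTERS` are hypothetical names made up for this example. The idea is that any letter outside the IUPAC nucleotide alphabet implies protein, but a peptide that happens to use only nucleotide letters gets misread as DNA.

```python
# IUPAC nucleotide codes: A, C, G, T, U plus the ambiguity letters.
NUCLEOTIDE_LETTERS = set("ACGTUNRYSWKMBDHV")


def guess_sequence_type(sequence: str) -> str:
    """Guess "DNA" or "PROTEIN" from the letters alone (hypothetical helper)."""
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    # Any letter outside the nucleotide alphabet (e.g. E, F, L, Q) implies protein.
    if letters - NUCLEOTIDE_LETTERS:
        return "PROTEIN"
    # Only nucleotide letters were seen, so DNA is the best guess,
    # even though all of these letters are also valid amino acid codes.
    return "DNA"


print(guess_sequence_type("MRCMSELVVFKANELAISRY"))  # start of AXN76052.1
print(guess_sequence_type("ATGCGTTGCATGAGC"))
print(guess_sequence_type("GATTACA"))  # ambiguous: valid as DNA and as a peptide
```

The last call returns "DNA" even if the input was meant as a short peptide, which is the kind of wrong guess an explicit file extension avoids.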

Answer 4 · 2023-08-23T19:55:25.000Z

Also, this is the new repo that OVE lives in: https://github.com/TeselaGen/tg-oss

Answer 5 · 2023-08-24T01:07:28.000Z

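> @mega-bisharp that file doesn't have a file extension that would indicate that it is a protein. We would need to guess based on the sequence content, which can sometimes be risky.

So, what is the correct file extension for a protein sequence? Thank you for your answer!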
@tnrich

Answer 6 · 2023-08-28T18:47:28.000Z

@mega-bisharp I believe the format you're looking for is .faa, since your data is in the FASTA format.
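For reference, a FASTA record is a header line starting with `>` followed by the sequence, usually wrapped to a fixed width. The Python sketch below formats such a record; `format_fasta_record` is a hypothetical helper written for this thread, and the sequence shown is only the first 40 residues of AXN76052.1 from the question.

```python
def format_fasta_record(identifier: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    # Split the sequence into chunks of at most `width` characters.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{identifier}"] + chunks) + "\n"


# First 40 residues of the sequence from the question, wrapped at 20 per line.
record = format_fasta_record(
    "AXN76052.1", "MRCMSELVVFKANELAISRYDLTEHETKLILCCVALLNPT", width=20
)
print(record, end="")
```

Saving this text to a file whose name ends in `.faa` (FASTA amino acid) tells the importer the content is protein without it having to guess.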

I'll actually need to update the code here https://github.com/TeselaGen/tg-oss/ (that's the new repo for ove/bio-parsers; this one is deprecated now) in order to handle .faa files correctly. I'll do that now.

Answer 7 · 2023-08-28T18:55:42.000Z

@mega-bisharp ok, I've updated @teselagen/ove to v0.3.11 which should include automatic parsing of .faa files to protein. Let me know if that works for you :)
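For anyone curious what extension-based detection can look like, here is a rough Python sketch. It is not the actual @teselagen/ove implementation (which is JavaScript); `EXTENSION_TO_TYPE` and `type_from_extension` are hypothetical names, and the mapping only covers the common FASTA extension conventions (`.faa` for amino acids, `.fna` and `.ffn` for nucleotides).

```python
from pathlib import Path
from typing import Optional

# Assumed mapping for illustration; generic extensions such as .fasta or .fa
# say nothing about the sequence type, so they are deliberately left out.
EXTENSION_TO_TYPE = {
    ".faa": "PROTEIN",  # FASTA amino acid
    ".fna": "DNA",      # FASTA nucleic acid
    ".ffn": "DNA",      # FASTA nucleotide coding regions
}


def type_from_extension(filename: str) -> Optional[str]:
    """Return the type implied by the extension, or None if it is ambiguous."""
    return EXTENSION_TO_TYPE.get(Path(filename).suffix.lower())


print(type_from_extension("seqdump.faa"))
print(type_from_extension("seqdump.fasta"))
print(type_from_extension("seqdump"))
```

Only when this returns None would an importer need to fall back to guessing from the sequence content.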