salesforce/progen

I want to generate unnatural protein sequences from a gene family using ProGen2

Seaxingzhou opened this issue · 1 comment

I want to generate unnatural protein sequences from a gene family using ProGen2, but I found that the Progen2/tokenizer.json file contains no protein family keywords or taxIDs.
Could you provide a complete tokenizer.json file (containing family keywords, taxIDs, etc.), similar to the mappings in the mapping_files/ folder at https://doi.org/10.5281/zenodo.7296780?
Also, can the keyword dictionary from mapping_files/ be used with ProGen2 models? The IDs appear to conflict: "Progen2/tokenizer.json" maps "<|bos|>" to 1, whereas "mapping_files/kw_to_name.p2" maps 1 to '2Fe-2S'. I look forward to your reply.
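To check this mismatch locally, here is a minimal Python sketch. It assumes that tokenizer.json follows the Hugging Face `tokenizers` layout, with the token-to-ID map stored under `["model"]["vocab"]`, and that kw_to_name.p2 unpickles to a dict from integer IDs to keyword names. The helper function names are hypothetical and are not part of the ProGen2 code.

```python
import json
import pickle


def load_vocab(tokenizer_path):
    """Return the token -> ID map from a tokenizer.json file.

    Assumption: the vocabulary lives under ["model"]["vocab"], as in
    tokenizer.json files written by the Hugging Face `tokenizers` library.
    """
    with open(tokenizer_path, encoding="utf-8") as f:
        return json.load(f)["model"]["vocab"]


def load_keyword_map(pickle_path):
    """Return the ID -> keyword-name dict from a pickled mapping file (assumed format)."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def non_residue_tokens(vocab):
    """List tokens that are not single uppercase letters, ordered by ID.

    Amino-acid tokens are single uppercase letters, so anything else
    (special tokens, terminus markers, possible control tags) is listed here.
    """
    others = [
        tok for tok in vocab
        if not (len(tok) == 1 and tok.isalpha() and tok.isupper())
    ]
    return sorted(others, key=vocab.get)


def id_collisions(vocab, keyword_map):
    """Return (id, tokenizer_token, keyword) triples for IDs used by both maps."""
    id_to_token = {i: tok for tok, i in vocab.items()}
    return [
        (i, id_to_token[i], kw)
        for i, kw in sorted(keyword_map.items())
        if i in id_to_token
    ]
```

Usage, with the file paths from this issue: `vocab = load_vocab("Progen2/tokenizer.json")`, then `print(non_residue_tokens(vocab))` shows whether any keyword or taxID tokens exist, and `print(id_collisions(vocab, load_keyword_map("mapping_files/kw_to_name.p2")))` lists every ID that means something different in the two files. A non-empty collision list indicates that the keyword IDs cannot be fed to the ProGen2 tokenizer as-is.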

Hello,

I also want to generate proteins of a specific family by providing keywords.
I am looking forward to your reply to this issue.

Thank you!