I want to generate unnatural protein sequences from a gene family using Progen2
Seaxingzhou opened this issue · 1 comment
I want to generate unnatural protein sequences from a gene family using Progen2, but I find that there are no protein family keywords or taxIDs in the Progen2/tokenizer.json file.
Could you provide a complete tokenizer.json file (containing family keywords and taxIDs, etc.) just like the one in the mapping_files/ folder from "https://doi.org/10.5281/zenodo.7296780"?
Also, can the keyword dictionary from mapping_files/ be used with the Progen2 models? The IDs seem to conflict: "Progen2/tokenizer.json" has "<|bos|>": 1, but "mapping_files/kw_to_name.p2" has 1: '2Fe-2S'. I look forward to your reply.
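To make the ID conflict concrete, here is a minimal sketch of how the two mappings could be compared. It rests on assumptions that are not confirmed in this thread: that tokenizer.json follows the Hugging Face `tokenizers` layout with the vocabulary under `model.vocab`, and that kw_to_name.p2 is a pickled dict from integer ID to keyword name. The helper names are hypothetical, and the demo uses only the two entries quoted above.

```python
import json
import pickle


def load_tokenizer_vocab(path):
    """Return the token -> ID dict from a tokenizer.json file.

    Assumption: the file uses the Hugging Face `tokenizers` layout,
    with the vocabulary stored under data["model"]["vocab"].
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data["model"]["vocab"]


def load_keyword_map(path):
    """Return the ID -> keyword dict from a pickle file.

    Assumption: the file is a pickled dict. Only unpickle files you trust.
    A pickle written by Python 2 may need pickle.load(fh, encoding="latin1").
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


def find_id_collisions(token_to_id, id_to_keyword):
    """Return {id: (token, keyword)} for every ID used by both mappings."""
    id_to_token = {idx: tok for tok, idx in token_to_id.items()}
    shared_ids = sorted(id_to_token.keys() & id_to_keyword.keys())
    return {idx: (id_to_token[idx], id_to_keyword[idx]) for idx in shared_ids}


# Demo with only the two entries quoted in this issue.
token_to_id = {"<|bos|>": 1}
id_to_keyword = {1: "2Fe-2S"}
print(find_id_collisions(token_to_id, id_to_keyword))
# → {1: ('<|bos|>', '2Fe-2S')}
```

With the real files, the same check would be `find_id_collisions(load_tokenizer_vocab("Progen2/tokenizer.json"), load_keyword_map("mapping_files/kw_to_name.p2"))`, provided the assumptions above hold. Any non-empty result means the keyword IDs cannot be fed to this tokenizer's vocabulary as they are.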
Hello,
I would also like to generate proteins of the same kind by providing keywords.
I am looking forward to your reply to this issue.
Thank you!