Invalid characters in fasta sequence
srilekha1993 commented
Hi,
After running esm_embedding_preparation.py, I get the following output in the FASTA file:
<80>^C}q^@(X^N^@^@^@4d3i_1_chain_0q^AXk^A^@^@MEEKEILWNEAKAFIAACYQELGKAAEVKDRLADIKSEIDLTGSYVHTKEELEHGAKMAWRNSNRCIGRLFWNSLNVIDRRDVRTKEEVRDALFHHIETATNNGKIRPTITIFPPEEKGEKQVEIWNHQLIRYAGYESDGERIGDPASCSLTAACEELGWRGERTDFDLLPLIFRMKGDEQPVWYELPRSLVIEVPITHPDIEAFSDLELKWYGVPIISDMKLEVGGIHYNAAPFNGWYMGTEIGARNLADEKRYDKLKKVASVIGIAADYNTDLWKDQALVELNKAVLHSYKKQGVSIVDHHTAASQFKRFEEQAEEAGRKLTGDWTWLIPPISPAATHIFHRSYDNSIVKPNYFYQDKPY
The file contains invalid characters that scripts/extract.py cannot process when generating embeddings with ESM. Can anyone please tell me how to resolve this issue?
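
A possible explanation, offered as a guess rather than a confirmed diagnosis: the leading bytes in the output (`<80>^C}q^@(X^N^@^@^@`) look like the header of a Python pickle (protocol 3) holding a dict, followed by a 14-character key (`4d3i_1_chain_0`) and then the sequence. If so, the file is a pickled dict of ID to sequence rather than a plain-text FASTA file, which would explain why a FASTA parser sees invalid characters. The sketch below assumes exactly that layout (a dict mapping string IDs to string sequences); the function name and file names are hypothetical and not part of the repository.

```python
import pickle


def pickle_to_fasta(pickle_path, fasta_path, width=60):
    """Convert a pickled {id: sequence} dict into a plain-text FASTA file.

    Assumption: the input was written with pickle.dump() and contains a dict
    mapping string IDs to amino-acid sequence strings.
    Only unpickle files you created yourself; pickle.load can run arbitrary code.
    """
    with open(pickle_path, "rb") as fh:
        records = pickle.load(fh)

    if not isinstance(records, dict):
        raise TypeError(f"expected a dict, got {type(records).__name__}")

    with open(fasta_path, "w") as out:
        for name, seq in records.items():
            # Header line: '>' followed by the record ID
            out.write(f">{name}\n")
            # Wrap the sequence at `width` residues per line
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")

    return len(records)


# Hypothetical usage (file names are placeholders):
# n = pickle_to_fasta("prepared_for_esm.fasta", "prepared_for_esm_plain.fasta")
# print(f"wrote {n} sequences")
```

If the conversion succeeds, the resulting plain-text file should be readable as FASTA. It is also worth checking whether the preparation script was meant to write FASTA directly, since the cause might be a version mismatch or a wrong output argument rather than the file itself.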