IndexError: list index out of range while creating a new protein
velocirraptor23 opened this issue · 6 comments
Hi all,
I got an error when I submitted a new protein.
Could you help with this, please? I have updated the MSA, sequence, and positions:
LIGAND = "N#CCC(=O)N(CC1)CC@@HN(C)c2ncnc(c23)[nH]cc3" # @param {type:"string"}
SEQUENCE = "RKSPLTLEDFKFLAVLGRGHFGKVLLSEFRPSGELFAIKALKKGDIVARDEVESLMCEKRILAAVTSAGHPFLVNLFGCFQTPEHVCFVMEYSAGGDLMLHIHSDVFSEPRAIFYSACVVLGLQFLHEHKIVYRDLKLDNLLLDTEGYVKIADFGLCKEGMGYGDRTSTFCGTPEFLAPEVLTDTSYTRAVDWWGLGVLLYEMLVGESPFPGDDEEEVFDSIVNDEVRYPRFLSAEAIGIMRRLLRRNPERRLGSSERDAEDVKKQPFFRTLGWEALLARRLPPPFVPTLSGRTDVSNFDEEFTGEAPTLSPPRDARPLTAAEQAAFLDFDFVAGGC" #@param {type:"string"}
TARGET_POSITIONS = "17,28,19,20,23,24,25,91,92,93,94" #@param {type:"string"}
It creates the protein in the first step, but it fails when it creates the parameters and the complex.
error:
File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/make_msa_seq_feats_colab.py:98, in process(input_fasta_path, input_msas)
96 parsed_msa, parsed_deletion_matrix, _ = parsers.parse_stockholm(msa)
97 elif custom_msa[-3:] == 'a3m':
---> 98 parsed_msa, parsed_deletion_matrix = parsers.parse_a3m(msa)
99 else: raise TypeError('Unknown format for input MSA, please make sure '
100 'the MSA files you provide terminates with (and '
101 'are formatted as) .sto or .a3m')
102 parsed_msas.append(parsed_msa)
File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/net/data/parsers.py:142, in parse_a3m(a3m_string)
127 def parse_a3m(a3m_string: str) -> Tuple[Sequence[str], DeletionMatrix]:
128 """Parses sequences and deletion matrix from a3m format alignment.
129
130 Args:
(...)
140 the aligned sequence i at residue position j.
141 """
--> 142 sequences, _ = parse_fasta(a3m_string)
143 deletion_matrix = []
144 for msa_sequence in sequences:
File /cluster/ddu/cmmartinez001/Projects/Umol/content/Umol/src/net/data/parsers.py:62, in parse_fasta(fasta_string)
60 elif not line:
61 continue # Skip blank lines.
---> 62 sequences[index] += line
64 return sequences, descriptions
IndexError: list index out of range
Best wishes,
Cesar
Hi,
There seems to be something wrong with the MSA you provided. Please look at it and see if you have empty rows or similar. The MSA has to be in a3m format.
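A quick pre-check of the MSA file can catch this kind of problem before running the pipeline. The traceback above stops at `sequences[index] += line`, which presumably fails when sequence text appears before any `>` header has been read. The following is a minimal sketch, not part of Umol: the function name `check_a3m` and the wording of its messages are hypothetical, and it only checks the basic header/sequence layout, not whether the alignment itself is valid.

```python
def check_a3m(a3m_text):
    """Return a list of human-readable problems found in an a3m string.

    Hypothetical helper for illustration; it is not part of Umol.
    """
    problems = []
    seen_header = False
    awaiting_sequence = False  # True after a '>' line until sequence text appears
    for lineno, raw in enumerate(a3m_text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            # Blank rows are reported so they can be inspected by hand.
            problems.append(f"line {lineno}: blank line")
            continue
        if line.startswith(">"):
            if awaiting_sequence:
                problems.append(f"line {lineno}: previous header has no sequence")
            if len(line) == 1:
                problems.append(f"line {lineno}: header has no ID")
            seen_header = True
            awaiting_sequence = True
        else:
            if not seen_header:
                problems.append(f"line {lineno}: sequence text before any '>' header")
            awaiting_sequence = False
    if awaiting_sequence:
        problems.append("last header has no sequence")
    return problems


if __name__ == "__main__":
    # The first entry lacks its '>' header line, so one problem is reported.
    for problem in check_a3m("RKSPL\n>hit1\nRKSPL\n"):
        print(problem)
```

An empty list means the basic layout looks fine; anything else points to the line to inspect.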
Hi,
Thanks for your reply. It seems to work now: the problem was at the beginning of the file, where I did not have the correct ID, so I just replaced it. In the next step I got another error. From googling, it seems to be an issue with JAX. I am not sure; I thought it was about RAM, but probably not.
XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 4263510016 bytes.
Thanks a lot,
Cesar
Great 👍
Yes, try cropping the protein sequence if you don't have more resources.
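One way to crop is to keep only a window around the target positions and renumber them. The sketch below is an illustration under stated assumptions, not Umol code: the function name `crop_around_targets` and the `flank` parameter are hypothetical, and it assumes the target positions are 1-based, which should be checked against how Umol interprets `TARGET_POSITIONS`.

```python
def crop_around_targets(sequence, target_positions, flank=50):
    """Crop `sequence` to the span covering all targets plus `flank` residues per side.

    Hypothetical helper; assumes 1-based positions.
    Returns (cropped_sequence, renumbered_positions).
    """
    if not target_positions:
        raise ValueError("need at least one target position")
    if min(target_positions) < 1 or max(target_positions) > len(sequence):
        raise ValueError("target position outside the sequence")
    start = max(1, min(target_positions) - flank)            # 1-based, inclusive
    end = min(len(sequence), max(target_positions) + flank)  # 1-based, inclusive
    cropped = sequence[start - 1:end]
    # Shift every position so it refers to the same residue in the cropped sequence.
    renumbered = [p - start + 1 for p in target_positions]
    return cropped, renumbered


if __name__ == "__main__":
    seq = "ABCDEFGHIJ"
    cropped, positions = crop_around_targets(seq, [5, 6], flank=2)
    print(cropped, positions)
```

The MSA has to match the cropped sequence as well, so it would need to be regenerated or cropped to the same columns; this sketch does not handle that.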
Hi,
While running this, I was wondering whether Umol sends the information I submit locally to a web server such as ESMFold, or whether everything stays in the local installation. By the way, I just fixed the JAX installation and now it works perfectly.
Best wishes,
Cesar
Hi,
No, nothing is sent anywhere; you keep all your info. ESMFold is only used to visualize the target site before predicting in the Colab, and is not used in any other way.
Glad to hear that.
Thanks a lot for your prompt response. I have some questions about how the code works, but I will probably send an email. For now I am going to close this, as it is solved.