Design complexes with unknown chains (proposed fix included)
SimonCrouzet opened this issue · 1 comments
Hello,
Thank you for your work on ProteinMPNN!
I encountered a bug when running the pipeline on a sample with several short unknown chains (for example, 'P' = 'XXXX'). Those chains were not among the chains I wanted to design, but I still had to remove them from my chain_id_dict and fixed_positions_dict to avoid a subsequent bug, in which the pipeline tried to compute a score from an empty sequence.
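As a rough illustration of that workaround, here is a minimal sketch of stripping all-'X' chains from both dictionaries. The helper name drop_unknown_chains, the seqs mapping, and the example entry "complex" are hypothetical. The dictionary layouts are assumptions as well: chain_id_dict is taken to map a structure name to [designed_chains, fixed_chains], and fixed_positions_dict to map a structure name to {chain: [positions]}. Adapt it if your inputs are shaped differently.

```python
def drop_unknown_chains(seqs, chain_id_dict, fixed_positions_dict, name):
    """Return copies of both dicts without chains whose sequence is only 'X'.

    Hypothetical helper; the dict layouts are assumptions, see the text above.
    """
    # A chain counts as "unknown" if it is non-empty and made only of 'X'.
    unknown = {chain for chain, seq in seqs.items() if seq and set(seq) == {"X"}}

    designed, fixed = chain_id_dict[name]
    new_chain_ids = dict(chain_id_dict)
    new_chain_ids[name] = [
        [c for c in designed if c not in unknown],
        [c for c in fixed if c not in unknown],
    ]

    new_fixed_pos = dict(fixed_positions_dict)
    new_fixed_pos[name] = {
        c: pos for c, pos in fixed_positions_dict[name].items() if c not in unknown
    }
    return new_chain_ids, new_fixed_pos


# Illustrative data only: chain 'P' is a short unknown chain.
seqs = {"A": "MKTAYIAK", "B": "GSHM", "P": "XXXX"}
chain_id_dict = {"complex": [["A"], ["B", "P"]]}
fixed_positions_dict = {"complex": {"A": [1, 2], "B": [], "P": []}}

new_chain_ids, new_fixed_pos = drop_unknown_chains(
    seqs, chain_id_dict, fixed_positions_dict, "complex"
)
print(new_chain_ids)
print(new_fixed_pos)
```

The helper returns new dictionaries instead of mutating its inputs, so the original files can still be compared against the filtered version.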
However, after removing them, I encountered another bug in protein_mpnn_utils.py: at line 381, the assignment omit_AA_mask[i,] = omit_AA_mask_pad raised an error due to a shape mismatch ((S, 21) against (S, 56), with S the length of the full sequence).
I realized the bug came from line 378:
omit_AA_mask_pad = np.pad(np.concatenate(omit_AA_mask_list,0), [[0, L_max-l]], 'constant', constant_values=(0.0, ))
Because only one [before, after] pair is given, np.pad applies it to both dimensions of the 2D array, padding each by L_max - l, whereas the second dimension has to stay constant.
I then changed the line to:
omit_AA_mask_pad = np.pad(np.concatenate(omit_AA_mask_list,0), [[0, L_max-l],[0, 0]], 'constant', constant_values=(0.0, ))
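The padding behavior can be reproduced in isolation with a small NumPy sketch. The lengths below (l = 100, L_max = 135) are illustrative assumptions, chosen only so that the padding amount L_max - l equals 35 and the buggy width comes out as 21 + 35 = 56, matching the reported mismatch. The arrays are stand-ins, not the actual ProteinMPNN data.

```python
import numpy as np

l, L_max, n_aa = 100, 135, 21  # illustrative lengths, not taken from the real run
mask = np.zeros((l, n_aa), dtype=np.float32)  # stand-in for the concatenated mask

# A single [before, after] pair is broadcast to every axis,
# so the amino-acid axis is padded as well.
buggy = np.pad(mask, [[0, L_max - l]], 'constant', constant_values=(0.0,))

# One pair per axis: pad the length axis only, leave the second axis untouched.
fixed = np.pad(mask, [[0, L_max - l], [0, 0]], 'constant', constant_values=(0.0,))

print(buggy.shape)  # (135, 56)
print(fixed.shape)  # (135, 21)

# Assigning into a (batch, L_max, 21) array only works with the fixed version.
omit_AA_mask = np.zeros((2, L_max, n_aa), dtype=np.float32)
omit_AA_mask[0,] = fixed
try:
    omit_AA_mask[1,] = buggy
    assignment_failed = False
except ValueError:
    assignment_failed = True
print(assignment_failed)  # True
```

When l equals L_max the padding amount is zero and both calls return the same shape, which would explain why the bug only shows up when sequences in a batch differ in length.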
I created a pull request, see #87
I hope this report is useful,
Best,
Thank you for your solution, now I believe my modification is correct :)