BioPandas/biopandas

amino3to1 in protein-protein complexes

karlafej opened this issue · 3 comments

Some PDB files contain several protein chains with different amino acid sequences, for example 5mtn. It would be great if amino3to1 took this into account and returned something like a dictionary mapping chain IDs to the corresponding series of 1-letter codes.
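
A minimal sketch of the requested dictionary output. It assumes the residues have already been reduced to one (chain_id, three-letter name) pair per residue, in file order. The function name `amino3to1_by_chain` and the `'?'` placeholder for non-canonical residues are hypothetical choices made for illustration. They are not BioPandas API.

```python
# Standard three-letter -> one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def amino3to1_by_chain(residues):
    """Hypothetical helper: map each chain ID to its one-letter sequence.

    `residues` is assumed to hold one (chain_id, three_letter_name) pair per
    residue, in file order. Non-canonical residue names become '?' (an
    arbitrary choice for this sketch).
    """
    codes_by_chain = {}
    for chain_id, name in residues:
        codes_by_chain.setdefault(chain_id, []).append(THREE_TO_ONE.get(name, "?"))
    # Join each chain's codes into a single string.
    return {chain_id: "".join(codes) for chain_id, codes in codes_by_chain.items()}
```

For the first few residues of each 5mtn chain, `amino3to1_by_chain([("A", "SER"), ("A", "LEU"), ("A", "GLU"), ("B", "SER"), ("B", "VAL")])` gives `{'A': 'SLE', 'B': 'SV'}`.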

At the moment, amino3to1 returns the following for 5mtn:
SLEPEPWFFKNLSRKDAERQLLAPGNTHGSFLIRESESTAGSFSLSVRDFDQGEVVKHYKIRNLDNGGFYISPRITFPGLHELVRHYTSVSSST

whereas the residues in the PDB file are:

>5mtn.pdb chain A 
 SLEPEPWFFK NLSRKDAERQ LLAPGNTHGS FLIRESESTA GSFSLSVRDF DQGEVVKHYK
 IRNLDNGGFY ISPRITFPGL HELVRHYT

>5mtn.pdb chain B 
 SVSSVPTKLE VVAATPTSLL ISWDAPAVTV VYYLITYGET GSPWPGGQAF EVPGSKSTAT
 ISGLKPGVDY TITVYAHRSS YGYSENPISI NYRT

rasbt commented

Thanks for pointing this out, @karlafej!
I haven't worked with multi-chain proteins in recent projects and completely forgot to include them in the test cases. That should definitely be addressed, as you said. I also just noticed another problem in the current implementation: it assumes unique residue numbers, which is a bad assumption for multi-chain structures because residue numbers can repeat across chains. I will fix that :).
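
The residue-number problem can be shown with a toy example. This is not BioPandas's actual code. If atom records are collapsed into residues keyed by residue number alone, a second chain whose numbering also starts at 1 is silently swallowed by the first. Keying by (chain_id, residue_number) keeps the chains apart. The record layout and function names below are assumptions, and insertion codes are ignored for brevity.

```python
def residues_naive(atoms):
    """Collapse atom records to residues keyed by residue number only.

    This mirrors the flawed assumption: residue numbers are treated as unique
    across the whole file, so later chains reusing a number are dropped.
    """
    seen = {}
    for _chain_id, residue_number, residue_name in atoms:
        seen.setdefault(residue_number, residue_name)
    return list(seen.values())  # dicts keep insertion order (Python 3.7+)


def residues_by_chain(atoms):
    """Collapse atom records to residues keyed by (chain_id, residue_number)."""
    seen = {}
    for chain_id, residue_number, residue_name in atoms:
        seen.setdefault((chain_id, residue_number), residue_name)
    return list(seen.values())


# Hypothetical atom records (chain_id, residue_number, residue_name);
# both chains number their residues from 1.
atoms = [
    ("A", 1, "SER"), ("A", 1, "SER"),  # two atoms belonging to residue A1
    ("A", 2, "LEU"),
    ("B", 1, "VAL"), ("B", 2, "THR"),
]
```

Here `residues_naive(atoms)` returns `['SER', 'LEU']` and loses chain B entirely, while `residues_by_chain(atoms)` returns `['SER', 'LEU', 'VAL', 'THR']`.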

Regarding the return value of the amino3to1 function: a dictionary, as you suggested, could be a good idea, but I would favor returning a list of sequence strings to preserve the order in which the chains appear in the PDB file.

For example, for 5mtn it would return

['SLEPEPWFFK...', 'SVSSVPTKLE...']

and the chain IDs could be obtained via

pdb.df['ATOM']['chain_id'].unique()

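if desired. For instance, one could iterate over both:

amino3to1_results = pdb.amino3to1()  # proposed list-of-strings return value
for sequence, chain_id in zip(amino3to1_results, pdb.df['ATOM']['chain_id'].unique()):
    print(chain_id, sequence)  # do something with each chain's sequence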
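
A sketch of the zip-based pairing, with plain lists as hypothetical toy stand-ins for the ATOM table's chain_id column and for the proposed list-of-strings return value. The helper `unique_in_order` is illustrative. It mimics pandas' `Series.unique()`, which returns values in order of appearance.

```python
def unique_in_order(values):
    """Unique values in order of first appearance, like pandas' Series.unique()."""
    return list(dict.fromkeys(values))


# Toy stand-ins: the ATOM table's chain_id column (one entry per atom) and the
# proposed return value (one sequence string per chain, in file order).
atom_chain_ids = ["A", "A", "A", "B", "B"]
amino3to1_results = ["SLE", "SV"]

pairs = []
for sequence, chain_id in zip(amino3to1_results, unique_in_order(atom_chain_ids)):
    pairs.append((chain_id, sequence))
```

The pairing is only correct if both inputs list the chains in the same order. `zip()` also stops at the shorter input without warning. A chain that appears in the ATOM table but is missing from the sequence list would therefore misalign or drop pairs silently.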

Alternatively, amino3to1 could return a list of (chain ID, sequence) tuples:

[('A', 'SLEPEPWFFK...'), ('B', 'SVSSVPTKLE...')]
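
With the list-of-tuples format, each chain ID stays attached to its sequence. A short sketch, using the first ten residues of each 5mtn chain as a hypothetical return value:

```python
# Hypothetical return value in the proposed list-of-tuples format
# (sequences shortened to the first ten residues of each chain).
result = [("A", "SLEPEPWFFK"), ("B", "SVSSVPTKLE")]

# Each chain ID travels with its sequence, so no separate zip() against
# pdb.df['ATOM']['chain_id'].unique() is needed.
for chain_id, sequence in result:
    print(f">chain {chain_id}\n{sequence}")

# The tuple list converts directly to the other proposed formats.
as_dict = dict(result)
as_list = [sequence for _, sequence in result]
```

Because the tuple list becomes either of the other proposed formats in one line, it avoids forcing users to choose between them.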

Any thoughts?

rasbt commented

I just updated the amino3to1 method (in #22) to return a DataFrame containing a column for the chain ID. I think this is the most universally useful format. Let me know if you have any feedback or suggestions.
(I added some usage examples here)
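
Suppose the DataFrame holds one row per residue, with a chain ID column and a one-letter code column. In that case, per-chain sequence strings can be recovered by grouping consecutive rows. Below is a plain-Python sketch in which a list of (chain_id, code) tuples stands in for the DataFrame rows. The exact column layout from #22 is not shown in this thread, so this layout is an assumption.

```python
from itertools import groupby

# Stand-in for the DataFrame rows: one (chain_id, one-letter code) pair per
# residue, in file order. The layout is assumed, not taken from #22.
rows = [("A", "S"), ("A", "L"), ("A", "E"), ("B", "S"), ("B", "V")]

# itertools.groupby merges *consecutive* rows with the same chain ID, which
# matches the usual contiguous layout of chains in a PDB file.
sequences = [
    (chain_id, "".join(code for _, code in group))
    for chain_id, group in groupby(rows, key=lambda row: row[0])
]
```

With pandas itself, something along the lines of `df.groupby('chain_id', sort=False)['residue_name'].agg(''.join)` should give the same result, assuming the one-letter codes live in a column named `residue_name`. Unlike the pandas version, `itertools.groupby` only merges consecutive rows, so a chain ID that reappears later in the file would produce a separate group.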

Thank you!