Handling multi-PDB files

Question

rasbt opened this issue 6 years ago · 0 comments

I am cross-posting a discussion from the mailing list with regard to multi-PDB files containing MODEL & ENDMDL tags, which are currently not handled by BioPandas.

However, it should definitely be handled in one way or the other. Currently, I don't have a clear idea of how best to handle it and would welcome any thoughts and feedback (let me cross-post this to the GitHub issue tracker; it may be better to continue the discussion about potential ways to implement it there).

I think one of the problems with the DataFrame format is that having all structures in one DataFrame would probably lead to a lot of weird or unexpected results, so it would probably be best to separate the structures one way or the other ...

One option would be to provide a utility function (analogous to the split_multimol2 function, http://rasbt.github.io/biopandas/tutorials/Working_with_MOL2_Structures_in_DataFrames/#parsing-multi-mol2-files) that generates multiple PandasPdb objects from such a file. I.e., it would simply be a list

pdbs = [pdb_1, pdb_2, ..., pdb_n]

which would preserve the current functionality of the library without introducing backwards-incompatible changes. This would also make it easier to use the multiprocessing library to analyze multiple PandasPdb objects efficiently in parallel.
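To make the list-based option more concrete, here is a minimal sketch of the splitting step. The name `split_multi_pdb` is hypothetical and not part of BioPandas, and the sketch assumes that every structure is wrapped in a MODEL ... ENDMDL pair; records outside those blocks (headers, END, and so on) are simply dropped here, which a real implementation would probably not want.

```python
def split_multi_pdb(pdb_text):
    """Split PDB text into one list of lines per MODEL/ENDMDL block.

    Hypothetical helper, not part of BioPandas. Lines outside any
    MODEL ... ENDMDL block are ignored in this sketch.
    """
    models = []
    current = None  # None while we are outside a MODEL block
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record names occupy columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return models


example = """\
HEADER    EXAMPLE
MODEL        1
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ENDMDL
MODEL        2
ATOM      1  N   ALA A   1      11.500   6.300  -6.100  1.00  0.00           N
ENDMDL
END
"""

models = split_multi_pdb(example)
print(len(models))  # number of MODEL blocks found
```

Each entry in `models` could then be parsed into its own PandasPdb object, so the existing single-structure code path would stay unchanged.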

Right now, the PandasPdb objects have a dictionary containing multiple DataFrames:
dict_keys(['ATOM', 'HETATM', 'ANISOU', 'OTHERS'])

As a second option, for multi-PDB files the dictionary could be expanded to

dict_keys(['ATOM_1', 'HETATM_1', 'ANISOU_1', 'OTHERS_1', 'ATOM_2', 'HETATM_2', 'ANISOU_2', 'OTHERS_2', ...])
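For comparison, the expanded-dictionary layout could be built roughly as follows. This is only an illustration: `flatten_models` is a hypothetical name, and plain strings stand in for the per-model DataFrames.

```python
def flatten_models(per_model_frames):
    """Merge per-model dicts into one dict with '<RECORD>_<n>' keys (1-based)."""
    flat = {}
    for number, frames in enumerate(per_model_frames, start=1):
        for record, frame in frames.items():
            flat[f"{record}_{number}"] = frame
    return flat


# Placeholder values stand in for the DataFrames of each model.
per_model = [
    {"ATOM": "atoms of model 1", "HETATM": "hetatms of model 1"},
    {"ATOM": "atoms of model 2", "HETATM": "hetatms of model 2"},
]
flat = flatten_models(per_model)
print(list(flat))
```

One consequence of this layout is that iterating over models means parsing the model number back out of the key names.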

I strongly favor the first option (the list of PandasPdb objects), though. I would love to hear feedback on this and am open to other suggestions!

In any case, an error (or at least a warning) should be raised if MODEL & ENDMDL tags are found in a PDB file when the current read_pdb method is used, so that this doesn't lead to unexpected behavior.
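A check along these lines might look like the sketch below. `warn_if_multi_model` is a hypothetical helper, not an existing BioPandas function, and where exactly it would be called inside read_pdb is left open.

```python
import warnings


def warn_if_multi_model(pdb_lines):
    """Warn and return True if any MODEL or ENDMDL record is present."""
    for line in pdb_lines:
        if line[:6].strip() in ("MODEL", "ENDMDL"):
            warnings.warn(
                "MODEL/ENDMDL records found; the file appears to contain "
                "multiple structures.",
                UserWarning,
            )
            return True
    return False


lines = ["MODEL        1", "ATOM      1  N   ALA A   1", "ENDMDL"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = warn_if_multi_model(lines)
```

Swapping `warnings.warn` for a raised exception would give the stricter error variant.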