a-r-j/ProteinWorkshop

4v8m not found in raw directory when processing go-bp dataset


Hello, when I run the classification task on the go-bp dataset, I get this error:
FileNotFoundError: 4v8m not found in raw directory. Are you sure it's downloaded and has the format pdb?

This is with format=pdb (because mmtf doesn't work).

I checked the PDB site: it says that a PDB-format file is not available for structures this large.
Screenshot 2024-07-23 14 42 09

Is there any way to work around this?

Hi @yangzhang33, thanks for flagging. This is a little tricky. I'd suggest removing that example from the dataset for now. If you're keen to include it, I think you can download the mmCIF file, extract the relevant chains and write them to a PDB file (e.g. using BioPandas). I don't think it's possible to get the structure in PDB format otherwise.
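For reference, here is a rough sketch of that workaround. It uses only the Python standard library rather than BioPandas, so treat it as an illustration of the idea and not as the ProteinWorkshop or BioPandas way of doing it. The function names (`download_mmcif`, `mmcif_chain_to_pdb`), the RCSB download URL pattern, and the choice of `auth_asym_id` as the chain column are all assumptions. The parser is deliberately minimal: it expects one atom per line in the `_atom_site` loop, keeps only the first model, and renumbers atom serials from 1 so that a single chain fits the five-digit serial field of the PDB format.

```python
import re
import urllib.request

# Assumed download URL pattern for mmCIF files on RCSB.
RCSB_CIF_URL = "https://files.rcsb.org/download/{pdb_id}.cif"

# One mmCIF value: double-quoted, single-quoted, or a bare run of non-spaces.
_TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def download_mmcif(pdb_id):
    """Fetch the mmCIF text for one entry (hypothetical helper, needs network)."""
    url = RCSB_CIF_URL.format(pdb_id=pdb_id.upper())
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def _tokens(line):
    """Split one atom_site row into values, dropping surrounding quotes."""
    values = []
    for tok in _TOKEN.findall(line):
        if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "\"'":
            tok = tok[1:-1]
        values.append(tok)
    return values


def mmcif_chain_to_pdb(mmcif_text, chain_id, out_chain="A",
                       chain_column="auth_asym_id"):
    """Return PDB-format text holding one chain taken from mmCIF text.

    chain_id is matched against chain_column; out_chain is the single
    character written to the PDB chain field (multi-character IDs do not fit).
    """
    if len(out_chain) != 1:
        raise ValueError("PDB chain IDs must be exactly one character")
    columns, pdb_lines, first_model = [], [], None
    for raw in mmcif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])  # collect column names
            continue
        if not columns:
            continue  # still before the atom_site loop
        if not line.startswith(("ATOM", "HETATM")):
            break  # first non-atom line ends the atom_site loop
        row = dict(zip(columns, _tokens(line)))
        model = row.get("pdbx_PDB_model_num", "1")
        if first_model is None:
            first_model = model
        if model != first_model or row[chain_column] != chain_id:
            continue
        serial = len(pdb_lines) + 1  # renumber from 1
        if serial > 99999:
            raise ValueError("chain has too many atoms for the PDB format")
        name, element = row["label_atom_id"], row.get("type_symbol", "")
        if len(name) < 4 and len(element) == 1:
            name = " " + name  # usual alignment for one-letter elements
        alt = row.get("label_alt_id", ".")
        alt = " " if alt in (".", "?") else alt[0]
        icode = row.get("pdbx_PDB_ins_code", "?")
        icode = " " if icode in (".", "?") else icode[0]
        pdb_lines.append(
            f"{row['group_PDB']:<6}{serial:>5} {name:<4}{alt}"
            f"{row['label_comp_id']:>3} {out_chain}"
            f"{int(row['auth_seq_id']):>4}{icode}   "
            f"{float(row['Cartn_x']):>8.3f}{float(row['Cartn_y']):>8.3f}"
            f"{float(row['Cartn_z']):>8.3f}{float(row['occupancy']):>6.2f}"
            f"{float(row['B_iso_or_equiv']):>6.2f}          {element:>2}"
        )
    return "\n".join(pdb_lines + ["END"]) + "\n"
```

To use it, you would call `download_mmcif("4v8m")`, pass the text to `mmcif_chain_to_pdb` with the chain the dataset expects, and write the result into the raw directory under the file name the loader is looking for. Which chain ID that is, and the exact file name, are things you would need to check against the dataset; I have not verified them here.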