deepmodeling/Uni-Mol

pocket representations have different dimensions for pair_repr

Opened this issue · 9 comments

Hello,

I tried to generate pocket representations for my own dataset using the code provided in the demo notebook. The pair representations I got have different dimensions for different pockets: the shape is [n, n, 64], where n varies from one pocket to the next.

On the other hand, when I reran the demo case, I got the same dimensions for all pair representations.

Could you please point out which steps I might have missed?

Thanks!

Is it this pocket repr demo?
If it's convenient, could you provide your code modifications and example files?

Hello @ZhouGengmo

Thanks a lot for the reply.

Everything is fine when I run the unimol_pocket_repr_demo notebook with your provided data.

However, when I run the same code with my own input, I encounter the dimension problem. I suspect it is due to the different pocket sizes, but your input also has different pocket sizes, so there might be some pre-processing steps that I missed?

My input PDBs are 4jym and 5dj5 with their bound ligands removed.

The pocket json is:
{"4jym":["A193","A194","A134","A218","A139","A142","A246","A26","A219","A124","A157","A95"],"5dj5":["A136","A141","A144","A148","A155","A27","A28","A159","A162","A191","A194","A195","A219","A220","A96","A97","A98","A247","A126"]}

The output dimensions I got for 'pair_repr' are: (111, 111, 64) and (172, 172, 64)

> However, when I run the same code with my own input, I encounter the dimension problem. I suspect it is due to the different pocket sizes, but your input also has different pocket sizes,

It is normal for the dimensions to differ. The dimensions of the representation are related to the pocket size. This is also reflected in the example data CASF2016, where not all data have the same dimensions. For instance:

  • PDB ID (in CASF2016): 3nq9, pair_repr_shape: (242, 242, 64)
  • PDB ID (in CASF2016): 5aba, pair_repr_shape: (206, 206, 64)
  • PDB ID (in CASF2016): 3g31, pair_repr_shape: (160, 160, 64)
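To make the shape dependence concrete, here is a minimal sketch that uses synthetic NumPy arrays rather than real Uni-Mol output. It assumes only what is reported in this thread: each pocket yields a pair_repr of shape (n, n, 64), where n is specific to that pocket. The helper name `describe_pair_reprs` and the dictionary layout are hypothetical and are not part of the demo notebook.

```python
import numpy as np

PAIR_DIM = 64  # size of the last axis, as reported in this thread


def describe_pair_reprs(pair_reprs):
    """Return {pocket_id: n} after checking each array has shape (n, n, PAIR_DIM).

    Hypothetical helper for illustration only.
    """
    sizes = {}
    for pocket_id, arr in pair_reprs.items():
        n_rows, n_cols, dim = arr.shape
        if n_rows != n_cols or dim != PAIR_DIM:
            raise ValueError(f"{pocket_id}: unexpected shape {arr.shape}")
        sizes[pocket_id] = n_rows
    return sizes


# Synthetic stand-ins with the shapes reported above (not real model output).
pair_reprs = {
    "4jym": np.zeros((111, 111, PAIR_DIM), dtype=np.float32),
    "5dj5": np.zeros((172, 172, PAIR_DIM), dtype=np.float32),
}
sizes = describe_pair_reprs(pair_reprs)
print(sizes)  # {'4jym': 111, '5dj5': 172}
```

A check like this confirms that differing first two axes are expected, while the last axis stays fixed at 64.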

I also recommend using unimol_tools, which is more user-friendly.

Thank you @ZhouGengmo ,

How would you recommend handling these representations of different dimensions when comparing pockets?

I recommend using the CLS representation to represent the entire pocket. The CLS representations of different pockets all have the same dimensions, i.e., mol_repr_cls here.
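As an illustration of why fixed-size vectors are convenient for comparison, here is a minimal sketch that compares two pocket-level vectors with cosine similarity. Cosine similarity is an assumption chosen for this example and is not a metric recommended anywhere in this thread. The vectors are random stand-ins of length 512 (the size reported for mol_repr in this thread), not real model output, and `cosine_similarity` is a hypothetical helper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors of equal length."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError(f"expected two 1-D vectors of equal length, got {a.shape} and {b.shape}")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Random stand-ins for two pockets' CLS vectors (not real model output).
rng = np.random.default_rng(0)
cls_a = rng.normal(size=512)
cls_b = rng.normal(size=512)

print(cosine_similarity(cls_a, cls_a))  # a vector compared with itself
print(cosine_similarity(cls_a, cls_b))  # two different vectors
```

Because every pocket yields a vector of the same length, no padding or cropping is needed before comparing them, unlike with the (n, n, 64) pair representations.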

Hi @ZhouGengmo ,

There doesn't seem to be a pocket representation implementation yet in unimol_tools?

In the meantime, if I continue to use the notebook implementation mentioned above, is 'mol_repr', which gives a (512,)-dimensional output for every pocket, the same as "mol_repr_cls"?

If this is the case, does that mean it should be "molecular representation" (mol_repr or mol_repr_cls) annotated in your figure below, instead of "atom representation" for pockets?

If not, could you please help elaborate on the different representation outputs for pockets?

image

> In the meantime, if I continue to use the notebook implementation mentioned above, is 'mol_repr', which gives a (512,)-dimensional output for every pocket, the same as "mol_repr_cls"?

Yes, in this demo, mol_repr and mol_repr_cls are the same.

> If this is the case, does that mean it should be "molecular representation" (mol_repr or mol_repr_cls) annotated in your figure below, instead of "atom representation" for pockets?

This figure is ok. CLS is a special token added before all atoms and is used to represent the whole molecule/pocket. The atom-level representation of Uni-Mol is not included in this demo.
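The relationship between the CLS vector and the per-token output can be sketched as follows. This assumes a hypothetical per-token array `token_repr` of shape (n_tokens, 512) with the CLS token at index 0, following the description above that CLS is added before all atoms. Whether other special tokens follow the atoms is not stated in this thread, so the sketch only separates index 0 from the remaining rows. The array here is synthetic and `split_cls` is a hypothetical helper, not a function from the demo.

```python
import numpy as np

HIDDEN_DIM = 512  # size reported for mol_repr in this thread


def split_cls(token_repr):
    """Split a (n_tokens, dim) array into the first row and the remaining rows.

    Assumes the CLS token sits at index 0; hypothetical helper for illustration.
    """
    token_repr = np.asarray(token_repr)
    if token_repr.ndim != 2 or token_repr.shape[0] < 1:
        raise ValueError(f"expected a non-empty 2-D array, got shape {token_repr.shape}")
    return token_repr[0], token_repr[1:]


# Synthetic stand-in: 1 CLS row followed by 4 further token rows.
token_repr = np.arange(5 * HIDDEN_DIM, dtype=np.float32).reshape(5, HIDDEN_DIM)
cls_vec, rest = split_cls(token_repr)
print(cls_vec.shape, rest.shape)  # (512,) (4, 512)
```

Under this assumption the CLS row has a fixed size for every pocket, while the number of remaining rows grows with the pocket size.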

Do you want to use the atom representation? I will add it to this demo ASAP.

Adding the atom representation would be great. Thank you!

Added in PR #247. You can pull the latest code.