graphcore-research/minimol

Installation and RAM issues

Closed this issue · 5 comments

Could you please help me with an installation issue? Both "pip install minimol" and installing from the cloned GitHub repo get stuck at building wheels. I'm not sure whether something is wrong with my environment or there is just a minor bug in the repo. It hangs forever at:
"Building wheels for collected packages: minimol, torch-cluster, torch-scatter, torch-sparse, mup
Building wheel for minimol (setup.py) ... done
Created wheel for minimol: filename=minimol-1.2-py3-none-any.whl size=19051870 sha256=9fc7aadfdc02a8dc9399629c1a464159ebc202e7d064300f0fcbe8f6e3722587
Stored in directory: /tmp/pip-ephem-wheel-cache-12fn32_h/wheels/33/65/61/09658d3eca7687d52dbd401ec31d3d16c99350d4fb938aa60f"

Edit: I attempted installation on both Google Colab and GCP, with the same results. I tried "pip install minimol", "pip install git+https://github.com/graphcore-research/minimol.git", and a local install following the readme. Pip version was 24.1.2.
I believe this might stem from underlying issues with graphium, but I wanted to ask whether anyone else has had similar problems.

Update: here is how to get it working on Google Colab/GCP:

  1. install these first:
    !pip install torch-sparse torch-cluster torch-scatter -f https://pytorch-geometric.com/whl/torch-2.3.0+cu121.html
  2. then install hydra:
    !pip install hydra-core
  3. then graphium:
    !pip install graphium==2.4.7
  4. now minimol will install:
    !pip install minimol

I suspect there is a conflict between torch versions that pip on GCP has trouble resolving, so the correct torch/CUDA combination has to be specified manually. I hope this is useful for others trying to run this in the cloud.
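The wheel index URL in step 1 encodes the torch and CUDA versions, so on a different machine it has to match what is actually installed. A small sketch of how that URL can be derived (the helper name is mine; the pattern follows the pytorch-geometric.com torch-X.Y.Z+cuNNN scheme):

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the PyG wheel index URL for a given torch/CUDA pair.

    Hypothetical helper: the URL pattern follows pytorch-geometric.com's
    torch-X.Y.Z+cuNNN scheme; pass cuda_version=None for CPU-only builds.
    """
    tag = f"cu{cuda_version.replace('.', '')}" if cuda_version else "cpu"
    base = torch_version.split("+")[0]  # strip local build tags like "+cu121"
    return f"https://pytorch-geometric.com/whl/torch-{base}+{tag}.html"


# The pair used in step 1 above:
print(pyg_wheel_index("2.3.0", "12.1"))
# https://pytorch-geometric.com/whl/torch-2.3.0+cu121.html
```

On the target machine, the versions themselves can be read with `python -c "import torch; print(torch.__version__, torch.version.cuda)"`.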

Update #2: it works on the example provided in the readme, but when I try to featurize my own molecules (about 200), RAM consumption explodes and crashes the session. Have you ever encountered this issue?

Hey @Khrystofor19, thanks for flagging those issues!

We've noticed the PyPI install can be a bit finicky in some setups. Your workaround is exactly what I'd suggest.

As for the RAM problem, I traced it to the number of workers used for data loading during the graph-construction phase. I made a few tweaks:

  • Instead of using a fixed number of workers, we now adjust to what's available on the machine.
  • Both building the graphs and running them through the model are now done in batches.
  • You can now control the batch size yourself. Just use Minimol(batch_size=...) when you set up the model.
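Conceptually, the batched featurisation bounds peak memory by only ever materialising one batch at a time. A minimal sketch of that idea (the featuriser below is a stand-in, not minimol's actual code):

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive fixed-size batches from an iterable."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch


def featurise(smiles_batch):
    # Stand-in for the real featuriser: one vector per molecule.
    return [[float(len(s))] for s in smiles_batch]


smiles = ["C" * (i % 5 + 1) for i in range(200)]  # dummy SMILES strings

features = []
for batch in batched(smiles, batch_size=16):
    # Only one batch of graphs lives in memory at any point.
    features.extend(featurise(batch))

print(len(features))  # 200 feature vectors, built 16 molecules at a time
```

With a fixed batch size, peak memory scales with the batch rather than the whole input, which is why exposing batch_size lets you trade speed for RAM on small machines.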

The memory leak should now be gone, and featurisation should be orders of magnitude faster in Colab.

Let me know if it works on your side.

FYI, minimol==1.3.4 on PyPI includes all these changes.

@blazejba Thank you!