HannesStark/3DInfomax

generating a fingerprint

rohanvarm opened this issue · 6 comments

Hi,
I find the code a bit difficult to read. Is there an easy way to generate the embedding as a fingerprint?

Hi Rohan,
You could use the uploaded GNN weights that were obtained by pre-training on GEOM-Drugs.
To do so, add a PyTorch dataset containing the molecules for which you want to generate fingerprints to the datasets directory.
Then you can use a config file like tune_QM9_homo.yml where you set eval_on_test: True, dataset: 'ClassNameOfDataset', and num_epochs: 0, so that the model is run directly on the test molecules.
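As a rough illustration of the dataset step, here is a minimal sketch of a map-style PyTorch dataset over a list of SMILES strings. The class name InferenceSmilesDataset is hypothetical and not part of the repository, and the featurization is deliberately left as a placeholder: a dataset that actually works with the pre-trained weights would have to return the same graph inputs as the existing classes in the datasets directory.

```python
from torch.utils.data import Dataset


class InferenceSmilesDataset(Dataset):
    """Hypothetical map-style dataset over a list of SMILES strings."""

    def __init__(self, smiles_list):
        # Drop blank entries so every index maps to a molecule.
        self.smiles = [s.strip() for s in smiles_list if s.strip()]

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, idx):
        # Placeholder: a real dataset for this repo would return the featurized
        # molecular graph that the pre-trained GNN expects, not the raw string.
        return self.smiles[idx]


dataset = InferenceSmilesDataset(["CCO", "c1ccccc1", "  ", "CC(=O)O"])
print(len(dataset), dataset[1])
```

The class name is what you would then put in the dataset field of the config.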

@rohanvarm I took a bit of a different route than @HannesStark is suggesting, but with his help I got a working solution:

  1. You can adapt the QMDataset class in 3DInfomax/datasets/qm9_dataset.py. I stripped all functionality that was not needed for inference and made it possible to provide a custom list of SMILES strings. You can see my implementation of this class in my fork of this repo.
  2. Using a similar process, I adapted the load_model() function in train.py and finally created a little CLI using Click that loads a list of SMILES from a .npy file, creates a dataset and feeds the datapoints through the model. You can see all of that here.
  3. I still need to figure out how to get the fingerprints from this, but I think we should be able to change the forward() method of the PNA class to do so. I'll look into that next and can let you know if I figure it out.

Please note that this code assumes you use the provided checkpoint. For other models I might have stripped too much functionality. At the same time, I am not 100% sure if any more code could be removed, so the code could possibly be further simplified / made more efficient.

@HannesStark We could consider merging this back into your repo and writing a bit about it in the README once it is finished. Let me know if that would interest you! (Right now I think my fork is too different, because I restructured it a bit to my liking, but it should be easy to merge just the relevant files once they're done.)

> but I think we should be able to change the forward() method of the PNA class to do so

This is exactly what I ended up doing. You can see the changes here. This should be a non-breaking, backward-compatible change. It could be cleaned up a bit more, but this is the gist of it! 🙂
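For readers without access to the linked changes, one common way to make such a change backward-compatible is an optional keyword argument on forward() that defaults to the old behaviour. The sketch below only illustrates that pattern on a toy module: ToyEncoder, its layers, and the return_fingerprint flag are all assumptions for illustration, not the repository's PNA class or the actual change in the fork.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Toy stand-in for a GNN; NOT the repository's PNA class."""

    def __init__(self, in_dim=8, hidden_dim=16, target_dim=1):
        super().__init__()
        self.node_mlp = nn.Linear(in_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, target_dim)

    def forward(self, node_feats, return_fingerprint=False):
        # node_feats: (num_nodes, in_dim) features of a single molecule.
        h = torch.relu(self.node_mlp(node_feats))
        readout = h.sum(dim=0)  # pooled, molecule-level embedding
        if return_fingerprint:
            # New code path: expose the embedding before the output head.
            return readout
        # Default path is unchanged, so existing callers keep working.
        return self.output(readout)


model = ToyEncoder().eval()
with torch.no_grad():
    x = torch.randn(5, 8)
    prediction = model(x)
    fingerprint = model(x, return_fingerprint=True)
print(prediction.shape, fingerprint.shape)
```

Because the flag defaults to False, training and evaluation code that calls the model without it is unaffected.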

Thank you @cwognum !
I will make some changes soon so that fingerprint extraction is easier!

I have now made it a bit easier:
Just place your SMILES in the file dataset/inference_smiles.txt and run

python inference.py --config=configs_clean/fingerprint_inference.yml

Your fingerprints are saved as a pickle file in the dataset_directory.
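A minimal sketch of the surrounding file handling, using only the standard library. It assumes one SMILES string per line in the input file, and it makes no assumption about what object the pickle contains, since neither is specified in this thread. The file name fingerprints.pkl and the nested-list stand-in data exist only to make the demo self-contained; they are not what inference.py produces.

```python
import pickle
import tempfile
from pathlib import Path


def write_smiles(smiles, path):
    # Assumption: the inference script expects one SMILES string per line.
    Path(path).write_text("\n".join(smiles) + "\n")


def load_fingerprints(path):
    # Returns whatever object was pickled; inspect it before relying on its layout.
    # Only unpickle files you produced yourself, since pickle can execute code.
    with Path(path).open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    smiles_file = Path(tmp) / "inference_smiles.txt"
    write_smiles(["CCO", "c1ccccc1"], smiles_file)
    smiles_lines = smiles_file.read_text().splitlines()

    # Stand-in for the real output file (hypothetical name and contents).
    stand_in = Path(tmp) / "fingerprints.pkl"
    stand_in.write_bytes(pickle.dumps([[0.1, 0.2], [0.3, 0.4]]))
    fingerprints = load_fingerprints(stand_in)

print(type(fingerprints).__name__, len(fingerprints))
```

With the real output, point load_fingerprints at the pickle file written to the dataset_directory and check its type and shape first.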

And in the config file you can specify different pre-trained models if you want.