generating a fingerprint
rohanvarm opened this issue · 6 comments
Hi,
I find the code a bit difficult to read. Is there an easy way to generate the embedding as a fingerprint?
Hi Rohan,
You could use the uploaded GNN weights that were obtained by pre-training on GEOM-Drugs.
To do so, you could add a PyTorch dataset with the molecules for which you want to generate fingerprints to the datasets directory.
Then you can use a config file like tune_QM9_homo.yml where you set eval_on_test: True, dataset: 'ClassNameOfDataset', and num_epochs: 0, such that you directly run the model on the test molecules.
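The three settings mentioned above can be sketched in Python as overrides applied on top of a base config. This is only an illustration under assumptions: the dict stands in for the parsed YAML file, the base values and the class name MyMoleculeDataset are made up, and the helper functions are hypothetical rather than part of the repository.

```python
from typing import Any, Dict


def fingerprint_overrides(dataset_class_name: str) -> Dict[str, Any]:
    """Settings named in the comment above: skip training, run on the test split."""
    return {
        "eval_on_test": True,
        "dataset": dataset_class_name,
        "num_epochs": 0,
    }


def apply_overrides(base_config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base_config with the overrides applied; the base is not mutated."""
    merged = dict(base_config)
    merged.update(overrides)
    return merged


# Stand-in for the parsed contents of a config such as tune_QM9_homo.yml.
# These base values are invented for the example.
base = {"eval_on_test": False, "dataset": "SomeOtherDataset", "num_epochs": 1000}

# "MyMoleculeDataset" is a hypothetical name for the dataset class you added.
config = apply_overrides(base, fingerprint_overrides("MyMoleculeDataset"))
print(config)
```

In practice the same three keys would simply be edited in the YAML file itself; the sketch only makes explicit which values change and that nothing else in the config needs to.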
@rohanvarm I took a bit of a different route than @HannesStark is suggesting, but with his help I got a working solution:
- You can adapt the QMDataset class in 3DInfomax/datasets/qm9_dataset.py. I stripped all functionality that was not needed for inference and made it possible to provide a custom list of SMILES strings. You can see my implementation of this class in my fork of this repo.
- Using a similar process, I adapted the load_model() function in train.py and finally created a little CLI using Click that loads a list of SMILES from a .npy file, creates a dataset and feeds the datapoints through the model. You can see all of that here.
- I still need to figure out how to get the fingerprints from this, but I think we should be able to change the forward() method of the PNA class to do so. I'll look into that next and can let you know if I figure it out.
Please note that this code assumes you use the provided checkpoint. For other models I might have stripped too much functionality. At the same time, I am not 100% sure if any more code could be removed, so the code could possibly be further simplified / made more efficient.
@HannesStark We could consider merging this back to your repo and writing a bit about it in the README once finished? Let me know if that would interest you! (Right now I think my fork is too different because I restructured it a bit to my liking, but it should be easy to just merge the relevant files once they're done.)
> but I think we should be able to change the forward() method of the PNA class to do so
This is exactly what I ended up doing. You can see the changes here. This should be a non-breaking, backward-compatible change. It could be cleaned up a bit more, but this is the gist of it! 🙂
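For readers wondering what a non-breaking change to forward() might look like, here is a minimal sketch of the general pattern: an optional keyword argument that defaults to the old behaviour. It is not the linked change and not the repository's PNA class. ToyEncoder, its mean pooling, its toy head, and the return_fingerprint flag are all hypothetical and written in plain Python so the example runs without PyTorch.

```python
from typing import List, Sequence, Union


class ToyEncoder:
    """Hypothetical stand-in for a graph network; NOT the repository's PNA class."""

    def encode(self, node_features: Sequence[Sequence[float]]) -> List[float]:
        # Mean-pool the per-node feature vectors into one graph-level vector.
        n_nodes = len(node_features)
        dim = len(node_features[0])
        return [sum(node[i] for node in node_features) / n_nodes for i in range(dim)]

    def head(self, embedding: Sequence[float]) -> float:
        # Toy prediction head: just sums the embedding.
        return sum(embedding)

    def forward(
        self,
        node_features: Sequence[Sequence[float]],
        return_fingerprint: bool = False,  # hypothetical flag; defaults to old behaviour
    ) -> Union[float, List[float]]:
        embedding = self.encode(node_features)
        if return_fingerprint:
            # New code path: return the pooled embedding instead of the prediction.
            return embedding
        # Unchanged default path, so existing callers are unaffected.
        return self.head(embedding)


encoder = ToyEncoder()
molecule = [[1.0, 2.0], [3.0, 4.0]]  # two "atoms" with two features each
prediction = encoder.forward(molecule)
fingerprint = encoder.forward(molecule, return_fingerprint=True)
print(prediction, fingerprint)
```

Because the flag defaults to False, any existing training or evaluation code that calls forward() without it keeps getting predictions, which is what makes a change of this kind backward-compatible.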
Thank you @cwognum !
I will make some changes soon so that fingerprint extraction is easier!
I now made it a bit easier:
Just place your SMILES into the file dataset/inference_smiles.txt and run
python inference.py --config=configs_clean/fingerprint_inference.yml
Your fingerprints are saved as a pickle file in the dataset_directory.
In the config file you can also specify different pre-trained models if you want.
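The steps around that command can be sketched with the standard library: writing the SMILES file beforehand and loading a pickle afterwards. Several things here are assumptions, not confirmed by the thread: that the input file holds one SMILES string per line, that the output file is called fingerprints.pkl, and that it contains a plain list. The helpers write_smiles and load_pickle are hypothetical, and the demo runs in a temporary directory with a fake output file instead of calling inference.py.

```python
import os
import pickle
import tempfile
from typing import Any, Iterable


def write_smiles(path: str, smiles: Iterable[str]) -> None:
    """Write one SMILES string per line (the per-line format is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        for entry in smiles:
            handle.write(entry.strip() + "\n")


def load_pickle(path: str) -> Any:
    """Load a pickled object. Only unpickle files that come from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Demo in a temporary directory so nothing in a real checkout is touched.
with tempfile.TemporaryDirectory() as tmp:
    smiles_path = os.path.join(tmp, "inference_smiles.txt")
    write_smiles(smiles_path, ["CCO", "c1ccccc1"])
    with open(smiles_path, encoding="utf-8") as handle:
        written = handle.read().splitlines()

    # Stand-in for what inference.py would produce; the real file name and
    # the layout of its contents may differ.
    fake_output = os.path.join(tmp, "fingerprints.pkl")
    with open(fake_output, "wb") as handle:
        pickle.dump([[0.1, 0.2], [0.3, 0.4]], handle)

    fingerprints = load_pickle(fake_output)

print(written, type(fingerprints).__name__, len(fingerprints))
```

If the real pickle holds PyTorch tensors, torch has to be importable in the environment where you load it, since pickle needs the original classes to reconstruct the objects.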