generating a fingerprint

Question

rohanvarm opened this issue 3 years ago · 6 comments

Hi,
I find the code a bit difficult to read. Is there an easy way to generate the embedding as a fingerprint?

Answer 1 · 2021-12-10T05:28:45.000Z

Hi Rohan,
You could use the uploaded GNN weights that were obtained by pre-training on GEOM-Drugs.
To do so, add a PyTorch dataset containing the molecules you want fingerprints for to the datasets directory.
Then use a config file like tune_QM9_homo.yml in which you set eval_on_test: True, dataset: 'ClassNameOfDataset', and num_epochs: 0, so that the model is run directly on the test molecules.
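A minimal sketch of those three overrides, written in Python so the values can be checked before they are copied into a config file. The helper names `inference_overrides` and `to_yaml_lines` are hypothetical and not part of the repository, and `ClassNameOfDataset` is a placeholder for whatever dataset class you add. Every other key in a config such as tune_QM9_homo.yml is assumed to stay unchanged.

```python
# Hypothetical helpers, not part of 3DInfomax: they only collect the three
# settings named in the answer and print them as YAML-style lines.

def inference_overrides(dataset_class_name):
    """Return the settings that make a run skip training and evaluate directly."""
    return {
        "eval_on_test": True,           # run the model on the test molecules
        "dataset": dataset_class_name,  # name of your own dataset class (placeholder)
        "num_epochs": 0,                # no training epochs
    }


def to_yaml_lines(overrides):
    """Render a flat dict as simple `key: value` lines, quoting strings."""
    lines = []
    for key, value in overrides.items():
        if isinstance(value, str):
            value = f"'{value}'"
        lines.append(f"{key}: {value}")
    return lines


if __name__ == "__main__":
    for line in to_yaml_lines(inference_overrides("ClassNameOfDataset")):
        print(line)
```

The printed lines can be pasted over the corresponding keys in a copy of the config file.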

Answer 2 · 2021-12-14T20:55:12.000Z

@rohanvarm I took a bit of a different route than @HannesStark is suggesting, but with his help I got a working solution:

You can adapt the QMDataset class in 3DInfomax/datasets/qm9_dataset.py. I stripped all functionality that was not needed for inference and made it possible to provide a custom list of SMILES strings. You can see my implementation of this class in my fork of this repo.
Using a similar process, I adapted the load_model() function in train.py and finally created a small CLI using Click that loads a list of SMILES from a .npy file, creates a dataset, and feeds the datapoints through the model. You can see all of that here.
I still need to figure out how to get the fingerprints out of this, but I think we should be able to change the forward() method of the PNA class to do so. I'll look into that next and let you know if I figure it out.

Please note that this code assumes you use the provided checkpoint. For other models I might have stripped too much functionality. At the same time, I am not 100% sure whether more code could be removed, so it could possibly be simplified further or made more efficient.

@HannesStark We could consider merging this back into your repo and writing a bit about it in the README once finished? Let me know if that would interest you! (Right now my fork is too different, I think, because I restructured it a bit to my liking, but it should be easy to merge just the relevant files once they're done.)

Answer 3 · 2021-12-14T21:23:05.000Z

> but I think we should be able to change the forward() method of the PNA class to do so

This is exactly what I ended up doing. You can see the changes here. This should be a non-breaking, backward-compatible change. It ~~should~~ could be cleaned up a bit more, but this is the gist of it! 🙂

Answer 4 · 2021-12-15T05:39:46.000Z

Thank you @cwognum!
I will make some changes soon so that fingerprint extraction is easier!

Answer 5 · 2021-12-28T07:01:55.000Z

I have now made it a bit easier:
Just place your SMILES in the file dataset/inference_smiles.txt and run

python inference.py --config=configs_clean/fingerprint_inference.yml

Your fingerprints are saved as a pickle file in the dataset_directory.

In the config file you can also specify a different pre-trained model if you want.
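A minimal sketch of the steps around that command: writing the SMILES file beforehand and reading the pickle afterwards. The helper names are hypothetical. The one-SMILES-per-line layout is an assumption, and the name of the output pickle is not given in this thread, so it is left as a parameter.

```python
import pickle
from pathlib import Path


def write_smiles(smiles, path):
    """Write SMILES strings to a text file, one per line (assumed layout)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(smiles) + "\n")


def load_fingerprints(path):
    """Unpickle the saved fingerprints; the object's structure is not assumed here."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, call `write_smiles(["CCO", "c1ccccc1"], "dataset/inference_smiles.txt")`, run the inference command, then call `load_fingerprints(...)` on the pickle that appears in your dataset directory. If the file holds PyTorch tensors, PyTorch needs to be installed in the environment that unpickles it. Only unpickle files you generated yourself, since pickle can execute arbitrary code on load.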

Answer 6 · 2021-12-29T06:11:21.000Z

Thanks a lot! Happy holidays!
