insilicomedicine/GENTRL

SOM reward function

Opened this issue · 10 comments

Hi, I am quite new to this area (both Python and AI), so sorry if my questions are too basic.
I read that you used six data sets to build the model. According to your paper, the first is "a large set of molecules derived from a ZINC data set". Does this refer to the dataset loaded below in pretrain.ipynb?
[screenshot from pretrain.ipynb]

But why does the number of compounds in that CSV differ from the figure reported in the supplementary material (Table 1)?
[screenshot of Supplementary Table 1]

If I am working on another biological target, at what stage do I need to continue training with datasets for that target? Should I call model.train_as_vaelp() again with another train_loader created from my own target-specific dataset?
[screenshot of the model.train_as_vaelp() call in pretrain.ipynb]
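To make this concrete, the following is roughly what I have in mind. The file name my_target_actives.csv and its columns are placeholders for my own data, and the other arguments are copied from pretrain.ipynb, so please correct me if this is not the intended usage:

```python
from torch.utils.data import DataLoader
import gentrl

# Hypothetical fine-tuning step: after pretraining on the ZINC-derived set,
# build a second loader from a target-specific file and run train_as_vaelp
# again on the same (already pretrained) model.
target_ds = gentrl.MolecularDataset(
    sources=[{'path': 'my_target_actives.csv',   # placeholder for my own dataset
              'smiles': 'SMILES',
              'prob': 1,
              'plogP': 'plogP'}],
    props=['plogP'])

target_loader = DataLoader(target_ds, batch_size=50, shuffle=True,
                           num_workers=1, drop_last=True)

# Continue training the pretrained model on the target-specific data.
model.train_as_vaelp(target_loader, lr=1e-4)
```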

In the next step, train_rl.ipynb:
[screenshot from train_rl.ipynb]
Does this mean it is just an example of training the model to generate compounds with a high penalized logP? I assume it has nothing to do with the SOMs.
In your paper, you mention using three SOMs as reward functions, so do I need to define my own scoring functions here? Is there a specific module I need to install if I want to write my own SOM reward functions?
[screenshot]

I am looking forward to hearing from you.
Many thanks

Hi, I have the same questions, and I was wondering whether you have solved the problem yet.
Looking forward to hearing from you.
Many thanks!

Hi, I have the same questions too; I have read parts of the code in gentrl.py.

I am looking forward to hearing from you.
Many thanks

#3
According to one of the authors, no SOM code has been or will be provided.

I have a similar question too.

You can try any SOM.

This is a good example in PyTorch:
https://github.com/Dotori-HJ/SelfOrganizingMap-SOM

Any progress on this issue? Looking forward to your updates.

Hi,
I want to tell everyone in this thread that I have converted this model into a PyTorch Lightning module with multi-GPU support. Please check it out here.

It should make running the model more efficient on multiple GPUs, and even on a single GPU.

Please try it, and if there are any bugs, raise an issue so that I can improve the code.

@Bibyutatsu
Thank you for your implementation. I tried it, but I still needed to install pytorch-lightning separately.

I think you are very familiar with this repo, so may I ask you some questions about sampling?
In this repo, new molecules are generated from random latent points, but in the paper the authors show a parent molecule and then generate new molecules around it. I don't know how to do this. For example, if I have a parent molecule, how can I generate similar molecules around it? I think you are also aware of chemvae (https://github.com/aspuru-guzik-group/chemical_vae/tree/master/chemvae); they give an example (https://github.com/aspuru-guzik-group/chemical_vae/blob/master/examples/intro_to_chemvae.ipynb).

Your advice is highly appreciated.

@xuzhang5788
Yeah, generating new molecules from a reference parent molecule is not directly supported at the moment. But you can follow these steps:

  • Pass the molecule through the encoder to get the means and log_stds.
  • Sample from these means and log_stds; you would need a custom function in the LP module that takes the means and log_stds as input and samples from them.
  • Then simply run decoder.sample on the sampled points.

I will try to incorporate this into the code to show you; until then, I hope these pointers can help.
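Roughly, it could look like the sketch below. This is untested and simplified: instead of a custom LP-module function, it samples directly from the encoder posterior with the reparameterization trick, and it assumes the encoder's encode(sm_list) returns (means, log_stds) and the decoder's sample takes a maximum length and a batch of latent codes, as in gentrl.py, so double-check the exact signatures before using it.

```python
import torch

def sample_around_parent(model, parent_smiles, num_samples=10, scale=1.0, max_len=50):
    """Hypothetical helper: sample molecules near a parent molecule's latent code."""
    model.eval()
    with torch.no_grad():
        # 1. Encode the parent molecule to get its posterior parameters.
        means, log_stds = model.enc.encode([parent_smiles])

        # 2. Reparameterized sampling around the parent: z = mean + scale * std * eps.
        means = means.repeat(num_samples, 1)
        stds = torch.exp(log_stds).repeat(num_samples, 1)
        z = means + scale * stds * torch.randn_like(stds)

        # 3. Decode the sampled latent points back into SMILES strings.
        return model.dec.sample(max_len, z, argmax=False)

# Usage (hypothetical): a smaller `scale` keeps samples closer to the parent.
# new_smiles = sample_around_parent(model, 'CC(=O)Oc1ccccc1C(=O)O', num_samples=20, scale=0.5)
```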

Also, you can look at this repo; it does exactly what you are trying to do and also uses a VAE, like chemvae.

bbjy commented

> Hi, I want to tell everyone in this thread that I have converted this model into a PyTorch Lightning module with multi-GPU support. Please check it out here. [...]

Hi @Bibyutatsu, the link you provided cannot be accessed.

Also, do you know how to use a SOM to calculate the reward? Perhaps if a generated structure is mapped to the same grid cell as the DDR1 inhibitor molecules it gets a positive reward, and otherwise a negative reward?
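In code, I imagine something like the sketch below. This is only my guess at what such a reward could look like: it uses the minisom package and Morgan fingerprints as stand-ins, and the placeholder actives, grid size, and reward values are assumptions, not anything taken from the paper.

```python
import numpy as np
from minisom import MiniSom                  # pip install minisom
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

N_BITS = 512

def fingerprint(smiles):
    """Morgan fingerprint as a numpy vector; None if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=N_BITS)
    arr = np.zeros((N_BITS,), dtype=float)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Placeholder actives -- replace with a real set of DDR1 inhibitors.
active_smiles = ['CC(=O)Oc1ccccc1C(=O)O', 'Cn1cnc2c1c(=O)n(C)c(=O)n2C']
active_fps = np.array([fingerprint(s) for s in active_smiles])

# Train a small SOM on the actives and record which grid cells they occupy.
som = MiniSom(10, 10, N_BITS, sigma=1.0, learning_rate=0.5)
som.train_random(active_fps, 5000)
active_cells = {som.winner(fp) for fp in active_fps}

def som_reward(smiles):
    """Reward generated structures that land in the same grid cells as the actives."""
    fp = fingerprint(smiles)
    if fp is None:
        return -2.0   # penalize invalid SMILES
    return 1.0 if som.winner(fp) in active_cells else -1.0
```

If that is roughly right, I suppose such a function could be passed to model.train_as_rl in place of the penalized-logP reward, but I would appreciate confirmation.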

Thank you!