Extracting the atomic embeddings for a molecule after a model is trained
Hi everyone,
I want to visualise and use the atomic representations of a molecule after SchNet training is done. In the best_model file, I found a 100*input_feature matrix for the embeddings, but how do I use it to get the representation of a specific atom in a molecule?
Hi,
The embedding is learned for every atom type (not every atom). So to get the embedding for a specific atom, first get its atom type, then index the embedding layer with the atomic number. The dimensions of the embedding are 100*n_features because the expected maximum atomic number is 100.
Hi,
Thanks a lot! So if I have a dataset with a molecule that has 2 C atoms and 2 H atoms, I should get two embedding vectors, one for C (index: 6) and the other for H (index: 1)?
Yes :)
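To make the per-type lookup concrete, here is a minimal sketch. It uses a random NumPy array as a stand-in for the learned 100*n_features embedding matrix; the real weights would have to be read from the trained model. The name embedding_table and the value n_features = 8 are illustrative assumptions, not names from the library.

```python
import numpy as np

n_features = 8  # illustrative; use your model's feature size
rng = np.random.default_rng(0)
# Stand-in for the learned embedding matrix: one row per atomic number 0..99.
embedding_table = rng.normal(size=(100, n_features))

# A molecule with 2 C atoms and 2 H atoms.
atomic_numbers = np.array([6, 6, 1, 1])

# Row lookup: each atom gets the embedding of its element.
atom_embeddings = embedding_table[atomic_numbers]

# Only two distinct vectors exist: one for C (index 6), one for H (index 1).
type_embeddings = {int(z): embedding_table[z] for z in np.unique(atomic_numbers)}

print(atom_embeddings.shape)    # (4, 8)
print(sorted(type_embeddings))  # [1, 6]
```

Both carbon rows are identical here, which is why the embedding alone cannot distinguish two atoms of the same element.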
Okay, so after I get the embedding vector for a specific atom type, how do I obtain the representation for a specific atom in a molecule? Through some kind of postprocessing?
During the forward pass we add the representation (either scalar_representation or vectorial_representation) to the input batch. So you can just do a forward pass and read the input representation from there. You probably also need to collect the atom ids idx_i and the molecule ids idx_m to match every representation vector to the corresponding atom.
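from tqdm import tqdm

# model: the trained model; data_loader: a loader over the molecules of interest
for batch in tqdm(data_loader):
    _ = model(batch)  # the forward pass writes the representation into batch
    representation = batch["scalar_representation"]
    idx_i = batch["_idx_i"]  # atom ids
    idx_m = batch["_idx_m"]  # molecule ids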
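For matching representation rows to molecules, the following is a rough sketch with small NumPy arrays standing in for the tensors of one batch. It assumes the representation has one row per atom and that idx_m has one entry per atom in the same order; real tensors would first need converting, e.g. with tensor.detach().cpu().numpy(). The name per_molecule is illustrative.

```python
import numpy as np

# Stand-ins for one batch holding two molecules (3 atoms + 2 atoms).
representation = np.arange(5 * 4, dtype=float).reshape(5, 4)  # (n_atoms, n_features)
idx_m = np.array([0, 0, 0, 1, 1])  # molecule id of each atom

# Collect the rows belonging to each molecule via a boolean mask.
per_molecule = {
    int(m): representation[idx_m == m] for m in np.unique(idx_m)
}

for m, rep in per_molecule.items():
    print(m, rep.shape)
```

Each entry of per_molecule is then an (n_atoms_in_molecule, n_features) array whose rows follow the atom order of that molecule in the batch.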
Edit: You could probably also write a postprocessing layer that collects the data for you. Have a look at this part of the code to see where the postprocessing happens.
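As an illustration of the collecting idea only, here is a plain Python helper that accumulates representations over several batches. It is a hypothetical class, not the library's postprocessing interface, and it assumes that the molecule ids in "_idx_m" restart at 0 in every batch, so they are shifted to stay unique. The batches are fake NumPy stand-ins; real tensors would need to be detached and moved to the CPU first.

```python
import numpy as np


class RepresentationCollector:
    """Hypothetical helper: accumulates per-atom representations over batches."""

    def __init__(self, key="scalar_representation"):
        self.key = key
        self.representations = []
        self.molecule_ids = []
        self._n_molecules_seen = 0

    def __call__(self, batch):
        idx_m = np.asarray(batch["_idx_m"])
        self.representations.append(np.asarray(batch[self.key]))
        # Shift batch-local molecule ids so they stay unique across batches.
        self.molecule_ids.append(idx_m + self._n_molecules_seen)
        self._n_molecules_seen += int(idx_m.max()) + 1
        return batch

    def result(self):
        return np.concatenate(self.representations), np.concatenate(self.molecule_ids)


# Two fake batches standing in for what the model leaves in each batch dict.
batches = [
    {"scalar_representation": np.zeros((3, 4)), "_idx_m": np.array([0, 0, 1])},
    {"scalar_representation": np.ones((2, 4)), "_idx_m": np.array([0, 1])},
]
collector = RepresentationCollector()
for batch in batches:
    collector(batch)

reps, mol_ids = collector.result()
print(reps.shape, mol_ids.tolist())  # (5, 4) [0, 0, 1, 2, 3]
```

Calling the collector once per batch right after the forward pass gives a single array of representations plus a dataset-wide molecule id for every row.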
Thank you so much. I will try this and get back to you if I am still having any problems :)
It worked, thanks a lot!