atomistic-machine-learning/schnetpack

Extracting the atomic embeddings for a molecule after a model is trained


Hi everyone,
I want to visualise and use the atomic representations of a molecule after SchNet training is done. In the best_model file, I found a 100*input_feature matrix for the embeddings, but how do I use it to get the representation of a specific atom in a molecule?

Hi,
the embedding is learned for every atom type (not every atom). So to get the embedding for a specific atom, first get its atom type, then index the embedding layer with the atomic number. The dimensions of the embedding are 100*n_features because the expected maximum atomic number is 100.
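
The lookup described above can be sketched in plain PyTorch. This is only an illustration: the `nn.Embedding` below is a randomly initialised stand-in for the trained layer, and `n_features = 8` is a made-up size, so the names and values are assumptions rather than what the best_model file contains.

```python
import torch
import torch.nn as nn

n_features = 8  # hypothetical feature size; use the value your model was trained with
max_z = 100     # table size mentioned above

# Stand-in for the trained embedding layer (random weights, not a real model).
embedding = nn.Embedding(max_z, n_features)

# Atomic numbers of a molecule with 2 C atoms and 2 H atoms.
atomic_numbers = torch.tensor([6, 6, 1, 1])

with torch.no_grad():
    atom_embeddings = embedding(atomic_numbers)  # shape: (4, n_features)

# Atoms of the same type share the same row of the embedding table.
assert torch.equal(atom_embeddings[0], atom_embeddings[1])
assert torch.equal(atom_embeddings[0], embedding.weight[6])
```

Both C atoms map to row 6 and both H atoms to row 1, so the embedding alone cannot tell apart two atoms of the same type.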

Hi,

Thanks a lot! So if I have a dataset with a molecule that has 2 C atoms and 2 H atoms, I should get two embedding vectors, one for C (index: 6) and the other for H (index: 1)?

Yes :)

Okay, so after I get the embedding vector for a specific atom type, how do I obtain the representation for a specific atom in a molecule? Is some kind of postprocessing needed?

During the forward pass we add the representation (either scalar_representation or vectorial_representation) to the input batch. So you can just do a forward pass and read the representation from the batch afterwards. You probably also need to collect the atom ids idx_i and the molecule ids idx_m to match every representation vector to the corresponding atom.

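from tqdm import tqdm

# assumes `model` and `data_loader` already exist
for batch in tqdm(data_loader):
    _ = model(batch)  # the forward pass adds the representation to `batch`
    representation = batch["scalar_representation"]
    idx_i = batch["_idx_i"]
    idx_m = batch["_idx_m"]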
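
One way to match the collected rows to their molecules is to group them by the molecule index. The sketch below uses dummy tensors in place of the batch entries and assumes that the representation has one row per atom and that idx_m holds one molecule index per atom; `split_by_molecule` is a hypothetical helper, not part of schnetpack.

```python
import torch

# Dummy stand-ins for batch["scalar_representation"] and batch["_idx_m"]:
# 5 atoms with 3 features each, belonging to 2 molecules.
representation = torch.arange(15, dtype=torch.float32).reshape(5, 3)
idx_m = torch.tensor([0, 0, 0, 1, 1])


def split_by_molecule(representation, idx_m):
    """Return a dict: molecule index -> (n_atoms_in_molecule, n_features) tensor."""
    return {
        int(m): representation[idx_m == m]  # boolean mask selects that molecule's atoms
        for m in torch.unique(idx_m)
    }


per_molecule = split_by_molecule(representation, idx_m)
```

Within each group the rows keep their order in the batch, so the k-th row belongs to the k-th atom of that molecule, provided the assumptions above hold for your data.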

Edit: You could probably also write a postprocessing layer that collects the data for you. Have a look at this part of the code to see where the postprocessing happens.

Thank you so much. I will try this and get back to you if I still have any problems :)

It worked, thanks a lot!