aiqm/torchani

Predicted energies from all ANI models 2 orders of magnitude out of scale

aarontuor opened this issue · 2 comments

I've loaded the pre-trained models and am assessing them against your ANI dataset. I'm hoping I'm doing something wrong, as the results from the pre-trained models appear too poor to be usable. Predictions are commonly several orders of magnitude off and often have the wrong sign. I've attached the short notebook I used to assess the performance of the trained models.
assess.zip

Hi, what is the unit of the true reference energy? The prediction from ANI is in Hartree.

Hello @aarontuor

I checked the Jupyter notebook you provided, and there is a mistake in your code.
When you call the function:
.subtract_self_energies(energy_shifter, species_order)
you are effectively subtracting a linear fit of the energies from the reference energies in your dataset. This is done at training time, so the resulting model predicts the leftover QM energy. At evaluation time, ANI models run an energy shifter that adds this linear fit back automatically.

In your case, suppose the energy in the dataset is E_qm. With subtract_self_energies you are effectively transforming this energy into E_shifted_qm = E_qm - E_self. But ANI models add this energy back automatically when they run the energy shifter, so the ANI model predicts E_qm = E_shifted_qm + E_self. This means you are comparing two incompatible energies (the model's E_qm against your E_shifted_qm), which is why you are getting this discrepancy.
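To make the mismatch concrete, here is a minimal sketch in plain Python. It does not use the torchani API: the self-energy values, the reference energy, and the helper names self_energy and model_predict are all made up for illustration, and the "model" is assumed to be perfect so that the only error left comes from the shifted reference.

```python
# Hypothetical per-atom self energies in Hartree (illustrative values only,
# not the constants shipped with any ANI model).
SELF_ENERGIES = {"H": -0.5, "C": -37.8, "N": -54.6, "O": -75.0}


def self_energy(species):
    """Sum of per-atom self energies, i.e. the linear fit E_self."""
    return sum(SELF_ENERGIES[s] for s in species)


def model_predict(species, e_residual):
    """Stand-in for a pretrained model: network residual plus E_self."""
    return e_residual + self_energy(species)


species = ["C", "H", "H", "H", "H"]  # a methane-like composition
e_qm = -40.5                         # made-up total reference energy (Hartree)
e_self = self_energy(species)
e_shifted_qm = e_qm - e_self         # what subtracting self energies leaves

# Assume a perfect network, so its residual equals e_shifted_qm exactly.
e_pred = model_predict(species, e_shifted_qm)

wrong_error = abs(e_pred - e_shifted_qm)  # total energy vs. shifted reference
right_error = abs(e_pred - e_qm)          # total energy vs. total reference
print(f"E_self={e_self:.1f} wrong={wrong_error:.1f} right={right_error:.1f}")
```

In this toy setup the "wrong" comparison is off by the full magnitude of E_self, while the like-for-like comparison is essentially exact, which matches the kind of discrepancy described above.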

If you remove the subtract_self_energies... line from your script, the discrepancy goes away and you will see that the performance is consistent with the papers.

Edit: I removed the LaTeX since it seems that GitHub still doesn't support it.