usnistgov/alignn

Jarvis data

Nokimann opened this issue · 8 comments

Thank you for your work on this efficient ML method for predicting properties of molecular systems.

However, I couldn't reproduce the results reported in the paper.

I found that JARVIS provides the QM9 dataset in normalized (standardized) form, and I opened an issue about it in jarvis.

I tested ALIGNN, but I cannot reproduce the paper's numbers on the unnormalized QM9 dataset.
Only the normalized QM9 dataset provided by JARVIS reproduces the prediction values in the paper.

knc6 commented

Hi @Nokimann

Which property/task did you try to reproduce, and how much difference did you find?

@knc6

For U0 I got an MAE of 0.029 (~600 epochs) on the unnormalized data and 0.002 (~300 epochs) on the normalized data.

The std of U0 in QM9 is ~10, so the two MAEs differ by about one order of magnitude. That makes the discrepancy consistent with the normalization.
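
A quick numerical check of that reasoning (a minimal sketch; the std value here is only the approximate ~10 quoted above, not the exact QM9 statistic):

# illustration only: undoing the standardization of the reported MAE
mae_normalized = 0.002   # MAE on the std-normalized U0 targets (~300 epochs)
std_u0 = 10.0            # assumption: approximate std of the U0 targets, as noted above

# rescaling the normalized MAE back to real units gives roughly the
# MAE measured directly on the unnormalized data (0.029)
mae_real_units = mae_normalized * std_u0
print(mae_real_units)    # ~0.02, same order of magnitude as 0.029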

knc6 commented

You are right. We didn't multiply the MAEs by the corresponding std values, but we should have.

For some QM9 properties (with std < 1) the ALIGNN model performance becomes better than reported, but for properties such as U0 it becomes worse. We are working on an erratum right now and will update the arXiv preprint as well as the README file soon. The performance on the JARVIS-DFT and MP datasets remains intact.

We note that if we train for 1000 or so epochs, we can get the U0 MAE down to 0.014 eV. For reference, the standard deviations for the QM9 tasks are shown below.

[image: standard deviations of the QM9 task targets]

Thanks @Nokimann for catching this mistake.
Also adding @bdecost to the thread.

It seems like this impacts one of the main claims in the paper, but unfortunately there has been no update in the last month. The paper, README, and arXiv still show the incorrect results. Would you have any update on the progress of fixing this?

knc6 commented

@klicperajo

We have updated the README file now with the 1000-epoch run and with the MAEs multiplied by the corresponding standard deviations. On a related point ( usnistgov/jarvis#202 (comment) ), I see that datasets packaged by different libraries, such as PyG or DGL, can give you different graphs; the edge counts below illustrate this. Hence, we chose to learn directly from xyz/POSCAR files. Our goal is that after we train a model, a user can feed a POSCAR/xyz file to pretrained.py to get predictions; this might be possible with PyG/DGL-based datasets, but it is not as straightforward. A rough sketch of the file-based workflow follows the edge-count snippets below.

#PyG: count the total number of edges across all QM9 graphs
from torch_geometric.datasets.qm9 import QM9

q = QM9(root='.')
x = []
for i in q:
    x.append(i.edge_attr.shape[0])  # number of edges in this molecular graph
print(sum(x))  # 4883516

#DGL: count the total number of edges across all QM9 graphs
from dgl.data.qm9 import QM9Dataset

data = QM9Dataset(label_keys=['U0'])  # the constructor requires label_keys; 'U0' is just a placeholder choice, only the graphs are used here
y = []
for i in data:
    y.append(i[0].num_edges())  # each item is a (graph, label) pair
print(sum(y))  # 36726502
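
As a rough sketch of the file-based route (assuming the Atoms.from_poscar helper from jarvis-tools and the Graph.atom_dgl_multigraph helper from alignn.graphs, with default cutoff/neighbor settings; names and return values may differ across versions):

# minimal sketch: go from a POSCAR file straight to an ALIGNN-style graph
from jarvis.core.atoms import Atoms   # structure parsing (jarvis-tools)
from alignn.graphs import Graph       # ALIGNN graph construction

# read the structure directly from a POSCAR file
atoms = Atoms.from_poscar('POSCAR')

# build the atomistic graph used by ALIGNN
out = Graph.atom_dgl_multigraph(atoms)
# depending on the alignn version this returns either a single DGL graph
# or a (graph, line_graph) pair
print(out)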

That is great to see, thank you! Any progress on arXiv and npj?

I was not suggesting using the PyG or DGL datasets, but rather providing the non-standardized data (in eV or similar units). I have seen this mistake of reporting standardized errors instead of errors in real units several times now. We should make sure that the straightforward way of evaluating is also the correct one; otherwise this error will be repeated.
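
For illustration, a minimal sketch of what shipping de-standardized targets could look like (the mean/std values and arrays here are hypothetical placeholders, not the actual JARVIS QM9 statistics):

# hypothetical example: convert standardized targets back to real units before release
import numpy as np

y_standardized = np.array([0.12, -0.53, 1.08])  # made-up standardized target values
mu, sigma = -410.0, 10.0                        # placeholder mean/std in eV (assumptions)

# undo the standardization so downstream users compute MAE directly in eV
y_ev = y_standardized * sigma + mu
print(y_ev)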

knc6 commented

Thank you for making the effort to amend these numbers!