Questions about the performance of baselines in the 3D-MoLM and MolCA papers
Hi Zhiyuan and Sihang,
I went through both your 3D-MoLM and MolCA papers and found that the reported performance of MolT5 and MoMu for molecule captioning on the PubChem dataset differs between the two papers, with no explanation given (Table 2a in the MolCA paper and Table 3a in the 3D-MoLM paper). Could you please explain the discrepancy and let me know which baseline numbers are reliable?
Additionally, do you have a justification for why MolCA-Galac1.3B outperforms 3D-MoLM-Llama-7B on the Molecule Captioning task?
Thank you very much for your response. Looking forward to hearing from you again.
Thanks.
Hi Khiem,
The processed datasets used in 3D-MoLM and MolCA are different, because the two works use different pre-processing strategies: 3D-MoLM includes a molecule's IUPAC name in the caption, whereas MolCA does not. Therefore, the baseline results differ between the two papers and are not directly comparable.
There are other differences in data processing in addition to the one mentioned above. I recommend referring to the appendix sections of the two papers for details on the data processing.
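To illustrate why the IUPAC-name difference alone is enough to shift baseline scores, here is a minimal sketch. It is only a toy: `unigram_f1` is a simple token-overlap score standing in for the actual captioning metrics, the captions are made up, and the way the name is prepended to the caption is a hypothetical format, not the one used in either paper.

```python
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """Toy token-overlap F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up prediction and reference caption (not taken from PubChem).
prediction = "a primary alcohol used as a solvent"
caption_without_name = "a primary alcohol that is used as a solvent"

# Hypothetical way of including the IUPAC name in the reference caption.
iupac_name = "ethanol"
caption_with_name = f"{iupac_name} is {caption_without_name}"

score_without = unigram_f1(prediction, caption_without_name)
score_with = unigram_f1(prediction, caption_with_name)

print(f"reference without IUPAC name: {score_without:.3f}")
print(f"reference with IUPAC name:    {score_with:.3f}")
```

The same prediction gets a different score depending on which version of the reference caption it is compared against, which is why the numbers for one baseline can differ between the two tables.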