- 2024.09: MolPuzzle has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track as a spotlight!
We present MolPuzzle, a benchmark of 234 structure elucidation instances with over 18,000 QA samples. The samples are presented as a sequential puzzle-solving process with three interlinked subtasks: molecule understanding, spectrum interpretation, and molecule construction.
The figure illustrates the problem of molecular structure elucidation alongside its analogous counterpart, the crossword puzzle, highlighting the parallels in strategy and complexity between these two intellectual challenges.
Model | Stage 1 | Stage 2 | Stage 3 |
---|---|---|---|
GPT-4o | β | β | β |
Claude-3 | β | β | β |
Gemini-pro | β | β | β |
GPT-3.5 | β | β | β |
Gemini-3-pro-vision | β | β | β |
LLava1.5-8b | β | β | β |
Qwen-VL-Chat | β | β | β |
InstructBLIP-7b | β | β | β |
InstructBLIP-13b | β | β | β |
Llama3-8b | β | β | β |
Vicuna-7b | β | β | β |
Llama2-7b | β | β | β |
Llama2-13b | β | β | β |
Mistral-7b | β | β | β |
The initial molecules were selected by referencing the textbook Organic Structures from Spectra, 4th Edition, available as an online PDF on ResearchGate. We chose 234 molecules based on spectrum tasks involving IR, MS, 1H-NMR, and 13C-NMR, at a difficulty level suitable for graduate students. To address copyright concerns, we excluded molecules with publicly available mass spectrometry (MS) spectra in open-source databases from our study. The remaining spectra were sourced from public resources, notably the PubChem database. For spectra that were unavailable, we used simulation methods and provide a Jupyter notebook to generate these data, ensuring high-quality spectra for analysis.
You can download the dataset at data
- Install Required Packages
  Install the necessary Python packages by running: `pip install -r requirements.txt`
- API Key Setup
  - Add API keys for OpenAI, Claude, and Gemini models
- Example Commands (Stage 2)
  - `python stage2.py --task IR --action generate_responses --models instructBlip-7B instructBlip-13B llava gpt-4 claude-v1 --iterations 3`
  - `python stage2.py --task IR --action evaluate --models instructBlip-7B instructBlip-13B llava gpt-4 claude-v1 --iterations 3`
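The two Stage 2 commands above can also be assembled from a small script, which may help when repeating runs with different model lists. The following is only a sketch under stated assumptions: the environment variable names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `GOOGLE_API_KEY` are common conventions and not confirmed by this repository, and the helper names `missing_keys`, `build_command`, and `stage2_plan` are hypothetical. Only the flags shown in the example commands are used.

```python
import os
import shlex
import sys

# Assumption: conventional variable names; the repository may expect keys elsewhere.
ASSUMED_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the assumed key variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ASSUMED_KEY_VARS if not env.get(name)]


def build_command(task, action, models, iterations=3, script="stage2.py"):
    """Build the argument list for one stage2.py call, using only documented flags."""
    return [
        sys.executable, script,  # current interpreter stands in for `python`
        "--task", task,
        "--action", action,
        "--models", *models,
        "--iterations", str(iterations),
    ]


def stage2_plan(task, models, iterations=3):
    """Return the generate and evaluate commands in the order shown in the README."""
    actions = ("generate_responses", "evaluate")
    return [build_command(task, action, models, iterations) for action in actions]


if __name__ == "__main__":
    for name in missing_keys():
        print(f"warning: {name} is not set")
    models = ["instructBlip-7B", "instructBlip-13B", "llava", "gpt-4", "claude-v1"]
    for cmd in stage2_plan("IR", models):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try before any API keys are configured.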