This dataset contains MD simulations of 12,506 non-cyclic molecules from the QM9 dataset, run in vacuum at room temperature with the GAFF force field. We carried out these simulations using the openmmforcefields and OpenMM packages. All initial conditions were generated by energy-minimizing the QM9 geometry with the corresponding GAFF force field. The sampling time of each molecule is proportional to its number of heavy atoms, with a median sampling time of 36.5 ns. Moreover, for 100 molecules from the test set (10 %), we ran longer 100 ns Replica Exchange (RE) simulations.


  • Anaconda or Miniconda with Python 3.9.

Setting the environment

The ./environment.yml file lists all the packages required for this environment. To create the conda environment from this YAML file, run the following in a terminal:

conda env create -f environment.yml


The MDQM9-nc dataset is available here. It contains mdqm9-nc.sdf, an SDF file with the molecules, and mdqm9-nc.hdf5, an HDF5 file with the conformational data. Random splits are also provided. mdqm9-nc.hdf5 is distributed in 10 files, mdqm9_nc_0{0 to 9}. After downloading the 10 files, run

cat parts/mdqm9-nc_* > parts/mdqm9-nc.hdf5

to merge the files. To create an instance of the mdqm9-nc dataset, import MDQM9Dataset from mdqm9_loader:

from mdqm9_loader import MDQM9Dataset
sdf_path = "datasets/mdqm9-nc/mdqm9-nc.sdf"  # path to mdqm9-nc.sdf
hdf5_path = "datasets/mdqm9-nc/mdqm9-nc.hdf5"  # path to mdqm9-nc.hdf5
dataset = MDQM9Dataset(sdf_path, hdf5_path)
first_mol = dataset[0]
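
On systems without cat (for example Windows), the merge step above can be sketched in Python. This is a minimal sketch, not part of the repository: the helper names merge_parts and looks_like_hdf5 are hypothetical, and the default glob pattern is an assumption taken from the cat command above, so adjust it to the names of the files you actually downloaded. The check only confirms that the merged file starts with the standard 8-byte HDF5 signature; it does not validate the contents.

```python
import shutil
from pathlib import Path

# Standard 8-byte signature found at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def merge_parts(parts_dir, pattern="mdqm9-nc_*", output_name="mdqm9-nc.hdf5"):
    """Concatenate the downloaded parts, in sorted name order, into one file.

    `pattern` and `output_name` are assumptions based on the cat command above.
    """
    parts_dir = Path(parts_dir)
    output_path = parts_dir / output_name
    # Sort by name so the parts are joined in the same order as a shell glob,
    # and never read the output file as one of its own inputs.
    part_paths = sorted(p for p in parts_dir.glob(pattern) if p != output_path)
    if not part_paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {parts_dir}")
    with open(output_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream in chunks so large parts are not loaded into memory.
                shutil.copyfileobj(part, merged)
    return output_path


def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling merge_parts("parts") and then looks_like_hdf5 on the returned path gives a quick sanity check before passing the file to MDQM9Dataset. A failed check usually points to a missing first part or a wrong merge order.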


@JuanViguera and @psolsson.


Contributions are welcome in the form of issues or pull requests. To report a bug, please submit an issue. Thank you to everyone who has used the code and provided feedback thus far.


If you use MDQM9-nc in your research, please reference our paper.

The reference in BibTeX format is available below:

@article{viguera diez_romeo atance_engkvist_olsson_2023,
  place={Cambridge},
  title={Generation of conformational ensembles of small molecules via Surrogate Model-Assisted Molecular Dynamics},
  DOI={10.26434/chemrxiv-2023-sx61w},
  journal={ChemRxiv},
  publisher={Cambridge Open Engage},
  author={Viguera Diez, Juan and Romeo Atance, Sara and Engkvist, Ola and Olsson, Simon},
  year={2023}
}

This content is a preprint and has not been peer-reviewed.

MDQM9-nc is licensed under the MIT license and is free and provided as-is.
