- Quantum Mechanics: 19443 ligands, curated and refined
- Molecular Dynamics: 16972 simulated protein-ligand structures, 10 ns each
- AI: PyTorch dataloaders and 3 baseline models for MD, QM, and binding affinity prediction
We are a drug discovery community project 🤗
- highest possible accuracy for ligand molecules
- represent the system's dynamics in reasonable timescales
- innovative AI models for drug discovery predictions
Let's crack the 100+ ns MD, 30000+ protein-ligand structures and a whole new world of AI models for drug discovery together.
Want to get hands-on for drug discovery using AI?
Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.
In this repository, we show how to download the MISATO database and use it with AI models. You can access the calculated properties of the protein-ligand structures and use them for training with PyTorch-based dataloaders. A small sample of the dataset is included with the repo.
You can freely download the FULL MISATO dataset from Zenodo using the links below:
- MD (133 GiB)
- QM (0.3 GiB)
- electronic densities (6 GiB)
- MD restart and topology files (55 GiB)
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
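Because the MD file alone is 133 GiB, it can be worth checking free disk space before starting a download. The following is a minimal sketch, not part of the repository: the function name `has_room_for` and the 5% safety margin are illustrative assumptions, and the sizes are the ones listed above.

```python
import shutil

GIB = 1024 ** 3

# Approximate download sizes in GiB, taken from the list above.
DATASET_SIZES_GIB = {
    "MD": 133,
    "QM": 0.3,
    "electronic densities": 6,
    "MD restart and topology files": 55,
}


def has_room_for(names, free_bytes, margin=0.05):
    """Return True if free_bytes covers the selected downloads plus a margin."""
    needed = sum(DATASET_SIZES_GIB[name] for name in names) * GIB
    return free_bytes >= needed * (1 + margin)


if __name__ == "__main__":
    # Free space on the filesystem holding the current directory.
    free = shutil.disk_usage(".").free
    for selection in (["QM"], ["MD", "QM"]):
        verdict = "enough" if has_room_for(selection, free) else "NOT enough"
        print(f"{' + '.join(selection)}: {verdict} space ({free / GIB:.1f} GiB free)")
```

Run it from the directory where the `data/` folder lives, so that the free space reported is for the filesystem that will hold the files.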
Start with the notebook src/getting_started.ipynb to:
- Understand the structure of our dataset and how to access each molecule's properties.
- Load the PyTorch Dataloaders of each dataset.
- Load the PyTorch Lightning DataModules of each dataset.
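For a quick look at a file's layout outside the notebook, the sketch below walks an HDF5 file with h5py and prints the first few entries at each level. It makes no assumption about MISATO's internal group or dataset names. `describe` and `print_hdf5_layout` are hypothetical helpers, not part of the repository.

```python
def describe(node, name="/", depth=0, max_children=3, max_depth=3):
    """Return indented lines summarizing a nested, HDF5-like hierarchy.

    Anything exposing .keys() is treated as a group (h5py.File and h5py.Group
    both do); everything else is treated as a dataset and reported with its
    shape and dtype when those attributes exist.
    """
    indent = "  " * depth
    if not hasattr(node, "keys"):
        shape = getattr(node, "shape", "?")
        dtype = getattr(node, "dtype", "?")
        return [f"{indent}{name}: shape={shape}, dtype={dtype}"]
    keys = list(node.keys())
    lines = [f"{indent}{name} ({len(keys)} entries)"]
    if depth >= max_depth:
        return lines
    # Only descend into the first few children to keep the output short.
    for key in keys[:max_children]:
        lines.extend(describe(node[key], key, depth + 1, max_children, max_depth))
    if len(keys) > max_children:
        lines.append(f"{indent}  ... {len(keys) - max_children} more")
    return lines


def print_hdf5_layout(path, **kwargs):
    """Open an HDF5 file read-only and print its layout."""
    import h5py  # imported here so describe() can be tried without h5py installed

    with h5py.File(path, "r") as handle:
        print("\n".join(describe(handle, **kwargs)))
```

For example, `print_hdf5_layout("data/QM/h5_files/QM.hdf5")` prints a short outline of the QM file once it has been downloaded to that path.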
We recommend pulling our MISATO image from DockerHub or building your own image (see docker/). The images use CUDA 11.8, so we recommend installing CUDA 11.8 or later on your own system to ensure that the drivers work correctly.
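One way to check the CUDA requirement on a host is to read the `CUDA Version` field that `nvidia-smi` prints in its header, which reports the highest CUDA version the installed driver supports. The following is a minimal sketch under that assumption; the helper names are illustrative and not part of the repository.

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 8)  # CUDA version used by the MISATO images


def parse_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def cuda_is_recent_enough(version, minimum=MIN_CUDA):
    """Compare as integer tuples so that 11.10 ranks above 11.8."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; cannot determine the driver's CUDA version")
    else:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        )
        version = parse_cuda_version(result.stdout)
        print(f"Detected: {version}; sufficient: {cuda_is_recent_enough(version)}")
```

Comparing tuples of integers rather than strings or floats avoids ranking a version such as 11.10 below 11.8.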
# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset
For Singularity, use:
# get the container image
singularity pull misato.sif docker://sab148/misato-dataset
singularity shell misato.sif
For Docker, use:
sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh
├── data <- Project data
│ ├──MD
│ │ ├── h5_files <- storage of dataset
│ │ └── splits <- train, val, test splits
│ └──QM
│ │ ├── h5_files <- storage of dataset
│ │ └── splits <- train, val, test splits
│
├── src <- Source code
│ ├── data
│ │ ├── components <- Datasets and transforms
│ │ ├── md_datamodule.py <- MD Lightning data module
│ │ ├── qm_datamodule.py <- QM Lightning data module
│ │ │
│ │ └── processing <- Scripts for preprocessing, inference and conversion
│ │ ├──...
│ ├── getting_started.ipynb <- notebook: how to load data and interact with it
│ └── inference.ipynb <- notebook: how to run inference
│
├── docker <- Dockerfile and execution script
└── README.md
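After cloning, a short script can confirm that the directories shown in the tree above are present before any data is downloaded into them. This is a sketch only: the list covers just the directories named in the tree, and `missing_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Relative directory paths taken from the tree above.
EXPECTED_DIRS = [
    "data/MD/h5_files",
    "data/MD/splits",
    "data/QM/h5_files",
    "data/QM/splits",
    "src/data/components",
    "src/data/processing",
    "docker",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    print("Layout looks complete" if not missing else f"Missing: {missing}")
```

Run it from the repository root (the misato-dataset directory created by the clone).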
If you prefer to use conda for your own installation, create a new misato environment.
To install PyTorch Geometric, we recommend using pip (within conda) and following the official installation instructions: pytorch-geometric/install
The instructions vary with your CUDA version. The example below is for CUDA 11.8.
conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
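Once the commands above have finished, a quick check inside the activated misato environment can confirm that the main packages are importable. The sketch below uses only the standard library. The module list is an assumption derived from the install commands, and the import names differ from some package names (for example, pytorch-lightning is imported as `pytorch_lightning`).

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED_MODULES = [
    "torch",
    "torch_geometric",
    "pytorch_lightning",
    "h5py",
    "pandas",
    "joblib",
    "matplotlib",
    "nglview",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages were found")
```

`find_spec` only locates a module without importing it, so the check is fast but does not prove that compiled extensions load correctly against your CUDA installation.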
To run inference for MD, you need to install AmberTools. We recommend installing it in a separate conda environment.
conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
If you find this work useful, please consider citing the article.