AI2BMD: AI-powered ab initio biomolecular dynamics simulation

Overview

AI2BMD is a generalizable solution for efficiently simulating a variety of proteins with ab initio accuracy using a machine learning force field. This project comprises our studies on datasets, modeling, and simulation evaluation and analysis, which are described below and maintained in separate branches. See the AI2BMD homepage and the preprint "AI2BMD: efficient characterization of protein dynamics with ab initio accuracy" for more details.

Hiring: We are hiring research interns, engineering interns, and full-time employees to work on MD simulation, quantum chemistry, AIDD, geometric deep learning (GDL), molecular graph neural networks, system design, and CUDA acceleration. Please send your resume to watong@microsoft.com.

Datasets

AIMD-Chig

An MD dataset of protein conformations calculated at the density functional theory (DFT) level. AIMD-Chig consists of 2M conformations of the 166-atom protein Chignolin, together with the corresponding potential energies and atomic forces calculated at the M06-2X/6-31G* level.

Modeling

ViSNet

ViSNet (short for "Vector-Scalar interactive graph neural Network") is an equivariant, geometry-enhanced graph neural network for molecules that significantly alleviates the trade-off between computational cost and sufficient utilization of geometric information. ViSNet won the championship in the First Global AI Drug Development Competition and was one of the winners of the OGB-LSC @ NeurIPS 2022 PCQM4Mv2 track.

Simulation evaluation and analysis

Fine-grained force metrics for MLFF

Machine learning force fields (MLFFs) have gained popularity in recent years as a cost-effective alternative to ab initio molecular dynamics (MD) simulations. Despite small errors on the test set, MLFFs inherently suffer from generalization and robustness issues during MD simulations. To alleviate these issues, we propose global force metrics and fine-grained metrics, from the element and conformation perspectives, to systematically evaluate MLFFs for every atom and every conformation of a molecule. The proposed force metrics can also guide model training to further improve MLFF performance and the stability of MD simulations. Specifically, they can be used as loss functions when training MLFF models, for fine-tuning by reweighting samples in the original dataset, and for continued training on additional, previously unexplored data.
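As a rough illustration of what global, element-wise, and conformation-wise force metrics could look like, here is a minimal NumPy sketch. It is not the implementation from this repository: the function name `force_metrics`, the array layout, and the choice of error measure (the mean Euclidean norm of the per-atom force error) are all assumptions made for this example.

```python
import numpy as np


def force_metrics(pred, ref, elements):
    """Hypothetical helper (not part of this repository).

    pred, ref : arrays of shape (n_conformations, n_atoms, 3) with predicted
                and reference forces.
    elements  : sequence of n_atoms element symbols, e.g. ["C", "H", "H"].

    Returns (global_error, per_element_error, per_conformation_error).
    The error measure is an assumption: the mean Euclidean norm of the
    force error vector of each atom.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    elements = np.asarray(elements)

    # Norm of the force error for every atom in every conformation.
    err = np.linalg.norm(pred - ref, axis=-1)  # shape (n_conf, n_atoms)

    # Global metric: average over all atoms and all conformations.
    global_error = float(err.mean())

    # Element-wise metric: average over atoms of the same element.
    per_element_error = {
        str(e): float(err[:, elements == e].mean()) for e in np.unique(elements)
    }

    # Conformation-wise metric: average over the atoms of each conformation.
    per_conformation_error = err.mean(axis=1)

    return global_error, per_element_error, per_conformation_error
```

Conformations or elements with unusually large values in the fine-grained outputs are natural candidates for the sample reweighting or additional data collection described above.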

Stochastic lag time parameterization for Markov State Model

Markov state models (MSMs) play a key role in studying protein conformational dynamics. A sliding count window with a fixed lag time is widely used to sample sub-trajectories for transition counting and MSM construction. However, sub-trajectories sampled with a fixed lag time may not perform well across different choices of lag time, so selecting one requires strong prior experience and leads to less robust estimation. To alleviate this, we propose a novel stochastic method based on a Poisson process that generates perturbed lag times for sub-trajectory sampling and uses them to construct a Markov chain. Comprehensive evaluations on a double-well system, the WW domain, BPTI, and the RBD–ACE2 complex of SARS-CoV-2 show that our algorithm significantly increases the robustness and power of the constructed MSM without disturbing its Markovian properties. Furthermore, the advantage of our algorithm is amplified for slow dynamic modes in complex biological processes.
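To make the contrast between a fixed and a stochastic lag time concrete, the sketch below counts transitions in a discretized trajectory either with a constant lag or with a lag drawn from a Poisson distribution at each starting frame. This is only one simple reading of the idea under stated assumptions, not the algorithm from the paper: the function names, the per-frame sampling scheme, and the clamping of the lag to at least one frame are all hypothetical.

```python
import numpy as np


def count_transitions(dtraj, n_states, mean_lag, stochastic=True, rng=None):
    """Hypothetical helper: count transitions i -> j in a discrete trajectory.

    dtraj      : 1D sequence of integer state indices, one per frame.
    mean_lag   : fixed lag (stochastic=False) or Poisson mean (stochastic=True).
    stochastic : if True, draw a new lag for every starting frame.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    rng = np.random.default_rng() if rng is None else rng
    counts = np.zeros((n_states, n_states))

    for t in range(len(dtraj) - 1):
        if stochastic:
            # Assumption: Poisson-distributed lag, clamped to at least 1 frame.
            lag = max(1, int(rng.poisson(mean_lag)))
        else:
            lag = int(mean_lag)
        # Skip windows that would run past the end of the trajectory.
        if t + lag < len(dtraj):
            counts[dtraj[t], dtraj[t + lag]] += 1

    return counts


def transition_matrix(counts):
    """Row-normalize a count matrix; rows without counts stay all zero."""
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(
        counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0
    )
```

For example, `transition_matrix(count_transitions(dtraj, n_states=2, mean_lag=5, rng=np.random.default_rng(0)))` gives a row-stochastic estimate whose effective lag varies around 5 frames. Whether and how such an estimate should be corrected for reversibility is left out of this sketch.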

Contact

Please contact Dr. Tong Wang (watong@microsoft.com) if you are interested in our work.

License

This project is licensed under the terms of the MIT license.