
Statistical models for biomolecular dynamics


MSMBuilder


MSMBuilder is a Python package that implements a series of statistical models for high-dimensional time series. It is particularly focused on the analysis of atomistic simulations of biomolecular dynamics. For example, MSMBuilder has been used to model protein folding and conformational change from molecular dynamics (MD) simulations. MSMBuilder is available under the LGPL (v2.1 or later).

Capabilities include:

  • Feature extraction into dihedrals, contact maps, and more.
  • Geometric clustering with a variety of algorithms.
  • Dimensionality reduction using time-structure independent components analysis (tICA) and principal component analysis (PCA).
  • Markov state model (MSM) construction.
  • Rate-matrix MSM construction.
  • Hidden Markov model (HMM) construction.
  • Timescale and transition path analysis.

Check out the documentation at msmbuilder.org and join the mailing list. For a broader overview of MSMBuilder, take a look at our slide deck.

Installation

The preferred installation mechanism for msmbuilder is with conda:

$ conda install -c omnia msmbuilder

If you don't have conda, or are new to scientific Python, we recommend that you download the Anaconda scientific Python distribution.

Workflow

An example workflow might be as follows:

  1. Set up a system for molecular dynamics, and run one or more simulations for as long as you can on as many CPUs or GPUs as you have access to. There are a lot of great software packages for running MD, e.g. OpenMM (https://simtk.org/home/openmm), Gromacs, Amber, CHARMM, and many others. MSMBuilder is not one of them.

  2. Transform your MD coordinates into an appropriate set of features.

  3. Reduce the dimensionality of the features with tICA or PCA, then discretize the reduced data into states by clustering.

  4. Fit an MSM, rate matrix MSM, or HMM. Perform model selection using cross-validation with the generalized matrix Rayleigh quotient.
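
To make the last step more concrete, here is a minimal sketch of what fitting an MSM to a clustered trajectory involves: count the transitions observed at a chosen lag time, row-normalize the counts into transition probabilities, and read a relaxation timescale off the eigenvalues. It deliberately does not use MSMBuilder's API. The function names, the toy trajectory `dtraj`, and the simple row-normalized estimate are assumptions made for illustration, and the timescale helper only covers the two-state case.

```python
import math


def count_transitions(sequence, n_states, lag_time=1):
    """Count observed i -> j transitions separated by lag_time frames."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(sequence[:-lag_time], sequence[lag_time:]):
        counts[i][j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into transition probabilities.

    This is the plain (non-reversible) maximum-likelihood estimate,
    chosen here for brevity; it is an illustrative assumption.
    """
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def two_state_timescale(tmat, lag_time=1):
    """Implied timescale of a two-state chain, in units of frames.

    A 2x2 row-stochastic matrix has eigenvalues 1 and (trace - 1);
    the timescale is -lag_time / ln(second eigenvalue).
    """
    second_eigenvalue = tmat[0][0] + tmat[1][1] - 1.0
    if not 0.0 < second_eigenvalue < 1.0:
        raise ValueError("second eigenvalue must lie in (0, 1)")
    return -lag_time / math.log(second_eigenvalue)


# Hypothetical toy trajectory: state labels as produced by clustering.
dtraj = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

counts = count_transitions(dtraj, n_states=2, lag_time=1)
tmat = transition_matrix(counts)
timescale = two_state_timescale(tmat, lag_time=1)

print("counts:", counts)
print("transition matrix:", tmat)
print("implied timescale (frames):", timescale)
```

With real data the state labels would come from the clustering in step 3, and the lag time would be chosen so that the implied timescales are roughly insensitive to it.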