A collection of Python methods for exploratory analysis and comparison of protein structural ensembles, e.g., from molecular dynamics simulations.
With PENSA, you can (currently):
- compare structural ensembles of proteins via the relative entropy of their features and visualize deviations on a reference structure.
- project ensembles on their combined principal components (PCs) and sort the structures along a PC.
- cluster structures via k-means and via regular-space clustering and write out the resulting clusters as trajectories.
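As an illustration of the first point, the relative entropy (Kullback-Leibler divergence) of a single feature between two ensembles can be estimated from histograms. The following is a minimal sketch using only NumPy, not PENSA's own implementation; the function name `relative_entropy`, the bin count, the pseudocount, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def relative_entropy(samples_a, samples_b, bins=36,
                     value_range=(-np.pi, np.pi), pseudocount=1e-9):
    """Estimate D_KL(A || B) in nats for one feature (hypothetical helper).

    Both sample sets are histogrammed on the same bins. A small pseudocount
    keeps empty bins from producing log(0) or division by zero.
    """
    hist_a, _ = np.histogram(samples_a, bins=bins, range=value_range)
    hist_b, _ = np.histogram(samples_b, bins=bins, range=value_range)
    p = hist_a + pseudocount
    q = hist_b + pseudocount
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


# Synthetic stand-ins for one torsion angle (in radians) in two ensembles.
rng = np.random.default_rng(0)
ens_a = rng.normal(-1.0, 0.3, size=5000)
ens_b = rng.normal(1.0, 0.3, size=5000)

print(relative_entropy(ens_a, ens_a))  # identical distributions give zero
print(relative_entropy(ens_a, ens_b))  # well-separated distributions give a large value
```

Features with a high relative entropy are those whose distributions differ most between the two ensembles, which is what makes them candidates for visualization on a reference structure.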
Proteins are featurized via PyEMMA using backbone torsions, sidechain torsions, or backbone C-alpha distances, making PENSA compatible with all functionality available in PyEMMA. Trajectories are processed and written using MDAnalysis. Plots are generated using Matplotlib.
All functionality is available as a Python package (see the installation instructions below). Example Python scripts are provided for the most common applications. To get started, see the tutorial.
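Once two ensembles are described by the same features, projecting them on their combined principal components and sorting frames along a PC can be sketched as follows. This is a plain NumPy illustration under stated assumptions, not PENSA's or PyEMMA's API: `combined_pca` is a hypothetical helper, and the random arrays stand in for real feature matrices (frames by features). Periodic features such as torsions are usually transformed to sine and cosine values before PCA; that step is omitted here.

```python
import numpy as np


def combined_pca(features_a, features_b, n_components=2):
    """PCA on the pooled data of two ensembles (hypothetical helper).

    Returns the principal components and the projection of each ensemble
    onto them, so that both ensembles share one coordinate system.
    """
    combined = np.vstack([features_a, features_b])
    mean = combined.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(combined - mean, full_matrices=False)
    components = vt[:n_components]
    proj_a = (features_a - mean) @ components.T
    proj_b = (features_b - mean) @ components.T
    return components, proj_a, proj_b


# Synthetic feature matrices: 200 and 300 frames, 5 features each.
rng = np.random.default_rng(1)
feat_a = rng.normal(0.0, 1.0, size=(200, 5))
feat_b = rng.normal(0.5, 1.0, size=(300, 5))

components, proj_a, proj_b = combined_pca(feat_a, feat_b)

# Frame indices of ensemble A sorted along the first principal component.
order_a = np.argsort(proj_a[:, 0])
print(order_a[:5])
```

Computing the components on the pooled data, rather than on each ensemble separately, is what makes the two projections directly comparable.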
Python 3.7 with:
- numpy
- scipy >= 1.2
- mdshare
- pyemma
- MDAnalysis
- matplotlib
Create and activate a conda environment:
conda create --name pensa python=3.7 numpy "scipy>=1.2" matplotlib pyemma mdshare MDAnalysis -c conda-forge
conda activate pensa
If you want to use PENSA with Jupyter notebooks:
conda install jupyter
Download and install PENSA:
git clone https://github.com/drorlab/pensa.git
cd pensa
pip install -e .
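To check that the environment is complete after installation, a short script like the following can report which modules cannot be found. It is a sketch, not part of PENSA: the helper `missing_modules` is hypothetical, and the import name `pensa` is assumed from the repository name.

```python
from importlib.util import find_spec

# Import names of the requirements listed above, plus PENSA itself
# (the name "pensa" is an assumption based on the repository name).
REQUIRED = ["numpy", "scipy", "mdshare", "pyemma",
            "MDAnalysis", "matplotlib", "pensa"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All required modules were found.")
```

Run it inside the activated `pensa` environment; `find_spec` only locates the modules and does not import them, so the check is fast.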