pysster: a Sequence-STructure classifiER

License: MIT

Learning Sequence And Structure Motifs In Biological Sequences Using Convolutional Neural Networks

pysster is a Python package for training and interpreting convolutional neural networks on biological sequence data. Sequences are classified by learning sequence (and optionally structure) motifs. The package offers sensible default parameters, a hyper-parameter optimization procedure and options to visualize the learned motifs. The main features of the package are:

  • multi-class classification (single-label or multi-label)
  • hyper-parameter tuning (grid search)
  • interpretation of learned motifs in terms of positional and class enrichment and motif co-occurrence
  • support for input strings over user-defined alphabets (e.g. DNA, RNA or protein data)
  • optional use of structure information, handcrafted features and recurrent layers
  • seamless CPU or GPU computation
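
To make the "user-defined alphabets" feature more concrete, here is a minimal sketch of how a string over an arbitrary alphabet can be turned into the one-hot matrix that a convolutional network consumes. This is a general illustration and not pysster's internal code; the function name `one_hot` is hypothetical.

```python
def one_hot(sequence, alphabet):
    """Encode a string as a list of one-hot rows, one row per character.

    Hypothetical helper for illustration only; not part of the pysster API.
    """
    # Map each alphabet character to its column index.
    index = {char: i for i, char in enumerate(alphabet)}
    rows = []
    for char in sequence:
        if char not in index:
            raise ValueError("character %r is not in alphabet %r" % (char, alphabet))
        row = [0] * len(alphabet)
        row[index[char]] = 1
        rows.append(row)
    return rows


# An RNA fragment over the alphabet "ACGU": one row per position,
# one column per alphabet character.
for row in one_hot("GAUC", "ACGU"):
    print(row)
```

The same function works unchanged for a protein alphabet or any other set of characters, which is the main point of keeping the alphabet a parameter.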

The method is described in the corresponding Bioinformatics paper.

If you find a bug, notice missing documentation or have a feature request, feel free to open an issue.

Installation

pysster is compatible with Python 3.5+ and can be installed via pip or from GitHub.

Install via pip:

pip3 install pysster

Install the latest version from GitHub:

git clone https://github.com/budach/pysster.git
cd pysster
pip3 install .
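
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.5+ requirement and that the package is importable. The sketch below uses only the standard library; `check_environment` is a hypothetical helper name, not something pysster provides.

```python
import importlib.util
import sys


def check_environment(package="pysster", min_python=(3, 5)):
    """Return (python_ok, package_found) for the running interpreter.

    Hypothetical helper for illustration; not part of pysster.
    """
    # Compare (major, minor) of the running interpreter with the minimum.
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level package cannot be located.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


python_ok, package_found = check_environment()
print("Python version OK:", python_ok)
print("pysster importable:", package_found)
```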

Using the GPU

pysster depends on TensorFlow, and by default the CPU version of TensorFlow is installed. If you want to use your NVIDIA GPU (recommended for large data sets or grid searches), make sure that CUDA and cuDNN are correctly installed and then install the GPU version of TensorFlow:

pip3 uninstall tensorflow
pip3 install tensorflow-gpu

At the time of writing, the most recent TensorFlow version is 1.7, which requires CUDA 9 and cuDNN 7. You can always check the required versions in the TensorFlow release notes.
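
To verify that TensorFlow actually sees the GPU after the steps above, a small check like the following may help. It is a sketch under the assumption that either the TensorFlow 1.x call `tf.test.is_gpu_available()` or the TensorFlow 2.x call `tf.config.list_physical_devices("GPU")` is available in the installed version; `gpu_visible` is a hypothetical helper name.

```python
def gpu_visible():
    """Return True or False if TensorFlow can or cannot see a GPU,
    or None if TensorFlow is not installed.

    Hypothetical helper for illustration; not part of pysster.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer TensorFlow releases (2.x) expose the device list directly.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return len(config.list_physical_devices("GPU")) > 0
    # Older releases (1.x, as referenced in this README) use tf.test.
    return bool(tf.test.is_gpu_available())


print("GPU visible to TensorFlow:", gpu_visible())
```

If this prints `False` although a GPU is present, the usual suspects are a CPU-only TensorFlow build or CUDA/cuDNN versions that do not match the installed TensorFlow release.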

Documentation

Tutorials

API documentation

Changelog

v1.1.3 - 19. March 2018 (PyPI)

  • added visualize_all_kernels() method to Model objects (visualize all kernels at once + get HTML summary report)
  • it is now possible to maximize the PR-AUC (precision-recall) instead of the ROC-AUC during a grid search
  • changed default color scheme for ACGT and ACGU alphabets to match conventions
  • fixed a bug that prevented Data objects from being reproducible
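
The choice between ROC-AUC and PR-AUC matters most for imbalanced classes. The pure-Python sketch below illustrates the difference on a toy example. It uses average precision as the PR-AUC estimate, which is one common choice and an assumption here, not necessarily how pysster computes it; both function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive hit."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy data: 2 positives among 10 samples, already sorted by score.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(roc_auc(labels, scores))            # 0.8125
print(average_precision(labels, scores))  # 0.5
```

On this toy data the ROC-AUC looks comfortable while the average precision is much lower, because two false positives near the top of the ranking hurt precision far more than they hurt the false positive rate.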