Convolutional neural network analysis for predicting DNA sequence activity.

Primary language: Python. License: MIT.

Basset

Deep convolutional neural networks for DNA sequence analysis.

Basset provides researchers with tools to:

  1. Train deep convolutional neural networks to learn highly accurate models of DNA sequence activity such as accessibility (via DNaseI-seq or ATAC-seq), protein binding (via ChIP-seq), and chromatin state.
  2. Interpret the principles learned by the model.
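As a rough illustration of what "learning from DNA sequence" involves, convolutional models of this kind usually take each sequence as a one-hot matrix with one row per nucleotide. The sketch below is not taken from Basset's code. The function name `one_hot_encode`, the A/C/G/T row order, and the all-zero handling of unknown bases are assumptions made for the example.

```python
# Hypothetical helper for illustration; not part of Basset's own preprocessing.
ALPHABET = "ACGT"  # assumed row order: A, C, G, T


def one_hot_encode(seq):
    """Return a 4 x len(seq) matrix (list of rows) with a 1 marking each base.

    Bases outside ALPHABET (for example N) are left as all-zero columns.
    """
    rows = [[0] * len(seq) for _ in ALPHABET]
    for pos, base in enumerate(seq.upper()):
        idx = ALPHABET.find(base)
        if idx >= 0:
            rows[idx][pos] = 1
    return rows


if __name__ == "__main__":
    for letter, row in zip(ALPHABET, one_hot_encode("ACGTN")):
        print(letter, row)
```

Each column then describes one position in the sequence, which is the shape a 1D convolution over sequence positions expects.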

Installation

Basset has a few dependencies because it uses both Torch7 and Python and takes advantage of a variety of packages available for both.

First, I recommend installing Torch7 from here. If you plan on training models on a GPU, make sure that CUDA is installed before you install Torch; Torch should then find it.

For the Python dependencies, I highly recommend the Anaconda distribution. The only library missing is pysam, which you can install through Anaconda or manually from here.
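To confirm that pysam is visible to the Python interpreter you intend to use, a quick check along these lines may help. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and only pysam is checked because it is the only library named above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pysam is the one library named above as absent from Anaconda.
    missing = missing_modules(["pysam"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("pysam was found.")
```

Run it with the same `python` that you will use for Basset, since a module installed in one environment is not visible from another.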

To download and install the remaining dependencies, run

    ./install_dependencies.py

Basset relies on the environment variable BASSETDIR to orient itself. In your startup script (e.g. .bashrc), write

    export BASSETDIR=the/dir/where/basset/is/installed

To make the code available for use in any directory, also write

    export PATH=$BASSETDIR/src:$PATH
    export LUA_PATH="$BASSETDIR/src/?.lua;$LUA_PATH"
    export PYTHONPATH=$BASSETDIR/src:$PYTHONPATH
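After opening a new shell, you may want to verify that the variables took effect. The following is a small sketch under stated assumptions: the function name `check_basset_env` is hypothetical, it only inspects BASSETDIR, PATH, and PYTHONPATH, and it compares path strings exactly, so a trailing slash or a symlinked path would be reported as missing.

```python
import os


def check_basset_env(environ):
    """Return a list of human-readable problems found in an environment mapping."""
    basset_dir = environ.get("BASSETDIR")
    if not basset_dir:
        return ["BASSETDIR is not set"]

    problems = []
    src = os.path.join(basset_dir, "src")
    for var in ("PATH", "PYTHONPATH"):
        # Both variables are lists of directories joined by os.pathsep.
        entries = environ.get(var, "").split(os.pathsep)
        if src not in entries:
            problems.append(f"{src} is not on {var}")
    return problems


if __name__ == "__main__":
    found = check_basset_env(os.environ)
    for problem in found:
        print(problem)
    if not found:
        print("BASSETDIR, PATH, and PYTHONPATH look consistent.")
```

Passing the environment in as an argument keeps the check easy to try against a plain dictionary before relying on it.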

To download and install additional useful data, like my best pre-trained model and public datasets, run

    ./install_data.py

You can find the full requirement list here.


Documentation

Basset is under active development, so don't hesitate to ask for clarifications or additional features, documentation, or tutorials.


Tutorials

These are a work in progress, so please forgive any gaps for the moment. If there's a task that you're interested in that I haven't included, feel free to post it as an Issue on the repository.