UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

Primary language: Python. License: MIT.

Tools for dealing with Unique Molecular Identifiers

This repository contains tools for dealing with Unique Molecular Identifiers (UMIs), also known as Random Molecular Tags (RMTs). Currently there are two tools:

  • extract: Flexible removal of UMI sequences from fastq reads.
    UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read.
  • dedup: Implements a number of different UMI deduplication schemes.
    The recommended method is directional_adjacency.
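To make the behaviour of extract concrete, here is a minimal Python sketch of the idea described above: UMI bases are cut out of the read and appended to the read name, while other barcode bases stay on the read. It is an illustration only, not the code used by umi_tools. The function name `extract_umi`, the layout string (where `N` marks a UMI base and `X` marks a base to keep), the assumption that the barcode sits at the 5' end of the read, and the `_` separator in the read name are all assumptions made for this example.

```python
def extract_umi(name, seq, qual, pattern):
    """Move UMI bases from the start of a read into the read name.

    `pattern` is a hypothetical layout string for this sketch:
    'N' marks a UMI base, 'X' marks a base (for example a library
    barcode) that is left on the read.
    """
    if set(pattern) - {"N", "X"}:
        raise ValueError("pattern may only contain 'N' and 'X'")
    if len(seq) < len(pattern) or len(seq) != len(qual):
        raise ValueError("read is too short or seq/qual lengths differ")

    # Bases under 'N' form the UMI; bases under 'X' stay on the read.
    umi = "".join(base for base, p in zip(seq, pattern) if p == "N")
    kept = [i for i, p in enumerate(pattern) if p == "X"]

    rest = len(pattern)
    new_seq = "".join(seq[i] for i in kept) + seq[rest:]
    new_qual = "".join(qual[i] for i in kept) + qual[rest:]

    # Append the UMI to the read identifier, preserving any comment
    # that follows the first space in the FASTQ header.
    read_id, _, comment = name.partition(" ")
    new_name = read_id + "_" + umi
    if comment:
        new_name += " " + comment
    return new_name, new_seq, new_qual


if __name__ == "__main__":
    # Three UMI bases, two library-barcode bases, two more UMI bases.
    print(extract_umi("@read1", "ACGTTTGGCCAA", "IIIIHHHHGGGG", "NNNXXNN"))
```

In this sketch the quality string is trimmed in step with the sequence so that the FASTQ record stays consistent. For the options the real tool accepts, see `umi_tools extract --help`.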

See simulation results at the CGAT blog.

Genome Science 2015 poster.

Biorxiv Preprint.

Installation

If you're using Conda, you can use:

conda install -c https://conda.anaconda.org/toms umi_tools

Or pip:

pip install umi_tools

Or if you'd like to work directly from the git repository:

git clone git@github.com:CGATOxford/UMI-tools.git

Then enter the repository and run:

python setup.py install

Help

To get help on umi_tools, run:

`umi_tools --help`

To get help on umi_tools extract, run:

`umi_tools extract --help`

To get help on umi_tools dedup, run:

`umi_tools dedup --help`

Dependencies

umi_tools depends on numpy, pandas, cython, pysam and future.
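If an installation fails, it can help to check which of these packages are importable in the current environment. The following is a small sketch using only the standard library. The helper `missing_modules` is a name made up for this example, and the import names are assumptions based on the package list above (Cython, for instance, is imported as `Cython`).

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED = ["numpy", "pandas", "Cython", "pysam", "future"]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```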