Run the bamutils, bedutils, fastqutils, or gtfutils scripts. These are the main driver scripts that set up the appropriate environment (loading pysam) and run the requested command.
For ease of use, it is recommended that you add the directory containing bamutils, bedutils, fastqutils, and gtfutils to your $PATH.
Running bamutils without any arguments will return a list of available commands. Running bamutils command will show the parameters required for that command. The same applies to bedutils, fastqutils, and gtfutils.
General usage format:
bamutils command {options} filename
bedutils command {options} filename
fastqutils command {options} filename
gtfutils command {options} filename
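The driver-script behaviour described above (no arguments lists the available commands; a command given without parameters prints what it requires) can be illustrated with a small Python 3 sketch of subcommand dispatch. The `COMMANDS` registry, the placeholder `stats` and `filter_reads` functions, and the usage strings are all hypothetical; this is not the NGSUtils implementation, which also handles environment setup such as loading pysam.

```python
import sys


def stats(args):
    """Hypothetical placeholder command: report which file it would summarize."""
    return "stats: would summarize %s" % args[-1]


def filter_reads(args):
    """Hypothetical placeholder command: report which file it would filter."""
    return "filter: would filter %s" % args[-1]


# Hypothetical registry mapping command names to (function, usage string).
COMMANDS = {
    "filter": (filter_reads, "Usage: tool filter {options} filename"),
    "stats": (stats, "Usage: tool stats {options} filename"),
}


def main(argv):
    """Dispatch 'tool command {options} filename' to a registered command."""
    if not argv:
        # No arguments at all: list the available commands.
        return "Available commands: " + ", ".join(sorted(COMMANDS))
    name, rest = argv[0], argv[1:]
    if name not in COMMANDS:
        return "Unknown command: %s" % name
    func, usage = COMMANDS[name]
    if not rest:
        # A command with no parameters: show what it requires.
        return usage
    return func(rest)


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

For example, `main([])` returns the command list and `main(["stats"])` returns the usage string, mirroring running `bamutils` and `bamutils command` with nothing else.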
bamutils
Scripts for manipulating and analyzing BAM files
DNA
- basecall - Base caller
- minorallele - Calls minor alleles
RNA
- cims - Finds regions of unusual deletions (CLIP-Seq)
- rpkm - Calculates RPKM/counts for genes/regions/repeats (Note: gene counts require a RefIso file - see sequtils below)
General
- convertregion - Converts region mapping to genomic mapping
- expressed - Finds regions expressed in a BAM file
- extract - Extract reads from specific regions (BED)
- filter - Removes reads from a BAM file based on criteria
- merge - Merge multiple BAM files together
- reads - Extract the names of reads and their positions
- split - Splits a BAM file into smaller pieces
- stats - Calculates simple stats for a BAM file
Conversions
- tobed - Convert reads to BED6
- tobedgraph - Convert reads to BedGraph
- tofasta - Convert reads to FASTA
- tofastq - Convert reads to FASTQ
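The `rpkm` command reports RPKM (Reads Per Kilobase of feature per Million mapped reads). The standard formula is sketched below in Python 3. Which feature length (for example, exonic length versus genomic span) and which total read count `bamutils rpkm` actually uses are not stated here, so both are assumptions supplied by the caller; the `rpkm` function name is illustrative only.

```python
def rpkm(count, length_bp, total_mapped):
    """Reads Per Kilobase of feature per Million mapped reads.

    count        -- reads assigned to the feature
    length_bp    -- feature length in bases (assumed: whatever length the caller deems appropriate)
    total_mapped -- total mapped reads in the library (assumed normalization total)
    """
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # count / (length / 1e3) / (total / 1e6) simplifies to count * 1e9 / (length * total)
    return count * 1e9 / (length_bp * total_mapped)
```

For instance, 500 reads on a 2,000 bp feature in a library of 10 million mapped reads gives `rpkm(500, 2000, 10_000_000) == 25.0`.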
bedutils
Scripts for manipulating and creating BED files
General
- extend - Extends BED regions (3' end)
- reduce - Merges overlapping BED regions
- refcount - Given several BED files, calculates the number of samples that overlap each region in a reference BED file
- sort - Sorts a BED file (in place)
- stats - Calculates simple stats for a BED file
Conversions
- fromprimers - Converts a list of PCR primer pairs to BED regions
- tobedgraph - BED to BedGraph
- tofasta - Extract BED regions from a reference FASTA file
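`bedutils reduce` merges overlapping regions. One common way to do this is to sort the intervals and sweep through them per chromosome, as in the Python 3 sketch below. It assumes plain `(chrom, start, end)` tuples in BED's 0-based, half-open coordinates, ignores strand and name columns, and keeps book-ended regions (one ending exactly where the next starts) separate; whether `reduce` itself behaves the same way on those points is not documented here. The `reduce_regions` name is hypothetical.

```python
from itertools import groupby


def reduce_regions(regions):
    """Merge overlapping (chrom, start, end) BED intervals, ignoring strand.

    Coordinates are BED-style: 0-based, half-open. Regions that merely touch
    (end == next start) are kept separate in this sketch.
    """
    merged = []
    ordered = sorted(regions)  # sort by chrom, then start, then end
    for chrom, group in groupby(ordered, key=lambda r: r[0]):
        cur_start, cur_end = None, None
        for _, start, end in group:
            if cur_start is None:
                # First interval on this chromosome.
                cur_start, cur_end = start, end
            elif start < cur_end:
                # Overlaps the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap (or book-end): emit the current run and start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged
```

Sorting first keeps the sweep linear after an O(n log n) sort, and input order does not matter.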
fastqutils
Scripts for manipulating and creating FASTQ files
General
- convertqual - Convert a FASTQ file's quality values from Illumina to Sanger scale
- merge - Merge paired FASTQ files into one file
- names - Extract the read names from a FASTQ file
- split - Splits one FASTQ file into N smaller files
- stats - Calculates simple stats for a FASTQ file
- trim - Removes 5' and 3' linkers from each sequence (uses Smith-Waterman alignment)
- truncate - Truncate the sequence/qual for all reads to a specific length
Conversions
- fromfasta - Converts FASTA files (with .qual) to FASTQ (base space or color space)
- fromqseq - Converts Illumina qseq (or export/sorted) files to FASTQ format
- tofasta - Converts FASTQ to FASTA
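`convertqual` moves quality values from the Illumina scale to the Sanger scale. Assuming the Illumina 1.3 to 1.7 encoding (Phred score plus an ASCII offset of 64), the conversion is a per-character offset shift to Sanger's Phred+33, as in the Python 3 sketch below. Older Solexa-scale qualities use a different log-odds formula and would need a nonlinear conversion; whether `convertqual` handles that case is not stated here. The `illumina_to_sanger` name is hypothetical.

```python
def illumina_to_sanger(qual):
    """Convert a Phred+64 (Illumina 1.3-1.7) quality string to Phred+33 (Sanger).

    Assumes Phred-scaled scores; Solexa-scaled scores are NOT handled.
    """
    out = []
    for ch in qual:
        phred = ord(ch) - 64  # decode the Illumina offset
        if phred < 0:
            raise ValueError("character %r is below the Phred+64 range" % ch)
        out.append(chr(phred + 33))  # re-encode with the Sanger offset
    return "".join(out)
```

For example, `illumina_to_sanger("@Jh")` returns `"!+I"` (Q0, Q10, Q40).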
gtfutils
Scripts for assembling the gene model used in the bedutils and bamutils scripts. The gene model is similar to the UCSC refFlat or KnownGene tab-delimited format, except that it adds one column at the beginning indicating isoforms. For some organisms this column may be redundant, but for others it is required to ensure that annotated isoforms are on the same chromosome and overlap. We call this format RefIso. RefIso files can be compiled from UCSC refFlat or KnownGene files, which can be downloaded automatically for each organism if needed. This format may be deprecated in the future.
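To make the RefIso layout concrete, here is a Python 3 sketch that parses one line. It assumes the extra isoform column is followed by the 11 standard UCSC refFlat columns (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). The field names, the `parse_refiso_line` function, and the exact contents of the isoform column are assumptions for illustration; KnownGene-derived files have a different column set and would need their own layout.

```python
from collections import namedtuple

# Assumed layout: one isoform column followed by the 11 UCSC refFlat columns.
RefIsoRecord = namedtuple("RefIsoRecord", [
    "isoform", "gene_name", "name", "chrom", "strand",
    "tx_start", "tx_end", "cds_start", "cds_end",
    "exon_count", "exon_starts", "exon_ends",
])


def _int_list(field):
    # refFlat stores exon coordinates as comma-separated lists with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_refiso_line(line):
    """Parse one tab-delimited RefIso line (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 12:
        raise ValueError("expected 12 tab-separated columns, got %d" % len(cols))
    return RefIsoRecord(
        cols[0], cols[1], cols[2], cols[3], cols[4],
        int(cols[5]), int(cols[6]), int(cols[7]), int(cols[8]),
        int(cols[9]), _int_list(cols[10]), _int_list(cols[11]),
    )
```

Grouping the parsed records by the `isoform` field would then collect the annotated isoforms of each gene.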
GTF extra annotations
- add_isoform - Appends isoform annotation from UCSC isoforms file
- add_reflink - Appends isoform/name annotation from RefSeq/refLink
- add_xref - Appends name annotation from UCSC Xref file
General
- genesize - Extract the sizes of genes from the GTF model (genomic and transcript lengths)
- junctions - Create a library of potential splice-junctions based upon the GTF model
Conversions
- tobed - Converts a GFF/GTF model to BED format
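`genesize` reports both a genomic length and a transcript length. For a single transcript, these are commonly the span from the first exon start to the last exon end, and the summed length of the exons. The Python 3 sketch below computes both from GTF-style exon coordinates (1-based, inclusive). It assumes one transcript with non-overlapping exons; how `genesize` combines multiple isoforms of a gene is not described here, and the `gene_sizes` name is hypothetical.

```python
def gene_sizes(exons):
    """Return (genomic_length, transcript_length) for one transcript.

    `exons` are GTF-style (start, end) pairs: 1-based, inclusive.
    Exons are assumed not to overlap each other.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    # Genomic span: first base of the first exon to last base of the last exon.
    genomic = max(end for _, end in exons) - min(start for start, _ in exons) + 1
    # Transcript length: sum of exon lengths (inclusive coordinates, hence +1).
    transcript = sum(end - start + 1 for start, end in exons)
    return genomic, transcript
```

For two exons at 101-200 and 301-500, `gene_sizes([(101, 200), (301, 500)])` returns `(400, 300)`.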
Check out the code and run make. This will create a virtualenv folder (env) and install the needed libraries. The only mandatory libraries are pysam and Cython. Cython requires the Python headers to be present on the system; on Linux this can be achieved by installing 'python-devel' or a similar package.
If you need read-only access, use:
git clone git://github.com/ngsutils/ngsutils.git
Requires
- Python 2.6+ (including development packages)
- virtualenv
Will install
- pysam
- Cython
Recommended
- samtools
- tabix
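After running make, a quick way to confirm the requirements listed above is to check that the mandatory Python modules import and that the recommended tools are on $PATH. The Python 3 sketch below does this with `importlib.util.find_spec` and `shutil.which`; it is an optional convenience, not part of NGSUtils. Note that these calls need Python 3, whereas NGSUtils itself targets Python 2.6+, and the helper names are hypothetical. Run it with the interpreter inside the env virtualenv so it sees the installed libraries.

```python
import importlib.util
import shutil

MANDATORY_MODULES = ["pysam", "Cython"]
RECOMMENDED_TOOLS = ["samtools", "tabix"]


def missing_modules(names):
    """Return the top-level Python modules in `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_tools(names):
    """Return the executables in `names` that are not on $PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    for mod in missing_modules(MANDATORY_MODULES):
        print("missing required module: %s" % mod)
    for tool in missing_tools(RECOMMENDED_TOOLS):
        print("recommended tool not found on PATH: %s" % tool)
```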
NGSUtils - Tools for next-generation sequencing analysis
Copyright (c) 2010-2012 The Trustees of Indiana University
Copyright (c) 2013-2016 The Board of Trustees of Leland Stanford Junior University
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer listed
in this license in the documentation and/or other materials
provided with the distribution.
- Neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
The copyright holders provide no reassurances that the source code
provided does not infringe any patent, copyright, or any other
intellectual property rights of third parties. The copyright holders
disclaim any liability to any recipient for claims brought against
recipient by any third party for infringement of that parties
intellectual property rights.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Written and maintained by: Marcus Breese <mbreese@stanford.edu>
Dept. of Pediatrics, Div. of Hematology/Oncology
Stanford University School of Medicine