This repository collects the bioinformatics tools I work with.
It is a shared collection of tools contributed by community members around the globe, gathered in one place for organization and accessibility.
Program | Description | Source |
---|---|---|
Artemis | A genome browser and annotation tool that allows visualization of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation. | Download |
BamTools | C++ API & command-line toolkit for working with BAM (Binary SAM file) data. Provides a programmer's API and an end-user's toolkit for handling BAM files. | Clone |
BaseMount | Explore runs, projects, samples, app results and analyses by interacting directly with BaseSpace's API as a locally mounted file system | Install |
BaseSpace | The BaseSpace Sequence Hub is a cloud-based genomics analysis and storage platform that directly integrates with all Illumina sequencers. | N/A |
BaseSpace CLI | Work with the BaseSpace Sequence Hub data using the command line interface (CLI). Supports scripting and programmatic access to BaseSpace Sequence Hub for automation, bulk operations, and other routine functions. It can be used independently or in conjunction with BaseMount. | Install |
bcl2fastq | Demultiplexes data and converts base calls in the per-cycle BCL files generated by Illumina sequencing systems to standard FASTQ file formats in a single step for downstream analysis. | Download |
BLAST+ | Command line application suite of BLAST tools that utilizes the NCBI C++ Toolkit. | Download |
EDirect | An advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal. | N/A |
E-utilities | Entrez Programming Utilities (E-utilities) are a set of nine server-side programs that provide a stable interface into the Entrez query and database system at the NCBI. | N/A |
FastQC | A quality control tool for high throughput sequence data. | Clone Download |
IGV | Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. The igvtools utility provides a set of tools for pre-processing data files. | Download |
Martian | Martian is a language and framework for developing and executing complex computational pipelines. | Clone Download |
Nextflow | Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages. | Clone |
Samtools | A suite of programs for interacting with high-throughput sequencing (HTS) data. It consists of three separate repositories. Samtools: reading/writing/editing/indexing/viewing SAM/BAM/CRAM format. BCFtools: reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants. HTSlib: a C library for reading/writing high-throughput sequencing data. | Download |
Seqtk | Fast and lightweight tool for processing sequences in the FASTA or FASTQ format. | Clone |
SRA Toolkit | The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. | Download |
VCFtools | Package designed for working with complex genetic variation data in the form of VCF files. | Download |
WebLogo | Create sequence logos, a graphical representation of an amino acid or nucleic acid multiple sequence alignment. | Clone Visit |
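
The Artemis entry above mentions six-frame translation. As a rough illustration of that idea (not Artemis's own code), the following minimal Python sketch translates a DNA sequence in all six reading frames, assuming the standard genetic code and an input made of A, C, G and T only. The function names are hypothetical and chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons appear as `*` in the output. Real tools also handle ambiguous bases and alternative genetic codes, which this sketch leaves out.
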
Program | Description | Purpose | Source |
---|---|---|---|
AUGUSTUS | ab initio, trainable gene prediction in eukaryotic genomic sequences. | Gene Prediction | Download |
BUSCO | Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. | Assembly Quality Assessment | Download |
Circlator | Predict and automate assembly circularization and produce accurate linear representations of circular sequences. | Circularize Genome | Download |
Clustal | Fast and scalable multiple sequence alignment (can align hundreds of thousands of sequences in hours) | MSA | Download |
Galaxy | Web portal for accessible, reproducible, and transparent computational research. | Analysis package | Download |
HOMER | HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. | Prediction and analysis | Download |
HMMER | Searches sequence databases for homologs and makes sequence alignments using profile hidden Markov models. | Detect Homologs | Download |
HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone Download |
Mauve | A system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. | Genome Aligner | Download |
Mothur | Expandable software to fill the bioinformatics needs of the microbial ecology community. | Microbial Ecology Pipeline | Download |
MUMmer Package | Ultra-fast alignment of large-scale DNA and protein sequences. A system for rapidly aligning entire genomes, whether in complete or draft form. MUMmer is a suffix tree algorithm designed to find maximal exact matches of some minimum length between two input sequences. NUCmer performs standard DNA sequence alignment; it is a robust pipeline that allows multiple reference and multiple query sequences to be aligned in a many vs. many fashion. PROmer is like NUCmer with one exception: all matching and alignment routines are performed on the six-frame amino acid translation of the DNA input sequence. | Genome Aligner | Download |
MUSCLE | MUSCLE can align hundreds of sequences in seconds. | MSA | Download |
Picard | Set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. | HTS Toolkit | Download |
QIIME | Bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics. | Microbial Ecology Pipeline | Install |
QUAST | Evaluates genome assemblies. | Evaluate Genome Assemblies | Download |
T-Coffee | A multiple sequence alignment package that can align sequences (Protein, DNA, and RNA) or combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee). It is also able to combine sequence information with protein structural information (3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary structures. | MSA | Download |
ViennaRNA Package | Programs for the prediction and comparison of RNA secondary structures. | Prediction | Download |
Program | Description | Purpose | Source |
---|---|---|---|
BLASR | PacBio® long read aligner | Sequence Aligner | Download |
Canu | Fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). | Genome Assembly | Download |
Celera Assembler | Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler, and can use any combination of platform reads. | Genome Assembly | Download |
Cerulean | Cerulean extends contigs assembled using short read datasets like Illumina paired-end reads using long reads like PacBio RS long reads. | Hybrid Assembly | Download |
PBSuite | PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. | Reference Mapping / Variant Calling | Download |
SMRT Analysis | Self-contained software suite designed for use with Single Molecule, Real-Time (SMRT) Sequencing data. | Analysis Package | Download |
SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Hybrid Assembly | Download |
Sprai | Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. | Sequencing Error-correction | Download |
Program | Description | Purpose | Source |
---|---|---|---|
Bowtie2 | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | Reference Aligner | Download |
BWA | Mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | Reference Mapping | Download |
HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Whole-Genome Mapping | Clone |
Program | Description | Purpose | Source |
---|---|---|---|
ABySS | De novo, parallel, paired-end sequence assembler designed for short reads and large genomes. | Genome Assembly | Download Install |
ALLPATHS-LG | Short read assembler that works on both small and large (mammalian-size) genomes. | Genome Assembly | Download |
DISCOVAR | Genome assembler and variant caller. | Genome Assembly | Download |
SOAPdenovo | Novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. | Genome Assembly | Download |
SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Genome/Hybrid Assembly | Download |
Velvet | Short read de novo assembler using de Bruijn graphs. | Genome Assembly | Download |
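
Several of the assemblers above are built on de Bruijn graphs, as the Velvet entry notes. The toy Python sketch below shows only the basic graph construction, under the assumption of error-free reads on a single strand: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It is an illustration of the data structure, not Velvet's implementation, and the function name is hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it.

    Repeated edges are kept, so the list length reflects how often
    a k-mer was observed across the reads.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    graph = build_de_bruijn_graph(["ACGT", "CGTA"], k=3)
    for node, successors in sorted(graph.items()):
        print(node, "->", successors)
```

A real assembler would additionally handle reverse complements, sequencing errors, and graph simplification before reading contigs off the graph.
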
Program | Description | Purpose | Source |
---|---|---|---|
Ballgown | A program for computing differentially expressed genes in two or more RNA-seq experiments, using the output of StringTie or Cufflinks. The Ballgown package provides functions to organize, visualize, and analyze expression measurements. | Differential Expression | Clone Bioconductor |
Cufflinks | Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. | Transcriptome Assembly | Clone |
DESeq2 | The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models. | Differential Expression | Clone Bioconductor |
edgeR | Differential expression analysis of RNA-seq expression profiles with biological replication. It can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. | Differential Expression | Bioconductor |
HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Transcriptome Mapping | Clone |
HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone Download |
STAR | Ultrafast universal RNA-seq aligner. | RNA-seq Aligner | Clone |
StringTie | StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. | Transcriptome Assembly | Clone |
Trinity | Trinity assembles transcript sequences from Illumina RNA-Seq data. | Transcriptome Assembly | Download |
Program | Description | Purpose | Source |
---|---|---|---|
Celda | Bayesian hierarchical modeling for clustering Single Cell RNA-Seq Data. | Differential Expression | Clone |
cellTree | This package computes a Latent Dirichlet Allocation (LDA) model of single-cell RNA-seq data and builds a compact tree modelling the relationship between individual cells over time or space. | Visualization | Bioconductor |
Chromium Single Cell Software Suite | Package for analyzing and visualizing single cell 3’ RNA-seq data produced by the 10x Chromium Platform. Cell Ranger (Pipelines) is a set of analysis pipeline tools that perform sample demultiplexing, barcode processing, and single cell 3’ gene counting. Loupe™ Cell Browser is an interactive desktop application that helps find significant genes, cell types, and substructure within your single cell data. Cell Ranger (R Kit) is an R package for secondary analysis of Cell Ranger matrix data, including PCA and t-SNE projection, and k-means clustering. | Analysis Package | Clone Download |
Pagoda | Framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells. | Pathway/Gene Set Analysis | Clone |
SCDE | The SCDE package implements a set of statistical methods for analyzing single cell RNA-seq data, including differential expression analysis and pathway and gene set overdispersion analysis (PAGODA). | Differential Expression | Clone Download |
Seurat | R package designed for QC, analysis, and exploration of single cell RNA-seq data. | Differential Expression | Clone Install |
SPRING | SPRING is a kinetic interface tool for uncovering high-dimensional structure in single cell gene expression data. | Visualization | Clone Visit |
Monocle | An analysis toolkit for single cell RNA-seq that performs differential expression and time-series analysis for single cell expression experiments. | Differential Expression | Clone Bioconductor |