VictorGoitea/Bioinformatics-Software

Bioinformatics software list

Perl

Personal

A collection of tools for the things I work with can be found in this repo.

Community Reference

A shared collection of tools from community members around the globe, organized for accessibility.

Workflow Tools

Program	Description	Source
Artemis	A genome browser and annotation tool that allows visualization of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation.	Download
BamTools	C++ API & command-line toolkit for working with BAM (Binary SAM file) data. Provides a programmer's API and an end-user's toolkit for handling BAM files.	Clone
BaseMount	Explore runs, projects, samples, app results and analyses by interacting directly with BaseSpace's API as a locally mounted file system.	Install
BaseSpace	The BaseSpace Sequence Hub is a cloud-based genomics analysis and storage platform that directly integrates with all Illumina sequencers.	N/A
BaseSpace CLI	Work with the BaseSpace Sequence Hub data using the command line interface (CLI). Supports scripting and programmatic access to BaseSpace Sequence Hub for automation, bulk operations, and other routine functions. It can be used independently or in conjunction with BaseMount.	Install
bcl2fastq	Demultiplexes data and converts base calls in the per-cycle BCL files generated by Illumina sequencing systems to standard FASTQ file formats in a single step for downstream analysis.	Download
BLAST+	Command line application suite of BLAST tools that utilizes the NCBI C++ Toolkit.	Download
EDirect	An advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal.	N/A
E-utilities	Entrez Programming Utilities (E-utilities) are a set of nine server-side programs that provide a stable interface into the Entrez query and database system at the NCBI.	N/A
FastQC	A quality control tool for high throughput sequence data.	Clone Download
IGV	Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. The igvtools utility provides a set of tools for pre-processing data files.	Download
Martian	Martian is a language and framework for developing and executing complex computational pipelines.	Clone Download
Nextflow	Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.	Clone
Samtools	A suite of programs for interacting with high-throughput sequencing (HTS) data. It consists of three separate repositories: Samtools, for reading/writing/editing/indexing/viewing SAM/BAM/CRAM format; BCFtools, for reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants; and HTSlib, a C library for reading/writing high-throughput sequencing data.	Download
Seqtk	Fast and lightweight tool for processing sequences in the FASTA or FASTQ format.	Clone
SRA Toolkit	The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.	Download
VCFtools	Package designed for working with complex genetic variation data in the form of VCF files.	Download
WebLogo	Create sequence logos, a graphical representation of an amino acid or nucleic acid multiple sequence alignment.	Clone Visit
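Several of the tools above (FastQC, Seqtk, and the FASTQ output of bcl2fastq) revolve around FASTQ files. As a rough illustration of what a per-base quality summary involves, here is a minimal Python sketch that parses 4-line FASTQ records and averages Phred scores by read position. It assumes Phred+33 encoding (the Illumina 1.8+ convention) and unwrapped records; `read_fastq`, `phred_scores`, and `per_position_mean_quality` are hypothetical helper names, not part of any tool listed here.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string; offset 33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in qual]


def per_position_mean_quality(records, offset: int = 33) -> list[float]:
    """Mean Phred score at each read position (reads may differ in length)."""
    totals: list[int] = []
    counts: list[int] = []
    for rec in records:
        for i, q in enumerate(phred_scores(rec.qual, offset)):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\n##II\n")
    print(per_position_mean_quality(read_fastq(demo)))  # [21.0, 21.0, 40.0, 40.0]
```

In Phred+33, `I` decodes to 40 and `#` to 2, which is why the first two positions average to 21.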

Analysis

DNA

Program	Description	Purpose	Source
AUGUSTUS	Ab initio, trainable gene prediction in eukaryotic genomic sequences.	Gene Prediction	Download
BUSCO	Assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.	Assembly Quality Assessment	Download
Circlator	Predict and automate assembly circularization and produce accurate linear representations of circular sequences.	Circularize Genome	Download
Clustal	Fast and scalable multiple sequence alignment (can align hundreds of thousands of sequences in hours).	MSA	Download
Galaxy	Web portal for accessible, reproducible, and transparent computational research.	Analysis package	Download
HOMER	HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.	Prediction and analysis	Download
HMMER	Searches sequence databases for sequence homologs and builds sequence alignments using profile hidden Markov models.	Detect Homologs	Download
HTSeq	HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.	Analysis Package	Clone Download
Mauve	A system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion.	Genome Aligner	Download
Mothur	Expandable software to fill the bioinformatics needs of the microbial ecology community.	Microbial Ecology Pipeline	Download
MUMmer Package	Ultra-fast alignment of large-scale DNA and protein sequences. A system for rapidly aligning entire genomes, whether in complete or draft form. MUMmer is a suffix tree algorithm designed to find maximal exact matches of some minimum length between two input sequences. NUCmer performs standard DNA sequence alignment; it is a robust pipeline that allows multiple reference and multiple query sequences to be aligned in a many-vs-many fashion. PROmer is like NUCmer with one exception: all matching and alignment routines are performed on the six-frame amino acid translation of the DNA input sequence.	Genome Aligner	Download
MUSCLE	MUSCLE can align hundreds of sequences in seconds.	MSA	Download
Picard	Set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.	HTS Toolkit	Download
QIIME	Bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication quality graphics and statistics.	Microbial Ecology Pipeline	Install
QUAST	Evaluates genome assemblies.	Evaluate Genome Assemblies	Download
T-Coffee	A multiple sequence alignment package that can align sequences (Protein, DNA, and RNA) or combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee). It is also able to combine sequence information with protein structural information (3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary structures.	MSA	Download
ViennaRNA Package	Programs for the prediction and comparison of RNA secondary structures.	Prediction	Download
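Artemis and PROmer both work with the six-frame translation of a DNA sequence. Below is a minimal Python sketch of that translation, assuming the standard genetic code (NCBI translation table 1), with `*` for stop codons and `X` for codons containing ambiguous bases. The function names are hypothetical, and the labels for the reverse-strand frames follow one common convention; tools differ on this.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest over "TCAG"; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is dropped,
    and codons containing ambiguous bases (e.g. N) become "X"."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of the forward strand;
    frames -1..-3 start at offsets 0..2 of the reverse complement."""
    rc = reverse_complement(seq)
    frames: dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Building the table from `itertools.product` over the canonical 64-letter string avoids typing out every codon by hand.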

PacBio Sequencing

Program	Description	Purpose	Source
BLASR	PacBio® long read aligner	Sequence Aligner	Download
Canu	Fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).	Genome Assembly	Download
Celera Assembler	Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler, and can use any combination of platform reads.	Genome Assembly	Download
Cerulean	Cerulean extends contigs assembled using short read datasets like Illumina paired-end reads using long reads like PacBio RS long reads.	Hybrid Assembly	Download
PBSuite	PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.	Reference Mapping/Variant Calling	Download
SMRT Analysis	Self-contained software suite designed for use with Single Molecule, Real-Time (SMRT) Sequencing data.	Analysis Package	Download
SPAdes	Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads.	Hybrid Assembly	Download
Sprai	Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly.	Sequencing Error-correction	Download
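PBHoney is described above as using soft-clipped tails of long reads as a structural-variant signal. Soft clips are recorded in the CIGAR string of each SAM/BAM record, so a minimal Python sketch of extracting them might look like the following. This is not PBHoney's algorithm, only the kind of CIGAR bookkeeping such an approach needs; `has_long_soft_clip` and its 100 bp default threshold are hypothetical, chosen purely for illustration.

```python
import re

# SAM CIGAR operations: M I D N S H P = X
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases, per the SAM specification.
_REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    if cigar == "*":  # CIGAR unavailable (e.g. unmapped read)
        return []
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def soft_clipped_tails(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths, skipping hard clips,
    which the SAM spec only allows at the very ends."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in _REF_CONSUMING)


def has_long_soft_clip(cigar: str, min_len: int = 100) -> bool:
    """Hypothetical filter: flag alignments with a long unaligned tail."""
    return max(soft_clipped_tails(cigar)) >= min_len


if __name__ == "__main__":
    cigar = "5H20S100M3D50M300S"
    print(soft_clipped_tails(cigar), reference_span(cigar), has_long_soft_clip(cigar))
```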

Illumina Sequencing

Referenced

Program	Description	Purpose	Source
Bowtie2	An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.	Reference Aligner	Download
BWA	Mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.	Reference Mapping	Download
HISAT2	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome).	Whole-Genome Mapping	Clone
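BWA is the Burrows-Wheeler Aligner, and Bowtie2 likewise builds an FM index over the Burrows-Wheeler transform (BWT) of the reference. A toy Python sketch of the core idea: build the BWT from a sorted suffix array, then use backward search to find exact matches. Real aligners use compressed, sampled structures and tolerate mismatches and gaps; the construction here is naive (it materialises every suffix) and is only meant for short strings. `FMIndex` and `locate` are hypothetical names.

```python
from collections import Counter


def suffix_array_and_bwt(text: str) -> tuple[list[int], str]:
    """Naive construction: sort all suffixes of text + "$" (short strings only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is "$" when i == 0
    return sa, bwt


class FMIndex:
    def __init__(self, text: str):
        self.sa, self.bwt = suffix_array_and_bwt(text)
        counts = Counter(self.bwt)
        # c[ch] = number of BWT characters that sort before ch
        self.c: dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.c[ch] = total
            total += counts[ch]
        # occ[ch][i] = occurrences of ch in bwt[:i]
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, b in enumerate(self.bwt):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + (b == ch)

    def locate(self, pattern: str) -> list[int]:
        """Backward search: return sorted start positions of exact matches."""
        if not pattern:
            return []
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for ch in reversed(pattern):
            if ch not in self.c:
                return []
            lo = self.c[ch] + self.occ[ch][lo]
            hi = self.c[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


if __name__ == "__main__":
    index = FMIndex("GATTACA")
    print(index.bwt, index.locate("A"), index.locate("TA"))
```

Each step of backward search narrows the suffix-array interval by one pattern character, so lookup cost depends on the pattern length rather than the reference length.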

De novo

Program	Description	Purpose	Source
ABySS	De novo, parallel, paired-end sequence assembler designed for short reads and large genomes.	Genome Assembly	Download Install
ALLPATHS-LG	Short-read assembler that works on both small and large (mammalian-size) genomes.	Genome Assembly	Download
DISCOVAR	Genome assembler and variant caller.	Genome Assembly	Download
SOAPdenovo	Novel short-read assembly method that can build a de novo draft assembly for human-sized genomes.	Genome Assembly	Download
SPAdes	Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads.	Genome/Hybrid Assembly	Download
Velvet	Short read de novo assembler using de Bruijn graphs.	Genome Assembly	Download
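Velvet's description mentions de Bruijn graphs (ABySS, SOAPdenovo and SPAdes are also de Bruijn graph assemblers). A minimal Python sketch of the idea, assuming error-free reads from a single strand: nodes are (k-1)-mers, each k-mer adds an edge, and non-branching paths are merged into unitigs. It ignores reverse complements, k-mer multiplicity, error correction, and isolated cycles, all of which real assemblers handle; the function names are hypothetical.

```python
from collections import defaultdict


def build_debruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix.
    Duplicate k-mers collapse into one edge (multiplicity is not tracked)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Merge maximal non-branching paths into contig sequences."""
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def is_internal(node: str) -> bool:  # exactly one edge in and one edge out
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in nodes:
        if is_internal(node):
            continue  # internal nodes are reached by walking from a path start
        for succ in graph.get(node, ()):
            contig = node + succ[-1]
            while is_internal(succ):
                succ = next(iter(graph[succ]))
                contig += succ[-1]
            contigs.append(contig)
    return sorted(contigs)


if __name__ == "__main__":
    # Two overlapping reads collapse into one contig.
    print(unitigs(build_debruijn(["ACGTC", "GTCAT"], k=4)))
```

A branch in the graph (for example two reads that diverge after a shared prefix) ends a unitig, which is one reason real assemblies come out fragmented into contigs.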

RNA-Seq

Program	Description	Purpose	Source
Ballgown	A program for computing differentially expressed genes in two or more RNA-seq experiments, using the output of StringTie or Cufflinks. The Ballgown package provides functions to organize, visualize, and analyze expression measurements.	Differential Expression	Clone Bioconductor
Cufflinks	Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.	Transcriptome Assembly	Clone
DESeq2	The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models.	Differential Expression	Clone Bioconductor
edgeR	Differential expression analysis of RNA-seq expression profiles with biological replication. It can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE.	Differential Expression	Bioconductor
HISAT2	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome).	Transcriptome Mapping	Clone
HTSeq	HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.	Analysis Package	Clone Download
STAR	Ultrafast universal RNA-seq aligner.	RNA-seq Aligner	Clone
StringTie	StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.	Transcriptome Assembly	Clone
Trinity	Trinity assembles transcript sequences from Illumina RNA-Seq data.	Transcriptome Assembly	Download
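Before DESeq2 fits its negative binomial models, it corrects for sequencing depth using size factors estimated by the median-of-ratios method. A minimal pure-Python sketch of that estimate, assuming a small gene-by-sample count table held in a dict: for each sample, take the median over genes of the log ratio between the gene's count and its geometric mean across samples, skipping genes with a zero in any sample. This mirrors the default estimator in spirit; it is not DESeq2's code, and `size_factors` and `normalize` are hypothetical names.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample (column)."""
    if not counts:
        raise ValueError("empty count table")
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for gene, row in counts.items():
        if len(row) != n_samples:
            raise ValueError(f"gene {gene!r} has {len(row)} counts, expected {n_samples}")
        if any(c == 0 for c in row):
            continue  # log geometric mean would be -inf, so the gene is excluded
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return {gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()}


if __name__ == "__main__":
    # Sample 2 behaves as if it were sequenced twice as deeply as sample 1.
    table = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 7]}
    print(size_factors(table))
    print(normalize(table)["geneA"])
```

Using the median makes the estimate robust to a minority of genes that are genuinely differentially expressed.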

Single Cell

Program	Description	Purpose	Source
Celda	Bayesian hierarchical modeling for clustering Single Cell RNA-Seq Data.	Differential Expression	Clone
cellTree	This package computes a Latent Dirichlet Allocation (LDA) model of single-cell RNA-seq data and builds a compact tree modelling the relationship between individual cells over time or space.	Visualization	Bioconductor
Chromium Single Cell Software Suite	Package for analyzing and visualizing single cell 3’ RNA-seq data produced by the 10x Chromium Platform. Cell Ranger (Pipelines) is a set of analysis pipeline tools that perform sample demultiplexing, barcode processing, and single cell 3’ gene counting. Loupe™ Cell Browser is an interactive desktop application that helps find significant genes, cell types, and substructure within your single cell data. Cell Ranger (R Kit) is an R package for secondary analysis of Cell Ranger matrix data, including PCA and t-SNE projection, and k-means clustering.	Analysis Package	Clone Download
Pagoda	Framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells.	Pathway/Gene Set Analysis	Clone
SCDE	The SCDE package implements a set of statistical methods for analyzing single cell RNA-seq data, including differential expression analysis and pathway and gene set overdispersion analysis (PAGODA).	Differential Expression	Clone Download
Seurat	R package designed for QC, analysis, and exploration of single cell RNA-seq data.	Differential Expression	Clone Install
SPRING	SPRING is a kinetic interface tool for uncovering high-dimensional structure in single cell gene expression data.	Visualization	Clone Visit
Monocle	An analysis toolkit for single cell RNA-seq that performs differential expression and time-series analysis for single cell expression experiments.	Differential Expression	Clone Bioconductor
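Seurat's default normalization (LogNormalize) divides each cell's counts by that cell's total, multiplies by a scale factor (10,000 by default), and applies a natural-log `log1p`. A minimal pure-Python sketch of that transform, assuming a list of per-cell count vectors with genes in a fixed order; `log_normalize` is a hypothetical name, returning zeros for an empty cell is a choice made for this sketch, and real pipelines operate on sparse matrices.

```python
import math


def log_normalize(cell_counts: list[list[int]], scale_factor: float = 10_000.0) -> list[list[float]]:
    """For each cell: log1p(count / total_counts_in_cell * scale_factor)."""
    normalized = []
    for counts in cell_counts:
        total = sum(counts)
        if total == 0:  # sketch choice: an empty cell maps to all zeros
            normalized.append([0.0] * len(counts))
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in counts])
    return normalized


if __name__ == "__main__":
    # The first two cells differ only in library size, so they normalize identically.
    cells = [[1, 1, 2], [2, 2, 4], [0, 0, 0]]
    for row in log_normalize(cells):
        print([round(v, 3) for v in row])
```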