biomagician

Collection of papers and tools that are helpful for bioinformatic & biostatistic analysis.

Tutorials

Category	Name	Description	Link
Training collection	SIB	A curated list of bioinformatics training material	861
Tutorial	Python Tutorial	Python Tutorial	862

Containers

Category	Name	Description	Link
Dockerfile	Singularity in Docker	The resulting Docker image can be used on any system with Docker to build Singularity images	710
Tutorial	Singularity	Containerization	711
Hub	SingularityHub	Encapsulation of Environments with Containers	712

Graph Databases aka Knowledgebases

Category	Name	Description	Link
Graph Platform	neo4j	is a graph database management system	476, 477
LBD	SemNet	provides an adoptable method for efficient Literature-Based-Discovery (LBD) of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care	478
Graph Database	Het	Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes	479
Graph Database	BioGraph	an online service and a graph DB for querying and analyzing bioinformatics resources	481, 482
Graph Database	edge2vec	Learning Node Representation Using Edge Semantics	483, 484
Graph Database	NGLY1 Deficiency Knowledge Graph	NGLY1 Deficiency Knowledge Graph, the reasoning context to support hypothesis discovery for NGLY1 Deficiency-CDDG	485, 486, 487
Graph Database	StarPepDB	is a Neo4j graph database resulting from an integration process by which data from a large variety of bioactive peptide databases are cleaned, standardized, and merged so that it can be released into an organized collection	488, 489
Knowledgebase	NeXtProt	is an integrative resource providing both data on human protein and the tools to explore these	557, 558
Graph Database	Cayley	is an open-source database for Linked Data. It is inspired by the graph database behind Google's Knowledge Graph (formerly Freebase)	559, 560
Tutorial	Neo4j	Importing CSV Files in Neo4j	791
Tutorial	Neo4j	Getting Started with Graph Embeddings in Neo4j	792

Databases

Category	Name	Description	Link
WGS	GTDB-Tk	GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes. It is computationally efficient and designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.	10, 11 112
16S	pbHITdb	PharmaBiome manually curated HITdb	12
HumanMicrobiome R Data	curatedMetagenomicData	Dataset that can be loaded into R which contains human microbiome data from several body sites	35 229 230
HMP 16S	HMP16SData	R/Bioconductor package to simplify access to and analysis of HMP 16S data	63 64
tRNA	GtRNAdb	The genomic tRNA database contains tRNA gene predictions made by tRNAscan-SE on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature.	75
16S Database	EzBioCloud 16S	Unlike other public databases, EzBioCloud’s 16S database can be used for species-level identification of OTUs and is freely available for academic, not-for-profit purposes	90 91
WGS core gene database	UBCG	UBCG stands for the Up-to-date Bacterial Core Gene. It is a method and software tool for inferring phylogenetic relationship using bacterial core gene set that is defined by up-to-date bacterial genome database.	94 95
Mouse gut gene catalog	iMGMC	integrated Mouse Gut Metagenomic Catalog	98 99
WGS	Paper	737 WGS from high-throughput culturomics	109 110
Database	MicrobiomeDB	A data-mining platform for interrogating microbiome experiments	113
Database	MGnify	Public Datasets of Metagenomic samples and 16S data of various clinical studies (UHGG,UHGP)	114, 115, 495, 496, 497, 603
Database	dbBact	Microorganisms Knowledge Database	117
Database	BIGSI	BIGSI can search a collection of raw (fastq/bam), contigs or assembly for genes, variant alleles and arbitrary sequence. It can scale to millions of bacterial genomes requiring ~3MB of disk per sample while maintaining millisecond kmer queries in the collection	124 125 126
Database	GutCyc	GutCyc is a publicly-available and licenced resource and portal providing pathway annotation data for environmental metagenomic samples derived from the metagenomic studies of the human gut.	134 135 136
Database	miBC	This collection includes all cultivable bacterial strains isolated from the intestine of mice (Mus musculus) that are publicly available to date.	141 142
Ortholog Database	OrthoMCL DB	is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families.	148
Database	KiMoSys	Data repository for KInetic MOdels of biological SYstems	150
Database	BiGG Database	BiGG Models is a knowledgebase of genome-scale metabolic network reconstructions. BiGG Models integrates more than 70 published genome-scale metabolic networks into a single database with a set of standardized identifiers called BiGG IDs. Genes in the BiGG models are mapped to NCBI genome annotations, and metabolites are linked to many external databases (KEGG, PubChem, and many more).	151
Database	embl_gems	This is a collection of genome-scale models built for all reference and representative bacterial genomes of NCBI RefSeq (release 84) using CarveMe	160
Database	BioCyc	BioCyc is a collection of 14560 Pathway/Genome Databases (PGDBs), plus software tools for exploring them	168
Database	MetaCyc	MetaCyc is a curated database of experimentally elucidated metabolic pathways from all domains of life. MetaCyc contains 2666 pathways from 2960 different organisms	169
Database	KEGG	A set of annotation maps for KEGG assembled using data from KEGG	171 172
Database	VMH	The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data	176 177
Database	ggkbase	an online database that offers users several options for retrieving data of interest: by projects, names, description, by genome completion or class	189
Database Cohorts	IGGdb	integrated genomes from the gut microbiome and other environments	192 193
Database	Genome Properties	Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome	215 216 217 218
Database	YANA	a software tool for analyzing flux modes, gene-expression and enzyme activities	219
Database	Clinical Trials	is a database of privately and publicly funded clinical studies conducted around the world	222
Database	microcontax	R package of microclass: The consensus taxonomy for prokaryotes is a package of data sets designed to be the best possible for training taxonomic classifiers based on 16S rRNA sequence data	231 232
Database	ExperimentHub	ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed.	252 253
Database	curated MetagenomicData	Bioconductor package with thousands of curated metagenome datasets based on the ExperimentHub publication	257, 258
Database	Knomics-Biota	Online service for exploratory analysis of human gut metagenomes	265 266
Database	Terra	Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate.	267
Knowledgebase	Grakn	Grakn is an intelligent database: a knowledge graph engine to organise complex networks of data and make it queryable	268 269 270 271
Database	HiMapDB	HiMAP database contains more unique species and strains than any major database	272
Database	HGTree	an explicit evolutionary approach that is generally considered to be a reliable way to detect HGT	276 277
Database	Pfam	a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)	281
Database	ISFinder	provides a list of insertion sequences (IS) isolated from bacteria and archaea (MGEs)	313
Database	ICEberg2.0	an updated database of bacterial integrative and conjugative elements	318 319
Database	microscope	Microbial Genome Annotation & Analysis Platform	329
Database	CARD	Comprehensive Antibiotic Resistance Database that is used to identify resistance genes (used in seres patent)	335
Database	Raes Reference Genomes	Reference genomes from HMP project but filtered and assembled by Raes lab as new resource	358
Database	proGenomes	Curated database by Sunagawa with genomes and very good functional annotation of bacteria and archaea	371, 372
Database	PATRIC	the Pathosystems Resource Integration Center, provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases	416
Database	FARMEDB	is a database of DNA and protein sequences derived exclusively from environment sequences showing AR in laboratory experiments. The Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences, predicted protein sequences conferring antibiotic resistance and additional regulatory and mobile genetic elements and predicted proteins flanking the antibiotic resistant genes	442, 443
Database	VMH	The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data	474
Database	MetaNetX	Automated Model Construction and Genome Annotation for Large-Scale Metabolic Networks	475
Database	Microbiome Database (old Integrated Gene Catalogue)	Microbiome database involves the sequencing resource and metadata of ecological community samples of microorganisms, including both host-associated or environmental microbes. This database provides detailed and accurate metadata of these metagenomics samples, as well as gene catalogs for host-associated microbiome, and moreover, well-characterized isolated strains can be found in our database too	490, 491
Database	Human Gut metabolic Models	Human curated database by Raes lab to link pathway identifiers to metabolic functions which can be used for metagenomic samples to get metabolic functions	510
Database	CAZy	The Carbohydrate-Active enZYmes (CAZy) database describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds.	18, 19
Database	ImmeDB	Intestinal microbiome mobile element is a database dedicated to the collection, classification, and annotation of mobile genetic elements (MGEs) from gut microbiome	595, 596
LIMS	openBIS	open source Laboratory Notebook & Inventory manager	707
Database	probeBase	probeBase is a curated database of rRNA-targeted oligonucleotide probes and primers	724
Database	bugsigdb	A Comprehensive Database of Published Microbial Signatures	766, 767
Webapp	GMGC	Global Microbial Gene Catalog	772, 773
Webapp	MAP	The Microbe Atlas Project aims to shed new light on the ecology of these elusive microbes by leveraging the large amounts of sequenced microbial communities	821
Database	proGenomes2	an improved database for accurate and consistent habitat, taxonomic and functional annotations of prokaryotic genomes	851

Bioinformatics

Category	Name	Description	Link
Bioinformatic Tools	OmicTools	Collection of many tools that can be useful for bioinformatic analyses	4
16S pipeline	Gloor Lab dada2 pipeline	This pipeline will take your paired fastq reads (from Illumina MiSeq or HiSeq) and generate an OTU counts table with an approximate taxonomy assignment. The reads have to have been generated using Gloor Lab Illumina SOP so that the reads are paired, overlapping, and contain the barcode and primer information (have not been demultiplexed or had primers or barcodes removed).	8
Metagenomics	SingleM	SingleM is a tool to find the abundances of discrete operational taxonomic units (OTUs) directly from shotgun metagenome data, without heavy reliance on reference sequence databases. It is able to differentiate closely related species even if those species are from lineages new to science.	13
Gene annotation	Pulpy	An automated, reproducible and scalable prediction of Polysaccharide Utilisation Loci (PUL) in 5414 public Bacteroidetes genomes. The predictions are fully open and can be accessed and used by any researcher, commercial or otherwise.	17, 18, 19; preprint 20
16S pipeline	mare	The mare R package is an easy-to-use pipeline for microbiota analysis based on 16S-amplicon reads. It takes the raw reads, creates taxonomic tables, visualises the results, and finally identifies organisms significantly associated with variables of interest. For read processing, OTU clustering, and taxonomic annotation	32
WGS assembly pipeline	pgap	The official bacterial whole genome assembly pipeline of NCBI	33, 674
r-package	picante	Phylocom integration, community analyses, null-models, traits and evolution in R	39
tree-modeling	iq-tree	Fast and effective stochastic algorithm to reconstruct phylogenetic trees by maximum likelihood. IQ-TREE compares favorably to RAxML and PhyML in terms of likelihood while requiring a similar amount of computing time	45
modeling	PartitionFinder2	PartitionFinder2 is a program for selecting best-fit partitioning schemes and models of evolution for nucleotide, amino acid, and morphology alignments.	47
Function Prediction	PICRUST	Predicts functions of total genomes based on 16S sequences	49
Function Prediction	Tax4Fun	Predicts functions of total genomes based on 16S sequences	50
ML-classifier	MicroPheno	is a reference- and alignment-free approach for predicting the environment or host phenotype from microbial community samples based on k-mer distributions in shallow sub-samples of 16S rRNA data.	54, 55
OTU-generator	DiTaxa	alignment- and reference- free subsequence based 16S rRNA data analysis, as a new paradigm for microbiome phenotype and biomarker detection	56
OTU-generator	HmmUFOtu	An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies	58
OTU-generator	otu2ot	Oligotyping for R	59
Microbiomics SOP	Microbiome_helper	Microbiome Helper is a repository that contains several resources to help researchers working with microbial sequencing data	62
16S Pipeline	SeekDeep	is one command line program that contains several programs within that all combined together make up the SeekDeep targeted sequencing analysis pipeline	67, 68
R Package - ShinyApp	FastqCleaner	An interactive web application for quality control, filtering and trimming of FASTQ files.	81, 82
Preprocessing tool	fastp	A tool designed to provide fast all-in-one preprocessing for FastQ files mainly used to correct R1 and R2 reads for better merging	83, 84
Python tool	ncbi-genome-download	Some script to download bacterial and fungal genomes from NCBI after they restructured their FTP a while ago.	85
Pipeline	phyloFlash	phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic or transcriptomic dataset.	86
Tool	DUDE-Seq	DUDE-Seq: Fast, flexible, and robust denoising of nucleotide sequences	92, 93
Python tool	RAMBL	A tool for the assembly of full-length 16S genes in metagenomic shotgun data	100, 101
Classification tool	CAMITAX	Taxonomic assignment workflow based on multiapproach	105, 106
Docker container	speciesprimer	The SpeciesPrimer pipeline is intended to help researchers finding specific primer pairs for the detection and quantification of bacterial species in complex ecosystems	111
tool	EnaBrowserTools	enaBrowserTools is a set of scripts that interface with the ENA web services to download data from ENA easily, without any knowledge of scripting required	116
Toolkit	NCBI Toolkit	NCBI C++ Toolkit provides free, portable, public domain libraries with no restrictions on use, for Unix, MS Windows, and Mac OS platforms	119
tool	FastANI	Fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI)	120, 121
toolbox	EzBio tools	OrthoANI, UBCG and other useful tools for WGS analyses	122
data wrangling	Bioinformatics one-liners	Bash one-liners useful for bioinformatics	133
web-workbench	imngs	Integrated Microbial NGS platform	143, 144
Pipeline	Roary	Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome.	147
Tool	OrthoFinder	It finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees.	149
Tool	CarveMe	CarveMe is a python-based tool for genome-scale metabolic model reconstruction.	152, 153, 154
Tool	SMETANA	Species METabolic interaction ANAlysis is a python-based command line tool to analyse microbial communities	155, 156
Tool	FRAMED	a python package for analysis and simulation of metabolic models. The main focus is to provide support for different modeling approaches	157 158 162
Tool	cobrapy	COBRA methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. cobrapy is a constraint-based modeling package that is designed to accommodate the biological complexity of the next generation of COBRA models and provides access to commonly used COBRA methods, such as flux balance analysis, flux variability analysis, and gene deletion analyses	159
Tool	GPRTransform	It contains an implementation of the method that transforms an SBML model by integrating the GPR associations directly into the stoichiometric matrix. This enables gene-based analysis using several constraint-based methods	163 164
Tool	eggnog-mapper	a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments	165 166
Pipeline	miQTL-cookbook	This is the cookbook for performing the GWAS analysis of microbial abundance based on analysis of 16S rRNA sequencing dataset	167
Tool	DuctApe	The final purpose of the program is to combine the genomic information (encoded as KEGG pathways) with the results of phenomic experiments (Phenotype Microarrays) and highlight the genes that may be responsible for phenotypic variations	170
Tool	VFFVA	FVA is the workhorse of metabolic modeling. It allows to characterize the boundaries of the solution space of a metabolic model and delineates the bounds for reaction rates	174 175
Pipeline	BACTpipe	Automatic assembly and annotation from raw reads in a cleanly implemented Nextflow pipeline	178
Pipeline	MAG core	Automatic assembly and annotation from raw reads of metagenomic data implemented in nextflow pipeline	179
Pipeline	Tychus Nextflow	Automatic whole genome assembly and annotation of isolate strain. Uses multiple assemblers and takes consensus	180
Pipeline	IMP	Reference-independent metagenomic and metatranscriptomic bacterial assembly	182, 183
Tool	DESMAN	de novo extraction of strains from metagenomes, enables strain inference from frequency counts on contigs across multiple samples	184 185
SOP	MicroBiome Quality Control (MBQC)	MBQC is a collaborative effort to comprehensively evaluate methods for measuring the human microbiome	187
Pipeline	MIDAS	an integrated pipeline that leverages >30,000 reference genomes to estimate bacterial species abundance and strain-level genomic variation, including gene content and SNPs, from shotgun metagenomes	196 197
Tool	MAGpurify	algorithms to identify contamination in metagenome-assembled genomes (MAGs)	198
Tool	MicrobeCensus	a fast and easy to use pipeline for estimating the average genome size (AGS) of a microbial community from metagenomic data	199
Tool	IGGsearch	it accurately quantifies species presence-absence and species abundance by mapping reads to a database of species-specific marker genes	200
Tool	MIDAS-strains	Estimate strains from reads mapped to pan-genomes from the MIDAS database	201
Tool	AssemblyEvaluator	Evaluate the completeness and precision of a (meta)genomic assembly by mapping contigs to a complete reference genome	202
Tools	Biobakery Workflows	Set of tools by Huttenhower that can be fairly easily executed with pre-defined workflows, useful for metagenomics and metatranscriptomics	204
Tools	Anvi'o	Anvi’o is an open-source, community-driven analysis and visualization platform for ‘omics data	208 209 210 211
Tool	WAFFLE	the Workflow to Annotate Assemblies and Find Lateral Gene Transfer (LGT) Events	212
Tool	AUTOGRAPH	AUtomatic Transfer by Orthology of Gene Reaction Associations for Pathway Heuristics, is a semi-automatic approach to accelerate the process of genome-scale metabolic network reconstruction by taking full advantage of already manually curated networks	214
Tool	pyTARG	a library that contains functions to work with Genome Scale Metabolic Models with the goal of finding drug targets against cancer	223 224
Assembler	Unicycler	An assembler for short and long read hybrid assembly; it builds a short-read assembly with SPAdes and then uses the long reads to resolve it.	227
R package	microclass	an R-package for 16S taxonomy classification	231 232
Tool	Prodigal	Fast, reliable protein-coding gene prediction for prokaryotic genomes	233 234
Tool	STAMP	a graphical software package that provides statistical hypothesis tests and exploratory plots for analysing taxonomic and functional profiles	235 236
Tool	CheckM	an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes	237 238
R script	consenTRAIT	Phylogenetic conservatism of functional traits in microorganisms. A phylogenetic metric that estimates the clade depth where organisms share a trait	239 240
NIH Tools	NIH Genome Informatics Section	Tools for various bioinformatic tasks, assembly, Mash, metagenomes, Krona, MUMmer alignment	242
R package	mmgenome	Tools for extracting individual genomes from metagenomes	243 244
Tool	SPAdes	St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines	254,365
Tool	SqueezeMeta	a fully automated metagenomics pipeline, from reads to bins	261 262
Tool	MetaWRAP	a flexible pipeline for genome-resolved metagenomic data analysis	263 264
R Package	HiMap	High-resolution Microbial Analysis Pipeline to Strain level with dada2 and curated HiMapDB	273 274
Research Group	van nimwegenlab	a range of software tools, web-services, and databases in regulatory and comparative genomics for WGS	275
Tool	Rnammer	predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences	278
Tool	RANGER-DTL	Rapid ANalysis of Gene family Evolution using ReconciliationDTL is a software package for inferring gene family evolution by speciation, gene duplication, horizontal gene transfer, and gene loss	279
Tool	Darkhorse	a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis	280
Tool	ABRicate	Mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: Resfinder, CARD, ARG-ANNOT, NCBI BARRGD, NCBI, EcOH, PlasmidFinder, Ecoli_VF and VFDB	286 334
Tool	MetaCompare	MetaCompare is a computational pipeline for prioritizing resistome risk by estimating the potential for ARGs to be disseminated into human pathogens from a given environmental sample based on metagenomic sequencing data	287
Tool	DeepARG	DeepARG is a machine learning solution that uses deep learning to characterize and annotate antibiotic resistance genes in metagenomes	288
Tool	SSTAR	Sequence Search Tool for Antimicrobial Resistance combines a locally executed BLASTN search against a customizable database with an intuitive graphical user interface for identifying antimicrobial resistance (AR) genes from genomic data	289 290
Tool	ProtCNN ProtENN	Predicting the function of a protein from its raw amino acid sequence is the critical step for understanding the relationship between genotype and phenotype	295
Benchmarking	Long-read-assembler-comparison	Benchmarking of long-read assembly tools for bacterial whole genomes	298
conda	bioconvert	is a collaborative project to facilitate the interconversion of life science data from one format to another	299
Tool	bin3C	Extract metagenome-assembled genomes (MAGs) from metagenomic data using Hi-C	303 304
Tool	MAGpy	Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs) (pronounced mag-pie)	305 306
Tool	graftM	a tool for scalable, phylogenetically informed classification of genes within metagenomes	307 308
Tool	GFinisher	a tool for refinement and finalization of prokaryotic genomes assemblies using the bias of GC Skew to identify assembly errors and organizes the contigs/scaffolds with genomes references	311 312
Tool	Autometa	automated extraction of microbial genomes from individual shotgun metagenomes	314 315
Tool	iMGEins	detecting novel mobile genetic elements inserted in individual genomes (MGEs)	316 317
Tool	McClintock	an Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data (MGEs)	320 321
Webtool	PHASTER	a better, faster version of the PHAST phage search tool	322 323
Tool	ISQuest	identifies bacterial ISs and their sequence elements—inverted and direct repeats—in raw read data or contigs using flexible search parameters (MGEs)	324 325
Tool	VirSorter	mining viral signal from microbial genomic data	326 327
Tool	RAST	(Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating complete or nearly complete bacterial and archaeal genomes	329 330
Tool	ShortBRED	Tool by Huttenhower group that identifies protein families in metagenomic samples. Useful for protein profiling??	336
Tool & R package	GSEA	Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes)	337 338
Tool, Database	GMMs Omixer	Tool with curated database by raes lab that links metagenomic samples to functions and metabolic capabilities	342, 343, 344, 523
Tool	GRASP2	fast and memory-efficient gene-centric assembly and homolog search for metagenomic sequencing data	345, 346
Tool	Picrust2	a software for predicting functional abundances based only on marker gene sequences	347, 348
Pipeline	Antimicrobial Resistance Finder	Nextflow pipeline to identify antimicrobial resistance protein sequences, looks simple to use	350
Tool	Geptop2	a gene essentiality prediction tool for complete-genome based on orthology and phylogeny	351, 352
Tool	Asgan	[As]sembly [G]raphs [An]alyzer – is a tool for analysis of assembly graphs	353
Tool	PopCOGenT	Identifying microbial populations using networks of horizontal gene transfer	355
Tool	PhiSpy	a novel algorithm for finding prophages in bacterial genomes that combines similarity-and composition-based strategies	356, 357
Tool	MetaCurator	Software for curating reference sequence databases used in barcoding, metabarcoding and metagenomics	359, 360
Tutorial	astrobiomike	This site aims to be a useful resource for bioinformatics beginners	361,362
Tool	(sour)Mash	fast genome and metagenome distance estimation using MinHash	363,364
Tool	(meta)plasmidSPAdes	for plasmid assembly in metagenomic data sets that reduces the false positive rate of plasmid detection compared with the state-of-the-art approaches	364,365
Tool	IslandViewer4	integrates four different genomic island prediction methods: IslandPick, IslandPath-DIMOB, SIGI-HMM, and Islander	366,367
Tool, Server	Specl	Web server (but also stand-alone tool) to determine species classification of whole genome based on ~40 universal single copy marker genes.	370
Tool	iRep	is a method for determining replication rates for bacteria from single time point metagenomics sequencing and draft-quality genomes	374,375
Tool	antiSMASH	allows the rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genomes	376,377,378
Tool	NeuRiPP	a neural network framework designed for classifying peptide sequences as putative precursor peptide sequences for RiPP biosynthetic gene clusters	379,380
Tool	PhyloMagnet	Pipeline for screening metagenomes, looking for arbitrary lineages, using gene-centric assembly methods and phylogenetics	381,382
Tool	KrakenUnique	Kraken based tool for classifying metagenomic reads with an additional algorithm that checks for unique Kmer matches - maybe similar to cosmosID approach	383
Tool	Mash	Tool for classifying metagenomic reads similar to kraken which uses min Hash to identify species	384
Tool	RefSeq_mash	Tool for checking what NCBI reference genomes raw reads match to or overall which reference genome fits the best, should be very fast.	385
Pipeline	Hybrid Assembler	Hybrid assembly pipeline in Nextflow that is coupled with plasmIDent, which identifies plasmids and resistance genes	390, 391
Tool	RMI	Comprehensive antimicrobial resistance (AMR) gene finder tool online for quick analysis of genome sequences	392
Pipeline	SqueezeMeta	A full automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis	394, 395
Review	Identifying repeats and transposable elements	Nice Nature review that describes various software for finding these things, but a bit outdated	395
Tool	ARDaP	(Antimicrobial Resistance Detection and Prediction) is a genomics pipeline for the comprehensive identification of antibiotic resistance markers from whole-genome sequencing data	399
Tool	Flye	New long read assembler that is faster and often better than others, published by UCSD	400
Tool	Ra	Overlap-layout-consensus based DNA assembler of long uncorrected reads (short for Rapid Assembler)	403, 404
Tool	Metagenomics-Index-Correction	This repository contains scripts used to prepare, compare and analyse metagenomic classifications using custom index databases, either based on default NCBI or GTDB taxonomic systems	405, 406
Tool	drep	a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set	407
Tool	strainProfiler	Program to analyze strain-level diversity within a population	408
Tool	seqtk	Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format	409, 410
Tool	anvio to bandage tools	converts output from Anvi'o, a MAG binning tool, to the coloring scheme preferred by Bandage, an assembly visual tool, to improve binning especially for mobile genes (transposons, recently horizontally transferred, etc.)	413
Tool	OPERA-MS	OPERA-MS is a hybrid metagenomic assembler which combines the advantages of short and long-read	414, 415
Tool	traitar	Traitar is a software for characterizing microbial samples from nucleotide or protein sequences. It can accurately phenotype 67 diverse traits.	418, 419, 420
Tool	PhyloRank	PhyloRank provides functionality for calculating the relative evolutionary divergence (RED) of taxa in a tree and for finding the best placement of taxonomic labels in a tree.	421
Tool	AnnoTree	is a web tool for visualization of genome annotations across large phylogenetic trees.	422, 423, 424
Tool	AMRfinderPlus	Antibiotic resistance gene finder from NCBI	425, 426, 678
Tool	nanotext	This library enables the use of embedding vectors generated from a large corpus of protein domains to search for similar genomes, where similar is the cosine similarity between one genome's vector and another's. Think about protein domains as words, genomes as documents, and search as a form of document retrieval based on the notion of topic.	427, 428, 453
Tool	biomartr	Download genomes from NCBI or other databases by specifying species or group name automatically in R	429
Tool	staramr	Tool in Bioconda that scans genomes against the PlasmidFinder, ResFinder, and PointFinder databases and then produces nice summary files with the results	430
Tool	TRF	Tandem Repeat Finder and Tandem Repeats Database (TRDB)	432, 433
Tool	MIST	a tool for rapid in silico generation of molecular data from bacterial genome sequences	434, 435
Tool	mummer	Visualization of correct alignment between genomes	436, 887, 888, 889
Tool	Dot2dot	accurate whole-genome tandem repeats discovery	437, 438
Tool	miComplete	An "easy" to use tool to quickly assess the completeness and quality of new genome assemblies, kind of like CheckM but with some tweaks	439
Tool, Database	ARO	Antibiotic resistance ontology database and webserver to quickly get phenotype information based on gene IDs	440, 441
Webapp	LINbase	a database designed for the purpose of accelerating and simplifying the description of Earth's microbial diversity at a precision that includes, but also goes beyond, named species	447, 448
R package	RbioRXN	facilitates retrieving and processing biochemical reaction data from sources such as Rhea, MetaCyc, KEGG and Unipathway. The package provides functions to download and parse data, instantiate generic reactions and check mass balance. It aims to construct an integrated metabolic network and genome-scale metabolic model	450
Tool	Mumame	Mutation Mapping in Metagenomes is a software tool that allows mapping of shotgun metagenomic reads to point mutations. Designed for Antibiotic Resistance mutations	451, 452
Tool	Cobra	Constraint-based reconstruction and analysis (COBRA) provides a molecular mechanistic framework for integrative analysis of experimental molecular systems biology data and quantitative prediction of physicochemically and biochemically feasible phenotypic states	460, 461, 462, 467
Tool	METABOLIC	(METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable high-throughput metabolic and biogeochemical functional trait profiler based on microbial genomes	463, 464
Tool	PhenotypeSeeker	Identify phenotype-specific k-mers and predict phenotype using sequenced bacterial strains	465, 466
R-package	MetaboAnalystR	An R Package for Comprehensive Analysis of Metabolomics Data	468, 472, 473
Shiny-App	MetaboShiny	a novel R and RShiny based metabolomics data analysis package	469, 470, 471
Tool	micom	micom is a Python package for metabolic modeling of microbial communities	492, 493, 494
Tool	Struo	a pipeline for building custom databases for common metagenome profilers	498, 499
Tool	µbialSim	This is µbialSim (pronounced microbialsim), a dynamic Flux-Balance-Analysis-based simulator for complex microbial communities. Batch and chemostat operation can be simulated	500, 501
Tool	ConFindr	to find bacterial intra-species contamination in raw Illumina data. It does this by looking for multiple alleles of core, single copy genes.	507, 508, 722
Tool	MetaSanity	a wrapper-script for genome/metagenome evaluation tasks. This script will run common evaluation and annotation programs and create a BioMetaDB project with the integrated results	509
Tool	REAPR	From Sanger institute, it maps paired-end reads to de-novo assembly to check for assembly errors and can break up wrong scaffolds	511
Tool	Kaiju	Metagenomic read classification based on amino acid sequences. Gabi suggested that it works well	512
Tool	mOTU2	The mOTUs profiler is a computational tool that estimates relative abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.	513, 514
Tool	fetchMG	extracts the 40 universal single-copy marker genes (MGs) from genomes and metagenomes in an easy and accurate manner.	515
Tool	Metage2Metabo	is a Python3 (Python >= 3.6) tool to perform graph-based metabolic analysis starting from annotated genomes (reference genomes or metagenome-assembled genomes). It uses Pathway Tools in an automatic and parallel way to reconstruct metabolic networks for a large number of genomes	518, 519
R package	AMR	simplify the analysis and prediction of Antimicrobial Resistance (AMR)	520, 521, 878
Tool	GRASE	Genome Relative Abundance to Sequencing Effort (GRASE)	522
Tool	FMAP	Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies	524, 525, 526
Tool	ResPipe	A Nextflow pipeline for interrogating metagenomes for Antimicrobial Resistance Genes (CARD-based), Insertion Sequences and Enterobacteriaceae Plasmids	527, 528
Tool	epa-ng	A tool to place a sequence among an already calculated tree such as SILVA. Similar to pplacer	535
Tool	ngs-less	A toolbox for metagenomics analyses by Peer Bork at EMBL. Has MOCAT integrated with mOTUs and functional profiling	536
R package	Castor	Useful for calculating the relative evolutionary divergence (RED) of nodes in a tree with get_reds	537, 538
R package	themetagenomics	themetagenomics provides functions to explore topics generated from 16S rRNA sequencing information on both the abundance and functional levels. It also provides an R implementation of PICRUSt and wraps Tax4fun, giving users a choice for their functional prediction strategy	543, 544
Tool	prokka2kegg	This script is used to assign KO entries (K numbers in KEGG annotation) according to UniProtKB ID in the .gbk file generated by Prokka	546
Toolset	PAGIT	A set of tools from the Wellcome Sanger Institute to polish draft genomes and correct annotation	547
Tool	DFAST	a flexible and customizable pipeline for prokaryotic genome annotation as well as data submission to the INSDC	552, 553
Tool	DeepVariant	is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data	556
Tool	Apollo	Apollo is an assembly polishing algorithm that attempts to correct the errors in an assembly. It can take multiple sets of reads in a single run and polish the assemblies of genomes of any size	563, 564
Tool	Minipolish	A tool for Racon polishing of miniasm assemblies	566
Tool	AMON	A command line tool for predicting the compounds produced by microbes and the host	567
Tool	Coinfinder	A tool for the identification of coincident (associating and dissociating) genes in pangenomes	568, 569, 570
Tool	wtdbg2	Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT)	571, 572
Tool	freebayes	a haplotype-based variant detector	573, 574, 578
Tool	qualimap	to facilitate the quality control of alignment sequencing data and its derivatives like feature counts; like FastQC for WGS and MAGs	579, 580
Tool	picard	A set of Java command line tools from the Broad Institute for manipulating high-throughput sequencing (HTS) data and formats	581, 582
Tool	Diamond	is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data	583, 584
Tool	vcftools	a set of tools for working with the variant call format (VCF) and binary variant call format (BCF)	585, 586, 587
Tool	Gretel	An algorithm for recovering haplotypes from metagenomes	589, 590
Tool	Hansel	Computational haplotype recovery and long-read validation identifies novel isoforms of industrially relevant enzymes from natural microbial communities	591, 592
Tool	metabolisHMM	a tool for exploration of microbial phylogenies and metabolic pathways	593, 594
Tool	ConjScan	MacSyFinder-based detection of Conjugative elements using systems modelling and similarity search	597
Tool	MacSyFinder	A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems	598, 599, 600
Tool	LEMON	uses existing shotgun NGS datasets to detect HGT breakpoints, identify the transferred genome segments, and reconstruct the inserted local strain	601, 602
Tool	MMseqs2	Many-against-Many sequence searching is a software suite to search and cluster huge protein and nucleotide sequence sets	604, 605, 606
Pipeline	MicrobiomeBestPracticeReview	Current Challenges and Best Practice Protocols for Microbiome Analysis using Amplicon and Metagenomic Sequencing	607, 608
Tool	Medaka	is a tool to create a consensus sequence using neural networks from nanopore sequencing data	609, 610
Software	ARB	a graphically oriented package comprising various tools for sequence database handling and data analysis	611
Tool	Piphillin	a software package that predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases	613, 614
Tool	BlastFrost	a highly efficient method for querying 100,000s of genome assemblies. BlastFrost builds on the recently developed Bifrost, which generates a dynamic data structure for compacted and colored de Bruijn graphs from bacterial genomes	617, 618
Tool	BioNode	Command line tool for handy NGS data procedures, searching NCBI, downloading SRA stuff or handling fasta files.	622
Tool	Biopieces	Command line tool for a lot of NGS data procedures, fastq files, mapping, SNPs, etc. but has some dependencies...	623
Tool	GrabSeqs	Command line tool to download sequence files from SRA, iMicrobes, MG-rast easily	626
Tool	fARGene	(Fragmented Antibiotic Resistance Gene iENntifiEr) is a tool that takes either fragmented metagenomic data or longer sequences as input and predicts and delivers full-length antibiotic resistance genes as output	627, 628
Tool	GTDBTk-Script	various useful scripts related to GTDB	629
Tool	Cello	the code is parsed to generate a truth table, and logic synthesis produces a circuit diagram with the genetically available gate types to implement the truth table. The gates in the circuit are assigned using experimentally characterized genetic gates.	633, 634, 635
Tool	URMAP	The Ultra-fast Read Mapper (URMAP) provides fast, accurate read mapping with highly compressed output. It is ~10x faster than BWA and Bowtie with comparable accuracy on benchmark tests	636, 637
Tool	Artemis	The Artemis Software is a set of software tools for genome browsing and annotation	640
Tool	EDGAR 2.0	"Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" is an enhanced software platform for comparative gene content analyses	641, 642, 643
Tool	ASA3P	an automatic and scalable assembly, annotation and analysis pipeline for closely related bacterial genomes	644, 645, 646
Tool	BIGSdb	a software designed to store and analyse sequence data for bacterial isolates	647, 648, 649, 650
Tool	OrthoVenn2	is a web platform for comparison and annotation of orthologous gene clusters among multiple species	651, 652
Tool	genomeribbon	easy to use website to assess a genome assembly with raw reads, long reads and short reads	653
R package	FindMyFriends	Fast alignment-free pangenome creation and exploration	654, 655
R package	dadasnake	is a Snakemake workflow to process amplicon sequencing data, from raw fastq-files to taxonomically assigned "OTU" tables, based on the DADA2 method	660, 661
Tool	AMRtime	Metagenomic AMR detection using hierarchical machine learning models	662
Tool	panaroo	An updated pipeline for pangenome investigation	663, 664
Pipeline	TORMES	An automated pipeline for whole bacterial genome analysis of genomes and/or raw Illumina paired-end sequencing data, regardless of the number, origin or species	665, 666
Pipeline	ASAP3	Automatic Bacterial Isolate Assembly, Annotation and Analyses Pipeline	667, 668
Pipeline	nullarbor	Pipeline to generate complete public health microbiology reports from sequenced isolates	669
Pipeline	Bactopia	Bactopia is a flexible pipeline for complete analysis of bacterial genomes	670, 671
Pipeline	Common Workflow Language	an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments	673
Metric	bacterialEvolutionMetrics	Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries	675, 676
Tool	NGSpeciesID	is a tool for clustering and consensus forming of targeted ONT reads	677, 678
Catalogue	long-read-tools	A catalogue of long read sequencing data analysis tools	681
Tool	fARGene	Fragmented Antibiotic Resistance Gene iENntifiEr	682, 683
Pipeline	PathoFac	a pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data	684, 685
Tool	MFEprimer	a functional primer quality control program for checking non-specific amplicons, dimers, hairpins and other parameters	686, 687, 688
Pipeline	STRONG	STRONG resolves strains on assembly graphs by resolving variants on core COGs using co-occurrence across multiple samples	689, 690, 691, 704
Tool	NanoClust	De novo clustering and consensus building for ONT 16S sequencing data	694
Tool	mVIRs	a tool that locates integration sites of inducible prophages in bacterial genomes	697
Tool	Metagenome-Atlas	an easy-to-use metagenomic pipeline based on Snakemake. It handles all steps from QC, Assembly, Binning, to Annotation	698, 699, 700, 701
Tool	VIRify	a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies	702
Platform	BioContainers	is a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g. conda) and containers (e.g. docker, singularity)	705, 706
Tool	DeepMAsED	deep learning-based evaluation of the quality of metagenomic assemblies	708, 709
Tool	minMLST	a machine-learning based methodology for identifying a minimal subset of genes that preserves high discrimination among bacterial strains	713, 714
Tool	hAMRonization	CLI parser tools combine the outputs of disparate antimicrobial resistance gene detection tools into a single unified format	715
Tool	PPanGGOLiN	Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors	717, 718
Webtool	OGB	OpenGenomeBrowser is a dynamic and scalable web platform for comparative genomics	719, 720
Pipeline	Bakta	a tool for the rapid & standardized annotation of bacterial genomes & plasmids	721
Tool	MentaLiST	The MLST pipeline developed by the PathOGiST research group	725, 726
Webapp	TyphiNET	The TyphiNET dashboard collates antimicrobial resistance (AMR) and genotype (lineage) information extracted from whole genome sequence (WGS) data from the bacterial pathogen Salmonella Typhi, the agent of typhoid fever.	727
Webapp	Pathogenwatch	provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi. MLST prediction is available for over 100 species using schemes from PubMLST, Pasteur, and Enterobase	728
Tool	mlst	Scan contig files against traditional PubMLST typing schemes	729
Tool	snippy	Rapid haploid variant calling and core genome alignment	733
Tool	MUFFIN	a hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis.	734, 735, 736
Tool	Pandora	a tool for bacterial genome analysis using a pangenome reference graph (PanRG)	738, 739, 740
Tool	cgmlst	Fork of Torsten Seemann's excellent mlst tool modified for cgMLST	741
Tool	Phandango	a fully interactive tool to allow visualisation of a phylogenetic tree, associated metadata and genomic information such as recombination blocks, pan-genome contents or GWAS results	741, 742
R package	Enriched heatmap	is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions	747, 748
R package	Pagoo	is an encapsulated, object-oriented class system for analyzing bacterial pangenomes	752, 753, 754, 834
R package	simurg	Simulate a Bacterial Pangenome in R	754, 755
Nextflow	Porefile	a Nextflow full-length 16S profiling pipeline for ONT reads	757
Tool	MLSTar	R package that allows you to easily determine the Multi Locus Sequence Type (MLST) of your genomes	758, 759
Tool	MOB-suite	for clustering, reconstruction and typing of plasmids from draft assemblies	760, 761
Tool	PlasForest	a random forest classifier of contigs to identify contigs of plasmid origin in contig and scaffold genomes	763, 764
Tool	GMGC-mapper	Command line tool to query the Global Microbial Gene Catalog (GMGC)	774
Tool	MetaGraph	Ultra Scalable Framework for DNA Search, Alignment, Assembly of bacterial sequences	775, 776, 777, 778
Tool	MIND	Microbial Interaction Network Database	786
Pipeline	microPIPE	a pipeline for high-quality bacterial genome construction using ONT and Illumina sequencing	787
Tool	giraffe	variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods	795, 796
Tool	SquiggleKit	A toolkit for accessing and manipulating nanopore signal data	798, 799, 800
Tool	FlowerPlot	A Python 3.9+ function that makes flower plots for pangenomics	804
Tool	Poppunk	a tool for clustering genomes	807
Tool	PATO	an R package designed to analyze pangenomes (set of genomes) intra or inter species	810, 811
Tool	PanX	is a software package for comprehensive analysis, interactive visualization and dynamic exploration of bacterial pan-genomes	812
Tool	3mcor	Metabolome-Microbiome-Metadata-Correlation Analysis	814
Tool	GenAPI	a program for gene presence absence table generation for series of closely related bacterial genomes from annotated GFF files	829, 830
Tool	bammix	Summarise nucleotide counts at a set of positions in a BAM file to search for mixtures	835
Tool	Wolka	(Web of Life Toolkit App), is a bioinformatics package for shotgun metagenome data analysis	836, 837
Tool	ECTyper	is a standalone versatile serotyping module for Escherichia coli	838, 839
Tool	Serotypefinder	is a serotyping module for Escherichia coli	840, 841
Tool	SRST2	Short Read Sequence Typing for Bacterial Pathogens	842, 843
Tool	KEMET	a Python tool for KEGG Module evaluation and microbial genome annotation expansion (Metabolic)	844, 845
Tool	SIAMCAT	Statistical Inference of Associations between Microbial Communities And host phenoTypes	846, 847
Collection	EMBL	Microbiome Analysis Tools Developed at EMBL	848
Tool	BacDist	Snakemake pipeline for bacterial SNP distance, recombination and phylogenetic analysis	849
Tool	PacTyper	Snakemake pipeline for continuous clone type prediction for WGS sequenced bacterial isolates based on their core genome	850
Pipeline	CulebrONT	a streamlined long reads multi-assembler pipeline for prokaryotic and eukaryotic genomes	857, 858
Tool	gapseq	Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks	859, 860
Tool	MicrobiomeAnalysis	This package provides common methods for microbiome analysis	863, also see 852
Tool	MiMiC	proposes minimal microbial consortia from the functional potential of a given metagenomic sample	864, 865
Tool	PIRATE	identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds	867, 868
Tool	bacterial_strain_definition	Contains the code and workflow for the bacterial strain definition paper with Kostas Konstantinidis	869, 870
Tool	CheckM2	Rapid assessment of genome bin quality using machine learning	876
Tool	Gubbins	Genealogies Unbiased By recomBinations In Nucleotide Sequences	879, 880
Tool	SKA	a toolkit for prokaryotic DNA sequence analysis (phylogeny) using split kmers	881, 882
Tool	Mashtree	a rapid comparison of whole genome sequence files	883, 884
Pipeline	mGEMS	Bacterial sequencing data binning on strain-level based on probabilistic taxonomic classification	885, 886
Tool	D-GENIES	Dot plot large Genomes in an Interactive, Efficient and Simple way	893, 894, 895
Tool	nanotimeparse	parses an Oxford Nanopore fastq file on read sequencing start times found in the fastq headers	897
Tool	ClonalFrameML	package that performs efficient inference of recombination in bacterial genomes	899, 900
Tool	minidot	Quickly produce pretty dotplots from minimap mappings using R/ggplot2	903

Biostatistics

Category	Name	Description	Link
Data-Types	Microbiome Datasets Are Compositional: And This Is Not Optional	Why OTU tables need to be handled more carefully - They are compositional!	1
Compositional approach	CoDa	This directory contains the readings, materials, and examples for a workshop originally offered at the Exploring Human Host-Microbiome Interactions in Health and Disease 2016 conference.	6; wiki 7
Compositional approach	Frontiers_supplement.Rmd	The document is the supplement and companion to the "Microbiome datasets are compositional: and this is not optional." review article.	9
R package	CoDaSeq	This is the ongoing work to put together a complete suite of functions for CoDa analysis of microbiome, transcriptome and metagenome data	16
Compositional approach	PhILR	PhILR is short for “Phylogenetic Isometric Log-Ratio Transform”. This R package provides functions for the analysis of compositional data (e.g., data representing proportions of different variables/parts).	25 26
R package	PathoStat	The purpose of this package is to perform Statistical Microbiome Analysis on metagenomics results from sequencing data samples. In particular, it supports analyses on the PathoScope generated report files.	28
R package	microbiome	Tools for microbiome analysis; with multiple example data sets from published studies; extending the phyloseq class. The package is in Bioconductor and aims to provide a comprehensive collection of tools and tutorials, with a particular focus on amplicon sequencing data.	29
R package	phyloseq	phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.	30 31
Stat Comparing	DA test	Package to check various statistical methods to find "spike-ins" in 16S microbiome data	36
R Package	Mare	Promising easy microbiome analysis - find out what taxa correlate with certain metadata (Not validated yet)	41
R Package	PCAexplorer	Package to make interactive PCA plots in browser, originally for RNA-seq but maybe adaptable	43
R Package	Glimma	Interactive visualization of DEseq2 results, might be very helpful in exploration	44
R Package	CoDaSeq	Compositional Data Analysis Package written by Greg Gloor	53
dimensionality reduction	Adaptive gPCA	A method for structured dimensionality reduction	61
R Package	theseus	Add-on for phyloseq	62
R Package	decontam	implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls	65
Analysis Tutorial	Workflow by Holmes Lab	A nice tutorial presenting a suggested workflow for microbiome analysis by the Holmes lab	69
R Package	phyloseqGraphTest	Convenient and easy to use package for graphical testing with phyloseq objects	70
R Package	ccrepe	Compositionality Corrected by PErmutation and REnormalization (ccrepe) is a package for analysis of sparse compositional data. Specifically, it determines the significance of association between features in a composition, using any similarity measure (e.g. Pearson correlation, Spearman correlation, etc.)	77, 78
Network Analysis	NetShift	To visualize community shufflings in microbial association networks between healthy and diseased states and identify 'driver' nodes observed between the states.	79, 80
R Markdown	Differential Abundance tests Microbiome	Fairly well documented implementations of many different Differential Abundance tools, useful as a source of reusable functions.	87
Statistics Approach	Percentile-normalization method	A novel & easy way to deal with batch effects when comparing multiple experiments	88, 89
Software	Latent Variable Modeling for the Microbiome	Although probabilistic latent variable models are a cornerstone of modern unsupervised learning, they are rarely applied in the context of microbiome data analysis, in spite of the evolutionary, temporal, and count structure that could be directly incorporated through such models	107 108
Python	HAllA	Hierarchical All-against-All association testing (HAllA) is a computational method to find multi-resolution associations in high-dimensional, heterogeneous datasets	117
tutorial	Transformation vs Standardization	Data Standardization and Transformation	127
R package	BioCor	Calculates functional similarities based on the pathways described on KEGG and REACTOME or in gene sets. These similarities can be calculated for pathways or gene sets, genes, or clusters and combined with other similarities	130 131
R package	Phylofactor	The package phylofactor will help you break apart the phylogeny with a variety of contrasts & objective functions, summarize the splits, and visualize the tree.	137 138 139
R package	themetagenomics	provides functions to explore topics generated from 16S rRNA sequencing information on both the abundance and functional levels. It also provides an R implementation of PICRUSt and wraps Tax4fun, giving users a choice for their functional prediction strategy.	145 146
R package	selbal	an R package for selection of balances in microbiome compositional data. It implements a forward-selection method for the identification of two groups of taxa whose relative abundance, or balance, is associated with the response variable of interest	173
R package	microPop	a dynamic model based on a functional representation of different microbiota	225 226
Article	Networks for Microbiota Analysis	A nice summary of a lot of network theory and how it is used for microbiota analysis and what the open questions are	255
R package	metamicrobiomeR	implements Generalized Additive Model for Location, Scale and Shape (GAMLSS) with zero inflated beta (BEZI) family for analysis of microbiome relative abundance data (with various options for data transformation/normalization to address compositional effects) and random effect meta-analysis models for meta-analysis pooling estimates across microbiome studies	282 283
R package	metagenomeSeq	is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples	292
R package	PIME	a package for discovery of novel differences among microbial communities	300 301
R package	GLMM-MiRKAT	A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies	309 310
R package	MIMOSA	Model-based Integration of Metabolite Observations and Species Abundances	339, 340
Tool	new mmvec (old:rhapsody)	Neural networks for estimating microbe-metabolite co-occurrence probabilities	354
R package	BacArena	an open source software for simulating cellular communities. It combines agent-based modeling, flux balance analysis, and statistical analysis	503, 504, 542
Tool	BOFdat	is a three step workflow that allows modellers to generate a complete biomass objective function de novo from experimental data: Obtain stoichiometric coefficients for major macromolecules and calculate maintenance cost; Find coenzymes and inorganic ions; Find metabolic end goals	505, 506
R package	Corncob	beta-binomial regression on covariates - might be a nice statistical test on abundance data and variables of interest	531
R package	rtsne	T-Distributed Stochastic Neighbor Embedding (t-SNE) using a Barnes-Hut Implementation	539, 540, 541
R package	microbiomeDASim	A toolkit for simulating differential microbiome data designed for longitudinal analyses. Several functional forms may be specified for the mean trend	548
R package	MMUPHin	an R package for meta-analysis tasks of microbiome cohorts. It has function interfaces for: a) covariate-controlled batch- and cohort effect adjustment, b) meta-analysis differential abundance testing, c) meta-analysis unsupervised discrete structure (clustering) discovery, and d) meta-analysis unsupervised continuous structure discovery	549
R package	ReactomeGSA	uses Reactome's online analysis service to perform a multi-omics gene set analysis	550
R package	LinkHD	a general R software to integrate heterogeneous datasets focusing on microbial communities	554, 555
R, Python	Rest API	Fast Scalable Machine Learning API	576, 577
R package, Webapp	Metaboanalyst	a user-friendly, web-based analytical pipeline for high-throughput metabolomics studies	618, 619
R package	SIAMCAT	R package for easy microbiome analysis - confounder analysis - phenotype prediction - Zeller group	620
R package	Breakaway	R package providing functions for alpha diversity measurements	621
R package	seqgroup	The seqgroup R package offers a collection of functions that support the analysis of microbial sequencing data with a group structure	631, 632
R package	ranomaly	R package for statistical analyses and visualization of 16S data	656, 657
R package	RioNorm2	A Novel Normalization and Differential Abundance Test Framework for Microbiome Data	658, 659
R package	phylosmith	A conglomeration of functions that I have written, that I find useful, for analyzing phyloseq objects. Phyloseq objects are a great data-standard for microbiome and gene-expression data	692, 762
R package	MicEco	Various functions for analysis for microbial community data	693
R package	MaAsLin2	A comprehensive R package for efficiently determining multivariable association between phenotypes, environments, exposures, covariates and microbial meta’omic features	730, 731
R package	micropml	User-Friendly R Package for Supervised Machine Learning Pipelines	749, 750, 751
R package	shinyML	Compare Supervised Machine Learning Models Using Shiny App	789
R package	UMAP	Uniform Manifold Approximation and Projection for Dimension Reduction	793, 794
R package	MIMOSA2	summarizes paired microbiome-metabolome datasets to support mechanistic interpretation and hypothesis generation	813
R package	microViz	for analysis and visualization of microbiome sequencing data	825, 826
R package	mia	implements tools for microbiome analysis based on the SummarizedExperiment	852
R package	CARlasso	Conditional Auto-Regressive LASSO in R	853, 854
R package	microPopGut	R package for simulating microbial populations in the human colon	871

Visualization

Category	Name	Description	Link
Analysis tool	Calour	an Interactive, Microbe-Centric Analysis Tool	102 103 104
R package	KEGGgraph	graph approach to KEGG PATHWAY in R and Bioconductor	128
R package	pathview	Pathview is a tool set for pathway based data integration and visualization based on KEGG data	129
R package	annotate	Annotation for microarrays and GOs	132
Tool	SegmentalDuplicationsCircos	plots circular genomes	186
Tool	Keanu	A tool for viewing the contents of metagenomic samples	194 195
R package	ampvis2	An R package to visualise amplicon data	245
Python Tool	Bokeh	Creating interactive low-level visualizations with Python, kind of like ggplotly	246
Tool	icarus	Icarus is a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on QUAST genome quality assessment tool	247
Tool	metaQuast	MetaQUAST evaluates and compares metagenome assemblies based on alignments to close references. It is based on QUAST genome quality assessment tool, but addresses features specific for metagenome datasets	248 249
Web-App	ITOL	Interactive Tree Of Life	259
R package	magick	The new magick package is an ambitious effort to modernize and simplify high-quality image processing in R	285
App	Lucid Align	A modern sequence alignment viewer	297
R package	HTML Widgets	Very nice packages to create more interactive visualizations like plots and tables in HTML Rmd output	302
R package	ggpubr	an excellent and flexible package for elegant data visualization in R and publication ready figures	396
R package	metacoder	parsing, plotting, and manipulating large taxonomic data sets	397
Tool	Krona	Visualization tool to show hierarchical datasets such as metagenomic samples. Used by cosmosID and other services. Created in Excel or dedicated import tools	444
Shiny Webapp	PlotTwist	a web app for plotting and annotating time series data	445, 446
R package	KEGGREST	A package that provides a client interface to the KEGGREST server	516
Webapp	iPath	Interactive Pathways Explorer (iPath) is a web-based tool for the visualization, analysis and customization of various pathway maps. Covers microbial metabolism in diverse environments	533, 534
R package	Cowplot	The cowplot package provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images	561
R package	patchwork	The goal of patchwork is to make it ridiculously simple to combine separate ggplots into the same graphic	562
R Package	karyoploteR	R package to visualize genomic features on genomes - can plot anything that has genomic coordinates - maybe read depth of sequencing too	565
R tutorial	kateto	Network visualization with R	575
R package	rayshader	is an open source package for producing 2D and 3D data visualizations in R	638, 639
Webapp	biorender	a webapp for scientific illustrations with template icons to use	672
App	SnapGeneViewer	SnapGene Viewer includes the same rich visualization, annotation, and sharing capabilities as the fully enabled SnapGene software	679
R script	AnnVis	Tutorial to visualize prokka output using gggenes package	680
R package	ggseqlogo	a versatile R package for drawing sequence logos	695, 696
R Markdown	webpage	Creating websites in R	716
App	TreeViewer	Flexible, modular software to visualise and manipulate phylogenetic trees	723
Software	Graphia	a powerful open source visual analytics application developed to aid the interpretation of large and complex datasets	732
R package	ComplexHeatmap	provides a highly flexible way to arrange multiple heatmaps and supports self-defined annotation graphics	744, 856
R package	circlize	circular visualization in R and circular heatmaps	745, 746, 823, 824
R package	ggsci	Scientific Journal and Sci-Fi Themed Color Palettes for ggplot2	768
R package	colorblindr	Simulate colorblindness in production-ready R figures	769
R package	scico	17 colorblind safe palettes	770, 771
R package	plumbertableau	Integrating Dynamic R and Python Models in Tableau Using plumbertableau	784, 785
R package	Boruta	Feature selection with the Boruta algorithm	788
R package	camcorder	to track and record the ggplots that are created across one or multiple sessions with the eventual goal of creating a gif showing all the plots created sequentially	790
R package	ggiraph	a tool that allows you to create dynamic ggplot graphs	797
R package	ggsvg	is an extension to ggplot to use arbitrary SVG as points	817
R package	gtsummary	provides an elegant and flexible way to create publication-ready analytical and summary tables using the R programming language	819
Webapp	Datawrapper	lets you show your data as beautiful charts, maps or tables with a few clicks	820
R package	mmtable2	Create and combine tables with a ggplot2/patchwork syntax	822
Webapp	Lucidchart	is the intelligent diagramming application that brings teams together to make better decisions and build the future	833
R Package	ampvis2	an R-package to conveniently visualise and analyse 16S rRNA amplicon data in different ways from phyloseq data	831, 832
Webpage	From Data to Viz	is a classification of chart types based on input data format	855
Cheat Sheet	Graphics Principles	Cheat Sheet for correct graphics visualization	867
R Package	GenoVi	generates circular genome representations for complete or draft bacterial and archaeal genomes	872, 873
R Package	ggcoverage	Visualize and annotate genome coverage with ggplot2	874, 875
R package	ggside	to enable users to add metadata to their ggplots with ease	877
R package	dotplotly	Create an interactive or static dot plot from mummer output OR PAF format	890
R package	ganttrify	nice-looking Gantt charts	901, 902
R package	fastbaps	The fast BAPS algorithm is based on applying the hierarchical Bayesian clustering (BHC) algorithm to the problem of clustering genetic sequences using the same likelihood as BAPS	906, 907

Pipeline Managers

Category	Name	Description	Link
R package	targets	Managing bioinformatics pipelines with R	779, 780, 781
Tutorial	bioinformatics-workflows	Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers	783

Modelling

Category	Name	Description	Link
Article/Paper	Butyrate & Propionate Pathways	Paper by the Flint group that describes the pathways that lead to butyrate and propionate production and cross-feeding in anaerobic bacteria	228
Tool	curveball	Predicting competition results from growth curves	368,369
Database	Virtual Metabolic Human	Genome wide metabolic models for 800+ different type strains from the human gut ready for extension	449
Tool	MICOM	Python package using COBRApy for microbial community modelling - sent by lacroix/tomas from recent publication	615 Publication

Other Resources

Category	Name	Description	Link
Statistical Methods Explanations	GUSTA ME	Website with intuitive explanations of why to use some methods and how to use them	2 Publication: 3
Helpful R Scripts	DECIPHER	A website where the author describes several helpful bioinformatic analyses & how to implement them	5
Primer Design	RUCS	RUCS - Rapid Identification of PCR Primers Pairs for Unique Core Sequences	14; webapp: 15
DB for Bac. Genome Annotation	The SEED	DB curated by experts to annotate the genome features in bacteria. Hopefully useful to quickly scan what pathways our bacteria have or don't have.	331 332 333
DB of Reference Genomes	HumanMicrobiomeProject	Collection of many bacterial genomes sequenced up to 'draft-quality' and some up to 'gold-standard', probably helpful to analyze gene content of microbiomes and compare with PB	Catalog: 21, DataBrowser: 22
Classical Microbiome Pipeline	Applied Bioinformatics Book	An open-source book on applied bioinformatics - it has a great chapter on classical diversity analysis (UniFrac etc.)	Diversity Chapter: 23, Whole Book: 24
PhyloSeq Extension	MetagMisc	R package to export phyloSeq object easily into dataframes, etc.	27
Download NCBI Genomes	ncbi-genome-download	A script to download bacterial and fungal genomes from NCBI after they restructured their FTP a while ago.	34
Phylo Trees	Randi Griffin Blog	Great blog to show some examples on how to create useful phylo trees and heatmaps etc.	37
R-package	biomartr	The Biological Sequence Retrieval package allows users to retrieve biological sequences in a very simple and intuitive way. Using biomartr, users can retrieve either genomes, proteomes, CDS, RNA, GFF, and genome assembly statistics data using the specialized functions	38
Rmd Templates	rticles	A package that includes templates for many journal articles	40
Tutorial	DESeq2 for microbiome	DESeq2 analysis tutorial with PhyloSeq by Susan Holmes!	42 284 291 293 294
Data	MicrobiomeHD	Human Microbiome Data from healthy and diseased people by MIT lab - Eric Alm	45
Datasets	Google Datasets Search	Nice way to search for available datasets	51
Survey Statistics	MultiTable Data Analysis for Microbiome	Survey of methods in multi-table statistics from Holmes Lab	52
Datasets	Qiita	open-source microbial study management platform. It allows users to keep track of multiple studies with multiple ‘omics data	71,72,73
Workflow	Holmes Microbiome Workflow	Complete workflow from raw fastq files to fancy multivariate statistics workflow with dada2, DESeq2, etc. with code!	74
R Package	ampvis2	useful tool for nice visualization of amplicon data. Easy & nice ordinations!	76
R-Markdown	Workshop	OPEN & REPRODUCIBLE MICROBIOME DATA ANALYSIS SPRING SCHOOL 2018	96 97
SOPs	IMMSA	The International Metagenomics and Microbiome Standards Alliance (IMMSA) is a non-hierarchical association of microbiome-focused researchers from industry, academia, and government	123
CNGBdb	China National GeneBank DataBase	Archive of a lot of Chinese sequencing projects with very nice search function	140
Collection	nf-Core nextflow pipeline	A collection of high quality pipelines for bioinformatic analyses built with nextflow	181
Collection	Awesome Nextflow Pipelines	A collection of a bunch of bioinformatic pipelines in nextflow: 16S, assembly, etc.	188
Competition, SOP	Critical Assessment of Metagenome Interpretation	Competition where tools are tested on accuracy for strain level binning and assembly (CAMI)	189
Tools	Sanger Pathogen Tools	A collection of tools made by the Sanger Institute for pathogen/antimicrobial resistance screening, visualization, assembly, annotation	190
Tool	Melonnpan by Biobakery Huttenhower	Method to predict metabolites from metagenomic reads, should be pre-trained but can also be tried with standard model	205
Tool	ARepA Huttenhower	Tool to download information from specific data repositories: gene interaction, functional association	206
Tool	PysraDB	Python library to quickly and systematically download data from NCBI Sequence Read Archive	207
Journal Article	Pangenome & Metagenome	Nice article from Meren Lab describing how Anvi'o is used to create pangenomes and analyze core genes vs. accessory genes	213
Tools	Chiron	Docker images and pipelines for metagenomic processing developed for HMP project workshops, includes Huttenhower software like humann2, strainphlan, qiime2	220
Tool	PANDA	Quick prediction of GO term annotation from Amino Acid sequence - only online service so far	221
Review	What is good genome assembler	A nice comparison of several genome assemblers for de-novo assembly, hybrid, short and long reads are all compared	241
Tool	NCBI Downloader	Command line tool to download genomes from NCBI and specify by all kinds of metadata	256
Collection	Microbiome_notes	A continually expanding collection of microbiome analysis tools	260
Blog	GoogleComputeEngineR	Blog with a lot of tutorials related to using R and google cloud instances	296
Tool	KOMODO	Online tool to predict on what media a bacterial strain will grow. Based on DSMZ databases and gene predictions	328
CheatSheet	Stanford Machine learning Cheatsheet	Cheat sheet that covers all basics and advanced methods in machine learning - summary of stanford course	341
Blog	Genomics Tools List	List of tools that are installed on a bioinformatics cluster, could have some interesting tools in there	349
SOPs	Microbiome-Standards	List of SOPs made by the microbiome community, aimed at establishing good standard SOPs for a wide array of microbiome analysis and data generation	373
Website	R Graph Library	Very cool website with all kinds of visualizations and how to create them in R - great inspiration	386
Blog	Shiny Examples	Example dashboards that were built with shiny R. Good for inspiration with source code	387
Tool	TrueBac ID	Online tool to do whole genome taxonomic identification using ANI and 16S, depending on which is more accurate	388, 389
Tools	Pathogen Informatics Sanger	Many tools by the Sanger Institute for pathogen analysis: resistance genes, circularizing genomes, rapid pan genome generation	398
Blog	Klebsiella assembly and analysis	Nice blog post describing up-to-date genome assembly, annotation, and analysis of a virulent bacterium	401
Tool	PlasFlow	Neural Network for identifying whether contig sequences are from a plasmid or chromosome	402
Blog	Comparison of long-read assemblers	Comparison by rrwick of newest long read assemblers on how they can assemble bacterial genomes with plasmids	407
Tutorial-Blog	Tyler Barnum	How to Use Assembly Graphs with Metagenomic Datasets	412
Tutorial	Phylogenetic Tree visualization	Nice and complete tutorial about visualizing data on phylogenetic trees in R with ggtree, very nice example figures	417
Tutorial	Functional enrichment analysis	Anvi'o v5.1: Functional Enrichment Analysis and Computing ANI	431
Repository	Kipoi	repository of pre-made deep learning models for genomics	454
Knowledgebase	KBase	a DOE Systems Biology Knowledgebase, an open-source software and data platform that enables data sharing, integration, and analysis of microbes, plants, and their communities	455, 456, 457, 458, 459
Book	Computational Genomics with R	fundamentals for data analysis for genomics	502
Blog	Rmarkdown help	A nice guide to make rmarkdown documents beautiful and nice	517
Tutorial	Microbiome Analysis 2018	A nice tutorial website for statistical microbiome analysis from Leo Lahti	529
Tutorial	Microbiome Utilities	a wrapper tool R package for phyloseq	530
Review	Data Science in Microbiome	A nice review by Leo Lahti of various tools and methods available for microbiome analysis, with references to the specific tools that implement the methods	532
Tool, API	IPATH python wrapper	A nice wrapper in python for the IPATH3 API to computationally create graphs	545
R Package	formattable	package for creating nicely formatted tables in Rmarkdown output	551
R Packages	awesome-r	A curated list of awesome R frameworks, libraries and software	588
Book	R for Data Science	This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it	612
Website	Git stuff explained	Nice website that easily explains all the git commands for command line	624
Tool	Type Strain genome Server DSMZ	Web tool by DSMZ to type novel genomes based on their collection of type strains	625
Website	Beta diversity distances	Nice website that has the math equations for most of the beta diversity distances	630
App	Pitch	Collaborative presentation software for modern teams	703
Reporting	conflr	an R package to post R Markdown documents to Confluence, a content collaboration tool by Atlassian	737
Tutorial	Galaxy Training	Collection of tutorials developed and maintained by the worldwide Galaxy community	765
R package	thesisdown	package to write thesis in Rmarkdown	782
R package	blogdown	is an R package that makes blogging for R users as straightforward as possible	801, 802, 803
Webpage	postsyoumighthavemissed	Search 000's of R & Python articles and packages!	805
Tutorial	shell-how	Write down a command-line to see how it works	806
Webpage	webpage-repository	the website of AllanLab academic research group at Leiden University	808
Tutorial	Machine Learning	Machine Learning for Everyone	809
R package	RPushbullet	a package to send messages to your devices from R	815, 816
R package	portfoliodown	makes it painless for data scientists to create a polished professional website so they can host their project portfolios, get great job interviews, and launch their data science careers	818
Scripts	blantyreESBL	This document contains reproducing analysis code which generates the tables and figures for the manuscript: Dynamics of gut mucosal colonisation with extended spectrum beta-lactamase producing Enterobacterales in Malawi	891, 892
codes	nf-modules	A repository for hosting Nextflow DSL2 module files containing tool-specific process	896
Tutorial	Kaggle	Data Science competition	898
Tutorial	Perfect-bacterial-genome-tutorial	Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing	904, 905

Website to look up Markdown Syntax [https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet]
