License: GNU General Public License v3.0 (GPL-3.0)

Plant Genomics Tool Overview

Many tools are available for the various tasks in plant genomics, and finding the right tool for a specific purpose can be challenging. This repository provides an overview of available (and recommended) tools in plant genomics, based on the experience of the Plant Biotechnology and Bioinformatics research group at TU Braunschweig. If you are new to plant genomics, our 'Data Literacy in Plant Genomics' course might be of interest to you.

Handling reads

| Tool | Comments | Reference |
|---|---|---|
| Guppy | ONT basecalling | Oxford Nanopore Technologies (ONT) |
| FastQC | quality control of short reads | FastQC GitHub repository |
| LongQC | quality control of long reads | Fukasawa et al., 2020 |
| Filtlong | filtering long reads | Filtlong GitHub repository |
| Trimmomatic | trimming short reads | Bolger et al., 2014 |
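Read QC tools summarize metrics such as read length and mean base quality. As a minimal sketch of what these numbers mean (not the algorithm used by FastQC, LongQC, Filtlong or Trimmomatic), the Python snippet below parses FASTQ records and computes a mean Phred quality. It assumes Phred+33 quality encoding and well-formed four-line records; `passes_filter` and its thresholds are hypothetical example values.

```python
import math
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        # Read ID = first whitespace-delimited token after '@'
        yield header[1:].split()[0], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean quality via averaged error probabilities (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    # Each quality character Q corresponds to an error probability P = 10^(-Q/10)
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    mean_p = sum(probs) / len(probs)
    # Convert the mean error probability back to the Phred scale
    return -10 * math.log10(mean_p)


def passes_filter(seq: str, qual: str, min_len: int = 1000, min_q: float = 10.0) -> bool:
    """Hypothetical length/quality filter; the thresholds are illustrative only."""
    return len(seq) >= min_len and mean_phred(qual) >= min_q


if __name__ == "__main__":
    records = ["@read1 runid=x\n", "ACGT\n", "+\n", "II!I\n"]
    for read_id, seq, qual in parse_fastq(records):
        print(read_id, len(seq), round(mean_phred(qual), 2))
```

Averaging error probabilities instead of raw Phred scores keeps a few very poor bases from being hidden: `mean_phred("I!")` is about 3.0, whereas the arithmetic mean of the two Phred scores (40 and 0) would be 20.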

Genome size estimation

| Tool | Comments | Reference |
|---|---|---|
| MGSE | mapping-based | Pucker, 2019 |
| GenomeScope2 | k-mer-based | Ranallo-Benavidez et al., 2020 |
| findGSE | k-mer-based | Sun et al., 2017 |
| gnodes | mapping-based | Gilbert, 2022 |
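k-mer-based estimators start from a k-mer coverage histogram: for each coverage value c, the number n_c of distinct k-mers observed c times. Stripped of the modelling, the core arithmetic is genome size ≈ Σ c·n_c / c_peak, where the sum runs over the main distribution (excluding low-coverage error k-mers) and c_peak is the coverage of the main peak. The sketch below implements only this simple ratio under stated assumptions: a two-column `coverage count` histogram (the layout written by k-mer counters such as Jellyfish `histo`), a homozygous genome whose main peak is the highest one, and error k-mers confined to the region before the first valley. GenomeScope2 and findGSE fit explicit models that also account for heterozygosity and repeats, so their estimates will differ.

```python
from typing import Dict, Iterable


def read_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'coverage count' lines into {coverage: number_of_distinct_kmers}."""
    hist: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            hist[int(parts[0])] = int(parts[1])
    return hist


def first_valley(hist: Dict[int, int]) -> int:
    """Coverage at which k-mer counts first rise again (assumed end of the error peak)."""
    covs = sorted(hist)
    for prev, cur in zip(covs, covs[1:]):
        if hist[cur] > hist[prev]:
            return prev
    return covs[0]  # no valley found: keep every coverage value


def estimate_genome_size(hist: Dict[int, int]) -> float:
    """Simplified estimate: total k-mers past the valley / coverage of the main peak."""
    valley = first_valley(hist)
    kept = {c: n for c, n in hist.items() if c >= valley}
    # Main peak (assumed homozygous): coverage with the most distinct k-mers
    peak = max(kept, key=kept.get)
    # Each distinct k-mer seen c times contributes c k-mer occurrences
    total_kmers = sum(c * n for c, n in kept.items())
    return total_kmers / peak


if __name__ == "__main__":
    # Toy histogram, not real data
    example = ["1 5000", "2 800", "3 100", "4 150", "5 300",
               "6 500", "7 300", "8 150", "9 50"]
    print(estimate_genome_size(read_histogram(example)))
```

Mapping-based tools such as MGSE rely on a comparable ratio, roughly dividing the number of mapped read bases by an average coverage derived from reference regions.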

Genome sequence assembly

| Tool | Comments | Reference |
|---|---|---|
| HiCanu | long read assembler (resource intensive) | Nurk et al., 2020 |
| NextDenovo2 | long read assembler | GitHub repository |
| Shasta | long read assembler (fast) | Shafin et al., 2020 |
| Flye | long read assembler (high memory requirements) | Kolmogorov et al., 2019 |
| miniasm | long read assembler | Li, 2016 |
| QUAST | assembly statistics calculation | Gurevich et al., 2013 |
| Merqury | k-mer-based assembly evaluation (quality value, completeness) | Rhie et al., 2020 |
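Assembly statistics tools such as QUAST report contiguity metrics like N50 (the length of the contig at which contigs of that length or longer together make up at least 50% of the total assembly size) and L50 (how many contigs that takes). The sketch below computes the general Nx/Lx pair from a FASTA file; it is a minimal illustration, not QUAST's implementation, which reports many further metrics (e.g. NG50 relative to an expected genome size).

```python
from typing import Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in a FASTA file."""
    lengths: List[int] = []
    current: Optional[int] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def nx_lx(lengths: List[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx, Lx): the contig length at which the cumulative length of
    contigs sorted longest-first reaches x% of the total, and the contig count."""
    threshold = sum(lengths) * x / 100
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    return 0, 0  # empty input


if __name__ == "__main__":
    fasta = [">ctg1", "ACGTACGT", ">ctg2", "ACGT", "AC", ">ctg3", "A"]
    print(nx_lx(fasta_lengths(fasta)))
```

For the toy contig lengths 100, 80, 50, 30, 20 and 10 (total 290), the running sum first passes 145 at the second contig, so N50 = 80 and L50 = 2.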

Repeat masking & annotation

| Tool | Comments | Reference |
|---|---|---|
| RepeatMasker | masking/annotating repeats (Repbase needed) | Smit et al., 2015 |
| EDTA | transposable element annotation | Ou et al., 2019 |
| panEDTA | pangenome transposable element annotation | Ou et al., 2022 |

Read mapping

| Tool | Comments | Reference |
|---|---|---|
| STAR | RNA-seq read mapping | Dobin et al., 2013 |
| HISAT2 | RNA-seq read mapping | Kim et al., 2019 |
| minimap2 | long read mapping | Li, 2018 |

Variant calling and annotation

| Tool | Comments | Reference |
|---|---|---|
| BWA-MEM | short read mapping | Li, 2013 |
| GATK | short read variant calling | Van der Auwera & O'Connor, 2020 |
| SVIM2 | long read variant calling | Heller & Vingron, 2019 |
| Sniffles2 | long read variant calling | Smolka et al., 2024 |
| SnpEff | variant impact prediction | Cingolani et al., 2012 |
| NAVIP | variant impact prediction | Baasner et al., 2019 |

Gene prediction

| Tool | Comments | Reference |
|---|---|---|
| BRAKER3 | protein coding gene prediction | Gabriel et al., 2023 |
| Augustus | protein coding gene prediction | Stanke et al., 2006 |
| GeMoMa | protein coding gene prediction | Keilwagen et al., 2019 |
| TSEBRA | merging annotations | Gabriel et al., 2021 |
| tRNAscan-SE | tRNA gene prediction | Chan et al., 2021 |
| BUSCO | assembly/annotation completeness check | Manni et al., 2021 |

Comparative genomics

| Tool | Comments | Reference |
|---|---|---|
| MCscan | synteny analysis | Wang et al., 2012 |
| TBtools-II | comparative genomics | Chen et al., 2023 |
| TOGA | synteny analysis | Kirilenko et al., 2023 |

Functional annotation

| Tool | Comments | Reference |
|---|---|---|
| InterProScan5 | universal functional annotation | Jones et al., 2014 |
| KIPEs | biosynthesis pathway annotation | Rempel et al., 2023 |
| MYB_annotator | R2R3-MYB annotation | Pucker, 2022 |
| bHLH_annotator | bHLH annotation | Thoben & Pucker, 2023 |
| Mercator | universal functional annotation | Lohse et al., 2014 |
| BLAST2GO | universal functional annotation | Conesa et al., 2005 |
| KEGG Mapper | mapping KEGG IDs to pathways | Kanehisa & Sato, 2020 |
| BLAST | sequence comparison | Altschul et al., 1990 |
| DIAMOND | sequence comparison (fast protein alignment) | Buchfink et al., 2015 |
| HMMER | sequence comparison | Finn et al., 2011 |
| KAAS | assigning KEGG IDs | Moriya et al., 2007 |
| EnTAP | functional annotation of transcriptomes | Hart et al., 2019 |
| eggNOG-mapper | assigning eggNOG IDs | Cantalapiedra et al., 2021 |
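Sequence comparison tools such as BLAST and DIAMOND can write tabular hit lists, which are often reduced to the best hit per query before annotations are transferred. The sketch below assumes the default 12-column tabular layout (BLAST `-outfmt 6`, DIAMOND `--outfmt 6`); the E-value and identity cut-offs are hypothetical examples. Best-hit transfer is a simple heuristic shown for illustration, not the method used by Mercator, eggNOG-mapper or the other pipelines listed above.

```python
import csv
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5, min_identity: float = 30.0
) -> Dict[str, Tuple[str, float]]:
    """Return {query_id: (subject_id, bitscore)} keeping the top-scoring hit per query.

    Assumed column order (default tabular output):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip empty and comment lines
        query, subject = row[0], row[1]
        pident, evalue, bitscore = float(row[2]), float(row[10]), float(row[11])
        # Hypothetical cut-offs; adjust them to the data set
        if evalue > max_evalue or pident < min_identity:
            continue
        # Keep the hit with the highest bitscore for each query
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    hits = [
        "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200.0",
        "q1\ts2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150.0",
    ]
    print(best_hits(hits))
```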

Coexpression analyses

| Tool | Comments | Reference |
|---|---|---|
| fastq-dump | downloading FASTQ files from SRA | NCBI |
| kallisto | gene expression quantification | Bray et al., 2016 |
| WGCNA | coexpression analysis | Langfelder & Horvath, 2008 |
| GENIE3 | gene network inference | Huynh-Thu et al., 2010 |
| dynGENIE3 | gene network inference from time series data | Huynh-Thu & Geurts, 2018 |
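WGCNA builds a coexpression network from pairwise correlations between gene expression profiles, raising their absolute values to a soft-thresholding power β (for an unsigned network, a_ij = |cor(x_i, x_j)|^β). The sketch below computes only this first step on a small dictionary of per-sample expression values, for example TPM estimates from kallisto. The log2(x + 1) transform, the gene names and β = 6 (a commonly used value for unsigned networks) are assumptions; WGCNA then computes topological overlap and detects modules, which is not shown here.

```python
import math
from typing import Dict, List, Tuple


def log_transform(values: List[float], pseudocount: float = 1.0) -> List[float]:
    """log2(x + pseudocount) to dampen the influence of highly expressed samples."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def unsigned_adjacency(
    expr: Dict[str, List[float]], beta: int = 6
) -> Dict[Tuple[str, str], float]:
    """Soft-thresholded adjacency |cor|^beta for every unordered gene pair."""
    genes = sorted(expr)
    logged = {g: log_transform(expr[g]) for g in genes}
    adj: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            adj[(a, b)] = abs(pearson(logged[a], logged[b])) ** beta
    return adj


if __name__ == "__main__":
    # Hypothetical expression values (e.g. TPM) for three genes in four samples
    expr = {"geneA": [1, 3, 7, 15], "geneB": [3, 7, 15, 31], "geneC": [15, 7, 3, 1]}
    for pair, weight in unsigned_adjacency(expr).items():
        print(pair, round(weight, 3))
```

Because the network is unsigned, strongly negatively correlated genes (geneA and geneC in the toy data) receive the same high weight as positively correlated ones; raising correlations to the power β suppresses weak correlations much more than strong ones.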

Phylogenetic analyses

| Tool | Comments | Reference |
|---|---|---|
| MAFFT | multiple sequence alignment | Katoh & Standley, 2013 |
| MUSCLE | multiple sequence alignment | Edgar, 2022 |
| IQ-TREE2 | phylogenetic tree construction | Minh et al., 2020 |
| FastTree | phylogenetic tree construction | Price et al., 2010 |
| RAxML-NG | phylogenetic tree construction | Kozlov et al., 2019 |
| SHOOT | phylogenetic placement of sequences | Emms & Kelly, 2022 |
| iTOL | phylogenetic tree visualization | Letunic & Bork, 2021 |
| JustOrthologs | ortholog identification | Miller et al., 2019 |
| SwiftOrtho | ortholog identification | Hu & Friedberg, 2019 |
| FastOMA | ortholog identification | Majidian et al., 2024 |

Helpful databases

| Tool | Comments | Reference |
|---|---|---|
| JBrowse | genome browser | Diesh et al., 2023 |
| GBrowse | genome browser | Stein, 2013 |
| Phytozome | plant genome database | Goodstein et al., 2012 |
| JGI Plant Gene Atlas | gene expression atlas | Sreedasyam et al., 2023 |
| SRA | sequencing database | NCBI |
| GEO | gene expression database | NCBI |
| OrthoDB | ortholog database | Kuznetsov et al., 2022 |

Species-specific databases

| Tool | Comments | Reference |
|---|---|---|
| TAIR | Arabidopsis database | Berardini et al., 2015 |
| Banana Genome Hub | Musa database | Droc et al., 2022 |
| Coffee Genome Hub | Coffea database | Dereeper et al., 2015 |
| Sol Genomics Network | Solanaceae database | Fernandez-Pozo et al., 2014 |
| Cassava Genome Hub | Cassava database (possibly offline) | - |
| Cocoa Genome Hub | Cacao database | Argout et al., 2017 |
| Grass Genome Hub | Grass database | website |
| Rice Genome Hub | Rice database | website |
| Sugarcane Genome Hub | Sugarcane database | Garsmeur et al., 2018 |

Further Reading

There are also dedicated collections of tools for particular purposes:

Long-Read-Tools

If you have questions about plant genomics that are not answered by any of these resources, please feel free to get in touch with the Plant Biotechnology and Bioinformatics research group at TU Braunschweig.