We can use a set of known genes to annotate a plant plastid genome. The gene set differs slightly between clades of land plant. These genes have been taken from public databases (NCBI), and turned into a set of HMM profiles for each clade of plant (currently supported are Liverworts, Hornworts, Mosses, Ferns, Conifers, and flowering plants). These can then be quickly queried against a genome, without the need for a reference.
In order to use fppa
you will need two things. First you will need to get a binary of the executable fppa
.
# assuming you have rust, if not - https://www.rust-lang.org/
git clone https://github.com/tolkit/fppa/
cd fppa
# install to PATH
cargo install --path=.
# bring up the help
fppa -h
# or install to ./target/release
cargo build --release
# bring up the help
./target/release/fppa -h
<Max Brown; Wellcome Sanger 2022>
Fast plant mito annotation (fppa).
Version: 0.1.0
USAGE:
fppa --plant-chloro <PATH> --nhmmer-path <PATH> --hmms-path <PATH>
FLAGS:
-h, --help Prints help information
-v, --version Prints version information
ARGS:
--plant-chloro Path to the plant chloroplast/plastid genome
--nhmmer-path Path to the nhmmer executable (HMMER3)
--hmms-path Path to the directory containing a set of
HMM files. Download from:
https://github.com/tolkit/fppa/tree/main/hmms
OPTIONAL ARGS:
--plot Generate an HTML SVG of where the annotated
genes occur. Requires a name ending in `.html`.
--e-value The E-value cut-off determining presence of
mito gene. <default 0.001>
--gff Output a GFF3 file of gene locations. Requires
a name ending in `.gff`.
-r, --for-rotation Identify only psbA and ycf1 for chloroplast
standardisation.
EXAMPLE:
fppa --plant-chloro ./chloro.fasta --nhmmer-path ./nhmmer --hmms-path ./hmms/angiosperm_hmms/
Secondly you will need a copy of the HMMs for each plant clade. If you git clone
'd this repository, they are under the ./hmms
directory. You can of course make your own HMMs, but the names will need to match the ones in the hmms
directory, otherwise fppa
will complain.
An example command:
# the executable
fppa \
# point to the chloroplast genome
--plant-chloro ./genomes/daBalNigr1_segs.fasta \
# point to the nhmmer executable
--nhmmer-path ~/bin/nhmmer \
# point to the directory containing the HMMs (here we're interested in flowering plants)
--hmms-path ./hmms/Magnoliopsida/ \
# generate an HTML SVG of the annotated genes
--plot test.html \
# output a GFF3 file of gene locations
--gff test.gff \
# make a more stringent E value cut-off
--e-value 0.00000001
Output depends on your options. By default a TSV will be printed to STDOUT which shows the presence/absence of each gene in the genome. It's possible to add --plot
to generate an HTML file of the annotated genes, and with the --gff
option a GFF3 file will be generated.
You will need the latest version of HMMER, either in your PATH, or you can point fppa
to the binary.
This annotator is not supposed to be extremely complete nor accurate. The aim is just to determine the presence/absence of genes which should be present on a plant chloroplast speedily and without hassle.