13 August, 2021
Identification of NLR genes in annotated protein or transcript sequences
NLRtracker extracts and functionally annotates NLRs from protein or transcript files based on the core features found in the RefPlantNLR dataset.
RefPlantNLR: a comprehensive collection of experimentally validated plant NLRs
This pipeline runs on Unix-based machines. The following software must be available in your PATH. InterProScan needs at least 2 GB of free memory; more is better.
- InterProScan
- Requires Java 11
- If a different version than v5.51-85.0 is used, specify the path to the description with -d
- HMMER for functional annotation of the C-terminal jelly roll/Ig-like domain (C-JID). This only works for protein sequence input.
- Download v3.3.2 and make sure it is available in the environment
- R version >= 4.1.0
- R package
- FIMO (MEME Suite version 5.2.0)
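Before starting a run, it can help to confirm that the tools listed above are reachable on your PATH. The sketch below is illustrative only: `missing_tools` is a hypothetical helper that is not part of NLRtracker, and the executable names (`interproscan.sh`, `hmmsearch`, `fimo`, `Rscript`, `java`) are assumptions about a typical installation. It checks presence only, not versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NLRtracker): print every command from the
# argument list that cannot be found on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Executable names below are assumptions; adjust them to your installation.
if ! missing=$(missing_tools interproscan.sh hmmsearch fimo Rscript java); then
  # $missing is intentionally unquoted so the names are joined on one line.
  echo "Not found on PATH:" $missing >&2
fi
```

If nothing is reported, all five commands were found; version requirements (Java 11, R >= 4.1.0, and so on) still need to be checked separately.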
Usage: NLRtracker.sh [OPTION]...
-h Display help
(required)
-s Filepath File path to amino acid(/nucleotide sequence) file (.fasta)
nucleotide sequence requires -t option.
-o String Directory name to save output
(optional)
-i Filepath Result of Interproscan (.gff3)
-f Filepath Result of FIMO (.gff)
-t String Seqtype of fasta file. dna/rna ("n") or protein ("p")
Default: "p"
-c Integer Number of CPUs for interproscan
Default: 2
-m Filepath meme.xml for use with FIMO
Default: module/meme.xml (from NLR Annotator)
-x Filepath hmm for use with HMMER
Default: module/abe3069_Data_S1.hmm (from Ma et al., 2020)
-d Filepath Description of Interproscan
Default: module/InterProScan 5.51-85.0.list
Run NLRtracker in the same directory:
./NLRtracker.sh -s sample_data/sample.fasta -o out_dir
If you already have InterProScan and FIMO results:
bash NLRtracker.sh \
-s sample_data/sample.fasta \
-i sample_data/test_interpro.gff3 \
-f sample_data/test_fimo.gff \
-o test
- -s … Amino acid sequence fasta (or nucleotide fasta … requires -t option)
- -o … Output directory name
- -i … Output of interproscan (interproscan.sh -i sample.fasta -appl Pfam,Gene3D,SUPERFAMILY,PRINTS,SMART,CDD,ProSiteProfiles -f gff3)
- -f … Output of FIMO (fimo module/meme.xml sample.fasta)
- -t … Sequence type of fasta file. dna/rna (“n”) or protein (“p”). Default:“p”
- -c … Number of CPUs to run interproscan. Default:2
- -m … meme.xml file to run FIMO. Default:module/meme.xml
- -x … hmm file to run hmmsearch. Default:module/abe3069_Data_S1.hmm
- -d … Description of Interproscan. Default: module/InterProScan 5.51-85.0.list
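The -i and -f options make it possible to run InterProScan and FIMO once and reuse the results. The following is a minimal sketch of that workflow, not the pipeline's own code. The function names `run` and `precompute_and_track` and the `precomputed` directory are hypothetical. The InterProScan `-o` and FIMO `--oc` flags are used here to fix the output locations and are an assumption about your installed versions. With `DRY_RUN=1` the commands are only printed, which is how the example is invoked below.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Precompute InterProScan and FIMO results, then pass them to NLRtracker.
# $1: protein fasta, $2: NLRtracker output directory name,
# $3: directory for the precomputed results (assumed default: precomputed)
precompute_and_track() {
  local fasta=$1 out=$2 pre=${3:-precomputed}
  run mkdir -p "$pre" || return 1
  # Same analyses and output format as listed for the -i option above.
  run interproscan.sh -i "$fasta" \
    -appl Pfam,Gene3D,SUPERFAMILY,PRINTS,SMART,CDD,ProSiteProfiles \
    -f gff3 -o "$pre/interpro.gff3" || return 1
  # FIMO writes fimo.gff into the directory given with --oc.
  run fimo --oc "$pre/fimo_out" module/meme.xml "$fasta" || return 1
  run bash NLRtracker.sh -s "$fasta" \
    -i "$pre/interpro.gff3" -f "$pre/fimo_out/fimo.gff" -o "$out"
}

# Print the commands without executing them.
DRY_RUN=1 precompute_and_track sample_data/sample.fasta test
```

Dropping `DRY_RUN=1` executes the four commands in order and stops at the first failure.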
Here is an overview of the output files created by the script. The files are written to the specified output directory, and their names are modified to include the output directory name.
- The InterProScan, FIMO, and HMMER output:
- fimo_out/
- interpro_result
- CJID.txt
- NLR-associated (RPW8, TX, CCX, and MLKL genes):
- NLR-associated.lst: list with gene identifiers
- NLR_associated.fasta: fasta file
- NLR-associated.gff3: functional annotation of NLR-associated genes in gff3 format
- The NLRs:
- NLR.lst: list with gene identifiers
- NLR.fasta: fasta file
- NLR.gff3: functional annotation of NLR genes in gff3 format
- NBARC.fasta: the extracted NB-ARC domains in fasta format
- NBARC_deduplictated.fasta: the extracted NB-ARC domains, identical sequences collapsed, in fasta format
- Domain architecture of NLRtracker output for use with iTOL:
- iTOL.txt
- iTOL_dedup.txt: for use with the deduplicated NB-ARC domains.
- NLRtracker.tsv: the NLRtracker classification of each entry
- Domains.tsv: the individual domains identified. This file can be used with the refplantnlR R package for drawing the NLR domain architecture
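To illustrate what "identical sequences collapsed" means for the deduplicated NB-ARC file, the sketch below counts fasta records and collapses identical sequences with awk, keeping the first header seen for each sequence. This is an assumption-laden illustration, not the method NLRtracker itself uses; `count_fasta` and `dedup_fasta` are hypothetical helper names, and the path in the usage comment is a placeholder because the real file names include the output directory name.

```shell
#!/usr/bin/env bash
# Count sequence records in a fasta file (header lines start with '>').
count_fasta() {
  grep -c '^>' "$1"
}

# Collapse identical sequences, keeping the first header seen for each.
# Multi-line records are joined into one line before comparison.
dedup_fasta() {
  awk '
    function emit() {
      if (header != "" && !(seq in seen)) {
        seen[seq] = 1
        print header
        print seq
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }
         { seq = seq $0 }
    END  { emit() }
  ' "$1"
}

# Usage (placeholder path, adjust to the file in your output directory):
#   count_fasta path/to/NBARC.fasta
#   dedup_fasta path/to/NBARC.fasta > my_dedup.fasta
```

Comparing `count_fasta` on the full and the deduplicated NB-ARC files shows how many domains were exact duplicates.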