DBTCShiny is an R Shiny implementation of the DBTC metabarcode analysis pipeline.
This package contains the DBTCShiny functions located at rgyoung6/DBTCShiny. The Dada-BLAST-Taxon Assign-Condense Shiny package wraps the foundational DBTC functions (which can also be run from the command line) in a Shiny application to provide an easy-to-use interface. The DBTC functions have four main outcomes:
- Fastq file processing using dada2 in R
- Using the Basic Local Alignment Search Tool (BLAST), amplicon sequence variants (ASV) can be searched against local NCBI or custom sequence databases
- Assigning taxa to the unique reads using the NCBI taxonomy database (the database can be obtained through the taxonomizr website)
- Condense the resulting ASV taxonomic assignment tables to unique taxa with the ability to combine datasets (using different sequence databases for the same reads, or results from the same samples for different molecular regions) into a combined results table
NOTE: While the DBTC package was built for the analysis of high-throughput sequencing results, the BLAST, taxonomic assignment and taxonomic condense functions can also be used with single-specimen Sanger sequencing data.
- Installation
- Package Dependencies
- DBTC Function Descriptions
- Naming Convention Rules
- DBTCShiny Function Details
- Mapping Dashboard
- Citation
Installation
DBTCShiny can be installed in three ways.
1. Install from CRAN
```r
install.packages('DBTCShiny')
```
2. Install via GitHub
Run the following commands in your R console:
```r
if (!require('devtools')) install.packages('devtools')
library('devtools')
devtools::install_github('rgyoung6/DBTCShiny')
library('DBTCShiny')
```
3. Install from a local download
Navigate to the DBTCShiny GitHub page, download the files associated with the page to your local computer, and keep them together in a folder named DBTCShiny. Then install and load the package from that local copy. Replace HERE with the path to the directory that contains the DBTCShiny folder:
```r
install.packages('HERE/DBTCShiny', repos = NULL, type = 'source')
library('DBTCShiny')
```
Package Dependencies
DBTCShiny has several dependencies, the most notable being the DBTC package. DBTC in turn requires several Bioconductor and CRAN packages, external programs and database resources. Consult the DBTC installation guide before installing DBTCShiny.
In addition to the DBTC dependencies, DBTCShiny has a number of CRAN dependencies. They can be installed and loaded as follows:
```r
# CRAN packages required by DBTCShiny (DBTC itself has further dependencies)
pkgs <- c('DBTC',
          'DT',
          'ggplot2',
          'leaflet',
          'leaflet.extras',
          'magrittr',
          'shiny',
          'shinycssloaders',
          'shinydashboard',
          'shinyWidgets')

install.packages(pkgs)

# library() loads one package per call, so loop over the vector
invisible(lapply(pkgs, library, character.only = TRUE))
```
After installing DBTCShiny and all of its dependencies (including DBTC and its dependencies), load the package and launch the Shiny graphical user interface (GUI) with the following commands:
```r
library(DBTCShiny)
launchDBTCShiny()
```
DBTC Function Descriptions
The dada_implement() function takes fastq files as input, analyses them and produces amplicon sequence variant (ASV) files. The function requires a main directory containing one or more folders representing sequencing runs, which in turn contain the fastq files. The location of one fastq file in one of the sequencing run folders is used as an input argument. A run is a group of results processed at the same time on the same machine using the same molecular methods. All sequencing run folders in the main directory must contain data from sequencing runs that used the same primers and protocols. Output from this function includes all processing files and the final main output files in the form of fasta files and ASV tables.
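The expected input layout can be pictured with a short base R sketch. It builds a throwaway main directory holding two hypothetical run folders (`Run1`, `Run2`), each with empty placeholder fastq files. It then shows how a single selected fastq path identifies its run folder and the main directory. The folder and file names are illustrative assumptions, and the sketch does not call dada_implement() itself.

```r
# Build a throwaway example layout: main directory -> run folders -> fastq files.
# All names here are hypothetical and only illustrate the folder structure.
main_dir <- file.path(tempdir(), "MainDir")
runs <- c("Run1", "Run2")
for (run in runs) {
  run_dir <- file.path(main_dir, run)
  dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)
  # Empty placeholder forward/reverse read files for one sample per run
  file.create(file.path(run_dir, c("A001_R1.fastq", "A001_R2.fastq")))
}

# The user selects a single fastq file inside one run folder...
selected <- file.path(main_dir, "Run1", "A001_R1.fastq")

# ...from which the run folder and the main directory can be derived
run_folder <- dirname(selected)
main_folder <- dirname(run_folder)

# Each sub-folder of the main directory represents one sequencing run
run_folders <- list.dirs(main_folder, full.names = FALSE, recursive = FALSE)
print(run_folders)
```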
The combine_dada_output() function takes ASV output files ('YYYY_MM_DD_HH_MM_UserInputRunName_Merge' and/or 'YYYY_MM_DD_HH_MM_UserInputRunName_MergeFwdRev'), combines them into a single ASV table and creates an accompanying fasta file. It also produces a file containing the processing information for the function. The main input argument is the location of a file in a folder containing all of the ASV tables to be combined. Output files follow the naming convention 'YYYY_MM_DD_HH_MM_combinedDada'.
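Conceptually, combining ASV tables means aligning them on the ASV sequence and recording zero reads where a sequence is absent from a run. The base R sketch below makes these assumptions:
- Each hypothetical table has a `Sequence` column followed by one read-count column per sample. This is not necessarily the column layout DBTC writes.
- The timestamp is fixed only so that the output name is reproducible.

```r
# Two hypothetical ASV tables from different runs: one row per ASV sequence,
# one column of read counts per sample.
run1 <- data.frame(Sequence = c("AGCT", "TTGA"), A001 = c(100L, 5L),
                   stringsAsFactors = FALSE)
run2 <- data.frame(Sequence = c("AGCT", "CCGA"), B001 = c(40L, 12L),
                   stringsAsFactors = FALSE)

# Full outer join on the sequence so every ASV from either run is kept
combined <- merge(run1, run2, by = "Sequence", all = TRUE)

# An ASV missing from a run had zero reads in that run's samples
combined[is.na(combined)] <- 0L

# Output name following the 'YYYY_MM_DD_HH_MM_combinedDada' convention
stamp <- as.POSIXct("2024-05-01 09:30:00", tz = "UTC")
out_name <- format(stamp, "%Y_%m_%d_%H_%M_combinedDada")
print(combined)
print(out_name)
```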
The make_BLAST_DB() function takes a fasta file with headers in the MACER format (Young et al. 2021) and builds a database against which a BLAST search can be run. However, if an NCBI sequence database is desired, it is advisable, where applicable, to use the NCBI preformatted databases and skip the make_BLAST_DB() function (https://www.ncbi.nlm.nih.gov/books/NBK62345/#blast_ftp_site.The_blastdb_subdirectory). The function produces a folder containing a BLAST-searchable, NCBI-formatted sequence database.
The MACER fasta header format:
```text
>GenBankAccessionOrBOLDID|GenBankAccession|Genus|species|UniqueID|Marker
```
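A header in this format can be split into its named fields with base R. The function name `parse_macer_header()` and the example header values are hypothetical, for illustration only.

```r
# Split a MACER-format fasta header into its six named fields.
# parse_macer_header() is a hypothetical helper, not part of DBTC.
parse_macer_header <- function(header) {
  fields <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]
  if (length(fields) != 6) {
    stop("Expected 6 '|'-separated fields, found ", length(fields))
  }
  names(fields) <- c("GenBankAccessionOrBOLDID", "GenBankAccession",
                     "Genus", "species", "UniqueID", "Marker")
  fields
}

# Made-up header values
hdr <- ">ABC00001|ABC00001|Microtus|pennsylvanicus|U0001|COI-5P"
parsed <- parse_macer_header(hdr)
print(parsed)
```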
The seq_BLAST() function takes fasta file(s) as input, along with a user-selected NCBI-formatted database against which the query sequences are searched using BLAST. The function produces two files: a BLAST run file and a single tab-delimited file containing all of the BLAST results. The results file has no header row. Its columns, in order from left to right, are: query sequence ID, search sequence ID, search taxonomic ID, query to sequence coverage, percent identity, search scientific name, search common name, query start, query end, search start, search end and e-value.
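Because the results file has no header row, the documented column order has to be supplied when reading it into R. Notes on this sketch:
- The short column names are our own labels, not names defined by DBTC.
- The two result lines and the placeholder taxonomic ID are made up.
- `text = example` stands in for the path to a real results file.

```r
# Column order as documented for the BLAST results file; the short names
# are our own labels because the file itself has no header row.
blast_cols <- c("query_id", "subject_id", "subject_taxid", "coverage",
                "pident", "subject_sci_name", "subject_common_name",
                "query_start", "query_end", "subject_start", "subject_end",
                "evalue")

# Two made-up tab-delimited result lines (99999 is a placeholder taxon ID)
example <- paste(
  "ASV_1\tSEQ001\t99999\t100\t97.143\tMicrotus pennsylvanicus\tmeadow vole\t1\t210\t5\t214\t6.96e-94",
  "ASV_1\tSEQ002\t99999\t98\t95.2\tMicrotus pennsylvanicus\tmeadow vole\t1\t206\t3\t208\t1e-76",
  sep = "\n"
)

# In practice, replace text = example with the path to the results file
blast <- read.delim(text = example, header = FALSE, col.names = blast_cols,
                    quote = "", stringsAsFactors = FALSE)
str(blast)
```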
The taxon_assign() function takes a BLAST results file and the associated fasta files, either on their own or with accompanying ASV files generated by the dada_implement() function. It collapses the multiple BLAST results into a single result for each query sequence. When an ASV table is present, the taxonomic results are combined with the ASV table.
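The one-row-per-query outcome can be illustrated with a deliberately simplified rule: keep the hit with the highest percent identity, and break ties by the lowest e-value. This is a stand-in for illustration only. DBTC's taxon_assign() applies its own assignment logic, which this sketch does not reproduce, so consult the DBTC documentation for the actual method. The hits below are made up.

```r
# Made-up BLAST hits: several per query sequence
hits <- data.frame(
  query_id = c("ASV_1", "ASV_1", "ASV_2", "ASV_2"),
  subject_sci_name = c("Microtus pennsylvanicus", "Microtus ochrogaster",
                       "Peromyscus maniculatus", "Peromyscus leucopus"),
  pident = c(97.143, 95.2, 99.0, 99.0),
  evalue = c(6.96e-94, 1e-76, 1e-90, 1e-80),
  stringsAsFactors = FALSE
)

# Simplified rule (NOT the DBTC algorithm): sort so the preferred hit comes
# first within each query, by highest identity and then lowest e-value...
hits <- hits[order(hits$query_id, -hits$pident, hits$evalue), ]
# ...then keep only the first row for each query
best <- hits[!duplicated(hits$query_id), ]
print(best)
```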
The combine_assign_output() function takes a file selection and then uses all DBTC ASV taxon assign files ('_taxaAssign_YYYY_MM_DD_HHMM.tsv') in the selected directory. It combines them into a single 'YYYY_MM_DD_HHMM_taxaAssignCombined.tsv' output file. The files being combined should represent different samples. Their data should all come from analyses using the same molecular methods, the same analysis arguments and the same molecular sequence databases.
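Selecting every matching file in a directory can be approximated with a regular expression built from the documented '_taxaAssign_YYYY_MM_DD_HHMM.tsv' name pattern. This sketch shows only the selection step. The exact pattern DBTC uses internally is an assumption, and the file names are hypothetical.

```r
# Hypothetical directory listing; only names ending in the documented
# '_taxaAssign_YYYY_MM_DD_HHMM.tsv' pattern should be picked up.
files <- c("Run1_taxaAssign_2024_05_01_0930.tsv",
           "Run2_taxaAssign_2024_05_01_0945.tsv",
           "2024_05_01_1000_taxaAssignCombined.tsv",
           "Run1_taxaReduced_2024_05_01_1010.tsv",
           "notes.txt")

# '_taxaAssign_' followed by a YYYY_MM_DD_HHMM timestamp and '.tsv'
assign_pattern <- "_taxaAssign_[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{4}\\.tsv$"
to_combine <- grep(assign_pattern, files, value = TRUE)
print(to_combine)

# On disk the same filter could be applied with, for example:
# list.files("path/to/dir", pattern = assign_pattern, full.names = TRUE)
```

The combined file is excluded because '_taxaAssign' is followed by 'Combined' rather than an underscore and timestamp.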
To reduce ASV results to unique taxa per file, the reduce_taxa() function takes a file selection and then uses all '_taxaAssign_YYYY_MM_DD_HHMM.tsv' and/or 'YYYY_MM_DD_HHMM_taxaAssignCombined.tsv' files in that directory. The function then reduces all ASVs with the same taxonomic assignment to a single result. These results are written to a '_taxaReduced_YYYY_MM_DD_HHMM.tsv' file for each target file in the directory.
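The reduction can be sketched with `aggregate()`, assuming that read counts are summed per sample for ASVs sharing the same `Final_Taxa` value. The table is hypothetical. DBTC's reduced files carry additional summary columns (for example Number_ASV) that this sketch omits.

```r
# Hypothetical taxon-assigned ASV table: one row per ASV, one column per sample
asv <- data.frame(
  Final_Taxa = c("Microtus pennsylvanicus", "Microtus pennsylvanicus",
                 "Peromyscus maniculatus"),
  A001 = c(100L, 8L, 0L),
  A002 = c(20L, 0L, 15L),
  stringsAsFactors = FALSE
)

# Collapse every ASV with the same taxonomic assignment into one row by
# summing each sample's read counts (summing is an assumption)
reduced <- aggregate(cbind(A001, A002) ~ Final_Taxa, data = asv, FUN = sum)
print(reduced)
```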
The combine_reduced_output() function takes a file selection and then uses all '_taxaReduced_YYYY_MM_DD_HHMM.tsv' files in that directory. It combines them into a single 'YYYY_MM_DD_HHMM_CombineTaxaReduced.txt' taxa table file. The files being combined should represent the same biological samples but with different molecular marker information. The output table can include read numbers or be reduced to presence/absence results.
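Here is a sketch of combining reduced tables from two markers for the same sample, assuming each marker keeps its own read-count column. The marker suffixes, taxa and counts are hypothetical, and DBTC's actual output layout may differ. The last step shows the optional conversion to presence/absence.

```r
# Two hypothetical reduced tables for the same sample, one per marker
marker1 <- data.frame(Final_Taxa = c("Microtus pennsylvanicus", "Sorex cinereus"),
                      A001 = c(108L, 0L), stringsAsFactors = FALSE)
marker2 <- data.frame(Final_Taxa = c("Microtus pennsylvanicus", "Blarina brevicauda"),
                      A001 = c(30L, 12L), stringsAsFactors = FALSE)

# Full outer join on the taxon; sample columns get a marker suffix
merged <- merge(marker1, marker2, by = "Final_Taxa", all = TRUE,
                suffixes = c("_marker1", "_marker2"))
# A taxon absent from one marker's table had zero reads for that marker
merged[is.na(merged)] <- 0L

# Optional presence/absence version: 1 if a taxon had any reads, else 0
pa <- merged
count_cols <- setdiff(names(pa), "Final_Taxa")
pa[count_cols] <- lapply(pa[count_cols], function(x) as.integer(x > 0))
print(merged)
print(pa)
```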
Naming Convention Rules
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Keep file and folder locations as short as possible (close to the root directory), as some functions cannot process long paths.
Naming conventions also need careful thought when using DBTC functions. Avoid special characters, including the question mark, number sign and exclamation mark. Use dashes as separators within names and reserve underscores as information delimiters, since this is how DBTC functions use underscores.
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. The following strings are used by DBTC and should not be used in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
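These naming problems only surface as errors once a function runs, so it can help to screen paths beforehand. `check_dbtc_path()` below is a hypothetical helper, not part of DBTC or DBTCShiny. It flags whitespace, the special characters named above and the reserved strings. The special-character list in the text is not exhaustive, so this check sets a minimum rather than guaranteeing a safe path.

```r
# Reserved strings listed above
reserved <- c("_BLAST", "_combinedDada", "_taxaAssign", "_taxaAssignCombined",
              "_taxaReduced", "_CombineTaxaReduced")

# Hypothetical helper: returns a character vector of problems (empty if none)
check_dbtc_path <- function(path) {
  problems <- character(0)
  if (grepl("[[:space:]]", path)) {
    problems <- c(problems, "contains whitespace")
  }
  if (grepl("[?#!]", path)) {
    problems <- c(problems, "contains a special character (? # !)")
  }
  # Literal (fixed) matching of each reserved string anywhere in the path
  hits <- reserved[vapply(reserved, function(s) grepl(s, path, fixed = TRUE),
                          logical(1))]
  if (length(hits) > 0) {
    problems <- c(problems, paste("contains reserved string(s):",
                                  paste(hits, collapse = ", ")))
  }
  problems
}

check_dbtc_path("/data/Run-1/sample-A001_R1.fastq")  # no problems
check_dbtc_path("/data/My Run/sample_BLAST.fastq")   # whitespace and '_BLAST'
```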
DBTCShiny Function Details
DBTCShiny uses buttons to select the files needed to run analyses. These buttons open a dialog window: an 'Open' dialog in macOS, an 'Open File' dialog in Windows or a 'File Picker' dialog in Linux. This document calls it a 'select file dialog window'.
In addition to the buttons that open select file dialog windows, the interface has fillable fields and option buttons.
All of these input elements pass user variables and function options to the DBTC functions through the DBTCShiny graphical user interface.
For function details, please see the DBTC descriptions and documentation.
- dada_implement()
- combine_dada_output()
- make_BLAST_DB()
- seq_BLAST()
- taxon_assign()
- combine_assign_output()
- reduce_taxa()
- combine_reduced_output()
In addition to implementing the DBTC core functions, DBTCShiny provides an interactive mapping option for DBTC ASV files (see the 'Mapping Dashboard' section below).
Mapping Dashboard
DBTCShiny has interactive mapping functions. The following four sections describe how to use them.
Data import buttons load ASV data files generated by the DBTCShiny pipeline, along with provenance data, for visualization on the map. The formats of the files required for the mapping option are shown below.
NOTE: Selecting the location of an ASV data file loads all files in that location with the '_taxaReduced' string in their name. Presenting the data from these ASV files together on the map requires transformation, so simply concatenating the data files will not achieve the same outcome. The provenance data, however, must be a single file that represents all records, whether the ASV data come from one file or several. Any ASV data without a corresponding geospatial coordinate in the provenance data file will not be mapped.
Provenance Data
West and North give each sample's longitude and latitude in decimal degrees.

| Campaign | Sample | Run | Lab | Type | Date | West | North |
|---|---|---|---|---|---|---|---|
| NationalPark2024 | A001 | Run1A | GuelphHanner | Sample | YYYY-MM-DD | -80.2262 | 43.5327 |
ASV Data Information Headers (these are followed by columns of sample read data; the example below has a single sample, 'A001')
superkingdom | phylum | class | order | family | genus | species | Top_BLAST | Final_Common_Names | Final_Rank | Final_Taxa | Result_Code | RepSequence | Number_ASV | Average_ASV_Length | Number_Occurrences | Average_ASV_Per_Sample | Median_ASV_Per_Sample | Results | A001 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Eukaryota(51.1,97,95.2,1e-76) | Chordata(51.1,97,95.2,1e-76) | Mammalia(51.1,97,95.2,1e-76) | Rodentia(51.1,97,95.2,1e-76) | Cricetidae(51.1,97,95.2,1e-76) | Microtus(51.1,97,95.2,1e-76) | Microtus pennsylvanicus(51.1,97,95.2,1e-76) | Microtus pennsylvanicus(100,97.143,6.96e-94) | meadow vole | species | Microtus pennsylvanicus(51.1,97,95.2,1e-76) | SFAT | AGCT | 20 | 196.25 | 31 | 374.2580645 | 108 | Merged | 100 |
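The sketch below shows how the two files relate, using made-up data for both. It reshapes reduced ASV data from one column per sample into one row per taxon and sample, then joins it to provenance data on the `Sample` column. The inner join drops samples with no coordinates, mirroring the note above that such ASV data are not mapped. Dropping zero-read rows is an assumption about what would be plotted. The resulting `West` and `North` columns would serve as longitude and latitude on a map.

```r
# Made-up provenance data (only the columns needed here)
provenance <- data.frame(Campaign = "NationalPark2024", Sample = c("A001", "A002"),
                         West = c(-80.2262, -80.2300), North = c(43.5327, 43.5400),
                         stringsAsFactors = FALSE)

# Made-up reduced ASV data: one row per taxon, one read column per sample
asv <- data.frame(Final_Taxa = c("Microtus pennsylvanicus", "Sorex cinereus"),
                  A001 = c(100L, 0L), A003 = c(5L, 7L),
                  stringsAsFactors = FALSE)

# Reshape sample columns from wide to long: one row per taxon and sample
sample_cols <- setdiff(names(asv), "Final_Taxa")
long <- data.frame(
  Final_Taxa = rep(asv$Final_Taxa, times = length(sample_cols)),
  Sample = rep(sample_cols, each = nrow(asv)),
  Reads = unlist(asv[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long <- long[long$Reads > 0, ]  # assumption: zero-read records are not plotted

# Inner join: sample A003 has no coordinates, so it drops out
mapped <- merge(long, provenance, by = "Sample")
print(mapped)
```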
The 'Data Filtering' tab provides options to filter the data shown on the 'Mapping' tab and in the 'Data Table' tab.
The 'Data Table' tab provides a tabular display of the loaded and filtered data that are also visualized on the map.
Citation
Young RG, et al., Hanner RH (2024) A Scalable, Open Source, Cross Platform, MetaBarcode Analysis Method using Dada2 and BLAST. Biodiversity Data Journal (In progress)