Pipeline to annotate and filter Variant Call Format (VCF) files and generate a report document for clinical diagnostics. The variant annotation and filtering pipeline now uses a web server GUI implemented in R Shiny.
Disclaimer
Please note that this is a beta version of the VCF-DART platform which is still undergoing final testing before its official release. The platform, its software and all content found on it are provided on an “as is” and “as available” basis. VCF-DART does not give any warranties, whether express or implied, as to the suitability or usability of the website, server, its software or any of its content.
VCF-DART will not be liable for any loss, whether such loss is direct, indirect, special or consequential, suffered by any party as a result of their use of the VCF-DART platform, its software or content. Any downloading or uploading of material to the website/server is done at the user’s own risk and the user will be solely responsible for any damage to any computer system or loss of data that results from such activities.
Should you encounter any bugs, glitches, lack of functionality or other problems on the website, please let us know immediately so we can rectify these accordingly. Your help in this regard is greatly appreciated! The best way to do this is to log an issue in this GitHub repository, or if you feel inclined you are welcome to create a pull request.
The following programs need to be available/installed for correct operation:
- VEP
- SnpEff (for SnpSift dbNSFP annotation)
- tabix (compression and indexing)
- parallel
- bedops (for vcf-sort)
- bcftools
- R
- Shiny Server
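Before a first run it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch and not part of VCF-DART itself: `check_tools` is a hypothetical helper, and the command names passed to it (`vep`, `tabix`, `parallel`, `bcftools`, `Rscript`) are assumptions about how these programs are exposed on your system. SnpEff/SnpSift are often launched as Java jar files and Shiny Server typically runs as a service, so they are left out of the check.

```sh
#!/bin/sh
# check_tools NAME...
# Print the name of every command that cannot be found on PATH,
# one per line. Returns 1 if at least one command is missing, else 0.
check_tools() {
  check_tools_status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      check_tools_status=1
    fi
  done
  return "$check_tools_status"
}

# Assumed command names; adjust them to match your installation.
if ! check_tools vep tabix parallel bcftools Rscript; then
  echo "The commands listed above were not found on PATH." >&2
fi
```

The helper only reports what is missing and leaves the decision to stop or continue to the caller.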
VCF-DART currently requires the following R packages (and their dependencies) to be installed for correct operation:
# CRAN
install.packages('magrittr')
install.packages('shiny')
install.packages('shinyBS')
install.packages('rmarkdown')
install.packages('pander')
NOTE: for Shiny Server to be correctly installed you will require both the `shiny` and `rmarkdown` packages to be installed.
- look at moving this to-do list over to a roadmap in the wiki
- option to run without coverage text file (more a research purpose)
- look at integrating VCF-DART and VCF-DART Viewer into a shinydashboard (and within a docker container)
- explore having options for which databases to annotate against, i.e. not running VEP `--everything` could cut run time by 30+ mins
  - reducing the number of threads to 6 and removing the `--merged` VEP option reduces run times to 10-15 mins for VCF files 30-50K variants in size
- implement selection of genome build (currently only hg19 is working)
  - this is a big feature as the current databases aren't all built for hg38
  - create a separate feature branch to develop this
- add more extracted features to the `vcfcompiler_diagnostics.sh` script (e.g. CADD score)
  - make CADD score available (add extraction routine in `vcfcompiler_diagnostics.sh`)
  - build script to scrape ClinVar and provide updated annotation - combine this with results
- look into adding a cancel/exit button to the Shiny App to kill run
- explore asking user for raw data dir in GUI or configuration file (currently hard-coded)
- evaluate whether we need to continue to allow the user to define the 'home' dir
- generate and send an email and/or text message upon run completion
- look into developing an option for "off-line mode"
  - design a check for internet connection
  - would need a local copy of the repository available
- check for and ignore `.tbi` files in the data directory
  - explore adding a check for label in the coverage text file as well
- add a check for input variables and warn/error display that this is the case if missing
- integrate docker branch (this is likely to address some/all above concerns)
- look at adding a tab for help/guide
  - added tooltips throughout app, detailed help/documentation can be found at GitHub wiki
- add a tab with options to upload files (VCF and coverage text files)
- check for existing `gene_list` dir and delete if present
- removed the need for an external configuration file
  - config options are now at the start of the script (user defined)
- GitHub repo requires ssh passphrase each use
- add a Shiny GUI to the front end
- update DART-view (other shiny app) to point to the correct directory for viewing results
- issue with grep using gene lists (files) and vcf.gz
  - look into using tabix (MUCH faster)
  - extract list of genes from a bed file (with position info), i.e. `grep -w -f 'gene_list.txt' UCSC_gene_positions_hg19.bed > gene_regions_hg19.txt`
  - use: `tabix -R gene_regions_hg19.txt variant.vcf.gz`
- remove the xmessage checks (relies on having X11 environment installed, not ideal)
  - decide if we need to have user checks at these two locations
- ensure the log files are being moved back into the correct location
- overhaul Shiny script to allow hosting via Shiny Server
  - split into `ui.R` and `server.R`
  - add home directory variable to set location for data and scripts
  - test working when deployed remotely
- added code to set working dir to main script location
- create a configuration file to allow users to set paths to software and databases (temp)
- remove all hard-coded paths (software, databases and directories)
  - remove from the main bash script (`WESdiag_pipeline_dev.sh`)
  - remove from `wes_vcffiltering.sh`
  - remove from `assess_variants.sh`
  - remove from `vcfcompiler_diagnostics.sh`
- add user defined option for the 3rd tier gene list
  - create a feature branch for this to be implemented (user-defined-tiers)
  - update variable names of gene lists to be universal
  - use user uploaded gene lists (download into `gene_list` dir)
  - add integration with a self contained and user curated gene list repository
- explore the presence of duplicate variants in the final tier (tier 3)
- add ability to determine variant caller used to generate VCF file to allow allele depth specific filtering
  - testing an IF ELSE statement which looks for AD term (GATK format)
- test whether bgzipping and creating tabix index for the vcf file improves VEP performance
- add time taken at the end of the pipeline (in main bash script)
- implement multiple row selection and copy to clipboard
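The tabix-based gene lookup noted in the list above can be sketched as follows. This is an illustration under assumptions, not the pipeline's actual code: the function names `extract_gene_regions` and `query_regions` are hypothetical, the BED file is assumed to carry the gene symbol in one of its columns, and `-F` (fixed-string matching) is added to the `grep -w -f` call from the note so that gene names are not interpreted as regular expressions. `tabix -R` expects a bgzip-compressed VCF with a tabix index alongside it.

```sh
#!/bin/sh
# extract_gene_regions GENE_LIST BED OUT
# Keep the BED rows that contain any gene name from GENE_LIST as a
# whole word (-w), matched as a fixed string (-F), and write them to OUT.
extract_gene_regions() {
  grep -w -F -f "$1" "$2" > "$3"
}

# query_regions REGIONS VCF_GZ
# Print the VCF records that overlap the regions listed in REGIONS.
# VCF_GZ must be bgzip-compressed and indexed (tabix -p vcf VCF_GZ).
query_regions() {
  tabix -R "$1" "$2"
}

# Example usage with the file names from the note above:
#   extract_gene_regions gene_list.txt UCSC_gene_positions_hg19.bed gene_regions_hg19.txt
#   query_regions gene_regions_hg19.txt variant.vcf.gz
```

Because tabix reads only the indexed blocks that overlap each region, it avoids decompressing and scanning the whole VCF the way a grep over the file does.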
Copyright (C) 2018 Miles Benton
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.