Command line pipeline to facilitate quality control of SNP data from Diversity Array Technologies (DArT). This version is a re-write of the original scripts aiming to be somewhat more user-friendly and executable on JCU's HPC.
Requires conda package manager, e.g. miniconda
see Install DartQC.
conda install dartqc -c bioconda -c esteinig
This section provides a brief guide of how to install and use DartQC, assuming a Bash shell and Unix. If you are using the pipeline on JCU's HPC (Zodiac) please read the relevant sections in Install DartQC and Task: pbs
DartQC has a hierarchical parser structure that allows you to set global options and execute a task (prepare, process, filter) with its own specific arguments:
dartqc [--help] [--project] [--output_path] [--pop] task
Arguments:
--project, -p output prefix
--output_path, -o output directory
--populations, --pop csv file with header: id, population
Tasks:
dartqc prepare --help
dartqc validate --help
dartqc process --help
dartqc filter -- help
Support Tasks:
dartqc install --help
dartqc pbs --help
Global arguments are specified before the command for a task, like this:
dartqc
--project example --output_path ./example
prepare
--file example_data.csv
Example workflow without pre-processing from Excel or CSV:
# CSV
dartqc prepare --file example.csv
# Excel
dartqc prepare --file example.xlsx --sheet double_row_snps
dartqc filter --call example.csv --call_scheme example_scheme.json --maf 0.02 --clusters
Example workflow with pre-processing:
dartqc prepare --file calls.csv
dartqc prepare --file raw.csv
dartqc process -c calls.csv --call_scheme calls_scheme.json -r raw.csv --raw_scheme raw_scheme.json --read_sum 7
dartqc filter --processed . --maf 0.02 --call_rate 0.7 --duplicates --clusters
If you find any bugs, please submit an issue for this repository on GitHub.