/rcqc

Report Calc for Quality Control is a python command line and Galaxy bioinformatics platform tool for text-mining log and data files to create reports and even to stop workflow execution in response to critical quality control issues.

Primary LanguagePythonCreative Commons Zero v1.0 UniversalCC0-1.0

rcqc

Report Calc for Quality Control (RCQC) is an interpreter for the RCQC scripting language for text-mining log and data files to create reports and to control workflow within a workflow engine. It works as a python command line tool and also as a Galaxy bioinformatics platform tool. We're building a library of simple "recipe" scripts that extract quality control (QC) data from various reports like FastQC, QUAST, CheckM and SPAdes into a common JSON data format. By placing the RCQC app in your workflow downstream from one of these apps, you can convert their textual or tabular data into a much more standardized and software-friendly format.

A second objective is to enable selected QC measurements to be marked up with their globally accessable ontology identifiers so that they can be compared between workflow variations in-house or between workflows that different groups have created. Currently GenEpiO, our under-development ontology, hosts basic but key genome assembly quality control terms like "genome size ratio", contig count and various QC threshold terms. This should be a step towards enabling pipelines to generate QC measurements that are part of a software accreditation process for adoption in commercial and public health laboratories.

See the wiki for extensive documentation - especially the "Getting started" page. Here is the command line summary:

Usage: rcqc.py [ruleSet file] [input files] [options]*

Records selected input text file fields into a report (json format), and
optionally applies tests to them to generate a pass/warn/fail status. Program
can be set to throw an exception based on fail states.

Options:
  -h, --help            show this help message and exit
  -v, --version         Return version of report_calc.py code.
  -H OUTPUT_HTML_FILE, --HTML=OUTPUT_HTML_FILE
                        Output HTML report to this file. (Mainly for Galaxy
                        tool use to display output folder files.)
  -f OUTPUT_FOLDER, --folder=OUTPUT_FOLDER
                        Output files (via writeFile() ) will be written to
                        this folder.  Defaults to working folder.
  -i INPUT_FILE_PATHS, --input=INPUT_FILE_PATHS
                        Provide input file information in format: [file1
                        path]:[file1 label][file1 suffix][space][file2
                        path]:[file2 label]:[file2 suffix] ... note that
                        labels can't have spaces in them.
  -o OUTPUT_JSON_FILE, --output=OUTPUT_JSON_FILE
                        Output report to this file, or to stdout if none
                        given.
  -r RULES_FILE_PATH, --rules=RULES_FILE_PATH
                        Read JSON format recipe rules from this file.
  -e EXECUTE, --execute=EXECUTE
                        Ruleset sections to execute.
  -c CUSTOM_RULES, --custom=CUSTOM_RULES
                        Read and incorporate custom rules from a file into a
                        json recipe.  Helpful for testing variations.
  -p PLAIN_RULES, --plain=PLAIN_RULES
                        Read recipe rules in plain format from this file.
  -s SAVE_RULES_PATH, --save_rules=SAVE_RULES_PATH
                        Save modified ruleset to a file.
  -d, --debug           Provides more detail about rule execution on stdout.

This tool does require a few python modules:

pip install dateutils pip install pyparsing

If you encounter ssl (encryption key) related problems in trying to use pip to install them, you might need to run:

pip install requests —upgrade

For the Galaxy tool these will be added as dependencies soon.