irma-scripts

Stand-alone scripts deployed to Miarka

These scripts are deployed to /vulpes/ngi/production/latest/sw/upps_standalone_scripts/ by the miarka-provision process. The script directory is added to PATH when the Miarka environment is loaded, so the scripts are available on the command line.
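
A quick way to confirm this (a minimal sketch, not part of the repository) is to resolve one of the scripts from PATH after loading the environment:

    # sketch: check that a deployed script resolves from the directory above
    import shutil

    print(shutil.which("statdump_to_json.pl"))
    # expected: /vulpes/ngi/production/latest/sw/upps_standalone_scripts/statdump_to_json.pl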

The scripts should contain usage instructions unless their use is obvious. Preferably, invoking a script without arguments should be safe to run, have no side effects, and only display usage instructions on stdout.
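
A minimal Python sketch of that convention (the script name and arguments are only illustrative):

    #!/usr/bin/env python
    # illustrative only: with no arguments, print usage on stdout and exit without side effects
    import sys

    USAGE = "Usage: example_script.py <project_id> [output_dir]"

    def main(argv):
        if len(argv) < 2:
            print(USAGE)
            return 0
        project_id = argv[1]
        # ... the actual work would start here ...
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv))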

NEVER put any passwords, usernames, tokens, user data or other sensitive information in the scripts. If such information is required by the script, rely on reading it from an environment variable instead.
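
For example, a script that needs an API token could read it along these lines (the variable name CHARON_API_TOKEN is only illustrative):

    # illustrative only: read a secret from the environment, never hard-code it
    import os
    import sys

    token = os.environ.get("CHARON_API_TOKEN")
    if not token:
        sys.exit("CHARON_API_TOKEN is not set; export it before running this script.")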

When adding a new script to this repository, be sure to add a brief description of its purpose below:

  • concordance_check.sh - bash script to perform a concordance check between a vcf file with genotypes and a vcf file with variant calls
  • deliver_project_to_user.sh - bash wrapper script around the deliver.py script that facilitates delivery for the SNP platform
  • find_unorganized_flowcells.sh - bash script that verifies that the organized project folder under the DATA directory contains all runfolders in incoming that have data from the project
  • link_project_sisyphus_reports.sh - bash script that links sisyphus runfolder reports from the incoming folder to the corresponding project folder under ANALYSIS
  • set_charon_genotyping_status.sh - bash script to set the genotyping status field in charon to a specified value for samples present in a supplied vcf file
  • statdump_to_json.pl - perl script that can parse a statdump zipfile created by sisyphus and output the statistics as json
  • run_FastQC_and_MultiQC.sh - bash script to run FastQC on a specified project in a runfolder. The script will summarize the output in one or several MultiQC-reports.
  • run_multiqc_bp_qc.sh - A simple wrapper for the MultiQC command used when performing QC of best-practice WGS projects.
  • project_runfolders.sh - Mainly used to find all runfolders with samplesheets containing a specific project or sample name. Scans incoming for csv-files at most two folders down and greps for the given string, then echoes the folder if found.
  • cleanup_nf_projects.py - Script for cleaning up old analysis nextflow projects. The script will list folders (with full path) that will be deleted and calculate how much data will be removed. It will wait for input from user before removing anything. See usage at the top of the script.
  • make_nf_run_script.py - Script for generating an sbatch run script for NextFlow rnaseq and methylseq pipelines. See usage at the top of the script.
  • merge_fastqs.py - Script for merging fastq-files from different lanes / runs per sample.
  • start_merge.py - Convenience script for merging fastq files in a project per sample; depends on merge_fastqs.py.
  • 1_create_reference_tsv.bash - A helper script to create_reference_tsv.py.
  • create_reference_tsv.py - A script for writing sample info and paths to a WES project's fastq-files in a .tsv-file used by Sarek.
  • 2_create_twist_exome_analysis.bash - This script uses the template twist_exome_38_template.sbatch to create an sbatch script that starts Sarek for WES analysis.
  • twist_exome_38_template.sbatch - Template for running Sarek 2.6.1 on WES-data using reference GRCh38.
  • charon_project_samples_status_update.sh - Script to get all samples in Charon for a supplied project and set the analysis_status to ANALYZED and the status to STALE.
  • run_hs_metrics.sh - Run CollectHsMetrics for all recalibrated BAM files in a WES project.
  • bed2interval_list.sh - Example script showing how to run Picard BedToIntervalList (produces the interval list format needed by run_hs_metrics.sh).