Clinical-Genomics-Lund/nextflow_wgs

Run output diff script


When testing dev changes, a run is typically redone after the changes have been made.

It would be very convenient to have a script that can be pointed at the output folders from before and after the change.

This script could then check for:

  • Whether all files are present, or whether any have been removed / added
  • Whether they have the same size / md5sum (wherever that is expected)
  • For VCF files
    • If the header has changed (diff)
    • If the number of records has changed
    • What variants have been added / removed
  • For BAM files
    • (Possibly) samtools flagstat
  • For the final YAML file
    • The exact diff
    • Whether all paths resolve
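
A minimal sketch of how the file-level and VCF checks above could look, assuming Python and only the standard library (the issue does not name a language). All function names here (`list_files`, `md5sum`, `compare_dirs`, `read_vcf`, `compare_vcfs`) are hypothetical, and keying variants on CHROM, POS, REF and ALT is an assumption about what counts as "the same variant". The BAM and YAML checks are left out, since they would need samtools and a YAML parser.

```python
import difflib
import gzip
import hashlib
from pathlib import Path


def list_files(root):
    """Map each file's path relative to root onto its full Path."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 of a file in chunks, so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_dirs(before, after):
    """Report files removed, added, or changed (by size, then md5) between two folders."""
    old, new = list_files(before), list_files(after)
    shared = sorted(old.keys() & new.keys())
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": [
            rel for rel in shared
            if old[rel].stat().st_size != new[rel].stat().st_size
            or md5sum(old[rel]) != md5sum(new[rel])
        ],
    }


def read_vcf(path):
    """Return (header lines, record count, set of variant keys) for a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, n_records, variants = [], 0, set()
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("#"):
                header.append(line)
            elif line:
                chrom, pos, _id, ref, alt = line.split("\t")[:5]
                n_records += 1
                # Assumption: a variant is identified by CHROM, POS, REF, ALT.
                variants.add((chrom, int(pos), ref, alt))
    return header, n_records, variants


def compare_vcfs(before, after):
    """Diff the headers, record counts, and variant sets of two VCF files."""
    old_header, old_n, old_variants = read_vcf(before)
    new_header, new_n, new_variants = read_vcf(after)
    return {
        "header_diff": list(difflib.unified_diff(old_header, new_header, lineterm="")),
        "records_before": old_n,
        "records_after": new_n,
        "removed": sorted(old_variants - new_variants),
        "added": sorted(new_variants - old_variants),
    }
```

Calling `compare_dirs("run_before", "run_after")` (folder names made up for illustration) would give the removed / added / changed file lists, and `compare_vcfs` could then be run on each VCF that shows up as changed. Files that legitimately differ between runs, for example ones containing timestamps, would need an exclusion list.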