DKFZ-ODCF/AlignmentAndQCWorkflows

Check all read BAM files for MD5 sum on the fly


Data can (and does) degrade on the hard disk. Such corruption can be detected by calculating the MD5 sum whenever a BAM file is read and comparing the result to the MD5 sum stored in the corresponding .md5 file.
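A minimal sketch of the read-side check, in Python for illustration only (it is not taken from the workflow code). The names `read_verified`, `read_expected_md5` and `Md5MismatchError` are hypothetical, and the sketch assumes the .md5 file starts with the hex digest, optionally followed by a file name as `md5sum` writes it:

```python
import hashlib
from pathlib import Path
from typing import Iterator


class Md5MismatchError(IOError):
    """Raised when the bytes read do not match the stored MD5 sum."""


def read_expected_md5(md5_path: str) -> str:
    # Assumption: the first whitespace-separated token is the hex digest.
    return Path(md5_path).read_text().split()[0].lower()


def read_verified(data_path: str, md5_path: str,
                  chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in chunks and hash them on the fly.

    The comparison can only happen once the last chunk has been read,
    so the error is raised at the end of the stream.
    """
    expected = read_expected_md5(md5_path)
    digest = hashlib.md5()
    with open(data_path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)  # hash exactly the bytes handed downstream
            yield chunk
    actual = digest.hexdigest()
    if actual != expected:
        raise Md5MismatchError(
            f"{data_path}: expected MD5 {expected}, got {actual}"
        )
```

Because the mismatch is only known after the whole file has been streamed, a consumer would have to treat its output as provisional until the generator finishes without raising.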

  • Whenever a non-temporary big file is written, an MD5 sum should be calculated on the fly.
  • Whenever a non-temporary big file is read, the MD5 sum should be calculated on the fly and compared to the stored reference value.
  • Candidate files are
    • FASTQ (MD5 sums from disk or OTP)
    • BAM (MD5 sums are already written by the alignment workflow)
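The write-side rule could look roughly like the following Python sketch, which hashes the data while it is being written instead of re-reading the file afterwards. The function name `write_with_md5` and the `<file>.md5` sidecar naming and layout are assumptions made for illustration; the workflow's actual conventions may differ:

```python
import hashlib
from pathlib import Path
from typing import Iterable


def write_with_md5(chunks: Iterable[bytes], out_path: str) -> str:
    """Write chunks to out_path and store their MD5 sum in out_path + '.md5'."""
    target = Path(out_path)
    digest = hashlib.md5()
    with open(target, "wb") as handle:
        for chunk in chunks:
            digest.update(chunk)  # hash the bytes as they pass through
            handle.write(chunk)
    hexdigest = digest.hexdigest()
    # Assumption: md5sum-style sidecar line "<hex digest>  <file name>".
    sidecar = target.with_name(target.name + ".md5")
    sidecar.write_text(f"{hexdigest}  {target.name}\n")
    return hexdigest
```

The stored digest reflects the bytes that were handed to the file system, so a later read that recomputes the sum can detect changes that happened on disk in between.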

Resolving this issue would also improve compliance with the Rahmendatenschutzkonzept (framework data protection concept).