fsvarn/GLASSx

blocklist coverage_exclusion


When evaluating the new Henry Ford cohort (batch H2) against the coverage_exclusion threshold in the blocklist, we noticed inconsistencies between whether a sample would have passed in the original data freeze and whether it passes now. The criterion is based on the "MEAN_COVERAGE" (average coverage) value in the wgsmetrics.txt file for each aliquot: samples more than 2 standard deviations below the mean fail the coverage threshold. Although these values are inexact for whole exome sequencing files, they offer a convenient "relative coverage" metric for evaluating samples in the context of the overall dataset. Based on this criterion, we noted 6 aliquots with coverage values in the range of those that failed the blocklist in the original data freeze. We are marking these samples as "review" under coverage_exclusion in the blocklist and will return to them if they cause trouble with copy number or mutation calling.

GLSS-HF-1015-NB-02D-WXS-FV122J
GLSS-HF-1176-NB-02D-WXS-B1MMKX
GLSS-HF-753F-R4-03D-WXS-NNYRR1
GLSS-HF-B30B-R2-03D-WXS-0FCXCU
GLSS-HF-DCED-NB-02D-WXS-XKGH08
GLSS-HF-EE77-R1-02D-WXS-TY2KA7
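The threshold check described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the wgsmetrics.txt files follow the Picard/GATK CollectWgsMetrics layout (a "## METRICS CLASS" line, then a tab-separated header row and a value row), and that the standard deviation is the sample SD. The function names `parse_mean_coverage` and `flag_low_coverage` are hypothetical.

```python
import statistics
from typing import Dict, Iterable, List, Optional


def parse_mean_coverage(metrics_text: str) -> float:
    """Extract MEAN_COVERAGE from the text of a Picard-style metrics file.

    Assumption: the file uses the CollectWgsMetrics layout, i.e. a line starting
    with "## METRICS CLASS", followed by a tab-separated header row and then a
    tab-separated row of values.
    """
    lines = metrics_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return float(values[header.index("MEAN_COVERAGE")])
    raise ValueError("no '## METRICS CLASS' section found")


def flag_low_coverage(
    coverage: Dict[str, float],
    reference: Optional[Iterable[float]] = None,
    n_sd: float = 2.0,
) -> List[str]:
    """Return aliquots whose coverage is more than n_sd SDs below the mean.

    The mean and SD are computed from `reference` if given (for example, the
    coverage values of the original data freeze); otherwise they are computed
    from `coverage` itself. Uses the sample standard deviation (n - 1).
    """
    ref = list(reference) if reference is not None else list(coverage.values())
    mean = statistics.mean(ref)
    sd = statistics.stdev(ref)
    cutoff = mean - n_sd * sd
    # An aliquot fails if it lies strictly below mean - n_sd * SD.
    return sorted(aliquot for aliquot, cov in coverage.items() if cov < cutoff)
```

To read a real file, one would pass its contents, e.g. `parse_mean_coverage(Path(p).read_text())`. The optional `reference` argument is there because the result depends on which aliquots define the mean and SD: computing the cutoff from the original freeze keeps it fixed as new batches such as H2 are added, while recomputing it over the combined dataset lets the cutoff shift, which is one plausible source of the inconsistencies noted above.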

This was fully resolved during the rerun of the H2 cohort. None of the above barcodes exist anymore because of a labeling error, and many of the underlying FASTQs were merged after we learned that they came from the same tumor/region/timepoint. As a result, every aliquot in the H2 cohort now passes the coverage exclusion.