Samples with high GC content
We frequently receive WGS samples with high mean GC content (as reported by qualimap), but it is not clear what causes the high GC. We also do not know what the consequences are for downstream analysis.
Note: This is not a QuaC issue; it concerns sample QC.
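For reference, here is a minimal sketch of what a "mean GC content" figure can mean when computed directly from read sequences. It pools G/C bases over all A/C/G/T bases and ignores ambiguous bases such as N. This is an assumption made for illustration and not necessarily how qualimap derives its number; `gc_percent` and the example reads are hypothetical.

```python
def gc_percent(reads):
    """Pooled GC percentage over all unambiguous bases in the given reads.

    Assumption: bases other than A/C/G/T (e.g. N) are ignored.
    Returns None when there are no unambiguous bases at all.
    """
    gc = 0
    acgt = 0
    for read in reads:
        seq = read.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * gc / acgt


# Hypothetical reads, only to show the call.
example_reads = ["GCGC", "GCAT", "NNAT"]
example_gc = gc_percent(example_reads)
```

Comparing a number computed this way against the qualimap report for the same BAM would be one way to check that the two definitions agree before drawing conclusions.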
I looked at chr1 coverage for Musc*** samples with high GC. Both showed variable coverage along the length of the chromosome, compared with the expected coverage of ~1.0. While this indexcov figure shows only two samples with high GC (LW001647 and LW001654, which are part of the Pad** samples), the same pattern is common in other samples with high GC as well. LW001643, which has normal GC, is shown for reference, with coverage close to 1.0.
Such coverage variability can also be seen in the coverage-across-reference plots below, which were obtained from qualimap. Note how the coverage (red line) fluctuates for the samples with high GC.
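To put a number on this fluctuation instead of judging it by eye, something like the following could be used. It is only a sketch: it assumes per-bin coverage values have already been scaled so that the expected value is 1.0 (as in the indexcov plot above), and the function name, the `tolerance` threshold, and the example values are all made up for illustration.

```python
from statistics import mean, pstdev


def coverage_variability(scaled_depth, tolerance=0.15):
    """Summarize how far per-bin scaled coverage strays from the expected 1.0.

    scaled_depth: per-bin coverage values, already scaled so 1.0 is expected.
    tolerance: hypothetical cutoff for calling a bin "off target".
    """
    if not scaled_depth:
        raise ValueError("need at least one bin")
    mu = mean(scaled_depth)
    sd = pstdev(scaled_depth)
    # Coefficient of variation: spread relative to the mean.
    cv = sd / mu if mu else float("inf")
    off_target = sum(1 for d in scaled_depth if abs(d - 1.0) > tolerance)
    return {
        "mean": mu,
        "cv": cv,
        "frac_off_target": off_target / len(scaled_depth),
    }


# Made-up values: one steady sample, one fluctuating sample.
stable_stats = coverage_variability([1.0, 1.02, 0.98, 1.01, 0.99, 1.0])
shaky_stats = coverage_variability([0.6, 1.4, 0.8, 1.3, 0.7, 1.2])
```

A sample whose coefficient of variation or off-target fraction is well above that of a normal-GC sample such as LW001643 could then be flagged for a closer look. Sensible cutoffs would have to be chosen from real data.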
I did not have much success finding literature on this topic. The indexcov paper highlights a sample with high coverage variability and notes that "samples like this one will have many spurious CNV calls"; however, it does not discuss the cause of the high coverage variability.
While I currently think that high GC content might not have a significant effect on small variant calling (I am not fully convinced, though!), I expect it to cause issues with other types of variant calls. We need to revisit this topic at some point.
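When we do revisit this, one simple check would be whether per-bin coverage tracks the GC fraction of the reference in the same bins. The sketch below computes a plain Pearson correlation for that purpose. The bin values are invented, and how the per-bin GC and coverage would be extracted is left open here. A strong correlation would be a hint, not proof, that the variability is GC-related.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Invented per-bin values: reference GC fraction and scaled coverage.
bin_gc = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
bin_cov = [0.70, 0.85, 0.95, 1.05, 1.20, 1.35]
r = pearson(bin_gc, bin_cov)
```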