UNMASC failing on a subset of samples
Hello,
I am attempting to run UNMASC on a cohort of Agilent SureSelect XT HS2 gene panel samples which were aligned to hg19 using bwa-mem and pre-processed using Agilent AGeNT software (https://www.agilent.com/cs/library/software/public/AGeNTBestPractices.pdf) and am getting some errors. I am attempting to call variants for 177 samples. Of these, 74 samples go through the entire UNMASC workflow with no issues and produce all final output files.
From the remaining 103 samples, I am experiencing three different issues:
- The most common error occurs during the OXOG/FFPE/ARTI filtering:
Determine OXOG,FFPE,ARTI status ...
Number of unique variants = 621
.621
tumor VAF segmentation on variants...
chr1: 100%
chr2: 100%
chr3: 100%
chr4: 100%
chr5: 100%
chr6: 100%
chr7: 100%
chr8: 100%
chr9: 100%
chr10: 100%
chr11: 100%
chr12: 100%
chr13: 100%
chr14: 100%
chr15: 100%
chr16: 100%
chr17: 100%
chr19: 100%
chr20: 100%
Error in `$<-.data.frame`(`*tmp*`, "index", value = 1:0) :
replacement has 2 rows, data has 0
Calls: <Anonymous> ... UNMASC_tSEG -> segment_tVAF -> $<- -> $<-.data.frame
Execution halted
The exact chromosome at which this error occurs varies. Most samples reach chr20, but a few only complete up to chr4 or chr7 before failing.
- Other samples don't make it that far and just end in a NULL shortly after starting, with no error message:
% ------------------------------- %
% Welcome to the UNMASC workflow! %
% ------------------------------- %
Sun Oct 23 01:14:14 2022: Import image ...
Sun Oct 23 01:14:14 2022: Finding oxoG artifacts ...
Sun Oct 23 01:14:14 2022: Merge strand info ...
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
Sun Oct 23 01:14:15 2022: Finding FFPE artifacts ...
Sun Oct 23 01:14:15 2022: Merge strand info ...
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15 16
17 18 19
20 21 22
23 24 25
26 27 28 29
30
Sun Oct 23 01:14:18 2022: Finding ARTI artifacts ...
Sun Oct 23 01:14:18 2022: Merge strand info ...
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
nrow of uniq_vcs = 815
nANNO ...
............25 out of 25
Infer H2M status ...
tANNO ...
..........21 out of 21
Infer ALLELE_STAT and tANNO ...
nANNO ...
............25 out of 25
Infer H2M status ...
tANNO ...
..........21 out of 21
Infer ALLELE_STAT and tANNO ...
NULL
- Lastly, a few samples fail immediately upon starting, which I assume is due to low sequencing quality or poor-quality variant calls:
% ------------------------------- %
% Welcome to the UNMASC workflow! %
% ------------------------------- %
Sun Oct 23 01:00:39 2022: Calculate mutID and light filtering ...
Sun Oct 23 01:00:39 2022: LowQCSample b/c low variant count after base filtering ...
NULL
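For the first error, the `value = 1:0` in the message is consistent with an index being built as `1:nrow(x)` on a data.frame that has no rows, since `1:0` expands to `c(1, 0)`. The sketch below reproduces the same message with a made-up, empty table. It is an assumption about the cause, not UNMASC's actual code, and `seg` is a hypothetical name.

```r
# Hypothetical stand-in for a per-chromosome table that ends up with no rows
# (assumption: this is NOT taken from the UNMASC source).
seg <- data.frame(pos = integer(0), vaf = numeric(0))

# 1:nrow(seg) evaluates to 1:0, i.e. c(1, 0), which has length 2,
# so assigning it to a 0-row data.frame fails.
msg <- tryCatch({
  seg$index <- 1:nrow(seg)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# seq_len() returns an empty vector when there are 0 rows,
# so the same assignment succeeds on an empty table.
seg$index <- seq_len(nrow(seg))
print(nrow(seg))
```

If this is what happens, the failure would be triggered whenever a chromosome contributes no usable loci to the segmentation step, which would also explain why the failing chromosome differs between samples.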
Any insights into what could be causing these issues would be greatly appreciated. Please let me know if you need any additional information to track down the cause.
Thanks,
Javier
Hi @javi-a-lopez,
Sorry to hear about the errors.
Would it be possible for you to share the image.rds file for a sample that fails for cases (1) and (2)? I should be able to replicate the error and find the bug.
For (3), I suspect the depth at the loci or the Qscore is too low. Could you send a gzipped vcf as an example?
Best,
Hello, here are some image.rds files as well as a vcf file from one of the samples showing the third error.
UNMASC_diagnostics.zip
Thank you very much for the help!
Hello again, I've tried a few things to troubleshoot, such as removing all variants from the chromosomes which throw error #1, but I'm still no closer to figuring out what's causing the error or why only in a subset of samples. Any thoughts?
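One way to check whether the failing samples differ in how many variants each chromosome contributes is to tabulate loci per chromosome, including chromosomes with none. This is a minimal sketch on a toy table. The object `vars`, the column name `Chr`, and the threshold `min_loci` are assumptions for illustration, not fields of image.rds.

```r
# Toy variant table; column names are hypothetical
vars <- data.frame(
  Chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr20"),
  Position = c(100, 250, 900, 40, 75, 310)
)

# Count loci for every chromosome, keeping zeros for absent ones
chrs <- paste0("chr", c(1:22, "X", "Y"))
counts <- tabulate(match(vars$Chr, chrs), nbins = length(chrs))
names(counts) <- chrs

# Flag chromosomes with fewer loci than an arbitrary, hypothetical threshold
min_loci <- 2
sparse <- names(counts)[counts < min_loci]
print(counts[counts > 0])
print(sparse)
```

Comparing these counts between samples that finish and samples that fail may show whether the failures line up with chromosomes that have very few or no loci.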
Hi @javi-a-lopez,
Apologies for the delayed response. Thank you for the image files; they quickly helped me find some potential issues. I see that some additional documentation and tips for the user are needed. Here are my thoughts for now.
- Screening unmatched normals: When generating the image.rds files, the initial clustering of normal read counts creates the nCLUST directory with plots of the normal VAF (nVAF). Among the 30 unmatched normal controls, a subset appears to be of lower quality (STUDYNUMBERs 1 and 3 specifically). We expect the nVAF to be concentrated around 0, 0.5, or 1. Any concentrated deviations from 0.5 could indicate somatic copy number change due to a tumor/normal sample swap or mislabeling.
- Read count distribution: Based on the default clustering of counts, the binomial distribution may be less favorable than the beta-binomial distribution. The data.frame SE within image.rds provides the metrics calculated from clustering normal read counts. You can run the unexported function as UNMASC:::run_nCLUST() to see the difference when switching between binom = TRUE and binom = FALSE.
- Sparsity: UNMASC was benchmarked with a targeted gene panel, and performance improves with increased coverage (WES, WGS). The limited number of genes captured in your panel raises a sparsity concern that leads to poor segmentation of tumor and normal VAF per chromosome. The current UNMASC implementation is therefore a potential limitation for your samples at the moment. One solution I can consider is pooling all loci together for a genome-wide segmentation to overcome this sparsity. This could be a future direction for UNMASC, but it will take a few weeks to implement, test, and debug.
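As a rough illustration of the first two points, the sketch below computes, for one normal sample, the fraction of loci whose nVAF sits near 0, 0.5, or 1, and a simple dispersion ratio for loci near 0.5. The counts are invented, the names `alt`, `depth`, and `frac_near_expected` are hypothetical, and this is not how run_nCLUST() performs its clustering.

```r
# Hypothetical alternate and total read counts for one normal sample
alt   <- c(0, 1, 48, 52, 99, 100, 30, 70, 50, 0)
depth <- rep(100, 10)
nvaf  <- alt / depth

# Fraction of loci whose nVAF lies within tol of 0, 0.5, or 1
frac_near_expected <- function(vaf, tol = 0.1) {
  dist <- pmin(abs(vaf - 0), abs(vaf - 0.5), abs(vaf - 1))
  mean(dist <= tol)
}
frac <- frac_near_expected(nvaf)
print(frac)

# For loci near 0.5, compare the observed variance of alt counts with the
# binomial variance n * p * (1 - p) at p = 0.5 (equal depth assumed here).
het <- abs(nvaf - 0.5) <= 0.25
dispersion <- var(alt[het]) / (depth[1] * 0.5 * 0.5)
print(dispersion)
```

A low fraction would point to a normal whose nVAF is not concentrated where expected, and a dispersion ratio well above 1 would suggest more spread than a binomial model allows, which is the situation a beta-binomial model is meant to accommodate.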
Best,
@pllittle