Joint assembly and calling for multiple tumor/normal sample pairs
Dear GRIDSS support team,
I have carefully read the user guide and the existing issues on joint calling, but cannot find a suitable answer to my question.
Consider a situation with three tumor/normal sample pairs: normal1, tumor1, normal2, tumor2, normal3, tumor3.
Option 1
Treat each tumor/normal pair independently of the others, i.e.
for i in 1 2 3; do
    gridss -s assemble -a patient${i}.bam normal${i}.bam tumor${i}.bam
    gridss -s call -a patient${i}.bam normal${i}.bam tumor${i}.bam
done
Option 2
However, what would happen if I ran all samples together, i.e.
gridss -s assemble -a all.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam
gridss -s call -a all.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam
Option 3
Another option would be to perform batched assembly, with each batch consisting of one patient's tumor/normal pair, and then call all samples at once, i.e.
for i in 1 2 3; do
    gridss -s assemble -a patient${i}.bam normal${i}.bam tumor${i}.bam
done
gridss -s call -a patient1.bam -a patient2.bam -a patient3.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam
Can you advise on the correct/optimal way of analyzing these three pairs of tumor/normal data?
Best regards,
Yoann Pradat
> Can you advise on the correct/optimal way of analyzing these three pairs of tumor/normal data?
Related samples, such as multiple tumours and/or timepoints from the same patient, should be processed together (https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-run-gridss-on-multiple-samples). If you're after a T+N VCF per patient, then option 1 is the most appropriate (https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-perform-tumournormal-somatic-variant-calling).
For somatic calling, I strongly recommend running GRIDSS as part of a Hartwig-style GRIDSS+PURPLE+LINX pipeline, as joint SV+CNV analysis gives you much more insight into what's happening in the tumour than looking at SV calls in isolation.
Regarding the separate gridss -s assemble and gridss -s call steps: generally speaking, you don't need to break out the assembly and calling steps. The only time you would do this is when you want an entire cohort in a single VCF. If you need this, then you'll need to batch the assembly, as assembly gets problematic at 1000x+ coverage (GRIDSS has timeouts that cause assembly to be skipped in regions where the assembly graph gets too complex). See:
https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-perform-batched-assembly
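For the per-patient case (option 1), where the steps don't need to be split, a minimal sketch could look like the following. It only prints one GRIDSS command per patient so they can be reviewed first. The file names (reference.fa, patient${i}.vcf.gz, patient${i}.assembly.bam) are placeholders I've assumed for illustration, not names from this thread; -r, -o and -a are the reference, output VCF and assembly BAM options, and the normal is listed before the tumour.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder reference path (assumption); substitute your own.
REF="reference.fa"

# Build the GRIDSS command line for one patient.
# With no -s option, all steps run in a single invocation.
gridss_cmd() {
  local i="$1"
  printf 'gridss -r %s -o patient%s.vcf.gz -a patient%s.assembly.bam normal%s.bam tumor%s.bam\n' \
    "$REF" "$i" "$i" "$i" "$i"
}

# Print one command per patient; nothing is executed here.
for i in 1 2 3; do
  gridss_cmd "$i"
done
```

Once the printed commands look right, they can be run directly, for example by piping the output of the loop to bash. Each patient then ends up with its own T+N VCF.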