PapenfussLab/gridss

Joint assembly and calling for multiple tumor/normal sample pairs


Dear GRIDSS support team,

I have carefully read the user guide and the existing issues on joint calling, but I cannot find a suitable answer to my question.
Consider a situation with three tumor/normal sample pairs: normal1, tumor1, normal2, tumor2, normal3, tumor3.

Option 1

Treat each tumor/normal pair independently of the others, i.e.

for i in 1 2 3; do
   gridss -s assemble -a patient${i}.bam normal${i}.bam tumor${i}.bam
   gridss -s call -a patient${i}.bam normal${i}.bam tumor${i}.bam
done

However what would happen if

Option 2

I run

gridss -s assemble -a all.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam
gridss -s call -a all.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam

?

Another option would be

Option 3

to perform batch assembly with each batch consisting of the tumor/normal pair of each patient and then do the calling on all samples at once

for i in 1 2 3; do
   gridss -s assemble -a patient${i}.bam normal${i}.bam tumor${i}.bam
done

gridss -s call -a patient1.bam -a patient2.bam -a patient3.bam normal1.bam tumor1.bam normal2.bam tumor2.bam normal3.bam tumor3.bam

Can you advise on the correct/optimal way of analyzing these three pairs of tumor/normal data?

Best regards,
Yoann Pradat

https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#should-i-process-each-input-bam-separately-or-together

Can you advise on the correct/optimal way of analyzing these three pairs of tumor/normal data?

Related samples should be processed together. If you have multiple tumours and/or timepoints from the same patient then they should be processed together (https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-run-gridss-on-multiple-samples). If you're after a T+N VCF per patient, then option 1 is the most appropriate (https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-perform-tumournormal-somatic-variant-calling).
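
For illustration, option 1 could look roughly like the loop below: one GRIDSS run per patient, producing one T+N VCF each. This is only a dry-run sketch that prints the commands rather than executing them. The reference path `ref.fa` and the `patient${i}.vcf.gz` output names are placeholders that do not appear in the question, and the `-r`/`-o` options should be checked against the usage message of your installed GRIDSS version.

```shell
#!/bin/sh
# Dry run: print one GRIDSS command per patient instead of executing it.
# ref.fa and patient${i}.vcf.gz are placeholder names (assumptions).
print_per_patient_cmds() {
  for i in 1 2 3; do
    # Normal listed first, then tumour, as in the original question.
    echo "gridss -r ref.fa -o patient${i}.vcf.gz -a patient${i}.bam normal${i}.bam tumor${i}.bam"
  done
}

print_per_patient_cmds
```

Once the printed commands look right for your setup, the `echo` can be dropped to actually run them.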

For somatic calling, I strongly recommend running GRIDSS as part of a Hartwig-style GRIDSS+PURPLE+LINX pipeline, as joint SV+CNV analysis gives you much more insight into what's happening in the tumour than looking at SV calls in isolation.

gridss -s assemble
gridss -s call

Generally speaking, you don't need to break out the assembly and calling steps. The only time you would do this is when you want an entire cohort in a single VCF. If you need this, then you'll need to batch the assembly, as assembly gets problematic at 1000x+ coverage (GRIDSS has timeouts that cause assembly to be skipped in regions where the assembly graph gets too complex).

https://github.com/PapenfussLab/gridss/?tab=readme-ov-file#how-do-i-perform-batched-assembly
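
If a single cohort VCF really is what you need, the batched layout from option 3 could be sketched as below: one assembly batch per patient, then one call step that takes every batch assembly via repeated `-a` plus all input BAMs. Again this is a dry run that only prints commands. `ref.fa` and `cohort.vcf.gz` are placeholder names, and passing `-o` on the assemble steps is an assumption made for simplicity (it may only matter for the call step), so treat the linked README section as the authority.

```shell
#!/bin/sh
# Dry run: print the batched-assembly commands instead of executing them.
# ref.fa and cohort.vcf.gz are placeholder names (assumptions).
print_batched_cmds() {
  common="-r ref.fa -o cohort.vcf.gz"
  asm_args=""
  inputs=""
  for i in 1 2 3; do
    # One assembly batch per patient (normal + tumour).
    echo "gridss ${common} -s assemble -a patient${i}.bam normal${i}.bam tumor${i}.bam"
    asm_args="${asm_args} -a patient${i}.bam"
    inputs="${inputs} normal${i}.bam tumor${i}.bam"
  done
  # A single call step over all batch assemblies and all input BAMs.
  echo "gridss ${common} -s call${asm_args}${inputs}"
}

print_batched_cmds
```

The batch size here (one patient per batch) simply mirrors option 3; the point of batching is to keep the coverage within each assembly run manageable.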