mutations with no data in some samples
Hello @aroth85,
I'm using PyClone-VI to infer the clonal structure across samples from the same subject.
My aim is to describe mutation gain/loss.
As stated in the README, PyClone-VI removes mutations that do not have entries for all samples.
This is the case in my data and also in the TRACERx example provided, "/examples/tracerx.tsv":
mutation_id | sample_id | ref_counts | alt_counts | normal_cn | major_cn | minor_cn | tumour_content |
---|---|---|---|---|---|---|---|
CRUK0001:11:47843641:G | R1 | 202 | 0 | 2 | 3 | 2 | 0.21 |
CRUK0001:11:47843641:G | R3 | 183 | 10 | 2 | 4 | 1 | 0.11 |
The recommended solution is to "set ref/alt counts to 0 for the corresponding sample",
so I added this line:
mutation_id | sample_id | ref_counts | alt_counts | normal_cn | major_cn | minor_cn | tumour_content |
---|---|---|---|---|---|---|---|
CRUK0001:11:47843641:G | R2 | 0 | 0 | 2 | 0 | 0 | 0.11 |
But, as with the original file, this mutation was removed from the PyClone-VI results table.
My second attempt was to set major_cn equal to the copy number in normal cells:
mutation_id | sample_id | ref_counts | alt_counts | normal_cn | major_cn | minor_cn | tumour_content |
---|---|---|---|---|---|---|---|
CRUK0001:11:47843641:G | R2 | 0 | 0 | 2 | 2 | 0 | 0.11 |
In this case the mutation was retained, with a cellular prevalence of 0 in the R2 sample.
What should I use for major_cn? The normal_cn, or the copy number of the overlapping gene segment, even if there is no mutated allele?
The major_cn has to be greater than zero in all samples or the mutation is filtered out, since there is no possible way to have a mutation at a locus which is absent. Long term this needs to be altered, but for now your second solution is correct.
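The two filtering conditions mentioned in this thread (a missing sample entry, or major_cn of zero in any sample) can be checked before running PyClone-VI. The following is a minimal sketch, not part of PyClone-VI itself; the function name `find_filtered_mutations` and the row dictionaries are hypothetical, and the rows simply mirror the table columns shown above.

```python
def find_filtered_mutations(rows, all_samples):
    """Return {mutation_id: reason} for mutations likely to be dropped.

    Assumption: a mutation is dropped if it lacks a row for any sample,
    or if major_cn is 0 in any of its rows (as described in this thread).
    """
    samples_seen = {}
    zero_major_cn = set()
    for row in rows:
        samples_seen.setdefault(row["mutation_id"], set()).add(row["sample_id"])
        if int(row["major_cn"]) <= 0:
            zero_major_cn.add(row["mutation_id"])

    problems = {}
    for mutation_id, samples in samples_seen.items():
        missing = sorted(set(all_samples) - samples)
        if missing:
            problems[mutation_id] = "missing samples: " + ", ".join(missing)
        elif mutation_id in zero_major_cn:
            problems[mutation_id] = "major_cn is 0 in at least one sample"
    return problems


# Rows taken from the example in this thread, including the first attempted fix.
rows = [
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R1", "major_cn": 3},
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R3", "major_cn": 4},
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R2", "major_cn": 0},
]
problems = find_filtered_mutations(rows, ["R1", "R2", "R3"])
print(problems)
```

Running this on the first attempted fix flags the mutation because of the zero major_cn in R2, which matches the behaviour reported above.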
Ideally you would put the actual CN and allele counts observed in the sample, even if it is not reported as mutated by the variant caller.
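As a rough illustration of the workaround confirmed above, the sketch below adds a placeholder row for every sample in which a mutation has no entry, with ref/alt counts of 0, major_cn set to normal_cn, and minor_cn set to 0. The helper name `add_placeholder_rows` and the `tumour_content` mapping are hypothetical; where real read counts and copy numbers are available for the sample, they should be used instead of placeholders.

```python
def add_placeholder_rows(rows, all_samples, tumour_content):
    """Return rows plus one placeholder row per missing (mutation, sample) pair.

    tumour_content: dict mapping sample_id -> tumour content for that sample.
    Placeholder rows use ref_counts = alt_counts = 0, major_cn = normal_cn
    and minor_cn = 0 (the second solution from this thread).
    """
    by_mutation = {}
    for row in rows:
        by_mutation.setdefault(row["mutation_id"], {})[row["sample_id"]] = row

    filled = list(rows)
    for mutation_id, per_sample in by_mutation.items():
        # Reuse normal_cn from any existing row of the same mutation.
        normal_cn = next(iter(per_sample.values()))["normal_cn"]
        for sample_id in all_samples:
            if sample_id in per_sample:
                continue
            filled.append({
                "mutation_id": mutation_id,
                "sample_id": sample_id,
                "ref_counts": 0,
                "alt_counts": 0,
                "normal_cn": normal_cn,
                "major_cn": normal_cn,  # must be > 0 or the mutation is filtered
                "minor_cn": 0,
                "tumour_content": tumour_content[sample_id],
            })
    return filled


# The two rows from the TRACERx example quoted in this thread.
rows = [
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R1", "ref_counts": 202,
     "alt_counts": 0, "normal_cn": 2, "major_cn": 3, "minor_cn": 2, "tumour_content": 0.21},
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R3", "ref_counts": 183,
     "alt_counts": 10, "normal_cn": 2, "major_cn": 4, "minor_cn": 1, "tumour_content": 0.11},
]
filled = add_placeholder_rows(rows, ["R1", "R2", "R3"], {"R1": 0.21, "R2": 0.11, "R3": 0.11})
for row in filled:
    print(row)
```

The added R2 row is the same as the one from the second attempt in the original question.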
Thank you @aroth85 for your feedback.
So ideally I can set ref_counts to the counts reported at that location and alt_counts = 0, as there is no altered allele, and for the CN I used the copy number at the gene level.
Did I get it right?
Hello,
I'm in a similar situation, and followed the second suggestion.
But all of the CCF results were over 0.99, when they should theoretically be 0.
What should I do?
I attached my example.
test.input.txt
test.output.txt
Best regards,
Should I perhaps input the ref/alt counts for "tumor+normal", not only "tumor"?
When I tried the former (ref_count≠0, alt_count=0), the result was CCF=0.
Or have I misunderstood something?
Could you give me some advice?
I now input the tumor ref/alt counts, and the CCF is calculated successfully.
The following is my protocol.
- Merge the VCF files with bcftools merge
- Generate an interval list with GATK VcfToIntervalList
- Call variants from each BAM file with GATK HaplotypeCaller
The following is part of my data, including rows called with an alt_count of 0.
Hugo_Symbol | NCBI_Build | Chromosome | Start_Position | End_Position | Strand | Variant_Classification | Variant_Type | Reference_Allele | Tumor_Seq_Allele1 | Tumor_Seq_Allele2 | Tumor_Sample_Barcode | t_depth | t_ref_count | t_alt_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-1 | 288 | 284 | 4 |
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-2 | 245 | 245 | 0 |
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-3 | 259 | 259 | 0 |
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-5 | 320 | 253 | 67 |
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-6 | 292 | 235 | 57 |
ARHGAP28 | GRCh38 | 18 | 6851108 | 6851108 | + | Silent | SNP | C | C | T | Sample_FJ01-7 | 347 | 275 | 72 |
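For anyone wanting to turn MAF-style rows like these into PyClone-VI input rows, here is a minimal sketch. It assumes t_ref_count and t_alt_count map directly to ref_counts and alt_counts. The function name `maf_to_pyclone_rows`, the `chrom:pos:ref` mutation_id format, and the default copy-number and tumour-content values are all hypothetical placeholders, since those values are not in the table above and must come from your own copy-number and purity estimates.

```python
def maf_to_pyclone_rows(maf_rows, normal_cn=2, major_cn=1, minor_cn=1, tumour_content=1.0):
    """Convert MAF-like dicts into PyClone-VI style input rows.

    The copy-number and tumour_content arguments are placeholders only;
    replace them with per-sample, per-segment estimates for real analyses.
    """
    out = []
    for rec in maf_rows:
        # Hypothetical id format: chromosome, position and reference allele.
        mutation_id = "{}:{}:{}".format(
            rec["Chromosome"], rec["Start_Position"], rec["Reference_Allele"]
        )
        out.append({
            "mutation_id": mutation_id,
            "sample_id": rec["Tumor_Sample_Barcode"],
            "ref_counts": int(rec["t_ref_count"]),
            "alt_counts": int(rec["t_alt_count"]),
            "normal_cn": normal_cn,
            "major_cn": major_cn,
            "minor_cn": minor_cn,
            "tumour_content": tumour_content,
        })
    return out


# Two of the rows shown above, reduced to the fields the conversion needs.
maf_rows = [
    {"Chromosome": "18", "Start_Position": "6851108", "Reference_Allele": "C",
     "Tumor_Sample_Barcode": "Sample_FJ01-1", "t_ref_count": "284", "t_alt_count": "4"},
    {"Chromosome": "18", "Start_Position": "6851108", "Reference_Allele": "C",
     "Tumor_Sample_Barcode": "Sample_FJ01-2", "t_ref_count": "245", "t_alt_count": "0"},
]
pyclone_rows = maf_to_pyclone_rows(maf_rows)
for row in pyclone_rows:
    print(row)
```

Samples with an alt_count of 0 keep their real reference depth, so every mutation still has an entry in every sample.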