Roth-Lab/pyclone-vi

mutations with no data in some samples

Opened this issue · 5 comments

ZWael commented

Hello @aroth85,
I'm using PyClone-VI to infer the clonal structure across samples from the same subject.
My aim is to describe mutation gain/loss.

As stated in the README file, PyClone-VI removes mutations that do not have entries for all samples.
This is the case in my data and also in the TRACERx example provided in "/examples/tracerx.tsv":
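To see in advance which mutations the README rule would drop, one can list every mutation_id that lacks a row for some sample_id. Below is a minimal standard-library sketch. The helper names `read_rows` and `incomplete_mutations` are hypothetical, this is not PyClone-VI's own filtering code, and it assumes the full set of samples is every sample_id that appears anywhere in the file.

```python
import csv
from collections import defaultdict

def read_rows(path):
    # Read a PyClone-VI style TSV into a list of dicts (one dict per row).
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))

def incomplete_mutations(rows):
    # Hypothetical helper: map each mutation_id that is missing from some
    # sample to the sorted list of sample_ids it has no row for.
    all_samples = {row["sample_id"] for row in rows}
    seen = defaultdict(set)
    for row in rows:
        seen[row["mutation_id"]].add(row["sample_id"])
    return {
        mut: sorted(all_samples - samples)
        for mut, samples in seen.items()
        if samples != all_samples
    }
```

For example, `incomplete_mutations(read_rows("examples/tracerx.tsv"))` would list R2 as missing for CRUK0001:11:47843641:G, provided R2 appears for other mutations in the file.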

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R1 202 0 2 3 2 0.21
CRUK0001:11:47843641:G R3 183 10 2 4 1 0.11

The recommended solution is to "set ref/alt counts to 0 for the corresponding sample,"
so I added this line:

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R2 0 0 2 0 0 0.11

However, as with the original file, this mutation was removed from the PyClone-VI results table.

My second attempt was to set major_cn equal to the copy number in normal cells (normal_cn):

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R2 0 0 2 2 0 0.11

In this case the mutation was retained, with a cellular prevalence of 0 in sample R2.

What should I use for major_cn? The normal_cn, or the copy number of the overlapping gene segment even if there is no mutated allele?

major_cn has to be greater than zero in all samples or the mutation is filtered out, since there is no way to have a mutation at a locus that is absent. Long term this needs to be changed, but for now your second solution is correct.
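The filtering rule described in this reply can be mirrored in a few lines to check an input file before running PyClone-VI. This is a hypothetical restatement of the rule as described, not code taken from PyClone-VI: a mutation is kept only if major_cn > 0 in every sample row it has. Applied to the rows above, it drops the mutation when R2 has major_cn = 0 and keeps it when R2 has major_cn = 2.

```python
def mutations_kept_by_major_cn(rows):
    # Hypothetical check: a mutation survives only if every one of its
    # rows has major_cn > 0 (values may be strings, as read from a TSV).
    keep = {}
    for row in rows:
        mut = row["mutation_id"]
        keep[mut] = keep.get(mut, True) and int(row["major_cn"]) > 0
    return sorted(mut for mut, ok in keep.items() if ok)
```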

Ideally you would enter the actual copy number and allele counts observed in the sample, even if the variant caller does not report the mutation in that sample.

ZWael commented

Thank you @aroth85 for your feedback.

So, ideally, I would set ref_counts to the read count observed at that position and alt_counts = 0, since there is no altered allele, and for the copy number I would use the copy number at the gene level.

Did I get that right?
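The procedure described here (observed reference reads, alt_counts = 0, copy number of the overlapping segment) can be sketched as a helper that adds a row for every missing (mutation, sample) pair. The function name and the `observed` and `purity` lookups are hypothetical; how the observed read counts and segment copy numbers are obtained (for example from a pileup and a copy-number caller) is left open. The check on major_cn follows the rule stated in the reply above.

```python
def fill_missing_samples(rows, samples, observed, purity):
    """Hypothetical helper: add a row for each (mutation, sample) pair
    that has no entry, using values observed in that sample.

    observed[(mutation_id, sample_id)] -> (ref_reads, alt_reads, major_cn, minor_cn)
    purity[sample_id] -> tumour_content for that sample
    """
    by_mut = {}
    for row in rows:
        by_mut.setdefault(row["mutation_id"], {})[row["sample_id"]] = row
    filled = list(rows)
    for mut, per_sample in by_mut.items():
        # Assumes normal_cn is identical across samples for one mutation.
        normal_cn = next(iter(per_sample.values()))["normal_cn"]
        for sample in samples:
            if sample in per_sample:
                continue
            ref, alt, major, minor = observed[(mut, sample)]
            # A row with major_cn = 0 would cause the mutation to be filtered out.
            if int(major) <= 0:
                raise ValueError(f"{mut} in {sample}: major_cn must be > 0")
            filled.append({
                "mutation_id": mut, "sample_id": sample,
                "ref_counts": ref, "alt_counts": alt,
                "normal_cn": normal_cn, "major_cn": major, "minor_cn": minor,
                "tumour_content": purity[sample],
            })
    return filled
```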

Hello,

I'm in a similar situation and followed the second suggestion.
But all of the CCF results were over 0.99, whereas theoretically they should be 0.
What should I do?

I attached my example.

test.input.txt
test.output.txt

Best regards,

Perhaps I should input the ref/alt counts of "tumor+normal", not only "tumor"?

When I tried the former (ref_count ≠ 0, alt_count = 0), the result was CCF = 0.

Or am I misunderstanding something?
Could you give me some advice?
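The difference between ref = alt = 0 and ref > 0, alt = 0 can be seen from a binomial read-count likelihood. The sketch below uses a simplified single-genotype model (the mutation on one copy, fixed copy number) for intuition only; PyClone-VI's actual model is more involved. With zero reads the likelihood is the same for every CCF, so that sample gives no evidence pulling the estimate toward 0, while reference-only reads make CCF = 0 the most likely value. The values (245 reference reads, purity 0.11, total copy number 2) are illustrative, taken from the rows in this thread.

```python
import math

def expected_vaf(phi, purity, total_cn, normal_cn=2, mutant_copies=1):
    # Simplified model: the mutation sits on `mutant_copies` copies in the
    # fraction `phi` of cancer cells that carry it.
    denom = (1 - purity) * normal_cn + purity * total_cn
    return phi * purity * mutant_copies / denom

def log_likelihood(ref, alt, vaf):
    # Binomial log-likelihood of `alt` variant reads out of ref + alt.
    ll = math.log(math.comb(ref + alt, alt))
    if alt > 0:
        if vaf <= 0:
            return -math.inf
        ll += alt * math.log(vaf)
    if ref > 0:
        if vaf >= 1:
            return -math.inf
        ll += ref * math.log1p(-vaf)
    return ll

# Grid of candidate CCF values (0.00, 0.01, ..., 1.00).
phis = [i / 100 for i in range(101)]
no_reads = [log_likelihood(0, 0, expected_vaf(p, 0.11, 2)) for p in phis]
ref_only = [log_likelihood(245, 0, expected_vaf(p, 0.11, 2)) for p in phis]
print(min(no_reads) == max(no_reads))       # True: flat, no information
print(phis[ref_only.index(max(ref_only))])  # 0.0: peaked at CCF = 0
```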

I now input the tumor ref/alt counts and calculate CCF successfully.
My protocol is as follows:

  1. Merge the VCF files with bcftools merge
  2. Generate an interval list with GATK VcfToIntervalList
  3. Call variants from each BAM file with GATK HaplotypeCaller
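The three steps above could be scripted along these lines. This is a sketch that builds the command lines and optionally runs them with `subprocess`; the file names, the output directory, and passing the interval list to HaplotypeCaller with `-L` are assumptions, not details from the original protocol. bcftools merge normally expects bgzip-compressed, indexed input VCFs, and the output directory must already exist.

```python
import subprocess
from pathlib import Path

def build_commands(vcfs, bams, reference, outdir="out"):
    # Hypothetical file layout under `outdir`.
    out = Path(outdir)
    merged = out / "merged.vcf.gz"
    intervals = out / "merged.interval_list"
    commands = [
        # 1. Merge the per-sample VCFs into one multi-sample VCF.
        ["bcftools", "merge", "-O", "z", "-o", str(merged), *map(str, vcfs)],
        # 2. Convert the merged VCF into an interval list of variant sites.
        ["gatk", "VcfToIntervalList", "-I", str(merged), "-O", str(intervals)],
    ]
    for bam in bams:
        # 3. Call each BAM; restricting calls to the intervals is an assumption.
        vcf_out = out / f"{Path(bam).stem}.vcf.gz"
        commands.append(["gatk", "HaplotypeCaller", "-R", str(reference),
                         "-I", str(bam), "-L", str(intervals), "-O", str(vcf_out)])
    return commands

def run_all(commands):
    # Run sequentially, stopping at the first command that fails.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage might look like `run_all(build_commands(["FJ01-1.vcf.gz", "FJ01-2.vcf.gz"], ["FJ01-1.bam", "FJ01-2.bam"], "GRCh38.fa"))`; keeping command construction separate from execution makes it easy to print and inspect the commands first.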

Below is part of my data, where some samples were called with t_alt_count = 0:

Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Sample_Barcode t_depth t_ref_count t_alt_count
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-1 288 284 4
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-2 245 245 0
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-3 259 259 0
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-5 320 253 67
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-6 292 235 57
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-7 347 275 72
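Rows like these can be converted into PyClone-VI input by mapping t_ref_count to ref_counts and t_alt_count to alt_counts. The MAF does not contain copy number or tumour content, so the sketch takes them from hypothetical lookups (`cn_lookup`, `purity`) that would be filled from a copy-number caller and purity estimates. The mutation_id scheme is also an assumption; what matters is that it is identical across samples. normal_cn defaults to 2, which assumes autosomal loci.

```python
import csv

FIELDS = ["mutation_id", "sample_id", "ref_counts", "alt_counts",
          "normal_cn", "major_cn", "minor_cn", "tumour_content"]

def read_maf(path):
    # Read a tab-separated MAF, skipping "#version"-style comment lines.
    with open(path, newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        return list(csv.DictReader(lines, delimiter="\t"))

def maf_to_pyclone(records, cn_lookup, purity, normal_cn=2):
    """Hypothetical conversion of MAF records to PyClone-VI rows.

    cn_lookup(chromosome, position, sample) -> (major_cn, minor_cn)
    purity[sample] -> tumour_content
    """
    rows = []
    for rec in records:
        sample = rec["Tumor_Sample_Barcode"]
        chrom, pos = rec["Chromosome"], int(rec["Start_Position"])
        major, minor = cn_lookup(chrom, pos, sample)
        rows.append({
            # Hypothetical ID scheme: chromosome:position:ref:alt.
            "mutation_id": f"{chrom}:{pos}:{rec['Reference_Allele']}:{rec['Tumor_Seq_Allele2']}",
            "sample_id": sample,
            "ref_counts": int(rec["t_ref_count"]),
            "alt_counts": int(rec["t_alt_count"]),
            "normal_cn": normal_cn,
            "major_cn": major,
            "minor_cn": minor,
            "tumour_content": purity[sample],
        })
    return rows

def write_pyclone_tsv(rows, path):
    # Write rows as a tab-separated file with the PyClone-VI column names.
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```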