odelaneau/shapeit5

rare phasing yields PP=0.5 at all singletons

giorkala opened this issue · 2 comments

Hello,

Firstly, thanks for developing this software suite and also helping with any issues here. I'd like to report a potential issue I ran into when using newer versions of phase_rare (both the released and the one by Simone in issue #34), regarding the posterior probabilities that SHAPEIT reports.

In our cohort (WES for 39k individuals), phasing rare variants is completed without errors (including phasing common variants, as suggested), but when assessing the quality I found out that all variants with MAC=1 have PP=0.5 exactly. I then discussed this with collaborators working with UKBB who noticed the same behaviour when switching to the newest release. However, using SHAPEIT5 v1.0.0 (preprint version) on the same data yielded

bcftools view $BCF -C 1 -Ou | bcftools query -i'GT="het"' -f'[%CHROM:%POS:%REF:%ALT %GT %PP \n]'
chr21:10413627:C:G 0|1 0.717
chr21:10413631:C:T 1|0 1
chr21:10413638:T:A 0|1 1
chr21:10413701:G:A 1|0 0.5
chr21:10413717:A:G 1|0 0.956
chr21:10413728:C:T 0|1 0.71
chr21:10413765:G:A 1|0 0.576
chr21:10413781:G:A 1|0 0.798
chr21:10413809:G:C 0|1 1
chr21:10413818:A:G 0|1 0.733
chr21:10413819:G:A 1|0 0.686
chr21:10414945:C:T 1|0 0.822
chr21:10414947:A:T 1|0 0.98
…

thus, a reasonable distribution of PPs. Based on that, I tried to phase using older versions myself, but got the following errors:

  • v1.0.0 (preprint) fails at the beginning, during reading the input:
src/io/genotype_reader/genotype_reader_scanning.cpp:68: void genotype_reader::scanGenotypesPlain(): Assertion `line_unphased' failed.
  • v1.1.0 does everything, saves the output, but that has a wrong format and so I get the error (as in v1.1.1) on any attempt to use the data
Finalization: [E::vcf_format] Invalid BCF, CONTIG id=-1 not present in the header.

Could you please look into this and check if we're missing something, or if there's a bug in the current version?

Thank you very much,
Georgios

Hi George,

I've just replied to you by email and repost here in case it is useful to someone else.

Singelton phasing score will be back in the next software release (as soon as I find time to make it).
In the meantime, you could recompile the code after:

  1. Un-comenting l132 in phase_rare/src/containers/genotype_set/genotype_set_phasing.cpp.
  2. Commenting l133 in the same file.
    This will bring back singleton phasing scores in the output.

Cheers,

Hi,

Thanks a lot Olivier, this is to mention that the suggestion worked!

Just note that, for those who get the "invalid CONTIG id" error, you need to make this change in the contig branch of the project.

Best wishes,
GK