PharmGKB/PGxPOP

Can PGxPOP handle unphased VCFs?


cqgd commented

Hi Greg, Adam,

Many thanks for releasing this tool and for providing a nice overview of CYP AF in UKB!

One question: does PGxPOP handle unphased VCFs?

--phased being an optional argument seems to suggest the input can be either phased or unphased:

    ________________________________________
    |      ___  ___     ___  ___  ___        |
    |     | _ \/ __|_ _| _ \/\  \| _ \       | 
    |     |  _/ (_ \ \ /  _/  \  |  _/       |
    |     |_|  \___/_\_\_|  \__\/|_|         |
    |                                        |
    |                 v1.0                   |
    |              Written by                |     
    |     Adam Lavertu and Greg McInnes      |
    |        with help from PharmGKB.        |
    |________________________________________|
    
Copyright (C) 2020 Stanford University.
Distributed under the Mozilla Public License 2.0 open source license.
    
usage: PGxPOP.py [-h] [-f VCF] [-g GENE] [--phased] [--build BUILD] [--extra_variants] [-d] [-b] [-o OUTPUT]

CityDawg determines star allele haplotypes for samples in a VCF file and outputs predicted pharmacogenetic phenotypes.

optional arguments:
  -h, --help            show this help message and exit
  -f VCF, --vcf VCF     Input VCF
  -g GENE, --gene GENE  Gene to run. Select from list. Run all by default. CFTR, CYP2C9, CYP2D6, CYP4F2, IFNL3, TPMT, VKORC1, CYP2C19,
                        CYP3A5, DPYD, SLCO1B1, UGT1A1, CYP2B6, NUDT15
  --phased              Data is phased. Will try to determine phasing status from VCF by default.
(...)

The GitHub README.md, on the other hand, mentions only phased data input:

PGxPOP is a population-scale PGx allele caller designed to handle 100,000s of samples. Input is a phased VCF file, that has been indexed with tabix.

Many thanks,

Chris

Hi Chris,

PGxPOP accepts unphased input and will report the PGx variants it identifies at those loci. However, it cannot give reliable star allele calls from unphased data, because those calls require phase information to determine haplotypes. We suggest phasing samples with EAGLE or a similar tool before running PGxPOP.
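For anyone who wants to check a VCF before running PGxPOP, here is a minimal sketch of one way to tell whether genotypes are phased. It relies only on the VCF convention that phased GT calls use `|` as the allele separator and unphased calls use `/`. It is not PGxPOP's own detection logic, and the names `open_vcf` and `genotype_phasing` are made up for this illustration.

```python
import gzip
from typing import Iterable


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def genotype_phasing(lines: Iterable[str], max_records: int = 1000) -> str:
    """Classify GT separators in the first `max_records` variant records.

    Returns "phased" (only '|'), "unphased" (only '/'),
    "mixed" (both seen) or "unknown" (no diploid GT calls found).
    """
    phased = unphased = records = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            gt = parts[gt_idx]
            if "|" in gt:
                phased += 1
            elif "/" in gt:
                unphased += 1
            # haploid calls such as "1" carry no separator and are ignored
        records += 1
        if records >= max_records:
            break
    if phased and unphased:
        return "mixed"
    if phased:
        return "phased"
    if unphased:
        return "unphased"
    return "unknown"
```

A call such as `genotype_phasing(open_vcf("cohort.vcf.gz"))` (the file name is a placeholder) would then give a quick indication of whether the file needs phasing first. A result of "mixed" or "unphased" suggests the star allele calls should not be trusted as-is.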