COMBINE-lab/maximum-likelihood-relatedness-estimation

What fields are required on the VCF file?

inti opened this issue · 5 comments

inti commented

Hi,
My VCF file has the PL field.
I am trying to format the missing value to use this file as input.

The original file looks like

Chr01   11582413        .       C       T       1       PASS    AC=5;AF=0.625;AN=8;CIGAR=1X;DP=116;GTI=0;LEN=1;NUMALT=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;technology.illumina=0 GT:AO:DP:PL:QA:QR:RO    .       .       .       .       .       .       .       .       .       .       .       .       .       .   .
        .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .   .
        .       .       .       .       .       .       .       .       0/0:0:4:0,12,116:0:125:4        1/1:6:7:172,18,0:242:0:0        1/1:1:1:40,3,0:40:0:0
        0/1:10:17:308,0,189:395:263:7

the I tried reformatting to this but it failed

Chr01   11582413        .       C       T       1       PASS    AC=5;AF=0.625;AN=8;CIGAR=1X;DP=116;GTI=0;LEN=1;NUMALT=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;technology.illumina=0 GT:AO:DP:PL:QA:QR:RO    .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.
        .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.        .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.
        .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.        .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   .:.:.:.:.:.:.   0/0:0:4:0,12,116:0:125:4    1/1:6:7:172,18,0:242:0:0 1/1:1:1:40,3,0:40:0:0   0/1:10:17:308,0,189:395:263:7

After looking at some of your example files I reformatted to this

Chr01   11582413        .       C       T       1       PASS    AC=5;AF=0.625;AN=8;CIGAR=1X;DP=116;GTI=0;LEN=1;NUMALT=1;PAO=0;PQA=0;PQR=0;PRO=0;RUN=1;TYPE=snp;technology.illumina=0    GT:AO:DP:PL:QA:QR:RO    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0
    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    ./.:0:0:-9.0,-9.0,-9.0:0:0:0    0/0:0:4:0,12,116:0:125:4        1/1:6:7:172,18,0:242:0:0        1/1:1:1:40,3,0:40:0:0   0/1:10:17:308,0,189:395:263:7

So the original was ./. or . and .:.:.:.:.:.:. and ./.:0:0:-9.0,-9.0,-9.0:0:0:0 failed with error.

1 - How should I format the genotype information on this vcf file (GT:AO:DP:PL:QA:QR:RO)?
2 - What is the ML field you have on your vcf files?
3 - should I strip out my vcf to leave only with the GT:PL fields?

Thanks in advance

Hi Inti,
Sorry for the late response to this, I will try and get to the two issues you found as soon as I can.

Best
Krishna

inti commented

Cheers!

On Sep 17, 2016, at 14:28, Krishna Veeramah notifications@github.com wrote:

Hi Inti,
Sorry for the late response to this, I will try and get to the two issues you found as soon as I can.

Best
Krishna


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

inti commented

@kveeramah any new on this?

Hi Inti,
Sorry it took me so long to get back to you. If you are using a file with PL fields, then the .:.:.:.:.:.:. notation for missing data should work, the -9 is only applicable when using actual genotype likelihoods. It shouldn't be necessary to strip out other fields, lcMLkin should just ignore them.

What exactly is the error you were getting in the first instance? Perhaps you could send me your original input file (or perhaps the first 100 SNPs of so)?

The ML fields stands for maximum likelihood but isn't used by lcMLkin, it's just output as part of our custom caller.

Cheers
Krishna

inti commented

Hi,
an example file can be found in here https://gist.github.com/inti/c166251db6c24541a918cb26138dc8ae

Any ideas how to make it work with files formatted as this one?

I try with all three options

/home/shared/app/maximum-likelihood-relatedness-estimation/lcmlkin -i test.vcf -o kinship.relate -g all -t 8 -l raw
lcMLkin v0.5.0
==============================
using 8 threads
Populating Data
[2016-12-21 00:50:02.268] [jointLog] [info] Genotype likelihood format is RAW
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi
Abortado
/home/shared/app/maximum-likelihood-relatedness-estimation/lcmlkin -i test.vcf -o kinship.relate -g all -t 8 -l log
lcMLkin v0.5.0
==============================
using 8 threads
Populating Data
[2016-12-21 00:50:25.054] [jointLog] [info] Genotype likelihood format is LOG
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi
Abortado
/home/shared/app/maximum-likelihood-relatedness-estimation/lcmlkin -i test.vcf -o kinship.relate -g all -t 8 -l phred
lcMLkin v0.5.0
==============================
using 8 threads
Populating Data
[2016-12-21 00:50:45.228] [jointLog] [info] Genotype likelihood format is PHRED
[2016-12-21 00:50:45.237] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.243] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.250] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.257] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.264] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.271] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.278] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.285] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.292] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.299] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.305] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.312] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.319] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.326] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.333] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.340] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.346] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.353] [jointLog] [warning] No PL field in variant: Chr01
[2016-12-21 00:50:45.353] [jointLog] [info] Parsed variants at 18 total sites
[2016-12-21 00:50:45.353] [jointLog] [info] allele freqs = 0
[2016-12-21 00:50:45.353] [jointLog] [info] gt likelihoods = 0
number of SNPs masked (freq 0) = 0
number of SNPs masked (freq 1) = 0
Calculating Prestored IBS|IBD
lcmlkin: include/Eigen/src/Core/DenseCoeffsBase.h:394: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::DenseCoeffsBase<Derived, 1>::Index) [with Derived = Eigen::Matrix<double, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = double; Eigen::DenseCoeffsBase<Derived, 1>::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Abortado