ultimatesource/denovogear

dng call --model=autosomal does not return results

jielab opened this issue · 4 comments

Hi,

Now I can use "dng dnm auto --ped sample.ped --bcf trio.bcf" to identify de novo variants from a trio dataset. It seems to me that this type of task could be done with a simple text-processing tool: basically, identify variants where both parents' genotypes are A/A while the proband's genotype is A/a or a/a. Is that correct?

Then I tried to identify autosomal dominant variants from the same trio, after I changed the case status to "2" for both the proband and the father in the sample.ped file. This time I ran "dng call --model=autosomal --ped sample.dom.ped trio.bcf", but surprisingly the program returned a VCF file without any variants. However, there are many variants where the father's and the proband's genotypes are both A/a while the mother's genotype is A/A. Why are those variants not picked up by my command "dng call --model=autosomal --ped sample.dom.ped trio.bcf"?

Thank you & best regards,
Jie

Dear @jiehuang001,

As I've said before, it is very difficult for me to diagnose your issues without any information about the ped and bcf files that you are using.

> It seems to me that this type of task could be done with a simple text-processing tool: basically, identify variants where both parents' genotypes are A/A while the proband's genotype is A/a or a/a. Is that correct?

While you can call de novo mutations using a simple text-processing tool, doing so leads to many problems because it ignores the uncertainty in the genotype information. DNG provides more accurate de novo mutation calling because it integrates information about uncertainty and experimental design into its genotype models.
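To make the point about uncertainty concrete, here is a minimal sketch of why two sites with identical hard genotype calls can deserve very different confidence. It converts Phred-scaled genotype likelihoods (the VCF `PL` field) into probabilities under a flat prior and treats the three samples as independent. Both simplifications are assumptions made for illustration; this is not DNG's model, and the function names and PL values are hypothetical.

```python
def pl_to_probs(pl):
    """Convert Phred-scaled genotype likelihoods (VCF PL) to probabilities.

    Assumes a flat prior over genotypes, which is an illustrative
    simplification and not how a pedigree-aware caller works.
    """
    likelihoods = [10 ** (-value / 10) for value in pl]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def naive_denovo_confidence(father_pl, mother_pl, child_pl):
    """Probability that both parents are hom-ref and the child is not.

    PL lists are ordered (0/0, 0/1, 1/1). The samples are treated as
    independent, which ignores Mendelian inheritance entirely.
    """
    p_father_ref = pl_to_probs(father_pl)[0]
    p_mother_ref = pl_to_probs(mother_pl)[0]
    p_child_alt = 1.0 - pl_to_probs(child_pl)[0]
    return p_father_ref * p_mother_ref * p_child_alt


# Hypothetical sites. In both, the hard calls are father 0/0, mother 0/0,
# child 0/1, so a text filter on GT would report both as de novo.
confident = naive_denovo_confidence([0, 60, 200], [0, 60, 200], [60, 0, 80])
shaky = naive_denovo_confidence([0, 3, 40], [0, 60, 200], [60, 0, 80])

print(f"well-supported site: {confident:.3f}")
print(f"weakly supported father call: {shaky:.3f}")
```

In the second site the father's 0/0 call is only slightly more likely than 0/1, so the apparent de novo event may simply be an inherited variant that was miscalled in the father.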

> Then I tried to identify autosomal dominant variants from the same trio, after I changed the case status to "2" for both the proband and the father in the sample.ped file. This time I ran "dng call --model=autosomal --ped sample.dom.ped trio.bcf", but surprisingly the program returned a VCF file without any variants. However, there are many variants where the father's and the proband's genotypes are both A/a while the mother's genotype is A/A. Why are those variants not picked up by my command "dng call --model=autosomal --ped sample.dom.ped trio.bcf"?

Without information about your .ped and .bcf files I can't be certain, but from your description I believe that you are trying to do something DNG doesn't do and that you are using an incompatible ped file. DNG doesn't know anything about phenotypes and does not use any information about dominance or case status. The fact that you appear to have a column for case status in your ped file indicates to me that it does not follow dng call's PEDNG format, which has no columns for case status or other phenotypes. More than likely, you are getting no results because dng call cannot connect the individuals in your .ped file to the samples in your .bcf file.
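One quick way to test the hypothesis that individuals and samples are not being connected is to compare the IDs in the ped file with the sample names in the VCF header. The sketch below works on text VCF lines and assumes the individual ID sits in the second whitespace-separated column of the ped file. That column position, the helper names, and the example data are assumptions for illustration; how dng call actually maps individuals to samples is defined by its PEDNG format, not by this script.

```python
def vcf_sample_names(vcf_lines):
    """Return sample names from the #CHROM header line of a text VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns are fixed (CHROM through FORMAT).
            return fields[9:]
    return []


def ped_individual_ids(ped_lines, id_column=1):
    """Return individual IDs from a ped file.

    id_column=1 (the second column) is an assumption borrowed from the
    common ped layout; adjust it to the format you are actually using.
    """
    ids = []
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ids.append(line.split()[id_column])
    return ids


def unmatched_ids(ped_lines, vcf_lines):
    """List ped individuals that have no same-named sample in the VCF."""
    samples = set(vcf_sample_names(vcf_lines))
    return [ind for ind in ped_individual_ids(ped_lines) if ind not in samples]


# Hypothetical data: the ped file says "mother" but the VCF sample is "mom".
ped_text = "FAM1\tchild\tdad\tmom\t1\nFAM1\tdad\t0\t0\t1\nFAM1\tmother\t0\t0\t2\n"
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\tdad\tmom\n"
)

missing = unmatched_ids(ped_text.splitlines(), vcf_text.splitlines())
print("ped individuals without a matching VCF sample:", missing)
```

For a .bcf file, the sample names can be listed with `bcftools query -l trio.bcf` and compared against the ped IDs in the same way.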

> I did indicate that I am now using the testing data from your DNG website https://github.com/denovogear/testdata/tree/master/sample_CEU. I simply use the https://github.com/denovogear/testdata/blob/master/sample_CEU/sample_CEU.ped file and the sample_CEU.vcf file, so that it is easier for us to communicate and cross-check.

I'm sorry for being confused, as that information was missing from this issue. The sample_CEU directory contains test data for dng dnm and is not compatible with dng call. The data I use to test dng call is in the human_trio directory.

> Your sample_CEU.ped file does have 6 columns. I assume that the 6th column is the case/control status column, the same as in a PLINK .ped file. In your sample_CEU.ped file, the 6th column has a value of 2 for the proband, while the two parents have a value of 0. I think the value for the parents should be 1 instead, if DNG uses the same format as PLINK.

dng dnm ignores the 6th column. It won't throw an error, but it doesn't use that information.
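For reference, in the PLINK .ped layout the first six columns are family ID, individual ID, paternal ID, maternal ID, sex, and phenotype, and the phenotype codes are 1 (unaffected), 2 (affected), and 0 or -9 (missing). The sketch below decodes those columns for a hypothetical trio that mirrors the coding described above (proband 2, parents 0). It describes PLINK's convention only; as stated, dng dnm does not use the sixth column. The function name and example rows are made up for illustration.

```python
# PLINK phenotype codes for the sixth column of a .ped file.
PLINK_PHENOTYPE = {
    "1": "unaffected",
    "2": "affected",
    "0": "missing",
    "-9": "missing",
}


def parse_plink_ped_line(line):
    """Decode the first six columns of a PLINK-style .ped line."""
    family, individual, father, mother, sex, phenotype = line.split()[:6]
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "sex": sex,
        "status": PLINK_PHENOTYPE.get(phenotype, "other"),
    }


# Hypothetical trio: proband coded 2, both parents coded 0.
EXAMPLE_PED = """\
FAM1 child dad mom 1 2
FAM1 dad 0 0 1 0
FAM1 mom 0 0 2 0
"""

rows = [parse_plink_ped_line(line) for line in EXAMPLE_PED.splitlines()]
for row in rows:
    print(row["individual"], row["status"])
```

Under PLINK's convention a 0 in the sixth column means the phenotype is missing, not that the parent is unaffected, which is why a value of 1 would be needed to mark the parents as controls in PLINK.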

> What I am trying to ask now is: can DNG pick up heritable mutations?

You can use dng call --all to output sites that contain segregating variants in addition to sites that contain de novo variants. However, what you are trying to do is find potentially causal variants via linkage analysis. For that you do need to use a tool like PLINK.
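If the output contains both de novo and segregating sites, one simple way to separate them afterwards is to look at the transmission pattern of the hard genotype calls. The sketch below is such a post-filter, written under the assumption that each site provides standard VCF `GT` strings for father, mother, and proband. The function names and category labels are hypothetical. It says nothing about causality or linkage and, like any filter on hard calls, it ignores genotype uncertainty.

```python
def alt_allele_count(gt):
    """Count non-reference alleles in a VCF GT string such as '0/1' or '1|1'.

    Returns None when any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(allele != "0" for allele in alleles)


def classify_trio_site(father_gt, mother_gt, child_gt):
    """Label a trio site by which family members carry a non-reference allele."""
    counts = [alt_allele_count(gt) for gt in (father_gt, mother_gt, child_gt)]
    if None in counts:
        return "missing genotype"
    father, mother, child = counts
    if child == 0:
        return "not carried by proband"
    if father == 0 and mother == 0:
        return "de novo candidate"
    if mother == 0:
        return "consistent with paternal transmission"
    if father == 0:
        return "consistent with maternal transmission"
    return "carried by both parents"


# The pattern from the question: father A/a, mother A/A, proband A/a.
print(classify_trio_site("0/1", "0/0", "0/1"))
# The de novo pattern: both parents A/A, proband A/a.
print(classify_trio_site("0/0", "0/0", "0/1"))
```

A site labelled as consistent with paternal transmission is only a candidate; deciding whether it relates to a phenotype shared by father and proband is the linkage or association question that a tool like PLINK addresses.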

I find this tool and its messages really frustrating. I have decided to give up on it.

What I have been struggling to do is simply to find de novo and rare heritable mutations in a trio VCF file. But after so many emails, you now tell me to use the test data in the human_trio directory instead of the sample_CEU directory, to use "dng call --all" instead of "dng call" instead of "dng dnm", and to use other tools such as PLINK instead of DNG.

This has wasted too much of my time, and nothing makes sense. Sorry to say this. I really wish that you would develop something nicer...