problem for haplotype_plot input?
Opened this issue · 6 comments
I'm using the test data ("chr01.vcf") to test the command haplotype_plot, but it always return errors like following:
#############################################################
2021-02-01:19:24:59 DEBUG [reader.py:36] Converting VCF file 'haplotype_plot/tests/data/chr01.vcf'
2021-02-01:19:24:59 DEBUG [reader.py:38] File '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5' already exists. I will remove it.
2021-02-01:19:24:59 DEBUG [reader.py:41] HDF5 file stored in '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5'
2021-02-01:19:24:59 DEBUG [genotyper.py:118] Loading HDF5 file '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5'
2021-02-01:19:24:59 ERROR [genotyper.py:72] Sample 'SAMPLE1' is not found in VCF sample list '[b'SAMPLE1', b'SAMPLE4', b'SAMPLE5', b'SAMPLE2', b'SAMPLE3', b'SAMPLE8', b'SAMPLE9', b'SAMPLE6', b'SAMPLE7', b'SAMPLE10']'
Traceback (most recent call last):
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 66, in get_sample_index
parental_sample_index = sample_list.index(sample)
ValueError: 'SAMPLE1' is not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/Data_disk/liangyiye/.local/bin/haplotype_plot", line 8, in
sys.exit(main())
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/main.py", line 42, in main
haplotyper.Zygosity[args.zygosis])
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 121, in process
parental_sample_index = get_sample_index(sample_list, parental_sample)
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 73, in get_sample_index
raise ValueError(msg)
ValueError: Sample 'SAMPLE1' is not found in VCF sample list '[b'SAMPLE1', b'SAMPLE4', b'SAMPLE5', b'SAMPLE2', b'SAMPLE3', b'SAMPLE8', b'SAMPLE9', b'SAMPLE6', b'SAMPLE7', b'SAMPLE10']'
##########################################################
My command is: haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE1" -z HOM. I am very confused about this error and I look forward to your answer.
Hi @liangyiye!
What version of haplotype_plot
are you using? I have successfully run the following cmd
with version 1.1.1 (latest commit so far):
haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE1" -z HOM
Which results in:
DEBUG:haplotype_plot.reader:Converting VCF file 'haplotype_plot/tests/data/chr01.vcf'
DEBUG:haplotype_plot.reader:HDF5 file stored in '/home/swit/haplotype_plot/haplotype_plot/tests/data/chr01.h5'
DEBUG:haplotype_plot.genotyper:Loading HDF5 file '/home/swit/haplotype_plot/haplotype_plot/tests/data/chr01.h5'
DEBUG:haplotype_plot.genotyper:Retrieving genotypes via 'calldata/GT' key
DEBUG:haplotype_plot.genotyper:Retrieving variants via 'variants' key
DEBUG:haplotype_plot.filter:There are 10000 variants in chromosome chr01
DEBUG:haplotype_plot.filter:There are 10000 segregating SNPs
DEBUG:haplotype_plot.filter:Number of variants to keep 10000
DEBUG:haplotype_plot.filter:Found 534 phased variant calls
INFO:haplotype_plot.plot:Saving haplotype plot in .../haplotype_plot/haplotype_plot/tests/data/haplotypes.png
I have tested this in Windows 10 and Ubuntu 14.10 (both having the h5py lib version 2.10.0 and 2.7.1)
Hi again @liangyiye,
As I could not reproduce it I had a look to the failed travis build and it seemed that it was an issue of the version of PyVCF
and h5py
.
This issue should be fixed in v1.1.2, according to the latest travis build.
Version v1.1.2 is available via Github tags and pypi.
Hi @neobernad,
I'm glad to hear from you. I've changed Version v1.1.1 to Version v1.1.2, and the issue has been solved. But I still have some questions about the output file.
I used the following command in order to get the haplotype-plot from the first ten variants in positions chr01:1, chr01:2 ,..., chr01:10
haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE4" -z HET --conf start=0 end=10
which results in:
2021-02-02:14:31:21 DEBUG [reader.py:36] Converting VCF file 'chr01.vcf' 2021-02-02:14:31:21 DEBUG [reader.py:38] File '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' already exists. I will remove it. 2021-02-02:14:31:22 DEBUG [reader.py:41] HDF5 file stored in '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' 2021-02-02:14:31:22 DEBUG [genotyper.py:118] Loading HDF5 file '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' 2021-02-02:14:31:22 DEBUG [genotyper.py:36] Retrieving genotypes via 'calldata/GT' key 2021-02-02:14:31:22 DEBUG [genotyper.py:38] Retrieving variants via 'variants' key' 2021-02-02:14:31:22 DEBUG [filter.py:61] There are 10000 variants in chromosome chr01 2021-02-02:14:31:22 DEBUG [filter.py:68] There are 10000 segregating SNPs 2021-02-02:14:31:22 DEBUG [filter.py:72] Number of variants to keep 10000 2021-02-02:14:31:22 DEBUG [filter.py:105] Found 534 phased variant calls 2021-02-02:14:31:22 INFO [plot.py:205] Saving haplotype plot in /mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/haplotypes.png
Finally, I got the haplotype-plot start with position chr01:4, and end with position chr01:134. This result seems to be different from the example result in your manual.
my plot-result:
I'm confused how can i plot variants from position chr01:1 to chr01:10?
Hi @liangyiye
By default haplotype_plot
was performing the phase
stage even though --phase
was not provided in the arguments. This is why the output plot is showing the variants at position 4, 59, etc., because these are the phased ones.
I had uploaded a fix in v1.1.3
. Phased variants (in beta phase) are only called if --phase
argument is present.
Hi @neobernad,
Thank you very much! By using V1.1.3, I have plot the haplotype structure for the positions I want.
Actually, I have another question. Can I obtain the haplotype plot for the following functions: 1) When the individual's genotype is homozygous and the same as the parent (like 0/0), it is displayed as one color; 2)When the individual's genotype is homozygous but opposite to the parent (like 1/1), it is displayed as the second color; 3)When the individual's genotype is heterozygous (like 0/1,or 1/0), it is displayed as the third color; 4)When the individual's genotype is missing (like .,or ./.), it is displayed as the fourth color. Now the functions I can achieve from your program are 1), 2), 4), and I want to know how to achieve the third function?
Hi @liangyiye,
Glad to hear it has worked! :-)
In order to show both alleles for heterozygous datasets you should use the -z HET
argument. By doing that you will be able to see 0/1 and 1/0 genotypes, however, the color schema changes depending on the inheritance from the parent haplotype (check color schema).
Just for your into, this tool is based on scikit-allel library, just in case you are interested in further functionalities.
Best