neobernad/haplotype_plot

problem for haplotype_plot input?

Opened this issue · 6 comments

I'm using the test data ("chr01.vcf") to test the command haplotype_plot, but it always return errors like following:
#############################################################
2021-02-01:19:24:59 DEBUG [reader.py:36] Converting VCF file 'haplotype_plot/tests/data/chr01.vcf'
2021-02-01:19:24:59 DEBUG [reader.py:38] File '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5' already exists. I will remove it.
2021-02-01:19:24:59 DEBUG [reader.py:41] HDF5 file stored in '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5'
2021-02-01:19:24:59 DEBUG [genotyper.py:118] Loading HDF5 file '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-master/haplotype_plot/tests/data/chr01.h5'
2021-02-01:19:24:59 ERROR [genotyper.py:72] Sample 'SAMPLE1' is not found in VCF sample list '[b'SAMPLE1', b'SAMPLE4', b'SAMPLE5', b'SAMPLE2', b'SAMPLE3', b'SAMPLE8', b'SAMPLE9', b'SAMPLE6', b'SAMPLE7', b'SAMPLE10']'
Traceback (most recent call last):
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 66, in get_sample_index
parental_sample_index = sample_list.index(sample)
ValueError: 'SAMPLE1' is not in list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/mnt/Data_disk/liangyiye/.local/bin/haplotype_plot", line 8, in
sys.exit(main())
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/main.py", line 42, in main
haplotyper.Zygosity[args.zygosis])
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 121, in process
parental_sample_index = get_sample_index(sample_list, parental_sample)
File "/mnt/Data_disk/liangyiye/.local/lib/python3.7/site-packages/haplotype_plot/genotyper.py", line 73, in get_sample_index
raise ValueError(msg)
ValueError: Sample 'SAMPLE1' is not found in VCF sample list '[b'SAMPLE1', b'SAMPLE4', b'SAMPLE5', b'SAMPLE2', b'SAMPLE3', b'SAMPLE8', b'SAMPLE9', b'SAMPLE6', b'SAMPLE7', b'SAMPLE10']'
##########################################################
My command is: haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE1" -z HOM. I am very confused about this error and I look forward to your answer.

Hi @liangyiye!

What version of haplotype_plot are you using? I have successfully run the following cmd with version 1.1.1 (latest commit so far):

haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE1" -z HOM

Which results in:

DEBUG:haplotype_plot.reader:Converting VCF file 'haplotype_plot/tests/data/chr01.vcf'
DEBUG:haplotype_plot.reader:HDF5 file stored in '/home/swit/haplotype_plot/haplotype_plot/tests/data/chr01.h5'
DEBUG:haplotype_plot.genotyper:Loading HDF5 file '/home/swit/haplotype_plot/haplotype_plot/tests/data/chr01.h5'
DEBUG:haplotype_plot.genotyper:Retrieving genotypes via 'calldata/GT' key
DEBUG:haplotype_plot.genotyper:Retrieving variants via 'variants' key
DEBUG:haplotype_plot.filter:There are 10000 variants in chromosome chr01
DEBUG:haplotype_plot.filter:There are 10000 segregating SNPs
DEBUG:haplotype_plot.filter:Number of variants to keep 10000
DEBUG:haplotype_plot.filter:Found 534 phased variant calls
INFO:haplotype_plot.plot:Saving haplotype plot in .../haplotype_plot/haplotype_plot/tests/data/haplotypes.png

I have tested this in Windows 10 and Ubuntu 14.10 (both having the h5py lib version 2.10.0 and 2.7.1)

Hi again @liangyiye,

As I could not reproduce it I had a look to the failed travis build and it seemed that it was an issue of the version of PyVCF and h5py.
This issue should be fixed in v1.1.2, according to the latest travis build.
Version v1.1.2 is available via Github tags and pypi.

Hi @neobernad,
I'm glad to hear from you. I've changed Version v1.1.1 to Version v1.1.2, and the issue has been solved. But I still have some questions about the output file.
I used the following command in order to get the haplotype-plot from the first ten variants in positions chr01:1, chr01:2 ,..., chr01:10
haplotype_plot -v "haplotype_plot/tests/data/chr01.vcf" -c "chr01" -p "SAMPLE4" -z HET --conf start=0 end=10

which results in:
2021-02-02:14:31:21 DEBUG [reader.py:36] Converting VCF file 'chr01.vcf' 2021-02-02:14:31:21 DEBUG [reader.py:38] File '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' already exists. I will remove it. 2021-02-02:14:31:22 DEBUG [reader.py:41] HDF5 file stored in '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' 2021-02-02:14:31:22 DEBUG [genotyper.py:118] Loading HDF5 file '/mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/chr01.h5' 2021-02-02:14:31:22 DEBUG [genotyper.py:36] Retrieving genotypes via 'calldata/GT' key 2021-02-02:14:31:22 DEBUG [genotyper.py:38] Retrieving variants via 'variants' key' 2021-02-02:14:31:22 DEBUG [filter.py:61] There are 10000 variants in chromosome chr01 2021-02-02:14:31:22 DEBUG [filter.py:68] There are 10000 segregating SNPs 2021-02-02:14:31:22 DEBUG [filter.py:72] Number of variants to keep 10000 2021-02-02:14:31:22 DEBUG [filter.py:105] Found 534 phased variant calls 2021-02-02:14:31:22 INFO [plot.py:205] Saving haplotype plot in /mnt/Data_disk/liangyiye/Qvari7/007genomescan/haplotype_plot/haplotype_plot-1.1.2/haplotype_plot/tests/data/haplotypes.png

Finally, I got the haplotype-plot start with position chr01:4, and end with position chr01:134. This result seems to be different from the example result in your manual.
my plot-result:
haplotypes
I'm confused how can i plot variants from position chr01:1 to chr01:10?

Hi @liangyiye

By default haplotype_plot was performing the phase stage even though --phase was not provided in the arguments. This is why the output plot is showing the variants at position 4, 59, etc., because these are the phased ones.

I had uploaded a fix in v1.1.3. Phased variants (in beta phase) are only called if --phase argument is present.

Hi @neobernad
Thank you very much! By using V1.1.3, I have plot the haplotype structure for the positions I want.

Actually, I have another question. Can I obtain the haplotype plot for the following functions: 1) When the individual's genotype is homozygous and the same as the parent (like 0/0), it is displayed as one color; 2)When the individual's genotype is homozygous but opposite to the parent (like 1/1), it is displayed as the second color; 3)When the individual's genotype is heterozygous (like 0/1,or 1/0), it is displayed as the third color; 4)When the individual's genotype is missing (like .,or ./.), it is displayed as the fourth color. Now the functions I can achieve from your program are 1), 2), 4), and I want to know how to achieve the third function?

Hi @liangyiye,

Glad to hear it has worked! :-)

In order to show both alleles for heterozygous datasets you should use the -z HET argument. By doing that you will be able to see 0/1 and 1/0 genotypes, however, the color schema changes depending on the inheritance from the parent haplotype (check color schema).
Just for your into, this tool is based on scikit-allel library, just in case you are interested in further functionalities.

Best