mgalardini/pyseer

SNPs GWAS

AbubakariAbdulwasid opened this issue · 6 comments

I am running a GWAS on SNP variants with pyseer, using "pyseer --phenotypes snps.pheno --vcf snps.vcf.gz --lineage --print-samples --no-distances > SNPs.txt", and got this output:
Read 98 phenotypes
Detected binary phenotype
Writing lineage effects to lineage_effects.txt
Traceback (most recent call last):
  File "/home/public/PublicSoftware/pyseer", line 10, in <module>
    sys.exit(main())
  File "/home/public/anaconda3/lib/python3.8/site-packages/pyseer/main.py", line 465, in main
    infile, sample_order = open_variant_file(var_type, var_file, options.burden, burden_regions, options.uncompressed)
  File "/home/public/anaconda3/lib/python3.8/site-packages/pyseer/input.py", line 278, in open_variant_file
    infile = VariantFile(var_file)
  File "pysam/libcbcf.pyx", line 4036, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4299, in pysam.libcbcf.VariantFile.open
  File "pysam/libchtslib.pyx", line 516, in pysam.libchtslib.HTSFile.tell
NotImplementedError: seek not implemented in files compressed by method 1

Is there a way I can resolve it?

I think this error might be due to the way your vcf file was compressed/indexed. What happens if you decompress it and try again?
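
For anyone who wants to try the decompression step from Python rather than the shell, here is a minimal sketch using only the standard library. The helper name `decompress_vcf` and the file names are illustrative and not part of pyseer. It assumes the input is a regular gzip stream, which is what the `gzip` command produces.

```python
import gzip
import shutil
from pathlib import Path


def decompress_vcf(src, dst=None):
    """Decompress a gzip-compressed VCF to plain text and return the output path.

    Hypothetical helper for illustration; not part of pyseer.
    """
    src = Path(src)
    # By default, drop the trailing ".gz" (snps.vcf.gz -> snps.vcf).
    dst = Path(dst) if dst is not None else src.with_suffix("")
    # Stream the data so large VCF files are not loaded into memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `decompress_vcf("snps.vcf.gz")` would write `snps.vcf` next to the input, which can then be passed to `--vcf`.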

I used Fastree to generate the SNP VCF and compressed it with the gzip command.

When I use the decompressed file, this is what happens:

Read 98 phenotypes
Detected binary phenotype
Writing lineage effects to lineage_effects.txt
Traceback (most recent call last):
  File "/home/public/PublicSoftware/pyseer", line 10, in <module>
    sys.exit(main())
  File "/home/public/anaconda3/lib/python3.8/site-packages/pyseer/main.py", line 793, in main
    ret = fixed_effects_regression(*data)
  File "/home/public/anaconda3/lib/python3.8/site-packages/pyseer/model.py", line 359, in fixed_effects_regression
    max_lineage = fit_lineage_effect(lin, c, k)
  File "/home/public/anaconda3/lib/python3.8/site-packages/pyseer/model.py", line 175, in fit_lineage_effect
    lineage_mod = smf.Logit(k, X)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/discrete/discrete_model.py", line 462, in __init__
    super().__init__(endog, exog, check_rank, **kwargs)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/discrete/discrete_model.py", line 178, in __init__
    super().__init__(endog, exog, **kwargs)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/model.py", line 267, in __init__
    super().__init__(endog, exog, **kwargs)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/model.py", line 92, in __init__
    self.data = self._handle_data(endog, exog, missing, hasconst,
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/model.py", line 132, in _handle_data
    data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/data.py", line 673, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/data.py", line 86, in __init__
    self._handle_constant(hasconst)
  File "/home/public/anaconda3/lib/python3.8/site-packages/statsmodels/base/data.py", line 130, in _handle_constant
    exog_max = np.max(self.exog, axis=0)
  File "<array_function internals>", line 200, in amax
  File "/home/public/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2820, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/public/anaconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

Can you provide the commands you are using? To test what is going on with your VCF file, you can probably omit the --lineage argument for now.

Perhaps also try compressing the VCF with bgzip (part of htslib, which is distributed alongside samtools) rather than gzip.
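
To check which of the two formats a given .gz file is in, the first few bytes are usually enough. The sketch below is a hypothetical helper (the name `is_bgzf` is not from pyseer or pysam) that looks for the BGZF block header that bgzip writes: the gzip magic bytes with the FEXTRA flag set, followed by a "BC" extra subfield of length 2. It only inspects the first block and assumes "BC" is the first extra subfield. A file made with plain gzip would be expected to fail this check, which would be consistent with the "seek not implemented" error above.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header (as written by bgzip).

    Hypothetical helper for illustration; checks the first block only.
    """
    with open(path, "rb") as fh:
        header = fh.read(18)
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (04).
    if header[:4] != b"\x1f\x8b\x08\x04":
        return False
    # Extra subfield: identifier 'B','C' followed by a 2-byte length equal to 2.
    subfield_len = struct.unpack("<H", header[14:16])[0]
    return header[12:14] == b"BC" and subfield_len == 2
```

If `is_bgzf("snps.vcf.gz")` returns False for a file that should be indexed or seekable, recompressing the decompressed VCF with bgzip is the thing to try.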

It worked with "pyseer --phenotypes input.pheno --vcf snps.vcf --print-samples --no-distances > output_SNPs.txt".