Handle error message when saving invalid genotype files to VCF?
Closed this issue · 2 comments
PhilPalmer commented
I think the error raised when saving an invalid genotype file to VCF could be handled more gracefully.
To reproduce
For example, if I run:
# create invalid empty file
touch file.txt
python
>>> from snps import SNPs
>>> s = SNPs('file.txt', output_dir='.')
# error message is handled nicely when vcf is false
>>> saved_snps = s.save_snps('test.txt', vcf=False)
no data to save...
# error message is not handled when vcf is true
>>> saved_snps = s.save_snps('test.vcf', vcf=True)
Then I get the following error message:
Traceback (most recent call last):
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'genotype'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/__init__.py", line 394, in save_snps
    snps=self, filename=filename, vcf=vcf, atomic=atomic, **kwargs
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 951, in write_file
    return w()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 924, in __call__
    return self._write_vcf()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 1060, in _write_vcf
    df["genotype"].notnull()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'genotype'
Any ideas how this could be improved? Perhaps catching the error here?
Thanks in advance
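To make the suggestion above concrete, here is a rough sketch of a write-time guard. It is not the snps implementation: `missing_columns` and `save_vcf_guarded` are hypothetical names, `write_vcf` stands in for whatever callable actually writes the file, and returning an empty string on failure is an assumption. The table only needs a `columns` attribute and `len()`, which a pandas DataFrame provides.

```python
def missing_columns(table, required=("genotype",)):
    """Return the required column names that `table` lacks.

    `table` is anything exposing a `columns` iterable (e.g. a pandas DataFrame).
    """
    present = set(getattr(table, "columns", ()))
    return [name for name in required if name not in present]


def save_vcf_guarded(table, write_vcf):
    """Call `write_vcf(table)` only if there is genotype data to write.

    Hypothetical wrapper: mirrors the message printed when vcf=False
    instead of letting a KeyError escape.
    """
    if missing_columns(table) or len(table) == 0:
        print("no data to save...")
        return ""  # assumed failure value, not confirmed by the library
    return write_vcf(table)
```

With a table that has no `genotype` column, the guard would print the same "no data to save..." message shown for `vcf=False` rather than raising `KeyError: 'genotype'`.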
willgdjones commented
Shouldn't this be handled when the file is read into snps? If the file is empty, or doesn't conform to any known format, then we should perhaps flag this to the user as soon as they try to load the data.
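As an illustration of flagging the problem at load time, a minimal pre-check might look like the sketch below. `check_genotype_file` is a hypothetical helper, not part of snps, and it assumes an uncompressed text file where lines starting with `#` are comments. Detecting whether the contents match a known format is left out.

```python
import os
import tempfile


def check_genotype_file(path):
    """Raise ValueError if `path` is empty or has no non-comment lines."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; no SNPs to load")
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return  # found at least one candidate data line
    raise ValueError(f"{path} contains no data lines")


# Reproduce the empty-file case from the report in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "file.txt")
    open(empty, "w").close()  # equivalent to `touch file.txt`
    try:
        check_genotype_file(empty)
    except ValueError as err:
        print(err)
```

Failing early like this would let the user see the problem when the file is loaded, before any attempt to save.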
PhilPalmer commented
Yeah, good point @willgdjones. I agree it would be better if this were handled when reading the file into snps.
@apriha, awesome, thanks for implementing this so fast!