apriha/snps

Handle error message when saving invalid genotype files to VCF?


I think the error raised when trying to save an invalid genotype file to VCF could be handled more gracefully.

To reproduce

For example, if I run:

# create invalid empty file
touch file.txt
python
>>> from snps import SNPs
>>> s = SNPs('file.txt', output_dir='.')
# error message is handled nicely when vcf is false
>>> saved_snps = s.save_snps('test.txt', vcf=False)
no data to save...
# error message is not handled when vcf is true
>>> saved_snps = s.save_snps('test.vcf', vcf=True)

Then I get the following error message:

Traceback (most recent call last):
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'genotype'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/__init__.py", line 394, in save_snps
    snps=self, filename=filename, vcf=vcf, atomic=atomic, **kwargs
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 951, in write_file
    return w()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 924, in __call__
    return self._write_vcf()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/snps/io.py", line 1060, in _write_vcf
    df["genotype"].notnull()
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/envs/common-latest-geno/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'genotype'

Any ideas how this could be improved? Perhaps catching the error here?

Thanks in advance
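
For illustration, here is a minimal sketch of the kind of guard I have in mind. It checks for the `genotype` column before the writer indexes it, instead of catching the `KeyError` afterwards. The names `can_write_vcf` and `write_vcf_guarded` are hypothetical and not part of snps, and returning an empty string when there is nothing to save is an assumption on my part.

```python
def can_write_vcf(df, required=("genotype",)):
    """Return True if df exposes every column the VCF writer reads.

    Only relies on a `.columns` attribute that supports `in`, which a
    pandas DataFrame provides. Hypothetical helper, not part of snps.
    """
    columns = getattr(df, "columns", ())
    return all(name in columns for name in required)


def write_vcf_guarded(df, write_vcf):
    """Call write_vcf(df) only when the required columns are present.

    `write_vcf` stands in for whatever function actually writes the file.
    Returning "" when nothing is saved is an assumption.
    """
    if not can_write_vcf(df):
        # Same message the non-VCF path already prints for empty data.
        print("no data to save...")
        return ""
    return write_vcf(df)
```

With something like this in place, the `vcf=True` case would print the same message as the `vcf=False` case rather than raising.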

Shouldn't this be handled when the file is read into snps? If the file is empty, or doesn't conform to any known format, then we should perhaps flag this to the user as soon as they try to load the data.
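
As a rough sketch of what an early check could look like (not how snps actually reads files): the hypothetical helper `looks_loadable` below rejects a file that is missing, empty, or contains only blank and `#` comment lines. It assumes an uncompressed text file, so compressed inputs would need separate handling.

```python
import os


def looks_loadable(path):
    """Return True if path is a file with at least one data line.

    A data line here means any line that is neither blank nor a
    '#' comment. Hypothetical helper; assumes plain, uncompressed text.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return True
    return False
```

A loader could call this first and warn the user immediately, so that later calls such as `save_snps` never see an empty data set.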

Yeah, good point @willgdjones. I agree it would be better to handle this when reading the file into snps.

@apriha, awesome, thanks for implementing this so fast!