AlexTISYoung/snipar

KeyError: 'InfType' in impute.py

AnnabelPerry opened this issue · 3 comments

Hello, I am attempting to run impute.py in a conda environment with Python version 3.9.16, pandas version 1.1.4. I am encountering the following error:

2023-06-27 14:22:47,023 INFO impute - main: creating pedigree ...
2023-06-27 14:22:47,106 INFO preprocess_data - create_pedigree: loaded kinship file
Traceback (most recent call last):
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'InfType'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 432, in <module>
    main(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 239, in main
    pedigree = create_pedigree(args.king, args.agesex)
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/snipar/imputation/preprocess_data.py", line 93, in create_pedigree
    mz_kin = kinship.loc[kinship['InfType']=='Dup/MZ']
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'InfType'

Here is the code I ran:

source activate sniparEnv
unset PYTHONPATH

impute.py -c --ibd IBD_Chr@ --bgen chr@ --out Imputed_Chr@ --king FirstDegreeKING_forImputation.kin0 --agesex FirstDegreeAgeSex_forImputation.txt

Here is a description of my inputs:
IBD_Chr@: IBD information generated using the ibd.py command-line script.

chr@: phased chromosomal information.

FirstDegreeKING_forImputation.kin0
This file is derived from a KING output file.
This file was generated by a collaborator and only had values for the "ID1", "ID2", "HetHet", "IBS0", "Kinship", and "InfType" columns, so I restricted the file to just the following columns requested in the snipar documentation for the impute.py --king flag: FID1 ID1 FID2 ID2 InfType.
Both the FID1 and FID2 columns are filled with NAs
I also restricted the file to only first degree relatives (Kinship >= 0.177) to save space.
When I got the error the first time, I double-checked that the file was separated by single spaces. The error persisted. Since the original InfType column contained only 'PO' and 'FS', I replaced the InfType values of all individuals with Kinship>0.354 with 'Dup/MZ' and re-ran the code. The error is still persisting.

FirstDegreeAgeSex_forImputation.txt
Describes the age and sex of all individuals in FirstDegreeKING_forImputation.kin0. The columns are “FID”, “IID”, “FATHER_ID”, “MOTHER_ID”, “sex”, “age”. The "FID" column contains all NAs, while the FATHER_ID column is NA unless the individual in the IID column has a PO relationship with a male who is at least 12 years older. Likewise, the MOTHER_ID column is NA unless the individual in the IID column has a PO relationship with a female who is at least 12 years older.

Thanks! Switching the files to tab separation worked.
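
For anyone hitting the same KeyError, here is a minimal sketch of why the delimiter matters and how the file could be converted. It assumes (inferred from the fix above, not checked against the snipar source) that the --king file is parsed as tab-separated. The toy rows and IDs are made up; in practice you would read and write real file paths instead of in-memory strings.

```python
import io

import pandas as pd

# Toy stand-in for a space-separated .kin0 file (IDs are made up).
space_separated = (
    "FID1 ID1 FID2 ID2 InfType\n"
    "NA A1 NA A2 FS\n"
    "NA B1 NA B2 PO\n"
)

# Parsed with a tab delimiter, the whole header becomes one column name,
# so a lookup of 'InfType' would raise KeyError.
as_tab = pd.read_csv(io.StringIO(space_separated), sep="\t", dtype=str)
header_is_merged = "InfType" not in as_tab.columns

# Parsing on any run of whitespace recovers the five columns.
# keep_default_na=False keeps the literal string "NA" instead of NaN.
kin = pd.read_csv(
    io.StringIO(space_separated), sep=r"\s+", dtype=str, keep_default_na=False
)

# Re-write as tab-separated text and confirm it now parses as expected.
tab_text = kin.to_csv(sep="\t", index=False)
fixed = pd.read_csv(
    io.StringIO(tab_text), sep="\t", dtype=str, keep_default_na=False
)
print(list(fixed.columns))
```

The same conversion would apply to the --agesex file if it is also space-separated.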