Incompatibility with Pandas 1.1.0
lizziel commented
I ran into an error reading a GEOS-Chem bpch file after upgrading Pandas to 1.1.0. I traced the problem to this section of code in util/diaginfo.py:
tracer_df = (
    tracer_df
    .apply(_assign_hydrocarbon, axis=1)
    .assign(chemical=lambda x: x['molwt'].astype(bool))
)
Before that code is executed, tracer_df correctly stores the tracerinfo.dat content:
name full_name molwt C tracer scale \
0 ACET ACET tracer 0.01200 3 1 1.000000e+09
1 ACTA ACTA tracer 0.06006 1 2 1.000000e+09
2 AERI AERI tracer 0.12690 1 3 1.000000e+09
3 ALD2 ALD2 tracer 0.01200 2 4 1.000000e+09
After the apply, every row is a copy of the ACET row, which is wrong:
name full_name molwt C tracer scale unit hydrocarbon \
0 ACET ACET tracer 0.012 3 1 1.000000e+09 ppbC False
1 ACET ACET tracer 0.012 3 1 1.000000e+09 ppbC False
2 ACET ACET tracer 0.012 3 1 1.000000e+09 ppbC False
3 ACET ACET tracer 0.012 3 1 1.000000e+09 ppbC False
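One way to catch this symptom early is to compare a key column before and after the row-wise apply. The sketch below is illustrative only: the small frames and the helper `rows_preserved` are hypothetical stand-ins and are not part of xbpch.

```python
import pandas as pd


def rows_preserved(before, after, key="name"):
    """Return True if `after` holds the same `key` values, in order, as `before`."""
    return list(before[key]) == list(after[key])


# Hypothetical stand-in for the parsed tracerinfo.dat content.
original = pd.DataFrame({
    "name": ["ACET", "ACTA", "AERI", "ALD2"],
    "C": [3, 1, 1, 2],
})

# Mimics the frame seen after the faulty apply: every row is ACET.
corrupted = pd.DataFrame({
    "name": ["ACET"] * 4,
    "C": [3] * 4,
})

print(rows_preserved(original, original.copy()))  # True
print(rows_preserved(original, corrupted))        # False
```

A check like this could be run right after the apply so that the failure surfaces there rather than later as KeyError: 'name'.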
I was able to fix it by initializing the new column 'hydrocarbon' prior to the apply:
tracer_df['hydrocarbon'] = False
tracer_df = (
    tracer_df
    .apply(_assign_hydrocarbon, axis=1)
    .assign(chemical=lambda x: x['molwt'].astype(bool))
)
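Another possible way to sidestep the problem is to avoid the row-wise apply altogether and compute the new column from whole columns. This is only a sketch under stated assumptions: the small frame is a hypothetical stand-in for tracerinfo.dat, and the rule used to set 'hydrocarbon' (C greater than 1) is a placeholder, not necessarily the logic inside _assign_hydrocarbon.

```python
import pandas as pd

# Hypothetical stand-in for the parsed tracerinfo.dat content.
tracer_df = pd.DataFrame({
    "name": ["ACET", "ACTA", "AERI", "ALD2"],
    "molwt": [0.012, 0.06006, 0.1269, 0.012],
    "C": [3, 1, 1, 2],
})

# Placeholder rule (an assumption, not the xbpch logic): flag tracers
# that carry more than one carbon atom.
is_hydrocarbon = tracer_df["C"] > 1

# Column-wise assignment never touches individual row objects, so it
# does not depend on how DataFrame.apply(axis=1) handles mutated rows.
tracer_df = tracer_df.assign(
    hydrocarbon=is_hydrocarbon,
    chemical=lambda x: x["molwt"].astype(bool),
)

print(tracer_df)
```

Because every row keeps its own 'name' value here, the later diag['name'] lookup would have a 'name' column to find.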
I downgraded pandas to 0.25.1 and verified that this initialization is not needed in that older version; it is only needed in 1.1.0.
Here is the error message I got, included to help others find this issue via search:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-43-2d7bd5a6928f> in <module>
2 ds = xb.open_bpchdataset(filename=gcc_bpch,
3 tracerinfo_file=tracerinfo_f,
----> 4 diaginfo_file=diaginfo_f)
5 except FileNotFoundError:
6 print('Could not find file {}'.format(bpchfile))
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
79 tracerinfo_file=tracerinfo_file,
80 diaginfo_file=diaginfo_file, endian=endian,
---> 81 use_mmap=memmap, dask_delayed=dask
82 )
83 ds = xr.Dataset.load_store(store)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
278
279 # Parse the binary file and prepare to add variables to the DataStore
--> 280 self._bpch._read_var_data()
281
282 # Create storage dicts for variables and attributes, to be used later
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
312 var_attr['unit'] = unit
313
--> 314 vname = diag['name']
315 fullname = category_name.strip() + "_" + vname
316
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2897 if self.columns.nlevels > 1:
2898 return self._getitem_multilevel(key)
-> 2899 indexer = self.columns.get_loc(key)
2900 if is_integer(indexer):
2901 indexer = [indexer]
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:
KeyError: 'name'