darothen/xbpch

Incompatibility with Pandas 1.1.0


I ran into an error reading a GEOS-Chem bpch file after upgrading Pandas to 1.1.0. I traced the problem to this section of code in util/diaginfo.py:

    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

Before that code is executed, tracer_df correctly holds the contents of tracerinfo.dat:

       name                       full_name    molwt  C  tracer         scale  \
0      ACET                     ACET tracer  0.01200  3       1  1.000000e+09   
1      ACTA                     ACTA tracer  0.06006  1       2  1.000000e+09   
2      AERI                     AERI tracer  0.12690  1       3  1.000000e+09   
3      ALD2                     ALD2 tracer  0.01200  2       4  1.000000e+09  

After the apply, every row is a copy of the ACET row, which is wrong:

      name    full_name  molwt  C  tracer         scale  unit  hydrocarbon  \
0     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
1     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
2     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
3     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   

I was able to fix it by initializing the new column 'hydrocarbon' prior to the apply:

    tracer_df['hydrocarbon'] = False
    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

I downgraded pandas to 0.25.1 and verified that this initialization is not needed in that older version; it is only required with 1.1.0.
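
A possible alternative that does not depend on how `apply(axis=1)` treats rows that gain a new key is to build both columns with `assign`. This is only a minimal sketch on a toy frame copied from the printout above. The helper name `add_flags` and the rule "hydrocarbon means C > 1" are my assumptions, not the actual logic of `_assign_hydrocarbon`, which may also adjust other columns such as `molwt`.

```python
import pandas as pd

# Toy stand-in for the parsed tracerinfo.dat table (values taken from the printout above).
tracer_df = pd.DataFrame({
    "name": ["ACET", "ACTA", "AERI", "ALD2"],
    "full_name": ["ACET tracer", "ACTA tracer", "AERI tracer", "ALD2 tracer"],
    "molwt": [0.01200, 0.06006, 0.12690, 0.01200],
    "C": [3, 1, 1, 2],
})


def add_flags(df):
    """Hypothetical helper: add both flag columns without a row-wise apply."""
    return df.assign(
        # ASSUMPTION: a tracer counts as a hydrocarbon when it has more than
        # one carbon. The real _assign_hydrocarbon rule may differ.
        hydrocarbon=lambda x: x["C"] > 1,
        # Same expression as in the original snippet.
        chemical=lambda x: x["molwt"].astype(bool),
    )


result = add_flags(tracer_df)
print(result[["name", "C", "hydrocarbon", "chemical"]])
```

Because `assign` works on whole columns, the existing rows are never rebuilt, so the `name` column cannot be overwritten the way it is in the broken output.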

Here is the full error message, included so that others can find this issue via search:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'name'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
<ipython-input-43-2d7bd5a6928f> in <module>
      2     ds = xb.open_bpchdataset(filename=gcc_bpch,
      3                              tracerinfo_file=tracerinfo_f,
----> 4                              diaginfo_file=diaginfo_f)
      5 except FileNotFoundError:
      6     print('Could not find file {}'.format(bpchfile))
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
     79         tracerinfo_file=tracerinfo_file,
     80         diaginfo_file=diaginfo_file, endian=endian,
---> 81         use_mmap=memmap, dask_delayed=dask
     82     )
     83     ds = xr.Dataset.load_store(store)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
    278 
    279         # Parse the binary file and prepare to add variables to the DataStore
--> 280         self._bpch._read_var_data()
    281 
    282         # Create storage dicts for variables and attributes, to be used later
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
    312             var_attr['unit'] = unit
    313 
--> 314             vname = diag['name']
    315             fullname = category_name.strip() + "_" + vname
    316 
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:
KeyError: 'name'