primap-community/unfccc_di_api

key error for query with empty result

Closed this issue · 6 comments

  • UNFCCC DI API version: 2.0.1
  • Python version: 3.8.10
  • Operating System: Linux

Description

When using query (single category interface) and no data is available, a KeyError is raised instead of returning an empty dataframe, "None", or a message saying that there are no results.

What I Did

import unfccc_di_api

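reader = unfccc_di_api.UNFCCCApiReader()
# party_codes_nai: list of non-Annex-I party codes, defined earlier (not shown in this report)
# For category 14817 no data is available, so this call raises a KeyError
test = reader.non_annex_one_reader.query(party_codes=party_codes_nai, category_ids=[14817])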

Hm, but isn't that what I would expect? A KeyError means "no data for this key" and can be handled programmatically. None would be terribly wrong: the query function normally returns a dataframe, so returning None would just lead to confusing downstream errors after df = query(…). Returning a message is also not type-safe and would be confusing. The only other option I see would be an empty dataframe. But why? Usually the user can't do anything useful with an empty dataframe, and failing early with a KeyError ensures that the user doesn't waste time trying to analyze empty results.

We could define our own class NoDataError inheriting from KeyError to make it easy to distinguish this error from other KeyErrors. The advantage is that this specific error is easy to catch; the disadvantage is that the meaning of unfccc_di_api.NoDataError is less immediately obvious to Python users than KeyError.
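A minimal sketch of the NoDataError idea, assuming the check happens on the dataframe a query produced. NoDataError and check_not_empty are hypothetical names for illustration, not part of the current unfccc_di_api API. Because the class subclasses KeyError, existing "except KeyError" handlers still catch it, while new code can catch NoDataError alone.

```python
import pandas as pd


class NoDataError(KeyError):
    """Raised when a query matches no data (hypothetical, not in unfccc_di_api)."""

    def __str__(self) -> str:
        # KeyError.__str__ would wrap the message in quotes; show it plainly instead
        return str(self.args[0]) if self.args else ""


def check_not_empty(df: pd.DataFrame, description: str) -> pd.DataFrame:
    """Return df unchanged, or raise NoDataError if it has no rows (hypothetical helper)."""
    if df.empty:
        raise NoDataError(f"no data available for {description}")
    return df


try:
    check_not_empty(pd.DataFrame(), "category_ids=[14817]")
except KeyError as err:  # a NoDataError is also a KeyError
    print(type(err).__name__, err)  # prints: NoDataError no data available for category_ids=[14817]
```

The __str__ override is a design choice: without it, a KeyError subclass renders its message with surrounding quotes, which looks odd for a sentence-like message.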

Compare what we do with what pandas does:

In [1]: import pandas as pd

In [3]: df = pd.DataFrame([{"a": 2, "b": 3}, {"a": 4, "b": 12}], index=["first", "second"])

In [4]: df
Out[4]: 
        a   b
first   2   3
second  4  12

In [5]: df.loc["third"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'third'
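Both behaviors under discussion already coexist in pandas. A small sketch, reusing the example dataframe from the transcript: a label lookup for a missing key raises KeyError, while a boolean filter that matches nothing silently returns an empty dataframe.

```python
import pandas as pd

df = pd.DataFrame([{"a": 2, "b": 3}, {"a": 4, "b": 12}], index=["first", "second"])

# Label-based lookup of a missing label fails early with a KeyError ...
try:
    df.loc["third"]
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'third'

# ... while a boolean filter that matches nothing returns an empty dataframe
subset = df[df.index == "third"]
print(subset.empty, subset.shape)  # prints: True (0, 2)
```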

My problems with the current KeyError are:
1.) If you run several queries in a row, you need to catch it, or else your code fails. That is of course doable. But you might not want to remove the query from your list, because you don't want to check manually before every run whether any of the formerly empty categories now have data.
2.) The error is raised for the key "party", so it's not really obvious that your query result is empty, I think. As said in the original issue text, I would be happy with an error that says "no data".
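Point 1.) can be sketched as a loop that keeps every category in the list and just records which ones came back empty. This assumes a dedicated empty-result exception along the lines of the NoDataError idea in this thread; fake_query is a hypothetical stand-in for reader.non_annex_one_reader.query, and the second category id and the values are made up.

```python
import pandas as pd


class NoDataError(KeyError):
    """Hypothetical 'empty query result' error, as discussed in this issue."""


def fake_query(category_id: int) -> pd.DataFrame:
    """Stand-in for the real query; pretends only category 10464 has data."""
    if category_id != 10464:
        raise NoDataError(f"no data for category {category_id}")
    return pd.DataFrame({"category_id": [category_id], "value": [1.0]})


category_ids = [14817, 10464]  # keep all ids; empty ones may get data in a later run
results = {}
empty = []
for cid in category_ids:
    try:
        results[cid] = fake_query(cid)
    except NoDataError:
        empty.append(cid)  # record and continue instead of aborting the whole run

print(sorted(results), empty)  # prints: [10464] [14817]
```

Catching the specific NoDataError rather than every KeyError means a genuine problem, such as an unknown party code, still stops the run instead of being silently recorded as "empty".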

True, a KeyError for "party" looks like the party doesn't exist; that's bad. So a good solution would be a dedicated unfccc_di_api.EmptyQueryResultError, I guess? Or would you prefer an empty dataframe?

I think an error is best, as empty dataframes can also cause problems when working with them, so the error would just occur a bit later.
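A short sketch of how an empty dataframe defers the failure, assuming a query had returned an empty result with the usual columns (the column names here are made up): aggregations quietly produce NaN, and the first positional access fails with an IndexError that no longer mentions the query at all.

```python
import pandas as pd

# Stand-in for an empty query result (made-up column names)
df = pd.DataFrame(columns=["party", "year", "value"])

# Aggregations do not fail; they quietly produce NaN
print(df["value"].astype(float).mean())  # prints: nan

# The first positional access fails, far away from the query that returned nothing
try:
    first = df.iloc[0]
except IndexError as err:
    print("IndexError:", err)
```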