key error for query with empty result
- UNFCCC DI API version: 2.0.1
- Python version: 3.8.10
- Operating System: Linux
Description
When using query (the single-category interface) and no data is available, a KeyError is raised instead of returning an empty dataframe or "None", or a message saying that there are no results.
What I Did
import unfccc_di_api
reader = unfccc_di_api.UNFCCCApiReader()
test = reader.non_annex_one_reader.query(party_codes=party_codes_nai, category_ids=[14817])
Hm, but that is what I would expect, no? KeyError means "no data for this key", and can be handled programmatically. None would be terribly wrong (the query function usually returns a dataframe, returning None will just lead to confusing downstream errors after df = query(…)), and returning a message is also not type-safe and will be confusing. The only other option I see would be an empty dataframe. But why? Usually, the user can't really do anything useful with an empty dataframe, and failing early with a KeyError ensures that the user doesn't waste their time trying any analysis on the empty results.
We could have our own class NoDataError inheriting from KeyError to make it easy to distinguish this error from other KeyErrors. This has the advantage that it is easy to catch this specific error, and the disadvantage that the meaning of unfccc_di_api.NoDataError is less immediately obvious to Python people than KeyError.
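A minimal sketch of that idea, not the library's actual implementation: `NoDataError` is the hypothetical class proposed above, and `query` here is a stand-in that always behaves as if the result were empty. The party code "AFG" is only an illustrative value.

```python
class NoDataError(KeyError):
    """Hypothetical error for a query that returns no data."""

    def __init__(self, party_codes, category_ids):
        # Keep the query parameters so callers can inspect what was empty.
        self.party_codes = list(party_codes)
        self.category_ids = list(category_ids)
        super().__init__(
            f"no data for parties {self.party_codes} "
            f"and categories {self.category_ids}"
        )


def query(party_codes, category_ids):
    """Stand-in for the real query; always acts as if the result is empty."""
    raise NoDataError(party_codes, category_ids)


try:
    query(party_codes=["AFG"], category_ids=[14817])
except KeyError as err:  # existing `except KeyError` handlers keep working
    caught = err
```

Because the class inherits from KeyError, code that already catches KeyError is unaffected, while new code can catch `NoDataError` specifically.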
Compare what we do with what pandas does:
In [1]: import pandas as pd
In [3]: df = pd.DataFrame([{"a": 2, "b": 3}, {"a": 4, "b": 12}], index=["first", "second"])
In [4]: df
Out[4]:
        a   b
first   2   3
second  4  12
In [5]: df.loc["third"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'third'
My problems with the current KeyError are:
1.) If you run several queries in a row, you need to catch it or your code fails. That is of course doable. You might not want to delete the query from your list, because you don't want to check manually before every run whether any of the formerly empty categories now have data.
2.) The error is raised for the key "party", so it is not really obvious that the query result is empty, I think. As said in the original issue text, I would be happy with an error that says "no data".
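The batch workflow from point 1 could look roughly like the sketch below. `fake_query` is a hypothetical stand-in for `reader.non_annex_one_reader.query`, and the category ids and rows are made up for illustration. Categories without data stay in the list and are simply recorded as empty on each run.

```python
# Made-up data: only category 1 has rows.
AVAILABLE = {1: [{"party": "AFG", "value": 1.0}]}


def fake_query(category_ids):
    """Hypothetical stand-in for the real query; raises KeyError when a category has no data."""
    rows = []
    for category_id in category_ids:
        rows.extend(AVAILABLE[category_id])  # KeyError for unknown categories
    return rows


results = {}
empty = []
for category_id in [1, 2, 3]:
    try:
        results[category_id] = fake_query([category_id])
    except KeyError:
        # Remember the empty category instead of aborting the whole run.
        empty.append(category_id)
```

Note that catching a plain KeyError also swallows unrelated KeyErrors, which is one argument for a dedicated error class.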
True, KeyError for party looks like the party doesn't exist, that's bad. So a good solution would be the unfccc_di_api.EmptyQueryResultError, I guess? Or would you prefer an empty dataframe?
I think an error is best, as empty dataframes can also pose problems when working with them, so the error would just occur a bit later.