kaizen-ai/kaizenflow

Creating a helper function `dassert_no_duplicates_dict_keys` in hpandas

smitpatel49 opened this issue · 6 comments

Follow up on #1075

We want to add a new helper function `dassert_no_duplicates_dict_keys` here in `hpandas`.

We also need to add unit tests for the new function. Apart from that, we want to remove the `hdbg.dassert_no_duplicates` calls in the codebase that take dictionary keys as input.
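For reference, a minimal sketch of what such a helper might look like; the actual signature and placement in `hpandas` may differ, and the bare `assert` here only stands in for the usual dassert-style check:

from typing import Any, Dict

def dassert_no_duplicates_dict_keys(dict_: Dict[Any, Any]) -> None:
    """
    Assert that the keys of `dict_` contain no duplicates.

    Hypothetical sketch; the real helper would likely build on
    `hdbg.dassert_no_duplicates` rather than a bare `assert`.
    """
    keys = list(dict_.keys())
    # Note: for a dict built from a literal this can never fail, since
    # Python deduplicates keys at construction time (see the discussion below).
    assert len(keys) == len(set(keys)), f"Found duplicate keys in {keys}"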

FYI @samarth9008

I just wanted some clarity on creating this function. Are we trying to stop the key entry from being overwritten if it already exists, or do we want to keep the latest entry, as Python does by default? @gpsaggese, @samarth9008

We want to check and assert if there are duplicate entries.

For dictionary keys that check can be done using dassert_no_duplicates, but it will always pass because of how Python works: duplicate keys are silently dropped when the dictionary is constructed.
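A tiny illustration of why such a check is trivially true (plain Python here, standing in for the dassert call):

dict_ = {"a": 1, "b": 2, "a": 3}
keys = list(dict_.keys())
# The duplicate "a" was already dropped at construction time, so a
# duplicate check on the keys can never fail.
assert len(keys) == len(set(keys))
print(dict_)  # {'a': 3, 'b': 2}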

Could there be any other way to check for duplicate keys? Like a hack or something out of the box.

I don't think there is a way to check for a duplicate key entry if the duplicate is introduced in the dictionary itself, i.e. while defining it. For example:

dict_ = {
    "dummy_value_1": "1, 2",
    "dummy_value_2": "A, B",
    "dummy_value_1": "4, 5",
}

the duplicate key will simply be overwritten and we will get:

dict_ = {
    "dummy_value_1": "4, 5",
    "dummy_value_2": "A, B",
}

If we want to update the dictionary, we can add a condition that only allows a new key entry that does not already exist. Alternatively, if we are building the dictionary from something like a list that has multiple entries for the same key and we want to append/update rather than overwrite, we can use something like defaultdict.
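A small sketch of both ideas, using hypothetical key/value pairs collected before the dict is built:

import collections

pairs = [
    ("dummy_value_1", "1, 2"),
    ("dummy_value_2", "A, B"),
    ("dummy_value_1", "4, 5"),
]

# Append/update instead of overwriting, using defaultdict.
dict_ = collections.defaultdict(list)
for key, value in pairs:
    dict_[key].append(value)
# -> {"dummy_value_1": ["1, 2", "4, 5"], "dummy_value_2": ["A, B"]}

# Alternatively, only accept a key entry that does not already exist,
# which detects the duplicate before it can be silently overwritten.
checked = {}
for key, value in pairs:
    if key in checked:
        # Raises on the third pair: "Duplicate key: dummy_value_1".
        raise AssertionError(f"Duplicate key: {key}")
    checked[key] = value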

Any word on this one, @samarth9008?