Suor/funcy

`join_with` - transform single elements too

Closed this issue · 10 comments

Hi! funcy is a great tool. I especially like join_with because it quickly and easily merges many dicts into one, for example in a defaultdict(list) style. However, I noticed that if a list of dicts contains only one element, join_with doesn't change its structure. It would be very useful to transform single dicts too:

list_of_dicts = [{1: 2, 3: 4, "foo": "bar"}]

# current implementation
join_with(list, list_of_dicts) -> {1: 2, 3: 4, "foo": "bar"}

# proposed implementation
join_with(list, list_of_dicts) -> {1: [2], 3: [4], "foo": ["bar"]}
Suor commented

Hmm, it is a bit weird that it works that way. But changing it would be a backwards-incompatible change.

Suor commented

Not a big fan of boolean flags )

Current behavior kind of makes sense, but it breaks types, so I see how it is inconvenient. Things are also complicated by the fact that join_with()/merge_with() are two-phase, i.e. they are a composition of grouping values and processing the resulting lists. funcy also has zip_values()/zip_dicts(), which also group dict values but keep only the intersection of the keys. Another option is to use reduce on the resulting lists instead of a summarizing function. Many options.
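To make the two-phase idea concrete, here is a minimal pure-Python sketch. It is not funcy's actual implementation, and the helper names `group_values`, `group_common` and `summarize` are hypothetical. The first helper collects values for every key, the second keeps only keys present in all dicts (roughly what zipping dict values would leave), and the third applies a summarizing function to the grouped lists.

```python
from collections import defaultdict


def group_values(dicts):
    """Phase 1: collect the values of every key into a list (union of keys)."""
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    return dict(grouped)


def group_common(dicts):
    """Variant: keep only the keys that appear in every dict (intersection)."""
    dicts = list(dicts)
    if not dicts:
        return {}
    common = set(dicts[0]).intersection(*dicts[1:])
    return {key: [d[key] for d in dicts] for key in dicts[0] if key in common}


def summarize(f, grouped):
    """Phase 2: apply a summarizing function to each list of values."""
    return {key: f(values) for key, values in grouped.items()}


dicts = [{"a": 1, "b": 2}, {"a": 10, "c": 30}]
print(summarize(sum, group_values(dicts)))  # union of keys
print(summarize(sum, group_common(dicts)))  # only keys shared by all dicts
```

Seen this way, a single input dict is just the case where every grouped list has one element, so applying `list` in phase 2 would wrap each value rather than return it bare.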

Maybe some lower level tool is needed here. Which leads me to a question - what use case do you have at hand?

I'm writing a package to download JSON data using requests. However, I use join_with() not only with JSON dicts but also with pandas DataFrames and .csv files. It's really a very useful tool.

Suor commented

Thanks, I meant more code-wise though. What do you pass to join_with()? Is it always list or something else? Why do you need to use it for a single dict?

I use lists of dicts everywhere. Sometimes I need to convert dicts to a pandas-like layout in order to then convert the joined dict to a DataFrame. Usually these lists contain more than one dict. In those cases join_with() works perfectly. However, if a list has only one dict, join_with() doesn't change its structure, which later raises a ValueError:

from pprint import pprint
from funcy import join_with
from pandas import DataFrame

list_of_dicts_corr = [
    {1: "foo", 3: "bar", "foo": 1},
    {1: 2, 3: 4, "foo": "bar"}
]

list_of_dicts_err = [{1: 2, 3: 4, "foo": "bar"}]

joined_with_corr = join_with(list, list_of_dicts_corr)
joined_with_err = join_with(list, list_of_dicts_err)

pprint(joined_with_corr)
pprint(joined_with_err)

df_corr = DataFrame.from_dict(joined_with_corr)
pprint(df_corr)

df_err = DataFrame.from_dict(joined_with_err)
pprint(df_err)

Output:

{1: ['foo', 2], 3: ['bar', 4], 'foo': [1, 'bar']}
{1: 2, 3: 4, 'foo': 'bar'}
     1    3  foo
0  foo  bar    1
1    2    4  bar
Traceback (most recent call last):
  File "c:\join with test.py", line 21, in <module>
    df_err = DataFrame.from_dict(joined_with_err)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 1677, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 664, in _extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index

Of course, there is a workaround: the dict can first be converted to a Series and then to a DataFrame. Still, it's rather inconvenient, I think.

Suor commented

So it's join_with(list, ...). That being such a special case might warrant extracting it into a separate util. Or complementing zip_dicts() somehow might be another approach.

I will think about this.

Thanks! Let us know what you are going to do :).

Suor commented

Added strict param. Looks like the easiest way to fix it now. Let life show us whether we'll need a more generic solution.
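For readers following along, the behavior discussed in this thread can be sketched in pure Python. This is a hypothetical stand-in named `join_with_sketch`, not funcy's code, and it assumes the new parameter is a boolean passed as `strict=True` to request that a lone dict be transformed as well.

```python
from collections import defaultdict


def join_with_sketch(f, dicts, strict=False):
    """Join dicts, combining the values of each key with f.

    Hypothetical stand-in, not funcy's code: with strict=False a lone dict
    is returned unchanged, with strict=True f is applied to it as well.
    """
    dicts = list(dicts)
    if not dicts:
        return {}
    if len(dicts) == 1 and not strict:
        return dicts[0]

    # Group the values of every key, then summarize each group with f.
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    return {key: f(values) for key, values in grouped.items()}


single = [{1: 2, 3: 4, "foo": "bar"}]
print(join_with_sketch(list, single))               # lone dict left as is
print(join_with_sketch(list, single, strict=True))  # every value wrapped in a list
```

With the strict variant every value is a list regardless of how many dicts were joined, which is the shape the DataFrame example earlier in the thread needs.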

Thanks. Works like a charm. In my case it'll spare a couple of lines of code :).

By the way - I had to install it from GitHub. When (approximately) could it be available on PyPI via pip?