`join_with` - transform single elements too
Closed this issue · 10 comments
Hi! `funcy` is a great tool. I especially like `join_with` because it quickly and easily merges many dicts into one, for example in a `defaultdict(list)` style. However, I noticed that if a list of dicts contains only one element, `join_with` doesn't change its structure. It would be very useful to transform single elements too:
list_of_dicts = [{1: 2, 3: 4, "foo": "bar"}]
# current implementation
join_with(list, list_of_dicts) -> {1: 2, 3: 4, "foo": "bar"}
# proposed implementation
join_with(list, list_of_dicts) -> {1: [2], 3: [4], "foo": ["bar"]}
Hmm, it's a bit weird that it works that way. But changing it would be a backwards-incompatible change.
Not a big fan of boolean flags )
Current behavior kind of makes sense, but it breaks types, so I see how that one is not convenient. The thing is also complicated by the fact that `join_with()`/`merge_with()` are two-phase, i.e. they are a composition of grouping values and processing those lists. `funcy` also has `zip_values()`/`zip_dicts()`, which also group dict values, but only keep the intersection of the keys. Another option is to use reduce on the resulting lists instead of a summarizing function. Many options.
Maybe some lower level tool is needed here. Which leads me to a question - what use case do you have at hand?
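To make the two phases concrete, here is a minimal plain-Python sketch. It is only an illustration under my own assumptions, not funcy's actual code: `group_values`, `join_with_sketch` and `group_common` are hypothetical names, and `group_common` merely mimics the "intersection of the keys" idea rather than the real `zip_values()`/`zip_dicts()` interface.

```python
from collections import defaultdict


def group_values(dicts):
    """Phase 1: collect the values of every key into a list (union of keys)."""
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    return dict(grouped)


def join_with_sketch(f, dicts):
    """Phase 2: apply the summarizing function f to each list of values."""
    return {key: f(values) for key, values in group_values(dicts).items()}


def group_common(dicts):
    """Group values only for keys present in every dict (intersection of keys)."""
    dicts = list(dicts)
    if not dicts:
        return {}
    common = set(dicts[0]).intersection(*dicts[1:])
    return {key: [d[key] for d in dicts] for key in common}
```

With no single-dict shortcut, `join_with_sketch(list, [{1: 2}])` gives `{1: [2]}`, which is the shape proposed at the top of this issue.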
I'm writing a package to download JSON data using `requests`. However, I use `join_with()` not only with JSON dicts but also with `pandas` DataFrames or .csv files. It's really a very useful tool.
Thanks, I meant more code-wise though. What do you pass to `join_with()`? Is it `list` everywhere or something else? Why do you need to use it for a single dict?
I use `list` of dicts everywhere. Sometimes I need to convert dicts to a `pandas`-like style in order to further convert one dict to a `DataFrame`. Usually, these lists contain more than one dict. In those cases `join_with()` works perfectly. However, if a list has only one dict, `join_with()` doesn't change its structure, which later raises `ValueError`:
from pprint import pprint
from funcy import join_with
from pandas import DataFrame
list_of_dicts_corr = [
{1: "foo", 3: "bar", "foo": 1},
{1: 2, 3: 4, "foo": "bar"}
]
list_of_dicts_err = [{1: 2, 3: 4, "foo": "bar"}]
joined_with_corr = join_with(list, list_of_dicts_corr)
joined_with_err = join_with(list, list_of_dicts_err)
pprint(joined_with_corr)
pprint(joined_with_err)
df_corr = DataFrame.from_dict(joined_with_corr)
pprint(df_corr)
df_err = DataFrame.from_dict(joined_with_err)
pprint(df_err)
Output:
{1: ['foo', 2], 3: ['bar', 4], 'foo': [1, 'bar']}
{1: 2, 3: 4, 'foo': 'bar'}
     1    3  foo
0  foo  bar    1
1    2    4  bar
Traceback (most recent call last):
File "c:\join with test.py", line 21, in <module>
df_err = DataFrame.from_dict(joined_with_err)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 1677, in from_dict
return cls(data, index=index, columns=columns, dtype=dtype)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 636, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 502, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 120, in arrays_to_mgr
index = _extract_index(arrays)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py", line 664, in _extract_index
raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
Of course, there is a workaround: the dict can first be converted to a `Series` and then to a `DataFrame`. Still, it's rather inconvenient, I think.
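For comparison, a pandas-free variant of such a workaround could look like the sketch below. `wrap_scalars` is a hypothetical helper, not part of funcy or pandas. It assumes any value that is not already a list should become a one-element list, so it would misbehave if the original dicts held lists as values.

```python
def wrap_scalars(d):
    """Return a copy of d with every non-list value wrapped in a one-element list."""
    return {key: value if isinstance(value, list) else [value]
            for key, value in d.items()}


# The shape join_with(list, ...) returned for a single dict in the example above.
joined = {1: 2, 3: 4, "foo": "bar"}
columns = wrap_scalars(joined)
print(columns)  # {1: [2], 3: [4], 'foo': ['bar']}
```

A dict of one-element lists like `columns` should be accepted by `DataFrame.from_dict`, since every column then has length one instead of being a bare scalar.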
So it's `join_with(list, ...)`. That being such a special case might warrant extracting it into a separate util. Or complementing `zip_dicts()` somehow might be another approach.
I will think about this.
Thanks! Let us know what you are going to do :).
Added `strict` param. Looks like the easiest way to fix it now. Let life show us whether we'll need a more generic solution.
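For readers following along, here is a rough sketch of how such a flag might behave. It assumes `strict=True` means "apply the function even when only one dict is passed" and that the default keeps the old behavior. `join_with_strict_sketch` is a hypothetical plain-Python stand-in, not the code that was added to funcy.

```python
def join_with_strict_sketch(f, dicts, strict=False):
    dicts = list(dicts)
    if not dicts:
        return {}
    # Non-strict mode keeps the old shortcut: a lone dict is returned untouched.
    if not strict and len(dicts) == 1:
        return dicts[0]
    # Otherwise group values by key, then summarize each group with f.
    grouped = {}
    for d in dicts:
        for key, value in d.items():
            grouped.setdefault(key, []).append(value)
    return {key: f(values) for key, values in grouped.items()}


single = [{1: 2, 3: 4, "foo": "bar"}]
print(join_with_strict_sketch(list, single))               # {1: 2, 3: 4, 'foo': 'bar'}
print(join_with_strict_sketch(list, single, strict=True))  # {1: [2], 3: [4], 'foo': ['bar']}
```

Defaulting the flag to `False` is what keeps the change backwards compatible, which was the concern raised earlier in the thread.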
Thanks. Works like a charm. In my case it'll spare a couple of lines of code :).
By the way - I had to install it from GitHub. When (approximately) could it be available on PyPI via `pip`?