groupby only grouping consecutive occurrences
I want to group tuples by their common second entry. However, the following code
valuePairs = [(:A, :Hello),
(:B, :Bye),
(:C, :Hello),
(:D, :Hello),
(:E, :Bye),
(:F, :Bye),
(:G, :Hello),
(:H, :Bye)]
kk = Iterators.groupby(valuePairs, x -> x[2])
for ii in kk
@show ii
end
displays
ii => [(:A,:Hello)]
ii => [(:B,:Bye)]
ii => [(:C,:Hello),(:D,:Hello)]
ii => [(:E,:Bye),(:F,:Bye)]
ii => [(:G,:Hello)]
ii => [(:H,:Bye)]
Is this intended? I'd expect the tuples to be split into only two groups: those with :Hello as their second entry, and those with :Bye.
That actually is the intended behavior. It's the same behavior as groupBy in Haskell or group-by in Clojure. The sort of grouping behavior you want would be useful, but it's not a particularly good fit for an iterators package: if I did groupby(values), it would have to iterate through all of values before the groupby iterator produced anything. I think that would be better implemented as a function that just returns a Dict, rather than as an iterator.
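As a rough illustration of that suggestion, here is a minimal sketch in Python (the language used for comparison later in this thread) of a function that returns a dictionary instead of an iterator. The name group_all is hypothetical and is not part of Iterators.jl or of any Python library; the point is only that the whole input has to be consumed before any group is complete.

```python
from collections import defaultdict


def group_all(items, key):
    """Hypothetical helper: collect every item under its key, consecutive or not."""
    groups = defaultdict(list)
    for item in items:
        # Each item is appended to the list for its key; no group is
        # known to be complete until the entire input has been read.
        groups[key(item)].append(item)
    return dict(groups)


value_pairs = [("A", "Hello"), ("B", "Bye"), ("C", "Hello"), ("D", "Hello")]
groups = group_all(value_pairs, lambda x: x[1])
print(groups["Hello"])
print(groups["Bye"])
```

Because the result is a plain dictionary, nothing can be yielded lazily, which is the reason given above for keeping this out of an iterators package.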
Thanks for the explanation, and sorry for the false alert.
For reference:
The following languages do require consecutive (as is the current state of Iterators.jl):
- Python
The following languages do not care about consecutive (and have this method in their Iterators type library):
- .Net/C#, F#, VB etc
- Groovy
- Scala: I'm not sure whether this is in an iterators type library or not. The Scala docs are so hard to read that I'm linking to a tutorial page instead.
The following do not care about consecutive, and have this in their core library:
- Mathematica
- Clojure (group-by; you may have been thinking of partition-by)
I'm not sure whether I would suggest changing the method, since it would definitely be a breaking change. And matching Python is good, given the user overlap.
But I would suggest that not requiring consecutive is the more common approach.
The following languages do not care about consecutive (and have this method in their Iterators type library)
- Python
Actually, Python does care that the values are consecutive:
In [7]: valuePairs = [("A", "Hello"), ("B", "Bye"), ("C", "Hello"), ("D", "Hello"), ("E", "Bye"), ("F", "Bye"), ("G", "Hello")]
In [8]: [(x[0], list(x[1])) for x in groupby(valuePairs, lambda x: x[1])]
Out[8]:
[('Hello', [('A', 'Hello')]),
('Bye', [('B', 'Bye')]),
('Hello', [('C', 'Hello'), ('D', 'Hello')]),
('Bye', [('E', 'Bye'), ('F', 'Bye')]),
('Hello', [('G', 'Hello')])]
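For anyone who wants one group per key with itertools.groupby, a common workaround is to sort by the same key first so that equal keys become consecutive. The following is a minimal sketch under that assumption, reusing the data from the session above; it requires the keys to be orderable and reads the whole input up front.

```python
from itertools import groupby

value_pairs = [("A", "Hello"), ("B", "Bye"), ("C", "Hello"), ("D", "Hello"),
               ("E", "Bye"), ("F", "Bye"), ("G", "Hello")]


def second(pair):
    """Key function: the second entry of each tuple."""
    return pair[1]


# sorted() is stable, so items keep their original relative order within a group.
ordered = sorted(value_pairs, key=second)
merged = [(k, [name for name, _ in g]) for k, g in groupby(ordered, second)]
print(merged)
```

Note that the groups come out in sorted key order rather than in order of first appearance.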
Oops. Fixed. Matching Python's behavior seems most important.