Order of method arguments different from Base
Closed this issue · 9 comments
In Base, functions like filter
, reduce
and map
take arguments
func(f::Function, xs)
The implementations of these methods in Spark.jl take arguments in the opposite order
func(rdd::RDD, f::Function)
Might it make sense to align the order of method arguments with Base to make it more Julian?
Yeah, I thought about it, but this way RDD API will become inconsistent - currently all methods take rdd
as its first parameter, e.g.:
map(rdd, f)
which roughly resembles method call
rdd.map(f)
Changing the order will break this consistency. Although, given that parameters are typed, we can introduce both versions - map(rdd, f)
and map(f, rdd)
.
Ah okay. I think it would be a good idea to add func(f, rdd)
for the times where users think more Juliay than Sparky.
Deal! Give me a couple of days to sort out urgent tasks and I will add these methods.
Is there anything beyond reduce
, map
and filter
that needs to be updated?
Hmm, I'm now thinking what to do with things like flat_map(rdd, f)
- on one hand, there's no Julia equivalent for it, but there will be map(f, rdd)
, so it makes sense to add flat_map(f, rdd)
as well. I think I need to review the list of RDD methods.
I've created (not fully tested yet) PR for this issue: #67
Here's the list of all methods that have got alternative signature. Comments are welcome.
map_partitions_with_index(f::Function, rdd::RDD) = map_partitions_with_index(rdd, f)
map_partitions(f::Function, rdd::RDD) = map_partitions(rdd, f)
map_partitions_pair(f::Function, rdd::RDD) = map_partitions_pair(rdd, f)
Base.map(f::Function, rdd::RDD) = map(rdd, f)
map_pair(f::Function, rdd::RDD) = map_pair(rdd, f)
flat_map(f::Function, rdd::RDD) = flat_map(rdd, f)
flat_map_pair(f::Function, rdd::RDD) = flat_map_pair(f, rdd)
Base.filter(f::Function, rdd::SingleRDD) = filter(rdd, f)
Base.filter(f::Function, rdd::PairRDD) = filter(rdd, f)
Base.reduce(f::Function, rdd::RDD) = reduce(rdd, f)
reduce_by_key(f::Function, rdd::RDD) = reduce_by_key(rdd, f)
Awesome work!
While writing tests for new methods I found a couple of bugs / inconsistencies. They don't affect the PR in any way, but I'll need to fix them before tagging the next version.
I've merged the PR for alternative method argument order and opened a separate issue for broken PairRDD methods (#69).
Closing this issue, feel free to re-open if you see something else is needed here.
Fantastic! Thanks for this