dfdx/Spark.jl

Order of method arguments different from Base

Closed this issue · 9 comments

In Base, functions like filter, reduce and map take arguments

func(f::Function, xs)

The implementations of these methods in Spark.jl take arguments in the opposite order

func(rdd::RDD, f::Function)

Might it make sense to align the order of method arguments with Base to make it more Julian?

dfdx commented

Yeah, I thought about it, but this way RDD API will become inconsistent - currently all methods take rdd as its first parameter, e.g.:

map(rdd, f)

which roughly resembles method call

rdd.map(f)

Changing the order will break this consistency. Although, given that parameters are typed, we can introduce both versions - map(rdd, f) and map(f, rdd).

Ah okay. I think it would be a good idea to add func(f, rdd) for the times where users think more Juliay than Sparky.

dfdx commented

Deal! Give me a couple of days to sort out urgent tasks and I will add these methods.

Is there anything beyond reduce, map and filter that needs to be updated?

dfdx commented

Hmm, I'm now thinking what to do with things like flat_map(rdd, f) - on one hand, there's no Julia equivalent for it, but there will be map(f, rdd), so it makes sense to add flat_map(f, rdd) as well. I think I need to review the list of RDD methods.

dfdx commented

I've created (not fully tested yet) PR for this issue: #67

Here's the list of all methods that have got alternative signature. Comments are welcome.

map_partitions_with_index(f::Function, rdd::RDD) = map_partitions_with_index(rdd, f)
map_partitions(f::Function, rdd::RDD) = map_partitions(rdd, f)
map_partitions_pair(f::Function, rdd::RDD) = map_partitions_pair(rdd, f)
Base.map(f::Function, rdd::RDD) = map(rdd, f)
map_pair(f::Function, rdd::RDD) = map_pair(rdd, f)
flat_map(f::Function, rdd::RDD) = flat_map(rdd, f)
flat_map_pair(f::Function, rdd::RDD) = flat_map_pair(f, rdd)
Base.filter(f::Function, rdd::SingleRDD) = filter(rdd, f)
Base.filter(f::Function, rdd::PairRDD) = filter(rdd, f)
Base.reduce(f::Function, rdd::RDD) = reduce(rdd, f)
reduce_by_key(f::Function, rdd::RDD) = reduce_by_key(rdd, f)

Awesome work!

dfdx commented

While writing tests for new methods I found a couple of bugs / inconsistencies. They don't affect the PR in any way, but I'll need to fix them before tagging the next version.

dfdx commented

I've merged the PR for alternative method argument order and opened a separate issue for broken PairRDD methods (#69).

Closing this issue, feel free to re-open if you see something else is needed here.

Fantastic! Thanks for this