dfdx/Spark.jl

Do we need an embedded JVM on the worker nodes?

Closed this issue · 3 comments

aviks commented

Scala starts Julia as an external process and connects to it over a socket. Does the Julia process need to start an embedded JVM on the workers?

cc: @dfdx

dfdx commented

As far as I can see, there's no need for an additional JVM: Spark starts Julia to execute a task (just like it starts Python in PySpark); the Julia process executes the task and exits. Is there any reason to start an additional JVM?
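For illustration, the worker-side flow looks roughly like this. This is a minimal sketch, not the actual Spark.jl wire protocol: the `run_worker` name, the port argument, and the `serialize`/`deserialize` framing are all assumptions.

```julia
using Sockets, Serialization

# Entry point for a task: connect back to the JVM worker that spawned us,
# run the task it sends, return the result, and exit. No second JVM is
# ever started on the Julia side.
function run_worker(port::Integer)
    sock = connect(ip"127.0.0.1", port)  # socket opened by the JVM side
    f = deserialize(sock)                # the task (a Julia closure)
    data = deserialize(sock)             # the input partition
    serialize(sock, map(f, data))        # ship results back to the JVM
    close(sock)                          # done; the process simply exits
end
```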

Note that this is different from the driver process, where we use Julia to instantiate the JVM and control the program flow.

aviks commented

Ok, that is what I thought. Currently `using Spark` starts the embedded JVM, which means the JVM also starts on the worker, since Julia on the worker is launched with `julia -e 'using Spark'`.

I'll then do a PR to change this. It will mean that on the driver we will need to make an explicit `init()` call.
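Roughly, the change would look like this. A minimal sketch only: Spark.jl embeds the JVM via JavaCall.jl, but the module layout, the keyword argument, and the default JVM option shown here are assumptions, not the actual PR.

```julia
module Spark

using JavaCall

# Note: no JVM startup in __init__, so `julia -e 'using Spark'` on a
# worker stays JVM-free.

# The driver opts in explicitly instead.
function init(; jvm_opts = ["-Xmx1024m"])  # default option is illustrative
    JavaCall.init(jvm_opts)  # start the embedded JVM on the driver only
end

end # module
```

On the driver you'd then run `using Spark; Spark.init()` before doing anything else, while workers keep starting Julia with the plain `julia -e 'using Spark'` invocation.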

dfdx commented

Ah, I missed this detail. Thanks!