uber/RemoteShuffleService

spark 3.1/3.2?

Opened this issue · 10 comments

cpd85 commented

hi all, I saw there is a spark30 branch for Spark 3.0.x support in the README. There also seems to be a spark31 branch, but I'm wondering: are there any plans to support Spark 3.2, or could it work out of the box with the spark31 branch?

Yeah, agree it is confusing here. Spark 3.1 and 3.2 have slight differences in their shuffle APIs, so Remote Shuffle Service needs corresponding changes. I used to work on Remote Shuffle Service when I was at Uber. I have since left Uber and no longer have write access to this repo.

What environment are you interested in running Remote Shuffle Service on, e.g. YARN or Kubernetes? If Kubernetes, I have another repo that makes Remote Shuffle Service compatible with Kubernetes for Spark 3.1 and 3.2.

cpd85 commented

@hiboyang thanks for the response -- I really appreciate it! For now, I would love to be able to run on YARN; Kubernetes I would love to explore as well. If you point me towards the repo/changes you made for compatibility, maybe I could extend them to run on YARN as well?

I see. In that case, you could change `<spark.version>2.4.3</spark.version>` in pom.xml to a Spark 3 version. You will get some compile errors, and you can start from there.
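For reference, a minimal sketch of that pom.xml change, assuming the build declares the Spark version as a Maven property (the exact 3.x value here is illustrative, not from the repo):

```xml
<properties>
  <!-- Was 2.4.3; bump to the Spark 3 line you target (illustrative value). -->
  <spark.version>3.1.2</spark.version>
</properties>
```

Rebuilding with `mvn package` after this change is what surfaces the shuffle-API compile errors mentioned above.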

I tried to find some time to put together an example, but I've been really busy these days :(

@hiboyang I am looking to deploy remote shuffle service in my kubernetes cluster, preferably for spark 3.1.1. What's your recommendation?

Hi!

Support for Spark 3.2 is very interesting.
Is Java 11 also required there?
I tried to change some parameters for Spark 3.2, for example:

<java.version>11</java.version>
<hadoop.version>3.2.2</hadoop.version>
<spark.version>3.2.0</spark.version>
<scala.version>2.12.15</scala.version>

but I get an error:

[ERROR] /home/alatau/ssk/3.2/src/main/scala/org/apache/spark/shuffle/rss/RssStressTool.scala:144: not enough arguments for method registerShuffle: (shuffleId: Int, numMaps: Int, numReduces: Int)Unit.
Unspecified value parameter numReduces.
[ERROR]     mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps)
[ERROR]                                           ^
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
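The error above is the API difference in action: in Spark 3.2, `MapOutputTrackerMaster.registerShuffle` takes a third `numReduces` argument, so the old two-argument call in RssStressTool.scala no longer compiles. A minimal sketch of the call-site fix, using a stub class (not the real Spark class) that mimics the 3.2 signature:

```scala
// Sketch only: TrackerStub mimics the Spark 3.2 signature of
// MapOutputTrackerMaster.registerShuffle; it is not the real Spark class.
class TrackerStub {
  // Records the last registration so the call site can be checked.
  var registered: Option[(Int, Int, Int)] = None

  // Spark 3.2-style signature: (shuffleId, numMaps, numReduces).
  def registerShuffle(shuffleId: Int, numMaps: Int, numReduces: Int): Unit =
    registered = Some((shuffleId, numMaps, numReduces))
}

object RegisterShuffleFix {
  def main(args: Array[String]): Unit = {
    val tracker = new TrackerStub
    // Old two-argument call fails to compile against the 3.2 signature:
    //   tracker.registerShuffle(appShuffleId.getShuffleId, numMaps)
    // The fix is to pass the reducer count as well (values illustrative):
    tracker.registerShuffle(0, 4, 8)
    println(tracker.registered) // Some((0,4,8))
  }
}
```

The real change has to thread the actual reducer count through to this call in RssStressTool.scala; this stub only shows the shape of the new signature.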

cpd85 commented

@avs-alatau as @hiboyang mentioned, there's a difference in APIs, so it's not enough to just change spark.version -- you'll need to implement the new APIs as well. Bo's done the work here, but it's only running on k8s at the moment: https://github.com/hiboyang/RemoteShuffleService/tree/k8s-spark-3.2

@cpd85 thanks for the link to the k8s branch, but at the moment I can only set things up for YARN

cpd85 commented

@avs-alatau could you help me understand what you're asking for? The code for YARN doesn't exist, or isn't open source. At the moment I'm working through these compilation issues to see if I can get a 3.2 client to communicate with a 2.4 server. I'll be happy to share the code if I end up getting it working.

@cpd85
Thanks for the help. I have a Hadoop cluster with Spark 3.2.
Spark jobs currently run through YARN, and there are some problems with that setup, which is why I am looking for an external shuffle service.
I managed to set up Spark jobs on a test cluster for Spark 3.0, but since Spark 3.2 is installed on the production cluster, I am looking for an external shuffle service that supports it.
If you manage to build an RSS version for Spark 3.2, I will be grateful.

cpd85 commented

@avs-alatau haven't done too much testing, but I got this to work with a Spark 3.2 PageRank example app:

https://github.com/cpd85/RemoteShuffleService/tree/spark32