amplab/spark-ec2

support for spark 2.2.0?

kmu-leeky opened this issue · 8 comments

It looks like Spark 2.2.0 has been officially released. Is it going to be supported in spark-ec2 soon?

We can support it. Would you like to open a PR?

I tried locally, but it does not seem as simple as I first thought: just adding 2.2.0 to "VALID_SPARK_VERSIONS" does not really work. A few things to consider: the base image contains Hadoop 2.4, while the Spark 2.2.0 binaries are built against Hadoop 2.6 (spark-2.2.0-bin-hadoop2.6.tgz). The base image also contains Java 1.7, and I have read a few documents saying that recent Hadoop and Spark releases need Java 1.8.
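A quick way to see the packaging mismatch from the command line (a rough sketch: it assumes the same spark-related-packages bucket the init scripts already download from, and that the mirror only carries the builds Apache publishes for 2.2.0, i.e. hadoop2.6 and hadoop2.7):

```bash
# Check which Spark 2.2.0 packages are actually downloadable from the bucket
# spark-ec2 pulls from; the hadoop2.4 name is included to show it is missing.
BUCKET=http://s3.amazonaws.com/spark-related-packages
for pkg in spark-2.2.0-bin-hadoop2.4.tgz \
           spark-2.2.0-bin-hadoop2.6.tgz \
           spark-2.2.0-bin-hadoop2.7.tgz; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -I "$BUCKET/$pkg")
  echo "$pkg -> HTTP $code"
done

# On the AMI, confirm the JVM version; Spark 2.2.0 dropped Java 7 support.
java -version 2>&1 | head -n 1
```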

I see. Those do require more changes, including changes to the AMI and the Hadoop scripts. Unfortunately I don't have time right now to try out the changes.

That's OK. I tweaked the code locally to run 2.2.0 in my repo. I will create a PR if the modifications and images can be generalized.

Hey guys, could you please clarify if there are any updates/progress on this issue? @kmu-leeky were you able to tweak your local code to make it PRable?

For those still waiting for spark-ec2 to support Spark 2.2, I recommend taking a look at my project, Flintrock. It's basically a faster spark-ec2 with a better user experience.

If anyone does submit a PR adding Spark 2.2 support to spark-ec2, ping me and I'll take a look. Unfortunately, updating the spark-ec2 AMIs to fully support new Spark versions (e.g. adding Java 8) is non-trivial. On Flintrock, you don't need to wait for new commits, AMIs, or branches to be created. You just set an option to pick your version of Spark. Most of the time with Flintrock you can use a new Spark version the day it comes out without any issue.
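For example, with Flintrock the launch is a single command where the Spark version is just a flag (roughly as in the Flintrock README; the key, identity file, and AMI values below are placeholders, and `flintrock launch --help` has the authoritative option list):

```bash
# Launch a small cluster running Spark 2.2.0. The EC2-specific values are
# placeholders; replace them with your own key name, key file, and AMI.
flintrock launch test-cluster \
    --num-slaves 2 \
    --spark-version 2.2.0 \
    --ec2-key-name my-key \
    --ec2-identity-file /path/to/my-key.pem \
    --ec2-ami ami-xxxxxxxx \
    --ec2-user ec2-user
```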

+1 to what @nchammas said. We unfortunately do not have bandwidth to create new AMIs / update spark-ec2 to match the Spark releases.

I tried changing the source to use Hadoop 2.7, which is what the default "yarn" option should map to. Once I make that change, it starts referring to

http://s3.amazonaws.com/spark-related-packages/spark-2.2.0-bin-hadoop2.4.tgz

I tried changing the init.sh in the spark folder, but for some reason that's not going through. Let me know where I should make the changes and I will add them to the source, since we need to use this.
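For reference, the download logic probably needs a branch along these lines (a sketch only: the exact case/if layout inside spark/init.sh is an assumption based on the hadoop2.4 URL it currently produces, and it also assumes the mirror bucket actually carries the hadoop2.6/2.7 tarballs for 2.2.0):

```bash
# Hypothetical addition to spark/init.sh: for Spark 2.2.x there is no
# hadoop2.4 build, so the "yarn" case has to pull a hadoop2.7 package and
# everything else falls back to hadoop2.6.
case "$SPARK_VERSION" in
  2.2.*)
    if [[ "$HADOOP_MAJOR_VERSION" == "yarn" ]]; then
      wget "http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.7.tgz"
    else
      wget "http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.6.tgz"
    fi
    ;;
esac
```

Even with that, the cluster will not come up cleanly until the AMI ships Java 8, which is the part that needs new images.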