Using spark_version='1.6.2' results in partial installation (?)
arokem opened this issue · 10 comments
Specifically, I get these messages during launch of the cluster, and these files are indeed not in place once the cluster starts up:
./spark-ec2/spark-standalone/setup.sh: line 22: /root/spark/bin/stop-all.sh: No such file or directory
./spark-ec2/spark-standalone/setup.sh: line 27: /root/spark/bin/start-master.sh: No such file or directory
Indeed, there is no Spark web interface on port 8080 either.
Is this from branch-2.0? I think the problem is we didn't backport the change that added 1.6.1 and 1.6.2 to branch-2.0, as seen from [1]. Can you check if adding 1.6.2 there fixes the problem?
[1] Line 78 in 06f5d2b
This is from branch-1.6
Hmm, that means SPARK_VERSION isn't being correctly parsed somehow, because as in [2] the default should be sbin and not bin.
[2] spark-ec2/spark-standalone/setup.sh, line 5 in 4b57900
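To illustrate the point about the default, here is a hypothetical sketch of that kind of check (not the actual contents of setup.sh; the real code is at the line linked in [2], and the version patterns below are made up):
# Sketch only: the scripts directory is chosen from the parsed SPARK_VERSION,
# and anything in the 1.x line should fall through to the sbin default.
case "$SPARK_VERSION" in
  0.7.*|0.8.*)
    SPARK_SCRIPTS_DIR=/root/spark/bin   # very old layout kept start/stop scripts in bin/
    ;;
  *)
    SPARK_SCRIPTS_DIR=/root/spark/sbin  # default, which 1.6.x should hit
    ;;
esac
"$SPARK_SCRIPTS_DIR/stop-all.sh"        # the failing call in the launch log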
As far as I can tell, there's neither a bin nor a sbin directory under /root/spark. The only thing under /root/spark is /root/spark/conf/spark-env.sh.
That means that Spark wasn't downloaded properly. My guess is this has to do with the tar.gz files for hadoop1 not being found on S3. You could try --hadoop-version=yarn as a workaround.
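For reference, a launch along these lines would exercise that workaround (the key pair, identity file, slave count, and cluster name below are placeholders):
# Launch with the yarn package instead of the missing hadoop1 artifact
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 \
  --spark-version=1.6.2 --hadoop-version=yarn \
  launch my-test-cluster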
Thanks! For now, I have resorted to setting spark_version to 1.6.0, which seems to work, but I'll try that too. Feel free to close this issue, unless you want to keep track of it. And thanks again.
I ran into something similar recently.
It appears that for Spark 1.6.2 only a subset of the binaries were uploaded to S3:
$ s3cmd ls s3://spark-related-packages/spark-1.6.2*
2016-06-27 23:47 241425242 s3://spark-related-packages/spark-1.6.2-bin-cdh4.tgz
2016-06-27 23:47 230444067 s3://spark-related-packages/spark-1.6.2-bin-hadoop1-scala2.11.tgz
2016-06-27 23:48 271799224 s3://spark-related-packages/spark-1.6.2-bin-hadoop2.3.tgz
2016-06-27 23:49 273797124 s3://spark-related-packages/spark-1.6.2-bin-hadoop2.4.tgz
2016-06-27 23:50 278057117 s3://spark-related-packages/spark-1.6.2-bin-hadoop2.6.tgz
2016-06-27 23:50 196142809 s3://spark-related-packages/spark-1.6.2-bin-without-hadoop.tgz
2016-06-27 23:51 12276956 s3://spark-related-packages/spark-1.6.2.tgz
While for Spark 1.6.0 the full set is there:
$ s3cmd ls s3://spark-related-packages/spark-1.6.0*
2015-12-27 23:07 252549861 s3://spark-related-packages/spark-1.6.0-bin-cdh4.tgz
2015-12-27 23:15 241526957 s3://spark-related-packages/spark-1.6.0-bin-hadoop1-scala2.11.tgz
2015-12-27 23:23 243448482 s3://spark-related-packages/spark-1.6.0-bin-hadoop1.tgz
2015-12-27 23:31 282904569 s3://spark-related-packages/spark-1.6.0-bin-hadoop2.3.tgz
2015-12-27 23:41 244381359 s3://spark-related-packages/spark-1.6.0-bin-hadoop2.4-without-hive.tgz
2015-12-27 23:48 284903527 s3://spark-related-packages/spark-1.6.0-bin-hadoop2.4.tgz
2015-12-28 00:00 289160984 s3://spark-related-packages/spark-1.6.0-bin-hadoop2.6.tgz
2015-12-28 00:08 201549664 s3://spark-related-packages/spark-1.6.0-bin-without-hadoop.tgz
2015-12-28 00:16 12204380 s3://spark-related-packages/spark-1.6.0.tgz
I think the problem here is that the artifacts are missing from the release, not just from S3. E.g., http://www-us.apache.org/dist/spark/spark-1.6.2/spark-1.6.2-bin-hadoop1.tgz gives me a 404.
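A quick way to confirm that from the command line is a HEAD request against the release URL (the 404 below is the response reported above for the hadoop1 artifact):
$ curl -sI http://www-us.apache.org/dist/spark/spark-1.6.2/spark-1.6.2-bin-hadoop1.tgz | head -n 1
HTTP/1.1 404 Not Found
The other artifacts from the S3 listings can be checked the same way.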