A one-step utility to setup single-node pseudo-distributed Spark/Hadoop cluster on OSX.
Package | Version | Release Note |
---|---|---|
Apache Spark | 1.4.1 | RN 1.4.1 |
Apache Hadoop | 2.4.0 | RN 2.4.1 |
This installer has only been tested on Yosemite 10.10. May work fine on El Capitan.
Please use Java 8. More and more Apache projects drop supporting for old Java version.
You have two choices. Either one works just fine.
Spark 1.4.0 comes with SparkR. In case you'd like to delve into it, install R.
In order for Spark/Hadoop to work in pseudo-distributed mode, you need to open SSH server just for yourself.
- Open "System Preference"
- Open "Sharing"
- Check "Remote Login" for "Administators" only.
-
Unzip & execute installer.
-
Once completed, open "Terminal" or "iTerm"
-
To start, type
start-service.sh
- Check out Spark Console if Spark comes up online.
- Same for Hadoop Namenode.
-
To stop, type
stop-service.sh
If you have a question, leave it to my blog post or tweet me.