In order to run the example notebook as seen in the Hadoop Conference Dublin live demo, follow these instructions. These instructions have been tested with Spark 1.6.0 and 1.6.1.
I like to create a new virtualenv specifically for my notebooks, but feel free to set this up however you like.
virtualenv mleap-notebooks # create the virtual env
cd mleap-notebooks
source bin/activate # activate the virtualenv
Next we need to install Jupyter.
pip install jupyter
Install the Toree kernel
The Toree kernel allows us to write Scala and talk with Spark from Jupyter.
pip install toree
jupyter toree install
You will find a folder called mleap-spark under kernels. Copy this folder into your Jupyter kernels directory. After this step, you will have access to a Spark kernel that automatically loads all of the MLeap jars via Spark Packages.
cp -r kernels/mleap-spark ~/.ipython/kernels # linux
cp -r kernels/mleap-spark ~/Library/Jupyter/kernels # Mac OS X
Download the assembly kernel for the presentation using curl and put it in your /tmp directory.
curl https://s3-us-west-2.amazonaws.com/mleap-demo/presentation-kernel-assembly-1.0.jar \
-o /tmp/presentation-kernel-assembly-1.0.jar
You will have to modify the SPARK_HOME environment variable in the kernel.json file to point to your Spark installation. Also, if you downloaded the assembly jar to a folder other than /tmp, you will have to modify SPARK_OPTS to reflect this difference.
# edit the kernel file you just copied
vi ~/.ipython/kernels/mleap-spark/kernel.json # linux
vi ~/Library/Jupyter/kernels/mleap-spark/kernel.json # Mac OS X
# change the SPARK_OPTS to point to the presentation-kernel assembly
# it should already be pointing to `/tmp` by default
# change the line
# "SPARK_HOME": "/Users/hollinwilkins/Lib/Spark",
# to point to your Spark installation
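If you would rather script the SPARK_HOME change than edit the file by hand, something like the following may work. This is a minimal sketch: `set_spark_home` is a hypothetical helper, and it assumes kernel.json contains a line of the form `"SPARK_HOME": "<path>"` as shown in the comment above, and that your Spark path contains no `|` or `&` characters.

```shell
# Hypothetical helper: rewrite the SPARK_HOME entry in a kernel.json file.
# Assumes the file has an entry of the form  "SPARK_HOME": "<path>"
set_spark_home() {
  kernel_json="$1"   # path to the kernel.json you copied
  spark_home="$2"    # your Spark installation directory
  # Write to a temp file and move it back. This avoids `sed -i`,
  # whose syntax differs between GNU (Linux) and BSD (Mac OS X) sed.
  sed "s|\"SPARK_HOME\": *\"[^\"]*\"|\"SPARK_HOME\": \"$spark_home\"|" \
    "$kernel_json" > "$kernel_json.tmp" && mv "$kernel_json.tmp" "$kernel_json"
}
```

For example, `set_spark_home ~/.ipython/kernels/mleap-spark/kernel.json /opt/spark` on Linux, where `/opt/spark` stands in for wherever Spark lives on your machine.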
This is the sample dataset we got from Inside Airbnb.
curl https://s3-us-west-2.amazonaws.com/mleap-demo/airbnb.avro.zip -o /tmp/airbnb.avro.zip
unzip /tmp/airbnb.avro.zip -d /tmp
Now let's load up our notebook into Jupyter!
cd notebooks # from the root of mleap-demo
jupyter notebook # this will start a web ui
After the web UI starts, select the notebook you want to run and have fun :)
After you have finished running the notebook, you will have created two new directories with Bundle.ML content in them:
/tmp/transformer.lr.ml - this is the linear regression
/tmp/transformer.rf.ml - this is the random forest
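Before starting the servers, it can help to confirm that both bundles were actually written. A small sketch, where `check_bundles` is a hypothetical helper and not part of MLeap:

```shell
# Hypothetical helper: report which of the given Bundle.ML directories exist.
# Returns non-zero if any of them is missing.
check_bundles() {
  missing=0
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "found: $dir"
    else
      echo "missing: $dir"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as `check_bundles /tmp/transformer.lr.ml /tmp/transformer.rf.ml`. If either one is reported missing, re-run the notebook cells that export the models.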
Start the servers like this:
# Run the linear regression on port 8080
sbt "server/run /tmp/transformer.lr.ml 8080"
# Run the random forest on port 8081
sbt "server/run /tmp/transformer.rf.ml 8081"
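sbt can take a while to compile and start each server, so requests sent too early will be refused. One way to wait is to poll the port until something answers. This is a sketch under assumptions: `wait_for_port` is a hypothetical helper, and it treats any HTTP response from the root path (even an error page) as a sign that the server is up.

```shell
# Hypothetical helper: poll http://localhost:<port>/ until it answers.
# Any HTTP response counts as "up"; curl only fails if it cannot connect.
wait_for_port() {
  port="$1"           # port the server was started on
  tries="${2:-30}"    # number of attempts, one second apart (default 30)
  while [ "$tries" -gt 0 ]; do
    if curl -s -o /dev/null "http://localhost:$port/"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_port 8080 && wait_for_port 8081` returns once both servers accept connections, or fails after roughly 30 seconds each.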
Now that our servers are started up, download the sample LeapFrame JSON.
curl https://s3-us-west-2.amazonaws.com/mleap-demo/frame.json -o /tmp/frame.json
Then send it into our servers to be transformed.
curl -v -XPOST -H "content-type: application/json" -d@/tmp/frame.json http://localhost:8080/transform
curl -v -XPOST -H "content-type: application/json" -d@/tmp/frame.json http://localhost:8081/transform
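To compare the two models side by side, the same request can be wrapped in a small function and the responses saved to files. A minimal sketch: `transform_frame` and the output file names are hypothetical, while the `/transform` endpoint and ports are the ones used above.

```shell
# Hypothetical helper: POST a LeapFrame JSON file to the server on the
# given port and print the response body on stdout.
transform_frame() {
  frame="$1"   # path to the LeapFrame JSON, e.g. /tmp/frame.json
  port="$2"    # 8080 (linear regression) or 8081 (random forest)
  curl -s -XPOST -H "content-type: application/json" \
    -d@"$frame" "http://localhost:$port/transform"
}
```

For example, `transform_frame /tmp/frame.json 8080 > /tmp/result.lr.json` and `transform_frame /tmp/frame.json 8081 > /tmp/result.rf.json` leave one response per model in /tmp for inspection.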
If running the notebook fails with an error about a dependency trying to download to the same file, try clearing your local ivy2 cache and see if that helps.
rm -rf ~/.ivy2/cache
If you are following along with the blog on Driven by Code, or just don't want to use Jupyter, you can find instructions for using this project in blog/mleap.md in this repository.
The MIT License (MIT)
Copyright (c) 2016 TrueCar
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.