- Download Apache Zeppelin version 0.7 with "all" bundled interpreters. Although the following link looks like a direct link to the package, it actually opens the Apache download page, where you can choose a mirror: http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.7.2/zeppelin-0.7.2-bin-all.tgz
- Install ... which should be as simple as unzipping the archive you just downloaded. This version of Zeppelin includes an embedded install of Apache Spark 2.1, which is tested to work with the class notebooks.
- Test it out!
cd
into the folder where you unpacked Zeppelin- If you run
ls
you should see thebin
folder - on MacOS/Linux, run
bin/zeppelin-daemon.sh start
- UPDATE FOR WINDOWS USERS:
- In theory, Zeppelin should run on Windows with minor changes (e.g.,
zeppelin.cmd
), but your mileage may vary - Zeppelin claims support on Win 7 SP 1, but I have not tried that
- I did try it on Windows 10 (more or less latest Microsoft build) and it did not work out of the box
- Of course, it's quite possible that with some monkeying around it can be made to work on Win 10
- HOWEVER my recommendation would be to install a Linux VM and run it there: free, easy, definitely works, and your "real" Spark work will almost certainly be on Linux anyway
- Easiest path: Grab free VMWare or VirtualBox, install ubuntu-16.04.2-desktop-amd64, add a JRE with
sudo apt install openjdk-8-jre-headless
and then just unzip Zeppelin and you're all set.- Make sure you give this VM a good chunk of resources -- I'd recommend 8 GB RAM and 2 CPU cores if you have them
- In theory, Zeppelin should run on Windows with minor changes (e.g.,
- UPDATE FOR WINDOWS USERS:
- Point your browser to
localhost:8080
and you should see Zeppelin running!- occasionally it seems to require an extra few seconds, refreshes, or even clearing browser cache, moreso than other webapps, not sure why, but don't sweat it
- Click "create new note" to start a new notebook
- Give it a name (like "Test"); the default interpreter should say "Spark" and you can leave that as is, and click "Create Note"
- In the first notebook cell, type in
1 + 1
and hit shift + enter- After a few seconds, you should see
res0: Int = 2
pop out
- After a few seconds, you should see
- Just to make sure Spark can actually run a job, in the next cell, enter
sc.parallelize(1 to 100).sum
and hit shift + enter- You should see
res1: Double = 5050.0
pop out
- You should see
- Congrats, your Zeppelin + embedded Spark 2.1 is installed!
- If you haven't yet cloned this github repo on your machine, go ahead and do that. If you really don't want to use git, you can just "Download as Zip" this whole repo and use that, and you'll be fine.
- Move or copy the "data" folder from the repo into your Zeppelin folder. It should be at the same "level" as
bin
,conf
,interpreter
, etc. Why? This way, if youcd
to the Zeppelin folder and start it withbin/zeppelin-daemon.sh start
, inside the notebook environment, your data will be underdata/
.- This is for simplicity; of course you are welcome to
%sh cd ...
from inside Zeppelin, or hack the paths to the datasets, or anything else you wish -- I'm just trying to make it as simple and consistent for everyone as possible. - You can check if everything is in the right place by punching in
%sh ls data
from inside a notebook cell, and you should see the list of data files, withamazon20k.parquet
,diamonds.csv
, etc.
- This is for simplicity; of course you are welcome to
- Make note (for yourself) of where you put the
notebooks
folder from this repo. You don't need to do anything with them yet, but in class, we will load notebooks into Zeppelin, so it will be handy to know where to find the files.- If you want, you can try it from the main Zeppelin page by clicking "Import Note" then "Choose a JSON here" and then using the file picker to find the notebook.
- When long notebooks like these import, they will look weird at first -- like a bunch of blank cells. Just wait about 30 seconds for Zeppelin to catch up and render them, and you should be all set.
- If you want, you can try it from the main Zeppelin page by clicking "Import Note" then "Choose a JSON here" and then using the file picker to find the notebook.