Spark ML Zeppelin

  1. Install Python 2.7.x (the one that comes with macOS/Linux is probably fine).

  2. Install and initialize nltk:

     - Install library using pip: pip install nltk

     - Download nltk data:

           import nltk
           nltk.download()

       (Details: http://www.nltk.org/data.html)

  3. Download and install Zeppelin 0.7.0:

     - Download from https://zeppelin.apache.org/download.html

     - Unzip the downloaded file.

     - Run: (zeppelin home)/bin/zeppelin-daemon.sh start

  4. Clone this project.

  5. Configure Zeppelin to use this project's notebook directory:

     - Open the notebook repos page: http://localhost:8080/#/notebookRepos

     - Change the "Notebook Path" to .../sparkml-zeppelin/notebook

  6. Load the Zeppelin note: http://localhost:8080/#/notebook/2CBEJDES5
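
Once the steps above are done, a short script can confirm that the pieces are in place. The following is only a sketch and not part of this project: the helper names (module_available, note_url, is_reachable) are made up for illustration, and it assumes Zeppelin is listening on its default address http://localhost:8080, as in the URLs above. It only checks that nltk can be imported and that the Zeppelin server answers; the part of the note URL after "#" is handled by the browser, so the script cannot tell whether the note itself has loaded.

```python
from __future__ import print_function

import importlib

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7

# Assumed defaults, taken from the URLs in the steps above.
ZEPPELIN_URL = "http://localhost:8080"
NOTE_ID = "2CBEJDES5"


def module_available(name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def note_url(base_url, note_id):
    """Build the browser URL of a Zeppelin note from its ID."""
    return "{0}/#/notebook/{1}".format(base_url.rstrip("/"), note_id)


def is_reachable(url, timeout=3):
    """Return True if an HTTP GET to url succeeds within timeout seconds."""
    try:
        response = urlopen(url, timeout=timeout)
        response.close()
        return True
    except Exception:
        # Any failure (connection refused, timeout, bad URL) counts as unreachable.
        return False


if __name__ == "__main__":
    print("nltk importable:   ", module_available("nltk"))
    print("Zeppelin reachable:", is_reachable(ZEPPELIN_URL))
    print("Open the note at:  ", note_url(ZEPPELIN_URL, NOTE_ID))
```

If "Zeppelin reachable" prints False, the daemon is most likely not running; starting it again with zeppelin-daemon.sh start and re-running the script is a reasonable first step.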