Instead of opening Jupyter Notebook directly, copy the prepareSparkEnvironment.sh script into your preferred directory and, from inside it, run:
$ source prepareSparkEnvironment.sh
Or run it directly from GitHub:
$ source <(curl -s https://raw.githubusercontent.com/boyander/runpyspark/master/prepareSparkEnvironment.sh)
It prepares your shell with pyspark configured to use Jupyter Notebook. Once the environment is ready, run pyspark and it will open a Jupyter notebook.
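For context, a script like this typically just exports the PySpark driver settings into your shell. The sketch below shows the usual variables involved; the values and the SPARK_HOME location are assumptions, and the real prepareSparkEnvironment.sh may do more (e.g. auto-detect the Spark install):

```shell
# Minimal sketch of what such an environment script might export.
# Values are assumptions; the actual script may differ.
export SPARK_HOME="$HOME/spark"                 # assumed install location (see symlink step below)
export PATH="$PATH:$SPARK_HOME/bin"             # make spark-shell / pyspark resolvable
export PYSPARK_DRIVER_PYTHON="jupyter"          # run the PySpark driver through Jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"    # open a notebook instead of a plain REPL
export PYSPARK_PYTHON="python3"                 # use Python 3 for the workers
```

With these set, invoking pyspark launches Jupyter with a SparkContext available in the notebook.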
IMPORTANT: Before running the prepareSparkEnvironment.sh script, ensure you have followed the checklist for your OS.
- Install Spark following this Medium post
- Create a symlink to Spark in your home directory, or rename the installation directory to just "spark"
$ ln -s ~/spark-2.4.0-bin-hadoop2.7 ~/spark
- Ensure you have spark-shell in your $PATH variable (note: the following assumes you are running zsh or oh-my-zsh; if that's not the case, or you are not sure, change
.zshrc
to .bashrc
in the commands below).
$ echo "export PATH=\"\$PATH:$HOME/spark/bin\"" >> ~/.zshrc
$ source ~/.zshrc
To check it works, you should be able to run spark-shell
from your terminal.
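If you want that check to be scriptable, a small helper like the one below (hypothetical, not part of this repo) asserts that a command resolves from $PATH:

```shell
# Hypothetical helper: check that a command is resolvable from $PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found in PATH" >&2
    return 1
  fi
}

# After the PATH change above, this should succeed:
# require_cmd spark-shell
```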
You need Homebrew installed; then install the following packages:
$ brew install jq apache-spark
- This script uses python3; ensure python3 is installed and runnable in your terminal.
- When creating a Jupyter notebook, ensure you've chosen the Python 3 kernel, otherwise it will not work.
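A quick way to confirm the python3 requirement from the shell (nothing assumed beyond python3 being on $PATH):

```shell
# Confirm python3 resolves and is actually a 3.x interpreter.
python3 --version
python3 -c 'import sys; assert sys.version_info.major == 3, "Python 3 required"'
```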
- There's also a notebook,
PysparkDemo.ipynb
, to verify that Apache Spark works.
- If you've created multiple Spark contexts, run
$ killall java
to stop all Apache Spark instances.
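killall java is blunt: it kills every JVM, not just Spark. If you want to see what would be terminated first, something like this works (assumes pgrep is available, as it is on macOS and most Linux distributions):

```shell
# List running Java processes (Spark drivers and executors run on the JVM)
# before deciding whether to kill them.
pgrep -fl java || echo "no java processes running"
```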