This script configures Jupyter Notebook on a local Windows machine to work with an HDInsight Spark cluster (v3.5+) by adding the necessary PySpark and PySpark3 kernels.
- Install Anaconda.
- Open the Anaconda Prompt.
- Install Jupyter Notebook:
  conda install jupyter
- Install SparkMagic:
  pip install sparkmagic==0.11.2
- Enable the ipywidgets notebook extension:
  jupyter nbextension enable --py --sys-prefix widgetsnbextension
- Locate your sparkmagic directory with:
  pip show sparkmagic
  and cd to that location, e.g.:
  cd c:\users\<UserName>\appdata\local\continuum\anaconda3\lib\site-packages
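If you prefer not to copy the path out of the `pip show` output by hand, the same directory can be found from Python. A minimal sketch, assuming only the standard library (the helper name `site_packages_of` is mine, not part of sparkmagic):

```python
import importlib.util
import os

def site_packages_of(package):
    """Return the site-packages directory containing `package`,
    or None if the package is not importable."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; go up two
    # levels: __init__.py -> package directory -> site-packages.
    return os.path.dirname(os.path.dirname(spec.origin))

# cd into the directory printed here before running jupyter-kernelspec:
print(site_packages_of("sparkmagic"))
```

This prints `None` if sparkmagic is not installed in the current environment, which doubles as a quick sanity check that the earlier pip install ran in the right place.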
- Install the PySpark and PySpark3 kernels:
  jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
  jupyter-kernelspec install sparkmagic/kernels/pyspark3kernel
The script will prompt you for your cluster name and credentials, and use them to generate the config.json file that the Jupyter kernels need.
- In the Anaconda Command Prompt, change directory to wherever you cloned the repo, e.g.:
cd C:\projects\jupyter_local_hdispark_config
- Run the script:
python config.py
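For reference, sparkmagic reads its configuration from `.sparkmagic\config.json` in your user profile directory. Assuming the script follows the layout of sparkmagic's documented `example_config.json`, the fields it fills in look roughly like this sketch (cluster name and credentials are placeholders):

```json
{
  "kernel_python_credentials": {
    "username": "<ClusterLoginUser>",
    "password": "<ClusterLoginPassword>",
    "url": "https://<ClusterName>.azurehdinsight.net/livy"
  },
  "kernel_python3_credentials": {
    "username": "<ClusterLoginUser>",
    "password": "<ClusterLoginPassword>",
    "url": "https://<ClusterName>.azurehdinsight.net/livy"
  }
}
```

If a kernel later fails to connect, checking this file for a typo in the cluster URL is a good first step.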
After configuration, new kernels should appear in Jupyter Notebook: