Azure/aztk

Master and Worker Node on different Python Versions

manthanthakker opened this issue · 1 comment

Docker Image: aztk/spark:v0.1.0-spark2.2.0-miniconda-base

(Tried with different docker images too, the issue still persists)

My cluster has 2 dedicated nodes and 3 low-priority nodes. I have enabled the jupyterlab and jupyter plugins as described in the documentation:

plugins:
  - name: jupyterlab
  - name: jupyter

When I try to execute the sample Jupyter Calculate PI notebook, I get the following error:

Exception: Python in worker has different version 3.6 than that in driver 3.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
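
For context, PySpark raises this exception when the executors' Python major.minor version does not match the driver's. A quick way to see what the driver side is running (assuming the notebook kernel is the Spark driver, which is how the jupyter plugin is typically wired up) is:

import os
import sys

# Interpreter the driver (this notebook kernel) is running on.
print("driver python:", sys.version)

# Environment variables the error message refers to; they may be unset.
print("PYSPARK_PYTHON        =", os.environ.get("PYSPARK_PYTHON"))
print("PYSPARK_DRIVER_PYTHON =", os.environ.get("PYSPARK_DRIVER_PYTHON"))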

I checked the Python version on each node:


aztk spark cluster run --id myclustername "python --version" 

and found that the master node is on Python 3.7.0 while the worker nodes are on Python 3.6.4 :: Anaconda, Inc.
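
Note that "python --version" only reports whichever interpreter is first on the PATH of the remote shell, not necessarily the one the executors launch. If the image sets PYSPARK_PYTHON in the node environment (an assumption; it depends on the aztk image), the same command form can be used to inspect it on every node:

aztk spark cluster run --id myclustername 'echo $PYSPARK_PYTHON'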

This looks like a bug in the plugin installation. How can I fix this?

This issue is caused by the jupyterlab plugin.

When a customized Docker image is deployed, the jupyterlab installation script ends up upgrading the Python version on the head node only, leaving the worker nodes on the image's original version. Removing jupyterlab from the plugin list resolves this issue.
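
After removing jupyterlab from the plugin list and recreating the cluster, a quick sanity check from the notebook can confirm that the driver and the executors agree on the Python version. This sketch assumes sc is the SparkContext the jupyter plugin provides to the notebook:

import sys

# Driver-side version (this notebook kernel).
driver_version = sys.version_info[:2]

# Run a trivial job so the executors report the version of the worker interpreter.
executor_versions = set(
    sc.parallelize(range(8), 4)
      .map(lambda _: sys.version_info[:2])
      .collect()
)

print("driver:", driver_version, "executors:", executor_versions)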