big-data-europe/docker-hadoop-spark-workbench

Connect spark-notebook to spark cluster

Closed this issue · 3 comments

Hi,

I'm trying to connect spark-notebook to the Spark cluster. By default it runs the notebooks in local mode (the notebook jobs never appear on the Spark master page), and when I try to connect it to the cluster created by the docker-compose file, the kernel dies.

Following spark-notebook's documentation on this, I'm adding the following to the notebook's metadata:

  "customSparkConf": {
    "spark.app.name": "Notebook",
    "spark.master": "spark://spark-master:7077",
    "spark.executor.memory": "1G"
  },

Is there anything else I need to do/add?
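For reference, here is a minimal sketch of how I understand the wiring has to look for that spark.master URL to resolve (the service and image names here are my assumptions, not necessarily what the repo's compose file actually uses): the notebook container has to sit on the same compose network as the master so that the hostname spark-master is resolvable from inside the notebook.

  version: "3"
  services:
    spark-master:
      image: bde2020/spark-master        # illustrative image name
      hostname: spark-master
      ports:
        - "8080:8080"                    # master web UI
        - "7077:7077"                    # the port spark.master points at
    spark-notebook:
      image: andypetrella/spark-notebook # illustrative image name
      ports:
        - "9000:9000"                    # notebook UI
      # no explicit network section needed: compose puts both services on the
      # same default network, so "spark-master" resolves from the notebook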

Hi @Miguel-Alonso!

I've looked into the issue. The problem was a mismatch between the Java versions in spark-notebook and Spark. I've migrated all the images to Java 8. You can find the new docker-compose file in the root of the repo:
https://github.com/big-data-europe/docker-hadoop-spark-workbench/blob/master/docker-compose-java8.yml
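Schematically, the change just pins every image in the stack to a Java 8 build, along these lines (the tags and the SPARK_MASTER variable below are illustrative, not copied from the actual file; see the link above for the real contents):

  services:
    spark-master:
      image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8   # Java 8 build
    spark-worker:
      image: bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8   # same Java as the master
      environment:
        - "SPARK_MASTER=spark://spark-master:7077"             # how the worker finds the master (name is an assumption)
    spark-notebook:
      image: bde2020/spark-notebook:2.1.0-hadoop2.8-hive-java8 # the notebook JVM must match too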

If that does not fix the issue for you, feel free to reopen. :-)

Hi @earthquakesan, thanks for that!

One last (small) detail: the docker-compose file references a hadoop-hive.env file that is missing from the repo. I'm just using the regular hadoop.env instead and, apart from an error in PostgreSQL, everything else seems to be running OK.
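For anyone following along, I'd expect hadoop-hive.env to be the usual hadoop.env plus Hive metastore settings pointing at the PostgreSQL container, something like the following (these entries follow the bde2020 *_CONF_* env-var convention, but the exact names and values are my guess):

  # Hive metastore backed by the PostgreSQL container (hostname is an assumption)
  HIVE_SITE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore-postgresql/metastore
  HIVE_SITE_CONF_javax_jdo_option_ConnectionDriverName=org.postgresql.Driver
  HIVE_SITE_CONF_javax_jdo_option_ConnectionUserName=hive
  HIVE_SITE_CONF_javax_jdo_option_ConnectionPassword=hive
  # plus the usual Hadoop settings from hadoop.env
  CORE_CONF_fs_defaultFS=hdfs://namenode:8020

That would also explain the PostgreSQL error I'm seeing: with plain hadoop.env, Hive has no connection settings for the metastore database.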

Thanks!

Hi @Miguel-Alonso!

Oops, forgot to push it. It's there now. :-)