Orchest: Hello Spark

This repo shows how to run (Py)Spark in Orchest (locally).

For details on how Spark is installed, see setup_script.sh. The Spark code itself is a minimal example that counts the words in the Python LICENSE text file. Check out the notebook for the code.
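As a rough sketch of what such a word count can look like with a local Spark context (the notebook in this repo remains the authoritative version; the `tokenize`, `count_words`, and `main` helpers, the `local[*]` master, the app name, and the `LICENSE` path are all assumptions made for illustration):

```python
import re


def tokenize(line):
    """Split a line into lowercase words (hypothetical helper, not from the notebook)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(sc, path):
    """Return (word, count) pairs for the text file at `path`, most frequent first."""
    return (
        sc.textFile(path)
        .flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
        .sortBy(lambda pair: pair[1], ascending=False)
        .collect()
    )


def main(path="LICENSE"):
    # Imported here so the helpers above can be used without Spark installed.
    import pyspark

    # Assumption: run Spark locally on all available cores.
    conf = pyspark.SparkConf().setMaster("local[*]").setAppName("hello-spark")
    sc = pyspark.SparkContext(conf=conf)
    try:
        for word, count in count_words(sc, path)[:10]:
            print(word, count)
    finally:
        sc.stop()
```

Calling `main()` from a notebook cell would print the ten most frequent words, provided a `LICENSE` file exists at that path.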

To connect to a cluster instead, use a different PySpark context initializer:

import pyspark

# Replace head_node and secret-key with your cluster's values.
conf = pyspark.SparkConf()
conf.setMaster('spark://head_node:7077')
conf.set('spark.authenticate', 'true')
conf.set('spark.authenticate.secret', 'secret-key')
sc = pyspark.SparkContext(conf=conf)
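One possible way to switch between the local setup and a cluster without editing the notebook is to read the master URL and secret from the environment. This is only a sketch: the `SPARK_MASTER_URL` and `SPARK_AUTH_SECRET` variable names and the `cluster_settings` and `make_context` helpers are hypothetical and not part of this repo.

```python
import os


def cluster_settings(env=os.environ):
    """Collect Spark settings from the environment (variable names are hypothetical)."""
    # Fall back to a local master when no cluster URL is given.
    settings = {"spark.master": env.get("SPARK_MASTER_URL", "local[*]")}
    secret = env.get("SPARK_AUTH_SECRET")
    if secret:
        # Only enable authentication when a shared secret is provided.
        settings["spark.authenticate"] = "true"
        settings["spark.authenticate.secret"] = secret
    return settings


def make_context(env=os.environ):
    # Imported here so cluster_settings() can be used without Spark installed.
    import pyspark

    conf = pyspark.SparkConf().setAll(list(cluster_settings(env).items()))
    return pyspark.SparkContext(conf=conf)
```

With neither variable set, `make_context()` would start a local context; setting both would point it at the cluster with authentication enabled. Keeping the secret out of the notebook also avoids committing it to the repository.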

Pipeline

Figure: PySpark pipeline