mvillarrealb/docker-spark-cluster

How to run a Python app?

hughesadam87 opened this issue

Thank you for this repo.

One thing that was unclear to me: once I've got the Spark cluster up and running via docker-compose, and I have a PySpark script on my machine, can I simply run it and have it connect to this Spark cluster? Or does the Python app have to live inside a container?

Say I had this file, hello-spark.py:

from pyspark.sql import SparkSession


def main():
    # Initialize SparkSession
    spark = SparkSession.builder \
        .appName("HelloWorld") \
        .getOrCreate()

    # Create an RDD containing numbers from 1 to 10
    numbers_rdd = spark.sparkContext.parallelize(range(1, 11))

    # Count the elements in the RDD
    count = numbers_rdd.count()

    print(f"Count of numbers from 1 to 10 is: {count}")

    # Stop the SparkSession
    spark.stop()


if __name__ == "__main__":
    main()
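
From reading the Spark docs, my guess is that a host-side script would at least need to point the builder at the cluster's master URL. Here's a minimal sketch of what I mean; the address spark://localhost:7077 is an assumption on my part that the compose file publishes the master's port 7077 on the host:

from pyspark.sql import SparkSession

# Assumption: docker-compose maps the Spark master's port 7077 to the host,
# so spark://localhost:7077 is reachable from outside the containers.
spark = SparkSession.builder \
    .appName("HelloWorld") \
    .master("spark://localhost:7077") \
    .getOrCreate()

# Same sanity check as above: count 10 elements distributed across the cluster.
print(spark.sparkContext.parallelize(range(1, 11)).count())
spark.stop()

Is something like that the intended way to use this setup, or is the workflow to copy the script into a container and submit it from there?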