How to run a Python app?
hughesadam87 opened this issue · 0 comments
hughesadam87 commented
Thank you for this repo.
One thing that was unclear to me: once I've got the Spark cluster up and running via docker-compose, if I have a PySpark script on my own machine, can I just run it locally and have it connect to this Spark cluster? Or does the Python app have to live inside the container?
Say I had this file `hello-spark.py`:
```python
from pyspark.sql import SparkSession


def main():
    # Initialize SparkSession
    spark = SparkSession.builder \
        .appName("HelloWorld") \
        .getOrCreate()

    # Create an RDD containing numbers from 1 to 10
    numbers_rdd = spark.sparkContext.parallelize(range(1, 11))

    # Count the elements in the RDD
    count = numbers_rdd.count()
    print(f"Count of numbers from 1 to 10 is: {count}")

    # Stop the SparkSession
    spark.stop()


if __name__ == "__main__":
    main()
```
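
For context, the usual way to run a driver on the host against a standalone cluster is to point it at the master URL explicitly, either via `SparkSession.builder.master(...)` in the script or on the command line with `spark-submit`. A minimal sketch, assuming the docker-compose file publishes the Spark master on `localhost:7077` (the default standalone port) and that `spark-submit` is installed on the host:

```shell
# Submit hello-spark.py from the host to the standalone master.
# "spark://localhost:7077" is an assumption -- check the ports
# mapped in your docker-compose.yml for the actual host/port.
spark-submit \
  --master spark://localhost:7077 \
  hello-spark.py
```

Note that the executors in the containers must be able to reach back to the driver on the host (and the host's Python/PySpark version should match the cluster's), which is often why examples instead run the script from inside a container on the same network.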