aws-samples/spark-on-aws-lambda

Spark configuration for local deployment

JohnChe88 opened this issue · 2 comments

Identify the key configuration for Spark running in local mode inside a container: reduce the JVM spin-up cost, make the most of the memory available in AWS Lambda, and reduce the container size.
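One way to use the memory Lambda provides is to size the driver heap from the function's configured memory. In local mode the executors run inside the driver JVM, so `spark.driver.memory` is effectively the whole Spark heap. The sketch below reads `AWS_LAMBDA_FUNCTION_MEMORY_SIZE`, which the Lambda runtime sets in MB. The 60% heap share (`HEAP_PERCENT`) and the function name `driver_memory_setting` are hypothetical choices for illustration. The remaining memory is left for JVM off-heap overhead, the Python process and the OS.

```python
import os

# Hypothetical share of Lambda memory given to the JVM heap; the rest is left
# for JVM off-heap overhead, the Python process, and the OS.
HEAP_PERCENT = 60


def driver_memory_setting(default_mb: int = 1024) -> str:
    """Size spark.driver.memory from the memory configured for the Lambda function."""
    # The Lambda runtime exposes the configured memory (in MB) in this variable;
    # fall back to default_mb when running outside Lambda (e.g. local testing).
    total_mb = int(os.environ.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE", default_mb))
    # Integer arithmetic avoids float rounding in the resulting heap size.
    heap_mb = total_mb * HEAP_PERCENT // 100
    return f"{heap_mb}m"


if __name__ == "__main__":
    print(driver_memory_setting())
```

The heap size only takes effect if it is set before the driver JVM starts. That means setting it in `spark-defaults.conf`, on the `spark-submit` command line, or through `SparkSession.builder.config("spark.driver.memory", ...)` before the first session is created in the Python process.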

  • Best way to deliver a set of Spark configuration settings to the script.
  • Decide between a conf file and conf settings in the PySpark script, and ensure that the Spark configuration is not overwritten.
  • JVM spin-up, memory/storage fraction, serializer.
  • Caching and checkpointing to reduce the memory footprint; ensure that data spills over to disk instead of being held in memory.
  • Dynamically assign CPU and memory based on the machine that AWS Lambda runs the function on.
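The points above can be combined into a single set of local-mode defaults that any explicitly supplied values override. This is a minimal sketch, not the repository's configuration. The function name `local_spark_conf` and the `overrides` argument (for example, values parsed from a conf file) are hypothetical. It also assumes that `os.cpu_count()` reports the vCPUs allotted to the function. The two memory fractions are Spark's documented defaults, shown as the knobs to tune.

```python
import os
from typing import Dict, Optional


def local_spark_conf(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build local-mode Spark settings sized to the current machine.

    Values in `overrides` (hypothetically loaded from a conf file) always win,
    so an explicitly supplied setting is never replaced by these defaults.
    """
    # vCPUs visible to this process; Lambda allocates CPU in proportion to memory.
    cores = os.cpu_count() or 1
    defaults = {
        # Run driver and executors in one JVM with one worker thread per core.
        "spark.master": f"local[{cores}]",
        # Kryo is usually faster and more compact than Java serialization.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # Share of heap for execution + storage, and the part of it reserved
        # for cached data (both are Spark's defaults, listed here to tune).
        "spark.memory.fraction": "0.6",
        "spark.memory.storageFraction": "0.5",
        # One shuffle partition per core instead of the default 200.
        "spark.sql.shuffle.partitions": str(cores),
        # /tmp is the only writable path in Lambda; spill and shuffle files go here.
        "spark.local.dir": "/tmp",
        # Skip the web UI to save startup time and memory.
        "spark.ui.enabled": "false",
    }
    merged = dict(defaults)
    merged.update(overrides or {})  # explicit values take precedence
    return merged
```

Spark gives properties set in code precedence over `spark-submit` flags and `spark-defaults.conf`. If the script applied these defaults unconditionally, it would silently override a conf file, so the conf file values are merged first. To apply the result, call `builder = builder.config(k, v)` for each item and then `getOrCreate()`. To keep cached data on disk instead of in memory, `df.persist(StorageLevel.DISK_ONLY)` (or `MEMORY_AND_DISK`) can be used. Checkpointing needs `spark.sparkContext.setCheckpointDir("/tmp/checkpoints")` before `df.checkpoint()`.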