aws-samples/spark-on-aws-lambda

Include S3-based shuffle storage plugin

gauravtanwar03 opened this issue · 3 comments

AWS Lambda has a limit of 1,024 open file descriptors. Hitting that limit leads to task result loss failures when you merge data into target tables to build idempotent ingestion pipelines.
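
To confirm that file descriptors are the bottleneck, it may help to log the limit and the current usage from inside the function before the merge runs. The sketch below is illustrative only: `fd_usage` is a hypothetical helper, not part of this repository, and it assumes a Linux runtime where `/proc/self/fd` is readable.

```python
import os
import resource


def fd_usage():
    """Return the file descriptor limits and the current open count.

    Hypothetical helper for diagnostics; not part of spark-on-aws-lambda.
    """
    # RLIMIT_NOFILE holds the (soft, hard) limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # On Linux, each entry in /proc/self/fd is one open descriptor.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        # /proc is not available on this platform (assumption: non-Linux).
        open_fds = None
    return {"soft_limit": soft, "hard_limit": hard, "open_fds": open_fds}


if __name__ == "__main__":
    print(fd_usage())
```

Logging this before and after a shuffle-heavy stage should show whether the open count approaches the soft limit.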

Solution: Use an S3-based shuffle storage plugin that can act as a replacement for local disk in AWS Lambda.
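
A rough sketch of what the wiring could look like, assuming the plugin is loaded through Spark's `spark.shuffle.sort.io.plugin.class` setting (available since Spark 3.0). The plugin class name, the `spark.shuffle.storage.path` key, and the bucket path are placeholders that depend on which plugin is chosen; `s3_shuffle_conf` is a hypothetical helper.

```python
def s3_shuffle_conf(plugin_class, shuffle_path):
    """Build Spark settings that point shuffle storage at S3.

    Hypothetical helper. The storage path key is plugin-specific and is
    an assumption here; check the chosen plugin's documentation.
    """
    if not shuffle_path.startswith(("s3://", "s3a://")):
        raise ValueError("shuffle_path must be an S3 URI")
    return {
        # Spark setting that selects the shuffle data I/O plugin.
        "spark.shuffle.sort.io.plugin.class": plugin_class,
        # Assumed plugin-specific key for the S3 shuffle location.
        "spark.shuffle.storage.path": shuffle_path,
    }


if __name__ == "__main__":
    conf = s3_shuffle_conf(
        "com.example.shuffle.S3ShuffleDataIO",  # placeholder class name
        "s3://my-shuffle-bucket/shuffle/",  # placeholder bucket
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Each pair can be passed to `SparkSession.builder.config(key, value)` or as a `--conf` argument to `spark-submit`. The plugin JAR also needs to be on the Spark classpath inside the Lambda container image.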

@gauravtanwar03 Good thinking. Do you have a prototype ready?