[BUG] spark.eventLog.enable and spark.eventLog.dir not working
ahululu opened this issue · 3 comments
ahululu commented
Description
- ✋ I have searched the open/closed issues and my issue is not listed.
Reproduction Code [Required]
hadoopConf:
# EMRFS filesystem
fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.endpoint: s3.us-east-1.amazonaws.com
fs.s3.buffer.dir: /mnt/s3
fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
sparkConf:
# Required for EMR Runtime
spark.driver.extraClassPath: /usr/share/aws/aws-java-sdk-v2/*:/usr/lib/hudi/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
spark.driver.extraLibraryPath: /usr/share/aws/aws-java-sdk-v2/*:/usr/lib/hudi/*:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath: /usr/share/aws/aws-java-sdk-v2/*:/usr/lib/hudi/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
spark.executor.extraLibraryPath: /usr/share/aws/aws-java-sdk-v2/*:/usr/lib/hudi/*:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.hadoop.hive.metastore.client.factory.class: com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
# History logs
spark.eventLog.dir: s3://abc/def/
spark.eventLog.enable: "true"
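For comparison, a minimal sketch of these two keys as the Spark configuration reference spells them. Note that Spark's property is spark.eventLog.enabled (with a trailing "d"); a key named spark.eventLog.enable is not a recognized Spark property and would be silently ignored. The bucket/prefix below is the placeholder from this report:

```yaml
sparkConf:
  # History logs (sketch; abc/def is the reporter's placeholder path)
  spark.eventLog.dir: s3a://abc/def/   # s3a:// matches the S3AFileSystem set in hadoopConf above
  spark.eventLog.enabled: "true"       # note the trailing "d" in "enabled"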
Expected behavior
Spark writes event log files to s3://abc/def/.
Actual behavior
No event log files are written; nothing appears at s3://abc/def/.
Environment & Versions
- Spark Operator App version: v1beta2-1.3.8-3.1.1-amzn-4
- Helm Chart Version: spark-operator-7.0.0
- Kubernetes Version: AWS EKS 1.29.0
- Apache Spark version: 3.5.0
peter-mcclonski commented
Two items of note:
1. Try using s3a:// rather than s3://
2. Try adding the following to your sparkConf: spark.eventLog.logBlockUpdates.enabled: "true"
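Applied to the sparkConf from the report, the two suggestions would look roughly like this (a sketch; the bucket/prefix is the reporter's placeholder):

```yaml
sparkConf:
  # Suggestion 1: s3a:// scheme, matching the fs.s3a.impl mapping in hadoopConf
  spark.eventLog.dir: s3a://abc/def/
  spark.eventLog.enable: "true"
  # Suggestion 2: also record block update events in the event log
  spark.eventLog.logBlockUpdates.enabled: "true"
```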
ahululu commented
> Two items of note:
> 1. Try using s3a:// rather than s3://
> 2. Try adding the following to your sparkConf: spark.eventLog.logBlockUpdates.enabled: "true"

Tried both; still not working.