Spark streaming AvailableNow trigger terminates after first batch
seb-emmot opened this issue · 1 comment
I am trying to build a Spark Structured Streaming application that ingests data from Azure Event Hubs and persists it to a Delta table in Databricks.
I'm using the AvailableNow trigger.
According to https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers, this trigger should process all available data from the source in (possibly multiple) micro-batches and then stop.
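For reference, with a source that does support the trigger (for example the built-in file source, which I understand honours its rate limit under AvailableNow), the expected pattern looks roughly like this (untested sketch; paths, schema, and table name are placeholders):

// Minimal sketch of the documented AvailableNow behaviour, assuming a file source
// (paths, schema, and table name are placeholders). The query should drain the whole
// backlog in several micro-batches (bounded by maxFilesPerTrigger) and then stop.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.getOrCreate()

val files = spark.readStream
  .format("json")
  .schema("id LONG, body STRING")        // streaming file sources need an explicit schema
  .option("maxFilesPerTrigger", "10")    // caps the size of each micro-batch
  .load("/mnt/landing/events")           // placeholder path

val query = files.writeStream
  .outputMode("append")
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/events")  // placeholder path
  .trigger(Trigger.AvailableNow())
  .toTable("events_bronze")              // placeholder table name

query.awaitTermination()                 // returns once all available input is processed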
Bug Report:
- Actual behavior: The stream starts, processes the first micro-batch, and then terminates.
- Expected behavior: The stream starts, processes all available data in micro-batches, and then terminates.
- Spark version: 3.3.0
- spark-eventhubs artifactId and version: com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
It seems that support for the AvailableNow trigger might not be implemented in this connector?
My code:
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
import org.apache.spark.sql.streaming.Trigger

val connectionString = ConnectionStringBuilder(namespace_str)
  .setEventHubName("myhubname")
  .build

val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("myconsumergroup")
  .setMaxEventsPerTrigger(1000)

val inStream = spark.readStream.format("eventhubs").options(ehConf.toMap).load()

val outStream = inStream.writeStream
  .outputMode("append")
  .format("delta")
  .option("checkpointLocation", checkpointLocation)
  .trigger(Trigger.AvailableNow())
  .toTable("mytablename")
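A possible stop-gap while AvailableNow support is missing would be to rerun the query with Trigger.Once until a run reports no input rows (untested sketch; it reuses ehConf and checkpointLocation from the code above and assumes setMaxEventsPerTrigger is honoured):

// Untested workaround sketch: drain the Event Hubs backlog with repeated Trigger.Once runs.
// Assumes the source honours setMaxEventsPerTrigger and that a run with zero input rows
// means the backlog is exhausted. Reuses the same checkpoint so offsets carry over.
import org.apache.spark.sql.streaming.Trigger

var drained = false
while (!drained) {
  val q = spark.readStream
    .format("eventhubs")
    .options(ehConf.toMap)
    .load()
    .writeStream
    .outputMode("append")
    .format("delta")
    .option("checkpointLocation", checkpointLocation)
    .trigger(Trigger.Once())                // one bounded micro-batch per run
    .toTable("mytablename")

  q.awaitTermination()                      // Trigger.Once stops after its single batch
  // lastProgress describes the batch that just ran; no progress or zero rows => nothing left
  drained = Option(q.lastProgress).forall(_.numInputRows == 0)
}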
I have previously asked a related question on Stack Overflow (in PySpark, though):
https://stackoverflow.com/questions/74025485/is-spark-streaming-availablenow-trigger-compatible-with-azure-event-hub
Hi, I am facing the same issue. Is there a fix for this @yamin-msft @hmlam? If yes, when will this feature be available?