Spark streaming AvailableNow trigger terminates after first batch

Question

seb-emmot opened this issue 2 years ago · 1 comment

I am trying to build a Spark Structured Streaming application that ingests data from Azure Event Hubs and persists it to a Delta table in Databricks.
I'm using the AvailableNow trigger.
According to https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers, this trigger should process all data available in the source, possibly in multiple micro-batches, and then stop.

Bug Report:

Actual behavior
The stream starts, processes the first batch, and then terminates.
Expected behavior
The stream starts, processes all available data in micro-batches, and then terminates.
Spark version
3.3.0
spark-eventhubs artifactId and version
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22
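
To make the expected behavior concrete, here is a minimal sketch in plain Scala (no Spark or Event Hubs involved) of how a backlog would be split if each micro-batch were capped at maxEventsPerTrigger. The object name, the function name, and the backlog size of 3,500 events are assumptions made up for illustration. The real connector treats the cap as a rate limit, so actual batch sizes may differ.

```scala
// Hypothetical model of the expected batching; this is not Spark code.
object ExpectedBatching {
  // Split a backlog of `available` events into micro-batch sizes,
  // each capped at `maxEventsPerTrigger`.
  def expectedBatchSizes(available: Long, maxEventsPerTrigger: Long): List[Long] = {
    require(available >= 0, "available must be non-negative")
    require(maxEventsPerTrigger > 0, "maxEventsPerTrigger must be positive")
    val fullBatches = (available / maxEventsPerTrigger).toInt
    val remainder = available % maxEventsPerTrigger
    val lastBatch = if (remainder > 0) List(remainder) else Nil
    List.fill(fullBatches)(maxEventsPerTrigger) ++ lastBatch
  }

  def main(args: Array[String]): Unit = {
    // Assumed backlog of 3,500 events with the 1,000-event cap used in this issue.
    val sizes = expectedBatchSizes(3500L, 1000L)
    println(sizes.mkString(", ")) // prints "1000, 1000, 1000, 500"
  }
}
```

With these assumed numbers, the expected run is four micro-batches, whereas the behavior reported here corresponds to only the first one being processed before the query stops.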

It seems like support for the 'AvailableNow' trigger might not be implemented?

My code:

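import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}
import org.apache.spark.sql.streaming.Trigger

// namespace_str and checkpointLocation are assumed to be defined elsewhere.
val connectionString = ConnectionStringBuilder(namespace_str)
  .setEventHubName("myhubname")
  .build

// Limit how many events are read per trigger.
val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("myconsumergroup")
  .setMaxEventsPerTrigger(1000)

val inStream = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// toTable starts the query and returns a StreamingQuery.
val outStream = inStream.writeStream
  .outputMode("append")
  .format("delta")
  .option("checkpointLocation", checkpointLocation)
  .trigger(Trigger.AvailableNow())
  .toTable("mytablename")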

I previously asked a related question on Stack Overflow (using PySpark, though):
https://stackoverflow.com/questions/74025485/is-spark-streaming-availablenow-trigger-compatible-with-azure-event-hub

Answer 1 · 2023-07-27T07:13:54.000Z

Hi, I am facing the same issue. Is there a fix for this, @yamin-msft @hmlam? If so, when will this feature be available?