Azure/azure-functions-eventhubs-extension

EventHub Trigger advances stream position when function times out

Closed this issue · 4 comments

When using the EventHubTrigger with batched events, we are observing that the position in the stream is advanced when the Function times out. This is on the Consumption plan.

Per the doc here: https://docs.microsoft.com/en-us/azure/azure-functions/functions-reliable-event-processing

If conditions prevent the function execution from completing, the host fails to progress the pointer. If the pointer isn't advanced, then later checks end up processing the same messages.

I would expect that the function timing out is a condition that prevented function execution from completing.

Repro steps


  1. Create a Function with an EventHubTrigger, and a code loop that will cause a timeout.

  2. Add some items to the EventHub for the function to process.

Expected behavior

Upon timeout, the next function invocation should pull from the same position as the previous invocation.

Actual behavior

The position in the stream has been advanced, and only new events will be processed.

Related information

  • Microsoft.Azure.WebJobs.EventHubs, version 3.0.3.0

@tolzon it's intentional that it works this way by default. If we retried the batch on a timeout, and the timeout reproduced reliably (perhaps because the data in that batch requires a computation that cannot be finished within the timeout), then we would retry the batch indefinitely, potentially creating a very large bill and also preventing any future messages on that partition from being processed. For more information on this, see here.
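To make the trade-off described above concrete, here is a minimal sketch of a toy trigger loop. It is not the extension's implementation: `Partition`, `run_host`, `FunctionTimeout`, and the `advance_on_timeout` flag are hypothetical names invented for this illustration. It only contrasts "advance the checkpoint after a timeout" with "hold the checkpoint after a timeout" when one batch can never finish in time.

```python
from dataclasses import dataclass
from typing import Callable, List


class FunctionTimeout(Exception):
    """Stands in for the host aborting an invocation that ran too long."""


@dataclass
class Partition:
    events: List[str]
    checkpoint: int = 0  # index of the next event to deliver


def run_host(
    partition: Partition,
    handler: Callable[[List[str]], None],
    batch_size: int,
    advance_on_timeout: bool,
    max_invocations: int,
) -> List[List[str]]:
    """Toy model of a trigger loop; returns every batch that was delivered."""
    delivered = []
    for _ in range(max_invocations):
        if partition.checkpoint >= len(partition.events):
            break  # nothing left to read
        start = partition.checkpoint
        batch = partition.events[start:start + batch_size]
        delivered.append(batch)
        try:
            handler(batch)
            completed = True
        except FunctionTimeout:
            completed = False
        # Assumption: the checkpoint moves on success, and on timeout
        # only when advance_on_timeout is set.
        if completed or advance_on_timeout:
            partition.checkpoint = start + len(batch)
    return delivered


def handler(batch: List[str]) -> None:
    # Hypothetical event that can never be processed within the timeout.
    if "slow" in batch:
        raise FunctionTimeout()


events = ["a", "slow", "b", "c"]

# Behaviour described as the default: the timed-out batch is skipped.
default_like = Partition(list(events))
default_run = run_host(default_like, handler, 2, True, 5)

# Hold the checkpoint on timeout: the same batch is redelivered every time.
retry_forever = Partition(list(events))
stuck_run = run_host(retry_forever, handler, 2, False, 5)

print(default_run, default_like.checkpoint)
print(stuck_run, retry_forever.checkpoint)
```

In the second run the partition never gets past the first batch, which is the "retry indefinitely and block the partition" outcome the comment warns about. In the first run the events "a" and "slow" are delivered once and never seen again.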

However, we have a new retry policies feature that is intended to let you control how situations like this are handled. It's documented here:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-error-pages?tabs=csharp#retry-policies-preview

I'll admit, I'm not sure if we explicitly tested the timeout scenario when we built the retry policies feature. So if it does not work how you'd expect, please let us know (just comment here and @ me). Since the retry policy feature is in preview, this is a good opportunity for you to provide feedback.

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

Is there any behaviour change in how the Event Hub Trigger handles timeouts?
@paulbatum, isn't your explanation of why a timeout advances the pointer in contradiction with the documentation here? The documentation states that the Event Hub trigger guarantees at-least-once delivery for event hub messages.
The retry policy you mention is relatively new and, according to the linked documentation, still in preview.
It still doesn't seem to provide at-least-once delivery.
Guarding against indefinite reprocessing on a timeout should be the developer's responsibility. The timeout could very well be a transient issue, which would mean that at-least-once delivery is not guaranteed even if a retry policy is defined.
One more thing: you expressed some doubt ("I'm not sure if we explicitly tested the timeout scenario when we built the retry policies feature"). Is there any news on that front?

@vladislav-mitev I am not sure how it contradicts the documentation. First, the event is delivered to the application; then something happens, whether it's a successful completion of the app logic, a thrown error, or a timeout. Regardless of the outcome, message delivery has occurred.

Regarding how retry policies interact with timeouts: I was able to confirm that the two work together as expected. With an infinite retry policy, a function that times out does not update the checkpoint.
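The interaction confirmed above can be modelled with a small sketch. This is a toy simulation, not the Functions host: `deliver_batch`, `FunctionTimeout`, and `attempt_budget` are hypothetical names, and `max_retry_count=None` is this sketch's own way of saying "retry forever". The branch where a finite policy runs out of retries and the checkpoint then advances is an assumption on my part; the thread only confirms the infinite case.

```python
from typing import Callable, List, Optional, Tuple


class FunctionTimeout(Exception):
    """Stands in for the host aborting an invocation that ran too long."""


def deliver_batch(
    events: List[str],
    checkpoint: int,
    batch_size: int,
    handler: Callable[[List[str], int], None],
    max_retry_count: Optional[int],
    attempt_budget: int = 10,
) -> Tuple[int, int]:
    """Deliver one batch under a retry policy; return (checkpoint, attempts).

    attempt_budget only exists so the infinite-retry demo terminates.
    """
    batch = events[checkpoint:checkpoint + batch_size]
    attempts = 0
    while attempts < attempt_budget:
        attempts += 1
        try:
            handler(batch, attempts)
            # Success: the checkpoint moves past the batch.
            return checkpoint + len(batch), attempts
        except FunctionTimeout:
            retries_used = attempts - 1
            if max_retry_count is not None and retries_used >= max_retry_count:
                # Assumption: once a finite policy is exhausted, the
                # checkpoint advances as it would with no policy at all.
                return checkpoint + len(batch), attempts
    # Infinite retry, still failing: the checkpoint has not moved.
    return checkpoint, attempts


def always_times_out(batch: List[str], attempt: int) -> None:
    raise FunctionTimeout()


def transient_timeout(batch: List[str], attempt: int) -> None:
    # Times out on the first two attempts, then succeeds.
    if attempt < 3:
        raise FunctionTimeout()


events = ["a", "b", "c"]

infinite_result = deliver_batch(events, 0, 2, always_times_out, None)
finite_result = deliver_batch(events, 0, 2, always_times_out, 2)
transient_result = deliver_batch(events, 0, 2, transient_timeout, None)

print(infinite_result, finite_result, transient_result)
```

Under these assumptions, an infinite policy keeps the checkpoint where it was for as long as the timeout persists, and a transient timeout is absorbed by the retries, so the batch is eventually processed before the checkpoint moves.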