Azure/azure-functions-durable-extension

Activity function keeps executing even after orchestrator has been terminated


I am using Durable Functions 2.0 and am seeing a strange issue: an activity function (CreateMissingPriceRecordsActivity), which is called only once from an orchestrator function (CreateMissingPriceRecordsOrchestrate), keeps running in multiple instances, usually 6-7 minutes apart, as shown by transaction search:

Image

I had already terminated the highlighted orchestration (ID 4068cb4c7b0748d5a92c237d7c1c3068) more than an hour before the timestamps of these transactions. I used the terminate POST URI to do this, and indeed, the response status code shows as 410 Gone:

Image
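For reference, here is a minimal sketch of how the terminate request above can be built and how its status code can be read. It follows the documented Durable Functions 2.x HTTP API route and response codes as I understand them. The helper names (build_terminate_url, describe_terminate_status) and the base URL, task hub and key values are hypothetical placeholders, not taken from my app, and the optional connection query parameter is left out.

```python
from urllib.parse import quote, urlencode

# Response codes for the terminate endpoint, as described in the
# Durable Functions HTTP API reference (my reading of it, not verified here).
TERMINATE_STATUS = {
    202: "accepted: the terminate request was queued for processing",
    404: "not found: no instance with this ID exists",
    410: "gone: the instance had already completed, failed, or been terminated",
}


def build_terminate_url(base_url, instance_id, task_hub, system_key, reason):
    """Build the POST URL for terminating one orchestration instance.

    All arguments are caller-supplied placeholders (hypothetical values).
    """
    path = (
        "/runtime/webhooks/durabletask/instances/"
        f"{quote(instance_id, safe='')}/terminate"
    )
    query = urlencode({"taskHub": task_hub, "code": system_key, "reason": reason})
    return f"{base_url.rstrip('/')}{path}?{query}"


def describe_terminate_status(status_code):
    """Map a terminate response code to a short explanation."""
    return TERMINATE_STATUS.get(
        status_code, "unexpected status; not covered by this sketch"
    )
```

Read this way, a 410 Gone response means the runtime already considers the instance finished at the time of the request, so the request itself is not what stops (or fails to stop) an activity that is still running.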

The message below was logged 2 minutes after I terminated the orchestration, which suggests that something is awaiting it again. I have no idea what, since it is only awaited by my HTTP function, which ran only once (based on log messages):

ITMaint-Redacted-Integration-CosmosDb-Create-Missing-Price-Records-Orchestrate$4068cb4c7b0748d5a92c237d7c1c3068: Function 'ITMaint-Redacted-Integration-CosmosDb-Create-Missing-Price-Records-Orchestrate (Orchestrator)' awaited. IsReplay: False. State: Awaited. HubName: usos1fap01Redactedintegrationtest. AppName: usos1fap01-Redacted-integration-test. SlotName: Production. ExtensionVersion: 2.9.5. SequenceNumber: 6.

This issue has been happening for a week or more. What can I do to troubleshoot? There are about 3-4 terminated orchestrators that even today keep logging "Function 'activity name' started. IsReplay: False." For what it's worth, I'm also seeing activity functions start multiple times for orchestrators that are not terminated, although this could be due to the pre-existing orchestrators that were terminated but are somehow still running. There are no recent exceptions; there were some before a large refactor of the orchestrators and activities, but they haven't resurfaced.
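One troubleshooting step that might help is to pull the execution history of a suspect instance and count how often each activity shows up. Below is a rough sketch, assuming the 2.x status query route with showHistory=true and assuming each history entry carries EventType and FunctionName fields as in the documented sample output. The helper names and all argument values are hypothetical.

```python
from collections import Counter
from urllib.parse import quote, urlencode


def build_history_url(base_url, instance_id, task_hub, system_key):
    """Build the GET URL that returns instance status plus execution history.

    All arguments are caller-supplied placeholders (hypothetical values).
    """
    path = f"/runtime/webhooks/durabletask/instances/{quote(instance_id, safe='')}"
    query = urlencode(
        {"taskHub": task_hub, "code": system_key, "showHistory": "true"}
    )
    return f"{base_url.rstrip('/')}{path}?{query}"


def count_completed_activities(history_events):
    """Count TaskCompleted events per activity function name.

    Assumes each event is a dict with "EventType" and "FunctionName" keys;
    events lacking a function name are skipped.
    """
    counts = Counter()
    for event in history_events:
        if event.get("EventType") == "TaskCompleted" and event.get("FunctionName"):
            counts[event["FunctionName"]] += 1
    return dict(counts)
```

If the orchestrator only calls an activity once, a count above one in its recorded history would point at the orchestration itself rescheduling it. A count of one alongside repeated "started" log lines would point at the same activity message being picked up and executed more than once.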

Here is a gist showing my HTTP function, orchestrator and activity functions: https://gist.github.com/aldrichdev/5f48138b1ed8c3c3864569fea674d420

Some exceptions have occurred since I wrote this post. They are all OutOfMemoryExceptions, like this:

Exception while executing function: ITMaint-Redacted-Integration-CosmosDb-Fetch-Products-Activity
System.OutOfMemoryException at Redacted.Integration.ITMaint.Redacted+<FetchProductsActivity>d__15.MoveNext

However, these errors started 10 hours after the fetch activity started running. They seem to be a product of the environment (activities running multiple times when they shouldn't) rather than an issue with this function. If I use the same Cosmos DB data locally, put breakpoints before and after this function runs, and take snapshots of memory usage, the increase from the first breakpoint to the second is only 0.22 GB:

Image

This feature is not implemented yet: #506