Azure/durabletask

RetryOptions error-handler in Orchestration (Handle method) gets called multiple times on "retries"

Nabakamal opened this issue · 2 comments

Problem:
Retry error -handler gets called multiple times when "retrying"

Scenario:
I have an Orchestration that calls a sub-orchestration using RetryOptions, like:

DurableTask.Core.RetryOptions retryOptions = new DurableTask.Core.RetryOptions(
	TimeSpan.FromMilliseconds(2500), 15)
{
	Handle = e =>
	{
		_logger.LogWarning($"Retrying orchestration - {GetType().Name} - due to '{e.Message}'");
		return true;
	}
};
return await context.CreateSubOrchestrationInstanceWithRetry<IPayload>(typeof(ThisIsASubOrchestration), retryOptions, input);

However, when the sub-orchestration has failed, the code inside the "Handle" block needs to be executed once - yes - but it does so more than that. First time it executes two times, next time it executes 2 times for the first orchestration and 2 times for the second orchestration i.e. a total of 4 times, and so on. Sometimes this number seems to be random as well i.e. more logs are printed.

What have I tried so far?
a. Setting the exception object (e above) to null

DurableTask.Core.RetryOptions retryOptions = new DurableTask.Core.RetryOptions(
	TimeSpan.FromMilliseconds(2500), 15)
{
	Handle = e =>
	{
		_logger.LogWarning($"Retrying orchestration - {GetType().Name} - due to '{e.Message}'");
		e = null;
		return true;
	}
};
return await context.CreateSubOrchestrationInstanceWithRetry<IPayload>(typeof(ThisIsASubOrchestration), retryOptions, input);

b. Creating a new execution using context.ContinueAsNew(input)

DurableTask.Core.RetryOptions retryOptions = new DurableTask.Core.RetryOptions(
	TimeSpan.FromMilliseconds(2500), 15)
{
	Handle = e =>
	{
		_logger.LogWarning($"Retrying orchestration - {GetType().Name} - due to '{e.Message}'");
		context.ContinueAsNew(input);
		return true;
	}
};
return await context.CreateSubOrchestrationInstanceWithRetry<IPayload>(typeof(ThisIsASubOrchestration), retryOptions, input);

c. I (also)thought this could be a timing-issue so introduced a big-enough BackoffCoefficient, didn't help:

DurableTask.Core.RetryOptions retryOptions = new DurableTask.Core.RetryOptions(
	TimeSpan.FromMilliseconds(2500), 15)
{
	BackoffCoefficient = 5.0,
	Handle = e =>
	{
		_logger.LogWarning($"Retrying orchestration - {GetType().Name} - due to '{e.Message}'");		
		return true;
	}
};
return await context.CreateSubOrchestrationInstanceWithRetry<IPayload>(typeof(ThisIsASubOrchestration), retryOptions, input);

I'm using the following packages:

Microsoft.DurableTask.SqlServer - 1.1.1
Microsoft.Azure.DurableTask.Core - 2.10.0 (dependency brought in via Microsoft.DurableTask.SqlServer)

@cgillum @jviau @papigers Can you please advise, when time permits? Thank you.

I also have a another ticket opened for RetryOptions here.

jviau commented

@Nabakamal the handler being executed multiple times is expected and by design. This is all part of the record-replay model that durable uses. You need to use a replay-safe logger to avoid multiple log statements. See an example here: https://github.com/jviau/durabletask-hosting/blob/main/src/DurableTask.Extensions/src/Logging/OrchestrationLogger.cs

Essentially, only log if OrchestrationContext.IsReplaying == false

Thanks, @jviau - somehow I missed this update/notification. I'll take a look at your suggestion - appreciate it. Cheers!