Azure/durabletask

RetryOptions error-handler in Orchestration cannot capture right exception-type from Sub-Orchestration

Nabakamal opened this issue · 1 comments

My question is related to 436 and 807, as explained below:

Problem:
Cannot get the right exception (via InnerException or FailureDetails) in the exception handler of RetryOptions (Handle)

Scenario:
I have an Orchestration that calls a sub-orchestration using RetryOptions, like:

RetryOptions retryOptions = new RetryOptions(TimeSpan.FromMilliseconds(2500), 3)
{
    Handle = e =>
    {
        /*
            This exception never captures MyCustomException, which was originally thrown,
            either in FailureDetails or the InnerException property
        */

        SubOrchestrationFailedException tfe = e as SubOrchestrationFailedException;

        if (tfe != null && tfe.InnerException != null)
        {
            e = tfe.InnerException;
        }

        MyCustomException ce = e as MyCustomException;
        if (ce != null)
        {
            LatestException = ce;  // LatestException is a variable of type MyCustomException
            return true;
        }
        return false;
    }
};
// IPayload is a custom type that is supposed to be returned from my sub-orchestration (if it ran successfully)
return await context.CreateSubOrchestrationInstanceWithRetry<IPayload>(typeof(FetchRatesSubOrchestration), retryOptions, 1);
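The handler above unwraps only one level of InnerException. As a minimal sketch, assuming the original exception could be nested more than one level deep, the unwrapping step can be written as a small helper that walks the whole chain. ExceptionChain and Find are hypothetical names introduced here for illustration; they are not part of DurableTask, and the helper uses only System types.

```csharp
using System;

public static class ExceptionChain
{
    // Hypothetical helper (not part of DurableTask): walks the InnerException
    // chain starting at e and returns the first exception that is a T,
    // or null when no exception in the chain matches.
    public static T Find<T>(Exception e) where T : Exception
    {
        for (Exception current = e; current != null; current = current.InnerException)
        {
            if (current is T match)
            {
                return match;
            }
        }
        return null;
    }
}
```

Inside Handle this would be called as ExceptionChain.Find<MyCustomException>(e). It can only help when the framework actually populates InnerException, which is exactly what does not happen in the scenario described here.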

The sub-orchestration's RunTask() looks like:

public override async Task<IPayload> RunTask(OrchestrationContext context, object input)
{
	List<AssetDTO> rates = new List<AssetDTO>();
	Reference1 reference1 = await context.ScheduleTask<Reference1>(typeof(TaskActivity1));
	await context.ScheduleTask<bool>(typeof(TaskActivity2), reference1);
	rates = await context.ScheduleTask<List<AssetDTO>>(typeof(TaskActivity3), reference1);
	List<string> datasReceived = rates.Select(x => x.TickerName).ToList();
	List<string> validDataPoints = _dbContext.SourceKeys.Select(t => t.SourceKeyValue).ToList();
	List<string> missingDataPoints = datasReceived.Except(validDataPoints).ToList();
	if (missingDataPoints.Count() > 0)
	{
		_logger.LogError($"Requested data points {string.Join(",", missingDataPoints)} were not returned in the response. Retrying.");
		throw new MyCustomException($"Requested data points {string.Join(",", missingDataPoints)} were not returned in the response. Retrying.");

		//OrchestrationFailureException innerExceptionToThrow = new OrchestrationFailureException($"Requested data points {string.Join(",", missingDataPoints)} were not returned in the response. Retrying.");
		//throw innerExceptionToThrow;
		// var exc =  new DurableTask.Core.Exceptions.SubOrchestrationFailedException($"Orchestration failed - {GetType().Name}", innerExceptionToThrow);
		// exc.FailureDetails
		// throw exc;
	}
	return new RatesAvailablePayload(rates);
}

The exception thrown within the if-block in the sub-orchestration never bubbles up to the Handle error handler of the RetryOptions defined in the parent orchestration.

Additionally, what else have I tried?
a. Setting the ErrorPropagationMode to ErrorPropagationMode.UseFailureDetails (or ErrorPropagationMode.SerializeExceptions) doesn't help - the FailureDetails object of the exception and the InnerException property are always null.

b. I have tried debugging against the framework source, and I believe I saw the "Multiple ExecutionCompletedEvent found, potential corruption in state storage" message from the SetMarkerEvents() method in the DurableTask.Core.OrchestrationRuntimeState class - but only during one of my debugging sessions.

c. While debugging the framework source, I noticed that the FailureDetails object is populated for the most part, but because the code executes in parallel, control jumps from one class to another and then to a third, which makes the trail difficult to follow.

d. I have looked at most, if not all, of the tests in the DurableTask and the DurableTask-MSSQL repositories, but these haven't helped me in figuring out what I may still be missing
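For item (a), this is a rough sketch of what a Handle callback could check when ErrorPropagationMode.UseFailureDetails is in effect. It assumes that the exception passed to Handle derives from OrchestrationException, that its FailureDetails property is populated, and that FailureDetails.ErrorType holds the full type name of the original exception. None of these assumptions is confirmed by the behavior described in this issue. FailureDetailsMatcher and IsCausedBy are hypothetical names, and the block needs the Microsoft.Azure.DurableTask.Core package to compile.

```csharp
using System;
using DurableTask.Core.Exceptions;

public static class FailureDetailsMatcher
{
    // Hypothetical helper: returns true when e carries FailureDetails whose
    // ErrorType (or that of any nested InnerFailure) equals errorTypeName.
    // Assumption: ErrorType holds the full type name of the thrown exception.
    public static bool IsCausedBy(Exception e, string errorTypeName)
    {
        if (e is OrchestrationException oe)
        {
            // FailureDetails may be null; the loop then simply does not run.
            for (var details = oe.FailureDetails; details != null; details = details.InnerFailure)
            {
                if (string.Equals(details.ErrorType, errorTypeName, StringComparison.Ordinal))
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In the handler, this would replace the cast to MyCustomException with a comparison such as FailureDetailsMatcher.IsCausedBy(e, typeof(MyCustomException).FullName), since under UseFailureDetails the original exception object is not expected to be rehydrated on the parent side.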

I'm using the following packages:

Microsoft.DurableTask.SqlServer - 1.1.1
Microsoft.Azure.DurableTask.Core - 2.10.0 (dependency brought in via Microsoft.DurableTask.SqlServer)

Please advise on how to get this resolved.

Thank you.

@cgillum @jviau @papigers Can you please advise, when time permits? Thank you.