Particular/ServiceControl

Retry message fails from ServicePulse after upgrade to ServiceControl 5.2.0

saschanm opened this issue · 10 comments

Describe the bug

Description

Retry fails from ServicePulse after upgrading ServiceControl from v5.1.2 to v5.2.0.

Expected behavior

Retrying a message from ServicePulse should return the message to the failed queue for processing.

Actual behavior

Retrying results in the error "Failed to execute recoverability policy for message with native ID: ..."

Versions

SC: Version 5.2.0
SP: Version 1.38.3
NServiceBus: Version 7.7

Steps to reproduce

  • Find a failed message in ServicePulse.
  • Click retry.
  • The message ends up back in the error queue shortly afterwards without reaching the failed queue for reprocessing, and a custom check failure shows "Failed to execute recoverability policy for message with native ID: ..."
  • Some time later, a custom check failure shows: "One or more error messages have failed to import properly into ServiceControl and have been stored in the ServiceControl database. The import of these messages could have failed for a number of reasons and ServiceControl is not able to automatically reimport them. For guidance on how to resolve this see https://docs.particular.net/servicecontrol/import-failed-messages"
  • Following the procedure from the link above results in an error when attempting the import, referring to a missing NServiceBus.FailedQ header entry for the message.

Note: the original failed message has the NServiceBus.FailedQ header entry as expected, while the retried message that ends up in the error queue does not.
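For anyone comparing header dumps of the original and the retried message, the check described in the note can be sketched as a small Python helper. This is only an illustration: the function name `missing_headers` and the sample header values are hypothetical, and it assumes the headers of each message have already been exported into a plain dictionary.

```python
def missing_headers(original: dict[str, str], retried: dict[str, str]) -> list[str]:
    """Return the header names present on the original message but absent after the retry."""
    return sorted(set(original) - set(retried))


# Hypothetical header sets for illustration only; the values are made up.
original = {
    "NServiceBus.MessageId": "example-id",
    "NServiceBus.FailedQ": "example-endpoint@example-machine",
}
retried = {
    "NServiceBus.MessageId": "example-id",
}

for name in missing_headers(original, retried):
    print(f"header lost during retry: {name}")
```

With the illustrative inputs above, the only header reported as lost is NServiceBus.FailedQ, which matches the symptom described in this issue.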

Relevant log output

No response

Additional Information

Workarounds

No workaround found

Possible solutions

If no resolution is found I will attempt to downgrade back to SC v5.1.2, but would prefer not to given the potential risks.

Additional information

A further issue appears to have occurred with the upgrade to 5.2.0.

We have an endpoint that handles custom check failure messages and posts a notification to a Teams channel. These notifications have also stopped working, though it is unclear where in the pipeline they are failing.

I have gone through the downgrade process in our test environment, which was on the same versions and showing the same issues.

I have confirmed that the downgrade resolves both issues, but the downgrade process resulted in the loss of failed messages in the RavenDB instance, so it is not a viable option in our production environment.

Additional info:
The instance/endpoints experiencing the problems are using the MSMQ transport.

We have another instance for another project that is using Azure Service Bus. I have just confirmed that retries from that system are processing correctly.

Thanks for the detailed bug report @saschanm, we are looking into it.

@saschanm would you be able to send the headers and the body of one of the failing messages to us? (support@particular.net)

@andreasohlund
Email with attachments sent to support@particular.net with subject line:
Retry message fails from ServicePulse after upgrade to ServiceControl 5.2.0 (issue 4180)

I've tried to reproduce this by:

  1. Installing SC 5.2
  2. Configuring it to use MSMQ
  3. Running https://docs.particular.net/samples/msmq/simple/ and simulating a failure
  4. Verifying that the failure got picked up by SC
  5. Retrying it via the ServicePulse UI
  6. Verifying that it got retried correctly by the endpoint

@saschanm does the above cover your scenario? (If yes, it looks like some specific details of the failing messages on your end might be causing this issue.)

Thanks, we will take a deeper look

Also note this example of an error in the ServiceControl logs when it attempts to ingest the error queue message after a retry attempt. The problem may be upstream, but this is the end result.

2024-05-19 00:02:48.4450|52|Warn|ServiceControl.Operations.ErrorProcessor|Processing of message '25e9e93e-97b5-434f-a512-fc7b393b5bea\215537358' failed.
System.Exception: Missing 'NServiceBus.FailedQ' header. Message is poison message or incorrectly send to (error) queue.
   at ServiceControl.Operations.FailedMessageFactory.ParseFailureDetails(IReadOnlyDictionary`2 headers) in /_/src/ServiceControl/Operations/FailedMessageFactory.cs:line 41
   at ServiceControl.Operations.ErrorProcessor.ProcessMessage(MessageContext context, IIngestionUnitOfWork unitOfWork) in /_/src/ServiceControl/Operations/ErrorProcessor.cs:line 114
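To find other occurrences of this failure in a ServiceControl log file, a rough Python sketch like the one below may help. It assumes the pipe-delimited layout seen in the entry above (timestamp, a numeric field assumed here to be a thread ID, level, logger, message). The names `LogEntry`, `parse_log_line` and `is_missing_failedq` are hypothetical and not part of ServiceControl.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    thread: str  # assumption: the second field is a thread ID
    level: str
    logger: str
    message: str


def parse_log_line(line: str) -> LogEntry:
    """Split a pipe-delimited log line into its five fields."""
    # Split at most four times so any '|' inside the message text is preserved.
    parts = line.split("|", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected log line format: {line!r}")
    return LogEntry(*(part.strip() for part in parts))


def is_missing_failedq(entry: LogEntry) -> bool:
    """True when the entry reports the missing NServiceBus.FailedQ header."""
    return "Missing 'NServiceBus.FailedQ' header" in entry.message


# Shortened copy of the log entry above; a raw string keeps the backslash literal.
sample = (
    r"2024-05-19 00:02:48.4450|52|Warn|ServiceControl.Operations.ErrorProcessor|"
    r"Processing of message '25e9e93e-97b5-434f-a512-fc7b393b5bea\215537358' failed. "
    r"System.Exception: Missing 'NServiceBus.FailedQ' header."
)

entry = parse_log_line(sample)
if is_missing_failedq(entry):
    print(f"{entry.timestamp} [{entry.level}] {entry.logger}: FailedQ header missing")
```

Splitting with a maximum of four separators is a deliberate choice so that the exception text stays in one field even if it contains a pipe character.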

@saschanm and others following this: we are discussing the cause and a potential fix in Particular/NServiceBus.Transport.Msmq#710

@saschanm v5.2.1 has now been released with a fix for this:

https://particular.net/start-servicecontrol-download

Thanks again for your extensive help with the investigation ❤️