Data inconsistency caused by `updateEventHandler` in `RedisEventHandlerDAO.java`
wy-hua opened this issue · 0 comments
Describe the bug
One of my critical business flow in Conductor <SQS_QUEUE_NAME> events --> event handlers --> workflow executions
is in paralysis because of NullPointerException
ocurring in the Conductor server. This kind of exception is thrown when handling an SQS message on the queue:<SQS_QUEUE_NAME>
queue. After some invetigation and experiments, I believe this issue is related to the absence of an event handler in the Redis, even though its primary key appears in the index of event handlers by event. This data inconsistency leads to NullPointerException
mentioned above.
Full logs in Conductor server can be found here
3442759355 [RxComputationScheduler-2] ERROR com.netflix.conductor.core.events.DefaultEventProcessor [] - Error handling message: 4f0491de-eb2a-43b0-b27b-ef1225acc5ba on queue:<SQS_QUEUE_NAME>
java.lang.NullPointerException: null
at com.netflix.conductor.redis.dao.RedisEventHandlerDAO.getEventHandlersForEvent(RedisEventHandlerDAO.java:124) ~[conductor-redis-persistence-3.14.0.jar!/:?]
at com.netflix.conductor.service.MetadataServiceImpl.getEventHandlersForEvent(MetadataServiceImpl.java:218) ~[conductor-core-3.14.0.jar!/:?]
at com.netflix.conductor.service.MetadataServiceImpl$$FastClassBySpringCGLIB$$726d989d.invoke(<generated>) ~
RedisEventHandlerDAO.java:124
@Override
public List<EventHandler> getEventHandlersForEvent(String event, boolean activeOnly) {
String key = nsKey(EVENT_HANDLERS_BY_EVENT, event);
Set<String> names = jedisProxy.smembers(key);
List<EventHandler> handlers = new LinkedList<>();
for (String name : names) {
try {
EventHandler eventHandler = getEventHandler(name);
recordRedisDaoEventRequests("getEventHandler", event);
// —————————Line 124————————
if (eventHandler.getEvent().equals(event)
// —————————Line 124————————
&& (!activeOnly || eventHandler.isActive())) {
handlers.add(eventHandler);
}
} catch (NotFoundException nfe) {
LOGGER.info("No matching event handler found for event: {}", event);
throw nfe;
}
}
return handlers;
}
Details
- Conductor version: 3.14.0 (This issue still exists in the latest version of main branch 8981878 because
RedisEventHandlerDAO.java
remains unchanged since 3.14.0) - Persistence implementation: Redis
- Queue implementation: SQS
- Lock: Redis
- Workflow definition: Not specified
- Task definition: Not specified
- Event handler definition:
{
"name": "xxx",
"event": "sqs:<SQS_QUEUE_NAME>",
"condition": "true",
"actions": [
{
"action": "start_workflow",
"start_workflow": {
"name": "xxx",
"version": 1
},
"expandInlineJSON": false
}
],
"active": true,
"evaluatorType": "javascript"
}
To Reproduce
Steps to reproduce the behavior:
- Create an event handler listening on
"sqs:<SQS_QUEUE_NAME>"
. - Update this event handler to listen on
"sqs:<ANOTHER_QUEUE_NAME>"
. - Delete this event handler.
- Send a message to the
<SQS_QUEUE_NAME>
SQS queue. - Conductor server throws a
NullPointerException
with logs as in code block above.
Code that leads to this bug
@Override
public void updateEventHandler(EventHandler eventHandler) {
Preconditions.checkNotNull(eventHandler.getName(), "Missing Name");
EventHandler existing = getEventHandler(eventHandler.getName());
if (existing == null) {
throw new NotFoundException(
"EventHandler with name %s not found!", eventHandler.getName());
}
// if eventHandler would be updated with an event different from its existing settings
// in this case, old index of event handlers by event should be delete
// but it won't get deleted based on current logic
index(eventHandler);
jedisProxy.hset(nsKey(EVENT_HANDLERS), eventHandler.getName(), toJson(eventHandler));
recordRedisDaoRequests("updateEventHandler");
}
Expected behavior
Conductor should handle the SQS message without any NullPointerException
and maintain consistency in the event handlers.
Screenshots
None
Additional context
- The strongest evidence includes inspecting Redis data, revealing that the event handler
<EVENT_HANDLER_NAME>
is missing but its index still exists, causing the inconsistency.