alexandre-normand/slackscot

Is Single Threaded Behavior For Commands Normal?

pmcintyresfdc opened this issue · 8 comments

Describe the bug

When our bot has a command with a long running Answerer function it delays handling of other messages until the long running Answerer has completed.

Version

v1.37.0

Reference code

n/a

To Reproduce

Steps to reproduce the behavior:

  1. Create command XYZ that runs for at least 10 seconds
  2. Run xyz via slack in channel a
  3. Run xyz via slack in channel b
  4. Results for channel a post
  5. 30 seconds later the results for channel b post

Expected behavior

Commands Answerer functions run in parallel when multiple messages/commands are queued up.

Anything you'd like to add

Based on the code in slackscot.Run() I'm thinking this is expected. It may be easily solved by running more go s.runInternal(....) workers. Although a through review for any maps and other structures that aren't thread safe would be advisable.

If we can figure out a good way to do this change I may have the bandwidth to make the change.

Right. That's indeed the expected behavior at the moment. Mostly because all plugins so far have been fast reactions to triggering messages. I think your use-case is interesting though.

As for the implementation, you're mostly correct but the function you'd want to dispatch in a go routine would be the one processing a single event. And because the bots are actually triggered only on the *slack.MessageEvent, I think that would be the only one that would need to be changed to be asynchronous. Roughly, this could be

case *slack.MessageEvent:
    go s.processMessageEvent(deps.chatDriver, e)

But there are a few more things to think about:

  • slackscot manages a cache of triggering messages to answers to update reactions when appropriate. That means there might be edge-cases like a message not yet having been registered in the triggeringMsgToResponse cache which could cause a message update event from triggering a new message in reaction as opposed to a message update. This might be something that could be considered an acceptable limitation but I'd like to think of a way to preserve the correct behavior, if possible. Maybe we can queue up processing of messages based on the original message id?
  • The handling of *slack.DisconnectedEvent (which is useful for testing) would have to change to make sure any pending async processing of MessageEvents is complete before returning. That one is annoying but probably easier to address.

Those are just quick thoughts. I think this would be a worthwhile addition but it might get complicated as we get into the details. Let me know if you have more ideas. I might do a deeper investigation of what would be required to enable async processing of messages in the coming days but I can't make any promises at the moment.

Think we could even run it as late as in tryPluginActions around answer := action.Answer(m)? My thinking is that we'd still need to have a separate thread for s.processMessageEvent(deps.chatDriver, e) like you defined.

I've not dug into it too much more yet. I'll try to this week. Maybe try a few experiments too.

I'd like to thank you again for raising this use case. This is going to be a meaningful improvement which will be able to support plugins doing IO or slower operations.

Launching action.Answer in a go routine probably won't be sufficient to preserve the reaction updating functionality but I think there might be ways to build this while avoiding compromising functionality too much.

Right now, I'm thinking that we might want a number of work queues that we'd send messages to react to and which would send back reactions. The important part would be to ensure that messages and any associated update/delete messages would be consistently directed to the same work queue/partition (via some hashing function). There would still be a chance that two unrelated messages could end up on the same hash and be delayed as we are now but, if the number of partitions is high enough, that might not be such a common problem in practice.

The spot to do this might be in processMessageEvent. We'd probably move the incomingMessageID := SlackMessageID{channelID: m.Channel, timestamp: m.Timestamp} there and hash the messageID to send it to its proper work queue.

@pmcintyresfdc: Just to be sure I'm not stepping on your toes here. Do you mind if I start playing around with implementing this feature?

Go for it! I've been focused on the command prefix stuff anyway.

Thanks. I'm making good progress with a general outline and I'm happy so far but it's going to be at least a few more days before I get a PR out.

This is resolved in v1.38.0.

@pmcintyresfdc: I added layering of the slackscot defaults to any user-provided viper instance in v1.39.0.