Message queues
Tracking issue for implementing the message queue across the following components:
- Client
- Mainframe
- Cronjob/uploader thing
Minimum project spec for RabbitMQ integration:
- The Loader restarting should never cause duplicate messages to propagate down to clients.
- The RabbitMQ instance restarting should never cause duplicate messages to propagate down to clients.
- A client failing to ack a job by either being terminated, losing connection, or entering an unhandled fail state should cause the package to be requeued.
- The RabbitMQ instance must support and implement authentication to modify the queue in any respect.
- Messages should NEVER, by default, enter a state where they are repeatedly queued and retried. The clearest example is file-size failures: a client should never receive a job that has already been rejected for exceeding the file size limit unless that client is explicitly flagged to handle such files. (See the consumer sketch after this list.)
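A minimal consumer sketch of the ack/requeue behaviour described above, assuming pika; the queue name, the dead-letter exchange name, and the `scan_package` helper are placeholders for illustration, not part of the current codebase. The message is acked only after a successful scan, an unhandled failure dead-letters it rather than bouncing it back into the queue forever, and a client that dies mid-job leaves the message unacked so RabbitMQ requeues it automatically.

```python
# Sketch only: assumed queue/exchange names and a placeholder scan function.
import pika

JOBS_QUEUE = "incoming_jobs"          # assumed queue name
DEAD_LETTER_EXCHANGE = "rejected"     # assumed DLX, declared elsewhere


def scan_package(body: bytes) -> None:
    """Placeholder for the real scanning logic."""
    raise NotImplementedError


def on_message(channel, method, properties, body):
    try:
        scan_package(body)
    except Exception:
        # basic.nack with requeue=False routes the message to the dead-letter
        # exchange instead of re-queueing it for endless retries.
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        return
    # Ack only once the work is actually done; if the client is terminated or
    # loses its connection before this point, RabbitMQ requeues the message.
    channel.basic_ack(delivery_tag=method.delivery_tag)


connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(
    queue=JOBS_QUEUE,
    durable=True,  # survive a RabbitMQ restart
    arguments={"x-dead-letter-exchange": DEAD_LETTER_EXCHANGE},
)
channel.basic_qos(prefetch_count=1)  # one unacked job per client at a time
channel.basic_consume(queue=JOBS_QUEUE, on_message_callback=on_message)
channel.start_consuming()
```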
Rationale: The premise is to keep work off client nodes as much as possible. Sending duplicate information to the client and relying on server-side deduplication represents an enormous amount of wasted compute. This deduplication must occur before a client ever interfaces with a job, and an effort should be made to cover a broad range of edge cases (one possible approach is sketched below).
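A rough sketch of Loader-side deduplication before anything reaches the queue, purely as an illustration of the idea above; the `name-version` key format, the persistence file, and the use of publisher confirms are assumptions, not the project's actual implementation.

```python
# Sketch only: assumed dedup key, persistence file, and queue name.
import json
import pika

SEEN_FILE = "queued_jobs.json"   # assumed persistent record of already-queued jobs
JOBS_QUEUE = "incoming_jobs"     # assumed queue name


def load_seen() -> set:
    try:
        with open(SEEN_FILE) as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()


def publish_new_jobs(jobs: list) -> None:
    seen = load_seen()
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.confirm_delivery()  # publisher confirms: a Loader restart mid-publish can't silently drop work

    for job in jobs:
        key = f"{job['name']}-{job['version']}"  # assumed dedup key
        if key in seen:
            continue  # already queued once; never send it downstream again
        channel.basic_publish(
            exchange="",
            routing_key=JOBS_QUEUE,
            body=json.dumps(job),
            properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        )
        seen.add(key)

    with open(SEEN_FILE, "w") as fh:
        json.dump(sorted(seen), fh)
    connection.close()
```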
Furthermore:
- Modifications to the current codebase will not be accepted without tests. Our system works now, and switching over to the MQ setup represents a massive number of modifications to all components of our scanning framework. This will be tested extensively before acceptance.
Some thoughts on authentication -
We can use RabbitMQ's built-in Authentication, Authorization, and Access Control feature to provision each client with a username/password combination (a connection sketch follows the list below).
- All clients should be restricted to `basic.publish` on the results queue and `basic.consume` on the incoming jobs queue.
- Mainframe should also have provisioned credentials with only `basic.consume` on the results queue.
- Loader should have `basic.publish` on the incoming jobs queue.
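The restrictions above would be enforced server-side by RabbitMQ's per-user permission system (e.g. `rabbitmqctl set_permissions`); the snippet below is only a sketch of how a client would connect with its provisioned credentials, with the host, username/password, and queue names as placeholders.

```python
# Sketch only: placeholder credentials, host, and queue names.
import pika

credentials = pika.PlainCredentials("client-01", "per-client-secret")  # hypothetical
params = pika.ConnectionParameters(host="mq.internal.example", credentials=credentials)

connection = pika.BlockingConnection(params)
channel = connection.channel()

# Permitted for clients: consume jobs from the incoming jobs queue...
channel.basic_consume(queue="incoming_jobs", on_message_callback=lambda ch, m, p, body: None)
# ...and publish to the results queue.
channel.basic_publish(exchange="", routing_key="results", body=b"{}")
# Anything else (declaring queues, consuming results, etc.) should be denied
# by the permissions attached to this user, not by anything in this snippet.
connection.close()
```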
On further thought - is there a need for a return queue? Can clients simply POST their results directly to the Dragonfly API?
I assume since they will have to interface with the API anyway to fetch their ruleset when they detect they're out of date, they may as well POST results directly to the API.
Having a return queue likely helps alleviate situations where many clients are scanning many packages (and the API cannot keep up), but I'm not sure that's a goal worth aspiring to right now. It's definitely an effective way to future-proof, though.
The intention when we discussed this was that clients would POST directly to the API anyway. Clients bouncing a single request for rules off the API itself wasn't something I had really considered; typically these jobs are queued with the current ruleset SHA, correct? I don't see that being an issue, but I'd be a little concerned that if we ever spin up multiple clients, and they're moving through packages quickly, we would be making potentially hundreds of requests to this endpoint per second. Not that it's serving that much, and I don't anticipate it would be an issue, but it bears mentioning nonetheless. If we put it behind any sort of rate limiting, we'll likely footgun ourselves in that regard.
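For the "POST directly to the API" option, a rough client-side sketch: the base URL, endpoint paths, and payload fields are assumptions for illustration, not the actual Dragonfly API.

```python
# Sketch only: hypothetical endpoints and payload shape.
import requests

API_BASE = "https://dragonfly.example.org"   # placeholder base URL
current_ruleset_sha = "abc123"               # SHA the client's rules were built from


def post_result(package: str, version: str, matches: list) -> None:
    resp = requests.post(
        f"{API_BASE}/results",               # hypothetical endpoint
        json={
            "package": package,
            "version": version,
            "ruleset_sha": current_ruleset_sha,
            "matches": matches,
        },
        timeout=10,
    )
    resp.raise_for_status()


def refresh_rules_if_stale() -> None:
    resp = requests.get(f"{API_BASE}/rules/latest", timeout=10)  # hypothetical endpoint
    resp.raise_for_status()
    if resp.json()["sha"] != current_ruleset_sha:
        # Fetch and compile the new ruleset here; omitted in this sketch.
        pass
```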