sapcc/openstack-audit-middleware

do not block API calls

Closed · 1 comment

jobrs commented

Event publication must not block the web server threads that serve API calls, and publication failures must not affect API functionality.

This could be achieved either by fail-fast publication of events (minimal timeouts, no immediate retries, and in-memory queueing on error) or by moving event publication into a background thread.

The background thread approach might come with some not-so-unexpected complications:

  • green threads: Darren learned the hard way that they do not offer preemptive multitasking; they are scheduled cooperatively, so a green thread that blocks without yielding stalls the rest of the parent process
  • multiprocessing: a separate Python process requires inter-process communication, and certainly not via files. But does Python have something like in-memory pipes or shared-memory queues?
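For comparison, a background-thread variant might look like the sketch below: API threads hand events to a bounded `queue.Queue` with `put_nowait` (never waiting), and a single worker thread does the possibly slow publishing. `BackgroundPublisher` and the `publish` callable are hypothetical names, not part of the middleware. Note that if the service monkey-patches `threading` (as eventlet-based OpenStack services usually do), this worker may itself become a green thread, which is exactly the concern raised above.

```python
import logging
import queue
import threading

LOG = logging.getLogger(__name__)


class BackgroundPublisher:
    """Decouple API threads from the broker via a bounded queue and one worker.

    `publish` is a hypothetical callable (e.g. a wrapper around the RabbitMQ
    client); it is an assumption, not the middleware's actual API.
    """

    def __init__(self, publish, maxsize=1000):
        self._publish = publish
        self._queue = queue.Queue(maxsize=maxsize)  # bounded, so memory stays capped
        self._lock = threading.Lock()
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event):
        """Called on the API thread: never waits; drops the event if the queue is full.

        Events are assumed to be non-None (None is the shutdown sentinel).
        """
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            with self._lock:
                self.dropped += 1

    def _run(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel from close()
                return
            try:
                self._publish(event)
            except Exception:
                # A failed publish is logged and discarded; it never reaches the API thread.
                LOG.exception("audit event publication failed")

    def close(self):
        """Let the worker drain pending events, then stop it (may wait for queue space)."""
        self._queue.put(None)
        self._worker.join()
```

On the multiprocessing question: the standard library's `multiprocessing.Queue` offers the same `put_nowait`/`get` interface (raising `queue.Full` when bounded and full) and is backed by an OS pipe, so the same pattern could move the worker into a separate process, at the cost of pickling every event.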

The fail-fast approach would avoid additional threads altogether with the following strategy:

  • set timeouts to near zero
  • disable automatic retries in the RabbitMQ client
  • queue events that could not be published and deliver them alongside the next event
  • back off after consecutive errors (queueing the events that arrive during the back-off period)
  • cap the queue size and drop anything that does not fit
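A minimal single-threaded sketch of the fail-fast list above, under stated assumptions: `send(event, timeout)` is a hypothetical callable wrapping the RabbitMQ client with its automatic retries disabled, and the back-off is exponential (doubling per consecutive failure, capped), which is one possible reading of "back off". Events that arrive after a failure or during the back-off period are kept in a bounded backlog and delivered ahead of the next event; once the backlog is full, new events are dropped.

```python
import time
from collections import deque


class FailFastPublisher:
    """Sketch of the fail-fast strategy: short timeout, no retries, bounded backlog.

    `send(event, timeout)` is a hypothetical callable wrapping the broker
    client with automatic retries disabled; it must raise on any failure.
    """

    def __init__(self, send, timeout=0.05, max_backlog=1000,
                 base_backoff=1.0, max_backoff=60.0, clock=time.monotonic):
        self._send = send
        self._timeout = timeout        # near-zero publish timeout (seconds)
        self._backlog = deque()        # undelivered events, oldest first
        self._max_backlog = max_backlog
        self._base_backoff = base_backoff
        self._max_backoff = max_backoff
        self._clock = clock            # injectable for tests
        self._failures = 0             # consecutive failed attempts
        self._retry_at = 0.0           # no send attempts before this time
        self.dropped = 0

    def publish(self, event):
        """Called on the API thread; never raises and never retries immediately."""
        if len(self._backlog) < self._max_backlog:
            self._backlog.append(event)
        else:
            self.dropped += 1          # backlog full: drop what does not fit
        if self._clock() < self._retry_at:
            return                     # backing off: only queue the event
        while self._backlog:
            try:
                self._send(self._backlog[0], self._timeout)
            except Exception:
                self._failures += 1
                # Exponent is capped so a long outage cannot overflow the float math.
                delay = self._base_backoff * 2 ** min(self._failures - 1, 30)
                self._retry_at = self._clock() + min(self._max_backoff, delay)
                return
            self._backlog.popleft()    # delivered: remove from backlog
        self._failures = 0
```

The clock is injectable so the back-off can be exercised without sleeping. One caveat: after a long outage, the first successful call flushes the whole backlog on the request thread, so a real implementation would probably also cap the number of events sent per call.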

jobrs commented

If the in-memory queue is full, we could just as well flush it to the logs or some other file.

The problems with this approach are:

  • it will eat up disk space, eventually affecting the application again
  • it will put the logging system under stress
  • if we do not use the logging system, we have to build our own log rotation with cron

And there we are again, with our own hand-crafted queue.
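For the disk-space and cron points specifically, Python's standard `logging.handlers.RotatingFileHandler` caps file size and rotates in-process. A minimal sketch, assuming overflow events are JSON-serializable dicts; `make_overflow_logger`, `spill`, and the logger name are hypothetical. It bounds disk usage, but it does not address the load on the logging system, nor the point that this is still a hand-crafted queue.

```python
import json
import logging
from logging.handlers import RotatingFileHandler


def make_overflow_logger(path, name="audit.overflow", max_bytes=10 * 1024 * 1024,
                         backup_count=3):
    """Logger that spills overflow events to a size-capped, self-rotating file.

    Disk usage stays roughly below max_bytes * (backup_count + 1), and rotation
    happens in-process, so no cron job is needed. `name` and the defaults are
    illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep spilled events out of the application log
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(message)s"))  # one raw JSON line per event
    logger.addHandler(handler)
    return logger


def spill(logger, event):
    """Write one overflow event as a JSON line."""
    logger.info(json.dumps(event, sort_keys=True))
```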