shotover/shotover-proxy

shotover is a high performance proxy

Opened this issue · 0 comments

#1476 kicked off a bunch of work to refactor shotover in order to achieve its full performance potential.
I had a plan for future work but it was mostly in my head, so I really need a high level issue to track the high level goal and keep track of remaining tasks.

The remaining tasks in order:

  • remove old transform invariants
    • Rewrite RedisCache transform to use MessageId
    • Remove deprecated invariants from transform method docs
  • outgoing connection logic returns immediately when any messages are received
    • when an outgoing response receives a message it should trigger a tokio notify that starts the transform chain
  • Move Transform::transform_pushed into Transform::transform. #1524
  • Remove duplicated pubsub logic from RedisSinkSingle #1532
  • unify outgoing connection logic
    • currently cassandra has a whole lot of custom connection logic.
    • Initial PR #1532
  • batch outgoing messages #1471
  • swap single sinks to use SinkConnection::try_recv instead of SinkConnection::recv
  • add backpressure throughout the codebase:
  • Investigate swapping Encoder's to take a single message at once, and have the writer task call feed + flush instead of send on the FramedWrite.
    • Maybe we should have the connections deal with batching instead of the transforms. The transform just sends all the messages it wants to, then calls flush on all the connections it sent to.
  • Investigate TCP_NODELAY
    • #985
    • If we can reproduce issues with nagles, then reimplement minor nagles algorithm with a 1ms timeout within shotover so we can enable TCP_NODELAY

Once this is complete, shotover should:

  • have higher throughput
    • processing requests is no longer blocked on responses
  • have lower latency
    • responses are processed immediately without waiting for ALL responses in a batch to come back.
  • not be blocked by slow cassandra responses
    • cassandra is not transformed into an in order protocol.
  • be resilient by exerting backpressure when large messages or large overall throughput is received.
    • we stop receiving messages from the client or DB when certain criteria is met about the current in flight requests/responses.