oban-bg/oban

Performance of Postgres notifier - pg_notify is O(N^2)

Closed this issue · 3 comments

Problem

It looks like the oban_notify trigger can become a serious bottleneck when scaling from hundreds to thousands of jobs per second, because pg_notify has a de-duplication mechanism with O(N^2) complexity.
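To see why a duplicate check can be quadratic, here is a toy Elixir model. It is an illustration under simplifying assumptions, not Postgres's actual implementation: each new notification in a transaction is compared against every notification already pending, so N distinct notifications cost N(N-1)/2 comparisons. It models only duplicate checking within one transaction, not lock contention between concurrent transactions. `NotifyDedupModel` and `queue_all/1` are hypothetical names.

```elixir
defmodule NotifyDedupModel do
  @moduledoc """
  Toy model (hypothetical, not Postgres source code): a per-transaction list
  of pending notifications that drops exact duplicates by a linear scan.
  """

  # Queues each {channel, payload} pair in order and returns
  # {pending_list, total_comparisons}.
  def queue_all(notifications) do
    Enum.reduce(notifications, {[], 0}, fn note, {pending, comparisons} ->
      {duplicate?, scanned} = scan(pending, note, 0)
      pending = if duplicate?, do: pending, else: [note | pending]
      {pending, comparisons + scanned}
    end)
  end

  # Linear scan: one comparison per pending entry until a match is found.
  defp scan([], _note, count), do: {false, count}
  defp scan([note | _rest], note, count), do: {true, count + 1}
  defp scan([_other | rest], note, count), do: scan(rest, note, count + 1)
end

notes = for i <- 1..1_000, do: {"oban_insert", "job-#{i}"}
{_pending, comparisons} = NotifyDedupModel.queue_all(notes)
# comparisons == 499_500, i.e. n * (n - 1) / 2 for n = 1_000 distinct payloads
```

Doubling the number of distinct notifications roughly quadruples the comparisons, which is the O(N^2) behaviour described above.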

I tried the following benchmark (elixir:1.13.4, 4 vCPU, 120 connections in an Ecto pool, Oban 2.13.4, Postgres 11.19 on AWS RDS t2.medium).

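```elixir
Benchee.run(
  %{
    schedule: fn ->
      # Insert a single job inside its own transaction
      Repo.transaction(fn ->
        # Oban.Worker.new/2 requires an args map as its first argument
        job = ExampleWorker.new(%{})
        Oban.insert(job)
      end)
    end
  },
  parallel: 100
)
```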

I ran it twice, before and after disabling the trigger:

```sql
ALTER TABLE oban_jobs DISABLE TRIGGER ALL;
```

Observation: disabling the trigger increased the throughput of the Oban.insert/1 operation from 698 ops/sec to 1967 ops/sec, while halving the DB load (measured in Average Active Sessions).

```
Name                    ips        average  deviation         median         99th %
with_pg_notify         6.98      143.32 ms    ±16.00%      136.13 ms      236.24 ms
without_pg_notify     19.67       50.84 ms    ±84.70%       40.84 ms      284.95 ms
```
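The ips column in the table above is Benchee's per-process rate (the reciprocal of the average run time). Assuming all 100 parallel processes run concurrently for the whole benchmark, the aggregate throughput quoted in the observation follows as:

```latex
\begin{aligned}
\text{ips} &= \frac{1}{\text{average}}: \quad \frac{1}{0.14332\,\text{s}} \approx 6.98, \qquad \frac{1}{0.05084\,\text{s}} \approx 19.67 \\
\text{throughput} &\approx \text{parallel} \times \text{ips}: \quad 100 \times 6.98 = 698\ \text{ops/s}, \qquad 100 \times 19.67 = 1967\ \text{ops/s}
\end{aligned}
```

That is roughly a 2.8x improvement from disabling the trigger.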

In the screenshot below, the first spike is with pg_notify and the second is without. The notification trigger adds waits on the "object" lock type and higher CPU usage.

[Screenshot: database load in Average Active Sessions, with and without pg_notify]

Expected solutions

A. Just set expectations in the docs, e.g. "The default Postgres notifications work fine for hundreds of RPS. Consider the PG notifier to handle thousands of RPS."

B. Investigate throttling pg_notify calls, perhaps with a configuration setting that balances "reactiveness" against throughput.
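Option B could be prototyped as an application-side coalescer: record inserts cheaply and emit at most one notification per distinct queue per time window. This is a hypothetical sketch, not an Oban API. `InsertCoalescer`, `record/2`, `:window_ms`, and `:notify_fun` are invented names, and `notify_fun` stands in for whatever actually sends the notification (for example, a single `pg_notify` query). It does not remove the database trigger; it only illustrates the throttling trade-off.

```elixir
defmodule InsertCoalescer do
  @moduledoc """
  Hypothetical sketch, not part of Oban: collects the queues that received
  inserts and emits one batched notification every `window_ms`.
  """
  use GenServer

  # `notify_fun` receives the sorted list of distinct queues. It is injected
  # so the sketch runs without a database.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Cheap, asynchronous call made instead of firing NOTIFY per insert.
  def record(server, queue), do: GenServer.cast(server, {:record, queue})

  @impl true
  def init(opts) do
    state = %{
      pending: MapSet.new(),
      window_ms: Keyword.get(opts, :window_ms, 100),
      notify_fun: Keyword.fetch!(opts, :notify_fun)
    }

    schedule_flush(state.window_ms)
    {:ok, state}
  end

  @impl true
  def handle_cast({:record, queue}, state) do
    # Repeated inserts into the same queue collapse into one MapSet entry.
    {:noreply, %{state | pending: MapSet.put(state.pending, queue)}}
  end

  @impl true
  def handle_info(:flush, state) do
    # Skip the notification entirely when nothing was inserted.
    if MapSet.size(state.pending) > 0 do
      state.notify_fun.(state.pending |> MapSet.to_list() |> Enum.sort())
    end

    schedule_flush(state.window_ms)
    {:noreply, %{state | pending: MapSet.new()}}
  end

  defp schedule_flush(ms), do: Process.send_after(self(), :flush, ms)
end
```

With 1,000 inserts into `"default"` and one into `"mailers"` inside a single window, `notify_fun` is called once with `["default", "mailers"]`. A larger `window_ms` means fewer notifications but slower pickup of new jobs, which is the reactiveness/throughput dial option B describes.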

Alternatives Considered

A Redis Pub/Sub notifier for high-load Elixir systems that lack a proper cluster setup (which the PG notifier requires).

Additional Context

Thanks to @marty-stranger for the finding.

Thanks for reporting your findings. The problem is the trigger combined with the notifier, not the notifier by itself. It's perfectly valid to use Postgres notifications without the triggers (the notifications are even necessary for some functionality).

Adding some documentation to set expectations is a great idea. Where would you expect to see such a comment? BTW, there's already a note in the PG docs about disabling triggers when migrating.
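Disabling the trigger, as the note about migrating suggests, can also be done from a migration rather than by hand. A sketch assuming Ecto migrations and the table and trigger names used in this thread (`oban_jobs`, `oban_notify`, default `public` prefix); the module name is a placeholder. Unlike the `DISABLE TRIGGER ALL` used in the benchmark, this touches only the named trigger.

```elixir
# Hypothetical migration; MyApp.Repo.Migrations.DisableObanNotifyTrigger is a placeholder name.
defmodule MyApp.Repo.Migrations.DisableObanNotifyTrigger do
  use Ecto.Migration

  def change do
    # execute/2 takes the "up" SQL and the "down" SQL, so the
    # migration is reversible and rollback re-enables the trigger.
    execute(
      "ALTER TABLE oban_jobs DISABLE TRIGGER oban_notify",
      "ALTER TABLE oban_jobs ENABLE TRIGGER oban_notify"
    )
  end
end
```

This can be paired with `notifier: Oban.Notifiers.PG` in the Oban config, or used while keeping the Postgres notifier, since the maintainer notes that notifications work without the triggers.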

I'd expect this performance-related notice to be placed in the "Caveats" section of the Oban.Notifiers.Postgres moduledoc.

@vovayartsev Warning documentation updated. Side note, you mentioned testing on PG 11.19, but only PG 12+ is officially supported (and 14+ is highly recommended).