oban-bg/oban

Notifier logs after upgrading to 2.17.4

Closed this issue · 21 comments

I'm upgrading my application from 2.16.3 -> 2.17.4 and am seeing this getting constantly logged out in development. I haven't tried it out on production because I wasn't sure if it's something to be concerned with or if it's a dev only thing.

[info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
[info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}

What does this mean and how do I stop it?

The upgrade guide is a little confusing too. It says

"You must either run the v12 migrations or disable insert triggers in your Oban configuration, otherwise you'll receive duplicate insert notifications for each job."

In the docs, each method says it's optional, so is there one that is preferred over the other?

@beerlington Thanks for opening this issue. The constant warning was definitely a bug, as explained in the linked commit. We'll have v2.17.5 out soon with that fix.

What does this mean and how do I stop it?

It is supposed to mean that the notifier can't receive messages from other nodes in the cluster. It's only a concern in production. For now, you can safely ignore it (the mere fact that it was flipping means you had connectivity in the first place).

In the docs, each method says it's optional, so is there one that is preferred over the other?

You should probably run the v12 migration to disable database triggers. It's up to you whether you want to keep any triggers at all (it's what allows execution immediately as jobs are inserted).

@sorentwo

stale = System.system_time(:second) - state.interval * state.stale_mult

It seems that there is a subtraction of milliseconds to seconds. I think the unit should be :millisecond (if I understand correctly you want to ping every N seconds and consider a node stale if older than 2 pings – a bit like in the peer/leader mechanism with the inserted date).

@lud-wj You understand correctly, and I totally messed up that change. There's a fix in 6b9bf1d along with a test for the logic. Thank you!

Although this is closed, i'm still getting those notifier log messages in my local environment after upgrading to the latest version (2.17.9). @sorentwo

[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}

@noozo If you see them only once it is expected.

Nope... keep popping up periodically in the iex logs:

[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched back to global mode","mode":"global","source":"oban","event":"stager:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"args":{"scheduled":true},"id":6034319,"meta":{"cron":true,"cron_expr":"*/5 * * * *"},"system_time":1713885300476121676,"max_attempts":20,"queue":"project_matches_state_machine","worker":"PassionFruit.BackgroundJobs.Periodic.ProjectMatches.StateMachine.ChemistryCallsDoneWorker","source":"oban","event":"job:start","tags":[],"attempt":1}
[debug] QUERY OK db=1.0ms idle=699.6ms
begin []
[info] {"args":{"scheduled":true},"id":6034320,"meta":{"cron":true,"cron_expr":"*/5 * * * *"},"state":"success","max_attempts":20,"queue":"project_matches_state_machine","worker":"PassionFruit.BackgroundJobs.Periodic.ProjectMatches.StateMachine.ChemistryCallsDoneWorker","source":"oban","event":"job:stop","tags":[],"duration":23708,"queue_time":16398,"attempt":1}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched back to global mode","mode":"global","source":"oban","event":"stager:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}

Which notifier are you using? Those log entries are a sign that ping messages aren’t always being received, which indicates that something is wrong in the setup. Is it after a live reload, waking from sleep, or some other event?

my setup:

config :myapp, Oban,
  repo: MyApp.Repo,
  queues: MyAppObanQueues.queues(),
  plugins: [
    {Oban.Plugins.Cron, crontab: cron_jobs},
    # Prune jobs every hour
    {Oban.Plugins.Pruner, max_age: 3600}
  ]

queues() is just a list
cron_jobs is also a list of cron declarations.

everything else i think is just defaults. but actually now that i look at it running overnight, it seems they don't pop up recurring, though i get 2 messages per enqueued job. is that normal (albeit annoying)?

You’re using the default Postgres notifier, which isn’t the most reliable. There may be some logs about notifications in the Postgres logs.

Yes, three is a log entry for each job when it starts and when it stops. That’s normal.

The logger is opt in and you can change the level or not attach it in dev.

i'm still getting those notifier log messages in my local environment after upgrading to the latest version (2.17.9)

I'm encountering a similar situation when I start a iex on local, and I'm not sure if this is normal. (Oban latest version 2.17.9)

iex -S mix phx.server
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched back to global mode","mode":"global","source":"oban","event":"stager:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched back to global mode","mode":"global","source":"oban","event":"stager:switch"}

config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 7}
  ]
postgres -V
postgres (PostgreSQL) 15.3 (Homebrew)

@apayu How close together were those messages? Are they spaced out every 15s?

This is the interval I observed in the logs.

# start
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
# about 1min
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
# about 30 sec
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
# about 30 sec
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
# about 1 min 45 sec
[info] {"message":"notifier can't receive messages from any nodes, functionality degraded","status":"isolated","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban","event":"stager:switch"}
# about 15 sec
[info] {"message":"notifier only receiving messages from its own node, functionality may be degraded","status":"solitary","source":"oban","event":"notifier:switch"}
[info] {"message":"job staging switched back to global mode","mode":"global","source":"oban","event":"stager:switch"}

@apayu The interval isn't consistent, so it doesn't seem like there's a bug in the detection (definitely not like what was originally happening in this issue).

I am also experiencing these the log messages (on 2.17.10). Here's a sample:

18:52:20.191 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
18:54:20.211 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
18:54:20.212 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
18:59:35.259 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
18:59:35.259 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:01:05.274 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:01:05.274 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:02:20.287 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:02:20.500 [info] {"event":"stager:switch","message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban"}
19:02:35.290 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:02:35.540 [info] {"event":"stager:switch","message":"job staging switched back to global mode","mode":"global","source":"oban"}
19:03:05.296 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:03:05.296 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:04:50.315 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:04:50.315 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:05:20.321 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:05:20.321 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:05:50.327 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:05:50.327 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:09:20.361 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:09:20.361 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:10:05.368 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:10:05.368 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:10:35.374 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:10:35.374 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:11:05.379 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:11:05.575 [info] {"event":"stager:switch","message":"job staging switched to local mode. local mode polls for jobs for every queue; restore global mode with a functional notifier","mode":"local","source":"oban"}
19:11:50.388 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:11:50.697 [info] {"event":"stager:switch","message":"job staging switched back to global mode","mode":"global","source":"oban"}
19:12:20.394 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:12:20.395 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:14:05.412 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:14:05.413 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}
19:18:42.142 [info] {"event":"notifier:switch","message":"notifier can't receive messages from any nodes, functionality degraded","source":"oban","status":"isolated"}
19:18:42.142 [info] {"event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban","status":"solitary"}

@Munksgaard Thanks for the extra logs. The timing info implies a race condition between the ping and the notification, which leads me to believe it's down to time warp/clock drift. This commit should solve the problem.

Feel free to give try by running from main in dev and let me know if it has any effect.

That does indeed seem to have done the trick! I get a single message at the start of my run, but after ~7 minutes I have not received any more messages:

Interactive Elixir (1.16.2) - press Ctrl+C to exit (type h() ENTER for help)
14:49:32.837 [info] {"connectivity_status":"solitary","event":"notifier:switch","message":"notifier only receiving messages from its own node, functionality may be degraded","source":"oban"}
iex [14:49 :: 1] >

I observed an issue equivalent to #1040 (comment) after upgrading to Oban 2.17.10. Pinning Oban in mix.exs to this commit fixed my problem.

Thanks for your work @sorentwo. Might you consider making a new release?

Oh damn, sorry!