elixir-ecto/postgrex

Postgrex.Notifications does not receive notifications from hosted PostgreSQL DB on Azure

Closed this issue · 18 comments

We are using Microsoft Azure with our app running on VMs and our PostgreSQL DB on a separate VM. Using Postgrex.Notifications we are able to emit events from our DB and respond accordingly in our app. We have been exploring moving to a hosted DB in Azure, but in our testing we have been unable to receive any events through Postgrex.Notifications.

Using the psql client, I have verified that our database is emitting the pg_notify events and I have confirmed that Postgrex.Notifications.start_link and Postgrex.Notifications.listen return the {:ok, _} tuples that are expected.

Thanks @petersenlance for the heads up!

We don't have resources to look into this, so I would recommend you to dig deeper and let us know of anything you find or if we can help with something.

Thanks for the response, @josevalim. We'll keep working on this and let you know if we make progress.

If you do end up having some resources and it would help, we could set up a dummy hosted DB through our Azure account and DM you the credentials for you to connect to it for testing.

Could this be duplicate of #359 ?

It is definitely connected, since #359 was opened by one of my co-workers. However, that issue was different. In that case, we had a VM with our own PostgreSQL DB set up. When our app first connected to our DB we would receive notifications through Postgrex.Notifications.

We are looking into moving to a hosted solution with Azure, but when we connect to the hosted DB we never receive notifications.

Hi @petersenlance, any news here?

Which PGSQL version are you running in production? If you can create a minimal app that reproduces the issue by connecting to Azure, I can take a look at it. Ideally the app would:

  1. have a process sending notifications through azure
  2. have a process receiving those notifications through azure

Since the app may have private credentials, it is ok for it to be private. You can give it access to my GH account.

@josevalim I've encountered the exact same issue with a Postgres (v9.6) database hosted in Azure.

Running the postgrex notification tests, via a fork of this repo and by adding this commit with a hard-coded connection to an Azure hosted Postgres instance, results in the following failures:

$ mix test test/notification_test.exs 

  1) test listening, notify, then receive (using registered names) (NotificationTest)
     test/notification_test.exs:51
     No message matching {:notification, ^receiver_pid, ^ref, "channel", ""} after 1000ms.
     The following variables were pinned:
       receiver_pid = #PID<0.189.0>
       ref = #Reference<0.1943451657.2336751618.90446>
     The process mailbox is empty.
     code: assert_receive {:notification, ^receiver_pid, ^ref, "channel", ""}
     stacktrace:
       test/notification_test.exs:58: (test)

  2) test listening x2, unlistening, notify, receive (NotificationTest)
     test/notification_test.exs:70
     No message matching {:notification, ^pid, ^ref1, "channel", ""} after 1000ms.
     The following variables were pinned:
       pid = #PID<0.192.0>
       ref1 = #Reference<0.1943451657.2336751618.90560>
     The process mailbox is empty.
     code: assert_receive {:notification, ^pid, ^ref1, "channel", ""}, 1_000
     stacktrace:
       test/notification_test.exs:80: (test)
.
  3) test listening, notify, then receive (without payload) (NotificationTest)
     test/notification_test.exs:43
     No message matching {:notification, ^receiver_pid, ^ref, "channel", ""} after 1000ms.
     The following variables were pinned:
       receiver_pid = #PID<0.199.0>
       ref = #Reference<0.1943451657.2336751618.90753>
     The process mailbox is empty.
     code: assert_receive {:notification, ^receiver_pid, ^ref, "channel", ""}
     stacktrace:
       test/notification_test.exs:48: (test)
.
  4) test listening, notify, then receive (with payload) (NotificationTest)
     test/notification_test.exs:35
     No message matching {:notification, ^receiver_pid, ^ref, "channel", "hello"} after 1000ms.
     The following variables were pinned:
       receiver_pid = #PID<0.205.0>
       ref = #Reference<0.1943451657.2336751618.90946>
     The process mailbox is empty.
     code: assert_receive {:notification, ^receiver_pid, ^ref, "channel", "hello"}
     stacktrace:
       test/notification_test.exs:40: (test)
...

Finished in 62.6 seconds
9 tests, 4 failures

I've looked through the Postgres server parameters that are available to configure in the Azure portal but cannot see anything that is noticeably different than on my local Postgres instance which is working fine.

@slashdotdash can you ping me on IRC or somewhere where we can chat? :)

Basically, I cannot try your commit out because of "(invalid_authorization_specification) no pg_hba.conf entry for host "188.146.238.184", user "postgrex", database "postgrex_test", SSL on". If you can share that privately it would be welcome. :)

@josevalim I've just sent you an email.

Thanks @slashdotdash!

I was able to reproduce the issue.

I am thinking this is an issue with Azure. If the connection that is listening is idle, then the notification is never received. However, if you issue commands in the connection, then the notifications do arrive.

Take the following test:

  test "listening, notify, then receive (without payload)", context do
    assert {:ok, ref} = PN.listen(context.pid_ps, "channel")

    assert {:ok, %Postgrex.Result{command: :notify}} = P.query(context.pid, "NOTIFY channel", [])
    receiver_pid = context.pid_ps
    Process.sleep(1000)
    # _ = PN.listen(context.pid_ps, "channel2")
    assert_receive {:notification, ^receiver_pid, ^ref, "channel", ""}
  end

If you uncomment the line above, then the test pass, meaning that you receive the notification as long as you are writing to the connection, which defeats the whole purpose of LISTEN/NOTIFY.

Hey!
We had the same issue and triggered @slashdotdash as it occurred in Commanded.

We raised the issue with Azure support, current status being:

I replicated the behavior outside the Elixir environment.
I engaged the Azure Database for PostgreSQL engineering team to analyze the problem.

and

The Azure Database for PostgreSQL engineering team is investigating the issue.
I will provide an update as soon as possible.

@slashdotdash I'm glad you were you able to set up something that reproduces this.

We have been working with Azure support. They pointed me at tcp_keepalives (https://www.postgresql.org/docs/9.5/static/runtime-config-connection.html and https://stackoverflow.com/questions/20856599/mysql-ping-hangs-with-amazon-rds) but I have not had a chance to test this out yet.

@petersenlance not sure if keep alive will make a difference because those things are happening in a matter of miliseconds and we still do not receive any data.

Do we know anything more here? From reading the discussion it looks like the problem is at the azure side not ours? @timbuchwaldt @petersenlance could you confirm this?

So, is there a build of postgrex out there that will we can try to use to solve our problems?

@SkipMike if you use the master branch, we have added periodic pings. you can change the ping frequency and that will specify how frequent you receive messages. That may workaround the issue but I still think the behaviour is broken on the Azure side of things.

@josevalim Thanks. we'll try it out.

Closing this as we believe this issue is fixed in master. We hope a new release will be out in the weeks. Thanks!