Cluster members lose communication (SSL error in keep alive)
mars opened this issue · 4 comments
The following error appears in the logs when the app is scaled beyond a single dyno:
2016/02/08 22:46:39 [error] 91#0: [lua] cluster.lua:84: Cassandra error: 10.1.16.105, context: ngx.timer
It is unclear whether the error causes problems at runtime. The /cluster API status appears healthy and the Kong proxy serves requests as expected.
This issue was originally opened against Kong itself, but we later found that the error is only reproducible with this app.
Update: the Kong cluster loses cohesion after several days. A restart fixes it, but the cluster regresses again within a few days. Even though all of the Kong instances are still running, each instance's Admin /cluster API lists only a single node (the instance itself). I suspect this Cassandra error comes from the cluster "keep alive" code.
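For anyone trying to catch this regression before it is noticed by hand, here is a minimal sketch of a check against the Admin /cluster endpoint. It assumes the response is JSON with a "data" list holding one entry per member, and that the Admin API listens on http://localhost:8001; both are assumptions about the deployment, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def cluster_member_count(payload: dict) -> int:
    """Count members in a parsed /cluster response.

    Assumes the members are listed under a "data" key (an assumption
    about the response shape, not confirmed in this thread).
    """
    return len(payload.get("data", []))


def has_lost_cohesion(payload: dict, expected_nodes: int) -> bool:
    """Return True when the node reports fewer members than expected."""
    return cluster_member_count(payload) < expected_nodes


def fetch_cluster(admin_url: str = "http://localhost:8001") -> dict:
    """Fetch and parse /cluster from one Kong instance (assumed URL)."""
    with urlopen(admin_url + "/cluster", timeout=5) as resp:
        return json.load(resp)
```

Calling has_lost_cohesion(fetch_cluster(), expected_nodes=2) on each dyno would flag the state described above, where every instance lists only itself.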
Thanks @thibaultcha for the lead on improved Cassandra error messages. Here's what we see now:
2016/04/25 23:26:22 [error] 109#0: [lua] cluster.lua:80: Cassandra error: Error during SSL handshake with host at 10.1.46.97:9042: 18: self signed certificate, context: ngx.timer
Seems strange, since Cassandra/SSL works fine in other contexts. It's just these cluster timers that lose the certificate somehow. Any idea why a self-signed cert is assumed here?
Finally found the underlying bug, and fortunately the fix is in ngx_lua master for the next release 0.10.3.
No longer an issue with Kong 0.11