Worker: do something better when the JWT expires
We've had some problems lately with the JWT on the run channel expiring, causing messages to fail.
It looks a bit like this:
The worker should do better in these cases, throwing a clear error and exiting the channel and maybe the socket.
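As a rough sketch of what "a clear error" could look like, the worker might decode the token's `exp` claim before sending and throw a dedicated error once it has passed. `ExpiredJwtError`, `decodeJwtPayload` and `assertJwtNotExpired` are hypothetical names, not existing worker code, and this assumes the run token is a standard three-segment JWT with a numeric `exp` claim in seconds. The signature is not verified here; that remains the server's job.

```typescript
// Hypothetical error type: the name and fields are illustrative only.
class ExpiredJwtError extends Error {
  runId: string;
  expiredAt: Date;

  constructor(runId: string, expiredAt: Date) {
    super(
      `expired JWT detected for run ${runId} (expired at ${expiredAt.toISOString()})`
    );
    this.name = 'ExpiredJwtError';
    this.runId = runId;
    this.expiredAt = expiredAt;
  }
}

// Decode the payload segment of a JWT WITHOUT verifying the signature.
function decodeJwtPayload(token: string): { exp?: number } {
  const segments = token.split('.');
  if (segments.length !== 3) {
    throw new Error('malformed JWT: expected three dot-separated segments');
  }
  // Convert base64url to padded base64 so atob can read it.
  const base64 = segments[1].replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded));
}

// Throw ExpiredJwtError if the token's exp claim (seconds since epoch) has passed.
// Tokens without a numeric exp claim are let through unchanged.
function assertJwtNotExpired(
  runId: string,
  token: string,
  nowMs: number = Date.now()
): void {
  const { exp } = decodeJwtPayload(token);
  if (typeof exp === 'number' && exp * 1000 <= nowMs) {
    throw new ExpiredJwtError(runId, new Date(exp * 1000));
  }
}
```

The caller would catch `ExpiredJwtError`, log it, and then leave the channel rather than retrying the message.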
The question is WHERE we report. We can't tell Lightning because, well, the JWT expired. Any attempts to send a message back will be rejected.
This comes down to monitoring: we don't have a monitoring solution yet, other than Lightning and GCP.
Perhaps a good approach is to shut the whole server down (probably gracefully) with a clear error like "expired JWT detected" and stop requesting traffic. That depends a bit on whether it's one run that's expired or whether all JWTs are wrong.
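One way to tell "one bad run" from "all JWTs are wrong" might be to count how many distinct runs have hit an expiry and only shut down past a threshold. This is just a sketch of that idea: `JwtExpiryTracker`, `ExpiryDecision` and the threshold are all hypothetical, and a real heuristic would probably also want a time window.

```typescript
// Hypothetical decision type: what the worker should do after an expiry.
type ExpiryDecision = 'abandon-run' | 'shutdown-server';

// Tracks which runs have seen an expired JWT.
class JwtExpiryTracker {
  private expiredRuns = new Set<string>();
  private threshold: number;

  // threshold: number of distinct expired runs that triggers a shutdown.
  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record an expiry for a run and suggest how to react.
  // Repeated expiries for the same run only count once.
  recordExpiry(runId: string): ExpiryDecision {
    this.expiredRuns.add(runId);
    return this.expiredRuns.size >= this.threshold
      ? 'shutdown-server'
      : 'abandon-run';
  }
}
```

On `'shutdown-server'` the worker would stop claiming new work, let in-flight runs finish or time out, and exit with the "expired JWT detected" error.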
We probably don't unit test for any of this very well at the moment. I think unit testing and clear logs to GCP are the first step. That also makes it easier for us to trace where to add more monitoring later.
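For the logging side, a small pure function that builds the log line is easy to unit test. The sketch below assumes the worker's stdout is collected by GCP as structured JSON, where `severity` and `message` are treated as special fields. `buildExpiredJwtLog` and the `runId` / `expiredAt` field names are made up for illustration.

```typescript
// Build a single-line JSON log entry for an expired JWT.
// Assumption: stdout is ingested by GCP as structured logs.
function buildExpiredJwtLog(runId: string, expiredAt: Date): string {
  return JSON.stringify({
    severity: 'ERROR',
    message: 'expired JWT detected',
    runId,
    expiredAt: expiredAt.toISOString(),
  });
}

// Usage: write the entry to stdout as one line.
console.log(buildExpiredJwtLog('run-123', new Date(0)));
```

A unit test can then parse the string and assert on `severity` and `runId` without needing a socket or a real token.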