resgateio/resgate

Server not stopping with correct signals

Closed this issue · 4 comments

I set up resgate to run in tandem with some other processes as part of my setup.

I am using go-cmd to manage the processes, and raised an issue about it here:

go-cmd/cmd#19 (comment)

I don't have the test code to reproduce this at hand, but it is the same as the 20-line example code in go-cmd, so it is very easy to try.

It is important that spawned servers inside NATS terminate correctly in order for them to be well managed by the process manager.
I have not yet tried resgate with systemd, launchd, or as a Windows service.
I intend to try that next.

Great that you try these things. Thanks!

Stopping of resgate can be initiated in two ways:
By signal (os.Interrupt, os.Kill, syscall.SIGHUP, syscall.SIGTERM, or syscall.SIGQUIT)
Or by resgate internally deciding to stop, which happens if:

In any of these cases, resgate will attempt a graceful shutdown for up to 10 seconds, after which it will exit with a panic (to provide a stack trace showing why it failed to shut down):
https://github.com/jirenius/resgate/blob/master/main.go#L202

That means resgate should always exit within 10 seconds of receiving a signal.

But it might exit without calling os.Exit, or it might exit with a panic.

What sort of termination is required by go-cmd (or any of the other wrappers, for that matter)?

I will try to reproduce the issue as well.

I assume we are talking about solving the problem of deploying, starting, and stopping a set of micro services that together compose a server application, to ease the deployment and development process?

I haven't tried nomad, consul, or vault, but have rather just used docker as a container for a group of micro services. Or, for smaller projects with a limited number of services (1 - 3), the micro services run as separate processes.

I don't really have any objectives myself, other than to enable using resgate+microservices with the technology of choice, be it go-cmd, docker, K8s, or HashiCorp's solutions. And for this, if resgate fails to return the proper exit codes, preventing a solution from being used, it needs to be fixed!

But from what I understand, nomad is used more for setting up service-to-service communication. The RES protocol concerns service-to-client communication, and while the RES-service protocol can also be used for service-to-service communication, it doesn't have to be. Both can exist in parallel: nomad with gRPC for faster service-to-service communication, and RES for the client, with all the realtime data, security, and caching that it provides.

I've investigated the exit codes returned by Resgate.

Resgate does respond to the Interrupt signal, as expected, exiting with code 0 on successful shutdown.
In other cases, such as disconnects from NATS server, Resgate correctly exits with error code 1.

I'm closing this issue because it has been inactive for a few months, and I have not been able to reproduce it.