uber-go/fx

`index out of range` on (*Lifecycle).Stop()

siennathesane opened this issue · 8 comments

Describe the bug

When stopping the application with a SIGINT, the lifecycle hooks can panic.

panic: runtime error: index out of range [9] with length 5

goroutine 1079 [running]:
go.uber.org/fx/internal/lifecycle.(*Lifecycle).Stop(0x140003aa0b0, {0x101e480d8, 0x1400098e840})
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/internal/lifecycle/lifecycle.go:151 +0x718
go.uber.org/fx.withTimeout.func1()
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/app.go:784 +0x74
created by go.uber.org/fx.withTimeout
        /Users/sienna/go/pkg/mod/go.uber.org/fx@v1.18.2/app.go:772 +0xd8
exit status 2

To Reproduce

Unclear, I'm not sure what's causing it. I would assume it's something to do with how the lifecycle hooks are registered.

Expected behavior

I expect it to error out instead of panicking.

Additional context

I am migrating an existing application to fx.

Thanks for reporting this. I'll try to repro this and triage this accordingly.

@JacobOaks to take a look at this issue.

Hey,

So I looked at a couple ways this panic can occur, and they all involve using the lifecycle a little differently. Can you clarify a little more about how you're running your app so I can try to narrow down what's happening? Specifically:

  • Are you calling fx.Run or fx.Start/fx.Stop?
  • If the latter, when/how are you stopping the app?
  • Are you running or starting the app multiple times? Concurrently or once at a time?
  • Are you specifying custom timeouts for your hooks or are any of them long-running?

Thanks!

  • Are you calling fx.Run or fx.Start/fx.Stop?

I am calling fx.New, then fx.Start, then fx.Stop when the signal is caught.

  • If the latter, when/how are you stopping the app?

Just CTRL+C directly or SIGINT indirectly.

  • Are you running or starting the app multiple times? Concurrently or once at a time?

No, one instance of the application.

  • Are you specifying custom timeouts for your hooks or are any of them long-running?

No custom timeouts but all of my hooks are long-running processes that require a specific startup and stop order.

I was able to move past the issue, but I'm not sure how I fixed it because I don't quite understand what was causing it.

Thanks for those details.

For some context, fx keeps an internal counter that gets incremented for every OnStart hook that completes. When the app begins shutting down, the counter is used as an index into the slice containing the registered hooks, and is decremented for each OnStop hook run, this way the corresponding hooks can be run in reverse order.

This panic is occurring because when the app starts shutting down, this internal counter is larger than the actual number of registered hooks. Meaning an OnStart hook was run more than once. In your case, it looks like there were only 5 actual hooks, but of those 5 hooks, there were 10 OnStart ones that ran by the time shutdown began.

I'll keep looking into this to see if there's an internal bug that could cause these to be double-counted, but in the mean time, do you have any reason to believe your OnStart hooks are/were being run twice?

Also, if you have a code snippet showing exactly how you're catching the signal to stop the application, that could be useful.

Unfortunately, I can't really provide a simple code snippet as it's a complex application, but based on your explanation of the problem, I agree that would be the case. Some of the issues that I had to struggle with were downstream libraries (re: NATS server) catching signals, stop hooks not having a matching start hook, and lazy constructor calling.

I believe part of the reason this issue popped up is because the lifecycle concept fx presents feels out of order. There are OnStart and OnStop hooks that get called, but you have to "[kick-start your application] since constructors are called lazily", which is a bit confusing when there are start and stop hooks. In my application, I need to start and stop the entire application through the lifecycle hooks or it won't resolve dependencies properly. For example, an HTTP handler in my application needs a NATS client as a dependency, but I can't generate the NATS client without the NATS server being started, which needs to start before the lifecycle hooks are called because the lifecycle won't be called until after the constructors are chained together. When I tried to register it with an OnStop hook, my application was panicking (re: this issue). While I'm "kick-starting" it through the constructor, it's happening outside the lifecycle hooks because those hooks are called too late. Right now, I'm handling startup outside the lifecycle hooks because the order of operations is breaking my application.

I spent several years working in .NET, and one of the things I really came to appreciate was HostBuilder and IHostedService. In a nutshell, the host builder is a framework that allows you to run multiple IHostedService implementations and uses DI to wire them all together, much like fx. What I really like about IHostedService is the simple start and stop interface that gets called as part of the lifecycle. It's easy to work with, especially since it uses DI to resolve dependencies, and services which implement IHostedService are started in the order in which dependencies are resolved (reversed for application shutdown). For example, if you have internal service X which implements IHostedService, and client builder Y for hosted service X, the IHostedService.StartAsync() method will get called on X before client builder Y's constructor is called because the DI framework recognizes that a service needs to be started before it can be consumed.

While I personally like the IHostedService model, I do think that might be something to consider for fx at some point. I believe it would allow fx to model more complex applications that use different application designs. It would break the lifecycle separation fx currently implements, but it would allow for service dependencies to be started so client builders can be resolved. I'm not sure how hard it would be to implement something like that. My use case might not be a good match for fx because I am hosting multiple internal services (monolithic design) on the application lifecycle, and that's okay. I wanted to surface the panic in case it was bug.

@mxplusb thanks for the thoughtful feedback. I used to work on the .NET team at Microsoft, and am somewhat familiar with the HostBuilder interface.

In Fx, the start/stop lifecycle hooks are executed separately from the dependency. The assumption here is that whatever you need to instantiate (such as the NATS server/client in the example you mentioned above) should be provided in the form of a fx.Provided constructor or a fx.Invoked function. The lifecycle methods (OnStart/OnStop hooks) are then called separately, in the order they were provided during the dependency resolution.

I understand that coming from the .NET world this subtle difference can cause some confusion.

Separately, thanks for reporting the panic! This issue did uncover a bug in the current implementation of lifecycle hook execution, so we will fix that panic shortly.

Closing this issue as we fixed the panic in #994. Thanks again for the report!