uber-go/fx

app.Done() not notified when shutdown is called before all OnStart hooks have completed

phsym opened this issue · 6 comments

phsym commented

Describe the bug
When Shutdowner.Shutdown() is called before all OnStart hooks have been called, app.Done() and app.Wait() channels opened after the shutdown is triggered will never get notified.

I'm using v1.19.2

To Reproduce
Below test fails while I believe it should not

func TestFxShutdownOnStart_Fail(t *testing.T) {
	srv := fx.New(fx.Invoke(
		fx.Annotate(
			func(s fx.Shutdowner) fx.Shutdowner { return s },
			fx.OnStart(
				func(s fx.Shutdowner) error {
					go func() {
						s.Shutdown()
					}()
					time.Sleep(1 * time.Second)
					return nil
				}),
			fx.OnStop(func() {
				fmt.Printf("\n####### Stop hook called ########\n\n")
			}),
		),
	))
	assert.NoError(t, srv.Start(context.Background()))
	time.Sleep(1 * time.Second)
	ctx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelFunc()
	select {
	case <-srv.Done():
	case <-ctx.Done():
		assert.NoError(t, ctx.Err(), "srv.Done() timed out")
	}
	assert.NoError(t, srv.Stop(context.Background()))
}

output:

[Fx] PROVIDE	fx.Lifecycle <= go.uber.org/fx.New.func1()
[Fx] PROVIDE	fx.Shutdowner <= go.uber.org/fx.(*App).shutdowner-fm()
[Fx] PROVIDE	fx.DotGraph <= go.uber.org/fx.(*App).dotGraph-fm()
[Fx] INVOKE		fx.Annotate(XXXXXXX_test.TestFxShutdownOnStart_Fail.func1()
[Fx] HOOK OnStart		go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1.1() executing (caller: go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1)
[Fx] HOOK OnStart		go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1.1() called by go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1 ran successfully in 1.000347365s
[Fx] RUNNING
[Fx] HOOK OnStop		go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1.1() executing (caller: go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1)

####### Stop hook called ########

[Fx] HOOK OnStop		go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1.1() called by go.uber.org/fx.(*lifecycleHookAnnotation).buildHookInstaller.func1 ran successfully in 15.141µs
--- FAIL: TestFxShutdownOnStart_Fail (7.00s)
    XXXXXXX_test.go:89: 
        	Error Trace:	XXXXXXX_test.go:89
        	Error:      	Received unexpected error:
        	            	context deadline exceeded
        	Test:       	TestFxShutdownOnStart_Fail
        	Messages:   	srv.Done() timed out
FAIL

The workaround is to acquire the srv.Done() channel before starting the application

func TestFxShutdownOnStart_OK(t *testing.T) {
	srv := fx.New(fx.Invoke(
		fx.Annotate(
			func(s fx.Shutdowner) fx.Shutdowner { return s },
			fx.OnStart(
				func(s fx.Shutdowner) error {
					go func() {
						s.Shutdown()
					}()
					time.Sleep(1 * time.Second)
					return nil
				}),
			fx.OnStop(func() {
				fmt.Printf("\n####### Stop hook called ########\n\n")
			}),
		),
	))
	ch := srv.Done()
	assert.NoError(t, srv.Start(context.Background()))
	time.Sleep(1 * time.Second)
	ctx, cancelFunc := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelFunc()
	select {
	case <-ch:
	case <-ctx.Done():
		assert.NoError(t, ctx.Err(), "srv.Done() timed out")
	}
	assert.NoError(t, srv.Stop(context.Background()))
}

Expected behavior
Both test cases above should pass.

Additional context
A quick investigation led me to the conclusion that app.receiver is started after all OnStart hooks have been run as seen here https://github.com/uber-go/fx/blob/master/app.go#L681
recv.last is resetted to nil when receiver is started as seen here https://github.com/uber-go/fx/blob/master/signal.go#L112
but recv.last holds the last received shutdown signal, which is used to immediately notify Done() channels created after shutdown has been triggered, so any channel opened after that will never know that the app has been shut down.

r-hang commented

Duplicates #1035.

phsym commented

Hi @r-hang looking at issue #1035, i'm not sure if this is a duplicate

This is not a duplicate.

r-hang commented

Internally tracked with GO-1939

I believe this will be resolved by #1076

This should be fixed in the latest release.