census-instrumentation/opencensus-go

goroutine leak detected

Opened this issue ยท 4 comments

Please answer these questions before submitting a bug report.

What version of OpenCensus are you using?

v0.22.0

What version of Go are you using?

go1.13.6

What did you do?

I was running the new goleak tool on my codebase and ran into a leak coming from OpenCensus.

goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 37 in state select, with go.opencensus.io/stats/view.(*worker).start on top of the stack:
goroutine 37 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000143310)
	/Users/adam/code/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:154 +0x100
created by go.opencensus.io/stats/view.init.0
	/Users/adam/code/pkg/mod/go.opencensus.io@v0.22.0/stats/view/worker.go:32 +0x57
]

What did you expect to see?

I expected no goroutine leaks in libraries I depend on.

What did you see instead?

A crash from goleaks.

Additional context

I'm pulling this library in via gocloud.dev/secrets v0.17.0, but that doesn't matter here since v0.22.0 of OpenCensus is used in both.

This appears to be coming from internal global state of a default recorder being set.

https://github.com/census-instrumentation/opencensus-go/blob/v0.22.0/stats/view/worker.go#L30-L34

Also, there doesn't seem to be a way to retrieve this reporter to shutdown on application shutdown.

https://github.com/census-instrumentation/opencensus-go/search?q=DefaultRecorder&unscoped_q=DefaultRecorder

Here is a workaround that I used:

ignoreOpenCensus := goleak.IgnoreTopFunction("go.opencensus.io/stats/view.(*worker).start")
defer goleak.VerifyNone(suite.T(), ignoreOpenCensus)

Just wanted to bump one and add a little context โ€” a lot of us are running goleak in our tests because it's easy to make mistakes around leaking goroutines in Go, and once you have enough code written in a project, you probably are somewhere, and goleak is a great way to hedge against that and protect yourself against bad production problems before they happen.

We're not using this module directly, but it comes in transitively via a Google module we're using, and even when we don't use that Google module directly, this package's goroutine still starts because it's in an init, so we have to protect against its presence in every one of our project's package test suites.

I think you'd find reasonably good consensus (pun intended) out there that it's bad form to (1) starting goroutines in init, and (2) not provide any way of stopping goroutines that've started. Our project sits at around 160 module dependencies including transitive ones, and OpenCensus is the only one that behaves this way.

If you're open to it, even moving the goroutine to start lazily on first use would be an improvement. Google's Go APIs that are using OpenCensus provide a WithTelemetryDisabled option to disable it, but as far as the goroutine is concerned it doesn't make a difference โ€” it starts regardless.

@brandur That all sounds reasonable. Given the state of the project, we have to be careful not to break existing users, but lazily starting the goroutine should be backwards-compatible. Let me know if you are planning to put up a PR to fix it.

@dashpole I opened a pull request at #1287 in case you have a change to take a look. I think it fulfills the backwards compatibility requirement.

@adamdecaf One thing I noticed while in there is that there is a package-level view.Stop() function that you can probably use to satisfy the goroutine checker.