ethersphere/bee

2.0.0 Pin Deadlock?

ldeffenb opened this issue · 5 comments

Context

bee 2.0.0 on mainnet - Actually running my forked branch at https://github.com/ldeffenb/bee/tree/2.0.0-prerelease-hacks

Summary

I'm working on re-pinning missing chunks from the swarm into my OSM dataset. I do parallel pins from a TypeScript script. After successfully pinning 3 references, my scripts are hanging with outstanding pin requests that are not being satisfied.

Expected behavior

Pin requests should either succeed or fail, they shouldn't just hang out.

Actual behavior

Several pin requests are hanging. Here's the pin-related sections of a /debug/pprof/goroutine, but your line numbers might not line up because I have some logging hack in my pin services (see github fork link above). I'll see if I can find the logs that go along with these pending pins as well.

3 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x1107a2e 0x11a3d85 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
#	0x101033d	github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd								github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
#	0xa3d714	github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34									github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#	0x100eb10	github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0								github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#	0x1107a2d	github.com/ethersphere/bee/v2/pkg/traversal.(*service).Traverse+0x10d								github.com/ethersphere/bee/v2/pkg/traversal/traversal.go:58
#	0x11a3d84	github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x484								github.com/ethersphere/bee/v2/pkg/api/pin.go:66
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x10626cb	github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b									github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
#	0x1062564	github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44								github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
#	0x114e944	github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4										github.com/gorilla/mux@v1.8.0/mux.go:210
#	0x1186e89	github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9							github.com/ethersphere/bee/v2/pkg/api/api.go:646
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d1d11	github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91		github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d1bae	github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e	github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d055e	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e		github.com/gorilla/handlers@v1.4.2/compress.go:148
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11cfdfa	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda								github.com/ethersphere/bee/v2/pkg/api/router.go:93
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11b2cf9	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139					github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x83f76d	net/http.serverHandler.ServeHTTP+0x8d												net/http/server.go:2938
#	0x83b653	net/http.(*conn).serve+0x5f3	
3 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46de25 0x48fa48 0x11a3da5 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
#	0x46de24	sync.runtime_Semacquire+0x24													runtime/sema.go:62
#	0x48fa47	sync.(*WaitGroup).Wait+0x47													sync/waitgroup.go:116
#	0x11a3da4	github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x4a4								github.com/ethersphere/bee/v2/pkg/api/pin.go:113
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x10626cb	github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b									github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
#	0x1062564	github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44								github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
#	0x114e944	github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4										github.com/gorilla/mux@v1.8.0/mux.go:210
#	0x1186e89	github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9							github.com/ethersphere/bee/v2/pkg/api/api.go:646
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d1d11	github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91		github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d1bae	github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e	github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11d055e	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e		github.com/gorilla/handlers@v1.4.2/compress.go:148
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11cfdfa	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda								github.com/ethersphere/bee/v2/pkg/api/router.go:93
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x11b2cf9	github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139					github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
#	0x83ca88	net/http.HandlerFunc.ServeHTTP+0x28												net/http/server.go:2136
#	0x83f76d	net/http.serverHandler.ServeHTTP+0x8d												net/http/server.go:2938
#	0x83b653	net/http.(*conn).serve+0x5f3	
2 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x11a5052 0x471f81
#	0x101033d	github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd		github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
#	0xa3d714	github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34			github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#	0x100eb10	github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0		github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#	0x11a5051	github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1	github.com/ethersphere/bee/v2/pkg/api/pin.go:94
2 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46df45 0xfe71d5 0xfe71d6 0xa3d715 0x1010ec2 0xa3d715 0x100eb11 0x101008c 0xa3d715 0x100eb11 0x11a5052 0x471f81
#	0x46df44	sync.runtime_SemacquireRWMutexR+0x24							runtime/sema.go:82
#	0xfe71d4	sync.(*RWMutex).RLock+0x1b4								sync/rwmutex.go:71
#	0xfe71d5	github.com/ethersphere/bee/v2/pkg/storer/internal/cache.(*Cache).Getter.func1+0x1b5	github.com/ethersphere/bee/v2/pkg/storer/internal/cache/cache.go:183
#	0xa3d714	github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34				github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#	0x1010ec1	github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1.(*DB).Lookup.func3+0xe1	github.com/ethersphere/bee/v2/pkg/storer/cachestore.go:76
#	0xa3d714	github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34				github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#	0x100eb10	github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0			github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#	0x101008b	github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x32b			github.com/ethersphere/bee/v2/pkg/storer/netstore.go:95
#	0xa3d714	github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34				github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
#	0x100eb10	github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0			github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
#	0x11a5051	github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1		github.com/ethersphere/bee/v2/pkg/api/pin.go:94

Steps to reproduce

Attempt to pin references that are not necessarily local to the node. My script is rummaging the swarm to find chunks owned by the OSM batch and then attempts to pin them locally. Some of these references are multi-chunk BMTs with some intermediate chunks not present. I don't know if that is the case on these hung pin requests, but it is distinctly possible. The 3 pins that actually succeeded since upgraded to 2.0.0 were single chunk references, likely mantaray nodes.

Possible solution

Wish I had one, but I wanted to get this report started ASAP.

The full goroutines page:

issue-4624-goroutines.txt

And the full goroutines stackdump page:

issue-4624-full-goroutine-stackdump.txt

Extract of my hacked-version verbosity 5 logs for pin and the 3 references that don't seem to have completed at the end.

issue-4624-pin.log

Actually, it is quite possible this deadlock only exists in my fork. I just found this bad merge that duplicated the locking in the cache putter. Granted, the defers should unlock it, but I'm not sure you can lock either of these locks twice from a single goroutine.

https://github.com/ldeffenb/bee/blob/bbd504f82bc95a47f6c89018f93e1898aae22f94/pkg/storer/internal/cache/cache.go#L114

Looks like this was my own stupid merge misteak (sic). Closing unless it happens again now that I've removed my duplicated locking block.