2.0.0 Pin Deadlock?
ldeffenb opened this issue · 5 comments
Context
bee 2.0.0 on mainnet - Actually running my forked branch at https://github.com/ldeffenb/bee/tree/2.0.0-prerelease-hacks
Summary
I'm working on re-pinning missing chunks from the swarm into my OSM dataset. I do parallel pins from a TypeScript script. After successfully pinning 3 references, my scripts are hanging with outstanding pin requests that are not being satisfied.
Expected behavior
Pin requests should either succeed or fail, they shouldn't just hang out.
Actual behavior
Several pin requests are hanging. Here's the pin-related sections of a /debug/pprof/goroutine
, but your line numbers might not line up because I have some logging hack in my pin services (see github fork link above). I'll see if I can find the logs that go along with these pending pins as well.
3 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x1107a2e 0x11a3d85 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
# 0x101033d github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
# 0xa3d714 github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34 github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
# 0x100eb10 github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0 github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
# 0x1107a2d github.com/ethersphere/bee/v2/pkg/traversal.(*service).Traverse+0x10d github.com/ethersphere/bee/v2/pkg/traversal/traversal.go:58
# 0x11a3d84 github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x484 github.com/ethersphere/bee/v2/pkg/api/pin.go:66
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x10626cb github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
# 0x1062564 github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44 github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
# 0x114e944 github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4 github.com/gorilla/mux@v1.8.0/mux.go:210
# 0x1186e89 github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9 github.com/ethersphere/bee/v2/pkg/api/api.go:646
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d1d11 github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91 github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d1bae github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d055e github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e github.com/gorilla/handlers@v1.4.2/compress.go:148
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11cfdfa github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda github.com/ethersphere/bee/v2/pkg/api/router.go:93
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11b2cf9 github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139 github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x83f76d net/http.serverHandler.ServeHTTP+0x8d net/http/server.go:2938
# 0x83b653 net/http.(*conn).serve+0x5f3
3 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46de25 0x48fa48 0x11a3da5 0x83ca89 0x10626cc 0x1062565 0x114e945 0x1186e8a 0x83ca89 0x11d1d12 0x83ca89 0x11d1baf 0x83ca89 0x11d055f 0x83ca89 0x11cfdfb 0x83ca89 0x11b2cfa 0x83ca89 0x83f76e 0x83b654 0x471f81
# 0x46de24 sync.runtime_Semacquire+0x24 runtime/sema.go:62
# 0x48fa47 sync.(*WaitGroup).Wait+0x47 sync/waitgroup.go:116
# 0x11a3da4 github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash+0x4a4 github.com/ethersphere/bee/v2/pkg/api/pin.go:113
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x10626cb github.com/ethersphere/bee/v2/pkg/jsonhttp.HandleMethods+0x12b github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:25
# 0x1062564 github.com/ethersphere/bee/v2/pkg/jsonhttp.MethodHandler.ServeHTTP+0x44 github.com/ethersphere/bee/v2/pkg/jsonhttp/handlers.go:17
# 0x114e944 github.com/gorilla/mux.(*Router).ServeHTTP+0x1c4 github.com/gorilla/mux@v1.8.0/mux.go:210
# 0x1186e89 github.com/ethersphere/bee/v2/pkg/api.(*Service).corsHandler.func1+0x4c9 github.com/ethersphere/bee/v2/pkg/api/api.go:646
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d1d11 github.com/ethersphere/bee/v2/pkg/api.(*Service).pageviewMetricsHandler-fm.(*Service).pageviewMetricsHandler.func1+0x91 github.com/ethersphere/bee/v2/pkg/api/metrics.go:89
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d1bae github.com/ethersphere/bee/v2/pkg/api.(*Service).responseCodeMetricsHandler-fm.(*Service).responseCodeMetricsHandler.func1+0x6e github.com/ethersphere/bee/v2/pkg/api/metrics.go:97
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11d055e github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1.CompressHandler.CompressHandlerLevel.func1+0x69e github.com/gorilla/handlers@v1.4.2/compress.go:148
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11cfdfa github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.func1.1+0xda github.com/ethersphere/bee/v2/pkg/api/router.go:93
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x11b2cf9 github.com/ethersphere/bee/v2/pkg/api.(*Service).MountAPI.NewHTTPAccessLogHandler.func2.1+0x139 github.com/ethersphere/bee/v2/pkg/log/httpaccess/http_access.go:41
# 0x83ca88 net/http.HandlerFunc.ServeHTTP+0x28 net/http/server.go:2136
# 0x83f76d net/http.serverHandler.ServeHTTP+0x8d net/http/server.go:2938
# 0x83b653 net/http.(*conn).serve+0x5f3
2 @ 0x43eb0e 0x44ef85 0x101033e 0xa3d715 0x100eb11 0x11a5052 0x471f81
# 0x101033d github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x5dd github.com/ethersphere/bee/v2/pkg/storer/netstore.go:106
# 0xa3d714 github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34 github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
# 0x100eb10 github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0 github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
# 0x11a5051 github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1 github.com/ethersphere/bee/v2/pkg/api/pin.go:94
2 @ 0x43eb0e 0x44ffb8 0x44ff8f 0x46df45 0xfe71d5 0xfe71d6 0xa3d715 0x1010ec2 0xa3d715 0x100eb11 0x101008c 0xa3d715 0x100eb11 0x11a5052 0x471f81
# 0x46df44 sync.runtime_SemacquireRWMutexR+0x24 runtime/sema.go:82
# 0xfe71d4 sync.(*RWMutex).RLock+0x1b4 sync/rwmutex.go:71
# 0xfe71d5 github.com/ethersphere/bee/v2/pkg/storer/internal/cache.(*Cache).Getter.func1+0x1b5 github.com/ethersphere/bee/v2/pkg/storer/internal/cache/cache.go:183
# 0xa3d714 github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34 github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
# 0x1010ec1 github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1.(*DB).Lookup.func3+0xe1 github.com/ethersphere/bee/v2/pkg/storer/cachestore.go:76
# 0xa3d714 github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34 github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
# 0x100eb10 github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0 github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
# 0x101008b github.com/ethersphere/bee/v2/pkg/storer.(*DB).Download.func1+0x32b github.com/ethersphere/bee/v2/pkg/storer/netstore.go:95
# 0xa3d714 github.com/ethersphere/bee/v2/pkg/storage.GetterFunc.Get+0x34 github.com/ethersphere/bee/v2/pkg/storage/chunkstore.go:60
# 0x100eb10 github.com/ethersphere/bee/v2/pkg/storer.getterWithMetrics.Get+0xb0 github.com/ethersphere/bee/v2/pkg/storer/metrics.go:203
# 0x11a5051 github.com/ethersphere/bee/v2/pkg/api.(*Service).pinRootHash.func1.1+0x2d1 github.com/ethersphere/bee/v2/pkg/api/pin.go:94
Steps to reproduce
Attempt to pin references that are not necessarily local to the node. My script is rummaging the swarm to find chunks owned by the OSM batch and then attempts to pin them locally. Some of these references are multi-chunk BMTs with some intermediate chunks not present. I don't know if that is the case on these hung pin requests, but it is distinctly possible. The 3 pins that actually succeeded since upgraded to 2.0.0 were single chunk references, likely mantaray nodes.
Possible solution
Wish I had one, but I wanted to get this report started ASAP.
The full goroutines page:
And the full goroutines stackdump page:
Extract of my hacked-version verbosity 5 logs for pin and the 3 references that don't seem to have completed at the end.
Actually, it is quite possible this deadlock only exists in my fork. I just found this bad merge that duplicated the locking in the cache putter. Granted, the defers should unlock it, but I'm not sure you can lock either of these locks twice from a single goroutine.
Looks like this was my own stupid merge misteak (sic). Closing unless it happens again now that I've removed my duplicated locking block.