Tracing: looking at how incoming messages move through the system
hannahhoward opened this issue · 5 comments
Goals
Now that we have basic request tracing, it would be really helpful to understand how incoming responses feed into traversals on the request side.
What we'd like is:
- record spans for the processing of incoming response messages to see how long processing takes
- link those spans to block loads on a traversal in order to see when a traversal made usage of an incoming block
- this will require some fairly deeply nested passing of spans. The suggested set of steps is:
- start a new span inhttps://github.com/ipfs/go-graphsync/blob/main/requestmanager/server.go#L265
- pass into AsyncLoader.ProcessResponse https://github.com/ipfs/go-graphsync/blob/main/requestmanager/server.go#L271 / https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/asyncloader.go#L106
- pass into ResponseCache.ProcessResponse https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/asyncloader.go#L122 / https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/responsecache/responsecache.go#L70
- make a span Link https://pkg.go.dev/go.opentelemetry.io/otel/trace#LinkFromContext and pass to UnverifiedBlockStore.AddUnverifiedBlock https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/responsecache/responsecache.go#L76
- save the span Link in the UnverifiedBlockStore map -- will need to augment this data structure :( -- https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/unverifiedblockstore/unverifiedblockstore.go#L19
- in UnverifiedBlockStore.VerifyBlock -- if the data is found (https://github.com/ipfs/go-graphsync/blob/main/requestmanager/asyncloader/unverifiedblockstore/unverifiedblockstore.go#L67) , start a span from the context in the ipld.LinkContext -- this should contain the span for a request -- and use the WithLinks option to link the span block loading from the response cache to the original message that block came in through -- https://pkg.go.dev/go.opentelemetry.io/otel/trace#WithLinks
start a span from the context in the ipld.LinkContext -- this should contain the span for a request
I've just tried tracing the source of the ipld.LinkContext
in VerifyBlock()
and I don't think it comes with a span. It seems to originate in ipldutil.TraversalBuilder{}.Start(ctx)
in RequestManager#requestTask()
which uses rm.ctx
as its parent, which I believe comes from the root GS instance and not with a span.
I'll try and implement this as outlined but I'm suspecting that we're going to get unparented block load traces out of this.
yeah, the VerifyBlock()
linked spans don't have a parent because of the above reason:
([]interface {}) (len=128) {
(string) (len=53) "responseMessage(0)->loaderProcess(0)->cacheProcess(0)",
(string) (len=14) "verifyBlock(0)",
(string) (len=53) "responseMessage(1)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(2)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(3)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(4)->loaderProcess(0)->cacheProcess(0)",
(string) (len=14) "verifyBlock(1)",
(string) (len=14) "verifyBlock(2)",
(string) (len=14) "verifyBlock(3)",
(string) (len=14) "verifyBlock(4)",
(string) (len=14) "verifyBlock(5)",
(string) (len=53) "responseMessage(5)->loaderProcess(0)->cacheProcess(0)",
(string) (len=14) "verifyBlock(6)",
(string) (len=14) "verifyBlock(7)",
(string) (len=14) "verifyBlock(8)",
(string) (len=14) "verifyBlock(9)",
(string) (len=15) "verifyBlock(10)",
(string) (len=53) "responseMessage(6)->loaderProcess(0)->cacheProcess(0)",
(string) (len=15) "verifyBlock(11)",
(string) (len=53) "responseMessage(7)->loaderProcess(0)->cacheProcess(0)",
(string) (len=15) "verifyBlock(12)",
(string) (len=53) "responseMessage(8)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(9)->loaderProcess(0)->cacheProcess(0)",
(string) (len=15) "verifyBlock(13)",
(string) (len=54) "responseMessage(10)->loaderProcess(0)->cacheProcess(0)",
(string) (len=15) "verifyBlock(14)",
(string) (len=15) "verifyBlock(15)",
(string) (len=15) "verifyBlock(16)",
(string) (len=15) "verifyBlock(17)",
(string) (len=15) "verifyBlock(18)",
(string) (len=15) "verifyBlock(19)",
(string) (len=54) "responseMessage(11)->loaderProcess(0)->cacheProcess(0)",
(string) (len=15) "verifyBlock(20)",
...
OK, so we have a span available in the right place, so that can be wired in to the passed context, and we get:
(string) (len=53) "responseMessage(0)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(1)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(2)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(3)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(4)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(5)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(6)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(7)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(8)->loaderProcess(0)->cacheProcess(0)",
(string) (len=53) "responseMessage(9)->loaderProcess(0)->cacheProcess(0)",
(string) (len=54) "responseMessage(10)->loaderProcess(0)->cacheProcess(0)",
... (x 39)
(string) (len=26) "request(0)->verifyBlock(0)",
(string) (len=26) "request(0)->verifyBlock(1)",
(string) (len=26) "request(0)->verifyBlock(2)",
(string) (len=26) "request(0)->verifyBlock(3)",
(string) (len=26) "request(0)->verifyBlock(4)",
(string) (len=26) "request(0)->verifyBlock(5)",
(string) (len=26) "request(0)->verifyBlock(6)",
(string) (len=26) "request(0)->verifyBlock(7)",
(string) (len=26) "request(0)->verifyBlock(8)",
(string) (len=26) "request(0)->verifyBlock(9)",
(string) (len=27) "request(0)->verifyBlock(10)",
... (x 99)
nice!