openzipkin/zipkin-gcp

Implement pubsub

codefromthecrypt opened this issue · 7 comments

In #45, there was a proposal to add pubsub transport (collector and sender), but that never happened.

@javierviera is currently adding openzipkin/zipkin-go#142 on the golang side, but that's asymmetric and confusing if no server side exists.

In any case, the message format should be standard (ListOfSpans in proto or JSON).
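As a rough illustration of the JSON option, here is a minimal sketch of wrapping a list of Zipkin v2 spans as one Pub/Sub message body. It assumes the REST representation of a message, where `data` is a base64 string. The function names `encode_spans_message` and `decode_spans_message` are hypothetical, not part of any Zipkin library.

```python
import base64
import json


def encode_spans_message(spans):
    """Encode a list of Zipkin v2 span dicts as a Pub/Sub REST message body.

    Hypothetical helper: the payload is a plain JSON array of spans,
    base64-encoded because the REST API carries `data` as base64 text.
    """
    payload = json.dumps(spans, separators=(",", ":")).encode("utf-8")
    return {"data": base64.b64encode(payload).decode("ascii")}


def decode_spans_message(message):
    """Reverse of encode_spans_message: return the list of span dicts."""
    return json.loads(base64.b64decode(message["data"]))


if __name__ == "__main__":
    span = {
        "traceId": "5af7183fb1d4cf5f",
        "id": "352bff9a74ca9ad2",
        "name": "get /",
        "timestamp": 1556604172355737,
        "duration": 1431,
        "localEndpoint": {"serviceName": "frontend"},
    }
    message = encode_spans_message([span])
    print(decode_spans_message(message) == [span])
```

Keeping the payload a plain list of spans means a collector can hand it to the same decoding path used by the other transports.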

Implementation-wise, the sender (Java client) should likely use gRPC, as that's typical. The collector should likely use Armeria, as that has fewer dependencies and fits our normal observability tools (logs, metrics, tracing) better; see stackdriver-storage for an example.

Looking at the API, it seems there's no gRPC endpoint for pubsub, but there's a REST API which likely shares similar auth etc. There seems to be a pull API which could be run in a loop, similar to our Kafka collectors: https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/pull
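To make the pull-in-a-loop idea concrete, here is a minimal sketch of one iteration against the REST `pull` and `acknowledge` methods. It assumes each message body is a JSON list of spans. The `post(url, body)` transport, which would add auth and return the parsed JSON response, is hypothetical, as are the names `pull_once` and `collect`.

```python
import base64
import json

PUBSUB = "https://pubsub.googleapis.com/v1/"


def pull_once(post, subscription, collect, max_messages=100):
    """Pull one batch, hand decoded span lists to `collect`, then ack.

    `post(url, body) -> dict` is a hypothetical transport (assumption).
    `subscription` looks like "projects/<project>/subscriptions/<name>".
    Returns the number of messages acknowledged.
    """
    response = post(PUBSUB + subscription + ":pull",
                    {"maxMessages": max_messages})
    ack_ids = []
    for received in response.get("receivedMessages", []):
        data = received["message"].get("data", "")
        try:
            spans = json.loads(base64.b64decode(data))
        except ValueError:
            # Malformed payload: ack it anyway so it is not redelivered
            # forever (a design choice for this sketch).
            ack_ids.append(received["ackId"])
            continue
        collect(spans)
        ack_ids.append(received["ackId"])
    if ack_ids:
        post(PUBSUB + subscription + ":acknowledge", {"ackIds": ack_ids})
    return len(ack_ids)


def run(post, subscription, collect, should_stop):
    """Call pull_once repeatedly until should_stop() returns True."""
    while not should_stop():
        pull_once(post, subscription, collect)
```

Acknowledging only after `collect` returns means that if storage raises, the batch goes unacked and is redelivered, which is roughly how the other collectors behave when storage fails.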

@anuraaga have you done any work in pubsub in curiostack?

There is a gRPC Pub/Sub API. The subscriber side has streaming and pull methods, with pull being somewhat more predictable (streaming results in a single subscriber hogging a fairly large cache of messages, not allowing other load balanced subscribers to help).

I've worked with Pub/Sub plenty, but I am still getting oriented in Zipkin, and I seem to be behind on everything I've already promised to do. /temporarily backs into the bushes/

https://github.com/curioswitch/curiostack/tree/master/common/google-cloud/pubsub

This is an implementation of pubsub using Armeria. It still depends on the official client library for the interface so that it can be swapped in transparently, but that's easy to remove.

It would still depend on the gRPC stubs, though. To remove the gRPC dependency, similar to our stackdriver storage, we'd need to add stubless streaming support to armeria-grpc-protocol. Not trivial :)

I don't think there are any billing implications. For what it's worth, the official Java client, which exposes its own API and hides the gRPC, uses the streaming API, so I suspect it is effectively better tested than the older unary pull.

Agree that the messaging API itself would be agnostic to whether it is served by the official library, or armeria-grpc, or armeria-grpc-protocol.

Like @elefeint said, there are some gotchas with the streaming pull, especially if you have more than one consumer. I would suggest reading the "StreamingPull" section in the docs.