Why does zipkin lose so much data with high concurrency?

Question

Why does zipkin lose so much data with high concurrency?

JoeHooo opened this issue 5 years ago · 3 comments

Concurrency: 400, loop count: 10
JMeter: [screenshot missing]

ES: [screenshot missing]

Answer 1 · 2019-08-22T13:32:40.000Z

This is probably on the wrong repo. You can chat here: https://gitter.im/openzipkin/zipkin

You can also look at STORAGE_THROTTLE_ENABLED:
https://github.com/openzipkin/zipkin/tree/master/zipkin-server#throttled-storage-experimental

At the end of the day, you will see drop metrics if you overwhelm Elasticsearch (pretty easy to do on a laptop). STORAGE_THROTTLE_ENABLED tries to respond to errors from Elasticsearch and back off accordingly. The best way is actually to use a queue/topic like Kafka instead of HTTP when using an underprovisioned Elasticsearch.

Hope this helps.
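
As a rough illustration of the settings mentioned above, here is a minimal sketch that assembles a `docker run` command for a Zipkin server with throttling turned on and, optionally, a Kafka collector. STORAGE_TYPE, STORAGE_THROTTLE_ENABLED and KAFKA_BOOTSTRAP_SERVERS come from this thread; ES_HOSTS, the openzipkin/zipkin image name, port 9411 and the host names are assumptions about a typical setup, and the helper function itself is hypothetical. Setting KAFKA_BOOTSTRAP_SERVERS only makes Zipkin read from Kafka; something still has to publish spans to the topic.

```python
import shlex


def zipkin_docker_command(es_hosts, kafka_bootstrap=None, throttle=True):
    """Build a `docker run` command line for a Zipkin server (illustrative only)."""
    env = {
        "STORAGE_TYPE": "elasticsearch",
        # Assumption: ES_HOSTS points Zipkin at the Elasticsearch cluster.
        "ES_HOSTS": es_hosts,
        # Experimental throttling discussed in this thread.
        "STORAGE_THROTTLE_ENABLED": "true" if throttle else "false",
    }
    if kafka_bootstrap:
        # Enables the Kafka collector alongside the default HTTP collector.
        env["KAFKA_BOOTSTRAP_SERVERS"] = kafka_bootstrap

    args = ["docker", "run", "-d", "-p", "9411:9411"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append("openzipkin/zipkin")  # assumed image name
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Host names below are placeholders for your own environment.
    print(zipkin_docker_command("http://elasticsearch:9200", "kafka:9092"))
```

Printing the command instead of running it keeps the sketch side-effect free; adjust the values before using it for real.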

Answer 2 · 2019-08-23T01:18:46.000Z

Yes, I've set STORAGE_TYPE before, but it doesn't solve the problem of lost data.
Then I wondered whether I could get Kong's data through Kafka. I set Zipkin's KAFKA_BOOTSTRAP_SERVERS environment variable, but saw no improvement. I guess Kong's data is still sent over HTTP.
Now my question is how to transfer Kong's data to Zipkin through Kafka.

This is my expected architecture diagram

I don't know whether setting the KAFKA_BOOTSTRAP_SERVERS environment variable is enough for Zipkin to collect data through Kafka.

Answer 3 · 2019-08-23T02:09:50.000Z

Zipkin can receive from many transports at the same time. I don't think Kong has an option besides HTTP right now: https://github.com/Kong/kong-plugin-zipkin/blob/master/kong/plugins/zipkin/reporter.lua
It does seem to have some means to sample. Also, as mentioned, you can enable storage throttling, which will reduce the impact of a lot of things thrashing at the same time.

Someone can also check how flushing works here. Usually we bundle many spans together and flush on occasion. ES is known to have problems when a whole bunch of things write at the same time. If the flushing mechanism here can result in that, this could be an issue too. Remember that the "flush concern" applies not just here but to anything you have sending data.

Kafka fixes this problem because it decouples the reporting of data, and how parallel that is, from writes to ES.
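
The decoupling described above can be illustrated with a small, self-contained sketch. It is not Zipkin or Kafka code: an in-process `queue.Queue` stands in for the Kafka topic, `write_batch` is a hypothetical stand-in for one Elasticsearch bulk request, and the numbers (400 reporters, 10 spans each, batches of 100) are assumptions that loosely echo the test in the question. The point is only that many parallel reporters enqueue spans, while a single consumer bundles them and writes one batch at a time.

```python
import queue
import threading


def run_demo(producers=400, spans_each=10, batch_size=100):
    """Return (spans_written, bulk_writes) for a queue-decoupled pipeline."""
    buffer = queue.Queue()  # stands in for a Kafka topic
    stats = {"spans": 0, "writes": 0}

    def report(worker_id):
        # Reporters only enqueue spans; they never touch storage directly.
        for seq in range(spans_each):
            buffer.put({"worker": worker_id, "seq": seq})

    def write_batch(batch):
        # Hypothetical stand-in for one bulk request to storage.
        stats["spans"] += len(batch)
        stats["writes"] += 1

    def consume():
        # A single consumer bundles spans, so storage sees sequential writes.
        batch = []
        while True:
            span = buffer.get()
            if span is None:  # sentinel: all reporters have finished
                break
            batch.append(span)
            if len(batch) >= batch_size:
                write_batch(batch)
                batch = []
        if batch:  # flush whatever is left over
            write_batch(batch)

    consumer = threading.Thread(target=consume)
    consumer.start()
    reporters = [threading.Thread(target=report, args=(n,)) for n in range(producers)]
    for t in reporters:
        t.start()
    for t in reporters:
        t.join()
    buffer.put(None)
    consumer.join()
    return stats["spans"], stats["writes"]


if __name__ == "__main__":
    spans, writes = run_demo()
    print(f"{spans} spans stored in {writes} bulk writes")
```

With the defaults, 400 concurrent reporters produce 4000 spans, but storage only sees 40 sequential bulk writes. A real deployment would tune the batch size and add a time-based flush, which this sketch leaves out.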