performance degradation because of additional data in collector
ramey opened this issue · 2 comments
Problem
In Raccoon, the following buffered channel is used to collect events coming from clients:

```go
bufferChannel := make(chan collection.CollectRequest, config.Worker.ChannelSize)
```
The `CollectRequest` struct is defined as follows:

```go
type CollectRequest struct {
	ConnectionIdentifier identification.Identifier
	TimeConsumed         time.Time
	TimePushed           time.Time
	*pb.SendEventRequest
}
```
`SendEventRequest` is part of the Go code auto-generated from the proto definitions. The generated struct carries a lot of extra data through the channel, including objects of type `sync.Mutex` and `unsafe.Pointer`.
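For illustration, this is roughly the shape protoc-gen-go emits for a message. This is a sketch, not Raccoon's actual generated file: struct tags are omitted, the `Event` stand-in is invented, and the field set is assumed from the names used in this issue. The point is the three internal fields every generated message carries on top of its declared fields:

```go
package pb

import (
	"google.golang.org/protobuf/runtime/protoimpl"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// Event is an illustrative stand-in for the generated Event message.
type Event struct {
	EventBytes []byte
	Type       string
}

// SendEventRequest sketched in the typical protoc-gen-go layout.
type SendEventRequest struct {
	// Internal bookkeeping emitted for every generated message.
	// MessageState holds a no-copy guard built on sync.Mutex and an
	// atomically loaded pointer (unsafe.Pointer under the hood).
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	// Declared fields (names assumed from this issue; tags omitted).
	ReqGuid  string
	SentTime *timestamppb.Timestamp
	Events   []*Event
}
```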
After changing the definition of the `ProduceBulk` method on `publisher.KafkaProducer` from

```go
ProduceBulk(events []*pb.Event, deliveryChannel chan kafka.Event) error
```

to

```go
ProduceBulk(request collection.CollectRequest, deliveryChannel chan kafka.Event) error
```

there was an increase in `event_processing_duration_milliseconds`.
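One way to test whether the size of the value sent over the channel matters is a micro-benchmark comparing a slim value against a fat one. This is a hypothetical sketch, not code from the Raccoon repository; the struct shapes are stand-ins for the two `CollectRequest` variants:

```go
package collection_test

import (
	"sync"
	"testing"
	"time"
)

// slimRequest stands in for a trimmed-down CollectRequest.
type slimRequest struct {
	EventBytes []byte
	Type       string
	SentTime   time.Time
}

// fatRequest stands in for a CollectRequest dragging generated-message
// baggage along. The mutex is zero-valued and never locked here; it only
// models the internal state of a generated proto struct.
type fatRequest struct {
	slimRequest
	mu      sync.Mutex
	padding [256]byte // models the remaining generated fields
}

func benchmarkSend[T any](b *testing.B) {
	ch := make(chan T, 100)
	go func() {
		for range ch { // drain so senders never block permanently
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var v T
		ch <- v
	}
	close(ch)
}

func BenchmarkSlimChannelSend(b *testing.B) { benchmarkSend[slimRequest](b) }

func BenchmarkFatChannelSend(b *testing.B) { benchmarkSend[fatRequest](b) }
```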
What is the impact?
Intermittent latency spikes.
Which version was this found?
The issue was observed in v0.1.3.
Solution
Change the definition of `CollectRequest` to pass only the data the worker needs to process the event, such as `EventBytes`, `Type`, and `SentTime`.
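A minimal sketch of what that trimmed struct could look like, assuming the field names mentioned above; the field types and import path are assumptions for illustration, not the final implementation:

```go
package collection

import (
	"time"

	"raccoon/identification" // import path assumed for illustration
)

// CollectRequest trimmed down to what the worker needs, with the
// heavyweight *pb.SendEventRequest embedding removed.
type CollectRequest struct {
	ConnectionIdentifier identification.Identifier
	TimeConsumed         time.Time
	TimePushed           time.Time

	EventBytes []byte    // raw serialized event payload
	Type       string    // event type, e.g. for topic routing
	SentTime   time.Time // client-side send time from the original request
}
```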
The original implementation creates a new EventsBatch request from the incoming message payload and passes it to the channel. There was never a need to flatten out the request; the publisher and the workers worked with this pointer reference.
With the fix in this PR, I see that the request is flattened out in `CollectRequest` with `[]Events`. Why is there a need to do this?
@chakravarthyvp will not do additional refactoring here and will just optimize on other aspects