Graylog2/graylog2-server

Beats input counts received bulks instead of single messages

Closed this issue · 14 comments

hc4 commented

Problem description

I installed the new Beats plugin and found that the IN message counter in the top right shows not the message count but the bulk count (in Beats, a single TCP packet contains many messages).
Also, it seems that these bulks are counted throughout the whole processing engine:

  • the input and process buffers on the Node page show the count of bulks, not messages
  • journal usage shows the count of bulks

The only places where I see the actual message count are the output buffer and the OUT counter in the top right :)

It might be relevant that BeatsCodec implements decodeMessages instead of the decode method.

Environment

  • Graylog Version: 2.0.1
  • Beats plugin Version: 1.0.1

@hc4 Please file this issue in the repository of the Graylog Beats plugin at https://github.com/Graylog2/graylog-plugin-beats

Upon thinking about it, it's more of a display issue with codecs supporting the spawning of multiple messages (like the mentioned BeatsCodec), so I'll reopen the issue.

hc4 commented

Yep. I thought about that too, which is why I created the issue here.

I believe what is described here is by design.

Graylog counts incoming messages at the input level. Historically we only allowed a single incoming message to produce a single outgoing message. That changed somewhere between 1.3 and 2.0.

Since Graylog sees a single beats packet, it only counts a single "message". The beats codec then produces multiple logical messages from it. That's what the output counts.
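To make the distinction concrete, here is a minimal Java sketch assuming hypothetical `RawPayload`, `MultiCodec` and `Counters` types (simplified stand-ins, not Graylog's actual classes). The input-side counter is incremented once per received packet, while the output-side counter is incremented once per message the codec produces, so the two only agree when every packet carries exactly one event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a raw Beats packet; one packet may carry many events.
final class RawPayload {
    final List<String> events;
    RawPayload(List<String> events) { this.events = events; }
}

// Hypothetical multi-message codec: one raw payload can yield several messages.
interface MultiCodec {
    Collection<String> decodeMessages(RawPayload raw);
}

final class Counters {
    long in;   // like the "IN" counter: one increment per raw payload
    long out;  // like the "OUT" counter: one increment per decoded message

    void receive(RawPayload raw, MultiCodec codec) {
        in++;                                    // counted before decoding
        Collection<String> messages = codec.decodeMessages(raw);
        out += messages.size();                  // counted after decoding
    }

    public static void main(String[] args) {
        MultiCodec codec = raw -> new ArrayList<>(raw.events);
        Counters c = new Counters();
        c.receive(new RawPayload(Arrays.asList("a", "b", "c")), codec);
        c.receive(new RawPayload(Arrays.asList("d")), codec);
        System.out.println("in=" + c.in + " out=" + c.out); // prints in=2 out=4
    }
}
```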

I don't believe this is a bug. If you look at the process buffer counts, you should see equal numbers.

hc4 commented

No... the process buffer shows 1-2 messages.
The journal utilization also shows the count of bulks.
Even the read/appended message counts show the bulk count.

hc4 commented

Also, showing the number of bulks in the top right makes no sense, because the bulk size is highly random (it could be 1 message or 10000 messages).

The bulk size is unknown at the time the (raw, unparsed) message from *beats is written to the journal, so the display is correct. Maybe we should rename it from "messages" to "entries" as each journal entry can contain multiple messages.

Once those raw messages have been parsed by BeatsCodec, the displayed number of messages (e.g. in the process and output buffers) should be correct.

hc4 commented

Internally, BeatsFrameDecoder knows the message count.
Is it possible to somehow pass this count to the counting logic?
For example, the ChannelHandlerContext could be used to pass the actual message count from the frame decoder to RawMessageHandler, which could set a new MessagesCount field in RawMessage that statistics and other modules could later read.
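A minimal sketch of this proposal, assuming a hypothetical `RawMessageWithCount` type (Graylog's real `RawMessage` has no such field) that carries the event count the frame decoder already knows, plus a hypothetical `InputMetrics` class that records both numbers. Note that the count is only cheap to obtain on the IO thread because, per the comment above, BeatsFrameDecoder has already parsed the bulk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical: a raw journal entry that also records how many events it holds.
final class RawMessageWithCount {
    final byte[] payload;
    final int messageCount; // filled in by the frame decoder, which parsed the bulk

    RawMessageWithCount(byte[] payload, int messageCount) {
        if (messageCount < 0) {
            throw new IllegalArgumentException("messageCount must not be negative");
        }
        this.payload = payload;
        this.messageCount = messageCount;
    }
}

// Hypothetical metrics holder; AtomicLong because several IO threads may record concurrently.
final class InputMetrics {
    private final AtomicLong entries = new AtomicLong();  // what is shown today
    private final AtomicLong messages = new AtomicLong(); // what the proposal would add

    void record(RawMessageWithCount raw) {
        entries.incrementAndGet();
        messages.addAndGet(raw.messageCount);
    }

    long entries() { return entries.get(); }
    long messages() { return messages.get(); }

    public static void main(String[] args) {
        InputMetrics metrics = new InputMetrics();
        metrics.record(new RawMessageWithCount(new byte[0], 500));
        metrics.record(new RawMessageWithCount(new byte[0], 1));
        System.out.println(metrics.entries() + " entries, " + metrics.messages() + " messages");
    }
}
```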

Or maybe the FrameDecoder could return a collection of ChannelBuffers instead of a single ChannelBuffer.

It is important for me to see the actual message count and not bulks, because currently I always see ~10 incoming messages regardless of how much data is actually coming in.

Or maybe it is easier to convert the Beats input to the single-message model, as it was in the old version?

hc4 commented

I think I found a good solution for protocols like Lumberjack, where the actual number of messages in a bulk is known.
The unfold feature of FrameDecoder could be used (see the example in its documentation).
The decode method simply has to return an array of messages, with the unfold flag set to true:

    return new Object[] { firstMessage, secondMessage, ... };

So it could be implemented in the Beats input, and then this issue could be closed.
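Netty 3's `FrameDecoder` can be constructed with `unfold` set to `true`; when `decode()` then returns an `Object[]` or an `Iterable`, each element is passed downstream as a separate message. The standalone Java model below imitates that dispatch rule without depending on Netty. The `UnfoldModel` class is hypothetical and simplified (it ignores channel events and remote addresses).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified, Netty-free model of FrameDecoder's "unfold" dispatch (hypothetical class).
final class UnfoldModel {
    static void dispatch(Object result, boolean unfold, Consumer<Object> downstream) {
        if (result == null) {
            return; // null from decode() means "not enough bytes for a frame yet"
        }
        if (unfold && result instanceof Object[]) {
            for (Object item : (Object[]) result) {
                downstream.accept(item);       // one downstream event per array element
            }
        } else if (unfold && result instanceof Iterable<?>) {
            for (Object item : (Iterable<?>) result) {
                downstream.accept(item);       // one downstream event per element
            }
        } else {
            downstream.accept(result);         // whole result is a single event
        }
    }

    public static void main(String[] args) {
        List<Object> seen = new ArrayList<>();
        dispatch(new Object[] {"first", "second", "third"}, true, seen::add);
        System.out.println(seen.size()); // prints 3
        seen.clear();
        dispatch(new Object[] {"first", "second", "third"}, false, seen::add);
        System.out.println(seen.size()); // prints 1
    }
}
```

In this model each unfolded element reaches the next handler on its own, which is presumably why the per-message count becomes visible at the input once the Beats decoder returns an array.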

hc4 commented

I've implemented the unfold logic in the Beats plugin, and it works great :)
Here are the changed files:

Please open a pull request for discussing changes, otherwise it is really difficult to refer to them.

Without having seen the diff, I guess that it changes the behavior to do more work when receiving the beats packet, which keeps the IO thread busier.
It is very important to perform less work on that level, which is why the message decoding happens after persisting incoming raw messages into the journal.

From your previous comments, I guess what you are really asking for is to see throughput in terms of Message objects, rather than input and output at the edges of Graylog.

That could well be done, since the data is available and is merely a matter of requesting different metrics to display.

However, the true incoming data metrics are based on whatever an input produces, and anything that can be slower (due to the unknown amount of work represented by the decoding step) must stay off the IO threads to avoid blocking them.
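The trade-off described above can be modeled with a queue. This Java sketch is hypothetical: an in-memory `BlockingQueue` stands in for Graylog's on-disk journal, and the class and method names are illustrative. The IO side only appends the raw bulk and bumps a counter, so its cost does not depend on how many events the bulk contains; decoding and the per-message count happen later on a processing thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: an in-memory queue stands in for the on-disk journal.
final class JournalPipeline {
    final BlockingQueue<List<String>> journal = new LinkedBlockingQueue<>();
    final AtomicLong rawEntries = new AtomicLong();      // counted on the IO thread
    final AtomicLong decodedMessages = new AtomicLong(); // counted after decoding

    // IO thread: constant work per bulk, no parsing of its contents.
    void onReceive(List<String> bulk) {
        rawEntries.incrementAndGet();
        journal.add(bulk);
    }

    // Processing thread: takes one journal entry and "decodes" it into messages.
    void processOne() throws InterruptedException {
        List<String> bulk = journal.take(); // blocks until an entry is available
        decodedMessages.addAndGet(bulk.size());
    }

    public static void main(String[] args) throws InterruptedException {
        JournalPipeline p = new JournalPipeline();
        Thread processor = new Thread(() -> {
            try {
                for (int i = 0; i < 2; i++) {
                    p.processOne();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        processor.start();
        p.onReceive(Arrays.asList("e1", "e2", "e3"));
        p.onReceive(Arrays.asList("e4"));
        processor.join();
        System.out.println(p.rawEntries.get() + " entries, " + p.decodedMessages.get() + " messages");
    }
}
```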

hc4 commented

To make a pull request I would need to fix the tests, and I don't want to :)
Also, I'm not a Java pro.

But here is the main change:

         if (events == null) {
             return null;
         } else {
-            return ChannelBuffers.copiedBuffer(objectMapper.writeValueAsBytes(events));
+           final Object[] result = new Object[events.size()];
+           for (int i = 0; i < result.length; i++) {
+               result[i] = ChannelBuffers.copiedBuffer(objectMapper.writeValueAsBytes(events.get(i)));
+           }
+           return result;
         }

I don't think there should be much overhead.
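To illustrate what the diff changes, here is a self-contained Java sketch in which `java.nio.ByteBuffer` stands in for Netty's `ChannelBuffer` and plain UTF-8 strings stand in for Jackson-serialized events (both substitutions keep the example dependency-free; `PerEventFrames` is a hypothetical name). Roughly, the difference in work is one buffer allocation and one serialization call per event instead of one per batch, which is what a benchmark would need to measure.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class PerEventFrames {
    // Before: the whole batch is serialized into ONE buffer (one downstream message).
    // String.join stands in for serializing the list as a single JSON array.
    static ByteBuffer asSingleBuffer(List<String> events) {
        if (events == null) {
            return null;
        }
        return ByteBuffer.wrap(String.join("\n", events).getBytes(StandardCharsets.UTF_8));
    }

    // After: one buffer per event, returned as Object[] so an unfolding
    // frame decoder can hand each element downstream separately.
    static Object[] asBufferPerEvent(List<String> events) {
        if (events == null) {
            return null;
        }
        final Object[] result = new Object[events.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ByteBuffer.wrap(events.get(i).getBytes(StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("{\"msg\":\"a\"}", "{\"msg\":\"b\"}");
        System.out.println(asBufferPerEvent(batch).length); // prints 2
    }
}
```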

hc4 commented

From your previous comments, I guess what you are really asking for is to see throughput in terms of Message objects, rather than input and output at the edges of Graylog.

Yep, but Lumberjack is a special protocol. It can transfer a lot of messages as a single compressed bulk. And I think it is logically correct that the Beats transport should extract single messages, not bulks, from the channel.
So actually I want to see throughput in terms of messages at the edges of Graylog :)

Also, I think the current implementation of the Beats transport is not ideal. According to the current Graylog model, the frame decoder shouldn't parse data. It should just cut the stream into byte buffers, each containing a single message, and the codec should parse these bytes into actual messages.
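A minimal sketch of that division of labor, assuming a hypothetical wire format in which every message is a 4-byte big-endian length followed by that many UTF-8 bytes. This is not the Lumberjack framing, which has several frame types, including compressed ones. The framer only cuts bytes and leaves an incomplete frame in the buffer; the codec alone turns a frame into a message. `SplitThenParse` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wire format: [4-byte big-endian length][UTF-8 payload], repeated.
final class SplitThenParse {
    // Frame decoder role: only cuts the byte stream into per-message byte arrays.
    static List<byte[]> splitFrames(ByteBuffer in) {
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            in.mark();
            int length = in.getInt();
            if (length < 0) {
                throw new IllegalArgumentException("negative frame length: " + length);
            }
            if (in.remaining() < length) {
                in.reset(); // incomplete frame: rewind and wait for more bytes
                break;
            }
            byte[] frame = new byte[length];
            in.get(frame);
            frames.add(frame);
        }
        return frames;
    }

    // Codec role: turns one frame's bytes into a message.
    static String parse(byte[] frame) {
        return new String(frame, StandardCharsets.UTF_8);
    }

    // Helper that builds a stream in the hypothetical format.
    static ByteBuffer encode(String... messages) {
        byte[][] bodies = new byte[messages.length][];
        int size = 0;
        for (int i = 0; i < messages.length; i++) {
            bodies[i] = messages[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + bodies[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (byte[] body : bodies) {
            buf.putInt(body.length);
            buf.put(body);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        for (byte[] frame : splitFrames(encode("first", "second"))) {
            System.out.println(parse(frame));
        }
    }
}
```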

After some internal discussion we've decided that we are not going to implement the general change at the moment.

For the change in the beats plugin itself we can benchmark the actual overhead and then make a decision. I'll close this in favor of Graylog2/graylog-plugin-beats#5.

Thanks!