zeek/spicy

Add instrumentation about volume of data parsed during resynchronization

Closed this issue · 10 comments

The spicy profiler provides no information about the time spent in the resynchronization code. It would be great to have metrics around the volume of data required to achieve resynchronization and the total time taken.
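For context, resynchronization in Spicy is typically driven by a field marked `&synchronize`, with a `%synchronized` hook confirming the recovered position. A minimal sketch of a unit that recovers after a gap might look like the following (module, field names, and the marker bytes are hypothetical, not from this issue):

```spicy
module Example;

# Hypothetical PDU that resynchronizes on a literal marker after a gap.
public type PDU = unit {
    marker: b"\xaa\xbb" &synchronize;  # search for this literal to resynchronize
    len:    uint16;
    data:   bytes &size=self.len;

    on %synchronized {
        confirm;  # accept the position the synchronization logic found
    }
};
```

The metrics requested here would cover exactly this path: how many bytes the parser had to scan past (gaps plus search window) before `%synchronized` fired.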

@Mohan-Dhawan give #1676 a try.

Thanks @rsmmr. I do get the volume stats in the output. It would also be nice to know whether a higher reported volume in gaps or synchronize is detrimental to performance.

> Thanks @rsmmr. I do get the volume stats in the output.

Can I see the output?

> It would also be nice to know if a higher reported volume in gaps or synchronize is detrimental to performance.

I don't follow what you mean; can you elaborate on how the numbers could be improved?

```
spicy/unit/<pdu>/__gap__                   172043743 12695491160       0.00      14.53      2013841114
spicy/unit/<pdu>/__synchronize__               82553 3648394463       0.00       4.18             184
```

What I wanted to know is: what are the acceptable limits, performance-wise, for the gap and synchronize volumes?

There's no general answer to that. You need to put them in relation to the input volume / standard parsing.
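To make that concrete, here is a hedged worked example using the numbers reported later in this thread, under the assumption (not confirmed in the issue) that the last profiler column is volume in bytes:

```spicy
module Ratios;

# Hedged worked example; numbers are taken from this thread.
# Assumption: the last column of the profiler rows above is volume in bytes.
global trace_bytes: uint64 = 698 * 1024 * 1024;  # 698 MB trace
global gap_volume: uint64 = 2013841114;          # __gap__ volume
global sync_volume: uint64 = 184;                # __synchronize__ volume

# Under that assumption, gap volume is roughly 2.75x the trace size,
# while synchronize volume is negligible relative to the input.
print cast<real>(gap_volume) / cast<real>(trace_bytes);
print cast<real>(sync_volume) / cast<real>(trace_bytes);
```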

The context here is that I have a 698 MB trace with 8883 connections, but the content has gaps. A flame graph of its execution showed that close to 66% of samples were in the unit responsible for synchronization. About half of those samples were in MatchState::advance, and the majority of the calls from it were to the function jrx_regexec_partial_min. Given that the volume of bytes in the __synchronize__ entry is just 184, is the high volume of jrx_* calls indicative of an edge case?

Can you send me the full output please?

For the record, I never received the full output, so we need to take the measurement with a grain of salt for now.

Hi @rsmmr. Sorry, it completely slipped my mind. I've sent you the detailed output in the Zeek Slack DM.

Somewhat related, I bumped #1133 into TODO.