elastic/support-diagnostics

Kibana Diag Output | Paginated Alerts JSON repeats IDs and misses last page

stefnestor opened this issue · 9 comments

👋 Howdy!

I'm experiencing issues with the Kibana diagnostic's paginated alerts output. I don't think I should attach my v7.17.5 diagnostic since it contains sensitive data, so I'll outline the problem as best I can.

My cluster has 1493 Kibana(/SIEM) Rules.

When I run the diagnostic it paginates kibana_alerts_NUMBER.json at per_page:100 appropriately recognizing my total:1493.

However (Problem 1), I only ended up with max(NUMBER)=14, which means I only exported 1400 of my expected 1493 Rules.

Sanity checking further (Problem 2), I ran $ cat kibana_alerts_NUMBER.json | jq '.data[]|.id' for every exported page NUMBER and combined all IDs into one file (1400 lines), then ran a uniq line command, which left 1211 unique Rule IDs. To confirm, I grabbed one of the duplicated IDs and compared its JSON on pages 11 and 14; the two were equivalent.
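
The duplicate check described above could be sketched roughly as follows. This is only an illustration, not code from this repository: the class name `DuplicateRuleIds`, the method `findRepeats`, and the toy IDs are all hypothetical, and it assumes the `.data[].id` values of each page have already been extracted into one list per page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateRuleIds {
    // Hypothetical helper: maps every ID that appears more than once
    // to the 1-based page numbers on which it was seen.
    static Map<String, List<Integer>> findRepeats(List<List<String>> pages) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < pages.size(); i++) {
            for (String id : pages.get(i)) {
                seen.computeIfAbsent(id, k -> new ArrayList<>()).add(i + 1);
            }
        }
        // Keep only IDs that were seen at least twice.
        seen.values().removeIf(pageNumbers -> pageNumbers.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        // Toy data standing in for the IDs of three exported pages.
        List<List<String>> pages = List.of(
                List.of("a", "b"),
                List.of("c", "a"),
                List.of("b", "d"));
        System.out.println(findRepeats(pages)); // {a=[1, 2], b=[1, 3]}
    }
}
```

Listing the page numbers per repeated ID makes it easy to pick one ID and compare its JSON across the pages it appears on.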

My guess is that the pagination problem may need to be raised with Kibana, but I wanted to start here since the last page is missed, which I believe is directly related to this repository's code.

TIA!

@kimholyszko @Heidi-Sager Can you take a look at this one?

This should have been fixed in commit db9a072 of PR #594.
I found this one while testing that PR, independently of this issue.

I think the problem still persists.
In my Kibana I have 686 items in alerts and I was getting 6 files of 100 items, while I would have expected 7.

I've prepared PR #602, but I have no automated tests for it yet.
I did try it against my cloud Kibana.

image

I also use perPage set to 1 to just get the total.

The problem in the current code is that (int)Math.ceil(total / perPage) performs int/int division, so the fractional part is thrown away before Math.ceil is applied.
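
To illustrate the truncation, here is a minimal standalone sketch. It is not the code from #602: the class and method names are made up for the example, and only the expression `(int)Math.ceil(total / perPage)` comes from this thread.

```java
public class PageCount {
    // Same shape as the expression quoted above: total / perPage is
    // evaluated as int division, so Math.ceil only ever sees a whole number.
    static int pagesTruncated(int total, int perPage) {
        return (int) Math.ceil(total / perPage);
    }

    // One possible fix: promote to double before dividing.
    static int pagesCeil(int total, int perPage) {
        return (int) Math.ceil((double) total / perPage);
    }

    // Alternative that stays in integer arithmetic
    // (assumes total >= 0 and perPage > 0).
    static int pagesCeilInt(int total, int perPage) {
        return (total + perPage - 1) / perPage;
    }

    public static void main(String[] args) {
        System.out.println(pagesTruncated(1493, 100)); // 14
        System.out.println(pagesCeil(1493, 100));      // 15
        System.out.println(pagesCeilInt(686, 100));    // 7
    }
}
```

With the totals reported in this thread, the truncating version yields 14 and 6 pages, while either corrected version yields 15 and 7.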

Weird, I haven't been able to reproduce this issue in my cloud environment with the code from before the change. Still, your approach seems better, @lucabelluccini.

#602 appears to resolve Problem 1; however, Problem 2 would remain. Does it reproduce in either of your tests?

Also noting this fails as fully empty even when ≥1600 Rules

image

Is there a way for us to reproduce?

I'm not sure. I can't tell what the surrounding conditions are. Right now, for roughly 1 in 15 clusters I pull a Kibana diagnostic from, this page is empty.

It turns out the latest symptom ☝🏼 was because of #578