Filter out flood of member & hidden event spam when we detect the scrollback is full of it
Opened this issue · 24 comments
Originally opened as an element-web issue (on 2022-04-22) that was incorrectly moved to #491 and then to a discussion
Your use case
Why would you like to do it?
Rooms can be overwhelmed by bulk spam users joining (thousands and thousands of them). Each of those joins and leaves creates an event in the timeline.
Currently, when trying to scroll back in rooms like this, you just get stuck on the thousands of member events, which we only paginate 20 at a time. Each request is slow and doesn't even get you further back in the actual results you want to see.
The goal of this change is to make the room scrollback usable again and be able to view the history of the room. Otherwise, when these spam incidents occur, that whole time period in the room is essentially a black hole.
What would you like to do? / How would you like to achieve it?
When we detect that the whole `/messages` response is filled with `m.room.member` `join`, `leave`, and `invite` events, we can ask the user whether they want to continue scrolling back without them. If they accept, we should add a filter to `/messages` to not include them.
Here is a mockup of what the user prompt could look like: "It looks like you're paginating through a lot of member events, would you like to scroll back without them?"
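For illustration, a minimal sketch of what such a filtered request could look like against the Client-Server API, using a `RoomEventFilter` with `not_types` (the helper function and its parameters are hypothetical, not Element's actual code; note that excluding `m.room.member` entirely would also hide bans and kicks, so a real implementation might want something finer-grained):

```ts
// Hypothetical sketch of a filtered back-pagination request against the
// Client-Server API. The filter query parameter is a JSON-encoded
// RoomEventFilter; not_types excludes whole event types, so this hides *all*
// m.room.member events (including bans/kicks), which may be too coarse.
const memberlessFilter = JSON.stringify({ not_types: ["m.room.member"] });

async function paginateWithoutMembers(
    homeserverUrl: string,
    accessToken: string,
    roomId: string,
    fromToken: string,
): Promise<unknown> {
    const url =
        `${homeserverUrl}/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}/messages` +
        `?dir=b&limit=100` +
        `&from=${encodeURIComponent(fromToken)}` +
        `&filter=${encodeURIComponent(memberlessFilter)}`;
    const res = await fetch(url, {
        headers: { Authorization: `Bearer ${accessToken}` },
    });
    return res.json();
}
```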
Another option is to automatically start back-paginating with a much bigger batch size (e.g. 500).
Another option is to use MSC3030 jump to date to jump past all of the messages. Behind the scenes, we could use `/messages` with a filter to find the spot and then jump.
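A hedged sketch of that approach, using the timestamp-to-event endpoint that MSC3030 eventually stabilised as (the helper name and parameters are illustrative assumptions, not Element's implementation):

```ts
// Hypothetical sketch of the MSC3030 approach using the stable
// /timestamp_to_event endpoint: find the closest event at or after a given
// timestamp so the timeline can jump past the block of member events.
async function findEventAfterTimestamp(
    homeserverUrl: string,
    accessToken: string,
    roomId: string,
    ts: number, // milliseconds since the unix epoch
): Promise<string> {
    const url =
        `${homeserverUrl}/_matrix/client/v1/rooms/${encodeURIComponent(roomId)}/timestamp_to_event` +
        `?ts=${ts}&dir=f`;
    const res = await fetch(url, {
        headers: { Authorization: `Bearer ${accessToken}` },
    });
    const body = await res.json();
    return body.event_id; // the event to load the timeline around
}
```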
Have you considered any alternatives?
It's possible to hide all join/leave messages in the timeline via Settings -> Preferences -> Timeline section -> toggle the "Show join/leave messages (invites/removes/bans unaffected)" (`showJoinLeaves`) setting. But that only affects how the events are displayed; it doesn't filter them out of the `/messages` pagination requests to begin with, which is what would speed things up and get us to the results we care about.
Additional context
- Hide member events
- Filter out member events when paginating `/messages`
- Scrollback should filter member events when there are too many
- Scrollback is slow and filled with member events
- Flood of member state spam
- Filter out bulk spam member events when we detect the scrollback is full of them
Re-opening here as we're seeing this in the case of Gitter rooms where I synced the room membership: giant blocks of membership events that are impossible to paginate past. My original proposal still seems reasonable to me, but this issue is really just tracking the problem, with one potential solution of many.
In the last issue, @t3chguy noted some caveats: historical profiles won't work correctly if we skip fetching membership events, and this also affects push rules, since historical profiles are needed to evaluate whether a given message pings. These seem minor compared to the room being unnavigable, though, and they are technical problems we can overcome, for example by using MSC3952 for intentional mentions, or simply accept: the chance of someone changing their profile (and that change affecting a notification) in a scenario like this is very small, so you probably won't miss any notifications anyway.
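For context on the MSC3952 point: with intentional mentions the sender lists the mentioned users explicitly in the event content, so push rules can match on user IDs instead of historical display names. A rough example of such content (illustrative only):

```ts
// Example m.room.message content using intentional mentions (MSC3952, now the
// m.mentions field in the spec). Notification evaluation can match user_ids
// directly, with no historical profile lookup needed.
const content = {
    msgtype: "m.text",
    body: "hey alice, see the discussion above",
    "m.mentions": {
        user_ids: ["@alice:example.org"],
    },
};
```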
Related:
- element-hq/element-web#19086 (suggested by @catscratchedme)
- element-hq/element-web#22662 (not a duplicate)
This also impacts things such as live location sharing. In that case, however, the events are in the timeline and possibly encrypted. It would be nice if we could handle these cases with the same approach, but I think the `/messages` filter wouldn't work there, right?
Not entirely sure if what I'm seeing is covered by this issue (see also element-hq/roadmap#26 (comment)), but even with the setting mentioned in the description toggled, Element maxes out at 100% CPU and remains pretty much unusable after the mass joining of Gitter users in a large Gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im). I got multiple people to confirm this using Element Desktop and Element Web. I.e. ever since the mass joining of Gitter users, the room remains pretty much unusable via Element.
This is having an impact on the Gitter migration, so might need prioritising @daniellekirkwood @Johennes
> but even with the setting mentioned in the description toggled, Element maxes out at 100% CPU and remains pretty much unusable after the mass joining of Gitter users in a large Gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im).
@ekpyron Please note that the setting mentioned in the issue won't help at all. As mentioned in the description, "[that setting] only affects how the events are displayed; it doesn't filter them out of the `/messages` pagination requests to begin with, which is what would speed things up and get us to the results we care about."
Investigating this today. The first thing is to establish the behaviour, since we thought there might be a bug where we don't actually keep back-paginating when we should.
I've created a room on my local Synapse with 10000 hidden events followed by some chat messages. When Element Web tries to display the room, it does keep making requests to the `/messages` API, but they get slower and slower until it seems to grind to a halt.
Looking at the actual requests from my Synapse, according to Firefox they are taking ~9ms consistently, so the slowdown is in the client.
When I set the limit to 2000 instead of 20, I got responses of size 1000, presumably due to a Synapse limit.
After the first 1000 were received, Element Web slowed to a halt and didn't request the next batch for a long time.
This is the most important thing to investigate, I think.
Running it through the Firefox profiler, I see almost all the time is spent inside `decryptGroupMessage`, and actually in the WASM code of olm. This might be a red herring, or at least a different problem, because I am assuming the Gitter rooms are unencrypted (right?), so I'm going to try this again with an unencrypted room.
Without encryption, Element Web appears to be loading the hidden events at a rate of about 1000 per 7 seconds, which is more reasonable (if not great).
Although having said that, even when its count of events has reached 10K, it's still processing very heavily and is mostly unusable for several minutes. Trying to get a profile.
It appears to be calling `processSyncResponse` ~1000 times/second and `doSync` ~500 times/second.
Profile is here: https://share.firefox.dev/3JWEZ7b
5-10 minutes later it's still unresponsive.
Nope, I misread the profile. It's spending a lot of time inside doSync, but not (necessarily) calling it a lot.
`MessagePanel.shouldShowEvent` is being called many, many times, taking ~20ms each time. Also `Room.eventShouldLiveIn` inside there.
Something is happening repeatedly for all 10K events whenever we re-render.
In `MessagePanel.getTiles` we have only 5 events (in my test case), so it must be above there.
I have a test that crashes node. In `MessagePanel-test.tsx`:
it("should handle large numbers of hidden events quickly", () => {
const events = [];
for (let i = 0; i < 10000; i++) {
events.push(
TestUtilsMatrix.mkEvent({
event: true,
type: "unknown.event.type",
content: { key: "value" },
room: "!room:id",
user: "@user:id",
ts: 1000000 + i,
}),
);
}
render(getComponent({ events }, { showHiddenEvents: false }));
});
crashes with:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
This seems bad :-)
Adjusting the 10000 above to 1000 makes the test pass, but it runs in 1395 ms, which seems slow, so I can investigate from here.
If I replace `getEventTiles()` with `return []`, the test passes in 55ms, so I can zoom in on that.
If I replace `getNextEventInfo` in `MessagePanel` with a simple impl, the test passes in 80ms.
This code contains a deeply suspicious array.slice, so we might be getting somewhere.
Removing the slice didn't help, so I'll have to think more deeply :-(
We are re-running `shouldShowEvent` O(n^2) times, so I am experimenting with briefly caching the results to make it O(n).
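A minimal sketch of the kind of per-render caching being experimented with here (class and method names are illustrative, not the actual `MessagePanel` patch):

```ts
// Illustrative sketch: memoise shouldShowEvent results per event ID for the
// duration of one render pass, so lookahead helpers like getNextEventInfo stop
// re-evaluating every event and the overall cost drops from O(n^2) to O(n).
class ShouldShowCache {
    private cache = new Map<string, boolean>();

    public constructor(private readonly shouldShowEvent: (eventId: string) => boolean) {}

    // Call at the start of each render so stale visibility results are dropped.
    public reset(): void {
        this.cache.clear();
    }

    public shouldShow(eventId: string): boolean {
        const cached = this.cache.get(eventId);
        if (cached !== undefined) return cached;
        const result = this.shouldShowEvent(eventId);
        this.cache.set(eventId, result);
        return result;
    }
}
```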
That has helped a lot, but it still looks like we are calling sync 10K times, or maybe many more times than that.
Removing X-Needs Product for now as we may be able to fix the performance issue without a UX change.
I created element-hq/element-web#24480 to track further performance work. I think this issue should be used to think about batch sizes for the `/messages` API, which is something I hadn't considered so far because the performance problems masked the need to do it.
FWIW I think we should probably double the batch size every time we receive a full batch of hidden events, up to a max of 1000, which seems to be Synapse's default max.
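A small sketch of that batch-size policy (the function and constants are hypothetical; 1000 as the cap is based on the Synapse behaviour observed above):

```ts
// Sketch of the proposed batch sizing (names and reset behaviour are
// assumptions): double the /messages limit whenever a batch contains only
// hidden events, capped at 1000, which appears to be Synapse's per-request max.
const INITIAL_LIMIT = 20;
const MAX_LIMIT = 1000;

function nextPaginationLimit(currentLimit: number, batchWasAllHidden: boolean): number {
    if (!batchWasAllHidden) {
        return INITIAL_LIMIT; // drop back to the normal batch size once visible events appear
    }
    return Math.min(currentLimit * 2, MAX_LIMIT);
}
```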