Filter out flood of member & hidden event spam when we detect the scrollback is full of it
Opened this issue · 24 comments
Originally opened as an element-web issue (on 2022-04-22) that was incorrectly moved to #491 and then to a discussion
Your use case
Why would you like to do it?
Rooms can be overwhelmed by bulk spam users joining (thousands and thousands of them). Each of those joins and leaves creates an event in the timeline.
Currently, when trying to scroll back in rooms like this, you just get stuck on the thousands of member events, which we only paginate 20 at a time. Each request is slow and doesn't even get you further back in the actual results you want to see.
The goal of this change is to make the room scrollback usable again and be able to view the history of the room. Otherwise, when these spam incidents occur, that whole time period in the room is essentially a black hole.
What would you like to do? / How would you like to achieve it?
When we detect that the whole `/messages` response is filled with `m.room.member` `join`, `leave`, and `invite` events, we can ask the user whether they want to continue scrolling back without them. If they accept, we should add a filter to `/messages` to not include them.
Here is a mockup of what the user prompt could look like: "It looks like you're paginating through a lot of member events, would you like to scroll back without them?"
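For illustration, a minimal sketch of what such a filtered request could look like against the Client-Server API, using a `RoomEventFilter` with `not_types` (the helper function and its parameters are hypothetical, not Element's actual code; note that excluding `m.room.member` entirely would also hide bans and kicks, so a real implementation might want something finer-grained):

```ts
// Hypothetical sketch of a filtered back-pagination request against the
// Client-Server API. The filter query parameter is a JSON-encoded
// RoomEventFilter; not_types excludes whole event types, so this hides *all*
// m.room.member events (including bans/kicks), which may be too coarse.
const memberlessFilter = JSON.stringify({ not_types: ["m.room.member"] });

async function paginateWithoutMembers(
    homeserverUrl: string,
    accessToken: string,
    roomId: string,
    fromToken: string,
): Promise<unknown> {
    const url =
        `${homeserverUrl}/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}/messages` +
        `?dir=b&limit=100` +
        `&from=${encodeURIComponent(fromToken)}` +
        `&filter=${encodeURIComponent(memberlessFilter)}`;
    const res = await fetch(url, {
        headers: { Authorization: `Bearer ${accessToken}` },
    });
    return res.json();
}
```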
Another option is to automatically start back-paginating with a much bigger batch size (e.g. 500).
Another option is to use MSC3030 jump to date to jump past all of the messages. Behind the scenes, we could use `/messages` with a filter to find the spot and then jump.
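A hedged sketch of that approach, using the timestamp-to-event endpoint that MSC3030 eventually stabilised as (the helper name and parameters are illustrative assumptions, not Element's implementation):

```ts
// Hypothetical sketch of the MSC3030 approach using the stable
// /timestamp_to_event endpoint: find the closest event at or after a given
// timestamp so the timeline can jump past the block of member events.
async function findEventAfterTimestamp(
    homeserverUrl: string,
    accessToken: string,
    roomId: string,
    ts: number, // milliseconds since the unix epoch
): Promise<string> {
    const url =
        `${homeserverUrl}/_matrix/client/v1/rooms/${encodeURIComponent(roomId)}/timestamp_to_event` +
        `?ts=${ts}&dir=f`;
    const res = await fetch(url, {
        headers: { Authorization: `Bearer ${accessToken}` },
    });
    const body = await res.json();
    return body.event_id; // the event to load the timeline around
}
```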
Have you considered any alternatives?
It's possible to hide all join/leave messages in the timeline via Settings -> Preferences -> Timeline section -> toggle the "Show join/leave messages (invites/removes/bans unaffected)" (`showJoinLeaves`) setting. But that only affects how the events are displayed; it doesn't filter them out of the `/messages` pagination requests to begin with, which is what would speed things up and get us to the results we care about.
Additional context
- Hide member events
- Filter out member events when paginating `/messages`
- Scrollback should filter member events when there are too many
- Scrollback is slow and filled with member events
- Flood of member state spam
- Filter out bulk spam member events when we detect the scrollback is full of them
Re-opening here as we're seeing this in the case of Gitter rooms where I synced the room membership: giant blocks of membership events that are impossible to paginate past. My original proposal still seems reasonable to me, but this issue is really just tracking the problem, with one potential solution of many.
In the last issue, @t3chguy noted some caveats: historical profiles won't work correctly if we skip fetching membership events, and this also affects push rules, since historical profiles are needed to evaluate whether a given message pings. These seem minor compared to the room being unnavigable, though, and they are technical problems we can overcome, for example by using MSC3952 for intentional mentions, or simply accept: the chance of someone changing their profile (and that change affecting a notification) in a scenario like this is very small, so you probably won't miss any notifications anyway.
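For context on the MSC3952 point: with intentional mentions the sender lists the mentioned users explicitly in the event content, so push rules can match on user IDs instead of historical display names. A rough example of such content (illustrative only):

```ts
// Example m.room.message content using intentional mentions (MSC3952, now the
// m.mentions field in the spec). Notification evaluation can match user_ids
// directly, with no historical profile lookup needed.
const content = {
    msgtype: "m.text",
    body: "hey alice, see the discussion above",
    "m.mentions": {
        user_ids: ["@alice:example.org"],
    },
};
```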
Related:
- element-hq/element-web#19086 (suggested by @catscratchedme)
- element-hq/element-web#22662 (not a duplicate)
This also impacts things such as live location sharing. In that case, however, the events are in the timeline and possibly encrypted. It would be nice if we could handle these cases with the same approach, but I think the `/messages` filter wouldn't work there, right?
Not entirely sure if what I'm seeing is covered by this issue (see also element-hq/roadmap#26 (comment)), but even with the setting mentioned in the description toggled, Element maxes out at 100% CPU and remains pretty much unusable after the mass joining of Gitter users in a large Gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im). I got multiple people to confirm this using Element Desktop and Element Web. I.e. ever since the mass joining of Gitter users, the room remains pretty much unusable via Element.
This is having an impact on the Gitter migration, so might need prioritising @daniellekirkwood @Johennes
> but even with the setting mentioned in the description toggled, Element maxes out at 100% CPU and remains pretty much unusable after the mass joining of Gitter users in a large Gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im).
@ekpyron Please note that the setting mentioned in the issue won't help at all. As mentioned in the description, "[that setting] only affects how the events are displayed; it doesn't filter them out of the `/messages` pagination requests to begin with, which is what would speed things up and get us to the results we care about."
Investigating this today. The first thing is to establish the behaviour, since we thought there might be a bug where we don't actually keep back-paginating when we should.
I've created a room on my local Synapse with 10000 hidden events followed by some chat messages. When Element Web tries to display the room, it does keep making requests to the `/messages` API, but they get slower and slower until it seems to grind to a halt.
Looking at the actual requests from my Synapse, according to Firefox they are taking ~9ms consistently, so the slowdown is in the client.
When I set the limit to 2000 instead of 20, I got responses of size 1000, presumably due to a Synapse limit.
After the first 1000 were received, Element Web slowed to a halt and didn't request the next batch for a long time.
This is the most important thing to investigate, I think.
Running it through the Firefox profiler, I see almost all the time is spent inside `decryptGroupMessage`, and actually in the WASM code of olm. This might be a red herring, or at least a different problem, because I am assuming the Gitter rooms are unencrypted (right?), so I'm going to try this again with an unencrypted room.
Without encryption, Element Web appears to be loading the hidden events at a rate of about 1000 per 7 seconds, which is more reasonable (if not great).
Although having said that, even when its count of events has reached 10K, it's still processing very heavily and is mostly unusable for several minutes. Trying to get a profile.
It appears to be calling `processSyncResponse` ~1000 times/second and `doSync` ~500 times/second.
Profile is here: https://share.firefox.dev/3JWEZ7b
5-10 minutes later it's still unresponsive.
Nope, I misread the profile. It's spending a lot of time inside doSync, but not (necessarily) calling it a lot.
`MessagePanel.shouldShowEvent` is being called many, many times, taking ~20ms each time. Also `Room.eventShouldLiveIn` inside there.
Something is happening repeatedly for all 10K events whenever we re-render.
In `MessagePanel.getTiles` we have only 5 events (in my test case), so it must be above there.
I have a test that crashes node. In `MessagePanel-test.tsx`:
it("should handle large numbers of hidden events quickly", () => {
const events = [];
for (let i = 0; i < 10000; i++) {
events.push(
TestUtilsMatrix.mkEvent({
event: true,
type: "unknown.event.type",
content: { key: "value" },
room: "!room:id",
user: "@user:id",
ts: 1000000 + i,
}),
);
}
render(getComponent({ events }, { showHiddenEvents: false }));
});
crashes with:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
This seems bad :-)
Adjusting the 10000 above to 1000 makes the test pass, but it runs in 1395 ms, which seems slow, so I can investigate from here.
If I replace `getEventTiles()` with `return []`, the test passes in 55ms, so I can zoom in on that.
If I replace `getNextEventInfo` in `MessagePanel` with a simple impl, the test passes in 80ms.
This code contains a deeply suspicious array.slice, so we might be getting somewhere.
Removing the slice didn't help, so I'll have to think more deeply :-(
We are re-running `shouldShowEvent` O(n^2) times, so I am experimenting with briefly caching the results to make it O(n).
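A minimal sketch of the kind of per-render caching being experimented with here (class and method names are illustrative, not the actual `MessagePanel` patch):

```ts
// Illustrative sketch: memoise shouldShowEvent results per event ID for the
// duration of one render pass, so lookahead helpers like getNextEventInfo stop
// re-evaluating every event and the overall cost drops from O(n^2) to O(n).
class ShouldShowCache {
    private cache = new Map<string, boolean>();

    public constructor(private readonly shouldShowEvent: (eventId: string) => boolean) {}

    // Call at the start of each render so stale visibility results are dropped.
    public reset(): void {
        this.cache.clear();
    }

    public shouldShow(eventId: string): boolean {
        const cached = this.cache.get(eventId);
        if (cached !== undefined) return cached;
        const result = this.shouldShowEvent(eventId);
        this.cache.set(eventId, result);
        return result;
    }
}
```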
That has helped a lot, but it still looks like we are calling sync 10K times, or maybe many more times than that.
Removing X-Needs Product for now as we may be able to fix the performance issue without a UX change.
I created element-hq/element-web#24480 to track further performance work. I think this issue should be used to think about batch sizes for the `/messages` API, which is something I hadn't considered so far because the performance problems masked the need to do it.
FWIW I think we should probably double the batch size every time we receive a full batch of hidden events, up to a max of 1000, which seems to be Synapse's default max.
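A small sketch of that batch-size policy (the function and constants are hypothetical; 1000 as the cap is based on the Synapse behaviour observed above):

```ts
// Sketch of the proposed batch sizing (names and reset behaviour are
// assumptions): double the /messages limit whenever a batch contains only
// hidden events, capped at 1000, which appears to be Synapse's per-request max.
const INITIAL_LIMIT = 20;
const MAX_LIMIT = 1000;

function nextPaginationLimit(currentLimit: number, batchWasAllHidden: boolean): number {
    if (!batchWasAllHidden) {
        return INITIAL_LIMIT; // drop back to the normal batch size once visible events appear
    }
    return Math.min(currentLimit * 2, MAX_LIMIT);
}
```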