matrix-org/matrix-bifrost

Duplicated messages in XMPP bridged IRC rooms.

Rampoina opened this issue · 17 comments

Describe the bug
Duplicated messages are appearing

To Reproduce

  1. Join https://matrix.to/#/#xmpp_#general_irc.freegamedev.net:matrix.org
  2. Check the duplicated messages from 05-06-2021 from 12:51 UTC

Expected behavior
No duplicated messages

Screenshots
2021-06-05-151628_668x877_scrot

Desktop (please complete the following information):

  • OS: Arch linux
  • Browser: Electron (Element Desktop)
  • Version: 1.7.28

Additional context
At first the behaviour seemed to be related to replying, but then it happened spontaneously.

The room in question is an irc channel bridged to XMPP with Bibomi

This issue seems to be related to #210

poVoq commented

It should be noted that this is purely Matrix side. Neither IRC nor XMPP users see the duplicated messages.

Restarting the Biboumi gateway seems to temporarily fix the issue.

@Half-Shot: Have you looked this bug?

Right, this is a bit of a can of worms, and boils down to the way biboumi send messages to MUCs.

Episode 1: the biboumi duplicates

First, as an example of what happens when bifrost talks to a regular MUC. We have an XMPP connection open for each matrix user, each of them receiving messages. In order to avoid duplicate messages there we utilize the fact that the same message is going to have the same ID (here) and drop it if it shows up.

For demonstration purposes, we have two matrix users in a matrix room bridged to a regular MUC with bifrost. When a message is sent to the MUC from XMPP (and with some extra logging added), this happens:

Aug-3 14:23:21.260 INFO XmppJsInstance Got text message "message from pidgin to conf.localhost". ID: purple29d69ec
Aug-3 14:23:21.262 INFO XmppJsInstance Duplicate stanza received (purple29d69ec), ignoring

We properly figure out that the same message arrived twice, and all is well.

When a matrix user sends a message to the room all is good also: we record the message that we've sent to XMPP the same way (note the Matrixy message ID) and ignore it when it comes back:

Aug-3 14:28:21.125 INFO XmppJsInstance Duplicate stanza received ($2JtTaL3WVKyOoC_KjA3Xqbd5dMKYcZngWZcZmKlSu0s), ignoring
Aug-3 14:28:21.125 INFO XmppJsInstance Duplicate stanza received ($2JtTaL3WVKyOoC_KjA3Xqbd5dMKYcZngWZcZmKlSu0s), ignoring

So far so good. Problems arise the moment biboumi comes into play. For a similar setup (two matrix users in a bridged room), when a message is sent from XMPP, biboumi logs the following:

[DEBUG]: src/xmpp/xmpp_component.cpp:162:       XMPP RECEIVING: <message from='buddy@localhost/sfqTpaGN' id='purple29d69f3' to='#duplifest%ngircd.local@biboumi.localhost' type='groupchat' xmlns='jabber:component:accept'><body xmlns='jabber:component:accept'>hello from pidgin!</body></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='purple29d69f3' to='buddy@localhost/sfqTpaGN' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='c2e16a29-4f72-439e-9984-7d855d23acc9' xmlns='urn:xmpp:sid:0'/></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='27fda391-26cd-4ab3-995c-328e26509709' to='test2_localhost@matrix.localhost/matrix-bridge' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='fd7f62f4-4567-4ce9-91fd-4bb8f408cdb2' xmlns='urn:xmpp:sid:0'/></message>
[DEBUG]: src/xmpp/xmpp_component.cpp:77:        XMPP SENDING: <message from='#duplifest%ngircd.local@biboumi.localhost/buddy' id='232a1fe4-abed-49da-8cb6-77da5d1886cd' to='test_localhost@matrix.localhost/matrix-bridge' type='groupchat'><body>hello from pidgin!</body><stanza-id by='#duplifest%ngircd.local@biboumi.localhost' id='0493d5d8-8956-4c22-92fa-46b455cc80db' xmlns='urn:xmpp:sid:0'/></message>

Biboumi sends the message back to the sender with the original ID (fair), but then also forwards them to the matrix users, with a unique ID each time. Bifrost gets confused here and duplicates the message since it has no way of telling that they're in fact the same:

Aug-3 14:30:07.823 INFO XmppJsInstance Got text message "hello from pidgin!". ID: 27fda391-26cd-4ab3-995c-328e26509709
Aug-3 14:30:07.825 INFO XmppJsInstance Got text message "hello from pidgin!". ID: 232a1fe4-abed-49da-8cb6-77da5d1886cd

This is why the duplicate message bug happens 1) only on the matrix side of things 2) only when bifrost interacts with biboumi handling the MUC.

Similar thing happens when a Matrix user speaks in the IRC room. Biboumi resends the same message with the same ID to the sender XMPP connection (which gets dropped as it should), but then it sends the same message to the other Matrix user's XMPP connection with a different ID, so that gets propagated to matrix unnecessarily.

Ideally biboumi would be more considerate here, but since we can't really expect that in the wild, a way out of this situation is for the xmpp.js backend to opt-in to the message deduplicator that purple uses (which ignores duplicate room-sender-messagebody entries) which we currently disable here.

Episode 2: bifrost's message echoing

The problem seemingly goes away after enabling the deduplication, at least until the bridge gets restarted. Once it does, deduplicator loses track of its usersInRoom counter (which is not persistent), and inserts undefined amount of hashes into its expectedMessages – causing it to echo back the messages that we get back from biboumi.

To be continued...

@tadzik very interesting, thanks for sharing your findings.

Ideally biboumi would be more considerate here

Is there anything particual you have on your mind?

@tadzik very interesting, thanks for sharing your findings.

Ideally biboumi would be more considerate here

Is there anything particual you have on your mind?

The easiest thing for us would be biboumi sending each identical message with the same ID – it currently seems to do that when resending the original message that came from XMPP (see the biboumi log above), but generates new IDs for subsequent copies (same log). Not sure if the standard cares either way, to be honest, and whether we should actually expect that to be true in bifrost or not, but it'd certainly make the problem go away :) MUCs in regular XMPP servers seem to behave that way, which is why only the biboumi interaction is proving problematic.

poVoq commented

I think this just exposes an edge-case as in XMPP the source stanza ids can not be expected to be the same all the time. But don't quote me on that :) Anyways, probably better to find a way to deal with that Bifrost side as you already mentioned.

@selurvedu @poVoq biboumi is ignoring a SHOULD as of 2018 here. It seems this was discussed shortly before the standard was updated.

the service MUST change the 'from' attribute to the sender's occupant JID and reflect the message out to the full JID of each occupant.
The service SHOULD reflect the message with the same 'id' that was generated by the client, to allow clients to track their outbound messages. If the client did not provide an 'id', the server MAY generate an 'id' and use it for all reflections of the same message
Note: the requirement to reflect the 'id' attribute was added in version 1.31 of this XEP.
Version 1.31 (2018-03-06)

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

@emorrp1 yes, we discussed this with @tadzik today in Biboumi Discussion chat:

(13:44:14) selurvedu: tadzik, thanks for replying so fast here #251 (comment)

If I got what's hapening there right, the duplicate messages on the Matrix side appear only when there are two or more users that joined via Matrix<-bifröst->XMPP<-biboumi->IRC since the messages get reflected to each individual groupchat occupant, as the XEP says, and bifröst has no idea the messages are the same.

(13:45:49) tadzik: selurvedu: correct. However, like I mentioned, this only happens with Biboumi, not with regular pure XMPP MUCs. It seems like a regular MUC does reflect a message to each occupant, yes, but sends them with the same message ID – so bifrost sees the second message and goes "ah okay, we've had that before"

(13:46:16) tadzik: whereas the messages that biboumi emits all have distinct IDs, which makes bifrost treat them as distinct

(13:46:51) tadzik: I'm not sure if XEP has an opinion either way on this, but all XMPP servers we've encountered so far (except biboumi) seem to use the same ID for each reflected message

(13:47:02) tadzik: (and so does bifrost for when it does that)

(13:48:55) selurvedu: I was gonna ask if anyone here has an opinion on sending multiple message stanzas with same id to multiple recipients, but it appears that this example from the XEP suggests doing so: https://xmpp.org/extensions/xep-0045.html#message

Below the example there's a paragraph:

The service SHOULD reflect the message with the same 'id' that was generated by the client, to allow clients to track their outbound messages. If the client did not provide an 'id', the server MAY generate an 'id' and use it for all reflections of the same message (e.g. using a UUID as defined in RFC 4122 [18]).

This does not say the service should preserve the id for reflections sent to other occupants as well, but that example does so, so I guess it's not forbidden at least.

So I have a question to the devs instead: do different ids for reflected messages serve a purpose? If not, could those be carried over from the incoming message?

(13:49:24) selurvedu: > I'm not sure if XEP has an opinion either way on this, but all XMPP servers we've encountered so far (except biboumi) seem to use the same ID for each reflected message

Cool, thanks for confirming that.

(13:50:21) tadzik: ah, good find, thanks for that. So we SHOULD generally be able to rely on that, but it's not a MUST so we'd best be ready for some server implementations diverging on that

(13:53:27) selurvedu: tadzik, right. But still, if biboumi devs are not against using the same ids and if this will get changed, I don't think a fix on the bifrost side would be necessary. On the other hand, as you said, it may still be needed for other server implementations, or just for older biboumi installations.

(13:55:31) tadzik: right, bifrost should not need changes if biboumi changes its behaviour

(13:55:42) tadzik: but ideally it'd be more resilient regardless

Thanks for pointing at the issue in the biboumi issue tracker. I can see there that preserving the id for the first reflection was suggested by ge0rg.

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

As I said in the chat, it would be happy if biboumi did this, however, I don't see where it is explicitly stated that all reflections should have the same id, not just the one that is being reflected back to the sender. Same id for all reflections is only shown in the example. Could you elaborate how you came to such conclusion?

@Neustradamus don’t do that.

@tadzik I have not done any significant developement on biboumi; however the issue here is that each XMPP<->IRC connection is independent from the other (which helps for security reasons, e.g. a user banned from a chatroom can no longer receive the history of the room, etc), and changing this for message IDs might be quite a bit of work (biboumi was designed to be an individual bridge, which is why this has never been an issue). This might better be explored in a new biboumi issue though.

Even if you interpret the first SHOULD sentence as only requiring the id to be reflected back to the original sender, the second clearly states the same ID should be reused for all reflections of the same message and the example given backs it up.

As I said in the chat, it would be happy if biboumi did this, however, I don't see where it is explicitly stated that all reflections should have the same id, not just the one that is being reflected back to the sender. Same id for all reflections is only shown in the example. Could you elaborate how you came to such conclusion?

I'll trim a few words from the XEP to hopefully make it clearer. I don't believe I'm changing any meaning, just bringing phrases closer together to show how heavily its implied by the overall phrasing of the definition:

and "reflect the message" out to ... each occupant ... "reflect the message" with the same 'id' ... use it for all "reflections of the same message"

don’t do that

+1

@mathieui

the issue here is that each XMPP<->IRC connection is independent from the other (which helps for security reasons, e.g. a user banned from a chatroom can no longer receive the history of the room, etc)

Interesting. I did not think of that.

@emorrp1 thanks for clarifying, I see it now.

I'll trim a few words from the XEP to hopefully make it clearer. I don't believe I'm changing any meaning, just bringing phrases closer together to show how heavily its implied by the overall phrasing of the definition:

and "reflect the message" out to ... each occupant ... "reflect the message" with the same 'id' ... use it for all "reflections of the same message"

I will add that the only sane way is for MUCs to send the same IDs to everyone, because otherwise any feature such as corrections, retractions, references, etc, which rely on message IDs will be broken, unless the MUC intercepts and rewrites everything (which is insane and will not work when the conversation is encrypted).

Biboumi is not violating the spec as the messages you receive from other users are from IRC after being (possibly) sent from biboumi to IRC, they are not from biboumi itself.

louiz commented

As mathieui said, a same channel (on the same IRC server) is a different room for each different biboumi user.
Biboumi doesn’t know that #c++%irc.freenode.net@biboumi.localhost for user louiz@localhost is the same channel as #c++%chat.freenode.net@biboumi.localhost for user mathieui@localhost. The two are not linked in any way.

The only way this could be done in biboumi would be to have deterministic IDs, like hashing the IRC nick + message + date + something (well, not the date, since the message may arrive at different times on each socket). But this probably violates many XEPs, and will certainly create some new edge-cases.

I don’t think that’s possible.

@louiz

a same channel (on the same IRC server) is a different room for each different biboumi user.

I see, this is what I feared the most. I was thinking it's like a regular MUC. Thank you for clarifying. It seems that bifrost will have to figure our its own deduplication mechanism indeed.

poVoq commented

Just for reference: https://git.singpolyma.net/cheogram-muc-bridge
Is having similar issues and seemingly does the de-duplication with the help of a bot user in the channel, but it is still work in progress and doesn't work perfectly either.

Confirming this is still an issue, we're considering it a S-Minor as it only impacts IRC bridged rooms for the moment. That said, it's still something we'd like to solve.