mautrix/signal

Bridge crashes when receiving a new message in a group

ZaeGitHub opened this issue · 11 comments

I am receiving the following when using the bridge via bbctl (Beeper's self-hosted version of this bridge) and receiving a new message to a group chat:

goroutine 2210 [running]:
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChange(0xc0002fa000, {0x1ab74b8, 0xc000b22000}, 0xc00081dda0, {0xc000942420, 0x2c}, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:946 +0x350f
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChanges(0xc0002fa000, {0x1ab74b8, 0xc000b22000}, 0xc0009bc080, {0xc000942420, 0x2c})
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1884 +0x41b
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).GetGroupHistoryPage(0xc0002fa000, {0x1ab74b8, 0xc000b22000}, {0xc000406ed0, 0x2c}, 0x1, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1858 +0x716
main.(*Portal).catchUpHistory(0xc0000c9860, 0xc000186c00, 0x1, 0x7, 0x18eca204943)
        /builds/mautrix/signal/portal.go:941 +0x449
main.(*Portal).handleSignalDataMessage(0xc0000c9860, 0xc000186c00, 0xc0000cadc0, 0xc0008f0240)
        /builds/mautrix/signal/portal.go:902 +0x3f5
main.(*Portal).handleSignalMessage(0xc0000c9860, {0xc000281540?, 0xc000186c00?})
        /builds/mautrix/signal/portal.go:871 +0x194
main.(*Portal).messageLoop(0xc0000c9860)
        /builds/mautrix/signal/portal.go:358 +0xa5
created by main.(*SignalBridge).NewPortal in goroutine 1043
        /builds/mautrix/signal/portal.go:240 +0x725
Bridge exited
exit status 2

Same here. The error message I get is:

panic: runtime error: cannot convert slice with length 0 to array or pointer to array with length 65

goroutine 1287 [running]:
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChange(0xc000316000, {0x1ab74b8, 0xc00087e7e0}, 0xc00001fe00, {0xc0003b82d0, 0x2c}, 0x0)
       /builds/mautrix/signal/pkg/signalmeow/groups.go:1002 +0x34ef
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChanges(0xc000316000, {0x1ab74b8, 0xc00087e7e0}, 0xc000703ac0, {0xc0003b82d0, 0x2c})
       /builds/mautrix/signal/pkg/signalmeow/groups.go:1884 +0x41b
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).GetGroupHistoryPage(0xc000316000, {0x1ab74b8, 0xc00087e7e0}, {0xc000900660, 0x2c}, 0x1, 0x0)
       /builds/mautrix/signal/pkg/signalmeow/groups.go:1858 +0x716
main.(*Portal).catchUpHistory(0xc0006e05b0, 0xc000000840, 0x1, 0xf, 0x18eccdcb1cd)
       /builds/mautrix/signal/portal.go:941 +0x449
main.(*Portal).handleSignalDataMessage(0xc0006e05b0, 0xc000000840, 0xc0002fa960, 0xc000409b00)
       /builds/mautrix/signal/portal.go:902 +0x3f5
main.(*Portal).handleSignalMessage(0xc0006e05b0, {0xc000283800?, 0xc000000840?})
       /builds/mautrix/signal/portal.go:871 +0x194
main.(*Portal).messageLoop(0xc0006e05b0)
       /builds/mautrix/signal/portal.go:358 +0xa5
created by main.(*SignalBridge).NewPortal in goroutine 61
       /builds/mautrix/signal/portal.go:240 +0x725
Bridge exited
exit status 2

It looks similar to #466, but I'm using bridge version 0.5.1 so it should already have the fix for that.

The failure is on trying to typecast the UserId of PromotePendingMember to UUIDCiphertext. Apparently the UserId is zero bytes in length, while UUIDCiphertext is 65 bytes, hence the panic.
I'm not sure why the UserId would be empty, because so far my understanding was that the PromotePendingMember group change action essentially shares the profileKey with the group, while before only the ACI was known. I wouldn't know how only the profileKey could be used to know which ACI the groupchange action belongs to.
A quick fix is probably to check for UserId length and skip if the length is zero - that will probably cause some breakage, but should get rid of the crash. Or return an error to trigger a group sync.
I would probably need to do quite a bit of digging to figure out how to fix this, but I'm not sure if I'm going to have time before next week.

Here's probably the answer:
https://github.com/signalapp/Signal-Android/blob/30cc3ff9fc33ed6c2a867df14e7e0ba11cdfa7be/libsignal-service/src/main/java/org/whispersystems/signalservice/api/groupsv2/GroupsV2Operations.java#L639
So apparently both the profilekey and the UserId can be empty, in which case we need to decrypt the presentation (and generate profilekey and userid from it). Which is odd, because https://github.com/signalapp/Signal-Android/blob/30cc3ff9fc33ed6c2a867df14e7e0ba11cdfa7be/libsignal-service/src/main/protowire/Groups.proto#L116 clearly says that presentation would only be set when sending to the server, which is not the case here. Maybe someone forgot to remove the comment there.

I still might not be able to fix this before next week :/ converting presentation to profilekey and userid is not implemented yet.

@ZaeGitHub @shrugal
could you please try whether this fixes your issue:
#498

I attached a binary in case you don't want to set up a development setup
you should be able to trigger the bug (if it's still there or I made a new one) by sending a message from the client to an affected group. Observe the logs.

Not yet running the development version, but in case it helps with this version. The bridge crashes when I send a reaction on a message in the group chat:
panic: runtime error: cannot convert slice with length 0 to array or pointer to array with length 65

goroutine 749 [running]:
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChange(0xc000000a80, {0x1ab88d8, 0xc00085bcb0}, 0xc00060a7e0, {0xc0009803c0, 0x2c}, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:946 +0x350f
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChanges(0xc000000a80, {0x1ab88d8, 0xc00085bcb0}, 0xc0004c6fc0, {0xc0009803c0, 0x2c})
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1884 +0x41b
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).GetGroupHistoryPage(0xc000000a80, {0x1ab88d8, 0xc00085bcb0}, {0xc0008ff3e0, 0x2c}, 0x1, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1858 +0x716
main.(*Portal).catchUpHistory(0xc000892340, 0xc00026a000, 0x1, 0x7, 0x18ecef3d1da)
        /builds/mautrix/signal/portal.go:941 +0x449
main.(*Portal).handleSignalDataMessage(0xc000892340, 0xc00026a000, 0xc0000b8500, 0xc0008fa6c0)
        /builds/mautrix/signal/portal.go:902 +0x3f5
main.(*Portal).handleSignalMessage(0xc000892340, {0xc0007be800?, 0xc00026a000?})
        /builds/mautrix/signal/portal.go:871 +0x194
main.(*Portal).messageLoop(0xc000892340)
        /builds/mautrix/signal/portal.go:358 +0xa5
created by main.(*SignalBridge).NewPortal in goroutine 85
        /builds/mautrix/signal/portal.go:240 +0x725
Bridge exited
exit status 2

Or when I send a message to the group chat as well:

panic: runtime error: cannot convert slice with length 0 to array or pointer to array with length 65

goroutine 764 [running]:
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChange(0xc00010ac00, {0x1ab88d8, 0xc000614450}, 0xc0005f2f00, {0xc0009058f0, 0x2c}, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:946 +0x350f
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).decryptGroupChanges(0xc00010ac00, {0x1ab88d8, 0xc000614450}, 0xc000323d80, {0xc0009058f0, 0x2c})
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1884 +0x41b
go.mau.fi/mautrix-signal/pkg/signalmeow.(*Client).GetGroupHistoryPage(0xc00010ac00, {0x1ab88d8, 0xc000614450}, {0xc000888390, 0x2c}, 0x1, 0x0)
        /builds/mautrix/signal/pkg/signalmeow/groups.go:1858 +0x716
main.(*Portal).catchUpHistory(0xc000bdc270, 0xc00010aa80, 0x1, 0x7, 0x18ecef459f3)
        /builds/mautrix/signal/portal.go:941 +0x449
main.(*Portal).handleSignalDataMessage(0xc000bdc270, 0xc00010aa80, 0xc0005e6aa0, 0xc0004acd80)
        /builds/mautrix/signal/portal.go:902 +0x3f5
main.(*Portal).handleSignalMessage(0xc000bdc270, {0xc000323b40?, 0xc00010aa80?})
        /builds/mautrix/signal/portal.go:871 +0x194
main.(*Portal).messageLoop(0xc000bdc270)
        /builds/mautrix/signal/portal.go:358 +0xa5
created by main.(*SignalBridge).NewPortal in goroutine 62
        /builds/mautrix/signal/portal.go:240 +0x725
Bridge exited
exit status 2

Unfortunately I am unable to test the new binary, because I use bbctl, and it automatically overwrites the new development binary when you run the bridge using bbctl (Beeper Bridge Manager):

Failed to get current bridge version: exit status 127 - reinstalling
Finding latest version of mautrix-signal from mau.dev
Installing mautrix-signal (commit: e369a56c)

If you know a way around this, let me know.

It's all the same error. UserId has length zero so it cannot be converted to a type that is length 65.

Its triggered by each message (reaction or regular) because each message comes with a group revision that is later than the revision the bridge knows about. This triggers a sync that causes a panic because some userid has length zero.

I should be able to verify this by writing a function that fetches and decrypts random group history until I find an item that also has userid of length zero, but it would be easier if I didn't have to.

I've never used bbctl, so I can't tell you.

Maybe @tulir wants to weigh in? 😄

While waiting for the other person to chime in - if it helps... I think I know what is causing the UserId length to be 0, potentially. This could help you reproduce.

This group chat has 1 person in it that I do not have in my phone's contacts. They were added by someone else. This is clear when I open the group chat's participants, as they don't have a "contact" icon.

Okay, apologies for the second comment here. I registered your dev version of the bridge as a 3rd-party bridge in bridge manager and configured it. The fix appears to work! In terms of not crashing, anyway.

I'm not seeing replies to my messages synced back to Beeper, but that could be me misconfiguring it I suppose.

Setting up an entirely new bridge would re-create portals from scratch, it would not try to catch up on group history. So it won't see the relevant group change (with userid of length zero)

@ZaeGitHub @shrugal could you please try whether this fixes your issue: #498

I attached a binary in case you don't want to set up a development setup you should be able to trigger the bug (if it's still there or I made a new one) by sending a message from the client to an affected group. Observe the logs.

The binary didn't work with bbctl run --custom-startup-command for some reason (fork/exec /data/binaries/mautrix-signal: no such file or directory), but I compiled it myself and it seems to fix the issue for me.

@ZaeGitHub @shrugal could you please try whether this fixes your issue: #498
I attached a binary in case you don't want to set up a development setup you should be able to trigger the bug (if it's still there or I made a new one) by sending a message from the client to an affected group. Observe the logs.

The binary didn't work with bbctl run --custom-startup-command for some reason (fork/exec /data/binaries/mautrix-signal: no such file or directory), but I compiled it myself and it seems to fix the issue for me.

Following this guide, I can confirm this also appears to fix the problem for me.