42wim/matterircd

Insane amount of replays, with 13/14 second frequency

guilhermepiccoli opened this issue ยท 14 comments

Hi @42wim , I'm running matterircd built yesterday, at commit ed39f40. I've seen an insane amount of replays this morning (I have DisableAutoView set), like hundreds or more. In #town-square, the only message was some user leaving..so it kept repeating the following at every 13/14 seconds:

Replaying since 2020-08-17 20:54:14
<some_user> [20:54] some_user removed from the team.
Replaying since 2020-08-17 20:54:14
<some_user> [20:54] some_user removed from the team.
Replaying since 2020-08-17 20:54:14

Thanks in advance for any help with this!

42wim commented

The replaying is only triggered on connect, is this something you can repeat every time when you start matterircd and connect?

If yes, then please add debug and tracing to your matterircd.toml and put the logs somewhere.

debug=true
trace=true

feel free to join also the #matterircd channel on irc.freenode.net

Just for the sake of the registry , I talked with Wim on freenode, it seems in my case we have no login/logout messages, but I could see 413 "Found version 5" strings, one every 13/14 sec, until I restart matterircd.

Tnx again for the great support Wim =)

Hey @42wim , thanks for adding more debug in matterircd! Today the issue reproduced, and I was able to see the following, in a repeating manner:

time="2020-08-19T16:54:25+02:00" level=debug msg="WS PING" prefix=matterclient time="2020-08-19T16:54:25+02:00" level=debug msg="sendWsRequest &model.WebSocketRequest{Seq:1, Action:"ping", Data:map[string]interface {}(nil), Session:model.Session{Id:"", Token:"", CreateAt:0, ExpiresAt:0, LastActivityAt:0, UserId:"", DeviceId:"", Roles:"", IsOAuth:false, Props:model.StringMap(nil), TeamMembers:[]*model.TeamMember(nil), Local:false}, T:(i18n.TranslateFunc)(nil), Locale:""}" prefix=matterclient time="2020-08-19T16:54:30+02:00" level=debug msg="StatusLoop() timeout" prefix=matterclient time="2020-08-19T16:54:30+02:00" level=debug msg="logout as gpiccoli (team: teamname) on chat.teamname.org" prefix=matterclient time="2020-08-19T16:54:30+02:00" level=info msg="Found version 5.23.0.5.23.1.2604dfb92139cce46aebb9d34323f6a7.true" prefix=matterclient time="2020-08-19T16:54:30+02:00" level=debug msg="trying login teamname gpiccoli chat.teamname.org" prefix=matterclient time="2020-08-19T16:54:30+02:00" level=debug msg="trying login with personal token" prefix=matterclient time="2020-08-19T16:54:31+02:00" level=info msg="found 665 users in team teamname" prefix=matterclient time="2020-08-19T16:54:31+02:00" level=debug msg="initUser(): found our team teamname (id: sqmc4sz45prypmkfctwynm5yjr)" prefix=matterclient time="2020-08-19T16:54:31+02:00" level=debug msg="WsClient: making connection: wss://chat.teamname.org/api/v4/websocket" prefix=matterclient

Notice that, after some minutes, I just killed matterircd and restarted..problem vanished! So, seems a bad condition that is transient but triggers a kind of loop of reconnects...
Is it enough logging to understand the issue? Wanna add more logs? Or maybe I should run with trace? Thanks again!

42wim commented

No, the StatusLoop() timeout says enough, it means the server stopped responding to pings.
Could be a flaky connection, but I'm not sure yet why the reconnects keeps reconnecting after being connected ;-)

OK, to be sure I understood that properly: the mattermost server stopped responding to pings? Or my ZNC stopped responding pings?

If the mattermost server, I'll talk to the company infrastructure team, this is odd...or could be my vps also.
One interesting thing is that I'm not seeing the flood of replays anymore; it seems to be replaying only once.

Just a small addition: I just faced another StatusLoop() timeout, but this time I noticed the following right before:

"time="2020-08-21T20:58:34+02:00" level=error msg="error:websocket: unexpected reserved bits 0x60" prefix=matterclient"

Not sure if it helps, hopefully it does. Thanks again!

42wim commented

should be fixed in master.

Tested with master and confirmed fixed. I can still see my client reconnecting, but no flood of replays nor missed messages so far.

Thanks @42wim !

Gah, spoke too soon.

Even with the reconnecting issue fixed, it seems with the latest matterircd there are periods where no messages are relayed / received.

should be fixed in master.

@42wim thanks! Do you have a specific commit to point me? Just so I can understand how the fix works ?

42wim commented

@guilhermepiccoli 15b7612 rewrite of matterclient connection

Thanks, gonna test now =)

42wim commented

@hloeung is this still happening on master for you?

@42wim fixed with master. Thank you, much appreciated.