anycable/anycable-client

Reliable streams reconnection - SubscriptionTimeoutError: Haven't received subscription ack in 10000ms

adam-h opened this issue · 5 comments

adam-h commented

Tell us about your environment

npm packages:
@anycable/turbo-stream @ 0.4.0
@anycable/core @ 0.7.9
@anycable/web @ 0.7.3

import '@hotwired/turbo'
import { start } from '@anycable/turbo-stream'
import { createCable } from '@anycable/web'

start(createCable({ protocol: 'actioncable-v1-ext-json' }))

AnyCable-Go version:
anycable-go-pro:1.4.6-alpine

      ANYCABLE_HOST: "0.0.0.0"
      ANYCABLE_PORT: 8080
      ANYCABLE_REDIS_URL: redis://redis:6379/3
      ANYCABLE_RPC_HOST: host.docker.internal:5100
      ANYCABLE_DEBUG: 1
      ANYCABLE_JWT_ID_KEY: my-key-
      ANYCABLE_JWT_ID_ENFORCE: "true"
      ANYCABLE_HEADERS: "cookie,origin"
      ANYCABLE_PRESETS: "broker"

AnyCable gem version:
anycable-rails (1.4.0)
anycable (1.4.3)

default: &default
  access_logs_disabled: false
  log_grpc: false
  broadcast_adapter: redisx
  redis_channel: "__anycable__"
  jwt_id_key: '<%= ENV.fetch("ANYCABLE_JWT_ID_KEY") %>'
  jwt_id_ttl: 600

What did you do?

Trying to setup reliable streams and it's initially working then failing after a short period of time when reconnecting.

  • Connect and confirm broadcasts work
    • Using latest Chrome (118.0.5993.117) as the client I am letting it connect to anycable and confirm it's receiving broadcast messages from my rails application.
  • Set Chrome to 'Offline' in the Network tab of dev tools
    • Stops receiving broadcast messages
  • Set Chrome back to online after ~20s
    • 🟢 Websocket reconnects and I get the history (messages I'd missed) correctly
    • 🟠 New broadcast messages initially come through on the replaced websocket connection
    • 🔴 Shortly after I get a failed to subscribe error
      • ℹ️ Note this doesn't happen every time, seems to depend on how long I wait to reconnect but I can't figure out a pattern exactly.

What did you expect to happen?

Reconnection works, historical messages are received and new broadcasts continue to function indefinately.

What actually happened?

I get a SubscriptionTimeoutError: Haven't received subscription ack in 10000ms for {"channel":"...","signed_stream_name":"...} error ~10s after I go back online.

Up until this error I still receive new live broadcasts. Strangely I still see subsequent live broadcast messages in the websocket messages of the dev tools even after this error - they are just not processed at all by the client, I assume they're just ignored as it thinks the subscription is dead.

This doesn't happen every time, but only when the client websocket re-submits a subscription request to anycable to something it was previously subscribed to before it disconnected.
Sometimes, particularly after a short time offline, it will just reconnect, fetch history, and keep going on perfectly smoothly and never re-subscribe.

Browser console log:

image

Websocket browser messages:

Note how it is trying to subscribe again, even though the backend thinks it already is

image

From the anycable go service (debug enabled):

The backend thinks the client is still subscribed. It does even keep sending messages 'correctly' over the websocket (which are ignored).

anycable  | D 2023-10-30T07:53:39.053Z context=node sid=oKJ-9PStdmrO74aWH9ZX0 Incoming message: &{subscribe {"channel":"ActivityInstanceVisualizationChannel","charts-_lifecycle_target":"activationChannel","signed_stream_name":"Imluc3Rh~~snip~~"} <nil> {1698652416 map[activity_instance_visualization:instance=Z2lk~~snip~~:{YITt 30}]}}
anycable  | W 2023-10-30T07:53:39.054Z context=node sid=oKJ-9PStdmrO74aWH9ZX0 Failed to handle incoming message '{"command":"subscribe","identifier":"{\"channel\":\"ActivityInstanceVisualizationChannel\",\"charts-_lifecycle_target\":\"activationChannel\",\"signed_stream_name\":\"Imluc3Rh~~snip~~\"}","history":{"since":1698652416,"streams":{"activity_instance_visualization:instance=Z2lkO~~snip~~":{"epoch":"YITt","offset":30}}}}' with error: Already subscribed to {"channel":"ActivityInstanceVisualizationChannel","charts-_lifecycle_target":"activationChannel","signed_stream_name":"Imluc3Rh~~snip~~"}

Similar to anycable/anycable-go#141 and anycable/anycable-go#142

palkan commented

Hey @adam-h!

Thanks for the detailed report!

Looks like a bug in the @anycable/core library (I'll transfer the issue there).

Strangely I still see subsequent live broadcast messages in the websocket messages of the dev tools even after this error

That's not strange: from the server point of view, there is a subscription, so you continue receiving updates (the history request is confirmed—that indicates that the subscription is alive at the server side).

The actual question is why does the client tries to send the subscribe command again? It should not. It would be helpful if you can share the full WS log (some important pieces are not visible on the screenshots, like full identifiers and restore_ids from the welcome message). Even better if you can enable debug logging at the client side (createCable({logLevel: 'debug', ...})) and share the logs.

adam-h commented

@palkan Thanks for getting back to me and putting this in the right place. Attached are the full logs of me reproducing this.

Client console log (Chrome)

anycable-client-debug.log

  • Slightly modified from the browser to indent stack traces and shorten script hashes, otherwise unchanged
  • Disconnection was between lines 30 and 31
  • Reconnection was on line 89
  • Re-subscribe (the error?) is on line 115
  • Total failure on line 223

Server log (docker)

anycable-go-debug.log

  • Line 154 is when the client (which will go offline) initially connects
    • Lines before this are from a reload or two and another browser also connecting to triggers all the broadcasts. I doubt they're relevant to the issue
    • Included them all as the start of the file includes the bootup logs
  • Line 193 is when the client reconnects after being offline
palkan commented

Thanks for the logs. The relevant part is here:

// Looks like page refresh / DOM update triggered unsubscription from all streams
// In anycable-client, we synchronize subscribe/unsubscribe commands to the same channel to prevent concurrent commands; so we try to acquire the lock here
@anycable--web-731193.js:1 acquiring unsubscribe lock {identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}'}

// New HTML brings back the same `turbo-stream-source` elements, initiating the subscription (again, trying to acquire the corresponding locks)
@anycable--web-731193.js:1 acquiring subscribe lock {identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}'}

// The lock has been granted for the unsubscribe command...
@anycable--web-731193.js:1 unsubscribe lock has been acquired {identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}'}

// However, we detect that it's no longer needed—we want to subscribe! So, we cancel this command
@anycable--web-731193.js:1 cancel unsubscribe, no longer needed {identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}', intent: 'subscribed'}

// Now, the subscribe lock has been granted 
@anycable--web-731193.js:1 subscribe lock has been acquired {identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}'}

// And here is the problem — we don't check that we're already subscribed and must cancel this operation
@anycable--web-731193.js:1 subscribing {identifier: 'ActivityVisualizationChannel', params: {…}}
@anycable--web-731193.js:1 outgoing message {command: 'subscribe', identifier: '{"channel":"ActivityVisualizationChannel","signed_…42272f7d8d35c5c5e0d00f34ca156a49c614ceda5c77b0f"}', history: {…}}

I will try to publish a fix asap

palkan commented

Released a fix in @anycable/core 0.7.10. Please, let me know, if it helped.

adam-h commented

@palkan Thank you so much, this is great! Really appreciate your help on this.

Verified it's resolved my issue (manually applied the same patch to my local package as 0.7.10 isn't published yet).