share/sharedb

`Connection` does not reconnect automatically

jahewson opened this issue ยท 23 comments

When I restart my web socket server, the Connection on the client emits a state event with the newState set to "disconnected", which according to the documentation means:

'disconnected' Connection is closed, but it will reconnect automatically

However, the connection does not reconnect. Given that the socket which was passed to the Connection's constructor is now closed, I don't understand how reconnection would be possible. Don't we need a new WebSocket? What's going on here?

The inline documentation you are referring to is incorrect. I recommend using one of the following modules for re-connectable websockets:

websocket.js
reconnectingwebsocket

Ok, that's good to hear. I had thought that maybe it was just me. Might want to update the README which says:

  • Reconnection of document and query subscriptions
  • Offline change syncing upon reconnection

I too encountered reconnection problems, everything works fine locally, but Heroku seems to kill the socket if no event occurs within a minute. I have now added a ping-pong to help, and replaced my WebSocket with a ReconnectingWebSocket, but running into a weird issue. The socket does indeed reconnect, but the events no longer sync between clients after the reconnection.

Oddly, it seems like they still make it to the server, just not the other client - almost as if the two clients started thinking they're in different rooms until one of the pages is reloaded, or as if the subscription to remote changes broke after the disconnection. Do I need to rerun subscribe events after the reconnection? Do I need to refetch the docs? Do I need to recreate the ShareDB handle on every reconnection? On the main page in the list of features it says that ShareDB handles reconnection, what exactly does that mean and what should we handle ourselves?

Thanks

This is addressed in PR #138 .

I've had quite good luck using https://github.com/pladaria/reconnecting-websocket , dropping it in in place of raw WebSocket.

FWIW, initially I tried https://github.com/joewalnes/reconnecting-websocket , but decided not to use it because it depends on window (breaks in a Node environment), and is no longer maintained.

Yep, I've also come across this issue of Connection not reconnecting automatically.

On previous versions of ShareDB/ShareJS the communication was established through browserchannel, but browserchannel comes from a time where WebSockets weren't widely supported and so it resorted to this 'long-polling' protocol to establish the connection. Now, also as stated in its README, we should strongly consider WebSockets as an approach to this.

I also note that ShareDB pretty much requires a WebSocket, or another API-compatible way of transport, that assures in-order-message delivery, it seems.

But I do miss the resilience of browserchannel between connects, disconnects and network issues.

I've been setting up a small (WIP) experiment at pedrosanta/quill-sharedb-cursors and I've resorted to reconnectingwebsocket like @DetweilerRyan suggested, along with some ping/keep-alive mechanism on server-side to help maintain connections. So far seems to be working quite fine.

I recognize that @mbalex99 on #138 updates the documentation, but I don't know if this really shouldn't be taken care of on ShareDB Connection module? I think I would like to have this 'resilience'/reconnection out-of-the-box on ShareDB.

Thoughts?

@atsepkov That's a bit strange. I've been doing a few tests also on Heroku with pedrosanta/quill-sharedb-cursors and, like, the events resume after a reconnect, and even syncs the edits made while offline on reconnect. ๐Ÿ˜• LMK if it helps or if you can make it work too somehow.

In production we at Retrium no longer use a reconnectable websocket. The logic around how, when and why we attempt to reconnect to our servers is complex and none of the existing reconnecting websocket npm modules provide the flexibility we required. Reconnection logic in a production grade application should take into account a number of things including if the bowser has a network connection, if the browser tab is active and visible, and finally should use an exponentially increasing backoff interval for reconnection attempts. This logic is difficult to get right and the details depend on the needs of the application and therefore ShareDB should not contain reconnection logic.

Instead, we at retrium use the .bindToSocket method on the ShareDB.Connection instance to pass in new WebSockets to ShareDB.

First we instantiate a ShareDB.Connection with a mock WebSocket that is already 'CLOSED'.

const CLOSED = 3;
const mock_web_socket = {
  close: () => {}, 
  readyState: CLOSED
};
const sharedb_connection = new sharedb.Connection(mock_web_socket);

Then each time we instantiate a new WebSocket we pass it into ShareDB:

WebSocketConnector.on('connection', (socket) => {
  sharedb_connection.bindToSocket( socket );
});

Where WebSocketConnector is a proprietary abstraction that encapsulates all of the complexities around reconnecting to our production servers. Note: ShareDb.Connection.bindToSocket must never be called if the current socket referenced by the connection instance has not closed yet.

Nice insight @DetweilerRyan. ๐Ÿ‘

So, the way you see it is assume really bare WebSocket interface/API-compatible transport layer, and leave/assume connection/reconnection policy and concern to 'downstream' usage?

I think that can work out too. It would just need IMHO to:

  1. Make the documentation clear ( #138 already does that, although we need to remove another line from README and probably update a few Connection comments, etc);
  2. Probably, provide additional documentation on how to achieve simple reconnection capabilities and more info on that topic (probably on a Wiki page or something);

How does this sound?

Also, @DetweilerRyan, ever thought of open-sourcing a version of that WebSocketConnector? ๐Ÿ™‚

I'm curious, is anyone still using Browserchannel? I'm almost tempted to, because an additional benefit would be the ability to use Cookies set by Express on the req objects.

Currently I need to go through what seems like a bunch of unnecessary hoops in order to access the Express session in the ShareDB middleware that handles WebSocket connections.

In case it might be useful for anyone, and in case there is a better way to do this, I'd like to share some snippets for the approach I'm using now for accessing sessions on WebSockets in ShareDB:

import express from 'express'  
import http from 'http'
import expressSession from 'express-session'
import cookie from 'cookie'    
import cookieParser from 'cookie-parser'
import ShareDB from 'sharedb'  
import { Server as WebSocketServer } from 'ws'
import JSONStream from 'websocket-json-stream'

const app = express()
const server = http.createServer(app)
const wss = new WebSocketServer({server})
const db = sharedbMongo(MONGO_URL)
const sharedb = ShareDB({db})

import connectRedis from 'connect-redis'

import {
  SESSION_SECRET as secret,
  REDIS_HOST as host,
  REDIS_PORT as port
} from './config'

const RedisStore = connectRedis(expressSession)
const store = new RedisStore({ host, port })

app.use(expressSession({ ... }))

// Gets the current session from a WebSocket connection.
// Draws from http://stackoverflow.com/questions/36842159/node-js-ws-and-express-session-how-to-get-session-object-from-ws-upgradereq
export const getSession = (ws, callback) => {

  // If there's no cookie, there's no session, so do nothing.
  if(!ws.upgradeReq.headers.cookie) return callback() 

  // If there's a cookie, get the session id from it.
  const cookies = cookie.parse(ws.upgradeReq.headers.cookie)
  const sessionId = cookieParser.signedCookie(cookies['connect.sid'], secret)

  // Get the session from the store and pass it to the callback.
  store.get(sessionId, callback)
}

wss.on('connection', (ws) => {
  getSession(ws, (err, session) => {
    if(err) return console.log(err)
    ws.upgradeReq.session = session
    sharedb.listen(new JSONStream(ws), ws.upgradeReq)
  })
})

// Expose the session from initial connection as agent.session.
sharedb.use('connect', (request, done) => {
  request.agent.session = request.req.session
  done()
})

sharedb.use('apply', (request, done) => {

  // Unpack the ShareDB request object.
  const {
    op,
    agent: { session },
    snapshot
  } = request

  // Get the id of the currently logged in user from the session.
  const userId = get(session, 'passport.user.id')

  .. your code here ..
}

The above should work well with reconnecting-websockets, but with Browserchannel you can just access req.session and the above code is not necessary.

@pedrosanta Agreed, reconnection at the network layer is and probably should remain out-of-scope for the ShareDB project.

Make the documentation clear ( #138 already does that, although we need to remove another line from README and probably update a few Connection comments, etc);

I'm not sure #138 really solves the confusion around the documentation regarding reconnection. The statement in the documentation, "Reconnection of document and query subscriptions", is not incorrect but is just misunderstood. While ShareDB does not initiate reconnection attempts at the network layer it does handle resubscribing to existing Document and Query subscriptions after a disconnect and subsequent reconnect occurs on the network layer. This is important to document because it means that that when using ShareDB and a reconnection occurs at the network layer you do not need to manually resubscribe to Documents and Querys that were previously subscribed to. The internal logic to reconnect to each Document and Query is complex and can be see here in the Connection.prototype._setState method.

Instead, documentation stating that ShareDB does not attempt to open a closed socket but will internally handle resubscribing to all Documents and Queries when the socket open event fires and the socket's readState changes to OPEN may be more appropriate.

Probably, provide additional documentation on how to achieve simple reconnection capabilities and more info on that topic (probably on a Wiki page or something);

The example in #138 using the reconnecting websocket is sufficient to get started with ShareDB. However, my earlier caution regarding websockets in a realtime production grade web application still stand.

Also, @DetweilerRyan, ever thought of open-sourcing a version of that WebSocketConnector? ๐Ÿ™‚

Yes; however, our current implementation of WebSocketConnector has some unresolved architectural issues and dependencies that are not suitable for an open source project.

@curran At retrium we have been using BrowserChannel in production for two years. For the first year we only used BrowserChannel but then switched over to WebSockets believing that WebSockets had enough browser support that It wouldn't affect many or any of our customers. Unfortunately, despite the almost universal support for WebSockets corporate firewalls and proxies will sometimes block WebSockets. After realizing this we quickly deployed a fallback mechanism from WebSockets to BrowserChannel. We have had no major problems with BrowserChannel and there remains a clear need for a alternative to WebSockets to fallback to despite the near universal support in browsers. That being said, I know little about why or how corporate firewalls and proxies sometimes block WebSockets and any insight here would be appreciated.

Currently I need to go through what seems like a bunch of unnecessary hoops in order to access the Express session in the ShareDB middleware that handles WebSocket connections.

In case it might be useful for anyone, and in case there is a better way to do this, I'd like to share some snippets for the approach I'm using now for accessing sessions on WebSockets in ShareDB:

Sometimes bridging technologies together gets a bit messy, nevertheless your approach for accessing sessions on WebSockets in ShareDB middleware looks pretty solid. Thanks for sharing!

@curran:

I'm curious, is anyone still using Browserchannel?

We're still using Browserchannel, but we might move to WebSockets in the near future. And yea, with WebSockets, handling session can be nightmarish. ๐Ÿ˜• Nice snippet, thanks!

@DetweilerRyan:

I'm not sure #138 really solves the confusion around the documentation regarding reconnection. The statement in the documentation, "Reconnection of document and query subscriptions", is not incorrect but is just misunderstood. While ShareDB does not initiate reconnection attempts at the network layer it does handle resubscribing to existing Document and Query subscriptions after a disconnect and subsequent reconnect occurs on the network layer. This is important to document because it means that that when using ShareDB and a reconnection occurs at the network layer you do not need to manually resubscribe to Documents and Querys that were previously subscribed to. The internal logic to reconnect to each Document and Query is complex and can be see here in the Connection.prototype._setState method.

Yap, got it. You're absolutely right. Just network/transport level.

Instead, documentation stating that ShareDB does not attempt to open a closed socket but will internally handle resubscribing to all Documents and Queries when the socket open event fires and the socket's readState changes to OPEN may be more appropriate.

Yap, agreed. This documentation could use some clarification. ๐Ÿ‘

Yes; however, our current implementation of WebSocketConnector has some unresolved architectural issues and dependencies that are not suitable for an open source project.

๐Ÿ‘ Got it.

Yikes, nice cautionary tale regarding firewalls and proxies @DetweilerRyan... ๐Ÿ˜Ÿ I think I'll be re-assessing the update from Browserchannel to WebSockets... didn't really thought of those kind of issues.

Hey @DetweilerRyan while you were experiencing bad issues with firewalls and websockets, was Retrium using SSL for the websockets (wss://)? I've heard that using SSL mitigates a lot of those issues, but wanted to check before investigating myself. Thanks!

Hi @ianstormtaylor, yes, we were using wss:// when we were experiencing connectivity issues. I haven't really investigated the causes of the connectivity issues beyond reading a few articles on the internet and instead opted for a fallback mechanism. Maybe there is a solution to this problem other than falling back to long polling and if you do end up investigating this further, I would love to hear your findings.

This article is a good read and is what convinced me that a fallback mechanism was required, namely the first paragraph of the summary:

While the HTML5 Web Socket protocol itself is unaware of proxy servers and firewalls, it features an HTTP-compatible handshake so that HTTP servers can share their default HTTP and HTTPS ports (80 and 443) with a WebSocket server. Some proxy servers are harmless and work fine with Web Sockets; others will prevent Web Sockets from working correctly, causing the connection to fail. In some cases additional proxy server configuration may be required, and certain proxy servers may need to be upgraded to support Web Sockets.

It's unfortunate that until your application actually has users behind corporate firewalls and proxies ( who are interested enough to actually tell you they're experiencing connectivity issues ), you will not become aware of these WebSocket connectivity issues. For what it's worth, a couple weeks ago we had a problem with a proxy server with one of our enterprise customers that failed to trigger our fallback logic. It turns out the proxy server allowed the WebSocket connection to open but did not allow any traffic which resulted in the customer being presented with an indefinite loading screen. We've since updated our fallback logic and it's smooth sailing again. Despite this one recent hiccup, falling back to browsercahnnel continues to be very reliable and robust.

Good luck!

PS @ianstormtaylor Slate looks really great. Love the technology choices you made and the plugin architecture, it is everything that I had hoped Draft.js was going to be but wasn't. I'm really looking forward to incorporating it into Retrium's front end stack at some point. I've done a bunch of custom production grade OT work with Immutable.js at this point and if you're interested I'm open to chatting.

@DetweilerRyan thank you! That is super super helpful, and thanks for the kind words. I definitely might take you up on that chat soon ๐Ÿ˜„ just starting into some of the server-side stuff now so I'll see how it goes!

@DetweilerRyan just curious why you generate a new socket object each time, instead of reopening the socket you initially generated?

Hi @DetweilerRyan, how do you handle the resubscription to the documents? I'm currently struggling with this now. :(

function connect () {
      const quill = new Quill('#editor', this.editorOption)
      this.quill = quill
      const recordId = this.recordId
      const _self = this
      const socket = new WebSocket(`${Env.memoHost}/${wsPath} // `eslint-disable-line`

      socket.onopen = function () {
        console.log('connected')
        const connection = new ShareDB.Connection(socket)
        const doc = this.doc = connection.get(_self.userId, recordId)

        doc.subscribe(function (err) {
          if (err) throw err
          quill.setContents(doc.data)
          _self.recordText = quill.container.firstChild.innerHTML
          quill.on('text-change', function (delta, oldDelta, source) {
            _self.recordText = quill.container.firstChild.innerHTML
            if (source !== 'user') return
            doc.submitOp(delta, { source: quill })
          })
          doc.on('op', function (op, source) {
            if (source === quill) return
            quill.updateContents(op)
          })
        })

        if (connection && connection.socket) {
          const closeFunc = connection.socket.onclose
          connection.socket.onclose = function () {
            closeFunc()
            handleClose()
          }
        }
      }

      socket.onclose = function () {
        handleClose()
      }

      const handleClose = function () {
        console.log('ws closed')
        setTimeout(() => {
          _self.connect()
          console.log('reconnecting...')
        }, 3000)
      }
    }

The docs say to use a reconnecting-websocket. Closing due to inactivity.

Hi, I'm running into the same issue as mention by @atsepkov from his post in 2017.

It runs fine locally, but NOT when hosted (Heroku). It gets into this weird "stale" state where the CLIENT thinks it is connected (ready state 1), but changes dont get to the server (and no error renders in the client).

I've tried a bunch of things to rectify, no success (such as following the basic examples, adding a server ping, reconnecting WebSocket, and using setInterval).

Does anyone have suggestions here? I really need a reliable way to detect if the "connection" is actually talking to sharedb

Server setup:

const ACTIVE_SOCKETS =  new Map()

const wss = new WebSocket.Server({ server })
  setInterval(() => { 
    for(const socket of ACTIVE_SOCKETS.values()) {
     socket.ping(function() {}, false, true)
    }
  }, 1000)

  wss.on('connection', async (ws, req) => {
    ACTIVE_SOCKETS.set(uid, ws)
     const stream = new WebSocketJSONStream(ws)
      backend.listen(stream)

    ws.on('close', (code, reason) => {
      ACTIVE_SOCKETS.delete(uid)
    })
  })

Client setup:

export default (fullPath, offlineCB) => {
  if (!socketVal) {
    const rws = new ReconnectingWebSocket(fullPath)
    let markedOnline = true
    setInterval(() => {
     // Using SetInterval here to determine what "state" it thinks it is in, when not data is being posted
      if(![1,3].includes(rws.readyState)){
        offlineCB(Date.now() + '___1or3')
      }
      if (rws.readyState === 3) {
        offlineCB(Date.now() + '___OFF')
      }
      if (rws.readyState === 1) {
        offlineCB(Date.now() + '___ON')
      }
    }, 5000)
    socketVal = new ShareDB.Connection(rws)
  }
  return socketVal
}
wires commented

I don't think browserchannel should be dismissed. Most cases websockets is much better, but for good reasons I cannot use websockets, and must instead use browserchannel.

It's not super clear to me how to do that. I was using racer which provides racer-browserchannel but would like to use sharedb directly without racer. I'd love to see continued support / docs for using browserchannel (and if I get it working I'll post some code of course), especially a snippet for server and client side for using browserchannel instead of reconnecting-websocket. (anyways thanks for the project)