ar-io/ar-io-node

feat(api): add support for websockets

Opened this issue · 19 comments

Summary

Create a WebSocket server for ar.io gateways to enable users/developers to subscribe to events on Arweave.

Motivation

There's currently no service in the Arweave ecosystem that enables event subscription via WebSockets.

Use cases for devs

  • Wallets can subscribe to an event when a certain transaction gets mined
  • Other indexers or backends can subscribe to transactions with specific tags
  • Any other use case that follows from being able to subscribe to arbitrary events, e.g. a new block being mined, a new mined tx that matches a specific ANS standard, ...

Other motivations

  • The big advantage of WebSockets is that after the handshake procedure, the overhead of individual messages is low, making it good for sending a high number of requests.
  • Currently, there is no real-world use for indexed data in the database; this feature would finally give indexing some meaning
  • As mentioned above, there's no solution for ws in the space right now, so it would be a good selling point for ar.io
  • Note: This feature will be entirely optional, enabled via the WS_ENABLED variable in .env

Specification

Because WebSockets don't prescribe any message format or higher-level protocol, we need to define our own communication protocol.

This is a type specification of each message in this new protocol:

{
  "method": "<specify the type of action; e.g. subscribe, unsubscribe, error states, etc.>",
  "event": "<specify type of event you want to sub; only used in 'ar_subscribe method'>",
  "params": "<specify parameters of the event; for every event/method it's different>",
  "result": "<specify the result data; used only by server for returning data>"
}
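As a sketch of how a gateway might enforce this envelope shape, here is a minimal validator (the method and event names come from the "Event methods" and "Event types" sections below; everything else, including the function name and error format, is illustrative):

```javascript
// Illustrative validator for the message envelope described above.
// CLIENT_METHODS / EVENT_TYPES mirror the lists later in this spec.
const CLIENT_METHODS = new Set(['ar_subscribe', 'ar_unsubscribe']);
const EVENT_TYPES = new Set(['new_tx', 'new_block']);

function validateClientMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  if (!CLIENT_METHODS.has(msg.method)) {
    return { ok: false, error: `unknown method: ${msg.method}` };
  }
  // "event" is only meaningful for ar_subscribe.
  if (msg.method === 'ar_subscribe' && !EVENT_TYPES.has(msg.event)) {
    return { ok: false, error: `unknown event: ${msg.event}` };
  }
  return { ok: true, msg };
}
```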

Example of communication

It works by subscribing to particular events. The gateway will return a subscription ID. For each event that matches the subscription, a notification with relevant data is sent together with the subscription ID.

  1. Create a subscription (in this example: subscribe to all new txs):
{
  "method": "ar_subscribe",
  "event": "new_tx",
  "params": null
}
  2. Response from the ar.io gateway (returns a subscription ID - uuid v4):
{
  "method": "ar_subscribe_accepted",
  // Subscription ID
  "result": "bd52673d-832f-4b34-a79d-79058e7f6989"
}
  3. After the subscription is created, the client will receive notifications in this format:
{
  "method": "ar_subscribe_msg",
  "params": {
    "subscription": "bd52673d-832f-4b34-a79d-79058e7f6989"
  },
  "result": [
    { "id": "FPDHr4gKD...yG2I", "owner": "sb7fS07r5...AVm0", "tags": [...], ... },
    { "id": "4d0G6Nk4z...hm44", "owner": "pnqyui51Q...jLvQ", "tags": [...], ... },
    ...
  ]
}
  4. If the client wants to cancel the subscription:
{
  "method": "ar_unsubscribe",
  "params": {
    "subscription": "bd52673d-832f-4b34-a79d-79058e7f6989"
  }
}
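On the client side, the flow above reduces to two message builders plus a dispatcher that routes incoming notifications by subscription ID. A sketch (the helper and class names are illustrative, not part of the spec; the raw strings would be sent/received over the actual WebSocket connection):

```javascript
// Sketch of the client side of the protocol above.
function buildSubscribe(event, params = null) {
  return JSON.stringify({ method: 'ar_subscribe', event, params });
}

function buildUnsubscribe(subscriptionId) {
  return JSON.stringify({
    method: 'ar_unsubscribe',
    params: { subscription: subscriptionId },
  });
}

class SubscriptionRouter {
  constructor() {
    this.handlers = new Map(); // subscription ID -> callback
  }
  // Call after receiving ar_subscribe_accepted with the returned ID.
  register(subscriptionId, onResult) {
    this.handlers.set(subscriptionId, onResult);
  }
  // Feed every incoming ws message through this; returns true if routed.
  dispatch(raw) {
    const msg = JSON.parse(raw);
    if (msg.method !== 'ar_subscribe_msg') return false;
    const handler = this.handlers.get(msg.params.subscription);
    if (!handler) return false;
    handler(msg.result);
    return true;
  }
}
```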

Event methods

  • ar_subscribe
  • ar_subscribe_accepted
  • ar_subscribe_denied
  • ar_subscribe_msg
  • ar_unsubscribe

Event types

  • new_tx
    • params:
      • owner: string
      • target: string
      • tags: { name: string, value: string }[]
    • result: Transaction[]
  • new_block
    • params: none
    • result: Block

Rate-limiting

  • The gateway operator should be able to rate-limit the ws server in some way.

Whether and how to do this probably needs further discussion.

ENVs

WS_ENABLED | type: boolean | default: false | desc: Start the WebSocket server
WS_PORT | type: number | default: 3333 | desc: Port of the WebSocket server
WS_CORS | type: string | default: * | desc: Allow access from specific origin
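A sketch of how these variables could be read, using the defaults from the table above (the parsing conventions and function name are assumptions):

```javascript
// Reads the proposed WS_* variables from the environment,
// falling back to the defaults listed above.
function loadWsConfig(env = process.env) {
  return {
    wsEnabled: env.WS_ENABLED === 'true',          // default: false
    wsPort: Number.parseInt(env.WS_PORT ?? '3333', 10),
    wsCors: env.WS_CORS ?? '*',
  };
}
```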

Implementation details

  • Use of the ws module in Node.js should be sufficient
  • Use Redis (or DragonflyDB) as an in-memory store for subscription IDs and params, or just a custom in-process data structure?

Considerations

  • Notifications are sent for current events and not for past events
  • Subscriptions are coupled to a connection. If the connection is closed, all subscriptions created over it are removed

It's the first blueprint, so there will likely be some blind spots that I've missed. Feedback is highly appreciated. :)

@rixcian Hey, just wanted to say thanks for this well written ticket! The team is in general agreement that there's a need for something like this, and the details seem reasonable at least as a starting point. We have a lot of other work in flight as a team and probably can't take this on ourselves at the moment, but could provide support if you're interested. Is this something you have capacity to work on yourself? If so, we can spend more time reviewing the details.

@djwhitt Hey, thanks for the kind words! Yeah, if you're okay with the specs, I'm ready to start working on this right away.

@rixcian Great! We'll spend a bit more time digging in and get back to you with some design questions.

Alright, first off, minor quibble. 😆

Currently, there is no real-world use for indexed data in the database; this feature would finally give indexing some meaning

You do know that the indexed data is queryable using GraphQL, right? 🙂

{
"method": "<specify the type of action; e.g. subscribe, unsubscribe, error states, etc.>",
"event": "<specify type of event you want to sub; only used in 'ar_subscribe method'>",
"params": "<specify parameters of the event; for every event/method it's different>",
"result": "<specify the result data; used only by server for returning data>"
}

Can you expand a bit on why this format in particular was chosen? It seems like having a type and perhaps a unique ID, and then letting the rest of the message format be determined by those, might be more flexible.

  • Use of redis (or dragonflydb) in-memory db for storing subscription IDs and params or use just own implementation of some structure?

For single node, in memory data structures would be fine (I don't think Dragonfly DB is needed). Redis sounds good for a multi-node setup. Code should be written to an interface that can be implemented on both. It would also be very helpful to get a diagram of how you envision subscription state and message streams being managed in a multi-node setup.
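To illustrate the "write to an interface" suggestion: one contract, with an in-memory implementation for a single node and a Redis-backed one for multi-node implementing the same methods. A sketch (names are illustrative; methods are async so the Redis version fits the same contract):

```javascript
// One store contract, two implementations (only the in-memory one here).
// A Redis-backed class would expose the same async methods,
// e.g. put() via SET, get() via GET, delete() via DEL (sketch only).
class InMemorySubscriptionStore {
  constructor() {
    this.subs = new Map();
  }
  async put(id, spec) { this.subs.set(id, spec); }
  async get(id) { return this.subs.get(id) ?? null; }
  async delete(id) { return this.subs.delete(id); }
}
```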

I'm also curious if you've considered the pros and cons of implementing our own websocket protocol vs using something like GraphQL subscriptions.

Hey @rixcian! Thanks for the issue, I was about to create basically the same issue myself.
I think subscribing to changes in gql results would be cool, however it might slow down the node a lot if there are many subscriptions.

Writing a special protocol for subscriptions would make it more performant, but would also require many more hours of work.
One thing that is clear is that having to poll GraphQL is not the best way to go for a lot of use cases.

@djwhitt LMK if you want PR with implementation of this functionality.

// Our possible usecase, not important for overall discussion, but might explain why it's good idea.

A good use case for this would be some sort of system with a lot of updates not linked to each other (i.e. a lot of SmartWeave contracts).

In RareWeave, we have a lot of NFT contracts (hundreds), and we need to check for interactions on each of them. Currently this is done by chunking them into multiple queries, each responsible for a specified range of contracts, then rinse and repeat, syncing new interactions over time.
The main drawback of this approach is of course speed: it takes around 1 minute to fully traverse all contracts, and this time will grow with new contracts.
A delay is also needed to avoid being rate-limited by the node, so our smart contract engine gets data with a ~3 minute delay, which hurts UX a lot.
A subscription would make it much easier for these kinds of applications to function.

// Our possible usecase, not important for overall discussion, but might explain why it's good idea.

Definitely important to hear use cases!

LMK if you want PR with implementation of this functionality.

I don't think we're quite ready for a PR on this, but if you wanted to take a stab at an architecture diagram for a multi-node setup as well, that could be helpful.

@djwhitt Multi-node as in multiple ar.io nodes? Or something else?

@angrymouse Yes. Assume that we have N ar.io nodes and the actual websocket can be connected to any of them. What's the architecture for streaming updates and maintaining subscription state. (note: there are probably some standard ways to do this with Apollo, but I haven't investigated them yet)

t8 commented

Supportive of this. Would serve several use cases we'd be interested in pursuing with ArConnect.

@t8, I'm interested in hearing more details about your use cases if you can share.

@djwhitt I think there shouldn't need to be some kind of special protocol for keeping sockets updated. Each node just streams its data once it receives it (i.e. a loaded bundle or block).

@angrymouse Do you mean unbundling, tx processing, etc. would only be streamed to the clients connected to the writer in a multi-node setup?

To add a bit more context - the multi-node scenario we need to support initially is 1 DB writer with multiple readers, but in the future we may have more complex topologies with unbundling handled by different nodes than L1 TX indexing (that's how arweave.net works today). We don't have to have all that sorted out initially, but we do need to have a design that accommodates growth in that direction.

@djwhitt If I understood correctly, we can still just do 1 writer > readers. Just that readers can be subscribed to multiple writers in case some of them don't process the needed chunk of data.

@angrymouse I think we might be talking about different kinds of readers and writers. The scenario I'm thinking about is when you have 2 ar.io gateway services running. One of them reads and writes to a DB. The other one only reads. Both accept connections from users. We need a way for the users connected to the reader to receive the same messages as the users connected to the writer.