atomicdata-dev/atomic-server

Local-first resources - don't require a server when creating a resource

Opened this issue · 4 comments

joepio commented

Problem

Creating a new resource requires a lot: an active internet connection, a working server, a resolvable Agent. That makes creating data kinda bothersome, which hinders adoption. It would be nice if Resources could be created and updated local-first.

Additionally, it would be nice if resources could be resolved without a server. A P2P solution, perhaps something that uses content-addressing (e.g. something like ipfs / iroh) would be swell.

Solutions

Solution 1: use Nextgraph

See #1012

Solution 2: commit hashes and own schema

  • User can create a Commit that does not refer to an HTTP subject. It has no target. It's a RootCommit (not sure about the name).
  • The signature of that commit is the identifier. (or perhaps the iroh hash?)
  • This RootCommit has a date, a publicKey (linking
  • You can refer to the initial version of this resource using a did:atomic:RootHash. So we'll introduce a new did:atomic schema for this.
  • You can refer to a specific following version of this resource using a did:atomic:CommitHash
  • We still want pretty URLs, in some usecases, of course! So how do we deal with that? How do we map these?
  • We can still resolve using HTTP: https://atomicdata.dev/cid/aeg8ae9gyg98ea9eahge98gha
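
To make the "hash as identifier" idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `RootCommit` fields, the `canonicalize` and `rootCommitDid` helpers, and the choice of SHA-256 over sorted-key JSON. The thread has not decided whether the identifier should be a hash, a signature, or an iroh hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a RootCommit; the field names are assumptions.
interface RootCommit {
  createdAt: number; // Unix timestamp in milliseconds
  publicKey: string; // public key of the signing agent
  set: Record<string, string>; // initial property values
}

// Serialize with sorted keys so the same commit always yields the same bytes,
// regardless of the order in which the properties were written.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Derive a did:atomic identifier from the SHA-256 hash of the canonical form.
// Assumption: a hex-encoded SHA-256 digest stands in for the "RootHash".
function rootCommitDid(commit: RootCommit): string {
  const hash = createHash("sha256").update(canonicalize(commit)).digest("hex");
  return `did:atomic:${hash}`;
}
```

Because the identifier depends only on the commit's content, any client can compute it offline, before a server has ever seen the resource.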

API (@tomic/lib)

Not a lot would change for the user.

  • When creating a resource, the subject becomes optional, and so does the server. The resource that is returned will get a subject through its constructor, by taking the hash of that resource.
  • The

URL schema (resolve method)

If we have local-first resources, these should be able to resolve without HTTP. How would we go about this? If we use NextGraph, we just use their did schema. Easy.

What information should be in the URL?

  • the Schema, e.g. http or atomic: or did:atomic. This describes how the resource is to be resolved.
  • The hash / id of the first commit, I think. That's the cryptographic "root"!
  • optional: the HTTP server (domain), where the resource lives by default. This is mostly there for performance reasons. It allows the client to quickly get the resource from a server.
  • optional: the path + query. This is the /my-site/hello part, that links to the actual content. Not sure if this should be a thing.
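
The pieces listed above could be modelled roughly like this. It is only a sketch: the `ResourceAddress` type and the `resolutionCandidates` helper are hypothetical names, and the ordering (server hints first, content address last) is an assumption based on the "performance reasons" remark. The `/cid/` path is borrowed from the HTTP example under Solution 2.

```typescript
// Hypothetical decomposition of a resource address into the parts listed above.
interface ResourceAddress {
  rootHash?: string; // hash / id of the first commit, the cryptographic "root"
  server?: string; // optional HTTP server (domain) hint
  path?: string; // optional path + query, e.g. "/my-site/hello"
}

// Return the locations a client could try, fastest first.
// Assumption: HTTP hints are tried before any server-less lookup.
function resolutionCandidates(address: ResourceAddress): string[] {
  const candidates: string[] = [];
  if (address.server && address.path) {
    candidates.push(`https://${address.server}${address.path}`);
  }
  if (address.server && address.rootHash) {
    candidates.push(`https://${address.server}/cid/${address.rootHash}`);
  }
  if (address.rootHash) {
    candidates.push(`did:atomic:${address.rootHash}`);
  }
  return candidates;
}
```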

But if we don't, we have a couple of options:

  • did:atomic, a new schema that we should register. The URLs themselves

Questions

  • Should the identifier of the resource be based on the signature of the first resource? Or should it be the hash? For compatibility with something like ipfs, a hash would be nice. But Atomic uses signatures of the agent.
  • For authorship, the public key (or subject?) of the signing agent should be included in that initial resource.
  • What kind of merge algo should we use for branching and resolving commits? Some CRDT like Yjs or Automerge might be a good solution.

Relevant issues

Relevant reads

  • AnySync, the protocol that syncs AnyType. A CRDT that uses IPFS under the hood.
  • Willow, powered by iroh. Has rust and ts implementations.
  • Loro. CRDT with Rust and TS libraries. Seems to be fast and lightweight, compared to yjs / automerge.

@Polleps suggested a different path:

Locally Process Atomic Commits

  • Keep the requirement of having an accessible server / a place for resources to be persisted. This kills some usecases, but not a lot.
  • Store commits locally, in the browser, if the server is offline. When the client connects to the server, send all commits.
  • Use something like YJS for merging / branching to deal with conflicts (see #720)
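
A rough sketch of the "store locally, send on reconnect" part, under a few assumptions: `Commit`, `SendCommit` and `OfflineCommitQueue` are hypothetical names, the queue lives in memory only (a real browser client would probably persist it, for example in IndexedDB), and conflict handling is left out entirely.

```typescript
// Hypothetical minimal commit shape; real commits carry more fields.
interface Commit {
  subject: string;
  signature: string;
}

// Assumption: the caller supplies a function that posts one commit to the
// server and rejects when the server is unreachable.
type SendCommit = (commit: Commit) => Promise<void>;

class OfflineCommitQueue {
  private pending: Commit[] = [];
  private send: SendCommit;

  constructor(send: SendCommit) {
    this.send = send;
  }

  // Keep the commit locally until it has been delivered.
  enqueue(commit: Commit): void {
    this.pending.push(commit);
  }

  get size(): number {
    return this.pending.length;
  }

  // Send pending commits in their original order. Stop at the first failure
  // so nothing is dropped or reordered. Returns how many were delivered.
  async flush(): Promise<number> {
    let sent = 0;
    while (this.pending.length > 0) {
      const next = this.pending[0];
      if (next === undefined) break;
      try {
        await this.send(next);
      } catch {
        break; // still offline: keep the commit for the next attempt
      }
      this.pending.shift();
      sent += 1;
    }
    return sent;
  }
}
```

A client would call `flush()` whenever the connection comes back, for instance from a WebSocket `open` handler.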

Other options:

Use Yjs for commits

  • Full CRDT! No need to do any of the hard work.
  • Yjs is bulky / heavy, way more complicated than Atomic Commits. There isn't even a Java implementation - wild, considering how popular it is.
  • Making edits requires Yjs state (which contains history if garbage collection is turned off), so instead of sending plain JSON objects to clients (who want to view the data) it might be smarter to send (minimal) Yjs state vectors, so they can edit.
  • Maybe we allow commits to be ephemeral (don't persist) because persisting all vectors is kinda mad if you use Yjs. The history is already inside the Yjs doc!
  • The server persists a Yjs doc per resource.
  • How do we do indexing? Currently, we use "add atoms" and "delete atoms". We could calculate these by comparing a before and after of a Yjs vector.
  • Do we want to allow updates using State Vectors? This means: can clients say "my state is X" and can the server create a tailored response "here's your update Y"? Would mean adding a new endpoint.
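
For the indexing question, the before/after comparison could look something like the sketch below. It assumes both Yjs states have already been decoded into plain property maps, which is the part not shown here. `PropVals`, `Atom` and `diffAtoms` are illustrative names, not the server's actual types.

```typescript
// Assumption: a resource's decoded state is a flat map of property to value.
type PropVals = Record<string, string>;

interface Atom {
  subject: string;
  property: string;
  value: string;
}

// Compare two decoded states and derive the atoms the index should
// remove and add. A changed value shows up as one removal plus one addition.
function diffAtoms(
  subject: string,
  before: PropVals,
  after: PropVals,
): { add: Atom[]; remove: Atom[] } {
  const add: Atom[] = [];
  const remove: Atom[] = [];
  for (const [property, value] of Object.entries(before)) {
    if (after[property] !== value) remove.push({ subject, property, value });
  }
  for (const [property, value] of Object.entries(after)) {
    if (before[property] !== value) add.push({ subject, property, value });
  }
  return { add, remove };
}
```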

I think there are 3 options for integrating Yjs into Atomic.

Option 1:
Go full CRDT by replacing Atomic Commits with Yjs state
Described in Joep's comment above.

Questions: What do we send to the client? The client needs Yjs state to make edits, but in most cases (the CMS use case) it only wants to read, so having to include Yjs just to decode the data seems like overkill.

Option 2:
Add a Yjs datatype so resources can have Yjs documents as prop values. We would then add a new property to commits alongside set, push, delete etc called something like change or update that contains a Yjs update bin.
The propval would contain the full document state. We might also have to create a type of 'ephemeral commits' for data that is intended to change rapidly and doesn't really add value to the resource itself (things like user cursor position or online status). This would mean that replaying each commit would result in a different binary from the current state (although still with the same content, if the client uses these commits responsibly).

This would be a lot less work to implement, but it would not make it possible to edit every type of property conflict-free.

Option 3:
Add Yjs as a datatype or maybe class-extender and persist the state/updates separately from the rest of the data (like how files are handled). This is more in line with how Yjs is intended to be used (at least that's the idea I get when reading the readme).
This would mean that a resource would reference some identifier that points to a Yjs doc stored somewhere else (it can still be in sled, just in a different tree). There would have to be an endpoint that handles syncing to the doc.
With this approach, no history about the value of the doc would be stored in commits, only in the document's state.

Option 4:
Change our commit logic to be a CRDT (and use option 2 with Yjs specifically for text documents)
Atomic Commits are already pretty close to being CRDTs. Currently they are not, because applying them isn't deterministic: the server just applies commits in the order it receives them. There is a timestamp component, but that doesn't really guarantee anything. However, we could use the lastCommit property together with some deterministic algorithm to resolve conflicts. For example, if two commits reference the same lastCommit, we could determine their order based on their hash.
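
As a sketch of what such a deterministic algorithm could look like: group commits by their lastCommit, sort siblings by hash, and walk the resulting tree depth-first. `CommitNode` and `orderCommits` are made-up names, and the depth-first choice is just one possible rule, not something that has been decided here.

```typescript
// Hypothetical minimal view of a commit: its own hash and its parent's hash.
interface CommitNode {
  hash: string;
  lastCommit?: string; // undefined for the first commit of a resource
}

// Produce the same linear order no matter in which order commits arrived.
// Siblings (same lastCommit) are ordered by hash; each branch is walked
// depth-first. Commits whose lastCommit is unknown are left out.
function orderCommits(commits: CommitNode[]): CommitNode[] {
  const byParent = new Map<string, CommitNode[]>();
  for (const commit of commits) {
    const parent = commit.lastCommit ?? "";
    const siblings = byParent.get(parent) ?? [];
    siblings.push(commit);
    byParent.set(parent, siblings);
  }

  const ordered: CommitNode[] = [];
  const visit = (parent: string): void => {
    const siblings = [...(byParent.get(parent) ?? [])].sort((a, b) =>
      a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0,
    );
    for (const commit of siblings) {
      ordered.push(commit);
      visit(commit.hash);
    }
  };
  visit(""); // start at commits that have no lastCommit
  return ordered;
}
```

Two servers that receive the same set of commits in a different order would end up applying them in the same sequence, which is the property the current receive-order approach lacks.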

I'm not sure if this is true, but I think you don't even need to replay every commit in order to apply an out-of-order commit. Atomic Commits only allow set, delete and push. Push can be played back without issue. Set and delete can't, but that doesn't matter: when everything gets played back, the set will eventually occur, completely overriding anything that happened before it (a bit like multiplying by 0 at the end of a long calculation).