aaronc/freactive

Question: Using freactive with Datascript / Datomic

Opened this issue · 6 comments

I think freactive has pretty well removed the insanity inherent to the task of providing coherent, coordinated views on ever-changing data, thanks to your work. Now, though, instead of using a simple hash-map wrapped in a reactive atom as my datasource, I want to use Datomic or possibly Datascript (an in-browser ClojureScript database which seeks to model/reproduce Datomic). Being nowhere near as intimately familiar with the internals of freactive as you, of course, I was wondering: what, in your opinion, would be the best way to go about using freactive with these kinds of more complex datasources?

I came across this gist (specifically lines 136-200) which uses a cursor over a reactive atom wrapped around a Datascript database, which is a hash-map anyway. The difference is that instead of using keywords or the like as cursor-fns, it uses a query function. How would freactive listen to the changes in the database which the query fn would intend to "subscribe" to?

Even so, the question of Datascript seems a relatively easy one compared to that of Datomic. After all, a Datascript database is really just a hash-map with schemas and Datalog-like queriablity (is that a word?) built in (and other things, but we'll ignore that). The entire Datascript database is in-memory and seems conceptually quite similar to the use cases in which I've used freactive (i.e., using a ClojureScript hash-map as the global application state). However, Datomic is an abstraction over an actual database, in this case remote with respect to the client, with latency and the potential for connection interruptions, for which reactive watch notifications ("subscriptions") seem to be more conceptually difficult to implement. (However, not being entirely familiar with the internals of freactive, I can't with total certainty assert that this is the case.) Is the solution to instead cache the relevant parts of the database in a global-application-state hash-map on the client and have a push-queue with updates from the client to the database and a pull-queue with updates from the database to the client?

Bear with me, as many of these concepts are fairly new to me and I may not be expressing them very well.

I appreciate your time. Thanks!

Hey there, so that gist looks a little dated so I'm not sure I would go off of that. The Cursor updateCursor method is what you can use to notify a cursor of external changes: https://github.com/aaronc/freactive.core/blob/master/src/clojure/freactive/core.cljc#L261. It's not really documented yet (and is also a WIP) so you might have to muddle through the source code a bit to figure it out. But it should theoretically allow freactive's cursors to be backed by external data. Once I think this API is stable, I'll document it so others can do things like integrate with Datascript and Datomic.

Regarding how to handle changes from Datomic - I think it's complex. If you have a read-only view of some Datomic entity or query, that could be fairly simple if you just listen to datomic's tx report queue. But I think trying to write changes back to Datomic involves too many variables to have a non-trivial solution - coordination of conflicting changes, validation, etc. You could do it with freactive's cursor, but you'd still need to address those data management concerns. Datascript limited to a single client I think would be much simpler.

I've implemented something similar in intent in my application with a completely different stack. The aim is similar though, if I've understood you correctly. I might just re-state that here in my own words so as to make sure we're on the same conceptual page.

At the top level, you want to have as easy as possible an API for data synchronising in your front end clojurescript code. You effectively want set-and-forget semantics around your database updating. Specifically, you want to have an atom-like entity that you can create cursor-like entities on, that maintains a sync state with a backend of some description. David Nolen had something called om-sync which was an experiment in doing just this, so that might be worthwhile investigating. I have to admit that it was a little too magic for my liking, precisely because of the issues of validation and editing data: my app needed a layer or two: that is, I wanted "edit mode" to not make destructive edits on my database until a user has hit save.

However... the complexity increases when you start to need to store your "edit mode" data somewhere. Where do you store it? I wanted the feature that if a user begins to edit, then moves to another part of the app, their editing state is still in play, so, really, the editing data had to be part of the "big hashmap in an atom" style of local data store, but when I thought deeply about it, it's actually part of the UI elements (it's their temporary data).

So, what I ended up with was something where there is one-way data flow through my application and I have two atoms: one for application/UI state (this thing is open, that one is in editing mode, here is the editing data the UI element currently holds, this menu is selected, and that other section of the app is the one the user is in now), and another atom to hold temporary and semi-temporary model state (that is, the data of the application's model as opposed to its UI state). Things in the model state are marked as dirty when they change, and new when they're not yet in the backend, and I use core.async channels everywhere to create new items (create in the in-memory database), save items (both create/update: persist to backend), delete items (remove from backend), and revoke items (remove them from my in-memory database). I chose to go down this slightly more classical route because if you are doing it the other way, you have to ensure the user is 100% aware of the magic the system is doing, and the UX actually feels like the user has to worry about things more when they go wrong. Also, you have to worry about all kinds of things: what if they go offline? What if their connection goes down mid-save? What if your net connection is slow right now? What retry policies do you have? As Aaron says, what happens when the data is not valid, or something like that? The choices I made allow me a little more flexibility and allow the user more feeling of control, and speed, especially when things "go wrong". Also, things become incredibly inefficient if you're doing live-syncing (depending on how far away your store is), so you will have to build in some kind of debouncing queue, a sort of batch-updater, which I did for my loads. I use a core.async debouncing channel that manages a timeout against identical requests so far fewer go through.
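
To make the two-atom layout a bit more concrete, here is a minimal sketch using only clojure.core. It is not the code from my app: every name in it (`ui-state`, `model-state`, `begin-edit!`, `commit-edit!` and so on) is hypothetical, and it leaves out the core.async channels and the backend calls entirely. It only shows edit drafts living in the UI atom and the dirty/new flags living in the model atom.

```clojure
;; Hypothetical sketch; all names and map keys are illustrative assumptions.
(def ui-state (atom {:editing {}}))   ; UI state: open panels, edit drafts, selection
(def model-state (atom {:items {}}))  ; model state: local copies of backend entities

(defn create-item!
  "Adds an item that exists only locally; it is :new? and :dirty? until persisted."
  [id data]
  (swap! model-state assoc-in [:items id] {:data data :new? true :dirty? true}))

(defn begin-edit!
  "Copies the item's data into the UI atom as a draft; the model is untouched."
  [id]
  (swap! ui-state assoc-in [:editing id] (get-in @model-state [:items id :data])))

(defn edit-draft!
  "Applies changes to the draft only."
  [id changes]
  (swap! ui-state update-in [:editing id] merge changes))

(defn commit-edit!
  "The user hit save: move the draft into the model and mark the item dirty."
  [id]
  (let [draft (get-in @ui-state [:editing id])]
    (swap! model-state update-in [:items id] assoc :data draft :dirty? true)
    (swap! ui-state update :editing dissoc id)))

(defn pending-saves
  "Returns the set of ids that still need to be persisted to the backend."
  []
  (set (for [[id item] (:items @model-state) :when (:dirty? item)] id)))

(defn mark-saved!
  "Called once the backend has confirmed the save."
  [id]
  (swap! model-state update-in [:items id] assoc :new? false :dirty? false))
```

A save loop would then persist whatever `(pending-saves)` returns and call `mark-saved!` on success, which is the point where retry and offline policies would plug in.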

I have it hooked up so that when a component mounts, it triggers a retrieve on the store of its data (via that same core.async channel). A retrieve is some code that checks whether the item is in the store and, if not, invokes a load (each object in the store has metadata marking when it was last loaded, and I have a standard timeout to freshen data). This part of the app is a little silly and could be done better; if I were doing it again from scratch, I would probably just have a single feed of what's new since a datetime, and not worry about all the retrieves.
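
Roughly, the retrieve-or-load check looks like the sketch below. This is a simplified, synchronous illustration rather than my actual code: `store`, `max-age-ms`, `retrieve!` and the `load-fn` argument are all hypothetical, the 60-second window is an arbitrary assumption, and the real thing goes through a core.async channel instead of calling the loader directly.

```clojure
;; Hypothetical sketch; names and the freshness window are assumptions.
(def store (atom {}))       ; id -> {:data ... :loaded-at <millis>}
(def max-age-ms 60000)      ; assumed "standard timeout" for freshening data

(defn fresh?
  "True when entry exists and was loaded less than max-age-ms before now."
  [entry now]
  (boolean (and entry (< (- now (:loaded-at entry)) max-age-ms))))

(defn retrieve!
  "Returns the data for id, calling load-fn only when the store holds no fresh
  copy. load-fn and now are passed in so the sketch does not depend on a real
  backend or clock."
  [id load-fn now]
  (let [entry (get @store id)]
    (if (fresh? entry now)
      (:data entry)
      (let [data (load-fn id)]
        (swap! store assoc id {:data data :loaded-at now})
        data))))
```

A component would call something like `(retrieve! id load-item (System/currentTimeMillis))` when it mounts, where `load-item` stands in for whatever actually talks to the backend.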

This issue is very interesting to me because I do plan to re-write my app again armed with this knowledge and using either datomic, or more than likely amazon's dynamodb or something like it. I should mention, my backend is currently on an sql-backed antiquated technology, so it's very slow at building the JSON that my transit-json API uses. Unfortunately, the association&table-based layout of the DB has informed a lot of the things I need to take care of, and throttled my development flexibility over time, so you've definitely got the right idea taking on datomic.

Whether you put a server between you and datomic, or not, this might be of help: https://github.com/zachallaun/datomic-cljs it's alpha-software but it would save you having to re-write your own datomic client in cljs. It's not exactly a client, though, it's just a datomic-API wrapper in cljs around the datomic REST API. This is an important distinction, because if you want to make a syncing local data store, there's a certain amount of lag involved for users if you don't have some kind of local store. That's the reason I chose to do two layers with updating. I can do things like load the data in small chunks over time, not blocking the UI, and have the app more immediately usable (so people can navigate to where they want to go as data is still loading, for example).

One of the biggest challenges in writing an app in anger in the browser is user experience: the single-threaded nature of the browser. We want to make sure the data layer doesn't impede the experience of the user, either by doing so much work that the browser locks up for a time, leaving the user waiting on a non-responsive program (very bad), or by not loading data fast enough, so that the user is waiting around (slightly bad). Users will generally wait a small amount of time for an initial load, because the page has to load anyway.

Thanks so much for the replies and sorry for the delay.

@aaronc :
I will definitely look at the updateCursor method and experiment with it - that looks like it could be very useful. Thanks for the suggestion.

@JulianLeviston :
You described pretty accurately what I'm looking to do. I had never heard of om-sync — I'll take a look at that for sure and see if I can't tweak it to do what I want to do. Your thoughts and musings are very interesting to me — I've thought about many of the same things and I appreciate a second opinion on them from someone who's clearly had a good deal of experience with it.

As for putting a server between the (JS) client and Datomic, I've discovered that that's apparently necessary, even given the fact that a REST API exists for Datomic. Apparently a native REST client (really a server) needs to stand between the web client and Datomic. Given that fact, I'm just going to stick with sending EDN data to the server and having it pass it off to Datomic, and vice versa. I'm not yet sure what to do with binary data / blobs and I haven't thought about it too much, but to the extent that I have, I was thinking of having the client 1) request an authentication key and a URL from the server in order to upload them, 2) upload them, and then 3) send a transaction to Datomic via the server affirming that the upload is complete. But then the server would have to manage mismatches between Datomic's version of which blobs exist and which ones actually do exist, which adds complexity. So I'm not sure about that yet. Part of me foolishly wishes that Datomic handled blobs, but if that were the case, that would mean that the client would send a blob (possibly massive) to the server, and the server would send that to Datomic, which is highly inefficient. Again, I'm still new to all of this so I probably don't have any idea what I'm talking about.

Thanks for the great discussion so far.

A lot of these ideas and questions about architecture are things I've experimented with in building the v2 of my software - I have a working version that separates out the "backing store" into an identical cljx file that sits in the client and the server. This is very handy, and allows you to build reusable code. The semantics are directly translatable between client and server and you get to use the same intention-descriptive code.

Anyway, of current interest may be this video by the creator of Om, David Nolen. He talks about the architectural problems we have when we build this sort of software - it's very useful and worthwhile. https://www.youtube.com/watch?v=ByNs9TG30E8&

The ideas are that you basically want to be able to describe intent efficiently in a composable way toward the backend and REST doesn't quite fit this requirement. I also recommend looking into transit rather than direct standard EDN over a JSON transport: it's more efficient, and just better.

Interesting discussion.

@alexandergunnarson modern apps can use WebSockets for transport (see sente). For committing stuff to Datomic you'll always have to go through your peer to the transactor anyway - no way to get around that really. If you're committing large blobs, Datomic may not be the place to store them anyway.

In general I find myself preferring the more "classical" approach @JulianLeviston is using. The approach I am using in my project is somewhat similar. I agree that too much magic can lead to undesirable UX and I think it can actually limit the programmer's expressivity. For a generic framework to be suitable for more complex apps it would need to provide some pretty robust ways to deal with app-specific permissions, schema and consistency rules. For now, I'm rolling my own - based on Datomic, but with some fairly custom schema/permission/consistency management and realtime updates only where it really serves our app's purpose.

Regarding freactive's role in all this, one of my primary goals for the project was to separate state management from rendering. So, freactive.core's cursor, atom, rx can all be used independently of freactive.dom and could be used with another rendering engine. I do actually have a working React renderer for freactive.core's datatypes (mainly prompted to support ReactNative). Also, I wrote fx-clj based on freactive.core before I even considered freactive.dom. freactive.dom, freactive.react and fx-clj in turn do not depend exclusively on freactive.core - javelin could be used for instance. freactive.core's root-cursor could also be used to wrap other atom implementations - like https://github.com/alandipert/storage-atom. I hope that freactive's Cursor will be a useful starting point for Datomic/Datascript integration. Regarding om-next, I think the graph query idea is interesting. I think this is something that could be supported using the approach I'm describing by creating "query atoms" that are renderer-agnostic. I'm not sure graph query will supplant other approaches entirely as I've already hit limitations with Datomic's pull and I do also have concerns regarding how it will integrate with things like permissions, schema, etc. But, we've yet to see what dnolen's full framework looks like, so we'll see. Anyway, I think my main contribution on this is that we can implement all of these state management ideas in a way that is renderer agnostic using Clojure's reference-type protocols (IDeref and IWatchable) and possibly the extensions I've made in freactive.core (register-dep, IReactive, ICursor, etc.). Then one person can write an atom/cursor/etc. for database X and someone else can write a renderer backend for platform Y and we know they'll work together. Also when we're trying to implement this DB/client syncing, let's not neglect the complexity of app-specific rules over the desire to have some real-time updates your users may not even care about.
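
As a rough illustration of the "query atom" idea, here is a minimal sketch that uses only clojure.core atoms and `add-watch`. It is not freactive's Cursor or its updateCursor API, and `query-atom`, `db` and `active-names` are hypothetical names. It assumes the data source is itself an atom holding an immutable value, which is the Datascript-style case; a remote Datomic source would need something else feeding it.

```clojure
;; Hypothetical sketch; not part of freactive.
(defn query-atom
  "Returns an atom holding (query-fn @source), recomputed whenever source
  changes. The watch is never removed here, so a real version would also need
  a way to dispose of it."
  [source query-fn]
  (let [result (atom (query-fn @source))]
    (add-watch source (gensym "query-atom")
               (fn [_ _ _ new-db]
                 (let [new-result (query-fn new-db)]
                   ;; reset! only on an actual change, so watchers of the
                   ;; result are not notified when the query output is the same
                   (when (not= new-result @result)
                     (reset! result new-result)))))
    result))

;; Usage: any renderer that understands deref and watches can consume the result.
(def db (atom {:users {1 {:name "Ann" :active? true}
                       2 {:name "Bob" :active? false}}}))

(def active-names
  (query-atom db (fn [d]
                   (set (for [[_ u] (:users d) :when (:active? u)]
                          (:name u))))))

(def renders (atom 0))
(add-watch active-names :renderer (fn [_ _ _ _] (swap! renders inc)))
```

Because the result is an ordinary reference type, the renderer side only needs deref and watch support and knows nothing about where the data came from. The obvious cost is that this naive version re-runs the query on every change to the source.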

It's good to hear that you feel a nicer way to express concise data-loading from the backend is possible.

The main takeaway from the Om Next stuff for me is that the description of what a component needs should be simply described in data rather than being opaquely built across many functions in code, because (at least in Clojure) the fully lazy nature of data is such that it is more easily composed, and transformed. (We're not yet expressing all of our computation and data in thunks, so code isn't as easily transformed yet). It's a little ironic, because Clojure seems to be slowly turning into data-described AST Haskell.

The beauty of this composability of data for queries is that as the query gets closer to the backend, it can be transformed and annotated in various ways to enable strategies such as throttled data-loading, prioritisation of parts of the data loading, caching, and more intelligent compact expression of the exact data that is required from the backing store. (e.g., in my main app I have a filtered collection I load where I can't express the filtering criteria to the backend through my current API, consequently I have to load the entire massive collection into the frontend, THEN filter, which is obviously non-ideal, wrong and bad).
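
A small sketch of what I mean by transforming a query that is plain data. The map shape and the keys (`:entity`, `:fields`, `:where`, `:priority`, `:limit`, `:offset`) are made up for illustration; this is not Om Next's query syntax or Datomic's pull syntax.

```clojure
;; Hypothetical query format; all keys are assumptions for illustration.
(def component-query
  {:entity :item
   :fields [:id :title]
   :where  {:status :open}})   ; the filter travels with the query as data

(defn with-priority
  "Annotates a query so a loader could schedule it ahead of others."
  [query priority]
  (assoc query :priority priority))

(defn with-page
  "Annotates a query so it can be loaded in small chunks."
  [query limit offset]
  (assoc query :limit limit :offset offset))

(defn merge-fields
  "Combines the field lists of several queries against the same entity into
  one request, keeping first-seen order and dropping duplicates."
  [queries]
  (assoc (first queries) :fields (vec (distinct (mapcat :fields queries)))))

(def request
  (-> component-query
      (with-priority :high)
      (with-page 50 0)))
```

Each layer between the component and the backend can add its own annotations like this without knowing how the query was built, and because the filter is part of the data, the backend gets the chance to apply it instead of shipping the whole collection.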

As I go along this path, it feels like the true nature of code should be expressed as constrained data (call it typed if you will) when we're developing it and working with it, and reified as strongly typed, purely functional, lazily evaluated code when it is compiled, live and running.