hack-pad/hackpad

caching and etags semantics and perhaps wider reuse patterns

gedw99 opened this issue · 8 comments

Hey @JohnStarich

Hope all is well. I am really fascinated by your project and its wider implications.

I am not sure if this issue/question should be here or in the FS repo, but here goes...

One thing I am working on is the relationship between retained-mode GUIs and data caches. A retained-mode GUI can be designed conceptually like the data cache systems we all know and love. A GUI has a template that is evaluated to produce some markup. In a retained-mode GUI system you want to know whether the template has changed. That would be a signal at the control-plane level (if you get my drift), so you can do optimistic re-rendering of that GUI component instance, which is a bit like eager caching. Hope this makes sense :)

Some background might also help. I know this is pretty out there...
The DOM in a browser does the retained-mode diffing for you. It knows when you change the HTML in the inspector and magically updates the screen for you. But when you are working at the WASM level and rendering with WASM to a WebGL canvas, you need to do this yourself.

So now I get to my question. The FS implementation is super cool because it can work in a WASM, local, or remote context, but I was wondering whether it supports ETag-style caching or other metadata semantics that tell the caller it has a cache miss and needs to refetch because "your local data is old, buddy!"

Go's net/http file serving supports ETags (http.ServeContent honors an ETag header set by the handler): https://go.googlesource.com/go/+/go1.8.3/src/net/http/fs.go; see line 119. So it can easily act as a cache in front of a DB that is called over HTTP, for example.

The reason I bring this up is that you also want to do the same thing inside the WASM environment for many use cases.
Let's say that in the WASM environment you have an FS (IndexedDB under the hood) that is sourced from S3. When the S3 source changes, I presume there is some sort of ETag header to tell you. So when there is a call to the WASM FS, it should realize that the local FS is now "dirty" or "invalidated", automatically get the data from S3, update the local file, and return that to the caller.

So I am wondering if this is currently supported.
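The client-side half of the question could look roughly like this: a read-through cache that keeps the last body and ETag per key and sends `If-None-Match` on every read, so a `304` means the local copy is still valid and a `200` replaces it. This is a hedged sketch, not an existing HackpadFS feature; `FetchFunc`, `RevalidatingCache` and `HTTPFetch` are hypothetical. In the IndexedDB/S3 scenario the map would be replaced by the local FS and the base URL would point at the bucket.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// FetchFunc performs a conditional fetch of key. If etag still matches the
// origin's current version it reports notModified=true and returns no body.
// Hypothetical seam so the cache can sit on HTTP, S3, or a fake.
type FetchFunc func(key, etag string) (body []byte, newETag string, notModified bool, err error)

type entry struct {
	body []byte
	etag string
}

// RevalidatingCache asks the origin "has this changed since ETag X?" on every read.
type RevalidatingCache struct {
	mu      sync.Mutex
	fetch   FetchFunc
	entries map[string]entry
}

func NewRevalidatingCache(fetch FetchFunc) *RevalidatingCache {
	return &RevalidatingCache{fetch: fetch, entries: make(map[string]entry)}
}

func (c *RevalidatingCache) Get(key string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cached, ok := c.entries[key]
	body, etag, notModified, err := c.fetch(key, cached.etag) // empty etag on first read
	if err != nil {
		return nil, err
	}
	if notModified {
		if !ok {
			return nil, fmt.Errorf("origin reported %q unchanged but nothing is cached", key)
		}
		return cached.body, nil // local copy is still fresh
	}
	c.entries[key] = entry{body: body, etag: etag} // invalidated: replace local copy
	return body, nil
}

// HTTPFetch builds a FetchFunc that issues GET baseURL+key with If-None-Match.
func HTTPFetch(client *http.Client, baseURL string) FetchFunc {
	return func(key, etag string) ([]byte, string, bool, error) {
		req, err := http.NewRequest(http.MethodGet, baseURL+key, nil)
		if err != nil {
			return nil, "", false, err
		}
		if etag != "" {
			req.Header.Set("If-None-Match", etag)
		}
		resp, err := client.Do(req)
		if err != nil {
			return nil, "", false, err
		}
		defer resp.Body.Close()
		switch resp.StatusCode {
		case http.StatusNotModified:
			return nil, etag, true, nil
		case http.StatusOK:
			body, err := io.ReadAll(resp.Body)
			if err != nil {
				return nil, "", false, err
			}
			return body, resp.Header.Get("ETag"), false, nil
		default:
			return nil, "", false, fmt.Errorf("GET %s: unexpected status %s", key, resp.Status)
		}
	}
}
```

Holding the lock across the fetch keeps the sketch simple but serializes all reads; a real cache would lock per key.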

It will be really interesting to see how this also relates to the Web Worker and Service Worker work too.

It will be really interesting to build quasi-CDN architectures where you can have caches at the WASM, local, and remote levels, CDN style. And they can also cross-mix. Markup templates depend on data, which can depend on the cache, and it's "turtles all the way down"... I like the simplicity of this.

You also get into an interesting situation with design time versus runtime.
I use the word "real time" because the IDE is running in a browser and compiling as you change code. In non-real time, you are working off already compiled Go.
At real-time design time, you want your Go imports to always be the latest, and when someone else PRs some code that you import, you want to know NOW, see it, and adapt. You want to get broken by them...
At real-time runtime, you want to use a static version and not break.

The same goes for data. You have static data (JSON, binaries, etc.) and you have dynamic data.
At real-time design time, you want everything to be dynamic: don't use any caching anywhere.
At real-time runtime, you want everything static to never update, and everything dynamic to use the cache mechanism.
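One way to pin these rules down is as a small decision table. The sketch is illustrative only: the type names are invented, and mapping the result onto `Cache-Control` values (`no-store`, `no-cache`, `immutable`) is an assumption about how the policy might be expressed to HTTP caches.

```go
package main

// Phase and DataKind are hypothetical names for the two axes described
// above: when you are working (design time vs runtime) and what kind of
// data it is (static vs dynamic).
type Phase int

const (
	DesignTime Phase = iota
	Runtime
)

type DataKind int

const (
	Static DataKind = iota
	Dynamic
)

// CachePolicy is what a cache layer should do for a given read.
type CachePolicy int

const (
	NoStore    CachePolicy = iota // never keep a copy; always hit the origin
	Revalidate                    // keep a copy, but confirm it (e.g. via ETag) before use
	Immutable                     // keep a copy forever; it never changes
)

// PolicyFor encodes the rules: design time disables caching entirely;
// at runtime, static data is immutable and dynamic data is revalidated.
func PolicyFor(p Phase, k DataKind) CachePolicy {
	switch {
	case p == DesignTime:
		return NoStore
	case k == Static:
		return Immutable
	default:
		return Revalidate
	}
}

// CacheControl maps a policy onto standard HTTP Cache-Control directives,
// so the same decision can be expressed to HTTP caches such as browsers and CDNs.
func (c CachePolicy) CacheControl() string {
	switch c {
	case Immutable:
		return "public, max-age=31536000, immutable"
	case Revalidate:
		return "no-cache" // may store, but must revalidate before reuse
	default:
		return "no-store"
	}
}
```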

https://github.com/ariga/entcache is a good example of some of these semantics applied to Ent and databases.

It has multi-level caching too: https://github.com/ariga/entcache/blob/master/internal/examples/multilevel/main.go

  • the DB is the bottom level
  • Redis is the next level
  • the local cache is the level above Redis.
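The layering can be sketched generically: check the nearest level first, fall through to farther ones, and on a hit copy the value back into the nearer levels. This is not entcache's API; `Level`, `MemLevel` and `MultiLevel` are hypothetical, and the map-backed `MemLevel` stands in for both the local cache and Redis.

```go
package main

import "sync"

// Level is one tier of a multi-level cache (e.g. in-process memory, Redis).
type Level interface {
	Get(key string) ([]byte, bool)
	Set(key string, val []byte)
}

// MemLevel is a trivial map-backed Level standing in for any tier.
type MemLevel struct {
	mu sync.Mutex
	m  map[string][]byte
}

func NewMemLevel() *MemLevel { return &MemLevel{m: make(map[string][]byte)} }

func (l *MemLevel) Get(key string) ([]byte, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	v, ok := l.m[key]
	return v, ok
}

func (l *MemLevel) Set(key string, val []byte) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.m[key] = val
}

// MultiLevel checks Levels from nearest (index 0) to farthest, then falls
// back to Load (the database, i.e. the bottom level).
type MultiLevel struct {
	Levels []Level
	Load   func(key string) ([]byte, error)
}

func (c *MultiLevel) Get(key string) ([]byte, error) {
	for i, lvl := range c.Levels {
		if v, ok := lvl.Get(key); ok {
			c.fill(i, key, v) // backfill the levels nearer than the hit
			return v, nil
		}
	}
	v, err := c.Load(key)
	if err != nil {
		return nil, err
	}
	c.fill(len(c.Levels), key, v) // full miss: populate every level
	return v, nil
}

// fill writes val into levels [0, upto).
func (c *MultiLevel) fill(upto int, key string, val []byte) {
	for _, lvl := range c.Levels[:upto] {
		lvl.Set(key, val)
	}
}
```

The sketch only covers reads; invalidating every level when the underlying data changes is left out.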

Hey @gedw99, welcome back! 👋 Hope you're doing well.

I am not sure if this issue/question should be here or in the FS repo, but here goes...

Anywhere is good with me. Worst case, we can move the issue around 👍

One thing I am working on is the relationship between retained-mode GUIs and data caches.

Interesting, so in other words you're looking at triggering state changes in a GUI based on events from some backend server? e.g. subscribe to new issues on GitHub and auto-refresh your caches from those events.

This idea might be possible in an FS. I think it might be in its own vein too, so like FS + event source in one object. Say, subscribe to a git repo's push events and auto-detect that updates are available for merging into the local FS. Or take that a step further and perform safe merges automatically in real time.

Let's say that in the WASM environment you have an FS (IndexedDB under the hood) that is sourced from S3. When the S3 source changes, I presume there is some sort of ETag header to tell you.
...
So I am wondering if this is currently supported.

I'm not super experienced with S3's APIs. For example, it turns out there's an older SOAP one and a newer REST one, which examples/s3 switches between automatically. Good news, though: it appears Amazon S3 supports events, so this could totally be built if you've got the time.

It's not "supported" per se, but I see some potential in the idea and it seems possible. I think the eventing idea is orthogonal to file systems, so there's definitely plenty of room to make something that uses both!

If you have ideas for how they'd integrate into Hackpad or how to expand HackpadFS, I'm all ears. HackpadFS in particular is open for extension in custom projects since it's based primarily around interfaces; it also happens to include some simple built-in file systems. If "eventing FSs" are a powerful enough idea, maybe we could push the boundary there.

It will be really interesting to build quasi-CDN architectures where you can have caches at the WASM, local, and remote levels, CDN style.

You've actually hit on an interesting modern trend: Wasm running at the edge. Cloudflare Workers can run code written in languages that compile to Wasm, for lots of powerful stuff. Caching at the edge (like a CDN) is a big draw for businesses. Moving data closer to the people who use it is a big deal, especially with large files.

... and it's "turtles all the way down"... I like the simplicity of this.

Yes! Me too. It's a big reason why I wrote HackpadFS. The abstraction is pretty darn powerful and can be nested infinitely. If you're curious, I wrote an article on the FS side of Hackpad: https://blog.johnstarich.com/write-once-store-anywhere-extensible-file-systems-for-go-65c7c0949e74

... doing compilation as you change code.

That'd be amazing for Hackpad. We definitely need better compile performance too for this to be viable. I'm hoping the Service Worker changes will bring compile times down enough for things like this. 🙏 (If only I could get it to stop crashing the runtime every time I tweak it! Wasm is rough sometimes...)

At real-time design time, you want your Go imports to always be the latest, and when someone else PRs some code that you import, you want to know NOW, see it, and adapt.

That's an interesting idea. I am curious how a system like this would behave in a high-churn project. Maybe it could prompt before updating, sort of like the earlier git push event idea?

Sounds like a radical new way to write code, so might need further thought. Could be fun to play with those ideas a bit, or see if anyone's tried something like it before.

One idea I've bounced around is trying to make Hackpad work decently offline. In other words, give a great UX both online and offline with the Go Mod cache and "installed" Go version.

Hey @JohnStarich

Yep, you understand the concept.

BTW, Minio has events via webhooks, NATS, or other transports. All the info is here: https://docs.min.io/docs/minio-bucket-notification-guide.html

I actually see two patterns:

  1. Not event-based, so essentially polling: use ETags and cache misses. If the ETag changes, the system requests the data for you. HTTP range requests can also fetch only the changed data.
  2. Event-based, so essentially push: basically Minio notifications. I think you only get told that the bucket changed.

Pattern 2 can be used to notify the clients and then perhaps kick off pattern 1. Best of both worlds: only the changed parts of a file are copied downstream (toward the edges of the network) using ranges. Binary diffs for binary content like WASM could be added to HackpadFS :)


Here are two concrete examples of using this concept for real-world stuff I am playing with:

Golang can be compiled to WASM and WASI, like spin does: https://github.com/fermyon/spin

  • Hackpad can be the front-end IDE.
  • We essentially get Go plugins or microservices, but based on WASM instead.
  • These functions are pipelined into each other via a high-level DSL. Benthos is one example of managing this, and it also runs in the browser OR on a server. Benthos has lots of goodies too.
  • This works well with Hackpad, which is fast enough for compiling small functions but not large Go programs :). You would need tinygo as a "CAAS / Compiler As a Service" on the server, as there is no way that tinygo will run inside a browser.
  • WASM for the client and WASI for the server, with Hackpad giving you the IDE for all of it.

Application GUI, built from small file parts and data parts using https://github.com/ajstarks/deck and https://github.com/ajstarks/decksh

  • deck is the layout DOM
  • decksh is the mini language.
  • Runs in Browser, Desktop, Mobile, PDF and SVG.
  • Deck is static, like a dashboard. It's not meant to be a full GUI.

The file system is the important part, because you get incremental updates flowing from the origin server to the edges (both read-only servers) and to the read/write GUI, and you get updating dashboards from Deck.

For Spin, you want file system events so that when a new WASM module is compiled, it propagates to the edges.
For Deck, when a .dsh file changes, you propagate it to the edges and the clients, and the client runtime reruns the .dsh pipeline.

Pattern 2 can be used to notify the clients and then perhaps kick off pattern 1. Best of both worlds: only the changed parts of a file are copied downstream (toward the edges of the network) using ranges. Binary diffs for binary content like WASM could be added to HackpadFS :)

That'd be pretty sweet. Do you know of any prior art, i.e. any other projects trying out eventing or perhaps using S3 notifications?

I've used https://github.com/fsnotify/fsnotify before. It's pretty good for native platforms, though I don't think it is pluggable for new platforms like Wasm.

Golang can be compiled to WASM and WASI, like spin does: https://github.com/fermyon/spin

Thanks for the link, spin looks like a cool project. I'm not sure I totally understand the full picture yet, but I like the way you think 😄 Sounds like a hybrid fat client / fat server model, where processing can be split or shared.

You would need tinygo as a "CAAS / Compiler As a Service" on the server as there is no way that tinygo will run inside a browser.

Oh, do we know for sure tinygo doesn't work? We've still got your issue #8 open, and I think there's still promise.

Application GUI, built from small file parts and data parts using https://github.com/ajstarks/deck and https://github.com/ajstarks/decksh

Interesting, I suppose you're thinking deck could be used to regenerate on source file updates. Pretty cool idea. Maybe that could be an offshoot as its own separate project for making presentations in the browser, like single-user Google Slides but all client-side.

Great ideas. Sounds like we have a case to be made for things that listen to the events.

So the next question might be, can the event listener interface be standardized? Hackpad itself could use ideas before standardization, but HackpadFS might need to nail it down first. Minio looks like a decent implementation of eventing and fswatch for a native one, but they are very different from one another.

We might need to understand this space more thoroughly to define shared interfaces. We got really lucky the Go maintainers decided to start us off with familiar interfaces like opening and reading files on an io/fs.FS. "Event pubsub" in Go might have some standards already?

That'd be pretty sweet. Do you know of any prior art, i.e. any other projects trying out eventing or perhaps using S3 notifications?

I have used NATS with Minio before, and it works great for getting notified when a file in Minio changes.

Oh, do we know for sure tinygo doesn't work? We've still got your issue open #8 and I think there's promise still.

There are efforts toward this here: https://github.com/prep/wasmexec

Application GUI, built from small file parts and data parts using https://github.com/ajstarks/deck and https://github.com/ajstarks/decksh

Interesting, I suppose you're thinking deck could be used to regenerate on source file updates. Pretty cool idea. Maybe that could be an offshoot as its own separate project for making presentations in the browser, like single-user Google Slides but all client-side.

Slides client-side, so people can own their data, with your FS and a DB underneath on the client. So you can run OFFLINE!!
The Deck runtime can easily run inside WASM if we use your FS lib. The bummer is that your FS interface does not match the standard library, though, so @ajstarks and I would have to add it.

Then we just need a coordination server that users can run on Google Cloud Run, so it's there but costs nothing when you're not using it. The idea is for it to be just FILES, because Deck is entirely file-driven. This is where the file diff concept would work well.
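A file diff of this kind can be sketched as fixed-size block hashing: the side holding the old copy sends block hashes, the side with the new copy returns only the blocks whose hash differs, and the receiver patches its copy. This is a simplified, illustrative variant; rsync-style tools add a rolling checksum so that an insertion, which shifts later bytes, does not invalidate every following block. `BlockHashes`, `Diff`, `Apply` and `Patch` are hypothetical names.

```go
package main

import "crypto/sha256"

// blockEnd clamps the end of the block starting at start to n.
func blockEnd(start, size, n int) int {
	if start+size > n {
		return n
	}
	return start + size
}

// BlockHashes splits data into fixed-size blocks and hashes each one. The
// receiver sends only these hashes upstream, not the old bytes.
func BlockHashes(data []byte, size int) [][32]byte {
	var hs [][32]byte
	for start := 0; start < len(data); start += size {
		hs = append(hs, sha256.Sum256(data[start:blockEnd(start, size, len(data))]))
	}
	return hs
}

// Patch carries only the blocks that differ, plus the new total length.
type Patch struct {
	BlockSize int
	Length    int
	Blocks    map[int][]byte // block index -> new block contents
}

// Diff compares the receiver's old block hashes with the new data.
func Diff(oldHashes [][32]byte, newData []byte, size int) Patch {
	p := Patch{BlockSize: size, Length: len(newData), Blocks: map[int][]byte{}}
	for i, start := 0, 0; start < len(newData); i, start = i+1, start+size {
		blk := newData[start:blockEnd(start, size, len(newData))]
		if i >= len(oldHashes) || oldHashes[i] != sha256.Sum256(blk) {
			p.Blocks[i] = blk // new or changed block
		}
	}
	return p
}

// Apply rebuilds the new file from the old bytes plus the patch.
func Apply(old []byte, p Patch) []byte {
	out := make([]byte, p.Length)
	copy(out, old) // unchanged blocks keep their old bytes (copy stops at the shorter length)
	for i, b := range p.Blocks {
		copy(out[i*p.BlockSize:], b)
	}
	return out
}
```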

Deck has a server btw here: https://github.com/ajstarks/deck/blob/master/cmd/deckd/deckd.go

I also think Deck is totally amazing in how it works under the hood. Decksh has its own little language that is really clever, and I think there are huge possibilities. I think @ajstarks is also working on a nice little editor: ajstarks/deck#10. Happy days: you can then work from your desktop, mobile, or the web, from anywhere.

So the next question might be, can the event listener interface be standardized? Hackpad itself could use ideas before standardization, but HackpadFS might need to nail it down first. Minio looks like a decent implementation of eventing and fswatch for a native one, but they are very different from one another.

I agree with your perspective: an agnostic API for funcs and events (one that works in WASM on the client or the server) makes all this come together.

We might need to understand this space more thoroughly to define shared interfaces. We got really lucky the Go maintainers decided to start us off with familiar interfaces like opening and reading files on an io/fs.FS. "Event pubsub" in Go might have some standards already?

Yeah, that's the right question, I think. We might get lucky and find one, but I suspect we need to invent it.

Because a normal person wants to save money, we can mount a network file system (e.g. Google Cloud Storage) within Cloud Run.
So if they are NOT using Deck, they only pay for storage, not compute.

https://cloud.google.com/run/docs/using-network-file-systems. No idea what POSIX features are exposed in terms of file change events and content diffs / binary diffs. I bet it's expensive too.

The other way is with your S3 driver. Run deckd in Cloud Run, and when it needs a file or needs to react to a file change, it just works. Minio, and I assume other S3 systems, manage locking and concurrency, so it should work.
Again, we would need to add your FS driver to Deck.

Hope I am not going around in circles repeating myself... It's a long thread.

I think Deck is a great use case to center around, because it's a file-based system, and the code is really clean too.

I suspect we should write up some sort of design doc and also do a video chat with @ajstarks?

Also, I think this project might have what we are looking for:

https://github.com/stv0g/gose/blob/master/pkg/notifier/notifier.go

The developer has done a good job of working out all the differences between the existing S3 servers.

This aspect is where it really shines too: stv0g/gose#19 (comment)

Hope I am not going around in circles repeating myself... It's a long thread.

You're good 😄 I like the brainstorming.

Maybe the best path forward here is folks making new libraries that provide FSs like an "S3-with-events" FS? HackpadFS defines interfaces and contains an S3 example sub-module, but I think a community-built FS could try things like this more easily, i.e., without strict compatibility or stability restrictions.

You're welcome to kick it off by copying the S3 example to a new open source repo. 👍 Seems like you could iterate through several variants of event listening to see which ones work best.

Once a few FSs with event listening pop up out there, we can definitely look at any emerging standards to include here in HackpadFS too. I think that gives us the best chance of making a good choice.

OK, I will do that. Thanks!

I am going to close this. I will open a new issue when things pop up.