[RFC] Assets hashing

Question

Opened this issue 3 years ago · 4 comments

As I said on Discord, I wanted to write a ppx for static asset hashing and cache busting.
The idea is like this:

let css_url = [%assets "style.css" ]

In normal mode: return the string as is.
In release mode: copy style.css to style.content_hash.css and return style.content_hash.css.
The ppx would raise an error if the file does not exist.
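To make the two modes concrete, here is a minimal sketch of the naming rule as plain functions. The names hashed_name and asset_url are hypothetical, and the standard library's Digest (MD5) is only a stand-in for whatever hash ends up being chosen.

```ocaml
(* Hypothetical helper: turn "style.css" into "style.<hash>.css".
   Digest is MD5 from the OCaml standard library, used here only as a
   placeholder for the final hashing algorithm. *)
let hashed_name ~name ~contents =
  let hash = Digest.to_hex (Digest.string contents) in
  let stem = Filename.remove_extension name in
  let ext = Filename.extension name in
  Printf.sprintf "%s.%s%s" stem hash ext

(* Normal mode returns the name unchanged; release mode returns the
   content-hashed name. Copying the file is left out of this sketch. *)
let asset_url ~release ~name ~contents =
  if release then hashed_name ~name ~contents else name
```

Because the hash depends only on the contents, an unchanged file keeps the same URL across releases, and an edited file gets a new one.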

However, after careful consideration, I decided that it should be a runtime library rather than a build-time one.
Here are some reasons I can think of:

Backend and frontend builds are decoupled.
-> Parallel builds
-> Re-deploy frontend code without compiling the backend again
It's really annoying during development when I have a watcher regenerating assets and the build fails because a file doesn't exist yet.
It allows us to easily use a CDN, or switch CDNs, at runtime.

What I am proposing is a configuration file in JSON format like this, where prefix can be a CDN URL or the absolute path where you mount your static assets:

{
  "prefix": "/",
  "assets": {
    "style.css": "style.content_hash.css",
    "script.css": "script.content_hash.css"
  }
}

How this is loaded is up to the implementor: it could be read from a file during application startup, or it could be received via a POST request.
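A rough sketch of what the runtime lookup could look like once the configuration is loaded. The manifest type, manifest_of_list and resolve are hypothetical names, JSON parsing is left out, and falling back to the original name for unknown assets is my assumption rather than part of the proposal.

```ocaml
module String_map = Map.Make (String)

(* Hypothetical in-memory form of the configuration file. *)
type manifest = {
  prefix : string;               (* CDN URL or mount path *)
  assets : string String_map.t;  (* original name -> hashed name *)
}

(* Build a manifest from (original, hashed) pairs. *)
let manifest_of_list ~prefix pairs =
  { prefix; assets = String_map.of_seq (List.to_seq pairs) }

(* Resolve an asset name to its URL. Assumption: names missing from
   the manifest are served under their original name. *)
let resolve manifest name =
  let file =
    match String_map.find_opt name manifest.assets with
    | Some hashed -> hashed
    | None -> name
  in
  manifest.prefix ^ file
```

Switching CDNs at runtime would then only require building a new manifest value with a different prefix.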

TODO:

Write the runtime script
Figure out which hashing algorithm to use for content hashing (look at what webpack/rollup use). Prefer something written in pure OCaml for easy packaging
Write the asset hashing executable
Publish
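For the hashing executable, the core step could be sketched roughly as below. read_file, write_file and hash_asset are hypothetical names, and Digest (MD5, from the standard library) is again only a placeholder until the hashing algorithm is decided.

```ocaml
(* Read a whole file as a string, in binary mode. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Write a string to a file, in binary mode. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr oc)
    (fun () -> output_string oc contents)

(* Copy dir/name to dir/stem.<hash>.ext and return the
   (original name, hashed name) pair for the assets table. *)
let hash_asset ~dir name =
  let contents = read_file (Filename.concat dir name) in
  let hash = Digest.to_hex (Digest.string contents) in
  let hashed =
    Filename.remove_extension name ^ "." ^ hash ^ Filename.extension name
  in
  write_file (Filename.concat dir hashed) contents;
  (name, hashed)
```

Mapping hash_asset over every file in the assets directory would give the pairs that go into the "assets" object of the configuration file.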

What do you think?

Answer 1 · 2021-07-05T15:28:52.000Z

On the other hand, with a build-time approach, one could inline the content of small images directly into a string, like webpack/rollup do.
Or one could write a ppx that does different things based on its arguments. But I don’t know how to write ppxes that are more complex than replacing a string with a string 😂

Answer 2 · 2021-07-05T15:37:18.000Z

On the topic of hashing, rollup uses

xxhash (32/64): https://github.com/314eter/ocaml-xxhash
metrohash (no ocaml bindings that I can find)

Update:

Dream uses mirage-digestif, but that is a library for cryptographic hashes. I would prefer a non-cryptographic hash for a task like this.

Answer 3 · 2021-07-05T20:09:30.000Z

It all looks feasible. It looks largely independent from Dream itself. We would basically just call out from templates (or anywhere else) to some library to figure out the real name of an asset, right? I guess that library would need to know how to read the output of the hasher, and serve as the "runtime library" linked into server deployments.

Answer 4 · 2021-07-05T20:42:12.000Z

@aantron yes. It should work independently of Dream, but I am only interested in building it for Dream right now. So I wanted to open a discussion here and am looking for comments and feedback on the design.