ar-io/ar-io-node

feat(middleware): path manifest fallback to `*`

Opened this issue · 5 comments

BACKGROUND: https://specs.g8way.io/#/view/lXLd0OPwo-dJLB_Amz5jgIeDhiOkjXuM3-r0H_aiNj0

Modern JS frameworks rely on push state: when the requested path is not found, the server returns the index file instead. This lets the framework handle routing and hyperlinks in a web-native way, with no # hash routing required.

If gateways handled a not-found request for any route under a path manifest TX id by returning the data referenced by the manifest's '*' entry, we could support many modern JS frameworks out of the box with zero workarounds.

This issue is a request to build such a feature in middleware so that it does not impact the development of the gateway and can be installed by any gateway operator.

Here is an example:

{
  "manifest": "arweave/paths",
  "version": "0.1.0",
  "index": {
    "path": "index.html"
  },
  "paths": {
    "index.html": {
      "id": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI"
    },
    "js/style.css": {
      "id": "fZ4d7bkCAUiXSfo3zFsPiQvpLVKVtXUKB6kiLNt2XVQ"
    },
    "css/style.css": {
      "id": "fZ4d7bkCAUiXSfo3zFsPiQvpLVKVtXUKB6kiLNt2XVQ"
    },
    "css/mobile.css": {
      "id": "fZ4d7bkCAUiXSfo3zFsPiQvpLVKVtXUKB6kiLNt2XVQ"
    },
    "assets/img/logo.png": {
      "id": "QYWh-QsozsYu2wor0ZygI5Zoa_fRYFc8_X1RkYmw_fU"
    },
    "assets/img/icon.png": {
      "id": "0543SMRGYuGKTaqLzmpOyK4AxAB96Fra2guHzYxjRGo"
    },
    "*": {
      "id": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI"
    }
  }
}
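To make the requested behaviour concrete, here is a minimal TypeScript sketch of the lookup, assuming the manifest has already been parsed into an object. The type names and the `ownEntry` and `resolvePath` helpers are hypothetical and are not part of ar-io-node; this only illustrates the exact-match-then-`*` order.

```typescript
// Hypothetical minimal types for the manifest shape shown above.
interface ManifestEntry {
  id: string;
}

interface PathManifest {
  manifest: string;
  version: string;
  index?: { path: string };
  paths: Record<string, ManifestEntry>;
}

// Look up an own property only, so a request for "constructor" or "toString"
// cannot match something inherited from Object.prototype.
function ownEntry(
  paths: Record<string, ManifestEntry>,
  key: string
): ManifestEntry | undefined {
  return Object.prototype.hasOwnProperty.call(paths, key) ? paths[key] : undefined;
}

// Resolve a request path to a transaction id, falling back to "*".
// Returns undefined when there is no exact match and no "*" entry.
function resolvePath(m: PathManifest, requestPath: string): string | undefined {
  const key = requestPath.replace(/^\/+/, "");
  // An empty path means the manifest root, which is served from "index".
  const target = key === "" && m.index !== undefined ? m.index.path : key;
  const entry = ownEntry(m.paths, target) ?? ownEntry(m.paths, "*");
  return entry?.id;
}
```

With the example manifest, a request for `/css/style.css` would resolve to the stylesheet id, while an unknown route such as `/users/42` would resolve to the `*` id, so the framework's index file is served.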

@twilson63 Having the wildcard in the path list means (absent mandatory sorting) you have to parse the entire list before you can be certain whether it's there. In the interest of moving towards more efficient manifest parsing, we should consider extending the top level index object to specify the wild card ID rather than putting it in the path list. It would also be great to lock down the expected order of top level keys (paths should be last) and add the ability to specify an ID as the index rather than a path.

Below is one option. Though, as I type it, I think I might lean towards adding a top level key instead of overloading "index".

"index": {
  "id": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI",
  "*": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI"
},

A new top level key might look like this (note: the ids are only coincidentally the same):

"index": {
  "id": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI"
},
"wildcard": {
  "id": "cG7Hdi_iTQPoEYgQJFqJ8NMpN4KoZ-vH_j7pG4iP7NI"
}

I also think we should take this as an opportunity to specify both that paths are assumed to be sorted when parsing manifests on-demand (as opposed to when indexing or serving from an index) and that there is a size limit for on-demand manifest parsing.

I think I would go with * as the top level key too (same as original, but top level). wildcard just feels a little unwieldy.
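A rough TypeScript sketch of how resolution could look under that proposal, assuming `index` and a top-level `*` each carry an `id`. The `ProposedManifest` type and `resolveProposed` function are hypothetical names for illustration; nothing here is in the current spec.

```typescript
// Hypothetical shape for the proposed manifest: "index" and "*" are
// top-level keys that reference transaction ids directly.
interface IdRef {
  id: string;
}

interface ProposedManifest {
  manifest: string;
  version: string;
  index?: IdRef;
  "*"?: IdRef;
  paths: Record<string, IdRef>;
}

// Resolve a request path against the proposed shape.
// The wildcard id is known without scanning "paths" for a "*" entry.
function resolveProposed(m: ProposedManifest, requestPath: string): string | undefined {
  const fallback = m["*"]?.id;
  const key = requestPath.replace(/^\/+/, "");
  if (key === "") {
    // Manifest root: serve the index id, else the wildcard if present.
    return m.index?.id ?? fallback;
  }
  // Own-property check so inherited names such as "toString" never match.
  const hit = Object.prototype.hasOwnProperty.call(m.paths, key)
    ? m.paths[key]
    : undefined;
  return hit?.id ?? fallback;
}
```

Because the fallback id sits outside `paths`, a parser that reads top-level keys in order can answer "is there a wildcard?" before it reaches the path list.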

Summarizing where we are:

The following spec adjustments are desirable to support efficient parsing:

  1. Specify manifest, version, index, and * should occur before paths.
  2. Allow index and * to be specified using an id instead of a path.
  3. Specify that paths should be sorted using UTF-8 binary collation.
  4. Specify that for on-demand parsing (vs background indexing) manifests must satisfy 1 and 3 and use IDs instead of paths for index and *.
  5. Specify that manifests greater than 256KB may not be parsed on-demand.
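The checks in the list above could be sketched roughly as follows in TypeScript. Several things here are assumptions: "256KB" is read as 256 * 1024 bytes, the `ManifestOutline` type is a hypothetical summary produced by whatever parser reads the manifest, and keys are passed as arrays in document order because JavaScript objects reorder integer-like keys such as "404". The comparison relies on the fact that UTF-8 byte order matches Unicode code point order for well-formed strings, which is not the same as JavaScript's default UTF-16 string ordering.

```typescript
// Assumption: "256KB" means 256 * 1024 bytes; the thread does not say which.
const MAX_ON_DEMAND_BYTES = 256 * 1024;

// Compare two well-formed strings in UTF-8 binary order.
// Code point order is equivalent to UTF-8 byte order.
function compareUtf8(a: string, b: string): number {
  const ca = Array.from(a, (ch) => ch.codePointAt(0) as number);
  const cb = Array.from(b, (ch) => ch.codePointAt(0) as number);
  const n = Math.min(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    const x = ca[i] as number;
    const y = cb[i] as number;
    if (x !== y) return x < y ? -1 : 1;
  }
  // A string that is a prefix of the other sorts first.
  return ca.length - cb.length;
}

// True when keys are strictly increasing, so duplicates are rejected too.
function isSortedUtf8(keys: string[]): boolean {
  for (let i = 1; i < keys.length; i++) {
    if (compareUtf8(keys[i - 1] as string, keys[i] as string) >= 0) return false;
  }
  return true;
}

// Hypothetical summary of a manifest, with keys in document order.
interface ManifestOutline {
  byteLength: number;
  topLevelKeys: string[];
  pathKeys: string[];
  indexAndWildcardUseIds: boolean;
}

// Items 1, 3, 4, and 5 from the list, applied to an outline.
function canParseOnDemand(o: ManifestOutline): boolean {
  const pathsLast = o.topLevelKeys[o.topLevelKeys.length - 1] === "paths";
  return (
    o.byteLength <= MAX_ON_DEMAND_BYTES &&
    pathsLast &&
    o.indexAndWildcardUseIds &&
    isSortedUtf8(o.pathKeys)
  );
}
```

With sorted paths, an on-demand parser could stop reading as soon as it passes the position where the requested path would appear, which is the efficiency the ordering rules are meant to allow.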

Work on this feature could begin on a fork prior to spec updates provided it assumes the above adjustments will be made to the spec.