deislabs/hippo-cli

HIPPOFACTS and dependencies

technosophos opened this issue · 7 comments

I figured I would open an issue to brainstorm about how we might express dependencies between a local hippo app and an upstream Wasm module.

Use Case

Imagine I have a simple app. The HIPPOFACTS file looks like this:

# Fact: The airspeed velocity of an unladen hippo is zero
[bindle]
name = "myapp"
version = "1.0.0"
description = "Does neat stuff"

[[handler]]
route = "/"
name = "myapp.wasm"

I would like to add the ability to serve static files from my app (at the path /static/...). And rather than write that code, I would like to use an existing fileserver. The HIPPOFACTS file for that project looks like this:

# Fact: Tawaret was the ancient Egyptian hippo goddess
[bindle]
name = "fileserver"
version = "0.2.0"
description = "Provides static file serving for Wagi"

[[handler]]
route = "/static/..."
name = "fileserver.gr.wasm"
files = ["README.md", "LICENSE.txt"]

(While the actual artifact we care about is the invoice.toml, the HIPPOFACTS above gives us all the information we could reasonably expect a user to know about a Bindle.)

So how might I, as a hippofactory user, express my desire to use the fileserver inside of my own app?

Option 1: Out-of-band Handling

It is perfectly reasonable to say that the solution to this is that the user figures out how to get a copy of fileserver.gr.wasm on their own, download it locally, and include it directly:

# Fact: The airspeed velocity of an unladen hippo is zero
[bindle]
name = "myapp"
version = "1.0.0"
description = "Does neat stuff"

[[handler]]
route = "/"
name = "myapp.wasm"

[[handler]]
route = "/static/..."
name = "fileserver.gr.wasm"
files = ["index.html", "style.css"]

In this case, the user merely adds the downloaded Wasm module to their HIPPOFACTS, and the user takes on all of the responsibilities of managing that module.

Option 2: Add Dependencies in HIPPOFACTS

In this option, hippofactory is extended to declare additional dependencies more like Cargo.toml or package.json. Because bindles are immutable, we can punt here on the entire topic of lockfiles and such and focus just on the DevEx for now.

In this case, we allow a user to declare, in HIPPOFACTS, that they intend to use parcels located in an existing bindle. One possible syntax for this is:

# Fact: The airspeed velocity of an unladen hippo is zero
[bindle]
name = "myapp"
version = "1.0.0"
description = "Does neat stuff"

[[handler]]
route = "/"
name = "myapp.wasm"

[[dependency]]
[dependency.bindle]
name = "fileserver/0.2.0"   # Or whatever the actual bindle name is
[dependency.handler]
route = "/static/..."

While the exact structure of the [[dependency]] object is certainly a wide open area for conversation, the example above illustrates two features that I think are necessary:

  • It needs an unambiguous way to address the bindle and its parcels
  • It needs an unambiguous way to bind one or more parcels to a handler clause.

Let's treat each one separately:

Addressing a Bindle and Parcels

A bindle is composed of one or more parcels organized into groups. When pulling a bindle into an app, the user may have to make some decisions about how that bindle is to be pulled in.

For example, the bindle for our fileserver application has just one Wasm parcel. But a bindle could have several Wasm parcels, each doing a different thing. Bindle's design philosophy makes it possible for one parcel to declare dependencies on other parcels. And it also makes it possible to switch parcels on or off based on group membership or features.

So a key ability when importing a bindle is to be able to specify which parcels you want. And the traditional means of doing so are through specifying groups and features.

  • dependency
    • bindle: Object. The top-level description of a bindle
      • name: String (REQUIRED). The full name of a bindle, e.g. example.com/foo/1.2.3-alpha.99
      • groups: Array. Zero or more group names. Any group listed here is included in full (all parcels) unless a feature flag turns off the parcel.
      • features: Map<String, String>. Feature name and feature value to enable: (feature.wagi.file, true)
      • parcel: String. The SHA of an exact parcel to pull (or we could use the name, which is probably better). Pinning a single parcel may not be a good idea. If this is specified, groups and features are ignored.
      • excludeGlobalGroup: boolean. If set to true, the global group will not be imported from the bindle.

Given a dependency.bindle definition, the runtime should be able to determine what bindle to load, and which parcels to fetch for that bindle.
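To make the selection rules above a bit more concrete, here is a rough Python sketch of how a dependency.bindle table might pick parcels. Everything in it is hypothetical: the select_parcels name, the flattened dict shape for parcels, and in particular the rule for how a feature "turns off" a parcel, which the proposal leaves open. It also assumes that a parcel with no memberOf belongs to the global group.

```python
def select_parcels(parcels, dep):
    """Return the parcels selected by a dependency.bindle table (hypothetical rules)."""
    # An exact parcel wins outright: groups and features are ignored.
    if dep.get("parcel"):
        return [p for p in parcels if p["name"] == dep["parcel"]]

    wanted_groups = set(dep.get("groups", []))
    features = dep.get("features", {})
    selected = []
    for p in parcels:
        member_of = p.get("memberOf", [])
        in_global = not member_of  # assumption: no memberOf means the global group
        if in_global and dep.get("excludeGlobalGroup", False):
            continue
        if not in_global and not wanted_groups.intersection(member_of):
            continue
        # Assumption: a requested feature switches a parcel off only when the
        # parcel declares the same feature with a different value.
        declared = p.get("features", {})
        if any(k in declared and declared[k] != v for k, v in features.items()):
            continue
        selected.append(p)
    return selected


# Illustrative data only; these parcels are made up for the example.
parcels = [
    {"name": "fileserver.gr.wasm"},
    {"name": "README.md", "memberOf": ["docs"]},
    {"name": "debug.wasm", "memberOf": ["extras"], "features": {"wagi.debug": "true"}},
]
defaults = select_parcels(parcels, {"name": "fileserver/0.2.0"})
```

With no groups or features given, only the global-group parcel is selected, which matches the "use its defaults" reading of a bare dependency.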

Binding parcels to features

The previous definition gave us a bindle and associated parcels, but it provided no instructions on how those parcels are to be included in the application. My suggestion is that we include a handler definition in a dependency, and that this definition matches the handler definition for a local object:

[[dependency]]
[dependency.bindle]
name = "fileserver/0.2.0"  # Pull the fileserver bindle and use its defaults (global group, no special features)
[dependency.handler]
name = "fileserver.gr.wasm"  # name of the parcel
route = "/resources/..."    # Override the `route` feature on the `fileserver.gr.wasm` parcel

The fields are the same as those on an existing HIPPOFACTS file, but the following clarifications apply:

  • name: This refers to the parcel name within the Bindle. As a design constraint on this system, parcel names should be unique.
  • Features: When a feature is specified (e.g. route = or host = ), it will override the feature on the imported module. We might need a reserved way of unsetting a value. (e.g. route = "-" effectively sets route to its empty value)
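As a sketch of the override behaviour described in these two points, the following Python merges a dependency.handler table over the upstream handler's settings. The apply_overrides helper is hypothetical, "-" is only the reserved value floated above rather than a settled convention, and treating "unset" as "remove the key" is an assumption.

```python
UNSET = "-"  # reserved "unset" marker suggested above; not a settled convention


def apply_overrides(upstream, overrides):
    """Merge dependency.handler overrides onto the upstream handler settings."""
    merged = dict(upstream)
    for key, value in overrides.items():
        if key == "name":
            continue  # name picks the parcel; it is not a feature to override
        if value == UNSET:
            merged.pop(key, None)  # assumption: unsetting removes the feature
        else:
            merged[key] = value
    return merged


# Illustrative values only; the host feature is made up for the example.
upstream = {"name": "fileserver.gr.wasm", "route": "/static/...", "host": "example.com"}
merged = apply_overrides(upstream, {"name": "fileserver.gr.wasm", "route": "/resources/..."})
```

The upstream table is copied rather than mutated, so the same imported handler can be bound more than once with different overrides.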

An open question: File parcels

Right now files are attached to a handler using the files array:

[[handler]]
route = "/static/..."
name = "fileserver.gr.wasm"
files = ["index.html", "style.css"]

When pulling in a bindle and its parcels, it is desirable that we pull in the file parcels attached to it.

But, as the present fileserver case illustrates, it may also be desirable to supply files from the local project to be loaded into the external parcel. E.g. if I load a fileserver parcel, it is very likely that I will want to tell that external parcel which of my local files I want it to serve.

It seems there are two sets of goals, then:

  • I want to manage which file parcels I load from upstream, with default being "all of them"
  • I want to manage which local files I want to attach to the upstream parcel, with the default being "none of them"

Here are some example use-cases:

Upstream Parcel | Local Files | Desired Outcome
index.html      | index.html  | local index.html
-               | my.js       | local my.js
style.css       | -           | upstream style.css
README.md       | -           | I don't want the upstream, but I don't want to override

The last case illustrates an intent to "unset" a file that appears in the upstream parcel without replacing it with a local file. E.g. I just don't want a README.md at all.

Because a handler can easily have hundreds of files attached, it does not seem like manually building a list would be a good approach.

Possible solutions:

  1. Default to local files only, and require explicit inclusions for the parcel: files = ['local.txt', 'parcel:README.md'].
  2. Default to parcel files, but whenever files is specified, use only local files (e.g. all parcels or all locals, no mixing)
  3. Local files are additive. The parcel files are all turned on by default, and any files = [] appends to the list. When duplicates occur, the local overrides the parcel file.
  4. Provide a parcelFiles directive in addition to files: files = ['mylocal.txt'] and parcelFiles = ['README.md'], with the result being the union of the two (with naming conflicts favoring local)
  5. Provide an omitParcelFiles directive that removes parcel files, and default to all parcel files. Then use the same strategy as #4 to resolve
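To see how solution 5 plays out against the use-case table above, here is a minimal Python sketch. The resolve_files function and its argument names are made up for illustration; it assumes parcel files default to "all of them", that omitParcelFiles removes upstream entries, and that a local file wins any name conflict.

```python
def resolve_files(parcel_files, local_files, omit_parcel_files=()):
    """Map each served file name to its source: 'upstream' or 'local'."""
    omitted = set(omit_parcel_files)
    # Start from every upstream file that has not been explicitly omitted.
    resolved = {name: "upstream" for name in parcel_files if name not in omitted}
    # Local files are additive and override upstream files of the same name.
    for name in local_files:
        resolved[name] = "local"
    return resolved


# The four rows of the table: override, local-only, upstream-only, unset.
outcome = resolve_files(
    parcel_files=["index.html", "style.css", "README.md"],
    local_files=["index.html", "my.js"],
    omit_parcel_files=["README.md"],
)
```

Under these assumptions the user only lists the exceptions (two local files and one omission), which is less burdensome than building the full list by hand, though it does not address the "new upstream file collides with a local file" concern.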

The issue I have is that we don't want this process to be burdensome to the user. All of these feel either burdensome or too limiting.

I've been noodling on this at the Hippofactory level and I'm concerned that we are going to end up duplicating a lot of stuff from the Bindle logic.

The base case, where we want to reference a Wasm module that exists as a parcel in another bindle, but is not present locally, is relatively simple at the technical level. But it feels awkward that the Hippo user needs to know the module structure of the bindle from which they are using a module - what if, in a fit of enthusiasm, you port the static file server from Grain to OCaml, and the serving module changes from fileserver.gr.wasm to file_server.wasm? Is that the business of the poor sod who just wants to serve a bit of CSS?

And when we start getting into the "what if that module requires other parcels" - well, now we have to parse out that dependency tree and copy all those entries into the application bindle... well, this feels like the client is repeatedly duplicating information that should come in by reference. And as you identify, what about conflicts? Because I shouldn't have to know how the static file server or whatever works, I don't know what files it might need today or in future. In option 3, what if a new version requires a new parcel file that is overwritten by a local file? In option 1, what if a new version requires a new parcel file that I'm not aware of?

So I'd like to approach this from the point of view of abstraction. How can I bring in a service implemented in another bindle without knowing the internal implementation details of that service? Can we have something in deislabs.io/fileserver/1.0.0 - say, a well-known annotation - that says "hey, WAGI, when I am plonked on a route, this is the parcel I want you to wire up"? And some way for its own requires files to be kept separate from any files the consuming app wants to associate with it?

This could be a client concern. It might also be worth exploring whether this is something that WAGI itself should understand, so that WAGI is able to provide this kind of composition independently of Hippo or HF - I'm not sure if that's within WAGI's remit. But anyway I would like to consider whether we can provide mechanisms for abstraction so that the user is freed from having to know handler implementation.

> The base case, where we want to reference a Wasm module that exists as a parcel in another bindle, but is not present locally, is relatively simple at the technical level. But it feels awkward that the Hippo user needs to know the module structure of the bindle from which they are using a module - what if, in a fit of enthusiasm, you port the static file server from Grain to OCaml, and the serving module changes from fileserver.gr.wasm to file_server.wasm? Is that the business of the poor sod who just wants to serve a bit of CSS?

In your example, yes, by renaming fileserver.gr.wasm to file_server.wasm, I would be introducing a breaking change that the user would have to respond to. But I don't feel like this is any different than any other breaking change in this giant Tower of Babel that is software development.

I don't think we would be requiring knowledge of the bindle, though. Knowledge of the upstream HIPPOFACTS would be sufficient. And on an optimistic note:

  1. We can hope that package authors will provide instructions in their README (the rationale behind #21)
  2. We can write tooling to inspect a bindle or Hippofacts to provide guidance.

> And when we start getting into the "what if that module requires other parcels" - well, now we have to parse out that dependency tree and copy all those entries into the application bindle... well, this feels like the client is repeatedly duplicating information that should come in by reference. And as you identify, what about conflicts? Because I shouldn't have to know how the static file server or whatever works, I don't know what files it might need today or in future. In option 3, what if a new version requires a new parcel file that is overwritten by a local file? In option 1, what if a new version requires a new parcel file that I'm not aware of?

Yes, behind the scenes there will be some dependency resolution. Fortunately, it's a "flat" resolution because each bindle contains a definitive list of parcels that it needs. (That is, we don't have to walk an n-deep tree of bindles). I'm not entirely clear if that is your concern, or if you are just worried about n-deep parcel/group trees. But, yes, by design bindles will have internal structure that will need to be respected when using them in HIPPOFACTS. This doesn't differ from any other package management system, though.

Conflicts represent only a limited surface area of a bindle. We do have to handle conflicts in the case where any given bindle potentially conflicts with a local file, but only in that one case listed above where the user has to indicate which file they want to use in the files list.

> So I'd like to approach this from the point of view of abstraction. How can I bring in a service implemented in another bindle without knowing the internal implementation details of that service? Can we have something in deislabs.io/fileserver/1.0.0 - say, a well-known annotation - that says "hey, WAGI, when I am plonked on a route, this is the parcel I want you to wire up"? And some way for its own requires files to be kept separate from any files the consuming app wants to associate with it?

We could absolutely use the annotations field on a bindle to provide information about a default configuration. That's a great idea. Then a trivial dependency would just be something like:

[[dependency]]
bindle.name = "deislabs.io/fileserver/1.0.0"

And the rest would be automatically resolved. I'm not sure I see how your proposal would work with the "OR-case" I presented above, where I want some files from the upstream bindle to be made available, but want local overrides of others. But that, I think, could be worked out while still allowing a bindle to declare its default configuration.

That said, I do think the original spec gives us desirable flexibility for allowing (a) a bindle dev to specify multiple ways of using their bindle, and (b) a bindle user a high degree of configurability. I think it makes sense to retain those features, while making the simple case really simple.

> This could be a client concern. It might also be worth exploring whether this is something that WAGI itself should understand, so that WAGI is able to provide this kind of composition independently of Hippo or HF - I'm not sure if that's within WAGI's remit. But anyway I would like to consider whether we can provide mechanisms for abstraction so that the user is freed from having to know handler implementation.

I am not opposed to Wagi having the ability to do more. BUT, the artifact of record should be a bindle, and a bindle does not have any notion of external dependencies because its dependencies are computed at packaging-time (which makes them actually immutable instead of "immutable by way of a pin file that still has external references that can break when left-pad suddenly disappears from the Interwebs").

The system I originally proposed was designed to render dependencies at hippofactory time, and then produce a definitive bindle that had exactly the parcels necessary for execution. This does leave us wiggle room (via features and groups) to toggle things on and off at runtime. So perhaps what you are proposing could be done that way. And I'm certainly happy to define new feature/group behaviors in Wagi.

In a related discussion, it was asked whether we could support identifying a specific file parcel by bindle + name instead of bindle + sha.

The bindle + sha format is: BINDLE_NAME/VERSION@PARCEL_SHA. We probably could support something like BINDLE_NAME/VERSION#PARCEL_NAME (though the spec does not enforce that the parcel name is unique to the bindle).
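A small Python sketch of how both reference forms could be parsed and resolved follows. The ParcelRef type and both helper names are hypothetical, and the "#" form is only the possibility raised above. Since the spec does not enforce unique parcel names, the lookup here treats anything other than exactly one match as an error, which is an assumption about how ambiguity would be handled.

```python
from typing import NamedTuple, Optional


class ParcelRef(NamedTuple):
    bindle: str                 # BINDLE_NAME/VERSION
    sha: Optional[str] = None   # set for the '@' form
    name: Optional[str] = None  # set for the proposed '#' form


def parse_parcel_ref(ref):
    """Split BINDLE_NAME/VERSION@PARCEL_SHA or BINDLE_NAME/VERSION#PARCEL_NAME."""
    if "@" in ref:
        bindle, sha = ref.split("@", 1)
        return ParcelRef(bindle, sha=sha)
    if "#" in ref:
        bindle, name = ref.split("#", 1)
        return ParcelRef(bindle, name=name)
    return ParcelRef(ref)


def find_parcel(parcels, ref):
    """Return the single parcel addressed by ref, or raise if none or several match."""
    if ref.sha is not None:
        matches = [p for p in parcels if p["sha256"] == ref.sha]
    else:
        matches = [p for p in parcels if p["name"] == ref.name]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} parcels match {ref!r}; expected exactly one")
    return matches[0]


by_sha = parse_parcel_ref("fileserver/0.2.0@57A71C")
by_name = parse_parcel_ref("fileserver/0.2.0#fileserver.gr.wasm")
```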

Another idea, thinking out loud, let's suppose we introduce a parcel annotation or feature, say wagi.handler, whose value is a string that forms the contract independently of the parcel name. For example:

[[parcel]]
label.name = "fileserver.gr.wasm"
label.sha256 = "57A71C"
label.annotations.wagi.handler = "static"

Then a HIPPOFACTS file can specify that with something like:

# HIPPO FACT: There are only three species of hippo, and one of those is made up
[[handler]]
route = "/styles"
external.bindle_id = "deislabs/fileserver/1.1.0"
external.handler = "static"
files = ["styles/*.css"]

When Hippofactory sees this, it searches the invoice for a parcel with a wagi.handler annotation whose value corresponds to the given handler name, then translates the spec into something like:

[[parcel]]
# Obtained from the deislabs/fileserver/1.1.0 invoice
label.name = "fileserver.gr.wasm"
label.sha256 = "57A71C"
# Derived from HIPPOFACTS
label.feature.wagi.route = "/styles"
conditions.requires = ["some_suitably_unique_group_name"]

[[parcel]]
label.name = "hotpink.css"
conditions.memberOf = ["some_suitably_unique_group_name"]

I am not sure if this adds anything useful over a single bindle-level wagi.defaultHandler annotation. Just throwing it out there.
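For what it's worth, the lookup-and-translate step for the wagi.handler idea could be sketched in Python roughly like this. The function names and the flattened dict shapes are hypothetical stand-ins for the invoice and HIPPOFACTS structures, and the sketch assumes the files globs have already been expanded into a list of local file names.

```python
def find_handler_parcel(invoice_parcels, handler):
    """Find the one upstream parcel whose wagi.handler annotation matches."""
    matches = [
        p for p in invoice_parcels
        if p.get("annotations", {}).get("wagi.handler") == handler
    ]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} parcels declare handler {handler!r}")
    return matches[0]


def translate_external_handler(entry, invoice_parcels, local_files, group):
    """Turn one external [[handler]] entry into parcel entries for the app invoice."""
    upstream = find_handler_parcel(invoice_parcels, entry["external"]["handler"])
    module = {
        # Obtained from the upstream invoice
        "name": upstream["name"],
        "sha256": upstream["sha256"],
        # Derived from HIPPOFACTS
        "feature": {"wagi.route": entry["route"]},
        "requires": [group],
    }
    files = [{"name": name, "memberOf": [group]} for name in local_files]
    return [module] + files


invoice = [{"name": "fileserver.gr.wasm", "sha256": "57A71C",
            "annotations": {"wagi.handler": "static"}}]
entry = {"route": "/styles",
         "external": {"bindle_id": "deislabs/fileserver/1.1.0", "handler": "static"}}
result = translate_external_handler(entry, invoice, ["hotpink.css"],
                                    "some_suitably_unique_group_name")
```

Note that the consumer never mentions fileserver.gr.wasm here; only the "static" contract name crosses the boundary.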

None of this really helps me past the issues of require tree chasing and conflict resolution, but I have to go home now, so might try to have a brainstorm with someone about this tomorrow.

That does make sense to me.

After talking to @technosophos we have the following plan:

  • Mainly targeting the standalone module import scenario a la static files
    • Each route and module has to be mapped separately
    • If a "library" contains multiple modules that make assumptions about relative URLs, then a consumer must import each module separately, and adhere to those assumptions (or the library can provide a way to configure the links)
    • This does not preclude a richer embedding solution down the road: we are just not going to try to boil that ocean right now
  • Use an alias or identifier as described above, to decouple logical from physical structure
  • If the importee parcel has a requires then copy that entire tree
    • It is safe to munge importee group names to avoid clashes with the consumer
    • It is probably safe to flatten importee groups because we do not need to preserve other avenues of inclusion

We haven't really tackled the question of what to do about file name clashes if the importee does bring files in by requires. I for one am tempted to punt on this for now - make a clash an error in Hippofactory, and when someone complains that will give us a motivating example of what to do!
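Here is a rough Python sketch of that plan: copy the importee's requires tree, flatten it into a single munged group, and fail on a file name clash. The import_parcel name, the flattened dict shapes, and the way the munged group name is passed in are all assumptions for illustration, not what Hippofactory does today.

```python
def import_parcel(importee_parcels, root_name, group, consumer_files):
    """Copy a parcel plus its requires tree, flattened into one munged group."""
    root = next(p for p in importee_parcels if p["name"] == root_name)

    # Walk requires transitively, collecting every parcel in a required group.
    members, seen, pending = [], set(), list(root.get("requires", []))
    while pending:
        required = pending.pop()
        if required in seen:
            continue
        seen.add(required)
        for p in importee_parcels:
            if p is root or required not in p.get("memberOf", []):
                continue
            if p not in members:
                members.append(p)
            pending.extend(p.get("requires", []))

    # Punt on conflict resolution: any name clash with the consumer is an error.
    clashes = sorted({m["name"] for m in members} & set(consumer_files))
    if clashes:
        raise ValueError(f"imported files clash with local files: {clashes}")

    # Flatten: the root requires one munged group and every member joins it.
    imported = [{**root, "requires": [group] if members else []}]
    for m in members:
        flat = {k: v for k, v in m.items() if k != "requires"}
        imported.append({**flat, "memberOf": [group]})
    return imported


# Illustrative importee only; these group and file names are made up.
importee = [
    {"name": "fileserver.gr.wasm", "requires": ["assets"]},
    {"name": "mime.types", "memberOf": ["assets"], "requires": ["extra"]},
    {"name": "LICENSE.txt", "memberOf": ["extra"]},
    {"name": "unrelated.wasm", "memberOf": ["other"]},
]
imported = import_parcel(importee, "fileserver.gr.wasm", "import-fileserver", ["index.html"])
```

Because the group name is supplied by the consumer side, it can be made unique per import, and parcels reachable only through other groups (unrelated.wasm here) are left behind.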

> make a clash an error in Hippofactory, and when someone complains that will give us a motivating example of what to do!

That sounds like an excellent plan. That opens a dialogue with end users who need this, and we can form more opinions at that time.