purescript/registry-dev

Migrate operation validation into library

thomashoneyman opened this issue · 4 comments

Currently all validation for the Operation type is done via the JSON payload decoding or via checks implemented in the API module. However, we'll soon be moving the API module out of the src directory so it is not exposed as a library. We still want the operation verification to be runnable locally via Spago, however. With this in mind, below is my proposal for a new Registry.Operation module.


Operation Checks

Each registry operation is guarded by validation before its effects are executed (e.g. a package is uploaded, metadata is written). Since the registry runs asynchronously it's critical that this validation can also be run by package managers. And since the registry and package managers will run in different environments, it's critical that effects are injected. Some effects include:

  • License detection
  • Logging
  • Reading metadata, registry index files

The idea is that when you run:

$ spago publish
$ spago unpublish --version 0.7.0 --reason "Committed credentials."
$ spago transfer

then Spago will execute the same validation that the registry would run for each by depending on the registry as a library and providing its own effects. Here I am also making two assumptions:

  1. That we are going to unify the Addition and Update operations into a single Publish operation
  2. That we are going to preserve package set publishing as something coordinated via GitHub issues, not via an API running on a server (and therefore checks do not need to be exported for Spago, though we could do so).
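To make the "injected effects" idea concrete, here is a minimal sketch in Python (used only for illustration). The `Effects` record, its field names, and `check_license` are all hypothetical and are not part of the registry's actual API; the point is only that a check receives its effects as arguments, so the registry and Spago can each supply their own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical bundle of injected effects; names are illustrative only.
@dataclass(frozen=True)
class Effects:
    detect_licenses: Callable[[str], List[str]]     # package dir -> detected licenses
    log: Callable[[str], None]                      # message -> nothing
    read_metadata: Callable[[str], Optional[dict]]  # package name -> metadata or None


def check_license(effects: Effects, package_dir: str, manifest_license: str) -> List[str]:
    """Return a list of error messages (empty when the check passes)."""
    detected = effects.detect_licenses(package_dir)
    effects.log(f"detected licenses: {detected}")
    return [
        f"manifest license {manifest_license!r} does not match detected {found!r}"
        for found in detected
        if found != manifest_license
    ]


# A caller (the registry, Spago, or a test) supplies its own implementations.
logs: List[str] = []
test_effects = Effects(
    detect_licenses=lambda _dir: ["MIT"],
    log=logs.append,
    read_metadata=lambda _name: None,
)
```

With this shape, the same `check_license` runs unchanged whether the license detector shells out to a tool on the server or reads files from a local workspace.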

Operations.purs

Unauthenticated Operations

For unauthenticated operations, the registry begins by verifying the JSON is well-formed. Package managers do not need to do this check, but they need to produce the correct JSON.

Publish

{ "package": "prelude", "location"?: ..., "ref": ... }

At first we only have the API payload. We use it to attempt to read the package metadata, which we then verify. To be publishable, either:

  1. The package name is not registered (i.e. no metadata exists), a location is provided, and the location is not already in use. In this case, metadata should be created for the new package.
  2. The package name is registered (i.e. metadata exists), and one of:
    1. No location is provided.
    2. The provided location matches the package metadata.
    3. The provided location supports the subdir key, and all fields except the subdir key match the package metadata.
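The publishability rules above can be sketched as a pure function. This is an illustration under assumptions, not the registry's implementation: locations are modelled as plain dictionaries, every location is assumed to support the `subdir` key, and the function name and error strings are hypothetical.

```python
from typing import List, Optional


def without_subdir(location: dict) -> dict:
    """Drop the subdir key so two locations can be compared without it."""
    return {k: v for k, v in location.items() if k != "subdir"}


def check_publishable(
    metadata: Optional[dict],
    location: Optional[dict],
    locations_in_use: List[dict],
) -> Optional[str]:
    """Return None when the package is publishable, otherwise a reason."""
    if metadata is None:
        # Case 1: the name is not registered, so a fresh location is required.
        if location is None:
            return "a new package must provide a location"
        if location in locations_in_use:
            return "location is already in use by another package"
        return None
    # Case 2: the name is registered.
    if location is None or location == metadata["location"]:
        return None
    if without_subdir(location) == without_subdir(metadata["location"]):
        # Only the subdir differs; assumed acceptable for every location type here.
        return None
    return "provided location does not match the package metadata"
```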

At this point we have the API payload and package metadata. We know that the package is publishable. Next, we fetch the package source and verify it.

  1. The package must contain a src directory containing .purs files.
  2. The package must contain a valid purs.json manifest or spago.yaml manifest. In the presence of multiple manifest files the purs.json file takes precedence and all others are ignored.
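As a rough sketch of these two source checks, assuming the package source is available as a list of relative file paths (the function names are hypothetical, and manifest parsing itself is left out):

```python
from typing import List, Optional

# Manifest file names in order of precedence: purs.json wins over spago.yaml.
MANIFEST_PRECEDENCE = ("purs.json", "spago.yaml")


def has_purs_sources(files: List[str]) -> bool:
    """True when at least one .purs file lives under the src directory."""
    return any(f.startswith("src/") and f.endswith(".purs") for f in files)


def select_manifest(files: List[str]) -> Optional[str]:
    """Pick the manifest to read, ignoring lower-precedence manifests."""
    for name in MANIFEST_PRECEDENCE:
        if name in files:
            return name
    return None
```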

Now that we have a valid manifest, we can ensure that the API call, manifest, metadata, and package sources are reconciled (these checks can run in parallel):

  1. The manifest package name and API package name must match.
  2. The manifest version must not have been published or unpublished before, according to the metadata.
  3. The manifest location and API location must match (if an API location is provided). The manifest location and metadata location must match, except for the subdir key. If this key is different, then the metadata should be updated.
  4. The manifest 'owners' field should update the metadata owners field.
  5. The manifest dependencies should be solvable by the registry solver.
  6. If a LICENSE file is present in the package source, or a license is listed in a package.json or bower.json file, then the manifest license must match all specified licenses. (Inject the license detection logic as an effect).
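The first two reconciliation checks could look something like the following. The dictionary shapes are assumptions for illustration: the payload uses the `package` key from the example above, the manifest is assumed to carry `name` and `version`, and the metadata is assumed to carry `published` and `unpublished` maps keyed by version.

```python
from typing import List


def reconcile(payload: dict, manifest: dict, metadata: dict) -> List[str]:
    """Collect every reconciliation error rather than stopping at the first."""
    errors: List[str] = []
    if manifest["name"] != payload["package"]:
        errors.append(
            f"manifest name {manifest['name']!r} does not match "
            f"API name {payload['package']!r}"
        )
    version = manifest["version"]
    if version in metadata["published"]:
        errors.append(f"version {version} has already been published")
    if version in metadata["unpublished"]:
        errors.append(f"version {version} was previously unpublished")
    return errors
```

Returning a list keeps each check independent, which is what allows them to run in parallel and be reported together.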

Note that we don't literally update the metadata and write it here -- we just note that it needs to be updated.

Next, we process the package source. We make a temporary directory and remove ignored files. Further verification assumes work on this processed package source, not the original repository. (This ensures e.g. Spago can do the same thing, without deleting user files). We do two things in parallel:

  1. We verify that ignored files are not present in the source directory. Then we produce a tarball of the contents and check the size of the tarball. The tarball size must be less than the registry maximum. (The tar effect should be injected).
  2. (API ONLY) We verify that the package dependencies can be solved by the registry and the package can be built.
    1. First we produce resolutions from the manifest dependencies (this should be injected, with possible errors returned – so Spago can just read the lockfile, for example).
    2. Then we compile the package using the resolutions, and the API-provided compiler version, returning errors if the compiler reports any.
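A minimal sketch of the tarball size check with the tar effect injected. The `make_tarball` parameter and the function name are hypothetical, and the maximum size is passed in rather than hard-coded because the limit is not stated here.

```python
from typing import Callable, Optional


def check_tarball_size(
    make_tarball: Callable[[str], bytes],  # injected effect: directory -> tarball bytes
    package_dir: str,
    max_bytes: int,
) -> Optional[str]:
    """Return None when the tarball is under the limit, otherwise a reason."""
    size = len(make_tarball(package_dir))
    # The tarball must be strictly smaller than the registry maximum.
    if size >= max_bytes:
        return f"tarball is {size} bytes; it must be less than {max_bytes} bytes"
    return None
```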

Authenticated Operations

For authenticated operations, the registry begins by ensuring the JSON is well-formed (package managers don't need to). Then, it verifies the payload. If everything checks out, it verifies ownership of the package and then processes the operation.

To verify package ownership the registry uses the owners listed in the metadata to see available public keys and email addresses. Then, it uses ssh-keygen to verify a payload along with an email address and SSH signature. This can be a check, if the call to ssh-keygen is injected as an effect.

We can run signature verification in parallel with other verification checks.
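Ownership verification as a check might be sketched like this. The `verify_signature` argument stands in for the injected call to ssh-keygen; its signature, the owner field names (`email`, `public`), and the function name are assumptions made for illustration.

```python
from typing import Callable, List

# Injected effect: (public key, payload, signature) -> whether the signature is valid.
VerifySignature = Callable[[str, str, str], bool]


def verify_ownership(
    owners: List[dict],
    email: str,
    payload: str,
    signature: str,
    verify_signature: VerifySignature,
) -> bool:
    """True when some owner with this email has a key that validates the signature."""
    return any(
        owner["email"] == email
        and verify_signature(owner["public"], payload, signature)
        for owner in owners
    )
```

In tests, `verify_signature` can be a pure stand-in, so the check runs without ssh-keygen installed.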

Transfer

To verify a transfer we must have access to the metadata for all packages.

  1. If the metadata does not exist, then the package cannot be transferred.
  2. If the metadata exists for the given package, then we can verify the API-provided new location.
    1. The new location cannot be the same as the old location.
    2. The new location cannot already be in use by another package.
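The transfer rules can be sketched as a pure function over the metadata for all packages. This assumes a dictionary from package name to metadata with a `location` field; the function name and error strings are hypothetical.

```python
from typing import Dict, Optional


def check_transfer(
    all_metadata: Dict[str, dict],
    package: str,
    new_location: dict,
) -> Optional[str]:
    """Return None when the transfer is valid, otherwise a reason."""
    metadata = all_metadata.get(package)
    if metadata is None:
        return f"package {package!r} is not registered and cannot be transferred"
    if new_location == metadata["location"]:
        return "new location is the same as the current location"
    for name, other in all_metadata.items():
        if name != package and other["location"] == new_location:
            return f"new location is already in use by {name!r}"
    return None
```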

Unpublish

To verify an unpublish for a given version we must have access to the package metadata.

  1. The metadata must exist for the package.
  2. The indicated version must be published and must not have been unpublished before.
  3. The indicated version must have been published within the last 48 hours.

If these conditions are met, the package version can be unpublished.
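A sketch of the unpublish conditions follows. The current time is passed in as an argument so the check stays pure. The metadata shape (`published` and `unpublished` maps, and a `publishedTime` stored as a `datetime`) is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UNPUBLISH_WINDOW = timedelta(hours=48)


def check_unpublish(metadata: Optional[dict], version: str, now: datetime) -> Optional[str]:
    """Return None when the version can be unpublished, otherwise a reason."""
    if metadata is None:
        return "no metadata exists for this package"
    if version in metadata["unpublished"]:
        return f"version {version} has already been unpublished"
    published = metadata["published"].get(version)
    if published is None:
        return f"version {version} has not been published"
    if now - published["publishedTime"] > UNPUBLISH_WINDOW:
        return f"version {version} was published more than 48 hours ago"
    return None
```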

The idea is to implement these checks as individual, pure functions, and then to accumulate them into pipelines that Spago or the registry can run. Those pipelines can then do logging and other effects, but with all effects supplied by the registry or Spago.
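As a rough illustration of "individual pure functions accumulated into a pipeline", here is a small sketch. The check names, the context dictionary, and `run_checks` are all hypothetical; logging is the only effect and it is supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# A check is a pure function from a context to a list of error messages.
Check = Callable[[Dict], List[str]]


def name_matches(ctx: Dict) -> List[str]:
    ok = ctx["payload_name"] == ctx["manifest_name"]
    return [] if ok else ["manifest name and API name differ"]


def version_is_new(ctx: Dict) -> List[str]:
    ok = ctx["version"] not in ctx["published"]
    return [] if ok else [f"version {ctx['version']} is already published"]


PUBLISH_CHECKS: List[Tuple[str, Check]] = [
    ("name matches", name_matches),
    ("version is new", version_is_new),
]


def run_checks(
    checks: List[Tuple[str, Check]],
    ctx: Dict,
    log: Callable[[str], None],  # injected by the registry or by Spago
) -> List[str]:
    """Run every check in order, logging each result and aggregating errors."""
    errors: List[str] = []
    for label, check in checks:
        found = check(ctx)
        log(f"{label}: {'ok' if not found else 'failed'}")
        errors.extend(found)
    return errors
```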

Colin and I have discussed this, and the implementation is becoming much too complex due to cross-cutting concerns:

  1. Implementing individual checks
  2. Aggregating checks into a pipeline
  3. Notifying subscribers when a check starts, warns, fails, and/or succeeds
  4. Reporting errors immediately, and aggregating errors overall
  5. Adding debug logs

We're taking a step back and focusing on the core requirement here: checks that can be shared by the Registry and Spago. To that end, the first cut of this will just pull individual validation functions into an Operation.Validation module with accompanying tests, and those will be reused by the Registry.App.API module.

In later iterations we can have a fuller discussion around how to achieve other goals, like ensuring the Spago / Registry pipelines match, or how to run a pipeline that emits events representing the status of various checks.

f-f commented

This sounds good - after all Spago only needs to care about being able to run the checks in the same order as the Registry, so we can manage with just the single checks factored into functions at first.
Once we have the two implementations side by side we can then figure out how to unify them and share the code.

Closed in #561