Arlodotexe/brain-dump

Distributed build validation via Git + IPFS + PowerShell


🚨 This has been ported from Arlodotexe/strix-music#201

Background

Strix has pledged to make an effort to keep the project "perpetually preserved". Put another way, we should avoid relying on things with a single point of failure. The repo should contain everything needed to operate the project, and a release should contain a repo backup + all dependencies expected to be available to build the project, published to IPFS.

To keep things resilient, we use IPFS extensively to back up and host the content on our website (docs, WASM app, app binaries, build dependencies, source backup, published NuGet packages, etc.).

As a result of all this, Strix has a lot of build scripts. They contain everything needed to create a release, and there's a guide[1], [2] to help string them together into a working solution that can download build dependencies, generate documentation, generate our website, create release tags, generate changelogs, and more (with a few more coming in #203 and #202).

Problem

For something that should be the equivalent of "yes, this commit built successfully", Azure DevOps and GitHub Actions may be extremely convenient, but they're complex, expensive, and don't play into the spirit of Git.

Further, Git itself has no built-in features that enable contributors to verify "yes, this commit built successfully".

We can do better.

Proposed solution

(proposal has been amended, see comments)
(Prerequisite: #203)

Leveraging our existing build scripts and Git's already decentralized nature, we can combine Git with IPFS to create a distributed build validation system.

Since PowerShell is used in a lot of CI scenarios (and we've already leaned into it because of this), we'll be developing a PowerShell Module designed to work with Kubo.

First draft:
Whiteboard (3)

Scripts

  • Long running script for code owners
    • Can be any machine controlled by someone with full repo permissions (code owners, volunteers)
    • Performs tagging autonomously on validator's behalf
    • May require extra abuse prevention
    • Coordinates over IPFS with other running instances of this script
  • Run and done script for build validators
    • Anybody, any machine (maintainers, contributors, traditional CI)
    • Assumes limited repo permissions
    • Wants to validate that they can successfully build a commit
    • Configuration:
      • What command to use for build validation (as of the built commit)
      • Define glob paths of built content to upload to IPFS (as of the built commit)
  • Run and done script to check commit validation
    • List the validations on a commit (show data)
    • Should be useful to check against PRs
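
The build validator configuration described above (a validation command plus glob paths for built content) could be sketched as follows. This is a minimal illustration, not the planned PowerShell module: `ValidatorConfig`, `collect_artifacts`, and the field names are hypothetical, and the upload to IPFS itself is left out.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ValidatorConfig:
    """Hypothetical settings for the build validator script."""
    # Command used to validate the build, as of the built commit.
    build_command: list[str]
    # Glob patterns (relative to the repo root) of built content to upload.
    artifact_globs: list[str] = field(default_factory=list)


def collect_artifacts(repo_root: Path, config: ValidatorConfig) -> list[Path]:
    """Return the files matched by the configured globs, sorted and de-duplicated."""
    matched: set[Path] = set()
    for pattern in config.artifact_globs:
        # Only files are collected; directories matched by a pattern are skipped.
        matched.update(p for p in repo_root.glob(pattern) if p.is_file())
    return sorted(matched)
```

A validator would run `config.build_command` first and, only if it succeeds, pass the result of `collect_artifacts` to whatever performs the IPFS upload.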

Node settings

  • External node
    • Custom IPFS API url
  • Embedded node
    • Download and run temporary IPFS node
    • Custom repo path
    • Custom go-ipfs config file. Allows for fine-grained control over the embedded node.

Other

  • User-defined script to validate a folder of build assets
  • Common configuration settings
    • Pre-shared passkey (optional)
      • To encrypt/decrypt pubsub messages
      • To encrypt/decrypt content

Proposal amendments

After more research into the inner workings of Git, I've found that we may be able to lean on Git even more, and massively simplify our setup in the process.

Permission management

It's a solved problem

Most services like GitHub and Azure DevOps already come with built-in permission management for the upstream repository, for both Branches and Tags.

  • We can rely on the developer to set up this security
  • This gives more freedom to the validator on where results are pushed to.
  • Annotated git tags can be signed and verified with GPG (docs), which can be leaned on as a trust system.
  • This means we can allow contributors with repo push permissions to push build validation tags directly.
  • For forks, where contributors don't have repo push permissions:
    • They could push to their own fork, and we could add that as a remote, swap branches, and continue as usual.
      • We should be able to push the tag to our remote on their behalf
      • Will need to pull and swap to a branch from a second remote.
      • Doing this also pushes the commits in the tag to the remote, even if the commits don't exist on a branch.
      • git gc should clean up automatically, in case the fork's branch is never merged.

Preserving validated commits

Tags, branches, and you

A Git tag is created against a specific commit and cannot point to a branch. This means it's on us to make sure a built/validated commit is preserved in Git history when merging a branch.

A commit's hash identifies an immutable object that records a snapshot of the repo's files, plus additional Git information such as author, time, and message, and a link to the parent commit, meaning it indirectly references all previous changes as well. It accurately represents the entire current state of the repo.
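
To illustrate why a commit hash pins down the whole history, here is a small sketch of how Git derives object ids, assuming the default SHA-1 object format. The identity string and commit messages are placeholders, and `commit_body` is a simplified helper that serializes only the fields mentioned above.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does with its default SHA-1 object format."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: list[str], author: str, message: str) -> bytes:
    """Serialize a minimal commit object (same identity as author and committer)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


EMPTY_TREE = git_object_id("tree", b"")
WHO = "Example Author <author@example.com> 0 +0000"  # placeholder identity

root = git_object_id("commit", commit_body(EMPTY_TREE, [], WHO, "root"))
child = git_object_id("commit", commit_body(EMPTY_TREE, [root], WHO, "child"))
# Same tree, author, and message, but a different parent gives a different id.
other = git_object_id("commit", commit_body(EMPTY_TREE, ["0" * 40], WHO, "child"))
```

Because the parent id is part of the hashed content, `child` and `other` differ even though everything else is identical, which is why a tag on a commit also vouches for everything that commit descends from.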

Can we build it?

I've verified the following behavior:

  • If you merge the upstream branch into your current branch first, you can build/validate that commit while still in PR, and as long as history is linear and your commit is preserved in history, people can see that you were able to build successfully!
  • The above behavior is true even when there are merge conflicts
  • As long as history is linear, the merge commit can be validated before or after the actual merge and will stay preserved.
  • (!!!) If history is NOT linear, such as when squashing, the commit that the contributor built/validated won't be included in the upstream branch, and will be lost.
    • Solution: squash locally, push only tags for that commit
      • Can we tag a nonexistent commit? - Yes, as long as you can switch to it.
      • Can we push a tag for a nonexistent commit?
        • Yes, even if the branch was deleted
        • Doing this also pushes the commits in the tag to the remote, even if the commits don't exist on a branch.
        • git gc should take care of this automatically
      • Can we predict the squash commit hash? - Yes, if you can pull both branches. Just do the squash locally
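
The "squash locally, push only tags" flow above could look roughly like the command plan below. This is a sketch that only builds the argument lists and does not execute anything: `plan_squash_validation`, the tag name, and the commit/tag messages are hypothetical. It does not try to reproduce the exact commit a hosting service would create when squashing.

```python
def plan_squash_validation(upstream: str, pr_branch: str, tag: str,
                           remote: str = "origin") -> list[list[str]]:
    """Return the git commands (not executed here) for tagging a local squash."""
    return [
        ["git", "fetch", remote],
        # Work on a detached HEAD so no local branch is moved.
        ["git", "switch", "--detach", upstream],
        # Stage the combined changes of the PR branch without committing.
        ["git", "merge", "--squash", pr_branch],
        ["git", "commit", "-m", f"Squash of {pr_branch}"],
        # The build validation command would run here, before tagging.
        ["git", "tag", "-a", tag, "-m", f"Build validated: squash of {pr_branch}"],
        # Pushing only the tag also uploads the commits it references.
        ["git", "push", remote, f"refs/tags/{tag}"],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` in order. Swapping `-a` for `-s` in the tag step would produce a GPG-signed tag instead of a plain annotated one.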

Yes, we can

No matter what merge strategy is used, we have the ability to preserve arbitrary information in tags, as long as the code exists on the remote.

The only caveat is that unless we know what merge strategy is going to be used, we'll have to create tags for both the upstream merge commit and a squash commit. We can make this part of the config on the "build validator" side of things.
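
As a rough sketch of how that config option might drive tagging, the helper below picks which commits to tag for a given merge strategy. `MergeStrategy` and `commits_to_tag` are hypothetical names, and the commit ids are assumed to have been produced already (the upstream merge commit and the local squash commit).

```python
from enum import Enum


class MergeStrategy(Enum):
    MERGE = "merge"
    SQUASH = "squash"
    UNKNOWN = "unknown"


def commits_to_tag(strategy: MergeStrategy, merge_commit: str,
                   squash_commit: str) -> list[str]:
    """Pick the commits that need a validation tag for the configured strategy."""
    if strategy is MergeStrategy.MERGE:
        return [merge_commit]
    if strategy is MergeStrategy.SQUASH:
        return [squash_commit]
    # Strategy not known ahead of time: cover both outcomes.
    return [merge_commit, squash_commit]
```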

I'll have an updated graphic Soon™, once we're done iterating.

Long term considerations

IPLD Integration

  • https://github.com/ipfs/go-ipld-git
  • Would integrate nicely into these scripts without changes
  • Instead of storing the data in tags, store the CID as a "link" in IPLD
  • Is there a way we can manipulate Git to make this translation happen automatically in the future? Needs investigation.
  • If the above is possible while maintaining the link to a Tag, it's MUCH better than throwing the CID into the tag message.
  • Git allows you to create arbitrary blob objects, create object trees from scratch, and link to either of them in a commit. All of these would work natively with any IPLD translation layer.
  • So far, I haven't found an easy way to do this. We may need to settle for putting the CID in a structured tag message and make a converter later if needed.
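
One possible shape for the "structured tag message" fallback is sketched below: a human-readable subject line followed by a single machine-readable line. The `build-validation:` marker, the JSON field names, and both helpers are assumptions made for illustration, not a settled format.

```python
import json
from typing import Optional

MARKER = "build-validation:"  # hypothetical prefix marking the structured line


def build_tag_message(cid: str, build_command: str, succeeded: bool) -> str:
    """Compose an annotated-tag message carrying the CID of the uploaded build."""
    payload = {"cid": cid, "command": build_command, "succeeded": succeeded}
    subject = "Build validated" if succeeded else "Build failed"
    return f"{subject}\n\n{MARKER} {json.dumps(payload, sort_keys=True)}\n"


def parse_tag_message(message: str) -> Optional[dict]:
    """Extract the structured payload from a tag message, if one is present."""
    for line in message.splitlines():
        if line.startswith(MARKER):
            return json.loads(line[len(MARKER):])
    return None
```

Keeping the payload on one marked line means a later converter (for example, to IPLD links) only has to scan tag messages for that prefix.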