aquasecurity/btfhub

storage for btfhub artifacts

itaysk opened this issue · 5 comments

we use git to store both the btf generation script and the resulting btf artifacts. this was very easy for the poc stage of btfhub, but now is a good opportunity to reevaluate the storage requirements. the candidate options are listed after the requirements; if we want to stay in github world, we have options 1 to 3.

basic requirements are:

  • being able to HTTP GET every file individually
  • Reputable as stable and robust
  • easy to manage artifact (add/update/remove BTFs)
  • download stats

(note it doesn't have to be a free service. paid services are also fine)
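To make the first requirement concrete, here is a minimal Python sketch of fetching a single BTF with a plain HTTP GET. The base URL and the `distro/release/arch/kernel.btf` path layout are assumptions made for illustration; the thread does not specify either.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical host; whichever storage backend is chosen would replace this.
BASE_URL = "https://example.com/btfhub"


def btf_url(distro, release, arch, kernel, base=BASE_URL):
    """Build the URL of one BTF artifact under the assumed path layout."""
    # Percent-encode each path segment so unusual kernel names stay valid.
    parts = [quote(p, safe="") for p in (distro, release, arch, kernel + ".btf")]
    return base.rstrip("/") + "/" + "/".join(parts)


def fetch_btf(url, timeout=30):
    """Download a single artifact with one HTTP GET and return its bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Any backend that can serve such a URL per file (raw git content, release assets, or an S3 bucket) would satisfy this requirement; only `BASE_URL` and the layout would change.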

  1. git (current solution)
    1. I'm not sure we really need a VCS to manage the BTF artifacts. we are not interested in their evolution over time, or in their history. managing blobs in git is not ideal.
    2. combining the generation script (code) with the btf files (artifact) is unnecessary and pollutes the git history and makes it harder to collaborate on the script (which is the only thing that requires collaboration).
    3. no stats
    4. managing all artifacts in the same repo also results in a huge repo, which is hard to clone
    5. easy to get all btfs in one action (git clone) but this is a rare use case.
  2. git with lfs
    1. still the same cons as git option except issue 1.4 (hard to clone)
    2. requires client-side tooling
  3. GitHub Releases (we can create a release and add individual btfs as "Assets")
    1. suitable for external distribution of artifacts
    2. code and artifacts remain in same repo
    3. maybe harder to manipulate existing assets
    4. not sure how it will throttle/limit in the future. seems like there are no hard limits
  4. AWS S3
    1. not closely coupled with the github repo
    2. most flexible for future requirements

I'm in favor of option 4 - AWS S3.

I would be in favor of GitHub Releases. Assuming we do get download stats, it checks all the basic requirements, plus it'd be easier to manage than S3.

"easy to get all btfs in one action (git clone) but this is a rare use case."

I think this use case is important for BTFGen. I also think that in any case we could have a "sync" script that downloads all available BTFs.
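A "sync" script of this kind could look roughly like the Python sketch below. It assumes some index of relative artifact paths is available and takes the download step as a callable; both the index format and the `fetch` callable are hypothetical, since the thread does not define them. The index is treated as trusted input.

```python
import os


def sync_btfs(index, dest_dir, fetch):
    """Download every artifact listed in `index` that is missing locally.

    index:    iterable of relative paths such as "distro/release/arch/k.btf"
              (hypothetical listing format).
    dest_dir: local directory that mirrors the remote layout.
    fetch:    callable taking a relative path and returning the file bytes.
    Returns the list of relative paths that were downloaded in this run.
    """
    downloaded = []
    for rel in index:
        target = os.path.join(dest_dir, *rel.split("/"))
        if os.path.exists(target):
            continue  # already present, skip the download
        os.makedirs(os.path.dirname(target), exist_ok=True)
        data = fetch(rel)
        with open(target, "wb") as f:
            f.write(data)
        downloaded.append(rel)
    return downloaded
```

Because files that already exist are skipped, running it repeatedly only transfers new BTFs, which gives the "get everything" use case without requiring a full git clone.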

@mauriciovasquezbernal agreed. This was written with the use case of apps using BTF Hub directly and downloading the BTF for the runtime environment in mind. But with BTF Gen that use case is probably not very appealing anymore.

We have decided on the following:

  1. btfhub repo will be the "landing" repo and contain the tooling and docs, but not the archive.
  2. btfhub-archive is a new repo that will contain just the released artifacts.
    1. it seems that some users prefer to git clone the archive, which this will allow.
    2. will not use LFS due to the dependency on an extra client.
  3. A GitHub workflow on btfhub will build BTFs and push them to btfhub-archive.
    1. The script will groom the git history to reduce cloning overhead.
  4. In the future, we might want to also use GH Releases on btfhub repo. out of scope for now.
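The thread does not say how the script grooms the history. One common approach, shown here purely as an assumption, is to replace the branch with a single orphan commit and force-push it, so a clone only ever fetches one snapshot. The branch names and commit message below are hypothetical.

```python
import subprocess


def groom_commands(branch="main", message="archive snapshot"):
    """Return the git commands that squash `branch` into one orphan commit."""
    tmp = "groomed-tmp"  # hypothetical temporary branch name
    return [
        ["git", "checkout", "--orphan", tmp],   # new branch with no parents
        ["git", "add", "-A"],                   # stage the current tree
        ["git", "commit", "-m", message],       # single snapshot commit
        ["git", "branch", "-D", branch],        # drop the old history
        ["git", "branch", "-m", branch],        # rename tmp to the old name
        ["git", "push", "--force", "origin", branch],
    ]


def groom(repo_dir, branch="main", dry_run=True):
    """Run the grooming commands in repo_dir, or just return them if dry_run."""
    cmds = groom_commands(branch)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return cmds
```

The default is a dry run because the force-push rewrites the remote branch; anyone with an existing clone would need to re-clone or reset afterwards.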

This has been sorted out already.