kubernetes-sigs/cluster-api

Review hashicorp license change impact

killianmuldoon opened this issue ยท 24 comments

Recently hashicorp announced changes to some of its licenses. This issue is a place to assess the impact on Cluster API.

CNCF issue: cncf/foundation#617

None of the dependencies CAPI imports have been updated to the Business source license.

Note: The original finding in this issue are irrelevant at this point. See below for the current state and plan-of-action.

Initial findings

Generate list of all modules which are importing a hashicorp module. This must be done for each of the three go modules in the CAPI repo.

go mod graph | grep "\s.*hashicorp" > hashicorp_modules
cd test ; go mod graph | grep "\s.*hashicorp" >> ../hashicorp_modules
cd ../hack/tools ; go mod graph | grep "\s.*hashicorp" >> ../../hashicorp_modules
cd ../..

Get a unique sorted list of hashicorp modules that are being imported

cat hashicorp_modules | cut -d ' ' -f 2 | sort -u -o hashicorp_modules

The end result of the above is a list of 27 modules. Some are the same module with a different version.

Outstanding tasks:

This list reflects the current state of the master branches. These can change in future versions.

It could be good idea to add license scanning on PRs so that if a version updates to a non-compatible license we can catch it. Not sure what k/k is doing in this area right now.

Looks like trivvy has this functionality - will take a look at how it works and whether it's useful.

Thx for looking into this!

/triage accepted

Thanks @killianmuldoon ! We might want to broadcast the news at this week community meeting, there is usage in other Cluster API repositories as well.

Looks like they are blocking dependencies because of MPL (because "MPL license not in CNCF allowlist") Would be interesting to know how they enforce this config (and eg if they check only go.mod or also go.sum)

Also would be good to know why we have those dependencies

Looks like they are blocking dependencies because of MPL (because "MPL license not in CNCF allowlist") Would be interesting to know how they enforce this config (and eg if they check only go.mod or also go.sum)
Also would be good to know why we have those dependencies

Their tool only checks go.mod AFAIU, and they have those dependencies in the go.sum. I'm still doing some research on how the CNCF sees go.sum dependencies, but I haven't found anything concrete yet.

Thanks @killianmuldoon looking into this!

I recently implemented a feature in CR for authorized metrics. I ended up implementing it in a way that you only actually get dependencies like k8s.io/apiserver in your go.mod file when you are using it (i.e. importing a certain package). If you don't use it, it will only end up in go.sum.

So my rough guess is that dependencies which are only in go.sum are not actually getting compiled into your binary because you don't need any of their packages. They are probably just part of "what if you include all dependencies of all dependencies without looking at specific packages". But really just a guess.

Probably we should just ask in some Kubernetes Slack channel about the background of only checking go.mod and they can tell us (it seems k/k is aligning to CNCF recommendations)

nrb commented

In terms of tooling to help mitigate these issues on an ongoing basis, https://compliance.linuxfoundation.org/references/tools/ mentions some tools for licensing compliance. It appears the LF-sponsored project was spun down in 2020, but Fossology is active.

The downside to Fossology from what I see is that it appears to be designed as a live database system, which would require extra hosting.

I would definitely like to align to upstream k/k if somehow possible

I would definitely like to align to upstream k/k if somehow possible

The #k8s-code-organization group is a good group to talk to about go dependency management, it's usually mostly focused on kubernetes/kubernetes under sig architecture but more general dependency management comes up.

So my rough guess is that dependencies which are only in go.sum are not actually getting compiled into your binary because you don't need any of their packages. They are probably just part of "what if you include all dependencies of all dependencies without looking at specific packages". But really just a guess.

go.sum contains all observed dependencies in the module graph. When go computes what versions should be used it looks at the entire dependency graph of all transitive deps, even if they're not going to wind up in the main module / binary.

In go1.17+ modules (NOTE: that's NOT the go compiler version, it's $x in go $x statement in the go.mod file, controlling the language version of the module), go.mod contains all required dependencies to build and test the main module. Reference: https://tip.golang.org/doc/go1.18#go-mod-download

As for tooling, see https://kubernetes.slack.com/archives/CHGFYJVAN/p1692213758848149

go-licenses can collect the license info for dependencies https://github.com/google/go-licenses, though it's an open question what kubernetes / code organization would recommend (currently discussing), the main repo has checked in vendor and a bunch of custom scripts to audit and manage dependencies / vendor.

Looks like our last exposure here is the indirect dependency (through viper) on github.com/hashicorp/hcl

github.com/hashicorp/hcl v1.0.0 // indirect

This dependency doesn't have an approved license by the CNCF rules. It is on an exemption list from 2022 . I need to figure out whether or not that exemption carries forward to today.

It seems like the exceptions in those files are cumulative - as evidenced by this PR: cncf/foundation#621 and some other information on that CNCF repo.

Because of this I'm inclined to close this issue after merging #9184. At that point we will have the automation in place to prevent this from happening in future.

Because of this I'm inclined to close this issue after merging #9184. At that point we will have the automation in place to prevent this from happening in future.

Maybe I missed something, but my quick skim suggests that PR only covers container images, while this repo also releases binaries and may develop dependencies in other places (test code? clusterctl? etc.)

The trivy scan is looking at files in the repo - so it catches the go.mod + scripts etc. I think that should be sufficient for what's in the binaries. What's definitely missing is our container images which have binaries + tools etc. from the base images.

I think you're right that we should also setup something for image scanning - maybe snyk like the sig-security scan for k/k.

I might be wrong but I thought for CVEs Trivy can scan based on source code and image (we definitely scan for CVE's based on images). Can't it do the same for licenses?

Also wondering what we actually have in our base images, I think we only use gcr.io/distroless (so no alpine or something like that). But never looked into what distroless contains, I assumed almost nothing

I might be wrong but I thought for CVEs Trivy can scan based on source code and image (we definitely scan for CVE's based on images). Can't it do the same for licenses?

It can - we could do a periodic license scan, but that's not covered in #9184 which is for blocking forbidden licenses incoming in PRs.

I'll open a separate issue for image scanninng, I think there's probably a number of choices to be made there.

I've opened #9436 to track periodic license scanning.

IMO we should ignore "Remove the indirect dependency on hashicorp HCL." for now as this dependency is accepted at the CNCF level. Using current tooling we will be able to catch a change in its license. If we really want to remove that dependency we could track it in a follow up.

Removing the dependency from our code involves removing viper which is a lot of work for little gain in this instance, at least in my opinion.

/close

@killianmuldoon: Closing this issue.

In response to this:

I've opened #9436 to track periodic license scanning.

IMO we should ignore "Remove the indirect dependency on hashicorp HCL." for now as this dependency is accepted at the CNCF level. Using current tooling we will be able to catch a change in its license. If we really want to remove that dependency we could track it in a follow up.

Removing the dependency from our code involves removing viper which is a lot of work for little gain in this instance, at least in my opinion.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Let's just wait and see. Maybe viper just drops the dependency sooner or later.

Pretty sure viper itself can't accept a change to a more problematic license

IIRC we previously removed viper from k/k for reasons like this ....