gradle/github-dependency-graph-gradle-plugin

Alternate solutions if the GITHUB_ vars do not exist

Closed this issue · 15 comments

I know this is designed for a GitHub Actions environment, but I'm curious if we can modify the configuration slightly for other environments.

Today, this plugin will cause a build failure if it's applied without all its expected configuration. I've implemented this as a temporary workaround:

if (System.getenv("GITHUB_WORKSPACE") != null
        && System.getenv("GITHUB_REF") != null
        && System.getenv("GITHUB_SHA") != null
        && System.getenv("GITHUB_JOB") != null
        && System.getenv("GITHUB_RUN_NUMBER") != null) {
    apply plugin: GitHubDependencySubmissionPlugin
}

But this is fairly annoying to check, and it can get out of date. I'd like to propose 2 alternatives if the variables do not have values:

  1. Inject known placeholders, such as the name of the variable, into the JSON.
  2. Emit a warning message, don't apply anything, and therefore don't generate the JSON.

Note: I'm willing to PR this but I want to make sure I'm doing what the maintainers would accept.
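The two alternatives above can be sketched as a small Groovy helper. This is a hypothetical illustration, not the plugin's code: missingVars and resolveVars are made-up names, and the variable list simply mirrors the checks in the workaround, which may not match what a given plugin release actually requires.

```groovy
// Hypothetical helper, not part of the plugin. The list mirrors the workaround's
// checks; the variables a given plugin release actually requires may differ.
@groovy.transform.Field
final List<String> REQUIRED_VARS = [
    'GITHUB_WORKSPACE', 'GITHUB_REF', 'GITHUB_SHA', 'GITHUB_JOB', 'GITHUB_RUN_NUMBER'
]

// Names of required variables that are unset or empty in the given environment.
List<String> missingVars(Map<String, String> env) {
    REQUIRED_VARS.findAll { !env.get(it) }
}

// Alternative 1 (usePlaceholders = true): substitute each missing value with the
// variable's own name. Alternative 2 (usePlaceholders = false): warn and return
// null so the caller skips generating the JSON.
Map<String, String> resolveVars(Map<String, String> env, boolean usePlaceholders) {
    List<String> missing = missingVars(env)
    if (missing && !usePlaceholders) {
        System.err.println("WARNING: ${missing.join(', ')} not set; skipping dependency graph")
        return null
    }
    REQUIRED_VARS.collectEntries { name -> [(name): env.get(name) ?: name] }
}
```

Calling resolveVars(System.getenv(), false) from a build script and applying the plugin only on a non-null result would behave like the workaround above, but with a visible warning instead of silently skipping.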

I think I'd rather see a GitHubDependencySubmissionPluginOptional that applies the GitHubDependencySubmissionPlugin if those variables are present. That way, if a user wants the guarantee that the plugin will either work or fail, they can apply the GitHubDependencySubmissionPlugin, but if they want flexibility, they can apply the GitHubDependencySubmissionPluginOptional version. Thoughts?

In any case I'd like to be able to use the plugin locally like

./gradlew -p plugins/package-managers/gradle/src/funTest/assets/projects/synthetic/gradle-library -I ~/Downloads/github-dependency-graph-gradle-plugin.init.gradle GitHubDependencyGraphPlugin_generateDependencyGraph

without the need to specify any GITHUB_ environment variables. Maybe these could simply be set to dummy default values if the GITHUB_ACTION environment variable is not set?

bigdaz commented

I'm not sure that dummy default values make sense for all variables, since several of them form an integral part of the dependency snapshot to be submitted. Unfortunately, many aspects of the GitHub Dependency Graph snapshot are not well documented, and it's not clear how they are currently used, or how they will be used in the future.

I've documented the required environment variables for the most recent plugin release here: https://github.com/gradle/github-dependency-graph-gradle-plugin#required-environment-variables. Of these, the GITHUB_DEPENDENCY_GRAPH_JOB_CORRELATOR is vital, since it is used by GitHub to determine whether a particular dependency snapshot should replace a previous one.

The others I'm not so sure about.

  • GITHUB_DEPENDENCY_GRAPH_JOB_ID: the gradle-build-action will default this to the GITHUB_RUN_ID, but it's not clear how it's used.
  • We could almost certainly determine the GITHUB_SHA and GITHUB_REF values locally.
  • The GITHUB_WORKSPACE is only used to determine the relative path to the build file: this could safely default to the project root directory.
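Following the bullets above, local fallbacks for GITHUB_SHA, GITHUB_REF and GITHUB_WORKSPACE might look roughly like this. It is a hypothetical sketch rather than anything the plugin does: localDefaults and runGit are made-up names, and it assumes git is on the PATH and the project root lies inside a Git working tree. The Git call is passed in as a closure so the fallback logic can be exercised without a repository.

```groovy
// Runs a git command in the given directory and returns its trimmed output,
// or null if git exits with a non-zero status (e.g. symbolic-ref on a detached HEAD).
String runGit(File dir, List<String> cmd) {
    Process proc = new ProcessBuilder(['git'] + cmd)
        .directory(dir)
        .redirectErrorStream(true)
        .start()
    String out = proc.inputStream.text.trim()
    proc.waitFor() == 0 ? out : null
}

// Hypothetical: keep any values already present in env, otherwise derive them locally.
// GITHUB_WORKSPACE falls back to the project root, as suggested above.
Map<String, String> localDefaults(Map<String, String> env, File projectRoot, Closure<String> git) {
    [
        GITHUB_SHA      : env.GITHUB_SHA ?: git(['rev-parse', 'HEAD']),
        GITHUB_REF      : env.GITHUB_REF ?: git(['symbolic-ref', 'HEAD']),
        GITHUB_WORKSPACE: env.GITHUB_WORKSPACE ?: projectRoot.absolutePath
    ]
}
```

From a build script this might be called as localDefaults(System.getenv(), projectDir) { cmd -> runGit(projectDir, cmd) }. The job correlator is deliberately left out, since it is the value GitHub relies on to decide which snapshot replaces which.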
bigdaz commented

In any case I'd like to be able to use the plugin locally like

./gradlew -p plugins/package-managers/gradle/src/funTest/assets/projects/synthetic/gradle-library -I ~/Downloads/github-dependency-graph-gradle-plugin.init.gradle GitHubDependencyGraphPlugin_generateDependencyGraph

without the need to specify any GITHUB_ environment variables. Maybe these could simply be set to dummy default values if the GITHUB_ACTION environment variable is not set?

How would you use the generated snapshot, assuming that it contained placeholder values that are invalid for the Dependency Submission API?
Would it help if it was easier to set these values, via a plugin extension, system properties, etc?

bigdaz commented

I know this is designed for a GitHub Actions environment, but I'm curious if we can modify the configuration slightly for other environments.

Can you be more specific about your use case?

The goal is to ensure that the plugin generates a snapshot file that is valid for the Dependency Submission API.

Do you mostly want to be able to apply the plugin as a no-op when the required data isn't available, or do you have another use case in mind?

How would you use the generated snapshot, assuming that it contained placeholder values that are invalid for the Dependency Submission API?

Basically, I'm just interested in some JSON output that lists a Gradle project's dependencies (similar to what is being proposed here) for further processing in Open Source compliance tools (ORT in my case). As such, I can easily ignore the bits that are only relevant to the Dependency Submission API.

Would it help if it was easier to set these values, via a plugin extension, system properties, etc?

Not really; if I have to set them at all, I do not actually care whether that's by environment variables or other means.

bigdaz commented

@sschuberth I suppose a plugin that didn't have GitHub-specific inputs and generated output in some sort of SBOM format would be the most useful to you?

Probably yes @bigdaz, although I'm a bit skeptical about the

output in some sort of SBOM format

part. Most established SBOM formats like SPDX or CycloneDX are a bit cumbersome to handle if you'd like to record deeply technical metadata about dependencies, like the URL to their sources artifact and the location of the VCS repository (maybe even including the relevant sub-path within that repository).

So I'd also be fine with (or even prefer) a custom JSON / YAML based format that is fully under our control, so we can add as much detail as we'd like for third parties to consume / convert.

bigdaz commented

Most established SBOM formats like SPDX or CycloneDX are a bit cumbersome to handle if you'd like to record deeply technical metadata about dependencies, like the URL to their sources artifact and the location of the VCS repository (maybe even including the relevant sub-path within that repository).

So I'd also be fine with (or even prefer) a custom JSON / YAML based format that is fully under our control, so we can add as much detail as we'd like for third parties to consume / convert.

Thanks for the feedback.
First I want to nail down the GitHub support, but I've been thinking about a separate Dependency Graph plugin that uses the same infrastructure but produces a different report.
Can you clarify (maybe with an example) the kind of information you'd like to see included?
It's not clear to me exactly what you mean by "the URL to their sources artifact and the location of the VCS repository".

I've been thinking about a separate Dependency Graph plugin that uses the same infrastructure but produces a different report.

๐Ÿ‘ on that, as that could / should nicely replace gradle/gradle#21894, and maybe the https://docs.gradle.org/current/userguide/project_report_plugin.html in general, which despite that page saying

We plan to add much more to the existing reports and create additional ones in future releases of Gradle.

hasn't seen any updates in ages 😉

Can you clarify (maybe with an example) the kind of information you'd like to see included?

Sure thing! Here's an example output of what metadata ORT captures for the Apache Commons Lang Maven dependency:

- id: "Maven:org.apache.commons:commons-lang3:3.5"
  purl: "pkg:maven/org.apache.commons/commons-lang3@3.5"
  authors:
  - "Benedikt Ritter"
  - "Carman Consulting, Inc."
  - "CollabNet, Inc."
  - "Duncan Jones"
  - "Fredrik Westermarck"
  - "Gary D. Gregory"
  - "Henri Yandell"
  - "Joerg Schaible"
  - "Loic Guibert"
  - "Matt Benson"
  - "Niall Pemberton"
  - "Oliver Heger"
  - "Paul Benedict"
  - "Rob Tompkins"
  - "Robert Burrell Donkin"
  - "SITA ATS Ltd"
  - "Steven Caswell"
  - "The Apache Software Foundation"
  declared_licenses:
  - "Apache License, Version 2.0"
  declared_licenses_processed:
    spdx_expression: "Apache-2.0"
    mapped:
      Apache License, Version 2.0: "Apache-2.0"
  description: "Apache Commons Lang, a package of Java utility classes for the\n \
    \ classes that are in java.lang's hierarchy, or are considered to be so\n  standard\
    \ as to justify existence in java.lang."
  homepage_url: "http://commons.apache.org/proper/commons-lang/"
  binary_artifact:
    url: "https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5.jar"
    hash:
      value: "6c6c702c89bfff3cd9e80b04d668c5e190d588c6"
      algorithm: "SHA-1"
  source_artifact:
    url: "https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5-sources.jar"
    hash:
      value: "f7d878153e86a1cdddf6b37850e00a9f8bff726f"
      algorithm: "SHA-1"
  vcs:
    type: "Git"
    url: "http://git-wip-us.apache.org/repos/asf/commons-lang.git"
    revision: "LANG_3_5"
    path: ""
  vcs_processed:
    type: "Git"
    url: "http://git-wip-us.apache.org/repos/asf/commons-lang.git"
    revision: "LANG_3_5"
    path: ""
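The id and purl fields above encode the same Maven coordinates in two notations. A minimal Groovy sketch of the conversion, assuming the four-part ORT id layout shown and coordinates that contain no characters needing percent-encoding under the Package URL spec (toPurl is a hypothetical name, not an ORT function):

```groovy
// Hypothetical helper: "Maven:group:artifact:version" -> "pkg:maven/group/artifact@version".
// Assumes no component needs percent-encoding and there are no qualifiers or subpath.
String toPurl(String ortId) {
    List<String> parts = ortId.split(':').toList()
    if (parts.size() != 4 || parts[0] != 'Maven') {
        throw new IllegalArgumentException("Unsupported id: ${ortId}")
    }
    def (ignored, group, artifact, version) = parts
    "pkg:maven/${group}/${artifact}@${version}"
}
```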

It's not clear to me exactly what you mean by "the URL to their sources artifact and the location of the VCS repository".

As you can see in the above example, we know that the Maven source code artifact is at https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.5/commons-lang3-3.5-sources.jar, but we also know that the real origin of the source code (from where the source artifact was created) is at http://git-wip-us.apache.org/repos/asf/commons-lang.git, Git tag LANG_3_5.
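Both artifact URLs in the example follow the standard Maven repository layout, in which the sources JAR differs from the binary JAR only by its "sources" classifier. A sketch under those assumptions (artifactUrl is a hypothetical name, and .jar packaging is assumed); the VCS origin, by contrast, cannot be derived from the coordinates at all, which is why it is recorded separately.

```groovy
// Builds an artifact URL following the standard Maven repository layout:
// <repo>/<group as path>/<artifact>/<version>/<artifact>-<version>[-<classifier>].jar
// Assumes ".jar" packaging; other packagings use a different extension.
String artifactUrl(String repo, String group, String artifact, String version, String classifier = null) {
    String suffix = classifier ? "-${classifier}" : ''
    "${repo}/${group.replace('.', '/')}/${artifact}/${version}/${artifact}-${version}${suffix}.jar"
}
```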

If you're interested, the whole data model for a dependency / package in ORT is here.

bigdaz commented

Wow, that's a lot of data about the dependency artifact!

  • Do you currently extract this information while running the build, or is this done post-hoc?
  • If you're doing it within the build invocation, where do you get the information?
  • Why do you inline so much static information about the external project, rather than simply referencing the POM file?

Do you currently extract this information while running the build, or is this done post-hoc?

Our general philosophy in ORT is not to hook into a build that is being run anyway, like on CI, but to be able to inspect any project at any time from the outside. As part of that, we actually aim to avoid a (full) build where possible, and to get all required metadata as cheaply as possible, i.e. without triggering the download of binary artifacts or actually building any artifacts.

Needless to say, that's not always possible (depending on the build system; we support way more than Gradle alone). But in the case of Gradle, our "legacy" implementation is a monolithic init.gradle that fills our data model using a ToolingModelBuilder. A newer alternate implementation uses another init.gradle just to apply a plugin that we're extracting locally at runtime, and then also uses the OrtModelBuilder to fill our data model, but this time using Kotlin instead of Groovy code.

If you're doing it within the build invocation, where do you get the information?

Good question. Some of the information is a little bit hard to get, as it might be inherited from Maven parent POMs. Basically, we're explicitly resolving all POMs, and then using Maven to resolve the POM to its effective model.
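To illustrate why a single POM may not be enough, here is a deliberately simplified Groovy model of inheritance in which a child POM falls back to the nearest ancestor for any element it does not declare. Pom and effectiveField are hypothetical; real Maven effective-model building also handles interpolation, list merging and elements that are not inherited, which is why delegating to Maven, as described above, is the safer route.

```groovy
// Hypothetical, heavily simplified POM: a flat map of elements plus an optional parent.
class Pom {
    String id
    Map<String, Object> fields = [:]
    Pom parent
}

// Walks from the child up through its parents and returns the first declared value,
// or null if no POM in the chain declares the element.
Object effectiveField(Pom pom, String name) {
    for (Pom p = pom; p != null; p = p.parent) {
        if (p.fields.containsKey(name)) return p.fields[name]
    }
    null
}
```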

Why do you inline so much static information about the external project, rather than simply referencing the POM file?

One reason is that looking at a single POM file might not be enough due to inheritance from the parent POM(s) (see above). Another reason is reproducibility: We want to document how things were at a specific time, as we've seen Maven artifacts being republished under the same name and version, but with changed dependencies. Finally, our rules engine / evaluator is then able to operate on the metadata completely offline.

But you're asking a valid question. Another design choice could have been to just capture as much information as needed to uniquely identify a specific dependency, and then get all (hopefully static) metadata from services like https://deps.dev/ or https://ecosyste.ms/.
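The hash entries recorded for binary_artifact and source_artifact in the example are what make the republishing concern detectable later: if an artifact is fetched again and its digest no longer matches the recorded value, it has changed. A minimal sketch using the JDK's MessageDigest (sha1Hex and matchesRecordedHash are hypothetical names). SHA-1 is used only because it is the algorithm in the example; it is adequate for spotting accidental changes but is no longer considered collision-resistant.

```groovy
import java.security.MessageDigest

// Lower-case hex SHA-1 digest of the given bytes, e.g. the contents of a downloaded JAR.
String sha1Hex(byte[] data) {
    MessageDigest.getInstance('SHA-1').digest(data).encodeHex().toString()
}

// True if the bytes still match the hash recorded at analysis time (case-insensitive).
boolean matchesRecordedHash(byte[] data, String recorded) {
    sha1Hex(data).equalsIgnoreCase(recorded)
}
```

For instance, matchesRecordedHash(new File('commons-lang3-3.5.jar').bytes, '6c6c702c89bfff3cd9e80b04d668c5e190d588c6') would check a local copy of the binary JAR against the value recorded above.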

bigdaz commented

Folks, I've added some rudimentary support for alternative output formats in #64.
This introduces a SimpleDependencyGraphPlugin which doesn't require any environment variables and will output the graph in 2 ways:

  • A raw JSON rendering of all dependency configurations resolved
  • A simple text list of all dependencies resolved
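As a rough illustration of those two renderings, the sketch below turns a map of resolved configurations into pretty-printed JSON and into a de-duplicated, sorted text list. The input shape and both output formats are assumptions for illustration only and are not the actual output of SimpleDependencyGraphPlugin; renderJson and renderTextList are hypothetical names.

```groovy
import groovy.json.JsonOutput

// Hypothetical input shape: configuration name -> resolved "group:name:version" coordinates.
// Raw JSON rendering of every configuration, pretty-printed for readability.
String renderJson(Map<String, List<String>> resolved) {
    JsonOutput.prettyPrint(JsonOutput.toJson(resolved))
}

// Simple text list: every coordinate once, sorted, one per line.
String renderTextList(Map<String, List<String>> resolved) {
    resolved.values().flatten().unique().sort().join('\n')
}
```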

This could easily be evolved into something more sophisticated, but it's not entirely clear what's required. In the end, we should add more robust support for flexible output formats, and then you'll be able to implement exactly what you want.

I'm going to close this issue, since the infrastructure is now there to support this use case. Please try the SimpleDependencyGraphPlugin and see if it suits your needs.

This could easily be evolved into something more sophisticated, but it's not entirely clear what's required.

Thanks for this @bigdaz! I'm planning to take a look at it and eventually enrich the output with more metadata required by OSS compliance / security checks.