bazelbuild/bazel

maven_jar doesn't download sources when hitting the repository cache

Closed this issue · 9 comments

drigz commented

Description of the problem:

If I use maven_jar with sha1 specified and --experimental_repository_cache in my .bazelrc, then it does not download sources if it hits the cache. If I do not specify sha1 or disable or clear the repo cache, then the sources .jar get downloaded.

Reproduction:

Put the following in WORKSPACE:

maven_jar(
    name = "com_google_guava_guava",
    artifact = "com.google.guava:guava:18.0",
    sha1 = "cce0823396aa693798f8882e64213b1772032b09",
)

Then run:

> echo "build --experimental_repository_cache $PWD/cache" > .bazelrc
> bazel build @com_google_guava_guava//...
> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel  guava-18.0.jar  guava-18.0-sources.jar

The sources have been downloaded as expected. To trigger a download that hits the cache, run:

> bazel clean --expunge
> bazel build @com_google_guava_guava//...
> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel  guava-18.0.jar

The sources have not been downloaded.

If you comment out sha1 and rebuild, then you will see:

> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel  guava-18.0.jar  guava-18.0-sources.jar

If you add back sha1 and rebuild, the sources are gone again. If you then run:

bazel clean --expunge
bazel --bazelrc=/dev/null build @com_google_guava_guava//...

or

bazel clean --expunge
rm -r cache
bazel build @com_google_guava_guava//...

Then the sources are downloaded.

What operating system are you running Bazel on?

Debian Buster.

What's the output of bazel info release?

release 0.11.1

Have you found anything relevant by searching the web?

I found an email thread from 2015, which seems to predate any SHA-check for maven_jar. I also found 7a7c41d where the ability to download the source jar was added.

This seems to be a known issue and was already commented here. And it looks like the implementation of #308 was not completed yet.

I believe, it is not sufficient to extend the backend implementation (MavenDownloader.java) to download the source JAR, without actually extending the interface of maven_jar in the first place. Before 7a7c41d only one single JAR was downloaded, with the SHA1 specified in the maven_jar interface. After 7a7c41d, two JARs are downloaded, but where was the source SHA1 specified? It's still missing and sounds like a design bug to me.

Looking at the cache level, after invoking bazel build @com_google_guava_guava//... the source JAR isn't there either:

  $ ls $(bazel info output_base)/external/com_google_guava_guava/jar
  BUILD.bazel  guava-18.0.jar  guava-18.0-sources.jar

  $ sha1sum $(bazel info output_base)/external/com_google_guava_guava/jar/guava-18.0.jar
  cce0823396aa693798f8882e64213b1772032b09 ...

  # check the cache content after the download -> only one single file there
  $ ls -all cache/content_addressable/sha1/
  total 12
  drwxr-xr-x 3 davido users 4096 Mar 11 22:37 .
  drwxr-xr-x 3 davido users 4096 Mar 11 22:37 ..
  drwxr-xr-x 2 davido users 4096 Mar 11 22:37 cce0823396aa693798f8882e64213b1772032b09

  # What file? -> it's guava jar artifact
  $ diff $(bazel info output_base)/external/com_google_guava_guava/jar/guava-18.0.jar cache/content_addressable/sha1/cce0823396aa693798f8882e64213b1772032b09/file
  <no diff reported>

I think, what should happen is: maven_jar interface must be extended to accept src_sha1 in addition to sha1. Not only this would allow to check, that the right source JAR was downloaded, it would also allow to correctly handle the repository cache use case and store/retrieve the source JAR under/from the content accessible storage directory.

@drigz If you are looking for a workaround for now, consider to use Gerrit Code Review's own maven_jar implementation, that is correctly handling this use case: downloading jar and source jar with aggressive caching across different projects/clones/branches (the cache is enabled per default and cannot be deactivated, every downloaded artifact is cached locally, independently from Bazel). We have also externalized this rule in bazlets repository to manage external dependencies for over 100 of gerrit plugins (example of one plugin).

To get it right, the above maven_jar with source SHA1, in gerrit, would be:

 maven_jar(
    name = "javax_validation",
    artifact = "javax.validation:validation-api:1.0.0.GA",
    sha1 = "b6bd7f9d78f6fdaa3c37dae18a4bd298915f328e",
    src_sha1 = "7a561191db2203550fbfa40d534d4997624cd369",
)

after the build command, both JARs are downloaded (only once and never again on that machine, unless the cache is wiped out) and located in the cache, obviously:

  $ ls ~/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA*
/home/davido/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA.jar-b6bd7f9d78f6fdaa3c37dae18a4bd298915f328e
/home/davido/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA-src.jar-7a561191db2203550fbfa40d534d4997624cd369
hsyed commented

I'm seeing this or a similar problem in 0.12 -- it indirectly triggers a bug in the kotlin rules and then indirectly in the intellij plugin.

Short of it is that in 0.12 *-sources.jar are no longer included in the filegroup targets generated by the maven_jar repository rule and the kotlin rules depend on this behaviour.

The bug fix in the kotlin rules would be to be to deal with missing *-sources.jar from the filegroups and to setup producers so that the intellij aspect processing can still do a to_proto.

hsyed commented

@davido any comments on the behaviour of maven_jar in 0.12.0 ? how do I get it to include the sources. the change I made to the kotlin rules also looks for source jars via the java provider -- they are not available in there either.

@hsyed Why are you asking me? As pointed out in my previous comment, Gerrit Code Review is using our own Skylark based implementation of maven_jar, that is using curl based python script as backend.

It should be very stable, as it was ported from our previously used Buck based tool chain, that we were using since 5 years. Of course, everything is cached including source artifacts.

hsyed commented

@davido sorry assumed you were the expert on the issue.

@hsyed No reason to say sorry. I don't use native maven_jar rule, but there is a trivial fix for this problem. Here we go: [1]. Excerpt from the commit message:

Test Plan:
    
    WORKSPACE file:
    
      maven_jar(
        name = "com_google_guava_guava",
        artifact = "com.google.guava:guava:18.0",
        sha1 = "cce0823396aa693798f8882e64213b1772032b09",
        sha1_src = "ad97fe8faaf01a3d3faacecd58e8fa6e78a973ca",
      )
    
    BUILD file is empty.
    
    1. bazel build --repository_cache ./cache @com_google_guava_guava//...
    2. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
    BUILD.bazel  guava-18.0.jar  guava-18.0-sources.jar
    3. ls -1 ./cache/content_addressable/sha1/
    ad97fe8faaf01a3d3faacecd58e8fa6e78a973ca
    cce0823396aa693798f8882e64213b1772032b09
    4. bazel clean --expunge
    5. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
    No such file or directory
    6. Disconnect network device
    7. bazel build --repository_cache ./cache @com_google_guava_guava//...
    8. bazel build --repository_cache ./cache @com_google_guava_guava//...
    9. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
    BUILD.bazel  guava-18.0.jar  guava-18.0-sources.jar

[1] https://bazel-review.googlesource.com/c/bazel/+/53950

hsyed commented

Could have a simple maven_jar implementation that doesn't use the repository cache for source jars. The change of behaviour has really affected us. The kotlin rules are no longer compatible with 0.11. We have a lot of external deps too many to maintain by hand or use java_import_external. Our repo uses bazel-deps which is gradually getting support for source jars but updating to it is a significant amount of work (for us).

Couldn't we have an implicit source jar mode that just tried to download the sources -- at least just till we have stable tooling for managing deps. I think It is acceptable for a clean --expunge to be needed for the re-download of sources or have a temporary flag bazel clean --implicit-sources.

I suspect this problem doesn't affect most. Internal you guys probably build most things from source so it's not a big deal to list out or use a tool for a ~100 external jars. We have ~400 deps and our repo uses the bazel-deps layout. We have had the rug pulled under our feet please prioritise an interim solution that doesn't require a wholesale refactor/fix to our dep management.