maven_jar doesn't download sources when hitting the repository cache
Closed this issue · 9 comments
Description of the problem:
If I use maven_jar
with sha1
specified and --experimental_repository_cache
in my .bazelrc, then it does not download sources if it hits the cache. If I do not specify sha1
or disable or clear the repo cache, then the sources .jar get downloaded.
Reproduction:
Put the following in WORKSPACE:
maven_jar(
name = "com_google_guava_guava",
artifact = "com.google.guava:guava:18.0",
sha1 = "cce0823396aa693798f8882e64213b1772032b09",
)
Then run:
> echo "build --experimental_repository_cache $PWD/cache" > .bazelrc
> bazel build @com_google_guava_guava//...
> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel guava-18.0.jar guava-18.0-sources.jar
The sources have been downloaded as expected. To trigger a download that hits the cache, run:
> bazel clean --expunge
> bazel build @com_google_guava_guava//...
> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel guava-18.0.jar
The sources have not been downloaded.
If you comment out sha1
and rebuild, then you will see:
> ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel guava-18.0.jar guava-18.0-sources.jar
If you add back sha1
and rebuild, the sources are gone again. If you then run:
bazel clean --expunge
bazel --bazelrc=/dev/null build @com_google_guava_guava//...
or
bazel clean --expunge
rm -r cache
bazel build @com_google_guava_guava//...
Then the sources are downloaded.
What operating system are you running Bazel on?
Debian Buster.
What's the output of bazel info release
?
release 0.11.1
Have you found anything relevant by searching the web?
I found an email thread from 2015, which seems to predate any SHA-check for maven_jar. I also found 7a7c41d where the ability to download the source jar was added.
/cc @aehlig
This seems to be a known issue and was already commented here. And it looks like the implementation of #308 was not completed yet.
I believe, it is not sufficient to extend the backend implementation (MavenDownloader.java
) to download the source JAR, without actually extending the interface of maven_jar
in the first place. Before 7a7c41d only one single JAR was downloaded, with the SHA1 specified in the maven_jar
interface. After 7a7c41d, two JARs are downloaded, but where was the source SHA1 specified? It's still missing and sounds like a design bug to me.
Looking at the cache level, after invoking bazel build @com_google_guava_guava//...
the source JAR isn't there either:
$ ls $(bazel info output_base)/external/com_google_guava_guava/jar
BUILD.bazel guava-18.0.jar guava-18.0-sources.jar
$ sha1sum $(bazel info output_base)/external/com_google_guava_guava/jar/guava-18.0.jar
cce0823396aa693798f8882e64213b1772032b09 ...
# check the cache content after the download -> only one single file there
$ ls -all cache/content_addressable/sha1/
total 12
drwxr-xr-x 3 davido users 4096 Mar 11 22:37 .
drwxr-xr-x 3 davido users 4096 Mar 11 22:37 ..
drwxr-xr-x 2 davido users 4096 Mar 11 22:37 cce0823396aa693798f8882e64213b1772032b09
# What file? -> it's guava jar artifact
$ diff $(bazel info output_base)/external/com_google_guava_guava/jar/guava-18.0.jar cache/content_addressable/sha1/cce0823396aa693798f8882e64213b1772032b09/file
<no diff reported>
I think, what should happen is: maven_jar
interface must be extended to accept src_sha1
in addition to sha1
. Not only this would allow to check, that the right source JAR was downloaded, it would also allow to correctly handle the repository cache use case and store/retrieve the source JAR under/from the content accessible storage directory.
@drigz If you are looking for a workaround for now, consider to use Gerrit Code Review's own maven_jar
implementation, that is correctly handling this use case: downloading jar and source jar with aggressive caching across different projects/clones/branches (the cache is enabled per default and cannot be deactivated, every downloaded artifact is cached locally, independently from Bazel). We have also externalized this rule in bazlets repository to manage external dependencies for over 100 of gerrit plugins (example of one plugin).
To get it right, the above maven_jar
with source SHA1, in gerrit, would be:
maven_jar(
name = "javax_validation",
artifact = "javax.validation:validation-api:1.0.0.GA",
sha1 = "b6bd7f9d78f6fdaa3c37dae18a4bd298915f328e",
src_sha1 = "7a561191db2203550fbfa40d534d4997624cd369",
)
after the build command, both JARs are downloaded (only once and never again on that machine, unless the cache is wiped out) and located in the cache, obviously:
$ ls ~/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA*
/home/davido/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA.jar-b6bd7f9d78f6fdaa3c37dae18a4bd298915f328e
/home/davido/.gerritcodereview/bazel-cache/downloaded-artifacts/validation-api-1.0.0.GA-src.jar-7a561191db2203550fbfa40d534d4997624cd369
I'm seeing this or a similar problem in 0.12 -- it indirectly triggers a bug in the kotlin rules and then indirectly in the intellij plugin.
Short of it is that in 0.12 *-sources.jar
are no longer included in the filegroup targets generated by the maven_jar repository rule and the kotlin rules depend on this behaviour.
The bug fix in the kotlin rules would be to be to deal with missing *-sources.jar
from the filegroups and to setup producers so that the intellij aspect processing can still do a to_proto
.
@davido any comments on the behaviour of maven_jar
in 0.12.0
? how do I get it to include the sources. the change I made to the kotlin rules also looks for source jars via the java provider -- they are not available in there either.
@hsyed Why are you asking me? As pointed out in my previous comment, Gerrit Code Review is using our own Skylark based implementation of maven_jar, that is using curl based python script as backend.
It should be very stable, as it was ported from our previously used Buck based tool chain, that we were using since 5 years. Of course, everything is cached including source artifacts.
@hsyed No reason to say sorry. I don't use native maven_jar
rule, but there is a trivial fix for this problem. Here we go: [1]. Excerpt from the commit message:
Test Plan:
WORKSPACE file:
maven_jar(
name = "com_google_guava_guava",
artifact = "com.google.guava:guava:18.0",
sha1 = "cce0823396aa693798f8882e64213b1772032b09",
sha1_src = "ad97fe8faaf01a3d3faacecd58e8fa6e78a973ca",
)
BUILD file is empty.
1. bazel build --repository_cache ./cache @com_google_guava_guava//...
2. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
BUILD.bazel guava-18.0.jar guava-18.0-sources.jar
3. ls -1 ./cache/content_addressable/sha1/
ad97fe8faaf01a3d3faacecd58e8fa6e78a973ca
cce0823396aa693798f8882e64213b1772032b09
4. bazel clean --expunge
5. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
No such file or directory
6. Disconnect network device
7. bazel build --repository_cache ./cache @com_google_guava_guava//...
8. bazel build --repository_cache ./cache @com_google_guava_guava//...
9. ls $(bazel info output_base)/external/com_google_guava_guava/jar/
BUILD.bazel guava-18.0.jar guava-18.0-sources.jar
Could have a simple maven_jar
implementation that doesn't use the repository cache for source jars. The change of behaviour has really affected us. The kotlin rules are no longer compatible with 0.11. We have a lot of external deps too many to maintain by hand or use java_import_external
. Our repo uses bazel-deps
which is gradually getting support for source jars but updating to it is a significant amount of work (for us).
Couldn't we have an implicit source jar mode that just tried to download the sources -- at least just till we have stable tooling for managing deps. I think It is acceptable for a clean --expunge
to be needed for the re-download of sources or have a temporary flag bazel clean --implicit-sources
.
I suspect this problem doesn't affect most. Internal you guys probably build most things from source so it's not a big deal to list out or use a tool for a ~100 external jars. We have ~400 deps and our repo uses the bazel-deps
layout. We have had the rug pulled under our feet please prioritise an interim solution that doesn't require a wholesale refactor/fix to our dep management.
Closing in favor of https://github.com/bazelbuild/rules_jvm_external