codehaus-plexus/plexus-archiver

Reproducible Builds not working when using modular jar

jorsol opened this issue ยท 18 comments

I have a project that compiles a module-info.java using the <multiReleaseOutput> approach of maven-compiler-plugin as show here:

        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.1</version>
          <executions>
            <execution>
              <id>compile-java9</id>
              <phase>compile</phase>
              <goals>
                <goal>compile</goal>
              </goals>
              <configuration>
                <fork>true</fork>
                <release>9</release>
                <compileSourceRoots>
                  <compileSourceRoot>${project.basedir}/src/main/java9</compileSourceRoot>
                </compileSourceRoots>
                <multiReleaseOutput>true</multiReleaseOutput>
              </configuration>
            </execution>
          </executions>
        </plugin>

The <project.build.outputTimestamp> property is set on the project and the jar is created correctly, all the files set the timestamp except for the /META-INF/versions/9/module-info.class file which causes that the build is no longer reproducible.

I have found where the issue lies, the JarToolModularJarArchiver uses the jar command as a postCreateArchive() step, which updates the module-info.class file changing the last modification date of that file.

I have tested to add the following to the last part of the postCreateArchive() method, where the moduleDescriptorsPath is a List with all the module descriptors found.

  // Fix Last Modified Date of the module descriptor.
  if (getLastModifiedDate() != null) {
    URI uri = URI.create("jar:" + getDestFile().toURI().toString());
    try (FileSystem zipfs = FileSystems.newFileSystem(uri,
        new java.util.HashMap<String, Object>()) ) {
      for (String path : moduleDescriptorsPath) {
        Path module = zipfs.getPath(path);
        long lastModDate = getLastModifiedDate().getTime();
        Files.setLastModifiedTime(module, FileTime.fromMillis(lastModDate));
      }
    }
  }

This more or less works but I think the precision of the last modified time set is not the same (for some reason it's not using UTC).

@hboutemy sorry if I'm pinging on you, but since the reproducible-builds is your area, without this, all the modular jars won't be reproducible never.

There is an open issue about the jar tool breeaking this. Search JBS.

Ok, it might be a bug on the jar tool, but probably a workaround could be implemented here, or should we wait until a new release of Java fix this?

See my analysis here: https://issues.apache.org/jira/browse/MJAR-275
Worst bullshit ever by Oracle.

The root cause has been fixed in Java 18. Well, we now need a backport...

As this doesn't seem to be backported, I could provide a workaround for this issue that should work, WDYT?

IMHO depending on a specific Java release is less than ideal, not many users will use Java 18 just to compile (even if they can event target to Java 7) just to get reproducible builds.

@jorsol You can write to the appropriate mailing list for a backport request.

or add a jar rewriting process after the jar tool execution:
https://github.com/codehaus-plexus/plexus-archiver/blob/master/src/main/java/org/codehaus/plexus/archiver/jar/JarToolModularJarArchiver.java#L99

notice that I did not test the update of Java 18, I don't get how the jar tool even in Java 18 will get a reproducible timestamp of the .class: IIUC, it will use the timestamp from the file system, but this timestamp is not reproducible

or add a jar rewriting process after the jar tool execution

I already have a working fix after the jar tool execution, that's why I'm asking if it's of interest.

I don't get how the jar tool even in Java 18 will get a reproducible timestamp of the .class

In previous versions of Java, the jar tool basically rewrites the module-info.class and in the rewrite process the last modified time of the jar file was lost, in Java 18 this process fetches the last modified time of the zip entry and then use that to set the last modified time of the zip entry that is overwritten.

I think a temporary fix will be useful: if you can share yours, that would be awesome
and I suppose that we can check that if Java 18 is currently used, no fix is necessary

on the change, ok, if the Java 18 logic is "keep the existing module-info.class timestamp", we'll get reproducible result: did you check that we get expected result with JDK 18? Notice: now I remember that last Logback was done with JDK 18 https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/wip/logback-1.3.0-alpha13.buildspec , which was a pain for me because there is no Maven Dockerhub image with JDK 18 yet, but I imagine that Ceki wanted to get that reproducible output by using this EA JDK 18. I suppose I'll test with that release :)

thanks for the feedback @jorsol

Thanks, @hboutemy, I could provide my fix but it actually depends on this PR #199, so if you or any of the maintainers don't mind reviewing and merging it would be nice.

I just fall into a rabbit hole trying to provide a fix, after many tests and research I found other issues:

The bad:

  • From my findings, JDK 18 does fix the module-info.class timestamp, BUT if the Main-Class attribute is set, then the MANIFEST.MF is updated without preserving the timestamp, so essentially the fix applied in the JDK 18 only partially fix the problem.
  • My original idea was to use the Zip File System Provider to update only the specific entries, BUT this uses Files.setLastModifiedTime and this sets the extended timestamp attributes, which is NOT great for reproducible builds.
  • In the future, on Java 8 (once this project uses #200) the ZipEntry could use setLastModifiedTime(FileTime), but again, it has the same problem as it sets the extended timestamp attributes (see the previous item). This can be ok for normal Zip files but for jars, not so much.

The good:

  • There are some efforts in the OpenJDK project to allow reproducible builds (or at least set the timestamp), there is a new option in the JAR tool --date (bad name but whatever), this sets the timestamps in every entry of the zip file, that option is already integrated into Java 19-ea, and will be backported to Java 18, but the best part is that it's also being backported into the next Java 17.0.3 ๐ŸŽ‰
  • My (new) fix includes a "detection" of the --date option and uses that if available.
  • For every version that doesn't have the --date option, there is a fallback mechanism to fix the time so we can have reproducible builds way down to Java 9 (Java 8 and 7 should not be affected by modular jars).

The ugly:

  • The fallback mechanism will post-process the time in the zip entries, which essentially means that it needs to create a new copy of the jar file, this adds a few seconds to the jar generation depending on the size, BUT the good part is that the output will be practically the same as if the --date option was available, meaning that a jar created with Java 17.0.3 should be identical to a jar created with Java 17.0.2.

I think the post-processing part is not a bad trade-off to allow reproducible builds.

What a PITA instead of Oracle monkeys fixing this the right way and back porting down to 8. Joke.

I will open the PR as soon as the FileTime API is merged.

The PR #205 fixes this issue if someone want to take a look ๐Ÿ‘€ .

I just merged #199 : this is the update that I was not confident on, but everybody looks ok
I'll have a look back at this PR later this week

thanks for you patience and efforts

I just merged #199 : this is the update that I was not confident on, but everybody looks ok

Thanks @hboutemy, that update was only changing the API from the old Date to FileTime, it looks large but all the functionality should be preserved, is backward compatible, and it can be used as-is with the maven-jar-plugin without changes (tested by using the local SNAPSHOT dependency of plexus-archiver in the jar-plugin).

But please, if you found something that is broken or something that needs clarification ping me to check it.

Now, back to the the real fix of the reproducible build with modular jars, the PR #205 is smaller in files touched, but is actually more invasive than the previous one, this is the one that probably you should look out more closely (not saying that is should broke anything).

This fix has many hours of research and it will use the new --date jar option if available, I'm confident that it should be ready, I think I added enough comments to explain it, but if something is not clear, please let me know.