internetarchive/heritrix3

Maven build fails due to HTTP only upstream servers

Jauchi opened this issue · 5 comments

Hello!
I am currently trying to build Heritrix, but Maven won't build the project, since the upstream repo is configured with an HTTP URL and that gets blocked in newer Maven versions (3.8.1 and later block HTTP repositories by default).
Additionally, I tried replacing the URLs with HTTPS, but your Maven repo is only reachable through HTTP.

Could you enable HTTPS on your Maven repo?

As a workaround, it's possible to tell Maven to use the HTTP repo. For GitHub Actions, we use this:

https://github.com/internetarchive/heritrix3/blob/04f958e987e6c8a3849740cf5ee69fce0a6d1896/.github/workflows/m2-settings.xml
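For anyone who wants to apply the same workaround locally, here is a minimal sketch of a settings file that disables Maven's built-in HTTP blocker. It is a commonly used approach and is not necessarily identical to the linked file. It assumes Maven 3.8.1 or later, where the default blocking mirror has the id maven-default-http-blocker; the mirrorOf value and the URL are placeholders that match no real repository.

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <!-- Re-declare the default blocker under the same id so this entry
         replaces it. "dummy" matches no repository id, so nothing is
         actually mirrored and HTTP repositories are no longer blocked. -->
    <mirror>
      <id>maven-default-http-blocker</id>
      <mirrorOf>dummy</mirrorOf>
      <name>Placeholder mirror overriding the default HTTP blocker</name>
      <url>http://0.0.0.0/</url>
    </mirror>
  </mirrors>
</settings>
```

Saved under a path of your choosing, it can be passed to Maven with the -s (--settings) option, for example mvn -s m2-settings.xml install. Note that this re-enables unencrypted downloads, so it is best treated as a stopgap.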

We've been talking to IA about updating the Maven endpoint, but there's no update on this yet.

To remove the dependency on the IA build server, we need at least:

com.anotherbigidea:javaswf:jar:CVS-SNAPSHOT-1
com.esotericsoftware:kryo:jar:1.01
com.esotericsoftware:reflectasm:jar:0.8
com.esotericsoftware:minlog:jar:1.2

(this list comes from taking the repo out and building - it failed on heritrix-commons so there may be other gaps in other modules)

EDIT: it is possible to upload old artifacts to Maven Central, but we should consider whether we want to depend on such old code. For JavaSWF, that's a long dead project and an extractor that never worked very well AFAIK. For Kryo, that is a more disruptive change, because it's buried so deep in Heritrix3 and will invalidate all the existing state systems. But perhaps that's all the more reason to upgrade, given it's critical, yet so ancient and unsupported.

I think those dependencies must now be in Maven Central, as the build seems to succeed without the archive.org Maven repo. Added a branch to check the build still works...

Ah, no. The local caching must have been more aggressive than I realised. The PR will fail until the aforementioned dependencies are either removed from the build or added to Maven Central.

ato commented

builds.archive.org is now accessed using HTTPS (#310)