smithy-lang/smithy

`createSmithyJarManifestUrl`: incorrect handling of complex paths

kubukoz opened this issue · 3 comments

Hi! I first bumped into this over a year ago, but so far had a workaround that seemed decent, and the usecase seemed pretty niche. Now me and my team maintain about 4 copies of the same workaround, so it's reaching a point where it'd be best to resolve upstream - right here.

What I'm trying to do

  • grab a bunch of jars, find Smithy model files in them, and add them to a model assembler
  • ...allowing some of these jars to be missing a manifest altogether (simply ignoring them).

Originally, here's how we did it in smithy4s:

val modelsInJars = deps.flatMap { file => // java.io.File
   val manifestUrl = 
     ModelDiscovery.createSmithyJarManifestUrl(file.getAbsolutePath()) 
   try { ModelDiscovery.findModels(manifestUrl).asScala } 
   catch { 
     case _: ModelManifestException => Seq.empty //fallback to nothing
   } 
 }

So far so good.

The problem

Trouble begins when certain artifacts are involved, namely:

  • artifacts fetched with Coursier
  • ...which have urlencoded characters in their path. Which is typical for snapshots named after their corresponding commit and .

For example, org.polyvariant:test-library-core_2.13:0.0.1+123-SNAPSHOT from Sonatype Snapshots would be placed in:

$COURSIER_CACHE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar

Here's what happens when you use that file with ModelDiscovery:

  • createSmithyJarManifestUrl yields this URL: jar:file:$COURSIER_CACHE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%2B123-SNAPSHOT/test-library-core_2.13-0.0.1%2B123-SNAPSHOT.jar!/META-INF/smithy/manifest - so far so good, it seems
  • findModels fails with ModelManifestException caused by FileNotFoundException. the path that cause reports is $COURSIER_CACHE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1+123-SNAPSHOT/test-library-core_2.13-0.0.1+123-SNAPSHOT.jar - the urlescaping is gone now.

I think this is an issue with ModelDiscovery.findModels.

The expected behavior can be achieved with the following workaround: treat the jar as a FileSystem, and use the Path API to get an URL:

Using.resource(
  FileSystems.newFileSystem(file.toPath())
) { fs =>
  val path = fs.getPath("META-INF", "smithy", "manifest")
  ModelDiscovery.findModels(path.toUri().toURL())
}

The path.toUri() part stringifies as: jar:file://$COURSIER_FILE_LOCATION/v1/https/s01.oss.sonatype.org/content/repositories/snapshots/org/polyvariant/test-library-core_2.13/0.0.1%252B123-SNAPSHOT/test-library-core_2.13-0.0.1%252B123-SNAPSHOT.jar!/META-INF/smithy/manifest - which, to me, sounds correct.

Request

I think there's more than one issue here:

  1. createSmithyJarManifestUrl going from URL (a relatively structured type) to a String (no structure at all), offering little to no type safety when it's used down the line
  2. findModels doing something strange with its parameter, discarding the urlencoding of the given path.

so the ideal solution would also be twofold:

  • deprecate the Stringly-typed methods and add some safer ones, operating on URL or URI
  • fix the handling of urlencoded paths in findModels

Context, if you want to learn more about the issue we had and the workaround: disneystreaming/smithy4s#850

Notes / questions

  • Are there better ways to load a bunch of jars, not all of which necessarily contain Smithy manifests? If not, could one be baked into ModelAssembler?

@mtdowling QQ, what kind of reproduction would work best for you? This scala-cli snippet showcases the problem (just have to switch the withSpecialCharacters flag on and off to see the difference between normal and "strange" version patterns)

The best possible repro would be a unit test showing the broken current behavior, but I know that’s challenging here. Or maybe a known dependency on maven or something we can test?

In theory this would suffice:

{
  "version": "1.0",
  "maven": {
    "repositories": [
      {
        "url": "https://s01.oss.sonatype.org/content/repositories/snapshots"
      }
    ],
    "dependencies": [
      "org.polyvariant:test-library-core_2.13:0.0.1+123-SNAPSHOT"
    ]
  }
}

but both Smithy CLI and the LSP claim they don't allow + in versions.