Default value of `smithy4sOutputDir` can break codegen cache
majk-p opened this issue · 5 comments
TL;DR
Can we please change:
config / smithy4sOutputDir := (config / sourceManaged).value / "scala"
to something like:
config / smithy4sOutputDir := (config / sourceManaged).value / "scala" / "smithy4s"
Or exclude codegenArgs.output
from cache altogether?
Problem
I'm working on an sbt plugin that uses smithy4s, but also generates some of it's own managed sources. Smithy4s produces files to smithy4sOutputDir which by default is defined as smithy4sOutputDir := (config / sourceManaged).value / "scala"
. This value is then later used by Smithy4sCodegenPlugin as a part of codegenArgs
used to produce files. The entire CodegenArgs
object is used as a cache key and thus the output
counts in. That wouldn't be a problem because the path would just remain cached as a String
, but smithy4s also overrides the json format for os.Path in a way that calculates the hash of the entire directory content. It isn't a problem as long as smithy4s is the only source generator. When another plugin adds managed sources the hashes change and we stop hitting cache. In some cases this might lead to repeated very long compilation times.
Perhaps it's worth reconsidering if smithy4s codegen cache should rely on the contents of source_managed
and resource_managed
at all, since it seems easy to invalidate the cache.
It's also not obvious to the end user why the cache got invalidated. In my debugging case I would be very happy if the plugin told me why it decided to rerun codegen even though smithy sources remained unchanged.
Solution
I think adding another segment in the path should solve this and save fellow sbt plugin authors some headache. This way other plugins can either use the root sourceManaged / scala
directory or anything different than sourceManaged / scala / smithy4s
.
The other option is to modify JsonConverters.codegenArgsIso
in a way that it doesn't serialize output
.
I'm thinking to do both:
- nest the generated sources under an extra directory so that it's clearly separated from other codegens
- remove the output's contents from the input hash, that feels like something we got "for free" unintentionally
I agree with Jakub.
Sounds good, I can raise a PR to address this
Working on my initial problem I've noticed an additional issue around pathFormat
implicit val pathFormat: JsonFormat[os.Path] =
BasicJsonProtocol.projectFormat[os.Path, HashFileInfo](
p => {
if (os.isFile(p)) FileInfo.hash(p.toIO)
else
// If the path is a directory, we get the hashes of all files
// then hash the concatenation of the hash's bytes.
FileInfo.hash(
p.toIO,
Hash(
os.walk(p)
.map(_.toIO)
.map(Hash(_))
.foldLeft(Array.emptyByteArray)(_ ++ _)
)
)
},
hash => os.Path(hash.file)
)
It assumes all paths will exist, which is not always the case. In one of my sub-modules I noticed the resource_managed
was not present, which silently causes the cache miss and code regeneration. Since we're about to exclude hashes of the outputs (both sources and resources) the issue should disappear for now, but this may require some more thought in future.
The paths are checked for existence first:
val inputFiles =
(inputDirs ++ generatedFiles)
.filter(_.exists())
.toList
but only for the inputs 😅