Make classpath scanning optional
dsyer opened this issue · 30 comments
There's a discussion here spring-projects/spring-framework#27619 about how some common use cases do not require classpath scanning (specifically versionless URL path construction). Myself and @vpavic have both volunteered to contribute code here, but we need some guidance. One option would be to extract the "core core" into yet another Jar file (what would it be called?). Or we could make the classpath scanner an optional dependency here (use it to compile and test but not include it transitively by default). Or we could keep the dependency and simply make the classpath scanning lazy (only do it if it is needed).
UPDATE: added a third option.
Regarding naming, if new library is the preferred route, maybe we could take some inspiration from the name of the component we have in Spring (WebJarsResourceResolver
) and name it webjars-resource-resolver
or simply webjars-resolver
.
Actually, maybe we should start by asking a simple question: why does webjars need classpath scanning? What are the use cases that supports?
I haven't inspected webjars-locator-core
at depth, but from my experimentation in vpavic/spring-framework/tree/gh-27619, I suspect that resolving WebJars of Bower GitHub variant cannot be done as easily as for Classic and NPM variants and is probably (one of the things) reliant on classpath scanning.
This is because Bower GitHub variant does not have as static path to pom.properties
as Classic and NPM variants do (META-INF/maven/org.webjars/{webjar}/pom.properties
and META-INF/maven/org.webjars.npm/{webjar}/pom.properties
, respectively) but is rather dependent on the source GitHub org and repo and is like META-INF/maven/org.webjars.bowergithub/{org}/{repo}/pom.properties
. Similarly, the path to assets themselves (resources/webjar/...
) does not contain version unlike with the other two variants.
You can observe these differences by taking a look at, for example, Bootstrap WebJar in different variants:
org.webjars:bootstrap:5.2.2
org.webjars.npm:bootstrap:5.2.2
org.webjars.bowergithub.twbs:bootstrap:5.2.2
Another thing to consider, I originally learned about spring-projects/spring-framework#27619 after opening spring-projects/spring-boot#31560 because Spring Boot wasn't picking up webjars-locator-core
updates due to versioning policy.
That led me to open #86, and I think that if what we're discussing here end up being part of webjars-locator-core
, those concerns will remain and Spring Boot will continue to pick up updates at a very slow pace unless changes occur somewhere (either this repo adopts SemVer, or Spring Boot makes exception to its policy).
I suspect that we may only support “npm style” webjars, but that’s OK by me - if it’s documented it should be fine for everyone IMO.
If we want Spring Boot users to be able to take advantage, I think @vpavic’s argument suggests that the best way to implement this is to make the classpath scanning lazy in the existing library - only use it if direct lookup fails. I haven’t looked at the code yet but that seems like a feasible solution.
I suspect that we may only support “npm style” webjars
Hmm, why do you think it's only NPM variant we can support?
The way I see it, Classic and NPM variant are basically equals in terms of the way they are resolved from the classpath. Bower GitHub seems like the only one that isn't straightforward (I'm ignoring Bower Original variant as webjars.org says it's deprecated).
You’re probably right. I suppose I should rephrase what I said as “I don’t care if we can only support NPM packages”. It’s a bonus if some other layouts work but I don’t think it’s worth a lot of additional effort.
Here's an attempt at a small step that delivers something useful: https://github.com/dsyer/webjars-locator-core/tree/version-locator. It adds a new utility WebJarVersionLocator
with enough features to be used by Spring (and anyone else who simply wants versionless paths) but no classpath scanning. Add feedback here or on my fork.
After the interim rise of activity to resolve this there seems to be some loss of traction again.
I can't stretch enough how much of an impact the classpath scanning has. Especially on bigger projects and bigger test suites I see a major portion of allocations being caused by this. Which in turn triggers GC again, which has an impact on CPU and timings.
Here's a (large) project where 17% of allocations are spent on only classpath scanning during tests. I've shared one in the original Spring-Framework issue that even has up to 60%.
What's especially bad in the Spring case it's that it's doing the classpath scanning multiple times. (Basically once per registration
inside ResourceHandlerRegistry
.
I can workaround a few places where I have control over the code, but it would be really better if this would be solved for good in the underlying libraries.
Cheers,
Christoph
Sorry this took so long. I've been trying to wrap my head around everything that exists today and why. Here are some initial thoughts...
-
Classpath Scanning:
For user ergonomics we don't require a WebJar artifact ID or a full path to the file (after the version). So the user can just write:
webJarAssetLocator.getFullPath("bootstrap.min.js")
and they get the resolved artifact (WebJar and subdir), like:/META-INF/resources/webjars/bootstrap/3.1.1/js/bootstrap.min.js
. This resolution requires doing some sort of classpath scanning. I think we used to manually get directory content listings but switched to using the Classgraph library. We could maybe switch to PathMatchingResourcePatternResolver in a Spring implementation. Does that work with GraalVM Native Image correctly?As @dsyer proposed, we could offer an API that requires a WebJar artifact ID and a full path but I think the DX will take a negative hit. Today there is a similar method (but it is implemented internally with Classgraph):
public String getFullPathExact(@Nonnull final String webJarName, @Nullable final String exactPath)
Having a library that just has that method would definitely be an option but that doesn't address the ergonomic issues. Maybe those issues aren't a problem for Spring as it seems maybe most users don't use the looser
getFullPath(partialPath)
method. -
Version Caching:
Even if we get rid of scanning we will probably want to figure out some way to cache the version lookup as it is something we don't want to do every time a WebJar asset is resolved.
-
Bower GitHub WebJars
For a variety of "good reasons" we decided not to put the version in the Bower GitHub WebJars paths. Whatever change we make should be compatible with those WebJars or users will have a hard time.
-
Spring & Versions in Paths:
As we look at making changes to this in Spring we really need to look at a few other things which don't fit the usual WebJars paradigm. First, it is intentional that we put the version in the path so that good HTTP cache headers can be put on the WebJar requests. It looks like we break that with the way Spring currently removes the path. Second, for Play we add an easy config option to switch to using the WebJars CDN. Both of these could be accomplished with some HTML templating utils which we should investigate for doing in Spring.
Overall I think the best way to deal with all of this would be to do a lot more at build-time. If we have any opportunity to do that, I think that is the best place to invest the energy but requiring another build plugin is probably a non-starter so may not be an option.
If we just want to plug the current hole and ignore the DX & HTTP cache issues, we can take @dsyer's changes and publish a webjars-locator-core-base
(or whatever) with the non-scanning logic but we will still need something to address the version caching and I'm not sure the best place or way to handle that.
Overall I think the best way to deal with all of this would be to do a lot more at build-time. If we have any opportunity to do that, I think that is the best place to invest the energy but requiring another build plugin is probably a non-starter so may not be an option.
I don't think this is practical in some cases. Consider for example using https://github.com/springdoc/springdoc-openapi, which at least for our projects is actually the reason why the WebJarsResourceResolver
is added internally in Spring because this lib pulls in webjars-locator-core
. With this lib we provide swagger docs. It's not uncommon that you provide swagger docs based on some sort of profile or configuration that's disabled/enabled in certain environments. I could imagine this would complicate things during build-time at the very least. Or you would need to act like things would be enabled every-time and ignore any user-side configuration during build-time. Maybe you have a more concrete picture already in your head where this wouldn't be an issue, but I wanted to let you know of the use-case...
Thanks @dreis2211 for the use case. It might be possible in that case to have the generated file in the springdoc-openapi
artifact. But that does complicate things a bit. Question for you on the OpenAPI use case... Are you doing some templating that could handle the versioned paths or do your paths to WebJar contents have to be versionless?
We don't do any templating - that's all on the library end. And by default the library is accessing stuff in a versionless fashion. But you can set the springdoc.swagger-ui.version
in which case it uses the versioned one. Check https://github.com/springdoc/springdoc-openapi/blob/master/springdoc-openapi-common/src/main/java/org/springdoc/ui/AbstractSwaggerWelcome.java#L240 . But that's impractical in terms of DX as you have to know which swagger-ui version is used inside springdoc... There is a reason after all for the versionless resolution.
In all fairness. The library also provides ways already to build the OpenAPI json at build-time. But to the best of my knowledge this doesn't build the complete HTML which uses the swagger-ui
stuff in the end. It just builds up the data. I wanted to mention this for completeness reasons, because it might be possible to hook into this existing infrastructure.
The workaround posted by @dsyer in Spring Petclinic works for the Spring-Doc use-case as well and is what I'm mostly using these days to workaround the performance problems. Next to excluding webjars-locator-core
in tests when possible (e.g if the Swagger page is not covered by tests). If I have access to the code, at least.
publish a webjars-locator-core-base (or whatever) with the non-scanning logic
Just for the sake of clarity: my proposal does not require a new jar (although that would make it easier to exclude the github dependency) - it's just a new utility API that you can use directly or through the old interface. A reasonable and conservative step in the right direction IMO would be just to include it in webjars-locator-core
. It might even be possible to keep the interface of the existing WebjarResourceLocator
and make the new utility purely an implementation detail, but that would require some major refactoring so I didn't want to propose it as a first step.
I was thinking a new artifact would be nice so we wouldn't have the transitive deps.
webjars-locator-lite
?
webjars-locator-utils
?
Either is good. I think before we move forward we need to think more about the version lookup caching piece. Any thoughts on the best way to handle that?
Hi any new thoughts about how to solve this?
I think there is still a lot to be resolved around this and I haven't had time to solve any of it. :(
For clarity sake, I don’t know why we can’t move forward with my patch. There is no caching issue in practice (IIRC Spring does all the work already, and probably other frameworks do), and classpath scanning could be eliminated with no impact on user code.
There are still 4 unresolved issues:
- Partial paths are not supported
It is nice to be able to do something like:WebJarVersionLocator.fullPath("jquery", "jquery.min.js")
which in the NPM WebJar is actually indist/jquery.min.js
Omitting thedist/
is nice and removing the functionality has downsides even if we don't force users to use the new library. - Version Caching
My understanding of how this works is that a request comes in for like/webjars/jquery/jquery.min.js
and to resolve that to the file in the classpath, it needs to callWebJarVersionLocator.fullPath
which needs to callWebJarVersionLocator.webJarVersion
which has to do a bunch of stuff to resolve the version. Same for usingWebJarVersionLocator.fullPath
in a template. The current WebJars Locator pre-caches these versions so they are just lookups in a Map. From what I can tell, that version resolution would have to happen on every request / call tofullPath
. - Bower GitHub WebJars are not supported
We can't provide a new locator that doesn't work with Bower GitHub WebJars unless we want to officially deprecate Bower GitHub WebJars. I'm not totally opposed to doing that since I think Bower is pretty dead at this point but I want to check with Vaadin as this whole system was for them. - Cache friendly request paths
I still would like to work with the Spring folks to find a better way to support easily client-side cacheable HTTP paths. I think there is considerable value to including the version numbers in the paths and enabling easy CDN switching. We could do this later but it'd be nice to at least decide where we want to go so that we don't make it hard to get there down the road.
Thanks James. IMO 1-3 are all nice to have - they are optional features that no-one has to use and we don't have to take them away just to support a different (also optional) utility. 4 is a solved problem AFAIK - Spring (and other frameworks) create cacheable resource URIs from the full path, but the developer never has to know the version (except to put the right thing on the classpath). This has been the case for many years, and all we want to do here is make the classpath resource resolution more efficient. So I still think the most pragmatic approach is to simply include the classpath utility stuff in the next release of webjars and move on to some of the other things if anyone cares deeply enough. It can be a separate jar file or it can go in the core, from a purely technical point of view it doesn't matter, but the github dependency should be optional either way.
- I agree that we could do without it in a locator-lite library.
- I'd like to see some performance data before I'd call this nice to have. What is the current time per request now vs with the change?
- I've asked Vaadin for feedback on deprecating Bower GitHub WebJars.
- I'll need some more details on that, anything you can point me to?
\2. The local cache doesn't have to be provided by webjars (libraries that use webjars could implement it also), but if you want one in a static Map
or something simple it would be totally trivial to add it and I could do that as part of my patch. Just ask.
\4. https://docs.spring.io/spring-framework/reference/web/webmvc/mvc-config/static-resources.html and https://docs.spring.io/spring-framework/reference/web/webmvc/mvc-config/static-resources.html (cache-busting explicitly mentioned there).
\2. Totally open to the cache being external to WebJars Locator. I'm assuming Spring already has some mechanisms for in-memory caches. If you think we can get that into Spring then I'm down but we will need to consider both the HTTP request and the template usage.
\4. I'll investigate this further but I think we should also consider the CDN swapping part as something that we should add to Spring down the road.
Spring has a CachingResourceResolver
which can be enabled easily in a Spring Boot application I believe. There might still be some value in a local cache inside the webjars locator.
CDN swapping as in "select the right CDN for the geographical origin of the request?" Is that anything to do with webjars? Maybe I misundertood.
I've forked the changes:
https://github.com/webjars/webjars-locator-lite
I cut out all the overlap with webjars-locator-core
.
And released this for testing:
org.webjars:webjars-locator-lite:0.0.1
As far as CDN stuff goes, in Play server-side templates you can do this:
@webJarsUtil.locate("bootstrap.min.css").css()
Which resolves to the local server's path by default, but with this setting:
webjars.use-cdn=true
The path changes to (dependent on the WebJar type and version) in the classpath:
https://cdn.jsdelivr.net/webjars/org.webjars/bootstrap/5.3.2/css/bootstrap.min.css
Let me know if this is headed in the right direction
That's the same code I posted, so it gets my thumbs up. We should take the CDN topic elsewhere (another issue or something).
Yup! Thank you. Let me know how integrating into Spring goes. For the CDN and caching topics, would you rather I file issues to facilitate discussion on a Spring repo or on the new lite repo?
would you rather I file issues to facilitate discussion on a Spring repo or on the new lite repo?
I don't mind. I don't really understand the requirement yet. We could do a "discussion" topic in the lite repo if you want.