Let's save scalacache!
cb372 opened this issue · 38 comments
I'd like to start with an apology. Over the last few months I have become increasingly apathetic towards maintaining this library. I haven't been merging scala-steward PRs, making new releases, or reviewing PRs from people who want to contribute. This is not fair on the library's users, and it's especially unfair on people who've given up their own time to make contributions.
The reasons for my neglect of the library are twofold. First is a simple lack of time and energy. I became a father almost 6 months ago, and since then my life has mostly revolved around my daughter. After doing my day job and looking after her, it's hard to summon the motivation to hack on OSS in my free time. I'm more inclined to veg out in front of Netflix for a couple of hours before bed. I'm acquainted with a few superhuman OSS contributors with kids, but I honestly don't know how they manage it.
But a much bigger reason for not wanting to interact with my own library is that I just don't like the way much of the code is written. I wrote most of it a long time ago and made a lot of bad design decisions along the way, and I have to confront my own mistakes every time I look at the code. It's frustrating to see these problems, knowing that I don't have enough free time to fix them.
So for a long time I've pretty much ignored all GitHub notifications for this repo, but without giving anyone the common courtesy of an explanation. That's not a reasonable or sustainable strategy, so the time has come to sort this mess out. In the remainder of this post I'd like to enumerate some of the problems I see with scalacache, and propose a plan to get the library into a shape where I would actually enjoy working on it.
If you'll indulge me in a little reminiscing, I'll start by summarising the history of the project in order to give a bit of context.
Pre-history
The very first commit was in February 2013, and the first release (v0.1) was in November of that year. (I can't remember why it took so long to release.) Back then the library was called cacheable, and it was little more than the memoize macro. It was an experiment to see if I could use Scala macros (which were brand new at the time) to emulate the behaviour of Spring's @Cacheable annotation, auto-generating the cache key based on the method name and arguments.
The experiment was a success, and I felt the macro was useful, so I decided to expand it into a proper caching library so that people could actually make use of it. I didn't want to reinvent the wheel by writing my own cache implementation, so I made it an slf4j-style facade API, with integrations for a number of popular caching solutions. At the time there were integrations for Guava, Ehcache and Memcached. Later this grew to include Redis, Caffeine and others.
v0.1 to 0.10
At this time (2013, Scala 2.10.x), the Scala community was minuscule, and there were not many resources for people trying to get started with the language. So one of the main design goals for the library was to be as beginner-friendly as possible. Everybody needs to do caching at some point, so I knew that people new to Scala would soon come looking for a caching library, and I didn't want to scare them away.
For example, in v0.3.0 I started using Futures in the API, but I also wanted to make the library easy to use for beginners who didn't want to bother about Future, map/flatMap, for-comprehensions, error-handling, etc. So I made my First Huge Design Mistake, and decided to expose two equivalent APIs: one that used Future everywhere, and a yolo-mode one that blocked the thread and threw exceptions on errors.
Over the next few years, the library mostly grew by adding integrations for more and more cache implementations. Some of these were pretty esoteric (twitter-util LRU, anyone?), but they were added by first-time contributors and I didn't want to be mean and reject their PRs, so I let them in. In fact, I have a feeling that making a PR to add a new cache implementation was quite a nice way for people to make their first Scala OSS contributions. They could copy an existing integration, including a test suite somebody else had already prepared, tweak the code to integrate with their caching library of choice, and make some simple sbt changes to add a new module.
Another notable addition was Scala.js support. This was another situation in which I accepted a contribution mostly to avoid hurting the feelings of the person who contributed it. I have nothing against Scala.js per se, but I don't use it myself, so I really didn't care whether scalacache was built for Scala.js or not. It also comes with a significant maintenance overhead for library authors. I have wasted many an hour fighting with sbt over the years thanks to Scala.js. What makes it worse in this case is that only the scalacache-core module, which doesn't do anything on its own, is cross-built for Scala.js. We don't provide any cache implementations, so a Scala.js user would have to write their own. I seriously doubt anyone has ever used scalacache with Scala.js.
v0.20.0
Something that had always bugged me about the scalacache API was that it was hardcoded to Future. In 2017, alternatives to Scala's Future such as Monix Task and cats-effect IO started to gain some traction. Intrigued by this idea, I dropped this monster of a PR. It contained a lot of breaking changes so I bumped the version number up to 0.20.0 for the next release.
Unfortunately this came with its own batch of poor decisions. For example, I wrote my own type class hierarchy (from MonadError to Async), heavily inspired by cats-effect. To be fair, this was a reasonable thing to do at the time. This was right in the middle of a very bloody cats-vs-scalaz war and I didn't want to pick a side. It was politically safer to write my own type classes, plus integrations for both libraries. Also cats-effect was very immature at that point (v0.4?) so I didn't want to add it as a core dependency.
What's worse is that my type classes have some laughably unlawful/deliberately broken instances. Have you ever noticed that scalacache has an instance of Async for Id? How on earth does that work? This is pretty shameful, but I did it with my eyes open - I can't plead ignorance.
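To make the problem concrete, here is a minimal, dependency-free sketch of why an effect type class instance for Id cannot behave lawfully. The Suspend trait is a hypothetical, heavily simplified stand-in for a Sync/Async-style type class; it is not scalacache's or cats-effect's actual definition.

```scala
object UnlawfulIdDemo {
  // Id is just the value itself: there is no wrapper in which to store a computation.
  type Id[A] = A

  // Hypothetical stand-in for a Sync-like type class: delay should *describe*
  // a side effect without running it.
  trait Suspend[F[_]] {
    def delay[A](thunk: => A): F[A]
  }

  // The only possible implementation for Id evaluates the thunk immediately.
  implicit val suspendForId: Suspend[Id] = new Suspend[Id] {
    def delay[A](thunk: => A): Id[A] = thunk
  }

  var counter = 0

  def run(): Int = {
    val F = implicitly[Suspend[Id]]
    // The increment has already happened by the time `effect` is assigned.
    val effect: Id[Int] = F.delay { counter += 1; counter }
    // Using the "effect" twice does not re-run the side effect.
    effect + effect
  }
}
```

With a type that really suspends, such as cats-effect IO, running the described effect twice would increment the counter twice. With Id the increment happens once, at construction time, which is the sense in which such an instance is "deliberately broken".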
The final whoopsy of this release is that I didn't manage to remove the duplication of APIs. Instead of a Future-based API and a yolo-mode API, we now had an F[_]-based one and a yolo one. This was again done with beginners in mind (if they thought Future was scary, what would they think of higher-kinded types and MonadError?!), but it's deeply unsatisfying.
Summary of problems with the library
Here I'd like to summarise what I think are the main flaws of scalacache. I've touched on a few of them above. In no particular order:
- Uses its own unlawful/very dodgy type class hierarchy (a bastardised version of the cats-effect hierarchy) to model effects.
- Has a nasty duplicated API in which every operation can be done either inside an effect monad or not. This was an attempt to make it easy for Scala beginners to use the library without worrying about monads, but it just complicates things and makes the API bigger and uglier.
- Only allows Strings as cache keys, which doesn't make much sense when using an in-memory cache such as Caffeine, or a remote cache like Redis, which allows arbitrary byte arrays as keys.
- Uses a weird @propensive -inspired concept called "modes", which I thought would catch on but didn't. These days standard patterns for specifying the effect monad have emerged.
- Because scalacache is basically a facade for a bunch of different cache implementations, it can only support the lowest common denominator of operations that are supported by all those implementations. That means we can't unlock the more powerful features of each individual backend. For example, pipelining support in Redis, or a "get N different keys at once" operation, which is supported by some implementations but not others, or more intelligent usage of Caffeine's API to maximise performance.
- Half-baked Scala.js support.
- No clear policy on what kind of contributions are welcome. PRs accepted without a fight.
- Lone-wolf development. If I hadn't been the sole maintainer, a lot of the above problems could probably have been avoided.
A plan to fix them
- First of all I'd like to add cats-effect as a core dependency and rewrite the library to use that instead of its own type classes. This means no more Sync[Id]!
- I'd like to finally get rid of the "synchronous" API. So the sole API would wrap everything in F[_]: Async. For beginners who find HKTs unnerving, I think good documentation with plenty of examples is probably good enough. This would mostly be along the lines of "Use IO, stick an unsafeRunSync() at the end, and don't worry about it." And these days there are a lot more learning resources we can point people at, to explain tagless-final style.
- Update the API to support any type K as a cache key. Needs a bit of thought regarding how to turn a value of type K into a type that the cache implementation can handle (e.g. memcached needs a String, Redis needs an Array[Byte]). Most likely some type of KeyEncoder type class.
- Switching to cats-effect would imply getting rid of the modes concept.
- I've never found a good solution to this, but I feel there's plenty more scope for exploration of the design space here.
- Personally I'd like to remove Scala.js completely, but I could be persuaded to keep it if we could make it actually useful, e.g. by adding a cache implementation that is cross-built for Scala.js
- Decide on a contribution policy, then document it in the repo. I'm not too averse to people adding integrations with Java dependencies (e.g. cache2k), but we should be very wary of adding too many Scala dependencies. It becomes a maintenance nightmare, especially when a new version of Scala comes along. In general I don't want the library to be a tiny core plus a huge collection of wrapper modules to integrate with every cache implementation under the sun. That feels like an anti-pattern. It should be easy for users to create new integrations, but that doesn't mean they all have to become first-class citizens and get published as part of the library.
- This, dear reader, is where you come in! If you've read this far, hopefully you are reasonably invested in the fate of scalacache. If you'd like to become a maintainer, let me know in the comments or via Twitter DM. Another of my OSS libraries, cats-retry, was in danger of falling into a similar state of neglect until @LukaJCB, my knight in shining armour, appeared in my DMs one day offering to help out. Nowadays the library is in a much healthier state. Hopefully we can do the same for scalacache.
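The key-encoding idea in the plan above (any type K as a cache key, converted to whatever the backend needs) might be sketched roughly as follows. Everything here is an assumption made for illustration: the two-parameter shape of KeyEncoder, the viaString derivation, the UserId type and the encodeKey helper are all hypothetical, not a settled design.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical type class: turns a key of type K into the representation R
// that a given backend needs (e.g. String for memcached, Array[Byte] for Redis).
trait KeyEncoder[K, R] {
  def encode(key: K): R
}

object KeyEncoder {
  // String keys pass straight through for String-keyed backends.
  implicit val stringToString: KeyEncoder[String, String] =
    new KeyEncoder[String, String] { def encode(key: String): String = key }

  // Any key that can be rendered as a String can also be rendered as UTF-8 bytes.
  implicit def viaString[K](implicit enc: KeyEncoder[K, String]): KeyEncoder[K, Array[Byte]] =
    new KeyEncoder[K, Array[Byte]] {
      def encode(key: K): Array[Byte] = enc.encode(key).getBytes(StandardCharsets.UTF_8)
    }
}

// Hypothetical domain type used as a cache key.
final case class UserId(value: Long)

object UserId {
  implicit val userIdToString: KeyEncoder[UserId, String] =
    new KeyEncoder[UserId, String] { def encode(key: UserId): String = s"user:${key.value}" }
}

object KeyEncoderDemo {
  // A backend would ask for the encoder matching its own key representation.
  def encodeKey[K, R](key: K)(implicit enc: KeyEncoder[K, R]): R = enc.encode(key)
}
```

A String-keyed backend would call `KeyEncoderDemo.encodeKey[UserId, String](UserId(42L))`, while a byte-keyed backend would request `Array[Byte]` and get the derived instance. Whether the target representation should be a type parameter or fixed per backend is one of the open design questions.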
Thank you for all you’ve done with Scalacache and congrats on the baby!
I think these sound like a great set of changes! I especially love the idea of getting away from modes and standardizing on cats-effect. Just a few points that I’d love for you to elaborate on.
I'm not too averse to people adding integrations with Java dependencies (e.g. cache2k), but we should be very wary of adding too many Scala dependencies.
Do you think any of the currently supported caches are in need of being removed/moved to another project, etc?
In general I don't want the library to be a tiny core plus a huge collection of wrapper modules to integrate with every cache implementation under the sun.
Where do you see the line being drawn? There are already more wrapper modules than say SLF4J has (I’m pretty sure).
Do you think any of the currently supported caches are in need of being removed/moved to another project, etc?
Where do you see the line being drawn?
Very good questions. Ideally I think we should provide integrations for the most commonly used cache libraries, but it's hard to say what those are. I only have anecdotal data. For example, does anyone use memcached these days? I don't see anyone talking about it, but maybe I'm not moving in the right circles. I see on the official site that Netflix sponsors it, so I guess they are a user.
I think Guava can probably be dropped, as it's superseded by Caffeine. As I understand it, Caffeine started out as a performance-focussed rewrite of Guava and has now added a lot more features, so there's no reason to choose Guava over Caffeine.
EhCache can probably go. That's a relic from my Java days. I don't think anyone uses it these days, but again that's anecdotal.
Maybe we could look at the Sonatype download stats for these libraries? Might give us some useful data on which to base a discussion.
Mules includes a few dependency-free cache implementations out of the box. We could do the same, e.g. a simple cache based on a cats-effect Ref of a Map[K, V].
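As a rough illustration of that last idea, a cache backed by a cats-effect Ref of a Map[K, V] could look something like the sketch below. It assumes cats-effect 3 is on the classpath (the scala-cli directive and its version number are assumptions), and RefCache and its methods are hypothetical names, not part of scalacache or Mules. It deliberately has no TTL or eviction.

```scala
//> using dep "org.typelevel::cats-effect:3.5.4"
import cats.effect.{IO, Ref}
import cats.effect.unsafe.implicits.global

// Hypothetical minimal cache: all state lives in a Ref holding an immutable Map.
final class RefCache[K, V] private (state: Ref[IO, Map[K, V]]) {
  def get(key: K): IO[Option[V]] = state.get.map(_.get(key))
  def put(key: K, value: V): IO[Unit] = state.update(_.updated(key, value))
  def remove(key: K): IO[Unit] = state.update(_ - key)
}

object RefCache {
  // Allocating the Ref is itself an effect, so construction returns IO.
  def empty[K, V]: IO[RefCache[K, V]] =
    Ref.of[IO, Map[K, V]](Map.empty[K, V]).map(ref => new RefCache[K, V](ref))
}

object RefCacheDemo {
  // Put a value, read it back, remove it, then read again.
  val program: IO[(Option[Int], Option[Int])] =
    for {
      cache <- RefCache.empty[String, Int]
      _     <- cache.put("answer", 42)
      hit   <- cache.get("answer")
      _     <- cache.remove("answer")
      miss  <- cache.get("answer")
    } yield (hit, miss)
}
```

Calling `RefCacheDemo.program.unsafeRunSync()` runs the whole description at the edge of the program, which is the "use IO and stick an unsafeRunSync() at the end" style suggested for beginners earlier in the thread.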
If we do manage to draw a sensible line between caches we want to keep in the main library and those we don't, there are a few options for dealing with the latter:
- Move them to a scalacache-contrib repo (under a scalacache org?) and give them a separate release cycle from the main library
- Move each of them to its own repo under the scalacache org
- Turn them into examples in the documentation, as most of the integrations are actually only a few lines of code.
Thank you for your in-depth response!
does anyone use memcached these days?
Good question. I know a few people use it where I work, but I don’t know if they are more so adopting or moving away from it. I do know that AWS ElastiCache supports it as a managed service (which is maybe where Netflix uses it). I think this one is worth discussing in more detail, but in general I think it will make the most sense to remove it from the core repository and potentially put it in some kind of contrib repo as you mentioned.
I think Guava can probably be dropped
Agreed. This article validates what you are saying about Guava and shows how caffeine is superior in terms of performance.
EhCache can probably go. That's a relic from my Java days. I don't think anyone uses it these days, but again that's anecdotal.
Anecdotally I would agree. I would think it is especially less likely to be used within the Scala community since it is associated with Java EE at least in name. Could be good to look at the Sonatype stats on this one, but I think it will probably be a good candidate for removal.
Mules includes a few dependency-free cache implementations out of the box. We could do the same, e.g. a simple cache based on a cats-effect Ref of a Map[K, V].
I think having at least one kind of “default” cache that is included without dependencies makes a lot of sense. If all someone wants is an in-memory cache and they don’t care a ton about a specific set of performance characteristics, then it doesn’t make sense to make them choose a backend cache to use if we can provide one or a few reasonable defaults.
- Move them to a scalacache-contrib repo (under a scalacache org?) and give them a separate release cycle from the main library.
- Move each of them to its own repo under the scalacache org.
The contrib repo could end up getting really bloated over time. When you have a single repo with a bunch of somewhat unrelated sub-projects, it seems like the issues section on Github gets really tough to manage and use properly (see Alpakka). Another concern with a single repo is that you could end up with conflicting dependencies (similar to why you said you’d want to mostly avoid Scala deps in the core).
However, having them all in their own individual repo could be harder for visibility. If you are trying to look through the different available cache implementations it could be a pain to sort through them all if they are in disparate locations. Also, the one repo per cache impl approach could be a burden on maintainers (needing to follow and watch more repos).
I guess I could be persuaded either way, but I think it will depend on 1) how many different cache implementations we see there being and 2) how large each of these will be.
- Turn them into examples in the documentation, as most of the integrations are actually only a few lines of code.
I think this could make a lot of sense if there were only 1 or 2 small examples, but also it can be frustrating as a user of a library if you have to copy and paste code into your project to do something. Seems like the better UX is providing a dependency even if it is a small one. I could definitely be persuaded here though 🙂
Also, this may be implied by my lengthy responses here, but I would definitely be interested in being a maintainer. I think these are some really interesting problems to solve for the library moving forward. I have a good amount of experience working with cats-effect and tagless final. I am newer to the OSS scene, but I am trying to really ramp up and get involved. Feel free to reach out on Twitter DMs if you have any questions for me (@lewisjkl).
I'd be happy to help in whatever way!
I would like to help as well.
by the way, @cb372 @lewisjkl you can put this repo into https://dashboard.mergify.io/ and then you won't need to bother merging Scala Steward PRs manually
Thanks @manuelcueto and @pandaforme! I'll try to write some proper issues soon so people can contribute.
I agree we should set up Mergify in future, but we had such a large backlog of scala-steward PRs I think it made sense to go through and check them by hand.
As I mentioned privately, I started the move to cats-effect in the core (also getting rid of the sync API and modes) :) #345
Hey, I forgot about this for a while, but I think the recent major changes were never released. Who can make new releases, besides @cb372?
I see @lewisjkl should be able to make some by pushing tags.
I would suggest making a milestone around now, and wait for Cats Effect 3 before we make a scalacache 1.0 - there won't really be many changes because we don't use Effect or Concurrent a lot AFAIR, but still it'll be a breaking change and it's probably not worth making a 1.0 and then a 2.0 in a couple months.
@kubukoz That makes sense to me. I can push a tag for sure. Are you thinking v1.0.0-M1?
Yeah, sounds good
Awesome, I just pushed that tag up.
Hey @kubukoz I would be happy to hop on a call sometime this week or next. When is good for you?
Next week sounds good, let's find a time in the DMs :)
We met with @lewisjkl and sketched out an approximate roadmap for the upcoming months:
Immediate:
- remove ScalaJS (#452) - it'll unblock a lot of potentially-conflicting work in the build area - merged
Before M2:
- use cats-effect 3 milestones (#450): probably a good idea to push a separate milestone after this
- use sbt-github-actions (#371)
- explore API improvements (#355, #349, #459): we should at least get a design for how we want these to be implemented ASAP, so we can make it work by RC1 and not break binary/source compatibility soon after 1.0 final.
- define a logging abstraction (#455)
Around M3 / RC1 / RC-X:
- cross-build for scala 3 milestones
- add MiMa to the build pipeline (in RCs we should be able to verify compat against a milestone, even for the purpose of verifying that this works at all)
- move project to the scalacache organization OR suggest moving into the typelevel incubator
- ensure the ability to add maintainers is distributed and not bus-factored on Chris
- update sbt-microsites and get publishing included in the build pipeline
Added :)
explore API improvements (#355, #349, #459): we should at least get a design for how we want these to be implemented ASAP, so we can make it work by RC1 and not break binary/source compatibility soon after 1.0 final.
I am going to take a look at these and draft up a proposal in another issue within the next few days.
I am going to take a look at these and draft up a proposal in another issue within the next few days.
Better late than never 😅 #491. Would love any thoughts and feedback.
Anything I can help out with?
I'm happy to discuss #561 with whomever (albeit asynchronously) - I didn't realise the situation with the library when I made it and have only just discovered this thread. I'm going through a bunch of dependencies that my company has on OSS at the moment and trying to help bump what I can, and this one has definitely been the most fun to work on, so I'm checking back fairly regularly to see if anyone's had a look at the pr. First step is probably to enable running the tests in CI 😅
Update: #517 is done and #561 is released (thanks a lot @hughsimpson 🙏). Scala 3 is in 1.0.0-M4.
Hey all, just want to apologize for dropping off the radar for so long here. I am going to have more time coming up so I am hoping to make some progress on this repo and finally get it over the "finish" line for 1.0.0 release.
@ronnnnnnnnnnnnn thank you for your PR, I just took a preliminary glance at it and I think it generally looks good. I will take some time this weekend to read through it more carefully and add any comments. If I have time I will try to merge master into it too since I see there are quite a few merge conflicts in there now.
After that, I will go through and deal with the many Scala-steward PRs that have piled up.
From there, my vote would be to punt on #349 and #459. I think these are more niche use cases and can be dealt with down the road. If anyone feels strongly about them, then feel free to comment in the issues and/or raise a PR.
At this point, I think we will be good to move into the RC phase and proceed with the tasks that @kubukoz outlined above:
- cross-build for scala 3 milestones
- add MiMa to the build pipeline (in RCs we should be able to verify compat against a milestone, even for the purpose of verifying that this works at all)
- move project to the scalacache organization OR suggest moving into the typelevel incubator
- ensure the ability to add maintainers is distributed and not bus-factored on Chris
- update sbt-microsites and get publishing included in the build pipeline
🚀 😎
Hey @DavidGregory084, thanks for the interest in helping out here. Things are in a bit of a weird spot.
So I was working at the beginning of the year to get version 1.0.0 released, but ended up having some issues getting the scaladocs generating correctly which is blocking the ability to release.
When I started helping to maintain this library I was actually using it in production so I had a lot more incentive to work on it and get things going. I know it is lame to say, but my motivation to work on it since has dwindled with the fact that I no longer use it and have other projects I'd rather focus on in my limited free time. Anyway, I think this project deserves some more maintainers that will dedicate more time and have a better vision for the direction of the project based on their actual usages of the library. I am happy to help out here and there, but I don't think I am a good person to be "leading the charge" anymore.
- move project to the scalacache organization OR suggest moving into the typelevel incubator
- ensure the ability to add maintainers is distributed and not bus-factored on Chris
These are big points that need to get resolved if this project is to move forward (which I think it should, of course 🙂 ).
In terms of 1.0.0, I think the main things outstanding are:
- Fix publishing (scaladoc generation)
- Update docs to actually be helpful for 1.0.0, I don't think they are very great currently.
As far as your mongo PR goes, I don't really have an opinion on whether or not that should be merged. The original thought when we started trying to "save" scalacache was to reduce the total number of cache implementations, but that was mainly to lower the amount of maintenance needed in this codebase I think. If some other people want to step up and be maintainers here, then they can decide what direction they want to go with that.
Sorry for the super long post here, but I guess I am mainly throwing this out into the ether: if anyone has interest in helping maintain here, please let us know and we will try to get you added.
@lewisjkl no need to justify anything, this is open source and it's your free time!
I will see what I can figure out about the scaladoc generation failures. It looks like scaladoc is refusing to link a definition from the core project in the memcached project, and one from the JDK. I'm sure that is fixable using something like sbt-api-mappings but perhaps we could move the project to sbt-typelevel as I think that it has got cross-project scaladoc mappings set up out of the box (/cc @armanbilge)?
After that perhaps we can talk to the Typelevel folks about whether they would take on scalacache? Depending upon the answer to that we could decide what to do about the organization going forward.
👍 to sbt-typelevel, no I'm not biased 😉
It does set some good default settings for API mappings, among other things. Not sure if it will solve all the problems, but will hopefully improve the situation.
OK, so this is definitely a bit of a tangle of a problem. Scaladoc links are supposed to work across modules out of the box. (EDIT: it turns out this is not correct) The underlying issue seems to be sbt/sbt#4929, but in that original report the cross-project dependency was more complex as it used an ivy configuration dependency. The problem in scalacache is triggered even with a straightforward dependency on the core module in the memcached module.
I do not experience problems linking to the JDK like we see in that build - perhaps that issue has been fixed?
Cats works around this by simply disabling fatal warnings when publishing docs, and accepting that some of the links will not work (typelevel/cats#787), e.g. see how the link to Eval does not link to the right place in the doc for Cofree that is published to javadoc.io, but it does on the cats documentation site, which uses the API docs produced by sbt-unidoc.
Docs produced via sbt-unidoc do not suffer from this problem. However, sbt-unidoc docs are not really the solution to this because they are not published under the correct artifact name for the docs to be found by IDEs or for them to be easily found on javadoc.io - they are much more suitable for publishing on a project website.
I guess I need to spend some time debugging sbt + Scaladoc 🤯
Hmm, is it still a problem when you explicitly set the apiURL setting for every module? e.g. see https://github.com/typelevel/sbt-typelevel/blob/24ec0703071f5557309b4e9db7759262def9a24a/sonatype/src/main/scala/org/typelevel/sbt/TypelevelSonatypePlugin.scala#L65
I wouldn't really look to the cats build for examples ... it's possibly reflecting problems of the past. I plan to overhaul it with sbt-typelevel soon.
This is not an area that I know much about, but in Gradle for cross-module javadocs I use offline links. Not sure if that's the same problem, so just an fyi if helpful.
javadoc.options.linksOffline(
"https://static.javadoc.io/${group}/caffeine/${version}/",
"${project(':caffeine').buildDir}/docs/javadoc/",
)
Hmm, is it still a problem when you explicitly set the apiURL setting for every module? e.g. see https://github.com/typelevel/sbt-typelevel/blob/24ec0703071f5557309b4e9db7759262def9a24a/sonatype/src/main/scala/org/typelevel/sbt/TypelevelSonatypePlugin.scala#L65
Yes that definitely does solve the problem, however I couldn't help going down the rabbit hole of trying to understand why sbt does not at least try to do something without it. 😆
Scaladoc cannot find anything to link because it cannot find anything for that classpath entry in its extURLMapping, and that comes from apiMappings in sbt.
I guess notionally what I would expect is something like this: if apiURL is not set, it would cross-link against the local version of the scaladoc in the other module's target folder in order for users to preview the docs. This is essentially what @ben-manes has described doing above with gradle. You can get sbt to do that by setting the apiURL to (crossTarget.value / "api").toURI.toURL.
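In a build.sbt, the local-preview setting described above might look roughly like this. It is only a sketch: the project name and directory are placeholders, and since apiURL is an Option[URL] setting the value is wrapped in Some.

```scala
// build.sbt fragment (sketch, not scalacache's actual build).
// "core" and its directory are placeholders for whichever module other modules link to.
lazy val core = (project in file("modules/core"))
  .settings(
    // Point this module's API URL at its locally generated Scaladoc so that
    // dependent modules can cross-link against it when previewing docs.
    // Docs built this way link to local files and are not suitable for publishing.
    apiURL := Some((crossTarget.value / "api").toURI.toURL)
  )
```

For published artifacts the URL would instead need to point at wherever the docs are actually hosted, which is the part that is unclear for snapshot builds.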
However, I suppose that it's probably better that it doesn't do this by default - it would be too easy to publish docs that link to a local file without any kind of build warning.
I have had a go at fixing the Scaladoc build issue in #665, but it's not quite there yet - currently Scaladoc builds work locally and for CI release builds but will not work for CI snapshot builds. Mulling over the solution to this in typelevel/sbt-typelevel#226 as it's not exactly clear how to set the URLs for inter-module Scaladoc links for a snapshot.
@lewisjkl #665 is ready for review now whenever you have some time - it resolves the scaladoc issues you were experiencing as @armanbilge found a good solution for typelevel/sbt-typelevel#226.