Integrate cats bifunctorial IO from LukaJCB into cats effects

Question

loicdescotte opened this issue 6 years ago · 23 comments

This Bifunctorial IO Implementation for Cats seems to be a nice way to express precise error types for Cats Effect IO, with better performance (and more friendliness) than EitherT.
The goal is not to convert IO[A] to IO[E,A] as it has been done in Scalaz8, but to have the choice to use IO[A] or Bio[E,A].

Answer 1 · 2018-04-22T08:57:19.000Z

(but please don't call it Bio!)

Answer 2 · 2018-04-22T11:16:51.000Z

Full disclaimer: that code is basically a WIP where I just took the existing cats.effect.IO and made the Throwable part generic (which means we can no longer catch exceptions in a couple of places like flatMap), and a bunch of things don't work yet, so it'd require a LOT of work to get it to something that's more than just usable. There's also the problem that currently there's a ton of duplication with the existing IO.

Overall, however, I'd like having this as an option in cats-effect. I don't think it belongs inside the core package, as it does not really integrate that well with the existing type classes, which we won't change in the foreseeable future, but it could instead be a separate module.

Answer 3 · 2018-05-05T19:11:48.000Z

We haven't yet reached 1.0...

Crazy idea, what if we make IO a type alias for cats.effect.bifunctor.IO[Throwable, ?].

It is a shame to have an idea, just before 1.0, that we may want to not bake in Throwable and MonadError, but do it anyway and live with it for 2 years or more.

I'm not sure if we have really done the exercise of imagining how these typeclasses change if we keep E generic. Having an error-free version of Sync, for instance, is interesting, because it allows an error-free version of Ref, and an error-free Async (such as what happens when you use threads to do some long-running computation, which should always complete) could give you nice matching with an error-free Pledge.

I think we could do these changes so IO is still source compatible for folks that import cats.effect.IO (which would have the same methods and be concrete for E = Throwable).
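To make the alias idea concrete, here is a minimal sketch using only the standard library. The `BIO` class, its methods, and the `UIO` alias are hypothetical stand-ins, not the API of cats-effect or of the implementation linked in the issue; the only point is that a one-parameter `IO` can be recovered by fixing `E = Throwable`.

```scala
import scala.util.control.NonFatal

object AliasSketch {
  // Hypothetical two-parameter IO, modelled as a thunk that returns Either.
  final case class BIO[+E, +A](run: () => Either[E, A]) {
    def map[B](f: A => B): BIO[E, B] =
      BIO[E, B](() => run().map(f))

    def flatMap[E1 >: E, B](f: A => BIO[E1, B]): BIO[E1, B] =
      BIO[E1, B](() => run() match {
        case Right(a) => f(a).run()
        case Left(e)  => Left(e)
      })
  }

  // Existing user code keeps seeing a one-parameter IO.
  type IO[+A] = BIO[Throwable, A]
  // An error-free variant fixes E to Nothing instead.
  type UIO[+A] = BIO[Nothing, A]

  object IO {
    // With E = Throwable, delay can still catch what the body throws.
    def delay[A](a: => A): IO[A] =
      BIO[Throwable, A](() => try Right(a) catch { case NonFatal(t) => Left(t) })
  }

  val parsed: IO[Int] = IO.delay("42".toInt).map(_ + 1)
  val failed: IO[Int] = IO.delay("oops".toInt)
}
```

Call sites written against `IO[A]` and `IO.delay` would not need to mention `E` at all in this model, which is the source-compatibility property suggested above.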

Answer 4 · 2018-05-05T19:51:41.000Z

Bifunctor IO cures cancer, ends world hunger, and abolishes the designated hitter!

I'm unconvinced it's the only correct way to model IO, but I'm convinced it's a way with interesting tradeoffs and high community excitement. We should do this exercise before 1.0, but urgently, because several projects are waiting on a stable release. Does someone volunteer to take the lead on exploring the changes to the type classes and laws?

Answer 5 · 2018-05-05T21:50:21.000Z

I'm interested in trying this. Maybe we can make a branch where we can make smaller PRs into, then evaluate the whole diff.

Working our way up the type class hierarchy, it might not be that bad.

Answer 6 · 2018-05-05T22:24:09.000Z

I'd like to see some examples that mix in polymorphic methods in F[_] that only require Functor/Monad with methods that use the new bifunctor typeclasses.

At the moment this approach only considers code that's monomorphic in IO[E, A], but what happens when you go to e.g. F[_]?
I see two possibilities:

all your type parameters are now F[_, _] , how does that work with methods that only want Monad?
your type parameters remain F[_], but different methods have different MonadError constraints, which however can't be used together with the current tech we have. You can see this problem today with EitherT, if you want typed errors you are forced to monomorphise to [F[_]: Sync]: EitherT[F, Err, A], and with bifunctor IO it will mean eschewing F altogether and have just IO[E, A] everywhere.

The third approach is to keep F and a single MonadError e, with classy prisms to allow unification of various e types under one umbrella type. In fact @oleg-py has started some work on this https://github.com/oleg-py/meow-mtl/blob/master/src/main/scala/Test.scala. But in that case I suspect we might be able to make it work with normal IO by having the final all-encompassing type be Throwable.

Regardless of whether the third approach works or not though, the problems with approaches one and two remain, so I'd like to hear if people have given some thought to it, and what solutions they see.

Answer 7 · 2018-05-06T02:50:28.000Z

I made a sketch here:

#197

which basically continues the MonadError approach keeping E generic all the way through. I think this allows us to get what we want (the ability to have no error (Nothing), or the ability to have ADT errors).

Answer 8 · 2018-05-06T07:24:52.000Z

Hey, so I haven't expressed my feelings for a bifunctor IO yet, but here goes ...

I don't like it.

But before going any further, I don't want to imply that a bifunctor IO isn't useful for some people and some use cases. I was thinking that we could end up having it as another sub-project. What we are discussing here are:

making the current IO a type alias to an IO[E, A]
changing the current type classes
doing anything that will delay 1.0.0

The primary reason for why I don't like it is because Throwable is out of the picture and that means we will no longer catch Throwable in delay, in suspend or in flatMap.

The premise that @LukaJCB writes about in Rethinking MonadError is that, due to this uncertainty about which operations trigger errors and which don't, we're forced to do attempt everywhere, but that is not a correct premise. The way exceptions work and why they were introduced in LISP and later in C++, is that you only catch exceptions at the point where you can actually do something about it.

for {
  r1 <- op1
  r2 <- op2
  r3 <- op3
} yield r1 + r2 + r3

Do you need to handle exceptions for all 3 operations? Of course not, even if all 3 operations can fail. You only handle errors at the point where you can recover — in case of web requests recovery can simply imply a log and an HTTP 500 error, which is totally fine, as the server can keep on going instead of crashing.
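As a runnable illustration of recovering only at the edge, here is a sketch that uses `scala.util.Try` as a stand-in for `IO`. The operations, the error, and the status codes are made up for the example.

```scala
import scala.util.{Failure, Success, Try}

object RecoverAtEdge {
  // Hypothetical operations; any of them could fail.
  def op1: Try[Int] = Success(1)
  def op2: Try[Int] = Failure(new IllegalStateException("backend unavailable"))
  def op3: Try[Int] = Success(3)

  // No handling inside the composition: the first failure short-circuits.
  val total: Try[Int] = for {
    r1 <- op1
    r2 <- op2
    r3 <- op3
  } yield r1 + r2 + r3

  // Recovery happens once, at the place that can do something about it.
  val httpStatus: Int = total match {
    case Success(_) => 200
    case Failure(_) => 500 // a real handler would also log the error
  }
}
```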

Also, let's say that we're using a bifunctor IO implementation. Well, we can say:

usually each of these operations will throw another exception type, but their composition will end up throwing Throwable or some other type that makes the error so generic as to be essentially Throwable
such an IO implementation is no longer reflecting the capabilities of the underlying runtime, which can still throw exceptions at any point in time, even for pure operations, as in case you aren't aware, on top of the JVM even a pure, error free, total function can throw due to things such as InterruptedException

And this point on IO no longer reflecting the underlying runtime is an important one, because in the words of Daniel Spiewak, IO is the runtime.

Also on utility, I understand the drive to parameterize all things. But the question is, what else could we parameterize and why aren't we doing it?

We could parameterize the operation type for example. Is it IO bound? Is it CPU bound? in order to not make a mistake about the thread pool on top of which it runs
Or we could parameterize the execution model — is it synchronous or asynchronous?
Or we could parameterize the side effect — i.e. is it doing PostgreSQL queries, or ElasticSearch inserts?
Or insert your own pet peeve ...

We aren't doing it because adding type parameters to the types we are using leads to the death of the compiler, not to mention our own understanding of the types involved, plus usage becomes that much harder — because by introducing type parameters, values with different type arguments no longer compose without explicit conversion / widening, pushing a lot of complexity to the user.

This is why EitherT is cool, even with all of its problems. It's cool because it can be bolted on, when you need it, adding that complexity only when necessary.

IO[E, A] looks cool, but what happens downstream to the types using it? Monix's Iterant for example is Iterant[F[_], A]. Should it be Iterant[F[_], E, A]? Or maybe Iterant[F[Throwable, _], A]? Or Iterant[F[_, _], E, A]?

If I parameterize the error in Iterant, how could Iterant keep on working with the current IO that doesn't have a E parameter? And if Iterant works with IO[Throwable, _], then what's the point of IO[E, A] anyway?

Odersky already expressed his dislike for type classes of multiple type parameters, such as MonadError and it's pretty telling that type classes with multiple type parameters are not part of standard Haskell.

Again, I'm not saying that we shouldn't do the bifunctor IO as an alternative.

There is always the problem that as an approach it is totally unproven and I don't want us to fall into that trap, just because Scalaz 8 is doing it, a library version that nobody is using due to it not being released yet.

And if the current IO ever happens as a type alias for an IO[E, A], personally I'll stop using it, plus cats-effect as a Monix dependency will be in serious doubt.

And I don't want to say this lightly — that Monix now depends on Cats and Cats-Effect has been a great sacrifice for the library. It's for example the reason for why Monix is not a Quill dependency, because its author doesn't want Cats as a transitive dependency.

In terms of the 1.0.0 release, I'm strongly not in favor of (big 👎 on) adding anything related to a bifunctor IO. We are already at RC and the next release was supposed to be a final release.

Consider that Monix 3.0.0-M1 was released September 2017 and since then Monix has not seen the final 3.0.0 release due to waiting on Cats-Effect to stabilize. And note that this is not a Cats 2.0 situation, because the changes between Monix 3.0 and 2.x are pretty big.

Answer 9 · 2018-05-06T11:49:16.000Z

@alexandru I agree, and the original purpose of the issue was to have bifunctor IO as an alternative, not to change IO into a bifunctor.

Answer 10 · 2018-05-06T11:58:48.000Z

@alexandru just one more thought about making IO[A] an alias of IO[Throwable, A]

The primary reason for why I don't like it is because Throwable is out of the picture and that means we will no longer catch Throwable in delay, in suspend or in flatMap.

Would it be more acceptable to constrain IO's left type, i.e. defining it as IO[E <: Throwable, A]?

Answer 11 · 2018-05-06T12:28:16.000Z

@loicdescotte that would be the same thing and doesn't fix the problem.

Dealing with Throwable is basically saying that the user can afford to not care about any errors until they become a problem. Having E in there is equivalent to Java's checked exceptions, which have been an annoyingly bad idea, resulting in users wrapping them in RuntimeException, or worse, ignoring them completely.

The web is littered with articles on why checked exceptions were a bad idea and many of those reasons are also very relevant for an IO[E, A]:

Checked exceptions I love you, but you have to go
The Trouble with Checked Exceptions, an interview with Anders Hejlsberg

Among the problems cited:

empirical evidence suggests that most checked exceptions in Java are either ignored or rethrown, forcing people to write catch blocks that are meaningless; this is relevant for IO[E, A] as well, because if you have an IO[E1, A] and you combine it with IO[E2, B], then you have to create an E3 that can express both E1 and E2
the noise of dealing with errors re-cast to other types is problematic because users will train to ignore catch blocks that might actually have useful information
if you have a very explicit type, like FileNotFoundException, that doesn't mean you can recover from it; if a file isn't found, it's a pretty serious app configuration problem, the developer having missed a case — you might be able to recover from it, e.g. by showing the user a warning, but you probably can't replace that missing file, so the specific error we're talking about doesn't help
scalability of development is a problem, i.e. let's say that at some point a foo() is able to terminate with a FileNotFoundException, but by using checked exceptions or IO[E, A] this error becomes part of the signature; this means that you cannot change the function's implementation without breaking all callers, so suppose you change the implementation from reading files on disk to doing HTTP requests or whatever and as such it is no longer able to throw FileNotFoundException; so you have to either change the type, breaking backwards compatibility, or you can lie to the user that the function can indeed throw FileNotFoundException, thus leading to unreachable code
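The first point in the list (needing an E3 that can express both E1 and E2) can be sketched with plain `Either` from the standard library standing in for `IO[E, A]`. All error types and functions here are hypothetical.

```scala
object WidenSketch {
  final case class ParseError(input: String) // plays the role of E1
  final case class DbError(code: Int)        // plays the role of E2

  // The "E3" that has to be invented to hold both error types.
  sealed trait AppError
  final case class FromParse(e: ParseError) extends AppError
  final case class FromDb(e: DbError) extends AppError

  def parse(s: String): Either[ParseError, Int] =
    s.toIntOption.toRight(ParseError(s))

  def lookup(id: Int): Either[DbError, String] =
    if (id == 1) Right("alice") else Left(DbError(404))

  // Each step has to be re-wrapped before the two can be combined.
  def user(s: String): Either[AppError, String] =
    for {
      id   <- parse(s).left.map[AppError](FromParse(_))
      name <- lookup(id).left.map[AppError](FromDb(_))
    } yield name
}
```

The `left.map` calls are the re-casting noise being described: they carry no logic of their own, yet every caller that mixes the two error types has to write them.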

Also to quote Anders Hejlsberg:

It is funny how people think that the important thing about exceptions is handling them. That is not the important thing about exceptions. In a well-written application there's a ratio of ten to one, in my opinion, of try finally to try catch. Or in C#, using statements, which are like try finally.

We're making fun of Go for ignoring decades of language research, but this would be IMO a case going in the opposite direction, ignoring the decades of experience we've had with exceptions.

Answer 12 · 2018-05-06T12:34:55.000Z

@alexandru I understand your points, thanks for the detailed answer!

Answer 13 · 2018-05-06T14:49:51.000Z

I agree that binary IO (I don't like "bifunctor" for reasons discussed in #197) doesn't capture all the benefits of unary IO. My interest in debating this now is in evaluating whether unary IO is a specialization of binary IO with extra laws, or represents an entirely parallel hierarchy. More to the point, whether the prospective cats.effect.biwhatever is a breaking change or a feature release.

Answer 14 · 2018-05-06T15:27:40.000Z

@alexandru thank you for your honest opinion, this is a very valuable discussion and I think most of us agree on a lot of points, so I'll try to address some of your points.

The primary reason for why I don't like it is because Throwable is out of the picture and that means we will no longer catch Throwable in delay, in suspend or in flatMap.

I agree with you that Throwable is extremely important, but we can still catch exceptions in delay and suspend (though you're totally right about flatMap). If you look at BIO.apply, which is the synchronous delay counterpart, it takes a function Throwable => E to create a BIO[E, A].
So yes, Throwable has to remain an important part of cats-effect no matter what, because as Daniel put it, it does reflect the runtime.
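A rough sketch of what such a constructor could look like, using only the standard library. This is not the actual BIO code; the class layout, `unsafeRun`, and the `ConfigError` type are assumptions made for illustration.

```scala
import scala.util.control.NonFatal

object DelaySketch {
  // Hypothetical stand-in for BIO: a thunk that returns Either.
  final class BIO[+E, +A](val unsafeRun: () => Either[E, A])

  object BIO {
    // Exceptions thrown by `body` are caught here and mapped into E by `onError`.
    def apply[E, A](body: => A)(onError: Throwable => E): BIO[E, A] =
      new BIO[E, A](() =>
        try Right(body)
        catch { case NonFatal(t) => Left(onError(t)) }
      )
  }

  sealed trait ConfigError
  final case class BadPort(message: String) extends ConfigError

  def readPort(raw: String): BIO[ConfigError, Int] =
    BIO[ConfigError, Int](raw.toInt)(t => BadPort(t.getMessage))
}
```

The exception is still caught at the boundary where the side effect is suspended; what changes is that the caller decides up front how a `Throwable` is translated into the typed error.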

The premise that @LukaJCB writes about in Rethinking MonadError is that, due to this uncertainty about which operations trigger errors and which don't, we're forced to do attempt everywhere, but that is not a correct premise. The way exceptions work and why they were introduced in LISP and later in C++, is that you only catch exceptions at the point where you can actually do something about it.

Do you need to handle exceptions for all 3 operations? Of course not, even if all 3 operations can fail. You only handle errors at the point where you can recover — in case of web requests recovery can simply imply a log and an HTTP 500 error, which is totally fine, as the server can keep on going instead of crashing.

Maybe I worded things badly, but I don't think that's the premise at all.
I disagree that we should use attempt everywhere and I'd argue that the premise of the article is separating IO values whose errors have already been handled from those that have not.
So if you look at that snippet you posted:

val x: IO[A] = for {
  r1 <- op1
  r2 <- op2
  r3 <- op3
} yield r1 + r2 + r3

With standard MonadError, handling the error at that stage using something like handleError means you still get an IO[A], whereas something like what's described in that blog post would give you a value that encodes this fact in the type system, with something like UIO[A].
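Here is a small sketch of that difference, with hypothetical `BIO` and `UIO` types built on the standard library (not the types from the blog post or from cats-effect). Once every error is handled, the error parameter becomes `Nothing`, and the result type records it.

```scala
object HandledSketch {
  // Hypothetical two-parameter effect, modelled as a thunk returning Either.
  final case class BIO[+E, +A](run: () => Either[E, A]) {
    // Handling every error leaves nothing in the error channel.
    def handleError[A1 >: A](f: E => A1): BIO[Nothing, A1] =
      BIO[Nothing, A1](() => run() match {
        case Right(a) => Right(a)
        case Left(e)  => Right(f(e))
      })
  }

  type UIO[+A] = BIO[Nothing, A]

  val risky: BIO[String, Int] = BIO(() => Left("timeout"))
  val safe: UIO[Int] = risky.handleError(_ => 0)

  // Running a UIO needs no real error branch: the Left case holds a Nothing.
  def runUIO[A](uio: UIO[A]): A = uio.run() match {
    case Right(a) => a
    case Left(n)  => n
  }
}
```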

IO[E, A] looks cool, but what happens downstream to the types using it? Monix's Iterant for example is Iterant[F[_], A]. Should it be Iterant[F[_], E, A]? Or maybe Iterant[F[Throwable, _], A]? Or Iterant[F[_, _], E, A]?

If I parameterize the error in Iterant, how could Iterant keep on working with the current IO that doesn't have a E parameter? And if Iterant works with IO[Throwable, _], then what's the point of IO[E, A] anyway?

I fully agree on this, and I think it's the most important issue with BIO. There is loads and loads of code that's currently parametrized for F[_] and that means either rewriting a lot of it to make use of two type parameters, or fitting BIO into F[_] by fixing the left part of the type constructor to a specific type (something like BIO[E, ?]), but that means again that we can't make use of the fact that we can change the error type.

I see it as somewhat similar to how IndexedStateT is really nice in theory, but it doesn't fit neatly within the Monad structure, so working with it in polymorphic code is super impractical.
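To illustrate the "fixing the left part" option, here is a sketch with a minimal, hand-rolled `Monad` trait (not the cats one) and `Either` standing in for BIO. The names `pair` and `StringOr` are hypothetical.

```scala
object KindSketch {
  // Minimal stand-in for a Monad type class over F[_].
  trait Monad[F[_]] {
    def pure[A](a: A): F[A]
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  }

  // Polymorphic code that only knows about a one-parameter F[_].
  def pair[F[_], A, B](fa: F[A], fb: F[B])(implicit F: Monad[F]): F[(A, B)] =
    F.flatMap(fa)(a => F.flatMap(fb)(b => F.pure((a, b))))

  // A two-parameter type fits F[_] only once its error type is pinned down.
  type StringOr[A] = Either[String, A]

  implicit val stringOrMonad: Monad[StringOr] = new Monad[StringOr] {
    def pure[A](a: A): StringOr[A] = Right(a)
    def flatMap[A, B](fa: StringOr[A])(f: A => StringOr[B]): StringOr[B] =
      fa.flatMap(f)
  }

  val ok: StringOr[Int] = Right(1)
  val bad: StringOr[Int] = Left("boom")
  val both: StringOr[(Int, Int)] = pair[StringOr, Int, Int](ok, bad)
}
```

Inside `pair` the error type is invisible, so nothing written against `F[_]` can change it from one step to the next, which is the limitation described above.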

There is always the problem that as an approach it is totally unproven and I don't want us to fall into that trap, just because Scalaz 8 is doing it, a library version that nobody is using due to it not being released yet.

Agree here yet again. 👍

And if the current IO ever happens as a type alias for an IO[E, A], personally I'll stop using it, plus cats-effect as a Monix dependency will be in serious doubt.

This is unfortunate and, as I said earlier, I definitely don't think we should change it for 1.0, but if a year from now we do decide to make it a type alias, what is the biggest issue you're seeing? The fact that type aliases are fairly transparent and make type errors harder to read? It'd be cool if you could elaborate on why this is a show-stopper for you :)

The web is littered with articles on why checked exceptions were a bad idea and many of those reasons are also very relevant for an IO[E, A]:

Checked exceptions I love you, but you have to go

The Trouble with Checked Exceptions, an interview with Anders Hejlsberg

I disagree checked exceptions in general are a bad idea.
Those articles are from an entirely different perspective than what we usually deal with.
Most checked Exceptions could be completely converted to Either, which I think most of us would agree has a lot of value.
I also don't necessarily think those articles constitute language research, so I'm not sure the comparison to Go holds up that well.

That said, I think one of the points you extract is on point:

if you have an IO[E1, A] and you combine it with IO[E2, B], then you have to create an E3 that can express both E1 and E2

It remains to be seen if we can overcome this issue in a nice way, if at all.

For now I'd rather focus more of my energy on unexceptional types, as they seem to provide more value to me at least empirically (and don't require enormous buy-in as BIO does), while also not suffering from the above usage problems.
There's also the fact that BIO[E, A] is completely isomorphic to EitherT[UIO, E, A]. So we could have something that resembles BIO as well without having to duplicate a bunch of code for IO and BIO (though without the performance).
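The shape of that isomorphism can be sketched with simplified stand-ins. These are not the real cats `EitherT` or any real `UIO`/`BIO`; an effect that cannot fail is modelled as a plain thunk.

```scala
object IsoSketch {
  // Simplified stand-ins, defined here only for illustration.
  type UIO[A] = () => A // an effect assumed never to fail
  final case class EitherT[F[_], E, A](value: F[Either[E, A]])
  final case class BIO[E, A](run: () => Either[E, A])

  // Both directions just re-wrap the same thunk.
  def toEitherT[E, A](bio: BIO[E, A]): EitherT[UIO, E, A] =
    EitherT[UIO, E, A](bio.run)

  def fromEitherT[E, A](et: EitherT[UIO, E, A]): BIO[E, A] =
    BIO[E, A](et.value)

  val original: BIO[String, Int] = BIO(() => Left("nope"))
  val roundTrip: BIO[String, Int] = fromEitherT(toEitherT(original))
}
```

In this model the two representations carry exactly the same information; the performance difference mentioned above would come from the extra wrapping that a transformer adds around every step, which a sketch this small does not show.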

Answer 15 · 2018-05-08T17:03:24.000Z

I'm a huge fan of this, and will be writing more on the topic soon. I understand @LukaJCB's work requires more development before a PR can be considered, but it's very promising work and I hope it will be pursued, as it generalizes both unexceptional IO, and today's classic IO, with relatively minor differences (namely, catching in map/flatMap).

Answer 16 · 2018-05-10T19:22:54.000Z

I wrote a thing on this.

Answer 17 · 2018-05-10T21:31:46.000Z

Thanks for that, John.

Answer 18 · 2018-05-11T10:05:26.000Z

An argument that I think is really well made popped up on Reddit in this post

Quoting here:

This is something I've been thinking about and I think this gets really unwieldy unless the language supports first-class extensible records (PureScript) or open unions (Dotty/Scala 3).
The reason I say this is because sum types don't work well for representing possible error values that a function can return.
Imagine we have two layers of function with the following call graph: (e.g. funcA1 calls funcC2 and funcE2)

funcA1          funcB1
  |    \      /    |
funcC2  funcD2  funcE2
All of layer 2 (funcX2) can fail with errors. Let's say they only have one error type each so for example funcC2 has the signature Either ErrorC2 ().
What should the signature be for funcA1 and funcB1? You will need to construct a new sum type for each of funcA1 and funcB1.
Now imagine you have 3 layers of this (Validation/Business Logic/Database Access) where each layer can have its own failures. When I tried this for the sake of typesafety the result was:

Very unwieldy to write and refactor

Not helpful at all - because 99% of the time you simply just rewrap the error and pass it up

Doesn't really provide any more safety than simply catching exceptions at the top layer. You often don't pattern match on the sum type so the exhaustiveness benefit is not even exploited.

If you work in a system where you're calling out to multiple other 3rd parties where the data fetched can be invalid 1% of the time, strongly typed errors are very unwieldy. Currently we define our own base exception that our errors extend, and handle it at the top level to provide good error messages & error reporting.

Over the last ~3 years I've been bitten exactly once by not having strongly typed errors (where I actually wanted to handle the error not at the top level). It doesn't feel clean but I think is the right trade-off between ergonomics and safety for my situation.

Keen to give it another shot though, working with bifunctors is a lot better now than 3 years ago.

Answer 19 · 2018-05-14T19:30:51.000Z

@LukaJCB Even the person who made the argument now agrees bifunctor IO is a good idea. 😉

Answer 20 · 2018-05-14T20:16:34.000Z

3 days later? That was fast 🙂

Answer 21 · 2019-01-31T18:19:29.000Z

Hi all, thanks for your work on this library! I was curious if there had been any further developments or discussions about this?

Answer 22 · 2019-02-01T16:48:47.000Z

@andywhite37 Bifunctorial typeclasses and IO are being discussed for cats-effect 2.0 at #321

Answer 23 · 2019-02-05T10:57:53.000Z

Hey guys, I'd like to keep the issue tracker clean, in order to focus on important issues.

We agreed to provide bifunctor versions of the type classes in 2.0, we don't currently have a timeline for when that will happen since it's plenty of work, plus the ecosystem just adopted 1.0, but rest assured that it will happen.

I'm closing this for now.