Proposal: instances for cats
jchapuis opened this issue · 7 comments
For the 🐱 lovers out there, how about including cats instances in the besom-cats package?
This would allow compact syntax like role >> roleBinding >> service
import besom.{Context, Output}
import cats.Functor

object CatsInstances {
  implicit val outputFunctor: Functor[Output] = new cats.Functor[Output] {
    def map[A, B](fa: Output[A])(f: A => B): Output[B] = fa.map(f)
  }

  implicit def outputApplicative(implicit context: Context): cats.Applicative[Output] = new cats.Applicative[Output] {
    def pure[A](x: A): Output[A] = Output(x)
    def ap[A, B](ff: Output[A => B])(fa: Output[A]): Output[B] = fa.flatMap(a => ff.map(f => f(a)))
  }

  implicit def outputMonad(implicit context: Context): cats.Monad[Output] = new cats.Monad[Output] {
    def flatMap[A, B](fa: Output[A])(f: A => Output[B]): Output[B] = fa.flatMap(f)
    def pure[A](x: A): Output[A] = Output(x)
    // NOTE: not stack-safe as written, it recurses through flatMap instead of trampolining
    def tailRecM[A, B](a: A)(f: A => Output[Either[A, B]]): Output[B] =
      f(a).flatMap {
        case Left(a1) => tailRecM(a1)(f)
        case Right(b) => Output(b)
      }
  }
}
(the tailRecM might need an actual, stack-safe implementation, which probably requires access to internals)
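For readers unfamiliar with cats syntax, `>>` is just flatMap discarding the left-hand value. A dependency-free sketch of what the proposed instances would enable, using a toy stand-in for besom's Output (the real one is asynchronous and Context-dependent):

```scala
// Toy stand-in for besom's Output, purely to illustrate the sequencing syntax.
final case class FakeOutput[A](value: A):
  def map[B](f: A => B): FakeOutput[B]             = FakeOutput(f(value))
  def flatMap[B](f: A => FakeOutput[B]): FakeOutput[B] = f(value)

extension [A](fa: FakeOutput[A])
  // cats' `>>`: sequence two effects, keeping only the second result
  def >>[B](fb: => FakeOutput[B]): FakeOutput[B] = fa.flatMap(_ => fb)

val role        = FakeOutput("role")
val roleBinding = FakeOutput("roleBinding")
val service     = FakeOutput("service")

// the compact sequencing syntax from the proposal:
val last = role >> roleBinding >> service
```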
I thought about that too, but I'm a bit apprehensive about this move because Output isn't really an equivalent of IO, at least not in the "global monad" sense. Output has Pulumi-oriented semantics and, while it prooooobably would pass discipline law tests for Functor, Applicative and Monad, it has some funny semantics related to dry runs (previews). In previews there are actually two different kinds of Outputs: static ones, which are just like IO (if they say they are Output[A], flatMapping on them will give you an A, provided the Output hasn't failed), and computed ones, which contain values resolved from providers during the actual application of infrastructure. Computed Outputs behave like Option[A] in dry runs: they short-circuit without a way to inhibit that (though we could add a combinator providing a dummy value in dry runs) and do not run their flatMaps. This is all probably fine as far as the laws go, but it is surprising to the end user. For instance, a 🐱 aficionado would expect to be able to just flatMap on everything that is a monad and write something like the following in TF/MTL style (please forgive me if this makes little sense for some reason; I'm assuming here that the intent is to write TF over Output, with monad transformers over Outputs as the F implementation):
// assuming we can lift Output to F using a typeclass
def doThings[F[_]: Async: LiftOutput]: F[Unit] =
  for
    uuid <- UUIDGen.randomUUID[F]
    // this is a resource constructor; it returns a static Output, behaves like IO
    bucket <- s3.Bucket("my-bucket", s3.BucketArgs(name = s"my-bucket-$uuid")).lift[F]
    // properties on resources are computed Outputs, so this will short-circuit in dry run and break the plan
    bucketName <- bucket.name.lift[F]
    _ <- uploadAFile[F]("my-file", bucketName, "./my-file.html")
  yield ()
def uploadAFile[F[_]: Async: LiftOutput](
  name: NonEmptyString,
  bucketName: String,
  path: Path
): F[s3.BucketObject] =
  s3.BucketObject(
    name,
    s3.BucketObjectArgs(
      bucket = bucketName,
      key = name,
      source = pulumi.FileAsset(path.toString),
      etag = std.filemd5(input = path)
    )
  ).lift[F]
In a dry run, however, this would completely skip the part where a file is uploaded to the bucket, whereas without the flatMap this operation would show up in the plan produced by the dry run / preview.
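The short-circuit behavior described above can be made concrete with a toy model (my own simplification, not besom's actual implementation): in preview, a computed Output behaves like Option's None, so a resource constructor called inside its flatMap is silently skipped and never reaches the plan:

```scala
// Toy model of preview-time Output semantics: Known ~ static, Unknown ~ computed.
enum PreviewOutput[+A]:
  case Known(value: A) // static Output: behaves like IO, value is available
  case Unknown         // computed Output: value only exists at apply time

  def flatMap[B](f: A => PreviewOutput[B]): PreviewOutput[B] = this match
    case Known(a) => f(a)    // runs the continuation
    case Unknown  => Unknown // short-circuits: f is never called, like Option's None

import PreviewOutput.*

var resourceConstructorCalled = false
def constructBucketObject(bucketName: String): PreviewOutput[String] =
  resourceConstructorCalled = true
  Known(s"bucket-object-in-$bucketName")

// bucket.name is a computed Output, so in preview it is Unknown...
val computedName: PreviewOutput[String] = Unknown

// ...and flatMapping on it silently skips resource creation in the plan:
val result = computedName.flatMap(name => constructBucketObject(name))
```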
There's also another problem: Besom is not written in tagless final style, so one can't put an F[_] into the Stack.exports clause, which only works with Outputs. This makes it very, very hard to use higher abstractions over Outputs, and I sadly have to admit that this is by design. The intent was not to make TF/MTL style inconvenient, but to make it clear that Besom is a higher-level, domain-specific DSL that allows embedding parts of programs written in any other style as side-effecting code performing auxiliary tasks during infrastructure management work. So the intended direction is for things to end up translating to Outputs: Outputs are the final types in which infrastructural programs are expressed. They are also meant to be used as pipes that transform data between resource properties and other resources' inputs, not as global effect monads, because it is strongly preferable to make all resource definitions top-level.
There's also ongoing work to make the distinction between computed and static Outputs visible at the type level and to warn users about the possibility of a broken plan in dry run when a resource constructor is called inside a flatMap on a computed Output. It is possible that this work will split the two kinds of Outputs into two separate types, and that would force us to do even more magic with the cats instances, I'm afraid.
If you have another use case for these instances in mind, please do tell! I'm also thinking about converting this issue to a discussion: this is quite an important topic, we would really like to make it very obvious how Outputs work, and discussions are probably easier to find and pin than issues.
Just to be clear, this is what I mean by Outputs being the final types, with IOs/ZIOs/Futures subsumed into them:
def doThings: Output[Unit] = {
  // notice the `p""` interpolator:
  val bucket = s3.Bucket(
    "my-bucket",
    s3.BucketArgs(name = p"my-bucket-${Output.eval(UUIDGen.randomUUID[IO])}")
  )
  // no flatMap: the bucket name property is passed as an Output, direct syntax available via lifting
  uploadAFile("my-file", bucket.name, "./my-file.html").void
}
def uploadAFile(
  name: NonEmptyString,
  bucketName: Output[String],
  path: Path
): Output[s3.BucketObject] =
  s3.BucketObject(
    name,
    s3.BucketObjectArgs(
      bucket = bucketName,
      key = name,
      source = pulumi.FileAsset(path.toString),
      etag = std.filemd5(input = path)
    )
  )
info about lifting: https://virtuslab.github.io/besom/docs/lifting
Thanks for the elaborate response! I have to admit I haven't tried running the Monad laws on Output; I can try when I find a moment.
My immediate concrete use case was the >> operator, but I suppose that could be added directly in the extensions? I find that monad transformers might also become interesting if I need to integrate conditional logic (OptionT) and/or error handling (EitherT) when dealing with some dynamic deployment code.
Tagless final, why not, for reusable bits of logic, but it's not immediately obvious, especially since, as you say, the design doesn't seem prepared for pluggable interpreters of the monadic chain (if I understood correctly).
(btw, unrelated, but it seems like both IntelliJ and Metals are struggling with type inference; not sure if that's work in progress or just the current state of Scala 3 integration)
To follow up on this, have you considered representing the Context as a Reader, rather than a given instance? Maybe you discarded this option to support the literals?
On the plus side, I'm thinking a Reader would make it natural to build a besom program value, pass it around, and interpret it with different contexts (for instance, for testing the program). It might also help with type inference and IDE performance (not sure, just a guess).
Hey, I've been thinking about this a lot lately. I considered using a reader monad in the design stage, but I wasn't able to come up with a solution that preserves a pretty important invariant we can never break. Context carries around a reverse semaphore, internally called TaskTracker. This semaphore is created with Int.MaxValue permits, and there's a method called waitForAllTasks that takes Int.MaxValue permits from the semaphore, effectively blocking until all permits are returned. Every Output ever created has to register with this semaphore and take a lease, which is returned once said Output finishes its computation. This is very important because we run gRPC tasks in a fire-and-forget fashion and resolve all resources using Promises (a resource is just a case class containing Outputs returned from Promise#get, and there's a fork handling the gRPC call that resolves said promises with results or failures). I've been pretty anal about this invariant because I didn't want to introduce very hard to diagnose early-exit errors, so even Output.pure registers with the TaskTracker (I am aware that this is probably unnecessary).
Having worked on the internals for so long now, I have a better grasp on what should be tracked and when, and I'm slowly coming around to the idea that we could loosen tracking a bit. Should that be possible, we could change the design to use a reader monad and pass the context internally in Result. It would require, however, some work on the debuggability of TaskTracker, with an easy way to log how many permits are taken at any given point in the program, and then some careful testing. Possible, just not easy.
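The reverse-semaphore invariant described above can be sketched with a plain java.util.concurrent.Semaphore. This is my own minimal illustration of the mechanism, not besom's internals:

```scala
import java.util.concurrent.Semaphore

// Reverse semaphore: starts with all permits available; each in-flight task
// holds one permit, and waitForAllTasks blocks until every permit is back.
final class TaskTrackerSketch:
  private val permits   = Int.MaxValue
  private val semaphore = new Semaphore(permits)

  // called when an Output starts computing (e.g. a fire-and-forget gRPC task)
  def registerTask(): Unit = semaphore.acquire()

  // called when that Output finishes its computation
  def completeTask(): Unit = semaphore.release()

  // takes ALL permits, so it can only succeed once every task has completed;
  // releases them again so the tracker stays reusable
  def waitForAllTasks(): Unit =
    semaphore.acquire(permits)
    semaphore.release(permits)

  // useful for the debuggability mentioned above: permits currently held
  def inFlight: Int = permits - semaphore.availablePermits()

val tracker = new TaskTrackerSketch
tracker.registerTask()
tracker.registerTask()
val before = tracker.inFlight // 2 tasks in flight
tracker.completeTask()
tracker.completeTask()
tracker.waitForAllTasks()     // returns immediately: all permits are back
val after = tracker.inFlight
```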
Thanks for the detailed answer! From your description, I understand one challenge is to implement a kind of thread barrier for the completion of all the asynchronous tasks.
My suggestion of using a reader was also a question regarding the feasibility of separating the formulation of the besom program from its execution in such a way that the pulumi instructions could be treated as pure values. But I'm not familiar with your internals, so feel free to discard 😄
So, with a definition a bit like this:
case class BesomProgram[A](run: Output[Context => A]):
  def map[B](f: A => B): BesomProgram[B] = ???
  def flatMap[B](f: A => BesomProgram[B]): BesomProgram[B] = ???
then your program could be
val program: BesomProgram[Stack] = ...
so that you could do
val stack = program.run(Output.pure(realContext)).interpret(realPulumi)
val assertions = program.run(Output.pure(testContext)).interpret(munitTester)
val diagram = program.run(Output.pure(previewContext)).interpret(fancyVisualizer)
Essentially, all accesses to pulumi would somehow have to be tracked into Output values (kind of a writer) but not executed directly. Hopefully I'm not suggesting re-inventing an effect system here: Output isn't a higher-kinded type constructor, so maybe complexity-wise it's reasonable. Scala code expressing the besom program could still side-effect, but everything effecting pulumi would be pure. Then the orchestration of asynchronicity when dealing with pulumi would probably be easier to express in the interpreter than it is today.
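The separation being suggested is essentially the classic Reader pattern: the program is a pure value that only runs once a context is supplied. A minimal dependency-free sketch, with illustrative names rather than besom API:

```scala
// Minimal Reader: a program that needs an R to produce an A.
final case class Reader[R, A](run: R => A):
  def map[B](f: A => B): Reader[R, B]                = Reader(r => f(run(r)))
  def flatMap[B](f: A => Reader[R, B]): Reader[R, B] = Reader(r => f(run(r)).run(r))

// A hypothetical context: here just a stack-name prefix.
final case class Ctx(prefix: String)

def bucketName(base: String): Reader[Ctx, String] =
  Reader(ctx => s"${ctx.prefix}-$base")

// The program is built once, as a pure value...
val program: Reader[Ctx, String] =
  for
    name <- bucketName("my-bucket")
  yield name.toUpperCase

// ...and interpreted against different contexts:
val real = program.run(Ctx("prod"))
val test = program.run(Ctx("test"))
```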
Such a formulation doesn't seem to prevent our two biggest problems as far as I can see:
- in dry run computed Outputs behave like None and short-circuit
- there are grpc calls that have to be called exactly once
The 2nd would actually be much easier to deal with (in a usual pure-FP, for-comprehension-based monadic API you just flatMap things, and that's how you control evaluation count), if not for the 1st!
The current state of the SDK is extremely similar to the other Pulumi SDKs, and that's by design (to allow users of other SDKs to understand Besom by looking at the code). The divergence from the behavior of other SDKs is very small: there's a single behavioral difference, caused by our need to memoize resource constructor calls, and a single syntactic difference, in that we return resources in Outputs while other SDKs return them unwrapped. We could potentially diverge a bit more if it meant a radical improvement in usability.
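The exactly-once requirement for resource constructor calls can be illustrated with a simple memoization sketch. This is my own toy, assuming the resource name serves as the cache key, which is how I read the comment above:

```scala
import scala.collection.mutable

// Toy memoized resource constructor: the underlying (gRPC-backed) call
// must run at most once per logical resource name.
var grpcCalls = 0
val cache = mutable.Map.empty[String, String]

def registerResource(name: String): String =
  cache.getOrElseUpdate(name, {
    grpcCalls += 1          // stands in for the real fire-and-forget gRPC call
    s"urn:fake:$name"       // hypothetical URN returned by the engine
  })

val a = registerResource("my-bucket")
val b = registerResource("my-bucket") // memoized: no second gRPC call
```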
I was thinking about an auto-derived Default[A] for everything that arrives from the engine via gRPC, so that we could substitute the missing values in dry run. This would alleviate 1) and allow users to just flatMap everywhere, but unfortunately it would make things extremely fragile:
a) let's assume a field of type String on some resource that gets populated during runtime
b) nothing prohibits the user from mapping/flatMapping on that Output and parsing said String
c) said String would adhere to some structural convention of a cloud provider, but we have no way of capturing that information, therefore
d) our dummy string from Default[String] would fail in the user's code; depending on the user's logic, this error would either crash the dry run phase or, worse, yield a plan different from what is going to be applied!
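The fragility in d) is easy to demonstrate with a toy Default[String] (hypothetical, not a besom typeclass): the generic dummy value breaks the moment user code assumes provider-specific structure:

```scala
// Hypothetical Default typeclass used to substitute unknown values in dry run.
trait Default[A]:
  def value: A

given Default[String] with
  def value: String = "" // generic dummy; knows nothing about provider formats

// User code parsing a provider-shaped string, e.g. an ARN like
// "arn:aws:s3:::my-bucket" where the bucket name is the last segment.
def bucketNameFromArn(arn: String): Either[String, String] =
  arn.split(":") match
    case Array("arn", _, "s3", _, _, name) => Right(name)
    case _                                 => Left(s"not a valid ARN: '$arn'")

// At apply time the real value parses fine...
val applyTime = bucketNameFromArn("arn:aws:s3:::my-bucket")
// ...but in dry run the substituted dummy breaks the user's logic.
val dryRun = bucketNameFromArn(summon[Default[String]].value)
```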
This is just one option, but it's fairly easy to see that it would be troublesome. The other aforementioned option is to detect usage of resource constructors inside Output#flatMap (this is the direction we're exploring the most). It would push users towards the style we currently promote (kind of like direct style, with Outputs used to transform properties that are then fed to the Inputs of other resources), which is also similar to the other SDKs.