Should ap be parallel?
rpominov opened this issue · 31 comments
Strictly speaking this would violate Fantasy Land and Static Land requirements.
Here is an explanation I wrote in Fantasy Land gitter room recently:
So the spec require:
If a data type provides a method which could be derived, its behaviour must be equivalent to that of the derivation (or derivations).
Also spec says that
ap
can be derived asfunction(m) { return this.chain(f => m.map(f)); }
. In case of tasks the derived ap will execute tasks in sequence. So basically if provided ap is parallel then it's not equivalent to derived ap. Of course we could choose to consider sequential/parallel execution as an optimization unrelated to equivalence, since we basically can define equivalence as we like. But I think that would be wrong, sequential/parallel execution is related to equivalence.
But I can't think of actual practical consequence of this violation. And on the other hand it would be nice if traverse
of types other than array would execute tasks in parallel (we have built-in kinda traverse
for arrays — Task.parallel()
).
Also other libraries like Fluture or data.task have parallel ap
and seem to not have any trouble with it. @Avaq @robotlolita Maybe you can share some thoughts?
Practical consequence is that refactoring or reusing code must be done very carefully to make sure you don't accidentally make things sequential or parallel.
Yeah, good point. We won't be able to rely on algebra laws during refactoring.
Just an idea:
// TaskAp Is Applicative but not Monad so `ap` can be parallel
// Task is Monad so `ap` can't be parallel
import {Task, TaskAp} from 'fun-task'
// get :: a -> Task r a
// lift2 :: (Applicative f) => (a -> b -> c) -> f a -> f b -> f c
// toTaskAp :: Task r a ~> TaskAp r a
// toTask :: TaskAp r a ~> Task r a
const main = lift2(
(a) => (b) => [a,b],
get('/a').toTaskAp(),
get('/b').toTaskAp()
).toTask().chain(
([a,b]) => .....
)
something like Concurrently
I was thinking about having two static-land dictionaries. So with static-land it could look like this:
import {Task, TaskAp} from 'fun-task'
function lift2(Type, fn, a, b) {
return Type.ap(Type.map(x => y => fn(x, y), a), b)
}
const main = Task.chain(
([a, b]) => ...,
lift2(TaskAp, (a, b) => [a, b], get('/a'), get('/b'))
)
So a Task could be used as Task and as TaskAp at the same time without conversion? that looks unfamiliar
This is just how static-land types work. They don't require wrapping or conversion. We could do the same with arrays for example:
const List = {
map(f, a) {
return a.map(f)
},
ap(f, a) {
return f.map(f => a.map(f)).reduce((r, i) => r.concat(i), [])
},
}
const ZipList = {
map(f, a) {
return a.map(f)
},
ap(f, a) {
return f.map((f, index) => f(a[index]))
},
}
function lift2(Type, fn, a, b) {
return Type.ap(Type.map(x => y => fn(x, y), a), b)
}
lift2(List, (x, y) => x * y, [1, 2], [3, 4]) // [3, 4, 6, 8]
lift2(ZipList, (x, y) => x * y, [1, 2], [3, 4]) // [3, 8]
👍 I have taken look at static-land and now it makes sense!
const main = lift2( (a) => (b) => [a,b], get('/a').toTaskAp(), get('/b').toTaskAp() ).toTask().chain( ([a,b]) => ..... )
^ this, is what Parallel spec will look like but toTaskAp
will be parallel
and toTask
sequential
: fantasy-land#186
@rpominov IMO without a formal definition of "equivalent" is hard to settle this kind of problems. Given the current informal definition:
Terminology
- "value" is any JavaScript value, including any which have the
structures defined below. - "equivalent" is an appropriate definition of equivalence for the given value.
The definition should ensure that the two values can be safely swapped out in a program that respects abstractions. For example:- Two lists are equivalent if they are equivalent at all indices.
- Two plain old JavaScript objects, interpreted as dictionaries, are equivalent when they are equivalent for all keys.
- Two promises are equivalent when they yield equivalent values.
- Two functions are equivalent if they yield equivalent outputs for equivalent inputs.
both parallel ap
and derived ap
return a promise that yields an "equivalent" value, hence they should be considered "equivalent".
Note that by the same informal definition the following function is not ("equivalent"-)pure
const f = (url: string): Promise<string> => axios.get(url)
because when called twice with the same argument, it may return promises that yield different values.
The following random
function is also considered ("equivalent"-)impure (this seems to defeat the very purpose of having an IO
abstraction)
type IO<A> = () => A;
function random(): IO<number> {
return () => Math.random()
}
because when called twice, it returns functions that yield different values with the same input (undefined
).
Basically it would mean to define a suitable equivalence relation. In static-land there is Setoid
which represents equivalence relations. Maybe the spec should include a (Setoid a) =>
constraint when it describes the laws and/or an official Setoid
instance for all types.
For example,
the string
type
if a: string and b: string then a ≡ b if and only if a === b
the Array
type
if a: Array<A> and b: Array<B> then a ≡ b if and only if A = B and a.length === b.length
the Promise
type
if a: Promise<A> and b: Promise<B> then a ≡ b if and only if A = B
The last equivalence relation on Promise
would solve this issue (yes ap
s are equivalent) and the purity problem
EDIT: in the Array
example was missing: a[i] ≡ b[i]
at all indices
Yea, it really depends on how we define equivalence. And I think the definition should be based on referential transparency. E.g. if we replace value A with value B in a program and it still be the same program then A and B are equivalent. When we trying to decide if two programs are the same, we have to take into account not only our pure code that uses our pure abstraction, but also all side effects that happen when we "run" our "program" (data structure) that we have build using pure abstraction.
Getting back to Task, when we create our tasks and compose them using all available methods like chain/ap/race/parallel/etc
we write pure referential transparent code. For example if we have two tasks that we consider equivalent, we should be able to replace one with the other and get the same program.
So we construct our complex pure structure with tasks as building blocks. This structure is just data, although it contains some functions that will perform side effects, but until we run them we can consider them as data. This data structure may look for example like this:
{
type: 'race',
elements: [
{type: 'of', value: 1},
{type: 'map', function: x => x + 1, parent: {type: 'of', value: 2}},
{type: 'create', impureComputation: f},
]
}
Then we run this data structure using task.run()
method, at which point side effects start to happen.
Now the important idea is that when we consider if two tasks are equivalent, we must take into account not only values that they produce but also side effects that they perform, including order in which side effects happen. So in my opinion, parallel / sequential execution matters when it comes to equivalence of tasks. Otherwise we get a leaky abstraction.
Two promises are equivalent when they yield equivalent values.
Promises are really a bad example because they are impure to start with. I guess this example included there only because of familiarity with promises. So people could get the idea quicker.
In static-land there is Setoid which represents equivalence relations.
It really hard to implement Setoid for any reasonable definition of equivalence of tasks I think.
After thinking a bit more: Promises represent only result of a computation while tasks represent entire computation. So with promises it's correct to define equivalence based only on the value that they produce since they represent only the value. But not so with tasks.
if we have two tasks that we consider equivalent, we should be able to replace one with the other and get the same program
Actually you get the same program, based on the types.
A type Task<A>
means that you'll get a value of type A
, nothing more. If the side effects executed in order to get that value are sequential or parallel doesn't really matter, because that information is not encoded in the type.
If you want to make them different, you have to encode the difference in the types, otherwise the different behaviour simply does not exist from the point of view of the type system and it's only an implementation detail.
Yeah, from types perspective sure. But from that standpoint (x, y) => x + y
and (x, y) => x * y
are also the same programs. I mean if we look at the interface (number, number) => number
. Or [1]
and [2]
are equivalent values (both Array<number>
). So I just don't get it, seems like not what we need.
But from that standpoint f = (x, y) => x + y and g = (x, y) => x * y are also the same programs
IMO it always depends on the underlying equivalence relation. The informal "same" or "equal" should always be a formal "equivalent". For functions, as f
and g
, the equivalence relation would be
f ≡ g if and only if for all a, b in number f(a, b) ≡ g(a, b)
then f
and g
would be equivalent if you choose
const numberSetoid: Setoid<number> = {
equals: (x, y) => true
}
(This is weird but no more weird than seeing 1 + 1 = 0
in Z_2, the group with addition modulo 2)
But another (more sensible) choice is
const numberSetoid: Setoid<number> = {
equals: (x, y) => x === y
}
then f
and g
are not equivalent. It depends on how you want to model your system I guess.
For Task
the equivalence relation could be
a: Task<A> ≡ b: Task<B> if and only if A = B
so effects don't matter and a function like () => Task<number>
can be pure.
We could try to model our system differently and choose another equivalence relation for Task
, trying to encode the order of side effects, but how? (genuine question)
a: Task<A> ≡ b: Task<B> if and only if ???
f ≡ g if and only if for all a, b in number f(a, b) ≡ g(a, b)
This sounds good.
a: Task<A> ≡ b: Task<B> if and only if A = B
Although I'm not sure I understand this correctly. If A and B here are types, and say we substitute them with number
, does this mean that a
and b
are equivalent if they both produce numbers, even if a
produces 1 and b
produces 2?
Although if A and B are values, than it makes sense (e.g. a
and b
are equal if they both produce 2), but notation is kinda strange then.
Anyway, as I was saying in #28 (comment) , considering only produced values is not enough.
We could try to model our system differently and choose another equivalence relation for Task, trying to encode the order of side effects, but how?
I guess we could say something like: a
and b
are equivalent if they produce the same value and cause the same side effects when run. This probably not formal enough, but I think that to use simpler definition that only considers produced values is just not practical.
Update: At least it's not practical in general, although I can agree that there can be cases when side effects don't mater so much. For instance we have a task that fetches data from server, and we don't really care if it makes two request or one, as long as it fetches the same data.
If A and B here are types, and say we substitute them with number, does this mean that a and b are equivalent if they both produce numbers, even if a produces 1 and b produces 2?
Yes, they are types. I come up with that definition, which at first seems weird, in order to solve the purity problem. Let's say we have the following function
f: string -> Task<string>
If we state that a
and b
are equal if and only if they both produce the same value (read "equivalent" values), how f
can be pure? We call f
twice with the same value as input and we may get different (read "not equivalent") results. Same for IO
:
type IO<A> = () => A;
const g = (): IO<number> => () => Math.random()
g
is supposed to be pure, but it is not if we don't use a suitable equivalence relation on IO
a: IO<A> ≡ b: IO<B> if and only if A = B
For instance we have a task that fetches data from server, and we don't really care if it makes two request or one, as long as it fetches the same data
Exactly, if we don't encode the different behavior in the types or in a suitable equivalence relation, when we write Task<A>
we only care about the result, not how we get that result.
My point is that systems can be modeled in different ways, depending on the properties we care about, using types and formal definitions (like the equivalence relations). If we are not able to encode or specify formally a property we care about, well... I think we have a problem: either we can find a formal definition or maybe we should abandon that idea. Writing an informal definition is brittle and can lead to theoretical errors.
@gcanti g
is pure up until it is performed. Some languages have tools to ensure IO can only be performed by a runtime. We have to pretend in JS.
This discussion is has some weird theoretical ideas. There is a very simple and practical reason why ap
should have consistent parallelism with chain
:
No-cost code reuse and refactoring are the reasons for the Fantasy Land specification. I want to never think about code reuse or refactoring. If code reuse or refactoring changes parallelism of my program, it now requires my thought (because parallelism is not free) and I have lost all benefits of the specification.
@gcanti I think I starting to understand what you mean. The problem is that we have a single type Task
that can encode many kinds of effects. If instead we had FetchTask
, ManyFetchesInParralleTask
, ManyFetshesInSequnceTask
etc., then for instance it would be obvious that a value of type ManyFetchesInParralleTask<number>
is not equivalent to a value of type ManyFetshesInSequnceTask<number>
. But I have no idea how this could work in practice. Seems that we end up with basically writing our programs completely in types.
@puffnfresh If g
is pure and a = g(), b = g()
then we should have a ≡ b
. But a
and b
are functions so they are equivalent if and only if they return the same output (which is not the case). Hence g
is not pure, unless we make sure that a ≡ b
with an alternative equivalence relation on IO
.
g is pure up until it is performed
I don't understand the underlying model here, how can a property like purity change? In my mind purity is a static property. In set theory, my reference model, (pure) functions are just subsets of the cartesian product Domain x Codomain
(*). In the case of g
, if the codomain set IO<number>
(actually a quotient set because of the involved equivalence relation) contains only one element, then we can say that g
is a (pure) function.
(*) such that every element of Domain
is the first component of one and only one ordered pair in the subset
@gcanti we have encoded IO using functions but we have to pretend there is no domain. a = g(), b = g()
is not a thing we should be able to write, even if we actually can in JS.
@puffnfresh in my mental model the domain is Void
(or Unit
or how else we want to call the initial object)
g: Void -> IO<number>
In order to be pure g
must be constant, hence IO<number>
must contain only one inhabitant
@puffnfresh Ah yes sorry, I wrote "initial" instead of "terminal", I don't mean the empty type (for example in Flow would be the type empty
), but the unit type, i.e. the type with only one inhabitant (for example in Flow would be the type void
, which contains only undefined
). So g()
is can be modeled as g(undefined)
@gcanti Unit -> IO a
is the same as IO a
. I don't think I understand the model.
But, if it were the model:
a = g(undefined)
b = g(undefined)
Should mean a = b
, assuming g
only constructs an appropriate IO value.
@rpominov are you planning on releasing the applicative (or alternative) version of task?
I don't plan to do anything soon. I still don't have a strong opinion on how this should be solved from the library API perspective. Also don't have much time to work on this currently. But I may get back to this in the future.
Meanwhile, you could use concurrify
to wrap the FunTask in a manually generated ConcurrentFunTask type.
Excellent discussion!
My 2 cents:
- Task equivalence should be based on the execution graphs defined by the
Task
s.
When you use the ap
, chain
, etc. combinators on Task
s, you combine their execution graphs.
These combinators are pure functions.
They perform no side effects.
They take execution graphs as inputs and return execution graphs as outputs.
Only when you call run
(an impure "function" or procedure) with a Task
does any IO happen, and at that point, the only guarantee you ever have is that run
will use the execution graph encoded by the provided Task
. As soon as you perform IO, you have the potential for all kinds of failures and partial results.
In this light, the value resulting from calling run
with a given Task
cannot be used as the basis for an equivalence relation. An equivalence relation is reflexive, symmetric, and transitive. Imagine a Task
fetchMy2
that fetches a 2
stored in a database across the network. run(fetchMy2)
might yield a 2
, or an Error("Network unavailable")
, or even a "Note from your DBA: April Fools!"
.
So run(fetchMy2) != run(fetchMy2)
, which fails the reflexivity requirement of an equivalence relation.
However, equivalence can be defined for the execution graphs encoded by a Task
. In this light, the execution graph for a parallel computation will not be equivalent to the execution graph for a sequential computation.
If this seems strange because a parallel Task<A>
might appear to have the same type as a sequential Task<A>
, consider that Task
s are actually recursive data types, whos types are only equivalent up to a fixed point. Explicit types for parallel and sequential Task
s might actually look like ParallelTask<Task<A>, Task<B>>
and SequentialTask<Task<A>, Task<B>>
. A complicated example might look like SequentialTask<ParallelTask<Task<A>, Task<B>>, SequentialTask<Task<C>, Task<D>>>
. However, we can use a trick involving the fixed points of recursive data types to make any of these look like simple Task
s. General fixed points over recursive data types might make our types look nice and neat, but they also lead to unsound logic. It's a trade off that we make, but we do lose some type safety. This is probably what's actually happening implicitly with this Task
library, which leads to a lot of confusion. For more on this line of reasoning, there's a whole can of worms waiting for you on the other side of a Google search for recursion schemes.
- For different reasons, it also seems inappropriate for the spec to require that, because
ap
can (under very specific assumptions) be implemented in terms ofchain
, it should be.
ap
and chain
don't even come from the same typeclass. An Applicative is not required to be a Monad, and a given instance for an Applicative doesn't have to be related to another arbitrary instance for a Monad. There can be multiple instances for any given typeclass (even with the same carrier type), and laws can only be defined in terms of the operations of a specific instance of a specific typeclass.
In summary, I think you can define an equivalence relation for Task
s based on execution graphs. Given that idea, the execution graph defined by a Task
of sequential operations is not equivalent to the execution graph defined by a Task
of parallel operations, and you should probably have different combinators (or instances of typeclasses) for each. This is similar to defining different monoid instances for integer addition and integer multiplication.
Final note - all of the above is very hand wavy. I hope it's helpful, but I haven't proven anything, and I mostly wrote this relying on my fallible human memory. I'm also not an expert mathematician, logician, category or type theorist, etc - just a relatively inexperienced enthusiast. Please challenge everything I've said, and feel free to point out any mistakes in my informal lines of reasoning.
I like the idea of thinking about Tasks in terms of their "execution graph" to define equivalence. The execution graph for a task using parallel ap
would not match the execution graph using ap
as derived from chain
, thus breaking the law of derivation (as I'll call it for now):
If a data type provides a method which could be derived, its behaviour must be equivalent to that of the derivation (or derivations). -- fantasy land derivations
The way I see it, this law defines a relation between a "child" algebra and its "parent". Eg, Monad is a "child" of Applicative (ref), in that all values with Monad instances must also have Applicative instances. But through the law of derivation, Applicative is also a "child" of Monad, in that its behaviour is defined in terms of Monad.
I've recently been using an example to defend sequential ap
behaviour, keeping this relationship in mind. It uses this piece of code:
discard :: a -> b -> b
createDatabase :: String -> Task Error a
insertRecord :: String -> Object -> Task Error Object
createDatabase ('users') .chain (_ => insertRecord ('users') ({name: 'bob'}))
The code above uses a monadic Task type to encode a sequential program. Knowing that it's monadic, and knowing the relationship between Monad and Applicative, the following program must be equivalent to the prior:
createDatabase ('users') .map (discard) .ap (insertRecord ('users') ({name: 'bob'}))
However, if ap
were to break the derivation law and execute these two Tasks in parallel, the program would be unstable. The users
database often would not yet exists before attempting to insert Bob into it.
The benefit we get out of using these strictly defined algebras, is that we can create reliable general abstractions. We should be able, for example, to create a general abstraction that uses the Applicative instance, but restricts the input to have Monad instance so that we can rely on "sequential" behaviour.