tweag/inline-java

Introduce type-level regions

facundominguez opened this issue · 8 comments

There are three problems that could be addressed by the introduction of type-level regions:

  1. Preventing reference leaks. Related #7
  2. Caching class references (by using a reader monad). Related #72
  3. Ensuring local references are used in the thread they belong to

There are alternative ways to address each of these issues, but using regions might yield a simpler and still effective design.

mboes commented

FTR, by typed regions @facundominguez means tagging each J object reference with a tag s, à la ST monad, in the style of the monadic regions presented here.

So instead of J ty, we'd have J s ty. This is in fact how HaskellR does things, but we did without it thus far out of an effort to keep the API approachable.
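To make the idea concrete, here is a minimal sketch of ST-style region tagging. It is not the inline-java or HaskellR API: `RawRef`, `Region`, `newRef`, `useRef` and `runRegion` are hypothetical names, and the type index `ty` is dropped to keep the example short. The rank-2 type of `runRegion` is what prevents a `J s` from leaving its region.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for a raw JNI reference.
newtype RawRef = RawRef Int deriving (Eq, Show)

-- A reference tagged with the region s in which it is valid.
newtype J s = J RawRef

-- A computation running in region s. It carries the list of
-- references allocated so far, so they can be released at region exit.
newtype Region s a = Region { unRegion :: IORef [RawRef] -> IO a }

instance Functor (Region s) where
  fmap f (Region g) = Region (fmap f . g)

instance Applicative (Region s) where
  pure x = Region (\_ -> pure x)
  Region f <*> Region g = Region (\env -> f env <*> g env)

instance Monad (Region s) where
  Region g >>= k = Region (\env -> g env >>= \a -> unRegion (k a) env)

-- Allocate a reference and register it with the current region.
newRef :: Int -> Region s (J s)
newRef n = Region $ \env -> do
  modifyIORef' env (RawRef n :)
  pure (J (RawRef n))

-- Use a reference; this only typechecks inside the reference's own region.
useRef :: J s -> Region s Int
useRef (J (RawRef n)) = pure n

-- Run a region. Because s is universally quantified, the result type a
-- cannot mention s, so no J s can escape. The second component lists the
-- references that a real implementation would delete at this point.
runRegion :: (forall s. Region s a) -> IO (a, [RawRef])
runRegion r = do
  env <- newIORef []
  a <- unRegion r env
  released <- readIORef env
  pure (a, released)
```

With these definitions, `runRegion (newRef 7 >>= useRef)` typechecks, while `runRegion (newRef 7)` is rejected by the type checker because the result would mention `s`.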

cc @aspiwack - eventually all three uses could be replaced with linear types + borrowing?

I can see how linear types could help with (1), but not with (2) and (3).

Another problem that regions could perhaps help solve:
4. Deleting global references when they are no longer needed. Related #75

The current approach is to have the Haskell GC delete the global references. Unfortunately, the Haskell GC has no pressure to delete the global references when the Java heap is full. Moreover, the GC runs finalizers in unbound threads, which requires sending the computation to some bound thread or thread pool to delete the references.

Linear types might help with this too.

I don't understand what 2. is about.

But 3 can at least be partially addressed with linear types: in general, fork shouldn't use linear values in the other thread (because the forked thread has unknown lifetime which breaks the contract of linear types).

So it could seem to solve 3, but in fact I think it's desirable to have a linear Async which contains linear variables. They will be consumed by the time the Async is consumed (which, in this case, means wait), so it doesn't break the linear type contract. And with such a linear Async, unless I misunderstand, linear values can escape their threads. Pure thread locality may require some extra work.

I'm not sure what should be done about 4, either with regions or linear types. We could make them linear so that we have to call a free function manually, but that seems kind of awkward to me.

I see global references as serving two purposes:
A. Allow references obtained in one thread to be used in another.
B. Allow references to survive stack frames. A local reference is deleted when control returns to the Java code which made a native call to Haskell. A global reference remains valid until deleteGlobalRef is called on it.

Solving A

With regions, local references can be cloned and given to another thread via global references. Inspired by Rust, imagine this primitive:

forkOSWithRefs :: [JObject r0] -> (forall r. [JObject r] -> IOR r ()) -> IOR r0 ThreadId
forkOSWithRefs xs f = do
    xs' <- mapM newGlobalRef xs
    liftIOR $ forkOS $ do
      xs'' <- mapM newLocalRef xs'
      mapM_ deleteGlobalRef xs'
      -- Or we might find a way to pass the global refs directly here
      -- and destroy them at the end of the region.
      runIOR (f xs'')

Solving B

In this case, a global reference is like a local reference that lives in a longer scoped region than a native call. I don't see immediately what the impediment would be to assign it to an appropriate region and delete it there.

So it could seem to solve 3, but in fact I think it's desirable to have a linear Async which contains linear variables. They will be consumed by the time the Async is consumed (which, in this case, means wait), so it doesn't break the linear type contract. And with such a linear Async, unless I misunderstand, linear values can escape their threads. Pure thread locality may require some extra work.

I think I need an example of how linear Async works, and how it would interact with references to make sense of it.

Like problem (2), a fifth problem that would be solved with a reader monad is:
5. Caching MethodID lookups. Currently call and callStatic do not guarantee that the MethodID is cached.
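As a rough illustration of what such caching could look like, here is a sketch in which the environment is passed explicitly; a reader monad would simply hide the `Env` argument. Everything here is an assumption made for the example: `Env`, `MethodKey`, `slowLookup` and `cachedMethodID` are hypothetical, and `slowLookup` only stands in for the real JNI lookup.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical key: (class name, method name).
type MethodKey = (String, String)

-- Hypothetical stand-in for a JNI method identifier.
newtype MethodID = MethodID Int deriving (Eq, Show)

-- The environment a reader monad would carry around.
data Env = Env
  { envCache   :: IORef [(MethodKey, MethodID)] -- resolved methods so far
  , envLookups :: IORef Int                     -- number of slow lookups made
  }

newEnv :: IO Env
newEnv = Env <$> newIORef [] <*> newIORef 0

-- Stand-in for the expensive JNI lookup; it fabricates an ID and
-- counts how often it is called.
slowLookup :: Env -> MethodKey -> IO MethodID
slowLookup env (cls, name) = do
  modifyIORef' (envLookups env) (+ 1)
  pure (MethodID (length cls + length name))

-- Consult the cache first and fall back to the slow lookup on a miss.
cachedMethodID :: Env -> MethodKey -> IO MethodID
cachedMethodID env key = do
  cache <- readIORef (envCache env)
  case lookup key cache of
    Just mid -> pure mid
    Nothing -> do
      mid <- slowLookup env key
      modifyIORef' (envCache env) ((key, mid) :)
      pure mid
```

Looking up the same key twice through `cachedMethodID` performs the slow lookup only once. The sketch is not thread-safe; a shared cache would need `atomicModifyIORef'` or an `MVar`.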

Currently we write:

createList :: IO (J ('Class "java.util.List"))
createList = do
  jIntegerArray <- reflect ([1..10] :: [Int32])
  callStaticObject "java.util.Arrays" "asList" [coerce jIntegerArray]

With linear types we need some monad with a linear bind, I guess. Otherwise, we could not enforce that the returned reference is eventually released.

createList :: LIO (J ('Class "java.util.List"))
createList = do
  jIntegerArray <- reflect ([1..10] :: [Int32])
  callStaticObject "java.util.Arrays" "asList" [coerce jIntegerArray]

(>>=) :: LIO a -o (a -o LIO b) -o LIO b
runLIO :: LIO a -o IO a -- ?

Perhaps borrowing could be implemented with:

borrow :: J ty -o (forall s. JS s ty -> SIO s b) -> LIO b

JS s ty is a reference J ty without linear constraints, but with a thread parameter s which ensures the reference does not escape the scope of the borrow. SIO is just IO tagged with the thread parameter s.