typelead/eta

Direct Java Interop

carymrobbins opened this issue · 10 comments

While I think it's valuable to have an FFI mechanism in Eta, I think it might be immensely beneficial to support direct Java interop in some way by default, similar to what's possible in Scala, Clojure, etc.

Here's a completely contrived example of what this may look like -

java import qualified java.util.Arrays as Arrays
java import java.util.List

mkList :: Java x (List Int)
mkList = arrayFromList [1, 2, 3] >>= Arrays.asList

nonEmpty :: List a -> Java x Bool
nonEmpty xs = not <$> (xs <.> isEmpty)

For the java import statements, Eta would use some generated FFI bindings based on the class definitions, simply wrapping everything in the Java type. Here's an example of what this might look like for the imported modules -

module java.util.Arrays where

java import java.util.List

foreign import java unsafe "@static java.util.Arrays.asList" asList :: JArray a arr => arr -> Java x (List a)

module java.util.List where

data {-# CLASS "java.util.List" #-} List a =
  List (Object# (List a))
  deriving Class

foreign import java unsafe isEmpty :: Java (List a) Bool

One problem is that Haskell module names must be made up of conid tokens, which must start with an uppercase letter. This conflicts with the Java convention of lowercase package names. We might be able to get around that easily by capitalizing the first letter of each segment.

java import qualified Java.Util.Arrays as Arrays
java import qualified Java.Util.List
...

If that's not possible, or not even desirable (possibly due to the ambiguity it introduces), we might consider adding new rules to the lexer.

I'm going to sketch an implementation for this so we can see what limitations (if any) will arise.

When the compiler sees something like this:

import java java.util.Arrays

(The java keyword can trigger a new lexing rule that doesn't require a conid.)

It will signal the Renamer (the compiler pass that resolves all identifiers exactly) that all the method names from the Arrays class become valid Eta identifiers. Let's take for example:

static int binarySearch(byte[] a, byte key)

This will generate binarySearch :: JByteArray -> Byte -> Java a Int where all those types already exist in the standard Java module in base.

So when the signature only has primitive Java types and primitive Arrays, we can do direct imports without issues since the types are already defined somewhere.
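The sketch below is a toy GHC Haskell model of that mapping step, not Eta's actual renamer. The JavaType constructors, etaType, and staticSignature are hypothetical names, and the Eta type names are assumed to follow Eta's base Java module (JByteArray, JChar, and so on). Any type outside the supported primitive set makes the derivation fail, which is exactly where a direct import would need more information.

```haskell
module Main where

-- Toy model of Java types as they appear in a class-file signature.
-- Only a few primitives and primitive arrays are covered.
data JavaType
  = JByte | JChar | JInt | JDouble | JBoolean | JVoid
  | JArrayOf JavaType
  deriving (Eq, Show)

-- Map a Java type to an Eta type name (assumed to mirror Eta's base
-- Java module); Nothing means "not handled by this sketch".
etaType :: JavaType -> Maybe String
etaType t = case t of
  JByte            -> Just "Byte"
  JChar            -> Just "JChar"
  JInt             -> Just "Int"
  JDouble          -> Just "Double"
  JBoolean         -> Just "Bool"
  JVoid            -> Just "()"
  JArrayOf JByte   -> Just "JByteArray"
  JArrayOf JChar   -> Just "JCharArray"
  JArrayOf JInt    -> Just "JIntArray"
  JArrayOf JDouble -> Just "JDoubleArray"
  JArrayOf _       -> Nothing

-- Derive the Eta signature of a static method; static calls run in `Java a`.
-- Fails as soon as one parameter or the return type is unsupported.
staticSignature :: String -> [JavaType] -> JavaType -> Maybe String
staticSignature name params ret = do
  ps <- mapM etaType params
  r  <- etaType ret
  pure (name ++ " :: " ++ concatMap (++ " -> ") ps ++ "Java a " ++ r)

main :: IO ()
main =
  -- static int binarySearch(byte[] a, byte key)
  putStrLn (maybe "unsupported" id
             (staticSignature "binarySearch" [JArrayOf JByte, JByte] JInt))
```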

If you look at the Arrays class you'll find that there are many overloaded methods.

How do we deal with overloaded methods?

One way we can do this is to add a mechanism to specify the type of an overloaded method as well as the Eta identifier that corresponds to it:

import java java.util.Arrays (binarySearchChar as binarySearch :: JCharArray -> JChar -> Java a Int
                             ,binarySearchDouble as binarySearch :: JDoubleArray -> Double -> Java a Int
                             ,myAsList as asList)

If you don't import this way and the compiler detects that it's an overloaded method, it will not import it by default. Instead, when you try to use binarySearch it will give you an error telling you that it's ambiguous and suggest that you spell out the type signature.
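A rough model of the lookup rule just described, in plain GHC Haskell. Overload, resolve, and the two-entry method table are hypothetical; signatures are compared as strings only to keep the sketch short, whereas a real renamer would compare types.

```haskell
module Main where

import Data.List (intercalate)

-- One Java method as seen by the renamer: its Java name and the Eta
-- signature derived from the class file (hypothetical representation).
data Overload = Overload { javaName :: String, etaSig :: String }
  deriving (Eq, Show)

-- Resolve a use of `name`, optionally guided by a user-written signature.
resolve :: String -> Maybe String -> [Overload] -> Either String Overload
resolve name userSig methods =
  case (userSig, candidates) of
    (_, []) -> Left ("no Java method named " ++ name)
    (Nothing, [m]) -> Right m                 -- not overloaded: import as-is
    (Nothing, ms) -> Left (name ++ " is ambiguous; spell out one of: "
                           ++ intercalate " | " (map etaSig ms))
    (Just sig, ms) -> case filter ((== sig) . etaSig) ms of
      (m : _) -> Right m                      -- the signature picks one overload
      []      -> Left ("no overload of " ++ name ++ " has type " ++ sig)
  where
    candidates = filter ((== name) . javaName) methods

-- A hypothetical two-entry slice of java.util.Arrays.
arraysMethods :: [Overload]
arraysMethods =
  [ Overload "binarySearch" "JCharArray -> JChar -> Java a Int"
  , Overload "binarySearch" "JDoubleArray -> Double -> Java a Int"
  ]

main :: IO ()
main = do
  print (resolve "binarySearch" Nothing arraysMethods)
  print (resolve "binarySearch" (Just "JDoubleArray -> Double -> Java a Int") arraysMethods)
```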

Now suppose we imported another module that had an Eta function with the name binarySearch. We can resolve the ambiguity by:

  1. Using the class name:
     Arrays.binarySearch
  2. Or using a qualified import:
     import java java.util.Arrays as Arrays2

     Arrays2.binarySearch

Now the next question becomes: how do we deal with importing arbitrary Java objects that we haven't imported as JWTs yet?

The main issue here is that we need to assign some Eta type to a given Java object type. One way to do this is to use a type-level string and a type-level list:

type JString = Object "java.lang.String" '[]

Note the '[]: it stores the generic type parameters (if any) of the given type.

The kind of the type above is:

Object "java.lang.String" '[] :: *

Object can be pictured as defined like below:

data Object (a :: Symbol) (as :: [*]) = Object {- native Java object here -}
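To check that this encoding is well-kinded, here is a GHC Haskell sketch. It uses a newtype over a String as a hypothetical stand-in for the native Java reference, and GHC.TypeLits.symbolVal to recover the class name from the type-level string; javaClassName and JList are illustrative names, not part of Eta.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
module Main where

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Phantom class name and generic parameters; the String payload is a
-- stand-in for the native Java object.
newtype Object (cls :: Symbol) (params :: [Type]) = Object String

type JString = Object "java.lang.String" '[]
type JList a = Object "java.util.List" '[a]

-- Recover the Java class name from the type-level string.
javaClassName :: forall cls params. KnownSymbol cls => Object cls params -> String
javaClassName _ = symbolVal (Proxy :: Proxy cls)

main :: IO ()
main = do
  putStrLn (javaClassName (Object "hello" :: JString))
  putStrLn (javaClassName (Object "[1, 2]" :: JList JString))
```

In GHCi, `:kind Object "java.lang.String" '[]` reports `Type` (GHC's name for `*`), matching the kind given above.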

Consider the overloaded method case above, but with Android's View class:

void autofill(AutofillValue value)
void autofill(SparseArray<AutofillValue> values)

(Stealing Scala import syntax since it's nice)

import java android.util.{SparseArray, AutofillValue}
import java android.view.View (autofillSparse as autofill :: SparseArray AutofillValue -> Java View (),
                               autofill :: AutofillValue -> Java View ())

Looking closer:

import java android.util.{SparseArray, AutofillValue}

will generate

type SparseArray a = Object "android.util.SparseArray" '[a]
type AutofillValue = Object "android.util.AutofillValue" '[]

You can rename the corresponding Eta type as follows:

import java android.util.{SparseArray as SArray, AutofillValue as AValue}
import java android.view.View as V (autofillSparse as autofill :: SArray AValue -> Java V (),
                                    autofill :: AValue -> Java V ())
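The synonym generation could work roughly as in this GHC Haskell sketch, which renders the synonym text from the fully qualified class name, the number of generic parameters, and an optional as alias. ImportedClass and typeSynonym are hypothetical; a real implementation could take the parameter count from the class file's Signature attribute.

```haskell
module Main where

import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- What the compiler might know about one imported class (hypothetical).
data ImportedClass = ImportedClass
  { fqcn       :: String        -- fully qualified Java class name
  , typeParams :: Int           -- number of generic type parameters
  , alias      :: Maybe String  -- name given with an `as` clause, if any
  }

-- "android.util.SparseArray" -> "SparseArray"
simpleName :: String -> String
simpleName = reverse . takeWhile (/= '.') . reverse

-- Render the type synonym an import could generate.
typeSynonym :: ImportedClass -> String
typeSynonym c =
  "type " ++ unwords (etaName : vars) ++ " = Object " ++ show (fqcn c)
    ++ " '[" ++ intercalate ", " vars ++ "]"
  where
    etaName = fromMaybe (simpleName (fqcn c)) (alias c)
    vars    = take (typeParams c) [[v] | v <- ['a' ..]]

main :: IO ()
main = mapM_ (putStrLn . typeSynonym)
  [ ImportedClass "android.util.SparseArray" 1 Nothing
  , ImportedClass "android.util.SparseArray" 1 (Just "SArray")
  ]
```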

Inner classes can be imported as follows, again using View as an example:

import java android.view.View (OnLayoutChangeListener as LayoutChangeListener)

and the type can be referred to as LayoutChangeListener in the module. If the as clause is not specified, OnLayoutChangeListener is available by default. This happens recursively for all available inner classes. You must use an as clause to disambiguate if several (nested) inner classes share the same name.
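A minimal sketch of that recursive walk, assuming inner classes are available as a tree: it lists every inner class with its JVM binary name (Outer$Inner) and reports simple names that occur more than once and would therefore need an as clause. JClass, allInner, needsAlias, and the com.example.Widget class are hypothetical.

```haskell
module Main where

import Data.List (group, sort)

-- A class and its inner classes, nested arbitrarily deep (hypothetical model).
data JClass = JClass { className :: String, innerClasses :: [JClass] }

-- Every inner class, recursively, as (simple name, JVM binary name).
allInner :: JClass -> [(String, String)]
allInner root = go (className root) root
  where
    go prefix cls =
      concat [ (className c, binary) : go binary c
             | c <- innerClasses cls
             , let binary = prefix ++ "$" ++ className c ]

-- Simple names that occur more than once must be imported with `as`.
needsAlias :: JClass -> [String]
needsAlias root =
  [ n | g@(n : _) <- group (sort (map fst (allInner root))), length g > 1 ]

view :: JClass
view = JClass "android.view.View"
  [JClass "OnLayoutChangeListener" [], JClass "OnClickListener" []]

-- Hypothetical class whose nested inner classes share a simple name.
widget :: JClass
widget = JClass "com.example.Widget"
  [JClass "Listener" [], JClass "Child" [JClass "Listener" []]]

main :: IO ()
main = do
  mapM_ print (allInner view)
  print (needsAlias view)
  print (needsAlias widget)
```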

Again, the listener example above:

import java android.view.View (addOnLayoutChangeListener :: (View -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> Java a ()) -> Java View ())

By specializing the type as shown above, you can substitute an Eta function returning a Java action for a SAM (Single Abstract Method) interface. The compiler will then verify that your signature matches the signature read from the class file.
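The verification step could be as simple as comparing the SAM's parameter list with the arguments of the user's callback type. In this hypothetical GHC Haskell sketch (Sam, Callback, and checkCallback are made up), types are already mapped to Eta names; onLayoutChange takes a View and eight ints and returns void, so a callback with nine Int arguments is rejected.

```haskell
module Main where

-- The single abstract method as read from the class file, with types
-- already mapped to Eta names (hypothetical representation).
data Sam = Sam { samParams :: [String], samResult :: String }

-- The user-written callback type, e.g. View -> Int -> ... -> Java a ().
data Callback = Callback { cbArgs :: [String], cbResult :: String }

-- onLayoutChange(View, int, int, int, int, int, int, int, int): void
onLayoutChange :: Sam
onLayoutChange = Sam ("View" : replicate 8 "Int") "()"

checkCallback :: Sam -> Callback -> Either String ()
checkCallback sam cb
  | length (cbArgs cb) /= length (samParams sam) =
      Left ("expected " ++ show (length (samParams sam))
            ++ " arguments, got " ++ show (length (cbArgs cb)))
  | cbArgs cb /= samParams sam = Left "argument types differ"
  | cbResult cb /= samResult sam = Left "result type differs"
  | otherwise = Right ()

main :: IO ()
main = do
  print (checkCallback onLayoutChange (Callback ("View" : replicate 8 "Int") "()"))
  print (checkCallback onLayoutChange (Callback ("View" : replicate 9 "Int") "()"))
```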

So what's the difference between this FFI implementation and the old FFI implementation? This new one will have a very minor perf cost in the sense that we will now use a generic Object type as a JWT instead of specialized types. What this means at the end of the day is that we will have to 1) unbox the generic JWT 2) cast to the appropriate type denoted by the type-level string 3) call out to the Java method. In the old FFI, step 2) was not necessary because of specialization.
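The three steps can be modelled in GHC Haskell as follows. JRef is a hypothetical stand-in for a JVM reference, the single generic Object wrapper carries only the class name at the type level (generic parameters are dropped for brevity), and the "method call" is a toy String.length(). The cast is an exact-class comparison, which is stricter than a real JVM checkcast.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Stand-in for a JVM reference: its runtime class plus a toy payload.
data JRef = JRef { runtimeClass :: String, payload :: String }

-- The single generic JWT: all Java types share this representation and
-- differ only in the type-level class name.
newtype Object (cls :: Symbol) = Object JRef

-- Steps 1 and 2: unbox the wrapper, then check the runtime class against
-- the type-level string (a real checkcast would also accept subclasses).
unboxAndCast :: forall cls. KnownSymbol cls => Object cls -> Either String JRef
unboxAndCast (Object ref)
  | runtimeClass ref == expected = Right ref
  | otherwise = Left (runtimeClass ref ++ " cannot be cast to " ++ expected)
  where
    expected = symbolVal (Proxy :: Proxy cls)

-- Step 3: the actual "method call", here a toy String.length().
stringLength :: Object "java.lang.String" -> Either String Int
stringLength obj = length . payload <$> unboxAndCast obj

main :: IO ()
main = do
  print (stringLength (Object (JRef "java.lang.String" "hello")))
  print (stringLength (Object (JRef "java.lang.Integer" "42")))
```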

Even then all is not lost - we can add a pass in the bytecode generator that looks for unnecessary casts and elides them.
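As a sketch of such a pass, the function below works on a tiny hypothetical instruction set rather than real JVM bytecode: it tracks the exact class of the value on top of the stack and drops a CheckCast to that same class. A real pass would also need stack-depth tracking and knowledge of the class hierarchy.

```haskell
module Main where

-- A drastically simplified instruction set (hypothetical, not real JVM bytecode).
data Insn
  = Load String       -- push a value whose exact class is statically known
  | CheckCast String  -- cast the value on top of the stack
  | Invoke String     -- call a method; its result type is not tracked
  deriving (Eq, Show)

-- Peephole pass: drop a CheckCast when the top of the stack is already
-- known to be exactly the target class.
elideCasts :: [Insn] -> [Insn]
elideCasts = go Nothing
  where
    go _ [] = []
    go known (i : rest) = case i of
      Load cls -> i : go (Just cls) rest
      CheckCast cls
        | known == Just cls -> go known rest      -- redundant: skip it
        | otherwise         -> i : go (Just cls) rest
      Invoke _ -> i : go Nothing rest             -- result type unknown

main :: IO ()
main = do
  print (elideCasts [Load "java/lang/String", CheckCast "java/lang/String", Invoke "length"])
  print (elideCasts [Load "java/lang/Object", CheckCast "java/lang/String", Invoke "length"])
```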

And what are the limitations? There are none - you can do pretty much anything you could do with the old FFI in the new one and omit type signatures in a lot of cases.

The overall implementation requires:

  • A classpath/JAR manager
  • Changes to parser/lexer
  • Changes to importing mechanism
  • Renamer changes - the renamer should use the classpath manager above when resolving identifiers.
  • Minor FFI generation changes
  • Typechecker/Constraint Solver changes - the Extends typeclass/type family should be built into the compiler.

By going through all these changes, I think we should have a much more pleasant FFI to work with. Thanks for starting the discussion @carymrobbins!

One thing we might want to take into consideration is the potential for nullable values in the Java FFI. I'm not entirely sure how that could look, but maybe there's some way to annotate it. For methods annotated with @Nullable, we could detect this (if the annotation is available in the bytecode, and maybe even if it isn't; I'm not entirely sure) and automatically promote their return types to Maybe. For methods we only know to be semantically nullable, it might be trickier. The solution might be to just have the user handle it in userland, or we might even be able to pull off some sort of bytecode analysis.

If the retention policy of the @Nullable annotation is at least CLASS (that is, CLASS or RUNTIME), we can detect it in the class file. Currently, the plan for direct imports is that if you return an Object reference and don't wrap it in a Maybe, the compiler will emit a warning that it is potentially unsafe. The same applies to a type signature whose return type lacks the Java monad.
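A hypothetical GHC Haskell sketch of those warning rules, assuming the compiler has already read whether the return type is a reference and whether it carries @Nullable. MethodInfo, ReturnSig, and importWarnings are illustrative names, and the warning texts are made up.

```haskell
module Main where

-- Facts the compiler could read from the class file (hypothetical model).
data MethodInfo = MethodInfo
  { returnsReference  :: Bool  -- object or array return, not a primitive
  , annotatedNullable :: Bool  -- @Nullable with CLASS or RUNTIME retention
  }

-- Shape of the return type in the user's signature.
data ReturnSig = ReturnSig
  { inJavaMonad    :: Bool  -- result wrapped in Java c ...
  , wrappedInMaybe :: Bool  -- value wrapped in Maybe
  }

importWarnings :: MethodInfo -> ReturnSig -> [String]
importWarnings m s = nullWarning ++ effectWarning
  where
    nullWarning
      | returnsReference m && not (wrappedInMaybe s) =
          [ if annotatedNullable m
              then "return value is @Nullable but not wrapped in Maybe"
              else "reference return not wrapped in Maybe: potentially unsafe" ]
      | otherwise = []
    effectWarning
      | inJavaMonad s = []
      | otherwise     = ["return type has no Java monad: potentially unsafe"]

main :: IO ()
main = do
  print (importWarnings (MethodInfo True False) (ReturnSig True False))
  print (importWarnings (MethodInfo True True) (ReturnSig True True))
```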

Looks great!

Random idea off the top of my head, maybe we can omit the java keyword after the import one, or at least make it optional.

Java package names are conventionally all lowercase, so the compiler could see that the package name starts with a lowercase letter and infer that it is a Java import. In cases like import Utils, where it could be either an Eta module or a Java class without a package, the java keyword could be used to disambiguate:

import Utils (profunctorLensMapping)
import java Utils (abstractFactoryBuilderBeanBuilder)
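The inference rule can be written down in a few lines. This GHC Haskell sketch (classifyImport and ImportKind are hypothetical) treats an explicit java keyword as decisive and otherwise looks only at the first character of the import target.

```haskell
module Main where

import Data.Char (isLower)

data ImportKind = EtaModule | JavaImport
  deriving (Eq, Show)

-- `import java ...` is always a Java import; otherwise a lowercase first
-- character (a Java package such as java.util) implies Java, and anything
-- else is treated as an Eta module.
classifyImport :: Bool -> String -> ImportKind
classifyImport explicitJava target
  | explicitJava                 = JavaImport
  | (c : _) <- target, isLower c = JavaImport
  | otherwise                    = EtaModule

main :: IO ()
main = mapM_ (print . uncurry classifyImport)
  [ (False, "java.util.Arrays")
  , (False, "Utils")
  , (True,  "Utils")
  ]
```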

Because the java import mechanism will behave differently than the default import mechanism (which we will stick to keeping compatible with Haskell for now), I think requiring import java and being explicit about when you are bringing Java FFI bindings into scope is helpful for people reading the code. There's always the tradeoff of explicit/implicit - in this case I think being explicit will actually be helpful for newbies who are reading Eta code. They'll know exactly when FFI is being used instead of having to remember conventions.

I've been looking through this issue and it's occurred to me that most of the discussion here has revolved around importing Java classes and methods only. However, the current foreign import system also allows importing constructors (@new), fields (@field), static fields (@static @field) and functional interfaces (@wrapper and @wrapper @abstract). How will Direct Java Interop deal with these cases?

I would think that the direct interop could probably work for fields, getters, setters, functional interfaces, etc. automatically when the names (and/or types) aren't ambiguous.

However, new might be an issue. Maybe to represent new Foo() Eta could generate a newFoo function. Alternatively, maybe some sort of New type class might help with overloaded constructors and allow for a more general mechanism that doesn't require generated function names.

class New o i where
  new :: i -> o

For constructors taking multiple arguments, we could either use chained instances or, more simply, just require tupled arguments, e.g. the following could be generated for new String(byte[], Charset)

instance New JString (JByteArray, Charset) where new = ...

This doesn't take into account the effect type, but we could either add that as a type argument to the type class or in the instance definition for the o type -

instance New (Java a JString) (JByteArray, Charset) where new = ...
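A GHC Haskell model of the tupled-argument idea, with the class parameters in the order o (the class being built) then i (the arguments). JByteArray, Charset, and JString here are hypothetical pure stand-ins, the effect type is left out, and the byte-to-character conversion ignores the charset. Since nothing ties o to i, the result type has to be annotated at each call site.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main where

-- Hypothetical stand-ins for the Java wrapper types; no JVM involved.
newtype JByteArray = JByteArray [Int]
newtype Charset    = Charset String
newtype JString    = JString String deriving (Eq, Show)

-- o: the class being constructed, i: the (tupled) constructor arguments.
class New o i where
  new :: i -> o

-- new String(byte[], Charset); this toy body ignores the charset.
instance New JString (JByteArray, Charset) where
  new (JByteArray bytes, Charset _) = JString (map toEnum bytes)

-- new String(String): a second constructor for the same class.
instance New JString JString where
  new (JString s) = JString s

main :: IO ()
main = do
  -- Without a functional dependency the result type must be annotated.
  print (new (JByteArray [104, 105], Charset "US-ASCII") :: JString)
  print (new (JString "copy") :: JString)
```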

Possibly another way would be to just export a new function and expect the user to qualify it with an import -

import javalib.Foo (Foo)
import qualified javalib.Foo as Foo

stuff = Foo.new "bar"

You could even combine the qualified new function with the type class approach for getting overloaded new specialized to the return type in the cases where there are more than one constructor.

These are just a few ideas. Auto-FFI will get very complex if we're not careful and we should try to opt for the simplest solution, still allowing FFI for "low-level" interop, IMHO.

@carymrobbins I quite like the New typeclass idea! After implementing it as a proof-of-concept it does seem to work, albeit with a slightly different definition:

{-# LANGUAGE MultiParamTypeClasses, ScopedTypeVariables #-}

module Main where

import Java

foreign import java "@new" newString1 :: JString -> Java a JString
foreign import java "@new" newString2 :: Java a JString
foreign import java "@new" newDouble :: JString -> Java a JDouble

foreign import java "doubleValue" doubleValue :: Java JDouble Double

class New i o where
  new :: i -> Java a o

instance New JString JString where new = newString1
instance New ()      JString where new = const $ newString2
instance New JString JDouble where new = newDouble

main :: IO ()
main = java $ do
  str :: JString <- new (toJava "10" :: JString)
  dbl :: JDouble <- new str
  (dbl <.> doubleValue) >>= io . print

The only problem is that you have to provide explicit type signatures using ScopedTypeVariables. It would be much nicer if Eta had the TypeApplications extension; then we could write something like new @JavaClass (arg1, arg2), which looks similar to Java's new JavaClass(arg1, arg2).

I also agree with @carymrobbins that we should still allow 'manual' FFI; it will always be more flexible than auto-FFI.

@bradrn I also considered TypeApplications for this. I modelled it in Haskell, and it seems to work well if you reverse the order of the New type args -

class New o i where
  new :: i -> Java a o

instance New JString () where ...

stuff = do
  s <- new @JString ()
  ...
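A self-contained GHC Haskell model along those lines, runnable without Eta. The Java type here is a hypothetical pure stand-in built on Identity (no real side effects), and JString and JDouble are toy newtypes. Because o is the first class parameter, new @JString fixes the class being constructed while the argument type is inferred.

```haskell
{-# LANGUAGE MultiParamTypeClasses, TypeApplications, GeneralizedNewtypeDeriving #-}
module Main where

import Data.Functor.Identity (Identity (..))

-- Pure stand-in for Eta's Java monad: `c` is a phantom receiver type.
newtype Java c a = Java (Identity a)
  deriving (Functor, Applicative, Monad)

runJava :: Java c a -> a
runJava (Java m) = runIdentity m

newtype JString = JString String deriving (Eq, Show)
newtype JDouble = JDouble Double deriving (Eq, Show)

-- The class being constructed comes first, so @Type selects it.
class New o i where
  new :: i -> Java a o

instance New JString () where
  new () = pure (JString "")

instance New JString JString where
  new (JString s) = pure (JString s)

instance New JDouble JString where
  new (JString s) = pure (JDouble (read s))

stuff :: Java a (JString, JDouble)
stuff = do
  s <- new @JString (JString "10")  -- cf. Java: new String("10")
  d <- new @JDouble s               -- cf. Java: new Double(s)
  pure (s, d)

main :: IO ()
main = print (runJava stuff)
```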

Good point about the argument order - I missed that. But Eta doesn't seem to have TypeApplications yet...