tweag/inline-java

Implement a compiler plugin to aid code generation

facundominguez opened this issue · 2 comments

We do not yet have a good way to know which type the program expects for a given quasiquotation, so the implementation assumes that all generated Java stubs return java.lang.Object. At runtime, we can check that the type of the actual object matches the type that the program needs, but it would be better to learn the expected type at build time.

To address this we can implement a GHC plugin. It would work roughly as follows.

Given a module like

module M where
import Language.Java.Inline

io1 :: IO Int32
io1 = [java| 1  |]

io2 :: Int32 -> Int32 -> IO Int32
io2 x y = [java| $x + $y |]

TH produces a Haskell module like

module M where
import Language.Java.Inline

io1 :: IO Int32
io1 = callAnnotation (Proxy :: Proxy 1)
                     (callStatic "Inlinejava_M" "function1" [])
                     ()

io2 :: Int32 -> Int32 -> IO Int32
io2 x y = callAnnotation (Proxy :: Proxy 2)
                         (callStatic "Inlinejava_M" "function2" [coerce x, coerce y])
                         (x, y)

{-# ANN module (1, "1", []) #-}
{-# ANN module (2, "$x + $y", ["x", "y"]) #-}

where callAnnotation is an auxiliary function to carry the types we are interested in to the plugin phase.

callAnnotation :: Proxy i -> IO b -> args_tuple -> IO b
callAnnotation _ iob _ = iob
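
To illustrate that callAnnotation is transparent at runtime and only fixes types for a later pass, here is a minimal self-contained sketch. The kind annotation on i and the helper fakeAdd (a pure stand-in for the callStatic call) are assumptions made for this example, not part of inline-java.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Data.Int (Int32)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (Nat)

-- Same shape as the function in the proposal. The kind annotation on `i`
-- is an assumption, added so that the index is a type-level natural.
callAnnotation :: Proxy (i :: Nat) -> IO b -> args_tuple -> IO b
callAnnotation _ iob _ = iob

-- Hypothetical stand-in for the callStatic call; no JVM is involved here.
fakeAdd :: Int32 -> Int32 -> IO Int32
fakeAdd x y = pure (x + y)

-- The type checker instantiates b to Int32 and args_tuple to
-- (Int32, Int32); those instantiations are what a plugin could read.
io2 :: Int32 -> Int32 -> IO Int32
io2 x y = callAnnotation (Proxy :: Proxy 2) (fakeAdd x y) (x, y)
```

Running io2 behaves exactly like running the wrapped action; the proxy and the tuple are ignored at runtime.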

The module annotations would carry the AST of the java code in the quasiquotations and the names of the antiquotations, together with an index that is used to learn to which occurrence of callAnnotation they correspond.

Next, the compiler plugin makes a pass over the initial Core to collect the types b and args_tuple from all occurrences of callAnnotation. It then matches them with the corresponding values from the ANN pragmas and produces the Java code

class Inlinejava_M {
  static int function1() { return 1; }
  static int function2(int $x, int $y) { return $x + $y; }
}
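
The matching and generation steps could be sketched in plain Haskell, independent of the GHC API. Everything here is hypothetical: QQAnn stands in for the payload of one ANN pragma, Occurrence stands in for what the Core pass would collect (with Haskell types already mapped to Java type names), and renderClass is an illustrative name. The sketch emits static methods on the assumption that the stubs are invoked through callStatic.

```haskell
import Data.List (intercalate)

-- Hypothetical payload of one ANN pragma:
-- (index, Java source of the quasiquotation, antiquotation names).
type QQAnn = (Int, String, [String])

-- Hypothetical result of the Core pass for one callAnnotation occurrence,
-- with the Haskell types already translated to Java type names.
data Occurrence = Occurrence
  { occIndex  :: Int
  , occResult :: String
  , occArgs   :: [String]
  }

-- Render one stub. `static` is an assumption (stubs called via callStatic).
renderMethod :: Occurrence -> QQAnn -> String
renderMethod occ (i, body, names) =
  "  static " ++ occResult occ ++ " function" ++ show i ++ "("
    ++ intercalate ", " (zipWith param (occArgs occ) names)
    ++ ") { return " ++ body ++ "; }"
  where
    param ty name = ty ++ " $" ++ name

-- Match every occurrence with its annotation by index and render the class.
-- Fails with a message if an occurrence has no annotation or if the number
-- of argument types differs from the number of antiquotation names.
renderClass :: String -> [QQAnn] -> [Occurrence] -> Either String String
renderClass modName anns occs = do
  methods <- mapM render occs
  pure (unlines (["class Inlinejava_" ++ modName ++ " {"] ++ methods ++ ["}"]))
  where
    table = [ (i, ann) | ann@(i, _, _) <- anns ]
    render occ =
      case lookup (occIndex occ) table of
        Nothing -> Left ("no annotation for occurrence " ++ show (occIndex occ))
        Just ann@(_, _, names)
          | length names /= length (occArgs occ) ->
              Left ("arity mismatch for occurrence " ++ show (occIndex occ))
          | otherwise -> Right (renderMethod occ ann)
```

Calling renderClass "M" with the two annotations from the example module and occurrences typed as int yields a class with the two stubs shown above.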

Next, this code is compiled with javac and the bytecode is included in a global bytecode table. We can no longer use the static pointer table because, by this point, static pointers have already been desugared to Core.

One could conceivably do away with the ANN pragmas. The implementation would then have to recover the quasiquotation text and the variable names from Core. This would make it more fragile to changes in how GHC desugars the output of the typechecker, so the current proposal might be preferable.

Work in progress branches

master...fd/bytecode-table
This has an implementation of a bytecode table. The static pointer table is no longer used to embed Java bytecode in executables. The main problem with it is that if the programmer forgets to pass -fplugin=Language.Java.Inline.Plugin to ghc, the program fails with NoClassDefFoundError at runtime. This is going to be solved by https://ghc.haskell.org/trac/ghc/ticket/13608#comment:21. Until that is implemented, the situation will improve once the generation of Java happens in the plugin, because then the code can produce a better runtime error when the quasiquotation annotations are not removed.

master...fd/plugin-types
This is not a finished feature, just some hacking on the inline-java plugin to show how the program can be annotated with a function that carries the types that the plugin will need to generate the Java code. It features a function to convert TH names into Core names, and a Core pass to collect the occurrences of the function used to annotate the quasiquotations.
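
The idea of a pass that collects occurrences of the annotating function can be sketched on a toy expression type. This is not GHC's Core and not the code in the branch: Expr, TyArg, unwindApp and findCalls are hypothetical, and types are represented as plain strings for brevity.

```haskell
-- Toy stand-in for Core expressions; type arguments are explicit, as in Core.
data Expr
  = Var String
  | TyArg String        -- a type applied as an argument, printed as text
  | App Expr Expr
  | Lam String Expr

-- Unwind nested applications: f a b c  ==>  (f, [a, b, c]).
unwindApp :: Expr -> (Expr, [Expr])
unwindApp = go []
  where
    go acc (App f a) = go (a : acc) f
    go acc e         = (e, acc)

-- Collect (index, result type b, argument tuple type) for each application
-- of callAnnotation to its three type arguments. The remaining arguments are
-- searched too, so nested occurrences are found; the application spine is
-- consumed once, so no occurrence is reported twice.
findCalls :: Expr -> [(String, String, String)]
findCalls e =
  case unwindApp e of
    (Var "callAnnotation", TyArg i : TyArg b : TyArg t : rest) ->
      (i, b, t) : concatMap findCalls rest
    (hd, args) -> inHead hd ++ concatMap findCalls args
  where
    inHead (Lam _ body) = findCalls body
    inHead _            = []
```

For a toy encoding of io2 from the example module, findCalls returns a single triple holding the index 2, the result type Int32 and the tuple type (Int32, Int32).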

#81 has been merged.