Integrate Squid
ggevay opened this issue · 3 comments
Edit: We have resolved some more issues!
It would be really good to integrate Squid into our compiler pipeline, because its quasiquote-based API would probably make it easier to write tree transformations. I tried to do it [1], but unfortunately there were a number of issues. You can see the details of the issues in comments in the code [1], but here is a list:
Resolved issues:
- Squid expects the
tpt
of ValDefs to be filled, but we don't fill them [3] (resolved by an extra transform pass (addValAndVarDefTpts
) before passing the tree to Squid) - Implicit modifiers were falling off from ValDefs, but Lionel fixed this [6].
- Generic inner classes were not working, but Lionel fixed this [7].
- Squid expects implicits to be already inferred [4], but we can't really infer them before passing code to Squid. The solution suggested in [4] works (see
addImplicitPlaceholders
). - Path-dependent types were not supported by Squid [5]. This is an issue, because, e.g.,
TypeTag
is path-dependent on the universe. Lionel fixed this in Squid. - The tests in emma-examples-* and emma-lib-* (e.g.
TransitiveClosureFlinkSpec
) rely on accesses to local variables, which is not supported by Squid by default. However, the solution suggested by Lionel in the below comment solved this issue. - DefDefs are not supported in Squid, which is a problem, because we represent control flow with DefDefs (DSCF). I resolved this by doing a
dscfInv
/dscf
before/after Squid. this
references to enclosing classes or objects inLogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, StatsFlinkSpec
(and many other unit tests). Resolved by the workaround in #369 (comment) (2bd6cbc)
Not yet resolved issues:
Probably Emma bugs:
TransitiveClosureFlinkSpec
fails withsquid.quasi.EmbeddingException$Unsupported: Unsupported feature: Reference to local method 'doWhile$m2'
. ButdscfInv
should have removed this, so this is probably a bug indscfInv
.
Probably Squid missing features:
- Existential types in
NaiveBayesFlinkSpec, KMeansFlinkSpec, TokenizeFlinkSpec
. The most prevalent isArray[_]
, which appears due to implicitly resolvingCanBuildFrom
in code such asline.split(",").map(_.toDouble)
.
[2] says that "using type aliases may circumvent this limitation".
We also havex$1.label.type forSome { val x$1: org.emmalanguage.lib.ml.LDPoint[Long,Double] }
, but I'm not really sure how this type appears. - [minor]
squid.quasi.EmbeddingException$Unsupported: Unsupported feature: org.emmalanguage.api.FlinkDataSet.memoizeTypeInfo[String](implicitly[scala.reflect.runtime.universe.TypeTag[String]], org.apache.flink.api.scala.`package`.createTypeInformation[String])
This issue is not really a big problem, as these calls come in only near the end of our pipelines, so we can just avoid using Squid at that stage.
Probably Squid bugs (or maybe we give Squid some invalid trees):
EvalFlinkSpec, LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec
,BaseCodegenIntegrationSpec.'Updated tmp sink'
fail with an undefined variable issue. InEvalFlinkSpec
, in the tree that we give to Squid we have a var namedanf$m25
, and a reference to it somewhere. Squid seems to "rename" this var toanf$m25$m1
but doesn't change the reference. In the other test, the variable get renamed fromp$p$r2
top$p$r2$r1
.- [minor]
NGramsFlinkSpec
fails with
java.lang.Error: bad constant value: String of class class scala.reflect.internal.Types$ClassNoArgsTypeRef
at scala.reflect.internal.Constants$Constant.<init>(Constants.scala:52)
at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:34)
at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:272)
at squid.ir.ScalaTyping$class.constType(ScalaTyping.scala:107)
at squid.ir.SimpleAST.constType(SimpleAST.scala:20)
<omitted many more lines>
- [minor]
BaseCodegenIntegrationSpec.CSV
fails with
(List(),Stream(java.lang.String*)) (of class scala.Tuple2)
scala.MatchError: (List(),Stream(java.lang.String*)) (of class scala.Tuple2)
at squid.quasi.ModularEmbedding.squid$quasi$ModularEmbedding$$processArgs$1(ModularEmbedding.scala:337)
<omitted many more lines>
TODO:
- We should consider Lionel's suggestion to implement the Base interface:
"Note that I'm still of the opinion (as expressed earlier) that in your case, it would be more appropriate to not convert Emma's IR into a Squid IR, but instead to provide a new implementation of the Base interface so Squid can use Emma's own IR behind the scenes. Have you considered that? It represent a tighter coupling with Squid, but I could help with making it work and evolve as needed." - An even more extensive test of the integration would be to call Squid in
compileSpecPipelines
. (Or just comment in the testSquid line in Compiler.scala)
Other notes:
- There is a list of Scala features that Squid supports [2].
[1] https://github.com/ggevay/emma-1/tree/squid
[2] https://github.com/epfldata/squid/blob/master/doc/reference/Quasiquotes.md#supported-scala-features
[3] #234
[4] epfldata/squid#55 (comment)
[5] epfldata/squid#57 (comment)
[6] epfldata/squid#55
[7] epfldata/squid#56
Note: to support local variables you can always override ModularEmbedding
's unknownFeatureFallback
method, as is done exactly here.
The last "missing feature" issue you mention (squid.quasi.EmbeddingException$Unsupported: Unsupported feature: org.emmalanguage.api.FlinkDataSet.memoizeTypeInfo[String](..., ...)
) is strange. In principle what's inside the arguments should not be the problem, otherwise the error would have been reported for them. There is apparently something specific and unhandled going on with the memoizeTypeInfo
call.
this references to enclosing classes or objects in LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, StatsFlinkSpec (and many other unit tests).
There is also a (unfortunately rather hacky) workaround for this, similar to the one for local variables; see here.
[minor] EvalFlinkSpec and BaseCodegenIntegrationSpec.'Updated tmp sink' fail with an undefined variable issue. In EvalFlinkSpec, in the tree that we give to Squid we have a var named anf$m25, and a reference to it somewhere. Squid seems to "rename" this var to anf$m25$m1 but doesn't change the reference.
Strange. Perhaps due to the variable having $
in the name, about which Scala's freshName
implementation may be confused? (Not too likely, though.)
[minor] BaseCodegenIntegrationSpec.CSV fails
Nice catch! Seems to be due to the vararg being a Java vararg (differently encoded than Scala 😅)
which we omitted to handle.
Don't hesitate to open issues on the Squid tracker for this kind of bugs 🙂
OK, I resolved the this
issue by mostly copy-pasting the solution that you linked above.