emmalanguage/emma

FreshNames created at compile time can clash with freshNames created at runtime

ggevay opened this issue · 7 comments

ggevay@d19cb61
The test to run is org.emmalanguage.codegen.spark.CodegenTest. The first test fails, because it can't compile the following tree:

{
  val jabberwocky$1 = _root_.org.emmalanguage.codegen.BaseCodegenTest.jabberwocky;
  val apply$1 = _root_.org.emmalanguage.api.DataBag.apply[String](jabberwocky$1);
  val guard$1 = ((x$2: String) => {
    val length$1 = x$2.length();
    val x$2 = length$1.>(10);
    x$2
  });
  val x$1 = apply$1.withFilter(guard$1);
  x$1
}

The problem here is that there is a name clash: x$2 appears twice.

I did some debugging here: ggevay@44a05b3
If you run the same test, this will print all the intermediate trees between the transformations. Note, that one x$2 is already present in the very first tree. I suspect that this was created at compile time from the underscore in the lambda. Then this x$2 is retained through a lot of transformations, and then suddenly another x$2 appears when uninlineLetExprs calls api.TermName.fresh("x").

The implementation of freshTermName is in scala.reflect.internal.FreshNames, and uses FreshNameCreator. My conjecture about what is the problem is that when FreshNameCreator is used at compile time, that is a different FreshNameCreator than the one used at runtime. Therefore, there is nothing preventing the two from creating identical names.

We've seen this before (in the old IR). The least painful workaround is to use a different suffix. We can do this by declaring an abstract method freshNameSuffix in AST and then defining it in the sub-traits. However I'm not sure what would be appropriate suffixes. $ is good, because it's considered alphanumeric, yet is almost never used by programmers (by convention). Longer prefixes like $runtme$ and $macro$ will result in very long names. So I'm open to suggestions.

Maybe just $r and $m, it doesn't matter all that much.

👍 for the short prefixes.

OK, then I'll add $r and $m.

I've run into a slight complication. The unapply methods of objects Eta, While, and DoWhile rely [1,2,3] on the exact form of the fresh names, which I would be modifying now. Should I just also modify the regexes by adding (r|m) to them? This feels like a bad thing to do (DRY principle). Also, there are no tests for these methods, so I'm a little afraid of touching them. (And I also don't understand the purpose of the last + in those regexes.)

Note that the unapply methods of While and DoWhile are not used anywhere, and the unapply method of Eta is only used in etaCompact, which is again not used anywhere. Maybe I could just remove these methods? Or is there some planned use for them in the future?

[1] https://github.com/emmalanguage/emma/blob/newir/emma-language/src/main/scala/org/emmalanguage/ast/Terms.scala#L93
[2] https://github.com/emmalanguage/emma/blob/newir/emma-language/src/main/scala/org/emmalanguage/ast/Terms.scala#L106
[3] https://github.com/emmalanguage/emma/blob/newir/emma-language/src/main/scala/org/emmalanguage/ast/Terms.scala#L119

👍 for removing TermName.While, TermName.DoWhile, and TermName.Eta altogether. The loop names turned out unnecessary, because we can distinguish method symbols used as loop labels by a flag. Eta expansion is not used anywhere so we can remove etaCompact and etaExpand too.

Resolved via #266