ceu-lang/ceu

loop x in [a->b] tags x as constant but remains set after loop exits

Opened this issue · 1 comments

var int x;
loop x in [0 -> 5] do
  // x = 9; rejected by the compiler within the loop, fine
end
x = 8; // x is mine now and should not be tagged as constant any further.

Probably the constant tagging code stems from the time where Céu did not require a declaration of the variable outside the loop body. However:

loop x in [0->3] do end // tags x as read-only
loop x in [7->9] do end // but the loop-compiler doesn't check that.

This led me to another bug discovery. The compiler does not reject:

var int i = 0; var int x;
loop x in [0 -> 5] do // x becomes read-only hence unavailable
  // as an iteration variable in an inner loop.
  loop x in [30 -> 40] do i=i+1; end end
escape i;

This causes the outer loop to be aborted prematurely because x got modified!

It is perfectly fine that (unlike C) the loop variable must not be modified within the loop body (following the principle of least surprise), but that should apply to every use, incl. an attempted re-use of this variable in an inner loop.

  1. The loop compiler must check that x is not yet constant.
  2. Somehow, clear the "read-only" tag when exiting the loop body.
  3. Perhaps mark x as uninitialized at loop exit? (see below)

The current requirement that the loop variable be declared prior to the loop led me to the question "What is the value of x after loop exit?" and thus to wonder about non-determinism in Céu.

  • Will x have the value it had during the last iteration?
    This is conceptually nice, as it immediately yields a subtype for x: the same interval [a,b] that defined the loop.
  • Will x have the overflow value that caused the loop to exit?
    E.g. presumably 3 from [1->3[,1 but possibly 250 from [0,200],50 and
    even scarier, 206 from loop u8 x in [0<-250],50, e.g. apparently in-range 0-250!
  • Will x have as value any combination of the above, depending on the exact form [a->b], [a->b[, ]a->b], ]a->b[, [a<-b] etc.?
  • Will x remain uninitiliazed/untouched in the case of ]a->a[?
  • Might x be made unreadable by some trick, e.g. by tagging it as "uninitialized" upon exit of the loop?

This reminds me of similar discussions concerning the final value of the LOOP iteration variable of the FOR clause within the FINALLY clause of ANSI Common Lisp's LOOP macro (that also features inclusive and exclusive limits). Even though the ANSI-CL specification tries to be precise, there is still room for some interpretation and the various Lisp implementations don't agree in all corner cases.

It would be conceptually much simpler if the variable need not be declared prior to the loop statement. I therefore propose the syntax loop var <type> x in [...] do
Benefits:

  • The syntax makes it obvious that the variable is local to the loop.
  • x can be tagged as constant within the body (same as its whole lifetime), no complications.
  • No discussion about final value, because x is simply no more accessible outside the loop.
    This is unlike C and Python. In those languages, people sometimes read the value after exit, e.g. to check whether the loop reached the end or not. In some other languages (e.g. Scheme and perhaps C++), the loop variable is simply gone. So some use-cases need be coded differently.

Cost of non-adoption e.g. current behaviour: Céu ought to remove non-determinism as much as possible from the language, define the value of x after loop exits and add test cases that document this behaviour in machine-checkable form.

If x remains defined upon exit, I'd vote for the option "x = overflow value", cf. my loop implementation suggestion in bug #98 -- but I have not yet thought about the case ]a->b[ (x = a or a+step?).

The compiler already refuses to initialize or assign variables later used in loop, reporting that they are read-only. So I was wrong when I said "x is mine".

var int x = 7; // invalid assignment : read-only variable "x"
loop x in [2->8] do ... end

Hence what's left of this issue is:

  1. Bug: The compiler should detect and prevent the nested loop example.
  2. Enhancement: Make the loop variable unreadable after loop exit, to avoid undefined behaviour and thus non-determinism. (Alternatively try really hard to precisely define the value. Not trivial with C as a backend, where lots of simple operations cause undefined behaviour, e.g. signed addition overflow.)
  3. Enhancement: Syntax loop var int x in ... which would avoid all these pitfalls.

Meanwhile, one can take advantage of the current behaviour to provide immutable variables to Céu:

#define constvar(typ,name,value) \
  var typ name; \
  loop/1 name in [ (value) -> _] do break; end
constvar(uint, x, 1+2*3);
_printf("%u\n",x);
//x = 7; // invalid assignment : read-only variable "x"

Alas, vec[x] is still not the same as vec[7] (dynamic vs. static allocation). That would require real support for constants in Céu.