sannybuilder/dev

Function Syntax RFC

Opened this issue · 84 comments

x87 commented

Goal

  • Concise syntax for SCM functions
  • Make SCM functions easier to use by hiding some low-level details

Considerations for function syntax

Pascal-style:

function a12(name: string): string
  • easier to find using a "function" keyword
  • param declaration is consistent with the "var" block syntax

C-style:

string a12(string name)
  • consistent with inline var declaration

Pascal-style is more consistent with the rest of the language.

Rules

  • Functions are available anywhere within the current scope; they no longer have to be declared before first usage. Functions defined inside other functions are available anywhere in that function body

  • Function body starts with the "function" keyword followed by the function signature.

  • Function name prefixed with @ is a valid label

  • The signature includes input parameters (if any) and their types, comma-separated.

  • Input parameters, when present, must be enclosed in "()". For zero-input functions "()" is optional

  • Input parameters are followed by a return type if the function returns anything

  • Multiple return types are comma-separated, and parenthesized.

  • Return type(s) can be prefixed with the optional keyword. Function with an optional result may return nothing.

  • Function body ends with the end keyword.

  • return keyword immediately exits the function and returns control to the calling code. See Return Semantics

    • functions with optional return type(s) can use blank return to bail out immediately while returning nothing
    • There is a special logical return type. This result can only be validated in IF..THEN and can not be stored in a variable. A logical function returns true or false.
    • Functions that define return type(s) must return the same number of values. A logical function must return a condition flag. Optional functions may return nothing.
    • When you return with some value(s), it is always a true condition for IF..THEN. Empty return is always a false condition. Value returned from a logical function defines the condition.
  • Function can return one or more values using the following syntax: return <condition flag> <value1> <value2> ....

  • return keyword in functions should not be confused with return keyword in gosubs. Function's return is always followed by some values, or true, or false using CLEO5's CLEO_RETURN_WITH

  • The end keyword serves as an implicit return using CLEO_RETURN (CLEO5 is required)

  • return false is a special case that can be used in any function. It sets the condition result to false and exits the function ignoring all output variables

  • return true is a special case that can be used in function with no return type. It sets the condition result to true.

Examples

Declaration

  • Function name serves as a label. It cannot duplicate an existing label.
  • Functions can be declared upfront to allow calls to be located before the function body (forward declarations, see below)
  • "end" represents CLEO_RETURN 0 if there is no explicit cleo_return_* on the preceding line
{$CLEO}
function a0
end // implicit CLEO_RETURN 0

function a1(x: int)
end // implicit CLEO_RETURN 0

function a2(x: int, y: int)
end // implicit CLEO_RETURN 0

function a3(): int
end // implicit CLEO_RETURN 0

function a4(): int, int
end // implicit CLEO_RETURN 0

function a5(): int
    return 42 // explicit cleo_return_with true 42
end // implicit CLEO_RETURN 0

function a6(): int, int
    return 42 84 // explicit cleo_return_with true 42 84
end // implicit CLEO_RETURN 0

function a7(): int
    if 0@>0
    then
        return 42 // explicit cleo_return_with true 42
    end
end // implicit CLEO_RETURN 0

function a8()
    if 0@>0
    then
        return // explicit 0051: return
    end
end // implicit CLEO_RETURN 0

function a9(): logical
    return false // explicit cleo_return_with false
end // implicit CLEO_RETURN 0

function a10(): float
    if 0@>0
    then
        return 42.0 // explicit cleo_return_with true 42.0
    end
end // implicit CLEO_RETURN 0

function a11(): string
    if 0@>0
    then
        return 'test' // explicit cleo_return_with true 'test'
    end
end // implicit CLEO_RETURN 0

function a12(name: string): string
    return name // explicit cleo_return_with true 0@ 1@
end // implicit CLEO_RETURN 0

function a14(): int
  if 0@>0
  then return 1 // cleo_return_with true 1
  else return 0 // cleo_return_with true 0
  end
end  // implicit CLEO_RETURN 0


function a15(): int, float
  if 0@>0
  then return 1 2.0 // cleo_return_with true 1 2.0
  else return 0 0 // cleo_return_with true 0 0
  end
end // implicit CLEO_RETURN 0

Examples of logical/optional return types

function no_ret_ok
    return
end

function no_ret_error
//    return 1 // error, should not return a value
end

function ret_logical_ok: logical
    return 1
    return 0
    return 0@ == 0
end

function ret_logical_error: logical
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_1_ok: int
    return 0
end

function ret_1_error: int
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_2_ok: int, int
    return 0 0
end

function ret_2_error: int, int
//    return      // error, should return 2 integer values
//    return 1    // error, should return 2 integer values
end

function opt_1_ok: optional int
    return
    return 0
end

function opt_1_error: optional int
//    return 1 2 // error, should return 1 integer value
end

function opt_2_ok: optional int, int
    return
    return 1 2
end

function opt_2_error: optional int, int
//    return 1 // error, should return 2 integer values
//    return 1 2 3 // error, should return 2 integer values
end



if and
    ret_logical_ok()
    0@ = ret_1_ok()
    0@, 1@ = ret_2_ok()
    0@ = opt_1_ok()
    0@, 1@ = opt_2_ok()
then
    // ok
else
    // error
end

Calling functions

a0() // ambiguous: gosub or call?, need lookahead
a1(5) // cleo_call a1 1 5
a2(5,6) // cleo_call a2 2 5 6 

// single result
0@ = a3() // cleo_call a3 0 0@

// multiple results
0@, 1@ = a4() // cleo_call a4 0 0@ 1@

// use functions in initialization position
int x = a3() // cleo_call a3 0 0@
string name = a12("test") // cleo_call a12 1 "test" 0@ 1@

// logical functions
if a9()
then
...
end

Grammar

function := ["export" whitespace] "function" whitespace identifier "(" params ")" ( return_type1 | return_type2)
params := [ identifier ":" type ] [ "," params ]

return_type1 := [ "(" ] type [ ")" ]
return_type2 := "(" type "," types ")"
types := type [ "," types ]

var1 := [ "(" ] var [ ")" ]
var2 := "(" var "," vars ")"
vars := var [ "," vars ]

function_call := [ ( var1 | var2 ) "=" ] identifier "(" args ")"
args := ( var | const ) [ "," args ]
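
To make the declaration part of the grammar more tangible, here is a minimal Python sketch of a header parser. It is not how Sanny Builder parses code; `parse_header` and the regular expression are hypothetical. It follows the Rules section rather than the strict grammar above: "()" is optional for zero-input functions, return types may be parenthesized, and they may carry the optional prefix. Only named parameters are handled.

```python
import re

# Hypothetical pattern for a declaration header such as
# "function a12(name: string): string" (illustration only).
HEADER = re.compile(
    r"^\s*(?P<export>export\s+)?function\s+(?P<name>\w+)\s*"
    r"(?:\((?P<params>[^)]*)\))?\s*"
    r"(?::\s*(?P<ret>.+?))?\s*$"
)

def parse_header(line):
    """Split a function header into name, params and return types."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a function header: {line!r}")
    params = []
    for chunk in (m.group("params") or "").split(","):
        chunk = chunk.strip()
        if chunk:
            name, _, type_ = chunk.partition(":")
            params.append((name.strip(), type_.strip()))
    ret = (m.group("ret") or "").strip()
    optional = ret.startswith("optional ")
    if optional:
        ret = ret[len("optional "):]
    ret = ret.strip("() ")  # return types may be parenthesized
    returns = [t.strip() for t in ret.split(",")] if ret else []
    return {
        "name": m.group("name"),
        "exported": bool(m.group("export")),
        "params": params,
        "optional": optional,
        "returns": returns,
    }

print(parse_header("function a12(name: string): string"))
print(parse_header("function opt_2_ok: optional int, int"))
```

The sketch does no type checking; it only shows how the pieces of a signature relate to each other.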

x87 commented

TODO: make it compatible with #45

What about calling function reference stored in variable (like passing callback function via parameter, or functions stored in array)?
Currently in 0AB1 function label can be variable.

What about functions that can take/return multiple types of variables?
Like numToText accepting first param int or float depending on second boolean param.
Same situation with return type, where depending on logic return type might be int or float.

How condition result will be handled? Your earlier idea with return and not return seems nice. Possibility to use variable as condition result should also be a thing. For situations where function always have to perform some clean-up, then exit with condition result.

x87 commented

What about functions that can take/return multiple types of variables? Like numToText accepting first param int or float depending on second boolean param. Same situation with return type, where depending on logic return type might be int or float.

union types are out of scope for this RFC. There should be a separate function for each combination of types

x87 commented

How condition result will be handled? Your earlier idea with return and not return seems nice. Possibility to use variable as condition result should also be a thing. For situations where function always have to perform some clean-up, then exit with condition result.

any idea for complete syntax?

How condition result will be handled? Your earlier idea with return and not return seems nice. Possibility to use variable as condition result should also be a thing. For situations where function always have to perform some clean-up, then exit with condition result.

any idea for complete syntax?

Tough case. For sure the condition result cannot be on the right side of the return keyword. That leads to the idea of having it on the left:
return 5@ - condition result true
not return 5@ - condition result false
5@ return 5@ - condition result based on value of 5@

x87 commented

How condition result will be handled? Your earlier idea with return and not return seems nice. Possibility to use variable as condition result should also be a thing. For situations where function always have to perform some clean-up, then exit with condition result.

any idea for complete syntax?

Tough case. For sure the condition result cannot be on the right side of the return keyword. That leads to the idea of having it on the left:
return 5@ - condition result true
not return 5@ - condition result false
5@ return 5@ - condition result based on value of 5@

this syntax is complicated and confusing. I think the last case should be handled outside of the function, or the boolean value should be returned as a regular value.

A function should have the condition result set to true at the start of the function. This would allow empty functions, or functions without ifs, to be used in conditions.

if fun1()
then
//  <---  fun1() is true
end

function fun1() // <--- condition result set to true by cleo_call

end // <--- return as is

If the function wants to explicitly change the condition result to false, we can use return false. Note that return 0 would be considered as a 'true' result, so return false and return 0 are not the same.

Then you are allowed to use regular tricks with conditional opcodes to alter the condition result:

function fun1()

  is_australian_game // changes condition_result to false
  is_pc_game // changes condition result to true

end

so to summarize:

If you return any value, it is always a success. Condition result is true (unless altered by the last conditional opcode)
If you return false it is always a failure. No output variables are modified.

We can use 8AB2 for return false case. It collects all returned values, then skips the output variables in the caller, then sets the condition result to false.

0@=100 
1@=200 
2@=300
if  (0@,1@,2@) = fun() // cleo_call @fun 0 0@ 1@ 2@
then
  print 0@ 1@ 2@ // here it prints 10 20 30
else
  print 0@ 1@ 2@ // here it prints 100 200 300
end

function fun(): (int, int, int)
  if not <cond>
  then
    return false // 8AB2: not cleo_return 3 0 0 0
  end
  
  return (10,20,30) // 0AB2: cleo_return 3 10 20 30
end

Um, I really don't like the fact that return false would behave differently than the other cases. And what about the situation where a function is supposed to return one bool param? The syntax is getting ambiguous.
Leaving the function early sounds like a nice feature, but it should not be performed with the return keyword. This seems to be an ideal case for 'break'.

Implicitly carrying over the condition state of the last executed opcode also seems not right. Instead of easy functionality we get hidden, convoluted logic without clear rule enforcement. A function returns true by default, but a seemingly unrelated change will make it behave differently.
Generally, in my practice I always use a variable to carry the function's ok/failed state, as you cannot trust the condition result set in the middle of the function to still be valid on exit. In my opinion the end user should not even be aware that the condition result exists in the background all the time.

Maybe instead of return we should only allow return_true and return_false as function ending keywords?

x87 commented

If function returns one bool param, it suits the proposed logic.

Bool result

if 
  test() // cleo_call @test 0
then
  // true
else
  // false
end

function test(): bool
  if x 
  then
    return true // cleo_return 0
  else
    return false // not cleo_return 0
  end
end

note that bool return type does not require a variable to store the result

IF and SET

if 
  0@ = test() // cleo_call @test 0 0@
then
  // 0@ is 1
else
  // 0@ is not set
end

function test(): int
  if x
  then
    return 1 // cleo_return 1 1
  else
    return false // not cleo_return 1 0  /// last zero is just a placeholder to match the function signature. CLEO does NOT set result to 0
  end
end

Currently there is no such thing as a bool type. Will it be reserved for the condition result only? It might be a good idea to declare whether a function sets the condition state, or else it will be true by default. Then the condition result is just one of the args in the return statement.
Hm, doesn't it just boil down to treating the first return param as the condition result, where a value other than 0 is considered true?
That seems reasonable. Cuts extra complications.

func test() : (int, string, float)
(...)
return(true, "result", 3.0)
end

test() // allowed, discard return

if test() // allowed, check first return <> 0

if (0@, 1@, 2@) = test() // allowed, first as condition result

if (0@, 1@) = test() // error, not all return values used

if (_, 0@, _) = test() // allowed, first returned value as condition (as usual). Store just second argument

Idea to consider is to allow '_' in return calls like return(false, _, _, _, -1)

I see one problem with the // 0@ is not set example:

0@ = true
(...)
0@ = isThingEnabled()

x87 commented

Not sure we need to overengineer it. Many functions are just calculations and they don't need to work with condition at all.

As you mentioned, there is no bool type which is correct. It means you can't store a result of a bool function into a variable. It only works as a condition.

0@ = true
if isThingEnabled()
then
 0@ = true // or 1
else
 0@ = false // or 0
end

Many functions are just calculations and they don't need to work with condition at all.

Then it simply just returns single argument (for example float). If used in condition statement then in typical fashion return[0] is tested for <> 0.

Introducing Boolean type will make everyone question why it is not possible to use it for var declarations or as input argument type.

Some corner cases:
func test() : (int, int) // 2 return values
func test() : (bool, int) // 1 return value now? Bit confusing
func test() : (int, bool) // hm, what now?
func test() : (bool, int, bool) // ???

Over engineering is when you have multiple rules to describe simple thing.

I propose single rule: condition result of function call is return param[0] <> 0, true if no returns

x87 commented

My idea was to use bool in functions that return no values. You can't mix bool with other types.

x87 commented

I propose single rule: condition result of function call is return param[0] <> 0, true if no returns

how do you express this with opcodes?

What do you mean with opcodes?
I posted examples above.

func test() : (int, string, float)
   (...)
   return(true, "result", 3.0)
end

test() // allowed, discard return all return values

if test() // check first return <> 0, discard all return values
   (...)
end

if (0@, 1@, 2@) = test() // check first return <> 0
   (...)
end

The mentioned _ is just a further feature proposal, inspired by what the C++ language recently received.
It is possible to store multiple returns in a similar fashion to what you proposed, where _ is often used as an 'ignored' param.

Maybe it should be the keyword 'null' instead. Then it also makes more sense to call a function with null as some params.
Currently 0AB1 accepts fewer parameters than expected, and the missing ones get the default value 0. That's why I was complaining about the default legacy mode in main.scm

x87 commented

What do you mean with opcodes? I posted examples above.

func test() : (int, string, float)
   (...)
   return(true, "result", 3.0)
end

test() // allowed, discard return all return values

if test() // check first return <> 0, discard all return values
   (...)
end

if (0@, 1@, 2@) = test() // check first return <> 0
   (...)
end

Rewrite this example using opcodes only please. As if you just decompiled the script.

:test
   (...)
   0AB2: cleo_return args 3 true "result" 3.0 // set CLEO condition result based on arg[0]

0AB1: @test args 0

if
   0AB1: @test args 0
then
   (...)
end

if  0AB1: @test args 0 result 0@ 1@ 2@
   (...)
end

x87 commented

  1. Your script will crash on the line 0AB1: @test args 0. 0AB1 should have enough variables to match 0AB2.
  2. You can't change the behavior of 0AB2 because it breaks the existing scripts.

Imagine there is a script:

{$CLEO .cs}
0000:
wait 1000
if
    0AB1: @test args 0 0@ 1@ 2@
then
    0ACE: show_formatted_text_box "Yes"
else
    0ACE: show_formatted_text_box "No"
end

0A93: terminate_this_custom_script


:test
059A:  return_false
0AB2: cleo_return args 3 1 2 3

Today, it shows the message "No", because the condition result was modified by a conditional opcode 059A (which is expected and fits the language). 0AB2 does not modify the result. With your proposal the behavior changes and it will now display "Yes".
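
The behavioral difference described here can be modeled with a toy Python sketch. It is not CLEO code: the `Thread` class and the `op_*` function names are hypothetical stand-ins, and the "proposed" variant assumes the condition result would be derived from the first returned value.

```python
class Thread:
    """Toy stand-in for a script thread: only tracks the condition result."""
    def __init__(self):
        self.cond = False

def op_059a_return_false(thread):
    # Conditional opcode: always sets the condition result to false.
    thread.cond = False

def op_0ab2_legacy(thread, *values):
    # Legacy 0AB2: hands the values back and leaves the condition result alone.
    return values

def op_0ab2_proposed(thread, *values):
    # Proposed change (assumption): condition result follows the first value.
    thread.cond = bool(values[0]) if values else True
    return values

def run_test(cleo_return):
    """Mimics the :test label above: 059A followed by cleo_return 1 2 3."""
    thread = Thread()
    op_059a_return_false(thread)
    cleo_return(thread, 1, 2, 3)
    return "Yes" if thread.cond else "No"

print(run_test(op_0ab2_legacy))    # No
print(run_test(op_0ab2_proposed))  # Yes
```

The same script text yields a different message depending on which return behavior is in place, which is the compatibility concern raised above.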

Can you address these two concerns in your script (both high-level and low-level)?

  1. I propose to add support for discarding all return values of 0AB2. So the accepted scenarios should be that either all parameters are used, or none. If any other params count is specified, it should result in an error message. I think currently, in case of a mismatch, 0AB2 just consumes the following opcodes in the script instead.

  2. Yes, it would need a change/update in the 0AB2 condition result behaviour. It was fixed only recently, so I don't know if anybody ever used it.
    Anyway, nobody says function return has to be based on 0AB2. Recently there was also the idea to redesign the return keyword into a universal, fits-all-cases function. A new return opcode could do that: manage the condition result and return values, plus work with GOSUB commands too.

x87 commented

I propose to add support of discarding all return values of 0AB2. So there should be accepted scenarios where all parameters are used, or none.

0AB2: cleo_return 0 does it already. You can return all or nothing.

Yes it would need change/update in 0AB2 condition result behaviour. It was fixed only recently, so I don't know if anybody ever used it.

It was fixed for a scenario with multiple conditions. A single condition (see my example) has been used for years.

:test
   (...)
0AB2: cleo_return args 4 1 2 3 4

cleo_call @test args 0 // this is not possible now

Yep, legacy behaviour of 0AB2 is untouchable then.

x87 commented

UPDATE 11/13/2023

Make two separate opcodes to allow for true/false return without arguments.

CLEO_RETURN_FALSE - 0 params, exits current function, ignores all caller's variables, sets the cond result to false
CLEO_RETURN_WITH - 0 or more params, exits current function, must match all caller's variables, sets the cond result to true

retf // CLEO_RETURN_FALSE
retw true // CLEO_RETURN_WITH 1
retw 1 // CLEO_RETURN_WITH 1
retw 0 // CLEO_RETURN_WITH 0 /// this is a true condition!
retw 1 2 3 // CLEO_RETURN_WITH 1 2 3


My proposal is to add a new cleo_return. We can certainly use 8AB2 but it goes against the language design, so a new command could be better.

CLEO_RETURN_WITH - changes the condition result and returns values

Examples

CLEO_RETURN_WITH                    // when used with no arguments sets the condition result to false
CLEO_RETURN_WITH 1                  // with arguments the condition result is true, output variable is set to 1
CLEO_RETURN_WITH 1 2 3              // condition result is true, output variables are set to 1 2 3

These examples are based on the assumption that we could omit the nResults parameter and figure it out dynamically.

Before writing the result, CLEO_RETURN_WITH checks if the calling code has the variables.

scmFunc->Return(thread);
-if (nRetParams) SetScriptParams(thread, nRetParams);
+if (nRetParams && (*thread->GetBytePointer())) SetScriptParams(thread, nRetParams);

it solves the case when CLEO_RETURN_WITH 1 is used as a pure boolean call:

if x()
then
(...)
end

:x
CLEO_RETURN_WITH TRUE
end

Then we can use the following syntax with this proposal:


return false // CLEO_RETURN_WITH
return true // CLEO_RETURN_WITH 1
return 1 // CLEO_RETURN_WITH 1
return 0 // CLEO_RETURN_WITH 0  /// this is a true condition!
return (1, 2, 3) // CLEO_RETURN_WITH 1 2 3
x87 commented

My idea is, that a function either returns something and the condition result is true, or returns nothing and the condition result is false. There is no case, when you need to return something and set the condition to false.

Sounds reasonable.
I have some functions that return both condition result false and values, like obtaining entity where for failed case returned handle is -1.
I guess with new return it would be possible to assign error fallback value before function call, so it just won't be updated if function fails.

x87 commented

Pure Functions

  • It should be impossible to use global variables in function body if this function is located in a headless script (a CLEO script, a module). Function should rely only on input arguments
  • It should be impossible to use labels outside of function body (e.g. for jump or gosub). Functions may call other functions in the same script.
    • functions can define scoped labels visible within the function body and reference them; these labels are not visible to the outside code
function f()
  gosub @sub // OK
  return true

  :sub
  return
end

gosub @sub // ERROR

Are these rules too strict?

Forbidding usage of global variables seems too strict. If you really want to force people to stop messing with global variables via .cs, maybe Sanny should require some macro in the script to enable a "globals write mode".

External labels are useful in case of accessing hex blocks. Redirecting program flow to outside labels should perhaps be forbidden, as local variables declared by the function will be allocated but their declarations will not be accessible outside.

Calling other functions (0AB1) declared outside function body should be possible.

Hiding local labels from the outside sounds like a great feature. It keeps the autocomplete list clear and prevents bugs when copy-pasting code.

I agree that a function should not be dependent on global variables which might vary in custom mains. A good compromise would be to restrict global variables but allow aDMA, so the varspace is available for stuff like VarspaceSize = &3 and global opcodes can be used for evaluating and manipulating memory.

Yep I keep forgetting about dynamic allocation of new globals just by using new variable name. This is problematic case. In all other scenarios (well known global variables like $PLAYER_CHAR or defined with Alloc) this will make functions inferior to regular cleo_call.

x87 commented

If you need a global variable in your function, pass it as an input from calling code.

function getPlayerMoney(player: Player): int
 int money = Player.Money(player)
 return money
end

getPlayerMoney($PLAYER_CHAR)

How do you make a function update a global variable?

x87 commented

Return a value and update global variable in the main scope.

Makes no sense for functions like "initialize" or "load project from file"

x87 commented

can you show an example of such code? why would you need global variables in it?

When developing main script or MPACK it is natural you wish to work with globals.
Yes I have multiple instances of functions like that in my project.

x87 commented

right, usage of global variables makes perfect sense in SCM. I was thinking about CLEO scripts. Maybe we can specify that global variables can not be used in functions inside CLEO scripts.

Example with $PLAYER_CHAR is still valid for cs. If you chain 20 functions then each of them will need to carry player argument just so it can pass it further.

I see the problem of using a module compiled with one set of globals on a main with different globals. The question is whether it is worth totally crippling functions just because somebody can use them incorrectly.

If somebody wants to create a mode independent module, he is still able to do it even with globals support enabled.

Assume globals support is forbidden. How are you going to enforce it? If gosub and/or cleo_call inside functions are enabled, will the contents of these be analysed for globals usage too? Furthermore: what about jumps/gosubs/cleo_calls inside those nested calls?
I'm not trying to work around the limitations, just stating it might be difficult to analyse and perhaps needs some additional restrictions, shrinking functions usage even more.

x87 commented

UPDATE 12/20/2023

Make two separate opcodes to allow for true/false return without arguments.

CLEO_RETURN_FAIL - 0 params, exits current function, ignores all caller's variables, sets the cond result to false
CLEO_RETURN_WITH - 1 or more params, exits current function, must match all caller's variables, sets the cond result to the first argument

return false // CLEO_RETURN_WITH false
return true // CLEO_RETURN_WITH true
return true 1 // CLEO_RETURN_WITH true 1
return false 1 // CLEO_RETURN_WITH false 1
end // CLEO_RETURN_FAIL
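
A small Python model of the two opcodes as described in this update. It is a sketch, not CLEO's implementation: the function names and the list-based "caller variables" are hypothetical, and it assumes the values are still written when the flag is false, which the update does not spell out.

```python
def cleo_return_fail(caller_vars):
    """0 params: caller's variables are left untouched, condition is false."""
    return False, list(caller_vars)

def cleo_return_with(caller_vars, flag, *values):
    """First argument is the condition flag; the rest must match the
    number of variables the caller provided."""
    if len(values) != len(caller_vars):
        raise ValueError(
            f"expected {len(caller_vars)} value(s), got {len(values)}"
        )
    return bool(flag), list(values)

# (condition result, caller's variables after the call)
print(cleo_return_fail([100, 200]))         # (False, [100, 200])
print(cleo_return_with([100], True, 1))     # (True, [1])
print(cleo_return_with([100], False, 1))    # (False, [1])
print(cleo_return_with([], True))           # (True, [])
```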

x87 commented

UPDATE 4/3/2024

The following section is no longer relevant. Since 4.0.0 beta.7 function declarations are hoisted (available anywhere in the current scope).

Forward declarations

Since Sanny's compiler is one-pass, functions must be declared before usage. It may be solved in the future to provide "hoisting" of functions regardless of their position in source code.

Functions whose implementation precedes any call don't need a declaration:

function foo
end

foo // compiles

If, however, function implementation is located later in the code, it will produce a compilation error, as the function name can not be resolved.

foo // error, foo is not defined

function foo
end

Having all functions defined before their first usage is not convenient as it clutters the main logic at the start of the file. In case of two mutually recursive functions it's not even possible to arrange them in a way their body always precedes the first call:

function foo
bar // error, bar is not defined
end

function bar
foo
end

Hence, the source file must have a way to declare functions upfront to compile the call and validate input and output parameters properly.
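
The difference between one-pass name resolution and hoisting can be sketched in Python. This is an illustration only, not Sanny Builder's compiler: the statement tuples and both function names are hypothetical, and signatures are reduced to bare names.

```python
def resolve_one_pass(statements):
    """Resolve calls in source order; a call fails if the function
    has not been seen yet. Returns the list of unresolved names."""
    known, errors = set(), []
    for kind, name in statements:
        if kind == "def":
            known.add(name)
        elif name not in known:
            errors.append(name)
    return errors

def resolve_hoisted(statements):
    """Collect every definition first, then check the calls."""
    known = {name for kind, name in statements if kind == "def"}
    return [
        name for kind, name in statements
        if kind == "call" and name not in known
    ]

# foo calls bar, bar calls foo (the mutually recursive case above)
program = [("def", "foo"), ("call", "bar"), ("def", "bar"), ("call", "foo")]

print(resolve_one_pass(program))  # ['bar']
print(resolve_hoisted(program))   # []
```

A forward declaration plays the role of an early "def" entry in the one-pass variant: it makes the name known without emitting any instructions.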

Syntax

define function <name>(<input types>):<output types>
  • forward declaration is an ambient construction, it does not produce any instructions
  • forward declaration starts with the word DEFINE. Same word is used to declare elements of SCM header
  • Name must be a valid identifier
  • Input arguments may omit names and only list types (float, float, float)
  • Forward declaration must have the same number of input and output parameters as the actual implementation. Types of parameters must match as well
  • Each function may have only one forward declaration

Examples

define function foo
define function setPos(float, float, z: float)
define function bar(int, float): int

Oh, why make it so complicated? How does it work with labels now? Labels are available everywhere.

x87 commented

we need to know function signature, so we can validate input and output arguments

And how is it done for labels? Labels also throw a compilation error if an undeclared label is used. Is it really one pass?

Sounds like definitions are solving a corner-case scenario, so maybe make them optional?

x87 commented

No, definitions are there for good. I plan to expand them to cover external functions like this:

define function setPos<cdecl, 0xABCDEF>(float, float, float)

setPos(1, 2, 3) -> 0AA5: call_function 0xABCDEF num_params 3 pop 3 func_params 1 2 3

Also, declarations are optional if function defined before the first call:

function setPos(x: float, y: float, z: float)
...
end

setPos(1, 2, 3) // works

So Pascal style arguments definition was chosen?

int a
function foo(b: int)

Seems bit inconsistent
Do we even use var blocks after new way of declaration was provided?
That's also one more character in each argument declaration (:) that actually brings nothing beside more visual noise.

How about default parameter values, like:
function(int type = 1)

x87 commented

How about default parameter values, like: function(int type = 1)

I was thinking about it, probably in some future update

x87 commented

Foreign Functions

Declaring and calling cdecl function at address 0x400000

define function Sum<cdecl,0x400000>(int, int): int

int result = Sum(10, 20)
// 0AA7: call_function_return {address} 0x400000 {numParams} 2 {pop} 2 {funcParams} 20 10 {funcRet} result

Declaring and calling stdcall function at variable address

define function Sum<stdcall>(float, float): float

var 0@: Sum = 0x400000
float result = 0@(10.0, 20.0)
// 0AA7: call_function_return {address} 0@ {numParams} 2 {pop} 0 {funcParams} 20.0 10.0 {funcRet} result

Declaring and calling thiscall function (class method) at address 0x400000

define function Clone<thiscall, 0x400000>(self: int, arg: int): int
const struct = 0xDEADBEEF

int result = Clone(struct, 5)
// 0AA8: call_method_return {address} 0x400000 {struct} 0xDEADBEEF {numParams} 1 {pop} 0 {funcParams} 5 {funcRet} result

Declaring and calling thiscall function (class method) at variable address

define function Clone<thiscall>(self: int, arg: int): int
Clone fn = 0x400000
const struct = 0xDEADBEEF

int result = fn(struct, "full")
// 0AA8: call_method_return {address} 0@ {struct} 0xDEADBEEF {numParams} 1 {pop} 0 {funcParams} "full" {funcRet} result
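
The pattern in these four examples can be summarized with a hypothetical Python helper that builds the opcode line from a call. The rules it encodes are inferred from the examples only: arguments are listed in reverse order, cdecl sets pop to the number of parameters, stdcall and thiscall use pop 0, and thiscall moves the first argument into the struct slot. `lower_foreign_call` is not part of Sanny Builder.

```python
def lower_foreign_call(convention, address, args, result):
    """Build the opcode line for a foreign call that returns a value."""
    if convention not in ("cdecl", "stdcall", "thiscall"):
        raise ValueError(f"unknown calling convention: {convention}")
    if convention == "thiscall":
        struct, *rest = args  # first argument is the object pointer
        params = " ".join(str(a) for a in reversed(rest))
        return (
            f"0AA8: call_method_return {{address}} {address} "
            f"{{struct}} {struct} {{numParams}} {len(rest)} {{pop}} 0 "
            f"{{funcParams}} {params} {{funcRet}} {result}"
        )
    pop = len(args) if convention == "cdecl" else 0
    params = " ".join(str(a) for a in reversed(args))
    return (
        f"0AA7: call_function_return {{address}} {address} "
        f"{{numParams}} {len(args)} {{pop}} {pop} "
        f"{{funcParams}} {params} {{funcRet}} {result}"
    )

print(lower_foreign_call("cdecl", "0x400000", [10, 20], "result"))
print(lower_foreign_call("thiscall", "0x400000", ["0xDEADBEEF", 5], "result"))
```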
x87 commented

Final Proposal on Return Semantics

NOTE: this only concerns return syntax in Sanny 4. For more exotic cases that don't fit into the rules below, opcode 2002 can be used directly.

update 3/20/2024: bool replaced with logical
update 3/21/2024: added optional modifier for return type(s)

  1. Functions can mark their return type as optional and use naked return to bail out immediately while returning 0 values
  2. There is a special logical return type (see discussion below). This result can only be validated in IF..THEN and can not be stored in a variable. A logical function returns true or false.
  3. Functions that define return type(s) must return the same number of values. A logical function must return a condition flag. Optional functions may return nothing.
  4. When you return with some value(s), it is always a true condition for IF..THEN. Empty return is always a false condition. Value returned from a logical function defines the condition.
function no_ret_ok
    return
end

function no_ret_error
//    return 1 // error, should not return a value
end

function ret_logical_ok: logical
    return 1
    return 0
    return 0@ == 0
end

function ret_logical_error: logical
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_1_ok: int
    return 0
end

function ret_1_error: int
//    return // error, should return 1 value
//    return 1 2 // error, should return 1 value
end

function ret_2_ok: int, int
    return 0 0
end

function ret_2_error: int, int
//    return      // error, should return 2 integer values
//    return 1    // error, should return 2 integer values
end

function opt_1_ok: optional int
    return
    return 0
end

function opt_1_error: optional int
//    return 1 2 // error, should return 1 integer value
end

function opt_2_ok: optional int, int
    return
    return 1 2
end

function opt_2_error: optional int, int
//    return 1 // error, should return 2 integer values
//    return 1 2 3 // error, should return 2 integer values
end



if and
    ret_logical_ok()
    0@ = ret_1_ok()
    0@, 1@ = ret_2_ok()
    0@ = opt_1_ok()
    0@, 1@ = opt_2_ok()
then
    // ok
else
    // error
end
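
A rough Python model of the arity rules listed above, for readers who want to poke at them. This is not Sanny Builder compiler code: `check_return` and its parameters are made-up names, and the error strings only imitate the comments in the examples.

```python
# Hypothetical model of the proposal's compile-time check; not compiler code.
def check_return(return_types, optional, returned_count):
    """Return None if a `return` statement is valid, else an error message.

    return_types   -- declared types, e.g. [] / ["logical"] / ["int", "int"]
    optional       -- True if the declaration uses the `optional` modifier
    returned_count -- number of values written after `return`
    """
    expected = len(return_types)
    if returned_count == 0:
        # A naked return is fine for no-result and optional functions only.
        if expected == 0 or optional:
            return None
        return f"error, should return {expected} value(s)"
    if expected == 0:
        return "error, should not return a value"
    if returned_count != expected:
        return f"error, should return {expected} value(s)"
    return None


print(check_return([], False, 0))              # no_ret_ok
print(check_return(["logical"], False, 0))     # ret_logical_error
print(check_return(["int", "int"], True, 0))   # opt_2_ok, naked return
print(check_return(["int", "int"], True, 1))   # opt_2_error
```

The sketch only counts values; it does not check their types, which the proposal's error comments ("2 integer values") suggest the compiler may also do.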

I think the condition result should be true by default for all functions: the call succeeded unless the user explicitly says otherwise. An automatic condition result based on how return is used will be surprising and difficult to remember.
There is still the option to use not return, or syntax like return(false) 0@ 1@

This makes sense. All return types should set the condition result to true. Simple as that.
not return would be coherent with the current language philosophy (perhaps it even works already?)
It takes away the option to use a value as the condition result, but it has always been like that. A simple if/else block can solve it.

x87 commented

If your function is outside of if...then, you should not care about condition result. Your next IF will reset the flag anyway.

If it is part of IF condition, then you have to care what it returns.

x87 commented

It takes away the option to use a value as the condition result, but it has always been like that.

Example?

Currently it is cleo_return_with 0@ {args} 1@ 2@, where 0@ is the condition result value

if
   0@ == 4
then
   return 5@ // return the handle
else
   not return -1 // condition result false, invalid handle
end

x87 commented

if
   0@ == 4
then
   return 5@ // return the handle
else
   return // condition result false
end

Then you call it as usual

if
    0@ = fn()
then
    // got handle in 0@
else
    // do something else
end

You are trying to bend the language to match other languages, but it only introduces ambiguity and makes the concept of condition result harder to understand, both for newcomers and for people who already have programming knowledge.

Placing bool in function declaration like function foo(): bool, int, int creates misunderstandings like:

  • bool, a commonly known type, seems to be supported by language and can be used in other places
  • return args signatures do not match at declaration and caller side (int is second, or the first?)
  • functions do not return/modify condition result until specified in declaration
  • it takes away opportunity to use bool as type in future
  • return false becomes highly ambiguous and requires function declaration check
  • Calling return false in function returning int will cause counter-intuitive result

Functions like

function f1(): bool
function f2(): int

appear to be interchangeable, but are totally different at the call site.

I think return should always expect the correct number of arguments, as it is easy to leave a naked return by mistake, and that causes difficult-to-debug problems (target variables keep their previous state).
For this scenario there is the cleo_return_fail opcode. Maybe there should be an alias return_fail for it then.

We had a series of talks about this topic already and finally settled on having a mandatory first arg as the return result. As you said, in most functions the condition result is not important, so it is a bit inconvenient to have to type it every time.
I'm strongly against putting the condition result and the return args on the same shelf, in both the function declaration and the return call.

Maybe a better solution would be to replace bool in the declaration with a condition keyword, then treat it like the other args, including at the return call. In that case the return syntax would look exactly like the form that is mandatory now.

x87 commented

bool can not be mixed/combined with other types. you can't declare a variable of this type. the only place where it's currently allowed is in a function that returns a new condition flag:

function foo: bool
  return true
end

you can only use foo as a standalone statement or inside IF..THEN:

if foo()
then
...
end

foo()

0@ = foo() // compile error

Ok, then I don't like the idea of automatic condition result deduction:

function foo: int
return true // condition result true
return // condition result false
return false // condition result TRUE

C and Java also do not allow you to exit a function that expects a return value with a naked return

Maybe mixing bool with args should be a thing. If it is important to you that function sets condition result then make it obligatory like:

function foo: bool, int

so then there is no way you can forget it when calling return (this again creates discrepancy between declaration/call args format)

x87 commented

bool, a commonly known type, seems to be supported by language and can be used in other places

no, see above

return args signatures do not match at declaration and caller side (int is second, or the first?)

not possible, see above

functions do not return/modify condition result until specified in declaration

any function already may modify condition result, as 2003 (function's end) sets the flag to false.

it takes away opportunity to use bool as type in future

for bool to ever become a real value (another int?) all conditional opcodes should return explicit 0 or 1 and mechanic of IF/JF should change completely. it will be different language then.

return false becomes highly ambiguous and requires function declaration check

correct. but this is also true with any other types, as today compiler does not validate them.

Calling return false in function returning int will cause counter-intuitive result

it's already counter-intuitive. function returning int should return 0 or 1.

Functions like function f1(): bool function f2(): int appears to be interchangeable

no, see above

I think return should always expect the correct number of arguments, as it is easy to leave a naked return by mistake, and that causes difficult-to-debug problems (target variables keep their previous state).
For this scenario there is the cleo_return_fail opcode. Maybe there should be an alias return_fail for it then.

this is possible, but you can also return wrong values, or mix up the order (y x z), etc.

return and cleo_return_fail currently have exact same behavior. return_fail looks more explicit, but adds another keyword to learn.

with single (naked) return you just need to learn once, that it is always "exit with nothing" and it's never a success.

no, see above

The key phrase in that whole list was "creates misunderstandings like:"

x87 commented

Ok, then I don't like the idea of automatic condition result deduction:

function foo: int
return true // condition result true
return // condition result false
return false // condition result TRUE

C and Java also do not allow you to exit a function that expects a return value with a naked return

Maybe mixing bool with args should be a thing. If it is important to you that function sets condition result then make it obligatory like:

function foo: bool, int

so then there is no way you can forget it when calling return (this again creates discrepancy between declaration/call args format)

You keep thinking of bool as another form of integer value, but in fact nothing in runtime supports it. bool here is an indicator of success/failure and only has meaning within an IF..THEN check. You can't store the result of a conditional opcode anywhere except with an explicit

if
 condition
then
  result = 1
else
  result = 0
end

but then result is an int and condition is bool and they are not interchangeable.

think of return as a form of Option type (https://en.cppreference.com/w/cpp/utility/optional, https://doc.rust-lang.org/std/option). then if you return something, even 0, it's success. otherwise it's failure.
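
To make the Option analogy concrete, here is a small Python sketch (Python only as a neutral stand-in; `find_car` and `call_succeeded` are hypothetical names, not part of Sanny Builder or CLEO). It shows the point made above: success is decided by whether anything was returned, not by the value itself.

```python
from typing import Optional

# Hypothetical stand-in for a function declared as `optional int`.
def find_car(handles, wanted) -> Optional[int]:
    for h in handles:
        if h == wanted:
            return h      # like `return 5@`: a value, so the call succeeded
    return None           # like a naked `return`: nothing returned, failure


def call_succeeded(result: Optional[int]) -> bool:
    # Compare against None, not truthiness: a returned 0 is still a success.
    return result is not None


print(call_succeeded(find_car([0, 7], 0)))   # True, even though the value is 0
print(call_succeeded(find_car([0, 7], 9)))   # False
```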

You keep thinking of bool as another form of integer value

I know the concept of condition result very well. The problem is that anyone with any programming experience will think about it exactly like that.
I know std::optional. Still, even when a function returns an optional type you are not allowed to use a naked return.

You keep explaining that bool is meant to represent the condition result, so why not name it "condition result"? The problem is that there is no straight analogy to other languages, so it would be better not to introduce confusion by giving it the same name as a base type available in most languages.
IF SET was not meaningful at all, but at least it was not giving wrong clues.

x87 commented

:bool type was a getaway from no-type functions:

function ret
    return
//    return true // error as function returns nothing <---------- allowed in beta.6
//    return false // error as function returns nothing <---------- allowed in beta.6
//    return 0@ == 0 // error as function returns nothing <---------- allowed in beta.6
//    return 0@ // error as function returns nothing <---------- allowed in beta.6
//    return 1 2 // error as function returns nothing
end

I can't think of a better name than :bool that is not completely out of place. Ideas?

I can't think of a better name than :bool that is not completely out of place. Ideas?

It seems that :bool in fact describes a property of the function that makes it conditional.
We are already polluted by the knowledge that it is called condition result internally.
condition, verdict, outcome, decider, decision?
In terms of the actual technical name it should perhaps be 'accumulator', but nowadays no one will even know what that means.

x87 commented

back to the roots
https://en.wikipedia.org/wiki/Boolean_data_type#Fortran

function foo: logical
  return true
end

branches to one of three locations WTF

How about the naked return then?
Besides the already mentioned possibility of human error, there is the aspect of people learning by editing existing scripts.
Such hidden behavior of setting the condition result is impossible to figure out just from the code.

x87 commented

I don't see any issue with naked return (or rather any return) setting condition result. Even now, if you "forget" to return values from a function you hit 2003 that sets the result to false and skips the variables.

Also there are many opcodes that set condition flag without you knowing it. e.g.

from SA SCM

Task.PlayAnimNonInterruptable($scplayer, "PARA_STEERR", "PARACHUTE", 1.0, True, False, False, True, -2)
Object.PlayAnim(17@, "PARA_STEERR_O", "PARACHUTE", 1.0, True, True) // <-------- condition state was changed
15@ = 2

Condition result should only be considered in context of current IF statement. Then you should carefully review what opcodes and functions you put in there.


I also don't see any problem with naked return not modifying output variables. I think returning default or garbage values are equally bad. When you call a function that may fail, you always have to inspect the returned value:

0@ = find_car()
// 0@ can be valid handle or -1
// how do you know?

even if any function must return some value, garbage or not, the calling code still has to check the result. In this case, why bother returning garbage zeros or -1 if you discard those values anyway?


What I think is a real flaw in the current implementation is that there is no indication of the fact that the function MAY fail. Looking at the function declaration:

function find_car: int

how do you know, whether it always returns a valid handle, or may return nothing (or 0, or -1)? Without looking at the function's code you wouldn't know. In this sense, a naked return and return -1 are equally bad.

x87 commented

to address the last issue, in addition to logical we can introduce optional type to highlight the fact that function may not return any value. e.g.

function find_car: optional int
function get_colors: optional int, int

an optional type allows a naked return in functions that would otherwise have to return some values.

โ˜๏ธ assumes optional is applied to all returned types.

we could also expand this idea to mark any (trailing) type as optional: ignore ๐Ÿ‘‡ . there is no way for caller to understand how many real values have been returned. so, either all or nothing.

function get_colors: int, int // must return 2 values, naked return not allowed
function get_colors: int, optional int // must return 1 value, and 1 optional value (return 1 2 or return 1)
function get_colors: optional int, optional int // can return 0, 1 or 2 values

function get_colors: optional int, int // error, required type can not follow optional type

Whoa, returning only a few of the args? I don't think I have ever needed something like that. This is getting overcomplicated now.
I would rather have optional mean that the function can be exited without any args (naked return or no return at all); otherwise a compilation error occurs.

x87 commented

Yes, all or nothing. I crossed out the other part, that suggested that only some params can be returned.

So optional is required to exit a function with a naked return or via the function's end.
How then do you exit from a logical function that returns some args and set the result to false?

x87 commented

logical doesn't return any values, only sets the condition flag.
returning a value from a non-optional function is always a true condition
naked return is always a false condition

returning a value AND setting result to false is not supported with return syntax. use 2002 (cleo_return_with) in this case.
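
The three statements above can be written down as a tiny Python model. It is only a sketch of the proposal as described in this thread: `condition_after_return` is a made-up helper, and the case "value plus false flag" is deliberately absent because the return syntax does not cover it.

```python
# Hypothetical model of which condition flag a `return` produces.
def condition_after_return(is_logical, values):
    """values is the list of values written after `return`."""
    if is_logical:
        # A logical function returns exactly one condition flag.
        return bool(values[0])
    # Any other function: returning something is true, naked return is false.
    return len(values) > 0


print(condition_after_return(False, [0]))   # True: a value was returned
print(condition_after_return(False, []))    # False: naked return
print(condition_after_return(True, [0]))    # False: logical function returned false
```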

I thought it would be possible to mix logical and args. Ok then.

x87 commented

Proposal updated

How about allowing both logical and return values in a function declaration, like:
function foo() : logical, int, int
Then each return in the function would be translated to cleo_return_with. Using an expression in the return line would not be possible in that scenario.

The logical keyword would be allowed only as the first argument (though technically it wouldn't be a problem to allow it in any position).

x87 commented

future extension: variable arguments

function foo(a: int, b: int, ...args: int[])
// args is defined here as an int array with size of 30 (because other 2 available slots occupied by a and b)
end


foo(1, 2) // args can be empty
foo(5, 6, 1, 2, 3)

Nice, but how do you know the number of provided args inside the function?

x87 commented

Nice, but how do you know the number of provided args inside the function?

up to you.
you can pass arg count as a separate argument, or check for a zero value
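
A Python sketch of the "check for a zero value" option, with its main caveat. The assumption that unused slots arrive zero-filled is mine and is not stated in the RFC; `pack_args` and `count_until_zero` are hypothetical helpers, and 30 is the array size from the example above.

```python
MAX_SLOTS = 30  # size of `args` in the foo(a, b, ...args) example


# Hypothetical model: unused slots are assumed to be zero-filled.
def pack_args(*values):
    if len(values) > MAX_SLOTS:
        raise ValueError("too many arguments")
    return list(values) + [0] * (MAX_SLOTS - len(values))


def count_until_zero(args):
    # Sentinel approach: stop at the first zero slot.
    for i, v in enumerate(args):
        if v == 0:
            return i
    return len(args)


print(count_until_zero(pack_args(1, 2, 3)))   # 3
print(count_until_zero(pack_args()))          # 0
print(count_until_zero(pack_args(1, 0, 3)))   # 1: a real 0 argument cuts the count short
```

The last call shows why passing the count as a separate argument is the safer of the two options when 0 is a legitimate value.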

Hm, information like this could possibly be accessed in the future via the VirtualVariables feature.

x87 commented

future extension: any type

function foo(a: any)
 // a has no type here
 string_format buf "%d" a
 string_format buf "%s" a
 string_format buf "%f" a
end

foo(5)
foo("test")
foo(1.0)

x87 commented

RFC Update:

  • use CLEO_RETURN 0 instead of RETURN for function's end to have better support where CLEO5 is not available

So for logical functions without a return, the last set condition result will be passed through? Seems like a feature.