Modules RFC

Question

x87 opened this issue a year ago · 12 comments

x87 commented a year ago

This is a complex feature that requires joint support from both the CLEO runtime and the compilers.
See also Functions RFC

Idea

Be able to export SCM functions from CLEO scripts and import them in other scripts.

Goals

Have modular code, library code can be updated independently from the main code
third-party utils can be written in edit modes different from the main script or even in different languages
DRY (Don't repeat yourself)
Improve development velocity
help with migrating from legacy modes to the new ones (custom->SBL/SRC)

Design

Export

CLEO scripts could export pure SCM functions. For the purposes of this document, a pure function is one that depends only on its inputs. Functions are exported using the export keyword:

:fun1
...
:fun2
...
:fun3
...

export @fun1
export @fun2
export @fun3

An export line can be anywhere in the script. The label must mark the start of an SCM function. Duplicate entries are allowed:

export @fun1
export @fun1

By default, exported functions are named after the label (e.g. fun1, fun2). This name is important because another script uses it for the import. To give the export a custom name, the as keyword can be used:

export @fun1 as matrix_mult

It should not be possible to name different functions with the same export name:

export @fun1 as matrix_mult
export @fun2 as matrix_mult

If @fun1 and @fun2 mark different locations, it is a hard compilation error. Otherwise import of 'matrix_mult' would be ambiguous.
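
The duplicate and collision rules above can be sketched as a small compiler-side check. This is only an illustration: the helper collect_exports and its input shape (a list of name/offset pairs, one per export line) are hypothetical, and folding names to lower case is an assumption based on the case-insensitive lookup described later.

```python
def collect_exports(statements):
    """Map export names to label offsets, rejecting conflicting names.

    `statements` is a list of (export_name, label_offset) pairs, one per
    `export` line. Both the input shape and this helper are hypothetical.
    """
    exports = {}
    for name, offset in statements:
        key = name.lower()  # assumption: export names compare case-insensitively
        if key in exports and exports[key] != offset:
            # Same export name bound to two different locations: hard error.
            raise ValueError(f"export name '{name}' already refers to another function")
        exports[key] = offset  # an exact duplicate entry is harmless
    return exports
```

Two identical export lines collapse into one entry, while the same name pointing at two different offsets raises an error.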

Import

A script can import functions using the import keyword.

import fun1 from "scripts.s"

cleo_call fun1 3 1 2 3

This is syntactic sugar for:

cleo_call "fun1@scripts.s"  3 1 2 3

@ separates the function name that is invoked and the file name.
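
A minimal sketch of how the combined string could be taken apart again. The helper name split_call_target is hypothetical, and splitting at the first @ is an assumption (it presumes function names never contain @).

```python
def split_call_target(target):
    """Split 'name@file' into (function_name, module_path).

    Assumption: the first '@' is the separator, so a function name
    cannot contain '@' itself.
    """
    name, sep, path = target.partition("@")
    if not sep or not name or not path:
        raise ValueError(f"not a module call target: {target!r}")
    return name, path
```

For the call above, split_call_target("fun1@scripts.s") gives the pair ("fun1", "scripts.s").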

A label used in the import statement can not be used in a script as a regular label and vice versa. Otherwise the call destination is ambiguous.

Note that this is not relevant if the function syntax is implemented (see below)

import fun1 from "scripts.s"

cleo_call fun1 3 1 2 3

:fun1  // error because :fun1 is an external label

:fun1
... 
import fun1 from "scripts.s"  // error because :fun1 is a local label

cleo_call fun1 3 1 2 3

Import can also use the as keyword to avoid name collisions:

import fun1 as matrix_mult from "scripts.s"

Duplicate imports are allowed:

import fun1 from "scripts.s"
import fun1 from "scripts.s"

Duplicate aliases are not allowed:

import fun1 as fun1 from "scripts.s"
import fun2 as fun1 from "scripts.s" // error

Import names are case-insensitive. import FUN1 and import fun1 are equivalent.
An import statement can be used anywhere before cleo_call, preferably at the top of the script.

The path separator can be either \\ or /. The runtime normalizes them.

import fun1 from "folder\\scripts.s"
import fun1 from "folder/scripts.s"

Because a single \ serves as an escape character, double \\ is required.

CLEO Library support

Changes have to be made to cleo_call and cleo_return commands.

If the first argument to cleo_call is a string, CLEO treats it as an import path (NEW).

If a string is given, CLEO resolves the path similarly to 0A92, 0A94. See the post down below for the path resolution algorithm.

Once the path is resolved and the file is found, it gets loaded into the game process and a pointer to it (P) is obtained. If the file is already loaded, P is returned immediately.

Then cleo_call saves all current lvars, gosub stack and base IP (NEW).

Then the base IP of the current script is set to P (NEW).

Then as with any SCM function local variables are reset and input arguments are passed.

Then CLEO finds the export table. The runtime uses jump offsets to find the section with id 01 (see Future extensions) and scans the memory after it. Each "row" is a pair of a null-terminated string and a 32-bit offset. When a match between the requested function name and an exported name is found, current ip of the script is set to the found offset. Search is case-insensitive.

Execution continues inside the loaded file, all offsets work relative to P.

When the script encounters a cleo_return, results are stored in shared ScriptParams.

Then base ip is restored alongside other things (NEW).

Then stored values get copied into the host script variables.
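
The save/restore steps above can be modelled roughly as follows. This is a toy model, not CLEO code: the Script class, the function names, the 32-slot local variable block and clearing the gosub stack for the callee are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field

NUM_LOCALS = 32  # assumption: size of a custom script's local variable block


@dataclass
class Script:
    base_ip: int = 0
    ip: int = 0
    lvars: list = field(default_factory=lambda: [0] * NUM_LOCALS)
    gosub_stack: list = field(default_factory=list)
    frames: list = field(default_factory=list)


def module_call(script, module_base, entry_offset, args):
    """Enter an exported function found at entry_offset inside a module at P."""
    # Save everything the callee may overwrite (lvars, gosub stack, base IP).
    script.frames.append(
        (script.base_ip, script.ip, list(script.lvars), list(script.gosub_stack))
    )
    script.base_ip = module_base  # offsets now resolve relative to P
    script.ip = entry_offset      # offset taken from the export table
    # Reset locals and pass the input arguments, as for any SCM function.
    script.lvars = list(args) + [0] * (NUM_LOCALS - len(args))
    script.gosub_stack = []       # assumption: the callee starts with an empty stack


def module_return(script, results):
    """Leave the function: restore the saved state and hand back the results."""
    script.base_ip, script.ip, script.lvars, script.gosub_stack = script.frames.pop()
    return list(results)  # the caller copies these into host script variables
```

Because the frame is pushed on a list, nested module calls unwind in the reverse order they were entered.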

Compiler support

Exported function should be discoverable by the runtime. When fun@scripts.s gets loaded, there must be a way to find the offset for 'fun' inside the scripts.s.

To achieve this, the compiler constructs an export table. It is trivially located at the start of the script and routed away from the normal execution with a jump instruction. This technique is similar to what a main.scm header uses.

For example, this source code:

:fun
...
:fun2
...

export @fun
export @fun2

is transformed by the compiler into:

0002: @after_table

hex
 "fun" 00 @fun 00 00
 "fun2" 00 @fun2 00 00
end

:after_table

:fun
...
:fun2

See "Compatibility with Function syntax" down below for the structure of the export table. Last two 00 are reserved for input and output.

As with main.scm header, it gets constructed after the full pass on the code, so the compiler knows all functions that need to be exported.
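
A rough sketch of this basic table layout (name, 00, 32-bit offset, two reserved bytes) and of the case-insensitive lookup the runtime performs. The function names are hypothetical, and the little-endian signed encoding of the offset is an assumption; the table here is just the bytes between the jump and :after_table.

```python
import struct


def build_export_table(exports):
    """Serialize (name, offset) pairs as: name, 00, 32-bit offset, 00 00."""
    table = bytearray()
    for name, offset in exports:
        table += name.encode("ascii") + b"\x00"
        table += struct.pack("<i", offset)  # assumption: little-endian, signed
        table += b"\x00\x00"                # reserved for input and output
    return bytes(table)


def find_export(table, wanted):
    """Return the offset stored for `wanted`, or None if it is not exported."""
    pos = 0
    while pos < len(table):
        end = table.index(b"\x00", pos)     # end of the null-terminated name
        name = table[pos:end].decode("ascii")
        (offset,) = struct.unpack_from("<i", table, end + 1)
        if name.lower() == wanted.lower():  # search is case-insensitive
            return offset
        pos = end + 1 + 4 + 2               # skip the offset and reserved bytes
    return None
```

With a table built from fun and fun2, looking up "FUN2" returns the offset stored for fun2, and an unknown name returns None.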

Disassembler support

No extra work is planned. The export table will appear as a regular jump followed by a hex..end block (Extra info is needed). A stretch goal would be to reconstruct the export table in its original form.

Possible limitations

The length of the import argument is limited to 255. Very long file names or function names will result in a compilation error.
There is no limit on the number of functions exported from a single file.

Compatibility with future extensions

Header extension in CLEO scripts could be used to store more than just an export table. Need to provide a clear distinction between different sections. It could be a service byte after the jump instruction:

02 00 01 xx xx xx xx 01

where the last 01 is the marker for an export table.

The runtime logic should then jump +8, not +7.
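
A sketch of how a reader could recognise this 8-byte header. The name read_header is hypothetical, and decoding the four xx bytes as a little-endian signed integer is an assumption; the byte pattern itself is the one shown above.

```python
import struct

JUMP_PREFIX = b"\x02\x00\x01"  # first three bytes of the layout shown above
EXPORT_SECTION = 0x01          # marker for an export table


def read_header(code):
    """Return (section_id, jump_operand, data_start), or None if no header."""
    if len(code) < 8 or code[:3] != JUMP_PREFIX:
        return None
    # Assumption: the four xx bytes are a little-endian signed jump operand.
    (jump_operand,) = struct.unpack_from("<i", code, 3)
    section_id = code[7]       # service byte right after the jump instruction
    return section_id, jump_operand, 8  # section data starts at +8, not +7
```

A caller would compare the returned section id with EXPORT_SECTION before scanning rows from data_start.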

Custom Headers proposal

Stretch goal: https://gist.github.com/x87/5d0bd6bdd0062380628eb35103894e1b

IDE Support (stretch goal)

The IDE should scan the export table and offer a list of available functions for autocomplete. It should also display the function signature if function syntax is implemented.

Answer 1 · 2023-09-14T14:47:29.000Z

Compatibility with Function syntax

#263

Export

export function fun1(x: int, y: int, z: int)
//
end

export function fun2(): int
//
end

Import Opaque Function

import fun1, fun2 from "scripts.s"

/// implicitly declares functions
/// function fun1(...): ...
/// function fun2(...): ...

fun1(1, 2, 3) // transforms into a cleo_call 3 1 2 3
int x = fun2() // transforms into a cleo_call 0 x

Note that combining imports in one statement (import fun1, fun2) is currently not supported. Each import has to be on its own line.

Import with Function Declaration (not supported)

import function fun1(a: int, b: int, c: int) from "scripts.s"

/// explicitly declares a function
/// function fun1(a: int, b: int, c: int)

fun1(1, 2, 3) // transforms into a cleo_call 3 1 2 3

Export table

Export table should store function signatures (number of input and output params and their types). Types are encoded as a single byte using the Sanny Builder types (in decimal):
01 - int
02 - float
03 - string, short string
04 - longstring
20+ - class ids

Each line in the export table contains:

function name, 00, offset, N inputs, input 1 type, input 2 type, ... input N type, N outputs, output 1 type, output 2 type, ... output N type, flags (1 byte), address (4 bytes)

0002: @after_table

hex
 "fun1" 00 @fun1 03 01 01 01 00 00 00000000 // 3 input args: i i i, 0 output args, 00 - flags, 00000000 - address
 "fun2" 00 @fun2 00 01 01 02 AABBCCDD // 0 input args, 1 output arg: i, 02 - flags, AABBCCDD - address
end

:after_table
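
To make the row layout concrete, here is a sketch that decodes one row with a signature, for example for an IDE that wants to show it. The name decode_row is hypothetical; the offset is assumed to be a little-endian signed 32-bit value, and the address is kept as raw bytes because its byte order is not specified here.

```python
import struct


def decode_row(table, pos=0):
    """Decode one export row starting at `pos`; return (row, next_pos)."""
    end = table.index(b"\x00", pos)         # end of the null-terminated name
    name = table[pos:end].decode("ascii")
    pos = end + 1
    (offset,) = struct.unpack_from("<i", table, pos)  # assumption: LE, signed
    pos += 4
    n_in = table[pos]                       # N inputs, then one type byte each
    inputs = list(table[pos + 1:pos + 1 + n_in])
    pos += 1 + n_in
    n_out = table[pos]                      # N outputs, then one type byte each
    outputs = list(table[pos + 1:pos + 1 + n_out])
    pos += 1 + n_out
    flags = table[pos]                      # 1 byte
    address = table[pos + 1:pos + 5]        # 4 bytes, kept raw
    pos += 5
    row = {"name": name, "offset": offset, "inputs": inputs,
           "outputs": outputs, "flags": flags, "address": address}
    return row, pos
```

Calling decode_row repeatedly with the returned position walks the whole table row by row.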

Reserve space for possible extensions?

Answer 2 · 2023-09-14T20:24:51.000Z

Imports should perhaps also support the 'as' feature. It would solve the problem of name collisions between multiple modules and local code, and would also make it possible to imitate namespaces by giving imported functions a prefix.

Answer 3 · 2023-09-15T23:28:21.000Z

I do not like the fact that addressing a module by name depends on the current working directory. As discussed before regarding other topics, the current working directory is a global property shared between scripts.

If I use import @fun1 from "scripts.s" then I expect @fun1 to always lead to the imported module. Currently the intention is to encode the module and export name as a string param for cleo_call.

This will lead to a problem where calling:

0A99: set_current_directory 0
fun1()

0A99: set_current_directory 1
fun1()

will fail in one case, or run a different module if it happens to exist.

I was thinking about it, and a solution might be to include the directory in the path itself, like:
"0:\cleo\script.s"
"1:\MPACK6"
"2:\script.s" // cleo dir?

This would also solve problems in other opcodes receiving file paths if supported everywhere.

Answer 4 · 2023-09-15T23:47:11.000Z

0A99: set_current_directory 0
fun1()

0A99: set_current_directory 1
fun1()

in the compiled code will look like:

0A99: set_current_directory 0
cleo_call "fun1@scripts.s" 

0A99: set_current_directory 1
cleo_call "fun1@scripts.s"

Path resolution only happens during the first call, then CLEO remembers that "scripts.s" is associated with, for example, "D:\Games\SA\CLEO\scripts.s".
The second call does not trigger a new path resolution, and CLEO uses the already loaded module. The second 0A99 plays no role there.
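
The behaviour described here amounts to a cache keyed by the module name as written in the import. A minimal sketch, with a hypothetical ModuleCache class and a caller-supplied resolver standing in for CLEO's real path resolution:

```python
class ModuleCache:
    """Remember which file a module name resolved to on its first use."""

    def __init__(self, resolve_path):
        self._resolve_path = resolve_path  # hypothetical resolver callback
        self._resolved = {}

    def lookup(self, module_name):
        # Resolve only on the first call; later calls reuse the stored path,
        # so a changed working directory no longer matters.
        if module_name not in self._resolved:
            self._resolved[module_name] = self._resolve_path(module_name)
        return self._resolved[module_name]
```

Note that with such a cache two different files that share the same import string cannot both be loaded.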

Answer 5 · 2023-09-15T23:53:41.000Z

So, you can not have two modules named "utils.s" in different locations?

Answer 6 · 2023-09-16T00:38:07.000Z

I think it's more of a runtime problem than a compiler one. In the script, there is only the file name as given in the import statement.

import X from Y // this Y goes as is into all X calls -> cleo_call "X@Y"

What is your proposal on how to resolve Y?

Answer 7 · 2023-09-16T00:46:09.000Z

I gave a solution for the runtime.
The problem is that both the compiler and the running script are meant to locate the same file, but of course the game environment looks different from the development one.

What are the solutions?
Force the module's target location and store it in the module itself, so the compiler can read it?
Create some kind of unique GUID for modules? Hard to do, as these should be unique but still stay the same after altering the module's code.

Or something like this in Sanny:

{$MODULE mod_scripts="include\scripts.s", "0:\cleo\scripts.s"}

import @fun1 from mod_scripts
import @fun2 from mod_scripts

Answer 8 · 2023-09-16T02:29:51.000Z

We can't enforce module path as it limits the usage. You should be able to use a module from any place where your script is located.

If there is a file called utils.s, you can copy it to the CLEO folder and import it in any CS file using

import X from 'utils.s'

or copy to Documents\GTA San Andreas User Files\MPACK5 and using the same statement import module functions in scr.scm.

Also modules can import other modules, which means those files should be located in the same place.

With that being said, path resolution could work like this:

if the path is absolute, it gets resolved as is.

import X from "D:\Games\SA\utils.s"

This probably should be forbidden.

if the path is relative, it gets resolved relative to the current script's file (regardless of cwd).

Answer 9 · 2023-09-16T02:55:17.000Z

Path Resolution

When we encounter a 0AB1 with a string argument (module call), we need to determine the script directory. We can't rely on cwd as it could change in runtime.

If current script is not custom
- if this is not a mission pack
  - the directory is "game\data\scripts"
- if this is a mission pack
  - the directory is "Documents\GTA San Andreas User Files\MPACKx" where x is the mission pack id
If current script is custom
- the directory is the script's directory (see below for the meaning)

For custom scripts, the directory is stored on the CCustomScript struct and is changed in two cases:

on initial load, based on szFileName
during 0AB1, set to the module directory.

To illustrate, consider this example.

File my.cs is located in D:\Games\SA\CLEO. When CLEO loads this file, CCustomScript constructor sets the script's baseDir to D:\Games\SA\CLEO
Then there is a command cleo_call "fun1@utils\a\u.s".
CLEO loads module utils\a\u.s relative to the baseDir from the file D:\Games\SA\CLEO\utils\a\u.s.
Then CLEO stores the current baseDir on the new ScmFunction struct.
Then CLEO sets the current script's baseDir to D:\Games\SA\CLEO\utils\a
Then it runs the function.

Imagine there is another import inside u.s file, e.g. import from "extra\f.s". This import gets resolved relative to the current baseDir which is D:\Games\SA\CLEO\utils\a.
CLEO loads new module from file D:\Games\SA\CLEO\utils\a\extra\f.s.
Then CLEO stores the current baseDir on the new ScmFunction struct.
Then CLEO sets the current script's baseDir to D:\Games\SA\CLEO\utils\a\extra
Then it runs the function.

Function returns and the baseDir is restored to D:\Games\SA\CLEO\utils\a
Function returns and the baseDir is restored to D:\Games\SA\CLEO

We are now back in my.cs and code flow continues.

For SCM files (main.scm or mission packs) the algorithm is the same.
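
The walkthrough above can be sketched with a small stack of base directories. ScriptContext and its methods are hypothetical names, and Python's ntpath module is used only to get Windows path semantics on any platform; it is not how CLEO itself does it.

```python
import ntpath  # Windows path rules, usable on any platform


class ScriptContext:
    """Track the current baseDir across nested module calls."""

    def __init__(self, script_file):
        self.base_dir = ntpath.dirname(script_file)  # set on initial load
        self.saved_dirs = []                         # one entry per active call

    def enter_module(self, module_path):
        """Resolve module_path against baseDir and make its folder current."""
        relative = module_path.replace("/", "\\")    # path normalization
        full = ntpath.normpath(ntpath.join(self.base_dir, relative))
        self.saved_dirs.append(self.base_dir)        # kept for the return
        self.base_dir = ntpath.dirname(full)
        return full

    def leave_module(self):
        """Restore the baseDir that was current before the call."""
        self.base_dir = self.saved_dirs.pop()
```

Starting from D:\Games\SA\CLEO\my.cs, entering utils\a\u.s and then extra\f.s yields the two file paths from the example, and each return restores the previous baseDir.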

Path Normalization

Because paths can use both \ and / as separators, the runtime replaces all / with \.

Answer 10 · 2023-09-23T18:33:36.000Z

CLEO Implementation cleolibrary/CLEO4#101

Answer 11 · 2024-04-08T20:39:52.000Z

@MiranDMC We need an update in CLEO5. The export table is now 5 bytes longer for each function. I added a flags byte and 4 bytes for the address (used for static foreign functions). None of those are relevant to CLEO, so you just need to skip the extra 5 bytes.

cleolibrary/CLEO5#121

Answer 12 · 2024-07-27T17:11:28.000Z

Added Sanny Builder documentation here https://docs.sannybuilder.com/language/import-export