sannybuilder/dev

Local Variable Space Extension & Virtual Variables RFC

x87 opened this issue · 20 comments

x87 commented

Goal

  • Increase variable limit in CLEO scripts for advanced scripting
  • Get easy access to some data that is normally available via Global variables or Memory addresses

Scope

  • Initial phase: GTA SA.
    • Eventually can be ported to III/VC
  • Does not affect functionality of main.scm scripts and missions

Glossary

  • LVI - local variable index. Index 0 through 31 in the static array pre-allocated for each script. 32 and 33 are built-in timers.

Design

  • Allow LVI beyond 33@
  • Make LVI a signed 16-bit integer (-32,768 to +32,767)
  • LVI 0-33 act like normal variables and store their data in the script struct
  • LVI 34-32,767 store their values in the static buffer allocated by the compiler in a custom CLEO header
    • compiler identifies the highest LVI and allocates just enough space for variables in the current script (similar to global varspace in main.scm)
  • Negative LVI are virtual variables, they point to a specific memory location
    • -1 is a pointer to ONMISSION variable
    • -2 is a pointer to current script struct
    • more virtual variables can be added if needed
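
A minimal sketch of the index routing described in the Design list, written in Python only to make the three ranges concrete. The class, its fields and the `game` dictionary are hypothetical stand-ins, not CLEO's actual structures, and ignoring writes to -2@ is an assumption of this sketch, not something the list above specifies.

```python
LOCALS_IN_STRUCT = 34                 # 0@..31@ plus timers 32@ and 33@
ONMISSION, SCRIPT_STRUCT = -1, -2     # virtual indices listed above


class Script:
    """Toy model of one CLEO script (hypothetical names, not CLEO's API)."""

    def __init__(self, highest_lvi, struct_address, game):
        self.struct_locals = [0] * LOCALS_IN_STRUCT
        # Extended varspace: just enough slots for the highest LVI used.
        self.extended = [0] * max(0, highest_lvi - LOCALS_IN_STRUCT + 1)
        self.struct_address = struct_address
        self.game = game                # dict standing in for game memory

    def read(self, lvi):
        if lvi == ONMISSION:
            return self.game["onmission"]
        if lvi == SCRIPT_STRUCT:
            return self.struct_address
        if lvi < 0:
            raise IndexError(f"unknown virtual variable {lvi}@")
        if lvi < LOCALS_IN_STRUCT:
            return self.struct_locals[lvi]
        return self.extended[lvi - LOCALS_IN_STRUCT]

    def write(self, lvi, value):
        if lvi == ONMISSION:
            self.game["onmission"] = value
        elif lvi == SCRIPT_STRUCT:
            pass                        # assumption: writes are ignored
        elif lvi < 0:
            raise IndexError(f"unknown virtual variable {lvi}@")
        elif lvi < LOCALS_IN_STRUCT:
            self.struct_locals[lvi] = value
        else:
            self.extended[lvi - LOCALS_IN_STRUCT] = value
```

Two `Script` objects built with the same `game` dictionary see the same -1@ value, while their 34+ slots stay private to each script.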

Runtime support

CLEO should provide new implementation for

  • CRunningScript::CollectParameters
  • CRunningScript::StoreParameters
  • CRunningScript::GetPointerToLocalVariable
  • CRunningScript::GetPointerToLocalArrayElement

Compiler support

  • Compiler should preallocate local varspace in the custom header
  • Range checks should be lifted
  • Virtual variables should be defined as constants (e.g. const ONMISSION = -1@)
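
As a rough illustration of the preallocation step, the compiler could scan a script for the highest LVI and reserve only the slots above 33@. The sketch below is an assumption-laden toy: the token pattern, the 4-byte slot size and the function name are made up for illustration, and it does not account for arrays that extend past the highest named index.

```python
import re

LOCALS_IN_STRUCT = 34   # 0@..33@ live in the script struct
SLOT_SIZE = 4           # assumption: every extended variable takes 4 bytes

# Matches tokens such as 12@ or 500@; a leading minus marks a virtual variable.
LVI_TOKEN = re.compile(r"(?<![\w$])(-?\d+)@")


def extended_varspace_bytes(source):
    """Return the bytes to reserve for LVI 34 and above in one script."""
    highest = -1
    for line in source.splitlines():
        code = line.split("//", 1)[0]          # ignore trailing comments
        for match in LVI_TOKEN.finditer(code):
            index = int(match.group(1))
            if index > 32767:
                raise ValueError(f"{index}@ does not fit a signed 16-bit LVI")
            highest = max(highest, index)
    extra_slots = max(0, highest - LOCALS_IN_STRUCT + 1)
    return extra_slots * SLOT_SIZE
```

A script that only touches 0@..33@ and virtual variables would get no extra space under this scheme, so old scripts keep their current footprint.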

Backward compatibility

  • Old scripts using standard LVI will produce expected results with the new version of CLEO.
  • Scripts using non-standard LVI will not produce expected results with older versions of CLEO 4.

A mockup of constants.txt that could support common global variables. Player_Actor in particular adds a lot of overhead for many scripts.

const
false=0
true=1
TIMERA=32@
TIMERB=33@
ONMISSION= -1@
SCRIPT_STRUCT= -2@
PLAYER_CHAR=0
PLAYER_ACTOR= -3@
PLAYER_GROUP= -4@
end
var
TIMERA: int
TIMERB: int
ONMISSION: int
SCRIPT_STRUCT: int
end

The following global variables are maintained at the start of each MAIN loop. They can be considered as always available, up to date within 250ms, and read only.

077E: get_active_interior_to $Active_Interior 
0652: $STAT_Unlocked_Cities_Number = integer_stat 181 
07D0: $Weekday = weekday 
09FB: $Current_Language = current_language 
0842: $Current_Town_Number = player $PLAYER_CHAR town_number 

Are all these new (positive indexed) variables considered as local scoped? I think there is need for 'global' variables as people keep using real globals despite warned many times to not.
Maybe, similarly to missions, the last local variable index should be @64, and then everything above will be this script's 'globals'.

What is the motivation for the "virtual variables" concept? Will these variables be writeable? I see no advantage in having SCRIPT_STRUCT, as in most cases you need to add an offset to the address before using it. PLAYER_ACTOR and PLAYER_GROUP are just some arbitrary variables set by main. Is there any benefit in hard-coding these addresses in CLEO instead of in a Sanny mode?

How will it be implemented on the technical side? The new variables do not follow the same address calculation method as regular locals in the script struct. Do all opcodes have to be hooked, or is there a way to somehow just hack the "read param" function?

x87 commented

Are all these new (positive indexed) variables considered as local scoped?

I think variables in the 34+ range are global scoped, i.e. they are the same in all functions. Think of it as extra storage. Each function owns 32 variables, which should be enough for intermediate calculations.

Note that this extra space is exclusive to the current script and not shared between scripts.

What is the motivation for the "virtual variables" concept? Will these variables be writeable?

They are fully functional variables. E.g. this is possible

if -1@ == 0 // if not ONMISSION
then
  -1@ = 1 // set ONMISSION
end

this is to avoid extra memory reads/writes. You can use them in arrays too:

0@ = 5
int baseIp = -2@(0@,1i)  // read baseIp of current script

Maybe, similarly to missions, the last local variable index should be @64, and then everything above will be this script's 'globals'.

missions have 1024 variables.

is there a way to somehow just hack the "read param" function?

yes, this is what is needed. I mentioned it under Runtime support.

Good. cleo_calls provide local scope, and @34+ will then provide the missing 'global' scope for the current script.
I think I saw opcodes for 'cleo shared' values somewhere. Anyway, you can always pass a memory pointer to the current globals as a parameter to a spawned child script.

So, some of the virtual values will be writeable and some not, as modifying the current script's struct ptr does not make sense if the change will be permanent.
Array approach will only work when offset is 4 aligned. Can syntax with '&' solve problems in other cases?
These virtual variables should also be defined in Sanny's modes, as 2@ will be confusing to use.

x87 commented

These virtual variables should also be defined in Sanny's modes, as 2@ will be confusing to use.

Definitely. TIMERA and TIMERB work just like that. See @OrionSR 's example with constants.txt

Array approach will only work when offset is 4 aligned. Can syntax with '&' solve problems in other cases?

this was just a general example, we can see how to improve it. Maybe we can extend array support to allow for 1 and 2 byte values:

0@ = 5
-2@(0@,1i) // 5*4=20
0@ = 10
-2@(0@,1w) // 10*2=20
0@ = 20
-2@(0@,1b) // 20*1=20
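
The arithmetic in the comments above (index times element size) can be sketched in Python as follows. The `w` and `b` suffixes are only proposed in this thread, and the signed little-endian layout, the table and the function names are assumptions made for illustration.

```python
import struct

# Suffix -> (element size in bytes, struct format). 'i' is the existing
# 4-byte element; 'w' and 'b' are the proposed 2-byte and 1-byte elements.
ELEMENT_TYPES = {"i": (4, "<i"), "w": (2, "<h"), "b": (1, "<b")}


def element_offset(index, suffix):
    """Byte offset of element `index` for the given element suffix."""
    size, _ = ELEMENT_TYPES[suffix]
    return index * size


def read_element(memory, base, index, suffix):
    """Read one element from a bytes-like `memory` starting at `base`."""
    _, fmt = ELEMENT_TYPES[suffix]
    offset = base + element_offset(index, suffix)
    return struct.unpack_from(fmt, memory, offset)[0]
```

With this scheme all three snippets address the same byte offset 20, but each reads a different number of bytes from it.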

That would be a useful feature. BTW, I think array definitions should allow omitting the array size simply by leaving out the number, like 2@(0@,i)

x87 commented

They can be considered as always available, up to date within 250ms, and read only.

077E: get_active_interior_to $Active_Interior
0652: $STAT_Unlocked_Cities_Number = integer_stat 181
07D0: $Weekday = weekday
09FB: $Current_Language = current_language
0842: $Current_Town_Number = player $PLAYER_CHAR town_number

Given that we have room for 32000 virtual variables, I can't see a reason why we should be shy about replacing some getter opcodes with direct memory access. Then instead of

07D0: 0@ = weekday
if 0@ > 3

you can do

if WEEKDAY > 3

where WEEKDAY is a virtual variable pointing to a memory address of a week day.

x87 commented

modifying the current script's struct ptr does not make sense if the change will be permanent.

good point 👍

Struct pointers and other addresses are very useful for commands like:

0D37=WRITE_STRUCT_PARAM
0D38=READ_STRUCT_PARAM
0D4E=READ_STRUCT_OFFSET
0E28=WRITE_STRUCT_OFFSET
0EE2=READ_STRUCT_OFFSET_MULTI
0EE3=WRITE_STRUCT_OFFSET_MULTI

0E72=CREATE_LIST

The new List class of commands looks very useful. I expect to be able to share the handle/address of the lists between scripts.

It sure seems like some of what I had in mind for a virtual variable ought to be considered a virtual constant instead. Something like an address returned from a pointer would be a variable in cleo, but the script would need to treat it as a constant. SB may want to highlight it as such.

const playerX = -20@
Would playerX[1] be -19@ or playerY?

If playerY, then how could I assign a virtual variable to a label of a hex buffer, or a memory allocation address? [rhet]

And in general, how are the virtual variables assigned, or given an address? Can they be assigned in the script? Could the assignment step through a pointer to the actual address?

x87 commented

And in general, how are the virtual variables assigned, or given an address?

they just exist in a script, like TIMERA and TIMERB. Runtime takes care of providing a valid value.

Think of virtual variables as a dynamic memory address. You don't need to know where main script keeps the ONMISSION status, you just use -1@ which points to the same address as the $ONMISSION variable.

-1@ = 1 // same as $ONMISSION = 1
-1@ = 0 // same as $ONMISSION = 0
-1@ <> 0 // same as $ONMISSION <> 0

Take -2@ as another example. It gives you the address of the current script. You can use it instead of 0A9F: 0@ = get_this_script_struct. As was mentioned, this value is read-only, so -2@ = 0 should be a NOP.

0@ = -2@ // same as 0A9F: 0@ = get_this_script_struct
-2@ = 0 // NOP? or error?

Since each virtual variable represents a unique address, indexing them can be problematic, so operations like -2@[0] or -2@(1@,1i) should be forbidden.

I made an example above with accessing current script struct fields using array notation. Now I think it could work differently:

wrong:

0@ = 5
int baseIp = -2@(0@,1i)  // read baseIp of current script

right:

int address = -2@ + 20
0A8D: baseIp = read_memory address size 4 virtual_protect 0  // read baseIp of current script

It really seems like we're trying to discuss two features as one: virtual constants and virtual variables. (cleo constants?)

if -1@ == 0 // always false, -1@ = 409*4+SCMoffset
then
  -1@ = 1 // now forbidden
end

There is a great deal of potential value to this proposal. I keep wrestling with different ideas but can't quite pin down a complete concept. I think it'll help organize my thoughts if I try to write things down, so forgive me for going off scope.

Constants - In order to accomplish the stated goals, cleo is going to need a lot of hardcoded addresses - straight up constants that can never change or nothing is going to work. A specific example is pointers to structs. One thing I appreciate about fastman92's codes is that he always reads the pointer, because struct addresses tend to change when limits are adjusted. It would be very helpful, especially for mobile players, if these constants were available in the form of the mangled labels used in IDA. No magic is required. There are lots of ways to implement constants. The constants would compile as numerical values.

Global Virtual Constants - Virtual, because SB is going to compile a negative variable instead of a numerical value, but otherwise these would act like constants. There are no commands for altering constants or comparing constants to constants. The scope is global as all scripts have access to identical information. The data passed to commands, functions and scripts would be in the form of addresses.

Local Virtual Constants - Similar to the GVCs, but the information is specific to the local script. I would expect only a handful of variables would be needed for this purpose.

Local Virtual Variables - Like LVCs, limited in scope to the local script. But where an LVC would pass the address of the start of the local struct, an LVV would pass the contents of the first field, pNext, which is conveniently 4 bytes in size. The size of the fields may complicate the use of some structs, but in some cases, array indexes make a lot of sense for virtual variables. It would be particularly amazing if cleo could index word, byte and bit arrays using virtual variables.

Global Virtual Variables - Similar to LVVs but any changes have a global scope. OnMission is a good example. All scripts are working with the same data.

User Defined Virtual Arrays - Extending the power of direct use and comparison of local variables to allocated memory and hex buffers. A pie in the sky idea - who needs hex buffers and allocated memory with all these extended local vars to work with. Still, it might be a good idea to reserve a block of vars for users and worry about implementation later.

Virtual Functions - My mind keeps wanting to connect virtual variables to functions - functions that can be passed to commands, scripts and other functions, and compared directly with other values. I suppose performing cleo functions is pretty much exactly what's going on with the virtual values - so look to examples of functions in other tools for inspiration.

As far as I know, static structs don't have pointers, and dynamic structs pick their offsets when the game loads. An interesting exception is... CTheText(?), which will change locations when the language is changed - one example of a virtual global constant that isn't so constant.

There. I think that summarizes all the thoughts that have been ricocheting around inside my skull all day.

Introducing any kind of global variables leads to exactly the same problem we have with the current global variables. You never know who is going to use them.

The term Global Virtual Variable was intended to describe one of the possible uses of a virtual lvar - the negatives. Global describes the scope, not the type of variable.

But yes, all scripts would alter the same data for something like... HUD colors, for example. What would be bad would be if someone tried to use virtual lvars as general use script lvars, but there doesn't appear to be a good way to keep people from being stupid. If you've got 32000 extended lvars to work with, why would you be messing with virtual lvars?

An odd thought: const nul = -13@ Where nul is always 0, no matter what you write into it.
The idea is inspired by 2D commands - read 3D coords and discard the Z.
nul could be used for the Z return var.
nul would be more useful when varspace is limited, but doesn't seem to help in the present context.
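
A tiny sketch of how such a nul slot might behave, assuming it simply discards writes and always reads back 0. The classes and the `store_coords` helper are hypothetical and only stand in for a command that returns three values.

```python
class Var:
    """Ordinary variable slot: remembers the last value written."""

    def __init__(self):
        self.value = 0

    def read(self):
        return self.value

    def write(self, value):
        self.value = value


class NulSink:
    """Hypothetical 'nul' slot: reads are always 0, writes are dropped."""

    def read(self):
        return 0

    def write(self, value):
        pass  # discard whatever the command returned


def store_coords(coords, x_var, y_var, z_var):
    """Stand-in for a command that stores X, Y and Z into three variables."""
    for var, value in zip((x_var, y_var, z_var), coords):
        var.write(value)
```

Passing a `NulSink` as the Z target keeps X and Y and throws the third value away without spending a real variable on it.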

x87 commented

Getting Game platform/Game Version is interesting too

It quickly turns into "return whatever". My concern is how these variables will be named. Will there be aliases by default, so you can use on_mission instead of '-1@'? We need to think about name collisions, both with cleo's own names and with users' named entities. Maybe a prefix should be added, like GVV_Game_Title_Type?
All these variables should be documented somewhere, along with meaning and possible return values.

It might be helpful to include as much class/struct and field information as possible. For example:
CTheScripts::OnAMissionFlag ==> CTheScripts__OnAMissionFlag

Users can always create shortcuts for the variables they want to use often in a script.
const OnAMission= CTheScripts__OnAMissionFlag

It might be helpful to include as much class/struct and field information as possible. For example: CTheScripts::OnAMissionFlag ==> CTheScripts__OnAMissionFlag

That seems like unnecessarily exposing behind-the-scenes C++ implementation details to users who do not need them and might not understand them.

I think some variables should have global aliases, like nul and on_mission, and the others could maybe be hosted by a static class?
GVV.VarName
This way they will be enclosed in 'namespace' and additionally autocomplete will show available options.
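
One way to picture the namespace idea is a lookup that treats `GVV.`-prefixed names separately from a short list of global aliases and from the user's own constants. Everything below is hypothetical: the registry contents, the function names and the collision rule are illustrative choices, not an agreed design.

```python
# Hypothetical registry: the names are placeholders for illustration only.
GVV = {"OnMission": -1, "ScriptStruct": -2}
GLOBAL_ALIASES = {"on_mission": -1}


def resolve(name, user_constants):
    """Map an identifier to an LVI; raise on a collision or unknown name."""
    if name.startswith("GVV."):
        return GVV[name[len("GVV."):]]
    if name in GLOBAL_ALIASES and name in user_constants:
        raise NameError(f"'{name}' collides with a built-in alias")
    if name in GLOBAL_ALIASES:
        return GLOBAL_ALIASES[name]
    return user_constants[name]


def completions(prefix):
    """Names an editor could offer after the user types 'GVV.' plus prefix."""
    return sorted(n for n in GVV if n.startswith(prefix))
```

Because only the few unprefixed aliases can clash with user-defined names, the collision check stays small, while everything else lives behind the `GVV.` prefix.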