fuzzball-muck/fuzzball

Improve abort_interp macro to avoid usage of globals

tanabi opened this issue · 4 comments

abort_interp uses the global variables oper1, oper2, oper3, and oper4 which are defined in the header of each primitive file (p_*.c). abort_interp passes these variables to do_abort_interp which is an actual function that, in turn, may use those variables in messaging and similar purposes.

This is completely not threadsafe and generally just not good modern coding practice. Wyld has put effort into removing the usage of other globals in the p_*.c files but this will be a more concentrated effort as it will probably require a lot of rewrites. There's a lot of macro infrastructure built around those oper variables, used all over the place; how this functions will need to be re-imagined.

This kind of strikes me as a use-case for C++. If each MUF prim was an implementation of a base class, the operation variables could be base-class properties, and abort_interp could be a base class method. Really, the MUF implementation gets much cleaner in general as a class-based structure and it promotes a lot of re-use. It would basically remove the need for any of those macros.

Agree with the C++. I was also thinking we could look into doing things like CHECKOP, CHECKOFLOW, and a lot of the base sanity checks before the prim_* functions are even called. This would require a structure like mfn_list to capture argument information.

Possibly best spun off as another Issue, but I thought I'd mention it.

It sounds like it may be time to consider a C++ restructure of MUF prims, which could include the pipeline for base sanity checks. The sanity checks might be part of a base-class heirarchy or something akin to that -- an abstract method for "check_operands" or some such which can be implemented by other base classes that group prims with similar checks.

There's a lot that can be done there, and a lot of ways to do it :)

I spent some time this morning thinking about MUF in a C++ world. I think it would involve the following basic classes:

  • MUFException - for bubbling up MUF errors.
  • MUFPrimitive - the base class for MUFs.

MUFPrimitive would have one abstract method that must be implemented by its children:

  • primitive - this is the actual call that runs the MUF. Stack, etc, would be passed into it. It can throw MUFException

MUFPrimitive would have the following concrete methods:

  • call -- this calls MUF primitive and wraps the abstract method 'primitive' with a try/catch block to catch MUFException and handle it correctly. This is what the MUF interpreter calls to execute the primitive
  • push/pop/etc. -- all those MUF stack macros would be concrete methods
  • basic parameter/type checking methods

MUFPrimitive would then have some subclasses for common cases. I'm thinking like: MUFPrimitiveOneOper, MUFPrimitiveTwoOper, etc. Maybe for some common cases, we'll have type checking too; MUFPrimitiveOneOperInt, etc. Perhaps we could even use a template ... MUFPrimitiveTwoOper<MUFInt, MUFStr> .... though I'll admit my template skillsz are super rusty and I know there's caveats as to how those work.

Also I don't want to get too crazy with the classes as a first stab at C++, so making classes for all the MUF types (which I believe would be required for remplates) may be too much. I dunno! We just have to do what makes sense.

....

So now we have all these classes, what do we do with them? I'm thinking, the MUF primitives are in a giant array right now. Let's keep the array, but now on bootup, let's iterate over the array and initialize all the primitives. That means, we'll have one instance of each primitive's class (essentially making each primitive a singleton) and when we want to run the primitive we'll run it's 'call' method.

The downside to this is that member properites on each instance will need to be threadsafe or avoided. I'm tending to think avoided as there may not be much need for them with this code restructure.

In the interpreter loop this would look something like:

while ( pc iterates over program bytecode) {
    ...
    result = prim_func[pc->data.number-1]->call(player, program, mlev, pc, arg, &tmp, fr);
    ....
}

A couple alternatives I have thought of:

ALTERNATIVE 1: Instance each primitive before using it.

In the interpreter loop this would look something like:

while ( pc iterates over program bytecode) {
    ...
    prim = new prim_func[pc->data.number-1](player, program, mlev, pc, arg, &tmp, fr);
    result = prim->call();
    delete prim;
    ....
}

This makes each primitive legitimately its own separate thing which makes for great isolation, but we're going to be freeing/deleting constantly which isn't great for performance.

ALTERNATIVE 2: INSTANCE PER PRIMITIVE PER MUF CALL

Okay ... so this is a hybrid fo my original idea and alternative 1. The interp_loop will keep a 'cache' of instanced primitives that starts off empty and is cleared when interp_loop exits. Each time we encounter a primitive, we instance it once, and add it to the cache. Which means, for a given program, we will have one instance of each primitive in the program but programs won't share.

This makes for some interesting things, such as the ability for properties on the primitive class to be assured to persist through the run of the program. If a primitive, for example, has a different behavior based on what has happened in that primitive's past, we could do that. The only case I can think of, off the top of my head, is seeded random number generation where past results can influence future ones and differnet programs could then have different seeds.

However, this is complicated to program. It would look something like this:

local_cache = {};  // some hash of prim "ID" to prim instances

while ( pc iterates over program bytecode) {
    ...
    if (!local_cache->has(pc->data.number-1)  {
       local_cache->add(new prim_func[pc->data.number-1]());
    }

    local_cache->get(pc->data.number-1)->call(player, program, mlev, pc, arg, &tmp, fr);
    ....
}

delete local_cache and its members;

Because there's a finite and known number of MUF prims, local_cache could simply be a null-iniitialized array that is sizeof(prim_func) so that ther's no need for a hash. This is slightly memory wasteful as we have, I think, about 400 primitives ... but it would be reasonably efficient performance wise wand I think, for a large program, a hash or dictionary would be way more memory churn and inefficiency than is necessary.

As I type this up, I'm kind of liking ALTERNATIVE 2 a bit ... but anyway, those are my (verbose) thoughts on it.

Good stuff here! I'm leaning toward option 2 as well, and I actually came to the same conclusion about arrays and performance when ruminating on all the flag functionality we manage.

The template idea is intriguing. I'd like to see if that will work, because it seems like it would simplify a lot. I was originally thinking we could abstract out the CHECKARGS logic with some sort of primitive configuration data, and I think those ideas could work together well.

Thanks for typing up your thoughts! Let me know how you'd like to start testing this.