ZornsLemma/lib6502-jit

Partial Execution

edescourtis opened this issue · 3 comments

Hi, I would like to use lib6502-jit in jit mode but I want to regain control of the execution thread so I can multiplex many 6502's on the same thread. How can I accomplish this?

Hi,

Unfortunately I don't think there is currently a way to do this - the original lib6502 has this mentioned in the BUGS section of the manpage and lib6502-jit inherits this restriction.

However, if you're willing to do some hacking you can probably make this work. Disclaimer: I haven't tried any of the things I'm about to suggest! I'm trying to be helpful, but I'm not claiming this is going to be easy! :-)

If the code running on the emulated 6502s is under your control to some extent, you could put some illegal instructions (e.g. opcode 0x03, which is a 1 byte NOP on the 65C02 we are emulating) at strategic places. You can then register an illegal instruction callback on opcode 0x03 with something like:

M6502_setCallback(mpu, illegal_instruction, 0x03, my_callback);

my_callback() will then get called when that instruction is executed and you can use that as an opportunity to switch to a different emulated 6502.

If the code isn't under your control, a really hacky approach which I think will work without you needing to modify lib6502-jit itself is to register a call callback on every address in the 64K address space. Provided the 6502 code doesn't spin forever inside a loop using a relative branch instruction (these don't trigger call callbacks) with no JMP or JSR inside it, your callback (which can obviously be the same function for every address) will tend to be called relatively frequently. You can then do your multiplexing inside the callback.
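A minimal sketch of the "same callback on every address" idea, for illustration only. It does not use lib6502-jit: `Machine`, `CallCallback`, `install_everywhere` and `simulate_jump` are hypothetical stand-ins that only mimic the shape of a per-address callback table. With the real library the loop body would presumably be an `M6502_setCallback()` call with the `call` callback type, by analogy with the illegal instruction example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDRESS_SPACE 0x10000u  /* 64K addresses, 0x0000..0xFFFF */

/* Hypothetical stand-in for an emulated CPU; NOT the lib6502 M6502 type. */
typedef struct Machine Machine;
typedef int (*CallCallback)(Machine *m, uint16_t address);

struct Machine {
    CallCallback on_call[ADDRESS_SPACE]; /* one slot per target address */
    unsigned long yields;                /* how often the callback fired */
};

/* The single callback shared by every address: a place to multiplex. */
static int yield_callback(Machine *m, uint16_t address)
{
    (void)address;
    m->yields++;
    return 0;
}

/* Install the same callback on all 65536 addresses. */
static void install_everywhere(Machine *m, CallCallback cb)
{
    assert(m != NULL);
    /* A 32-bit counter is used on purpose: a uint16_t counter would wrap
       to 0 after 0xFFFF and the loop would never terminate. */
    for (uint32_t addr = 0; addr < ADDRESS_SPACE; addr++) {
        m->on_call[addr] = cb;
    }
}

/* Stand-in for the emulator executing a JMP or JSR to `target`. */
static void simulate_jump(Machine *m, uint16_t target)
{
    if (m->on_call[target] != NULL) {
        m->on_call[target](m, target);
    }
}
```

Because the table has one slot per address, any jump or subroutine call reaches the shared callback, whatever its target. A loop built only from relative branches never does, which is the limitation described above.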

An alternative possibility would be for you to hack lib6502-jit to treat some valid opcodes as an illegal opcode as far as callbacks are concerned. Pull out the last few lines of FunctionBuilder::illegal_instruction() into a new function:

void FunctionBuilder::handle_illegal_instruction_callback(uint16_t &ct_pc)
{
    uint16_t opcode_at = ct_pc;
    uint8_t opcode = ct_memory_[opcode_at];

    if (callbacks_.illegal_instruction[opcode] != 0)
    {
        return_illegal_instruction(ct_pc, opcode_at, opcode);
    }
}

Add a call to handle_illegal_instruction_callback() at the start of the opcode handling blocks in FunctionBuilder::build_at() for whichever opcodes take your fancy - I'd suggest all the branch and jump ones would be a good set of candidates, since 6502 code can't execute for very many cycles without hitting one of these. You can then register an illegal instruction callback on each of those opcodes and control will transfer to your callback periodically. (Edited to add: This may not work. I suspect the actual work of the opcodes to which you add a call to handle_illegal_instruction_callback() wouldn't get performed. You might be able to fix this by judicious placement of the call to handle_illegal_instruction_callback(), but the chances are it could get fiddly.)

Whatever method you use to get control to reach a callback periodically, you may want to longjmp() out of the callback to the function which called M6502_run() - otherwise I think you're at risk of the call stack getting arbitrarily deep. You can see an example of this in test/setjmp-trick.c.
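A minimal, self-contained sketch of the longjmp() pattern, not a copy of test/setjmp-trick.c. `Cpu`, `run_forever`, `yield_callback` and `schedule` are hypothetical names: `run_forever` stands in for M6502_run() and `yield_callback` for whatever callback fires periodically. Whether longjmp() out of JIT-compiled code behaves well in every mode is something to check against the real test.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for one emulated CPU; NOT the lib6502 M6502 type. */
typedef struct {
    unsigned long steps;   /* work done so far, preserved across yields */
} Cpu;

static jmp_buf scheduler_env;   /* where the scheduler waits for control */

/* Plays the role of the periodic callback: jump straight back to the
   scheduler instead of returning into the emulator. */
static void yield_callback(void)
{
    longjmp(scheduler_env, 1);
}

/* Stand-in for M6502_run(): never returns on its own. */
static void run_forever(Cpu *cpu)
{
    for (;;) {
        cpu->steps++;
        if (cpu->steps % 100 == 0) {
            yield_callback();
        }
    }
}

/* Give each CPU `slices` time slices, round-robin, on one thread. */
static void schedule(Cpu *cpus, size_t count, int slices)
{
    /* volatile is defensive here: locals modified between setjmp() and
       longjmp() would otherwise have indeterminate values. */
    volatile size_t current = 0;
    volatile int remaining = slices * (int)count;

    assert(cpus != NULL && count > 0);
    while (remaining > 0) {
        if (setjmp(scheduler_env) == 0) {
            run_forever(&cpus[current]);   /* leaves only via longjmp() */
        }
        /* Reached after a yield: the stack is back at scheduler depth. */
        current = (current + 1) % count;
        remaining--;
    }
}
```

The point of the pattern is that every yield unwinds to the same stack depth, so switching between CPUs any number of times never nests one run inside another.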

You will probably get away with using the default 'hybrid' mode (where JIT compilation is done on a separate thread) but it may be safer to start out using M6502_ModeCompiled to reduce the number of threads in play. I'd hope the code is all thread safe since it tends to work on an M6502 object rather than having global state, but it hasn't been tested in this way and you may run into problems. (I don't know if LLVM itself is thread-safe; it may well be, of course.)

I think the "right" fix for this would be something like:

  • add a 'cycles executed' pseudo-register in Registers.h
  • generate code to increment this by a (possibly approximate) number of cycles as part of every instruction translated (or accept that it's not counting real cycles and only increment it on jump/branch instructions, so it's just a rough indicator of emulated time passing)
  • generate code to check that 'cycles executed' pseudo-register as part of translating jump/branch instructions and return control to the caller if it's higher than some value, so M6502_run() would return after approximately that many cycles

This would of course add overhead, but we could avoid generating any code to increment or check the cycles executed register unless a new M6502_set_cycle_limit() function had been called to set a finite limit.

I am not going to have the time to try implementing this in the near future, unfortunately, but I will bear it in mind. If you want to have a go and you have any more questions about this or the other hacky approaches I've suggested above I'm very happy to try to answer them! It has been a few years since I really touched the code so I may be a bit rusty...

Just out of interest, what are you working on here? Multiple 6502s on one thread sounds both cool and weird. :-)

Cheers.

Steve

Hi Steve,

Thanks for the very thoughtful and detailed answer. I am looking for the right fix here. Thanks for offering feedback. I will have some questions along the way.

And now for the explanation of why in the world I would want to multiplex many 6502's on a single system and also why I care about JIT.

In my professional life, I work in Actor languages like Erlang and Elixir and build distributed systems. Actors make remarkably good objects because they decouple time and provide excellent fault isolation. In fact, Alan Kay, who coined the term object-oriented, described objects as message-passing computers (this talk really got me thinking about this a few years ago - https://www.youtube.com/watch?v=oKg1hTOQXoY ). Also, it has been said that every object should have an IP address (instead of a pointer reference).

So here is what I am thinking, and I know this will sound crazy. A 6502 computer in my system is an object: it sends and receives messages (maybe CBOR+UDP packets or maybe CoAP http://coap.technology/ ). Its pointer is simply an IPv6 address. The object can be migrated to different machines by simply serializing the memory and CPU state and rerouting traffic (maybe with DNS or a more elaborate routing mechanism). A 6502 can spawn another 6502 machine instance and set up a bidirectional link to detect normal or abnormal termination (or be automatically terminated if it doesn't handle the event; think about how Unix handles processes). I estimate it should be possible to run at least a hundred thousand 6502 instances on a modern server using a library like lib6502-jit, using epoll for IO and handling sockets, message queueing and other things outside of the 6502 emulator.

Why the 6502? Many compilers can target the 6502, so that makes it convenient, and the assembly language is trivial to learn. I was also considering just making an x86 hypervisor, but that seems like a lot more work and the resulting objects would be too big (memory is usually the bottleneck in actor systems). Everything that targets the 6502 has absolutely no bloat in it. That means 400 bytes of 6502 code really does something interesting.

Another problem I would like to tackle is this idea of data abstraction. The idea that data should travel with an implementation and present an interface instead of having to understand every format (think browsers trying to implement everything instead of providing a VM). For example, why should a browser understand jpeg, png, gif etc... Why should that be built-in? Why can't the implementation of jpeg be shipped with the page? Same goes for Javascript, CSS, HTML and so on (by the way that would eliminate most browser bugs because you would only have to target one implementation of CSS, HTML, Javascript and so on). I don't want to write a browser but it could be interesting to experiment with that idea (with 6502s of course).

Thanks again!

Eric

Hi Eric,

That does sound a bit crazy, but intriguing as well! I'll be interested to see how you get on with it.

Sounds like you're going to have a go at implementing what I suggested as the "right" fix yourself - I don't think that will be all that hard (probably easier than at least one of the hacks I proposed...), but let me know if you run into any difficulties and I'll do what I can to help.

Cheers.

Steve