metacall/core

Allow MetaCall to run from `python.exe` or `node.exe`

viferga opened this issue · 0 comments

🚀 Feature

This is a cool feature that I discovered while talking to @Duckxz. It is also related to GSoC-22: https://github.com/metacall/gsoc-2022#polyglot-debugger

Right now MetaCall suffers from a limitation: if we import metacall from Node, Python or any other language, the script has to be run from the metacall CLI itself. Otherwise metacall creates a new instance of the host language instead of reusing the one it is already running in. For example:

duplicated.js:

const { metacall_load_from_file } = require("metacall");
metacall_load_from_file("node", ["./index.js"]); // Here metacall will launch a new instance of Node (V8 + Libuv)

node duplicated.js

To avoid this duplicated instance, in the past we forced metacall to always run under the metacall CLI, so the host language is already managed by metacall and this problem cannot happen. We were able to detect whether we are running under metacall thanks to this mechanism (node):

return process._linkedBinding('node_loader_port_module');
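A minimal sketch of how that check could be wrapped so that it is safe to call from plain `node`. The helper name `isRunningUnderMetaCall` and the injectable `proc` parameter are illustrative assumptions, not part of MetaCall. `process._linkedBinding` is an internal, undocumented Node API, so the sketch only assumes that it either throws or returns nothing when the binding is not linked:

```javascript
// Hypothetical helper (not part of MetaCall): returns true when the MetaCall
// Node port binding is linked into the current executable, which is the case
// when the script runs under the metacall CLI.
// `proc` is injectable only so the sketch can be exercised without MetaCall.
function isRunningUnderMetaCall(proc = process) {
  if (typeof proc._linkedBinding !== "function") {
    return false; // This runtime does not expose linked bindings at all
  }
  try {
    // Binding name taken from the snippet above
    const binding = proc._linkedBinding("node_loader_port_module");
    return binding !== undefined && binding !== null;
  } catch (err) {
    // The binding is not linked, so we were started from plain node
    return false;
  }
}
```

The port could call this once at load time and take the "host" path when it returns `false`.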

While discussing with @Duckxz how to use metacall for debugging external processes (i.e. attach to a running process and intercept its functions transparently), this problem appeared again, because there is no way to do this unless we remove this limitation.

So we came up with the following ideas in order to remove this limitation.

First of all, we need a way to detect whether the program is running under the metacall CLI. This is already implemented in NodeJS, as shown before. If we detect that we are not running under it, we should mark the specific language as "host", a flag that the loaders can query at runtime so they can act accordingly. For example, if we are running in node.exe, we would not create a new thread with the NodeJS instance at initialization; we would just initialize the node_loader related things as if we were already on the V8 thread. We do not need the trampoline magic or anything else, because we are already in that thread.
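The "host" flag described above could be sketched roughly as follows. Everything here is hypothetical: `HostRegistry`, `registerHost`, `isHost`, `initializationPlan` and the `"py"` tag are made-up names for illustration, and a real implementation would most likely live in the C core rather than in JavaScript:

```javascript
// Hypothetical registry for the "host" flag; none of these names exist in MetaCall.
class HostRegistry {
  constructor() {
    // Tag of the host language (e.g. "node"), or null under the metacall CLI
    this.hostTag = null;
  }

  // Called by the port when it detects it was not started by the metacall CLI
  registerHost(tag) {
    if (this.hostTag !== null && this.hostTag !== tag) {
      // Only one host language is possible per process
      throw new Error(`host already registered as "${this.hostTag}"`);
    }
    this.hostTag = tag;
  }

  // Queried by a loader while it initializes
  isHost(tag) {
    return this.hostTag === tag;
  }
}

// Sketch of the decision a loader would take based on that flag
function initializationPlan(registry, tag) {
  return registry.isHost(tag)
    ? { spawnRuntime: false, useTrampoline: false } // already on the runtime thread
    : { spawnRuntime: true, useTrampoline: true }; // current behavior
}
```

With `"node"` registered as host, `initializationPlan(registry, "node")` skips the new thread and the trampoline, while any other loader keeps the current behavior.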

Another problem is the symbols. For example, node_loader.dll is already linked to node.dll, which is highly problematic because it will not work at all in this scenario. So when the host language is NodeJS (and we are running under node.exe), we have to use the symbols from the node.exe executable instead of the ones pointing to node.dll. For this we have two options (I don't know all the implications of each of them in detail, but I have a general idea):

  1. Static, portable and safe version:
    Create two versions of each loader: the normal one, linked to node.dll as it is right now, and another version called *_loader_host or *_host, for example node_loader.dll and node_host.dll. The second one won't be linked to node.dll and its symbols will be left unresolved, so they get resolved at runtime against the already existing symbols (in this case the ones present and exported in node.exe and, in the case of Python, the ones previously loaded from libpython.so). This should work in theory because most runtimes design the interpreter as a plugin system itself, so the symbols are usually available in the running process one way or another: either previously loaded from dynamically linked libraries or exported by the statically compiled executable.

  2. Dynamic, non-portable and unsafe version:
    This can be achieved with a technique called runtime symbol interposition, which allows patching the symbols of a loaded library on the fly and redirecting them wherever you want. Here's a full explanation of the technique: https://www.codeproject.com/Articles/70302/Redirecting-functions-in-shared-ELF-libraries ; and here's the source code of the PoC for Linux: https://github.com/shoumikhin/ELF-Hook . This technique avoids having duplicated versions of each loader, but it has drawbacks: it has to be implemented for each platform and it is highly unsafe. The idea is to load libnode_loader.so, then iterate through its list of dependencies (the libraries it is linked to) and get their symbols. The POSIX library allows you to list all the dependencies and the symbols of each dependency ( https://stackoverflow.com/questions/29903049/get-names-and-addresses-of-exported-functions-in-linux ). We can store all these references in a list, then use the dlsym trick to get the symbols of the process itself, and finally use runtime symbol interposition to redirect the libnode_loader.so symbols to the ones already loaded in the host.

Apart from this, there are also other problems related to threading and runtime execution. For example, this is already solved in node and it will work, but all the calls will happen directly in the main process, where V8 is already running, instead of in a new thread as node_loader does right now. For Python, we should also detect whether we are running on the host and, if so, avoid calling Py_Initialize and Py_Finalize. We will still need to clean the resources of the host anyway, in a delayed form (i.e. without calling metacall_destroy directly).

So we will have to implement a handler for the atexit event in each loader that triggers metacall_destroy if it is being run from the host (Py: https://docs.python.org/3/library/atexit.html ; Node: using AtExit from C++ or process.on('exit'...)).
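For the Node side, the delayed destroy could look roughly like this. It is a sketch under assumptions: `registerDelayedDestroy` is a made-up helper, and the `destroy` callback only stands in for whatever the port would use to reach metacall_destroy. The `emitter` parameter defaults to `process` and is injectable only so the sketch can be exercised without exiting:

```javascript
// Hypothetical helper: schedules `destroy` (a stand-in for metacall_destroy)
// to run once when the host process exits.
function registerDelayedDestroy(destroy, emitter = process) {
  let destroyed = false;
  const handler = () => {
    if (destroyed) {
      return; // Guard against destroying twice
    }
    destroyed = true;
    destroy();
  };
  // 'exit' listeners must be synchronous: the event loop is no longer
  // running when they are called, so pending async work is dropped.
  emitter.once("exit", handler);
  return handler;
}
```

Returning the handler leaves room for the port to remove the listener if metacall is destroyed explicitly before the host exits.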

There is an extra limitation that is not extremely important for this feature itself but is necessary for the debugging part: we should get the main module and run discovery against it ( Py: https://stackoverflow.com/a/8706331 ; Node: https://nodejs.org/api/modules.html#requiremain ), so that the modules already present in the running instance are also populated into MetaCall and we can control them, detour them, etc.
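As a rough illustration of what discovery against the main module could return in Node, the sketch below lists the functions exported by a module object such as `require.main`. The helper `discoverExports` and the `{ name, arity }` shape are assumptions made for this example; MetaCall's real discovery produces its own metadata:

```javascript
// Hypothetical discovery step: lists the functions exported by a module
// object such as `require.main`. The returned shape is made up for this sketch.
function discoverExports(mod) {
  if (!mod || mod.exports === null || mod.exports === undefined) {
    return []; // e.g. `require.main` is undefined when there is no entry script
  }
  const exported = mod.exports;
  if (typeof exported === "function") {
    // module.exports = function ...
    return [{ name: exported.name || "default", arity: exported.length }];
  }
  // module.exports = { a, b, ... }: keep only the callable members
  return Object.keys(exported)
    .filter((key) => typeof exported[key] === "function")
    .map((key) => ({ name: key, arity: exported[key].length }));
}
```

From the port this would be called as `discoverExports(require.main)`. Note that this only sees what the entry script exports; functions that are local to the script would need a different mechanism.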

There may be other limitations and problems but I can't detect them unless we start with the PoC. Right now those are the main ones.

To summarize, here's the list of tasks:

  1. Define 2 libs for each loader (host and loader).
  2. In the port, register the current language as host if we detect it is not running from metacall (only one host language is possible), and provide an API for detecting the host language at runtime in the loaders.
  3. In the port, if we detect it is running outside of metacall, call metacall_initialize and register an atexit hook that calls metacall_destroy when the host finalizes.
  4. For each loader, detect either at compile time or at runtime whether we are the host, and if so skip the init and destroy of the runtime.
  5. For each loader, if we are the host, run introspection against the top module itself.