Generate functions for doing fact IO

Question

luc-tielen opened this issue 3 years ago · 1 comment

Right now, the compiler only generates functions for running the actual Datalog program.
There is, however, no way to get data in or out of the system.

Some possible approaches (this should be done at the EIR level):

Register the (de-)serialization functions in a (global) registry, nested inside the program struct.
Generate an add_facts / get_facts function that switches on a constant representing the fact type, e.g.:

#include <stddef.h>

typedef enum {
    edge,
    path
    // more cases as needed...
} fact_type;

void eclair_add_edge_facts(void* memory, size_t count);
void eclair_add_path_facts(void* memory, size_t count);
size_t eclair_get_edge_facts(void* memory, size_t num_bytes);
size_t eclair_get_path_facts(void* memory, size_t num_bytes);

void eclair_add_facts(fact_type type, void* memory, size_t fact_count) {
    switch (type) {
        case edge:
            eclair_add_edge_facts(memory, fact_count);
            break;  // without break, edge facts would fall through into the path case
        case path:
            eclair_add_path_facts(memory, fact_count);
            break;
        // more cases as needed...
    }
}

// NOTE: the return value signifies how many facts were actually read.
// NOTE 2: how to deal with insufficient memory? A different type signature is needed!
size_t eclair_get_facts(fact_type type, void* memory, size_t num_bytes) {
    switch (type) {
        case edge:
            return eclair_get_edge_facts(memory, num_bytes);
        case path:
            return eclair_get_path_facts(memory, num_bytes);
        // more cases as needed...
    }
    return 0;  // unknown fact type: no facts read
}
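The first option listed above, a registry nested inside the program struct, could replace the switch with a table of function pointers indexed by fact type. This is a minimal sketch under assumptions: `fact_io`, `eclair_register_fact_io`, the `fact_type_count` sentinel and this `eclair_program` layout are hypothetical names, not Eclair's actual API, and the edge callbacks are stand-ins that only count facts.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { edge, path, fact_type_count } fact_type;

/* Hypothetical per-relation (de)serialization callbacks, registered per fact type. */
typedef struct {
    void (*add_facts)(void* memory, size_t fact_count);
    size_t (*get_facts)(void* memory, size_t num_bytes);
} fact_io;

/* Hypothetical program struct with the registry nested inside it. */
typedef struct {
    fact_io registry[fact_type_count];
    /* ... the relations themselves would also live here ... */
} eclair_program;

void eclair_register_fact_io(eclair_program* program, fact_type type, fact_io io) {
    assert(type < fact_type_count);
    program->registry[type] = io;
}

/* Dispatch is a table lookup instead of a switch, so there is no fall-through to forget. */
void eclair_add_facts(eclair_program* program, fact_type type,
                      void* memory, size_t fact_count) {
    assert(type < fact_type_count && program->registry[type].add_facts != NULL);
    program->registry[type].add_facts(memory, fact_count);
}

size_t eclair_get_facts(eclair_program* program, fact_type type,
                        void* memory, size_t num_bytes) {
    assert(type < fact_type_count && program->registry[type].get_facts != NULL);
    return program->registry[type].get_facts(memory, num_bytes);
}

/* Stand-ins for the generated per-relation functions: they only count edge facts. */
static size_t edge_total = 0;
static void add_edge_facts(void* memory, size_t fact_count) { (void)memory; edge_total += fact_count; }
static size_t get_edge_facts(void* memory, size_t num_bytes) { (void)memory; (void)num_bytes; return edge_total; }
```

A trade-off of this design is that the callbacks must be registered before use, whereas the switch is fixed at compile time.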

Answer 1 · 2022-04-23T17:10:33.000Z

I did some investigation into how WASM does IO with JS. If arrays are used, each one needs a single element type, e.g. Uint32Array. Arrays can be returned directly from WASM (as a pointer into linear memory that JS can view as a typed array), but they do need to be freed at some point. This probably means a low-level, unsafe interface to Eclair needs to be exposed. On top of that unsafe low-level layer, an easy-to-use layer can be built.

The API could look like this:

// NOTE: this function mallocs, so the returned array needs to be freed!
// NOTE 2: the array size is retrieved via btree_size!
uint32_t* eclair_get_facts(eclair_program*, fact_type type);

void eclair_add_facts(eclair_program*, fact_type, uint32_t* data, size_t fact_count);
// Same as eclair_add_facts, but the data contains the values of only 1 record.
void eclair_add_fact(eclair_program*, fact_type, uint32_t* data);
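A minimal sketch of how these three functions could behave, assuming a hypothetical in-memory `relation` (a flat, growable `uint32_t` array) in place of the real btree, a hypothetical per-type `arity` table, and a hypothetical `eclair_fact_count` standing in for `btree_size`. Facts are stored row by row, so a relation of arity 2 with `n` facts occupies `2 * n` consecutive values, and the getter can allocate exactly that much in one go.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { edge, path, fact_type_count } fact_type;

/* Hypothetical stand-in for a relation; the real implementation would use a btree.
   Unlike a btree, this neither sorts nor deduplicates facts. */
typedef struct {
    uint32_t* data;   /* count * arity values, row by row */
    size_t count;     /* number of facts */
    size_t capacity;  /* in facts */
} relation;

typedef struct {
    relation relations[fact_type_count];
} eclair_program;

/* Hypothetical column counts: edge(a, b) and path(a, b). */
static const size_t arity[fact_type_count] = { 2, 2 };

/* Hypothetical stand-in for btree_size. */
size_t eclair_fact_count(eclair_program* program, fact_type type) {
    assert(type < fact_type_count);
    return program->relations[type].count;
}

void eclair_add_facts(eclair_program* program, fact_type type,
                      uint32_t* data, size_t fact_count) {
    assert(type < fact_type_count);
    if (fact_count == 0) return;
    relation* rel = &program->relations[type];
    if (rel->count + fact_count > rel->capacity) {
        size_t new_cap = rel->capacity ? rel->capacity * 2 : 16;
        while (new_cap < rel->count + fact_count) new_cap *= 2;
        uint32_t* grown = realloc(rel->data, new_cap * arity[type] * sizeof(uint32_t));
        if (grown == NULL) return;  /* out of memory: this sketch drops the facts */
        rel->data = grown;
        rel->capacity = new_cap;
    }
    memcpy(rel->data + rel->count * arity[type], data,
           fact_count * arity[type] * sizeof(uint32_t));
    rel->count += fact_count;
}

void eclair_add_fact(eclair_program* program, fact_type type, uint32_t* data) {
    eclair_add_facts(program, type, data, 1);
}

/* Returns a malloc'd copy of all facts (count * arity values); the caller must free it.
   May return NULL on allocation failure, or for an empty relation since malloc(0) may. */
uint32_t* eclair_get_facts(eclair_program* program, fact_type type) {
    assert(type < fact_type_count);
    relation* rel = &program->relations[type];
    size_t num_values = rel->count * arity[type];
    uint32_t* out = malloc(num_values * sizeof(uint32_t));  /* one allocation, size known up front */
    if (out != NULL && num_values > 0) memcpy(out, rel->data, num_values * sizeof(uint32_t));
    return out;
}
```

For example, adding `{1, 2, 2, 3}` with `fact_count == 2` stores the two edges (1, 2) and (2, 3), and a later `eclair_get_facts` returns those four values in a single buffer.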

This can also work with strings (and potentially ADTs as well?) by requiring the symbol table to be used manually (both for reading and writing strings in facts). This would only happen in the unsafe API; the higher-level API could do it automatically.
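Manual use of a symbol table could look roughly like this. `symbol_table`, `symbol_intern` and `symbol_lookup` are hypothetical names and the linear search is only there to keep the sketch short; the point is that a string column is stored in the fact as a `uint32_t` id, so the caller interns strings before `eclair_add_fact` and maps ids back to strings after `eclair_get_facts`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: strings[i] is the string with id i. */
typedef struct {
    char** strings;
    uint32_t count;
    uint32_t capacity;
} symbol_table;

/* Returns the id of `s`, storing a copy of it first if it is new.
   Returns UINT32_MAX on allocation failure. */
uint32_t symbol_intern(symbol_table* table, const char* s) {
    assert(table != NULL && s != NULL);
    for (uint32_t i = 0; i < table->count; i++) {
        if (strcmp(table->strings[i], s) == 0) return i;
    }
    if (table->count == table->capacity) {
        uint32_t new_cap = table->capacity ? table->capacity * 2 : 8;
        char** grown = realloc(table->strings, new_cap * sizeof(char*));
        if (grown == NULL) return UINT32_MAX;
        table->strings = grown;
        table->capacity = new_cap;
    }
    size_t len = strlen(s) + 1;
    char* copy = malloc(len);
    if (copy == NULL) return UINT32_MAX;
    memcpy(copy, s, len);
    table->strings[table->count] = copy;
    return table->count++;
}

/* Maps an id read back from a fact to its string, or NULL if the id is unknown. */
const char* symbol_lookup(const symbol_table* table, uint32_t id) {
    return id < table->count ? table->strings[id] : NULL;
}
```

For a fact like `edge("a", "b")`, the caller would intern both strings and pass the two resulting ids as one record.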

Benefits of this approach:

Straightforward to build: if we know the size of the relation, we know how much memory to allocate, all at once.
The compiler could generate some helper code for this, or this could be provided in the form of a library.

Cons:

It is possible to misuse the lower-level, unsafe API, so a user-friendly layer will need to be built on top of it (both in C and JS).
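On the C side, part of such a friendlier layer could bundle the raw pointer with its fact count and arity and bounds-check every access, so callers can neither read past the end nor lose track of who frees the buffer. A sketch with hypothetical `fact_array` helpers; in practice `values` would come from `eclair_get_facts` and `fact_count` from the relation size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical owning wrapper around a raw fact buffer from the unsafe API. */
typedef struct {
    uint32_t* values;   /* fact_count * arity values, row by row; owned */
    size_t fact_count;
    size_t arity;
} fact_array;

fact_array fact_array_wrap(uint32_t* values, size_t fact_count, size_t arity) {
    assert(values != NULL || fact_count == 0);
    fact_array facts = { values, fact_count, arity };
    return facts;
}

/* Bounds-checked access: returns false instead of reading out of bounds. */
bool fact_array_get(const fact_array* facts, size_t row, size_t column, uint32_t* out) {
    if (row >= facts->fact_count || column >= facts->arity) return false;
    *out = facts->values[row * facts->arity + column];
    return true;
}

/* Frees the buffer and resets the wrapper, so a second call is harmless. */
void fact_array_free(fact_array* facts) {
    free(facts->values);
    facts->values = NULL;
    facts->fact_count = 0;
}
```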