Shapiro

Shapiro is a zoo of datalog interpreters, alongside a relational engine.

Here you can find, at the moment, three simple in-memory interpreters that support recursive queries.

A datalog and relational algebra engine that relies on an ordered(not necessarily sorted) container for storage and sorted sets as indexes - Simple Datalog
A engine that uses no indexes and does not rely on order - ChibiDatalog
A sort of fast in-memory parallel and distributed datalog engine that supports adding and removing rules and data with optimal update times - DifferentialDatalog

The following snippet showcases ChibiDatalog in action.

#[cfg(test)]
mod tests {
   use crate::models::reasoner::{Dynamic, Materializer, Queryable, UntypedRow};
   use crate::models::datalog::{SugaredRule};
   use crate::models::index::ValueRowId;
   use crate::reasoning::reasoners::chibi::ChibiDatalog;

   #[test]
   fn test_chibi_datalog_two() {
      // Chibi Datalog is a simple incremental datalog reasoner.
      let mut reasoner: ChibiDatalog = Default::default();
      reasoner.insert("edge", vec![Box::new(1), Box::new(2)]);
      reasoner.insert("edge", vec![Box::new(2), Box::new(3)]);
      reasoner.insert("edge", vec![Box::new(2), Box::new(4)]);
      reasoner.insert("edge", vec![Box::new(4), Box::new(5)]);

      // Queries are structured as datalog programs, collections of rules. The following query
      // has two rules, one of them dictating that every edge is reachable from itself
      // and another that establishes reachability to be transitive. Notice how this rule
      // is recursive.
      let query = vec![
         // In database-terms, this first rule says: for every row in the table edge, add it to table reachable
         SugaredRule::from("reachable(?x, ?y) <- [edge(?x, ?y)]"),
         // and the second: for every row r(?x, ?y) in reachable, for every other row s(?y, ?z) in reachable, add a new
         // row where (?x, ?z). 
         SugaredRule::from("reachable(?x, ?z) <- [reachable(?x, ?y), reachable(?y, ?z)]"),
      ];

      // To materialize a query is to ensure that with any updates, the query will remain correct.
      reasoner.materialize(&query);

      // The input graph looks like this:
      // 1 --> 2 --> 3
      //       |
      //         --> 4 --> 5
      // Then, the result of the query would be:
      // (1, 2)
      // (1, 3)
      // (1, 4)
      // (1, 5)
      // (2, 3)
      // (2, 4)
      // (2, 5)
      // (4, 5)
      let mut update: Vec<(&str, UntypedRow)> = vec![
         ("reachable", vec![Box::new(1), Box::new(2)]),
         ("reachable", vec![Box::new(1), Box::new(3)]),
         ("reachable", vec![Box::new(1), Box::new(4)]),
         ("reachable", vec![Box::new(1), Box::new(5)]),
         ("reachable", vec![Box::new(2), Box::new(3)]),
         ("reachable", vec![Box::new(2), Box::new(4)]),
         ("reachable", vec![Box::new(2), Box::new(5)]),
         ("reachable", vec![Box::new(4), Box::new(5)]),
      ];

      update
         .into_iter()
         .for_each(|(table_name, query)| assert!(reasoner.contains_row(table_name, &query)));

      // Now, for the incremental aspect. Let's say that we got an update to our graph, removing
      // three edges (1 --> 2), (2 --> 3), (2 --> 4), and adding two (1 --> 3), (3 --> 4):
      // 1 --> 3 --> 4 --> 5
      // Given that the query has been materialized, this update will not re-run it from scratch.
      // Instead, it will be adjusted to the new data, yielding the following:
      // (1, 3)
      // (3, 4)
      // (3, 5)
      // And retracting
      // (1, 2)
      // (2, 3)
      // (2, 4)
      // (2, 5)
      // However, this adjustment isn't differential, that is, the computation isn't
      // necessarily proportional to the size of the change, hence you should avoid updating
      // until you have a batch large enough. Empirically, batches of size 1-10% are alright.
      // Take note that this will not be a problem, at all, unless you are handling relatively
      // large amounts of data (a hundred thousand elements and above) with complex queries.
      reasoner.update(vec![
         (true, ("edge", vec![Box::new(1), Box::new(3)])),
         (true, ("edge", vec![Box::new(3), Box::new(4)])),
         (false, ("edge", vec![Box::new(1), Box::new(2)])),
         (false, ("edge", vec![Box::new(2), Box::new(3)])),
         (false, ("edge", vec![Box::new(2), Box::new(4)])),
      ]);

      update = vec![
         ("reachable", vec![Box::new(1), Box::new(3)]),
         ("reachable", vec![Box::new(3), Box::new(4)]),
         ("reachable", vec![Box::new(3), Box::new(5)]),
      ];

      update
         .into_iter()
         .for_each(|(table_name, query): (&str, UntypedRow)| assert!(reasoner.contains_row(table_name, &query)));

      update = vec![
         ("reachable", vec![Box::new(1), Box::new(2)]),
         ("reachable", vec![Box::new(2), Box::new(3)]),
         ("reachable", vec![Box::new(2), Box::new(4)]),
         ("reachable", vec![Box::new(2), Box::new(5)]),
      ];
      update
         .into_iter()
         .for_each(|(table_name, query): (&str, UntypedRow)| assert!(!reasoner.contains_row(table_name, &query)));
   }
}

In case you are interested in performance, there is a benchmark harness under ./src/bin.rs. In order to run it, clone the project and run:

cargo run --release -- ./data/lubm1_with_tbox.nt ./data/rdfs.dl chibi true true 0.99 nt true

This will run three evaluations:

First it will materialize 99% of the input data, and report the runtime
Then it will incrementally maintain the materialization, while adding the remaining 1%
Lastly, 1% of the data will be removed.

You can set different percentages, data, program, and else. Just run --help to understand what kind of arguments are allowed.

Next up

Magic sets
Negation(stratification is already implemented)

brurucy/shapiro

Shapiro

Next up