DatalogBench is a collection of Datalog programs from the literature in various fields including databases, information retrieval, and program analysis.
Its main purpose is to serve as a benchmark suite to evaluate techniques for learning Datalog programs from input-output data. This is an important problem studied in fields such as program synthesis and Inductive Logic Programming, with applications in a variety of domains such as bioinformatics, big-data analytics, natural language processing, networking, program analysis, and robotics.
Besides the target Datalog programs themselves, this collection includes input-output datasets for each program, as well as code to generate candidate rules using various methods of inducing syntactic bias, akin to syntax-guided synthesis (SyGuS) or template rules in meta-interpretive learning.
All rule generation algorithms produce candidate rule sets in the format supported by Souffle. We then select a subset of these candidate rules as our final program. All rule generation algorithms can be found in the rule-gen folder. There are four variants:
generate, which is a standard brute-force rule generation algorithm. It first enumerates all possible combinations of relations and variables, then applies a number of filters to remove redundant and clearly incorrect rules. Notably, it filters out any rule in which a variable is used with different types in different relations.
generate-fast, which, unlike generate, never enumerates rules whose variable types do not match the types of the relations, dramatically cutting down on the algorithm's runtime. However, this algorithm offers less control over the generated candidate rule set. It also applies a number of filters to remove redundant and clearly incorrect rules.
generate-back, which is equivalent to generate, but inserts the Rule relation, which is what switches on a candidate rule within Souffle, at the front of each rule rather than the back (see the sketch after this list). This increases the speed at which Souffle compiles the candidate rules, since it reads rule bodies from left to right.
generate-negation, which is equivalent to generate, but is capable of handling negation.
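To make the Rule guard concrete, the sketch below shows hypothetical candidate rules for a reachability benchmark in Souffle syntax. The relation names (edge, node, path), the declaration of Rule, and the rule numbers are illustrative assumptions rather than contents of any particular benchmark; they are only meant to show where the guard is placed and what a negated body literal looks like.

    // Hypothetical declarations; actual benchmarks may name and type these differently.
    .decl edge(x: number, y: number)
    .decl node(x: number)
    .decl path(x: number, y: number)
    .decl Rule(n: number)
    .input edge
    .input node
    .input Rule
    .output path

    // generate and generate-fast append the guard at the end of each body:
    path(x, y) :- edge(x, y), Rule(1).
    path(x, y) :- path(x, z), edge(z, y), Rule(2).

    // generate-back places the same guard at the front instead:
    path(x, y) :- Rule(1), edge(x, y).
    path(x, y) :- Rule(2), path(x, z), edge(z, y).

    // generate-negation may also emit rules with negated literals:
    path(x, y) :- node(x), node(y), !edge(x, y), Rule(3).

Supplying or withholding a fact such as Rule(2) then enables or disables the corresponding candidate rule when the program is evaluated.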
Each benchmark is represented by a folder and has a number of supporting files. Note that a given benchmark may not contain all of the files listed below.
Input-output datasets are provided in two different formats, called primary and alternate, as described below.
This format provides input data in *.facts files and output data in *.expected files. In particular, the tuples of each input relation R are provided in a file named R.facts, and the expected tuples of each output relation S are provided in a file named S.expected.
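For example, for a hypothetical input relation edge and output relation path, the files might look as follows; Souffle reads .facts files as one tuple per line with tab-separated fields, and the tuples shown here are made up for illustration.

    edge.facts:
    1	2
    2	3

    path.expected:
    1	2
    2	3
    1	3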
This format provides all input-output data combined in a single .d file. In particular, the input-output data for benchmark B is provided in a file named B.d.
Rule generation data is provided in two different formats, called primary and alternate, as described below.
This format defines input and output relations for the rule generation algorithm in a rules.t file. The rule generation algorithm takes rules.t as input, and enumerates every possible combination of candidate rules. It applies a number of filters to remove redundant and clearly incorrect rules, such as ensuring that the variable numbers are minimized left to right and that variables match the types of the relations throughout the rules. Finally, it produces a rules.large.dl file. There should be one rules.t file per benchmark.
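As a hypothetical illustration of these two filters, suppose the generator works over a binary relation edge(x: number, y: number), a unary relation check(x: symbol), and an output relation path(x: number, y: number); the variable naming scheme shown is an assumption.

    // Kept: reading left to right, variables first occur in the order v0, v1, v2.
    path(v0, v1) :- edge(v0, v2), edge(v2, v1).

    // Filtered: v2 first occurs before v1, so this is a non-canonical renaming
    // of the rule above.
    path(v0, v2) :- edge(v0, v1), edge(v1, v2).

    // Filtered: v0 would have to be a symbol in check but a number in edge.
    path(v0, v1) :- check(v0), edge(v0, v1).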
The candidate rule set generated by the rule generation algorithm using the provided rules.t file is stored in a rules.large.dl file. Note that rules.large.dl usually contains between 2 and 10 times as many candidate rules as rules.small.dl (see 2.2, Alternate format). There should be at most one rules.large.dl file per benchmark.
For the below files, X may be "small" or "large". If X is "small", it represents files derived from the alternate (see 2.2, Alternate format) rule generation algorithm. If X is "large", it represents files derived from the primary rule generation algorithm.
The auxiliary rule set used to generate coprovenance (the intersection of all possible derivation trees for a given tuple under a candidate rule set) for rules.X.dl is stored in rules_notexists.X.dl. Note that these rules are not part of the candidate rule set. rules_notexists.large.dl is created automatically by the primary candidate rule generation algorithm when generating rules.large.dl. There should be at most two rules_notexists.X.dl files per benchmark.
The auxiliary rule set used to generate allprovenance (the union of all possible derivation trees for a given tuple under a candidate rule set) for rules.X.dl is stored in rules_exists.X.dl. Note that these rules are not part of the candidate rule set. rules_exists.large.dl is created automatically by the primary candidate rule generation algorithm when generating rules.large.dl. There should be at most two rules_exists.X.dl files per benchmark.
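As a rough sketch of the idea only (the actual encoding used in the rules_exists.X.dl and rules_notexists.X.dl files may differ), an auxiliary rule can mirror a candidate rule and record its rule number, so that the set of rules able to derive each output tuple can be read off:

    // Hypothetical auxiliary rule mirroring candidate rule 2
    // (path(x, y) :- path(x, z), edge(z, y), Rule(2).):
    .decl path_exists(x: number, y: number, r: number)
    .output path_exists
    path_exists(x, y, 2) :- path(x, z), edge(z, y).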
The list of rule numbers identifying the candidate rules that the learning algorithm is able to access is stored in ruleNames.X.txt for rules.X.dl. By default, rule numbers for all candidate rules are listed in the file. ruleNames.large.txt is created automatically by the primary candidate rule generation algorithm when generating rules.large.dl. There should be at most two ruleNames.X.txt files per benchmark.
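For example, if the candidate rule set contains rules numbered 1 through 3, the corresponding ruleNames file might simply list those numbers, one per line (the exact identifiers are determined by the generated rule set):

    1
    2
    3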
This format provides rule template data in a *.tp file, which is required for an alternate rule generation procedure described in this FSE 2018 paper. In particular, the template data for benchmark B is provided in a file named B.tp. There should be at most one *.tp file per benchmark.
The candidate rule set sourced from an alternate rule generation algorithm from a previous project is stored in a rules.small.dl file. There should be at most one rules.small.dl file per benchmark.
Additionally, a few benchmarks (quickfoil-webkb and quickfoil-bongard) contain variants:
quickfoil-webkb has 8 variants, which are generated from different data sources but have similarly sized candidate rule sets.
quickfoil-bongard has 10 variants, which are generated from the same data source but have candidate rule sets of varying sizes.