CONQUESTO

Conquesto is a program used to compare the performance between two algorithms for the problem of CERTAINTY(q), with q a conjunctive query.

This README assumes the reader is familiar with CERTAINTY(q). More details are given in our paper (TODO : put the link).

Description

Conquesto generates every possible conjunctive query over a fixed schema, such that each query is unique and can be rewritten in first-order logic. It then proceeds to write Answer Set Programming (ASP) programs that encode the problem CERTAINTY(q) (with q a query). More precisely, it generates four files for each query:

One that contains the general algorithm for CERTAINTY(q), called Generate and Check because it generates every possible model and checks if it is a correct model.
One that contains the algorithm relying on the fact that CERTAINTY(q) can be written in first-order logic.
One database that falsifies CERTAINTY(q).
One database that verifies CERTAINTY(q).

The program can also be used to run two different ASP solvers on these files. Moreover, information such that CPU time is retrieved and can be plotted in order to compare the performance between Generate and Check and the first-order rewriting. The results can be consulted in the paper. The graphics are also available in the generated/figures folder.

This program was written for a project for the course of Knowledge Representation and Reasoning given by Jef Wijsen at the University of Mons. It has only been tested on Linux. However, the Python scripts should also work under Windows and Mac OS X.

Files

We give a short overview of the contents of each file:

conquesto/query/query.py defines a Query class. This class is used to encode a query, and to write the corresponding ASP programs.
conquesto/query/attack_graph.py is used to verify that a query has a first-order logic rewriting and to create it.
conquesto/query/fo_rewriting.py effectively creates the first-order logic rewriting.
conquesto/generate_statistics.py run the two ASP solvers and extract the useful statistics.
conquesto/generate_figures.py uses the generated statistics to produce the figures shown in the paper.
conquesto/generate_files.py generates the files needed for the experiments (see How to use).
conquesto/example.py presents how to create queries and how to produce the corresponding ASP programs.
The files in conquesto/generators are the queries generators. That is, given a schema, they generate the queries (randomly, or not).
run_programs.sh and run_clingo_with_cardinality_constraints.sh are used to run clingo or DLV on the files generated by the Python scripts (see How to use).

This directory contains the code we have done for the KRR project.

Dependencies

Conquesto relies on a few dependencies. Note that none of these dependencies are distributed in this repository. That is, they must be manually installed.

The Python code relies on the following dependencies:

numpy
matplotlib

To be able to run the ASP solvers, the following programs must be accessible:

clingo.
dlv. Note that the DLV license available online is a free license for academic and non-commercial use, as well as for use by non profit organizations.

Under Linux, you can ensure these two programs are available by using a package manager, or by installing the programs in /usr/local/bin or in /home/user/.local/bin (if your system supports the .local folder).

How to use

Run

python3 conquesto/generate_statistics.py

to generate the statistics' results, using the Python script. Note that this method does not correctly evaluate the CPU time for DLV. Instead, it captures the real time (or the user time).

Therefore, it is recommended to instead use the Bash script run_programs.sh, which only works under Linux. This script generates the ASP programs and databases by executing the Python script generate_files.py. There are two points we would like to highlight:

These files only focus on a single database schema with two relations R and S:
- R has three positions. Among these positions, the first two form the primary keys.
- S has two positions. The first position is the primary key.
- If you would like to run experiments on a different schema, it is enough to modify generate_files.py (see the documentation within the file).
In order to be able to compare clingo and DLV results, run_programs.sh's Generate and Check programs do not use the cardinality constraints (as only clingo support these constraints).
- If you want to run experiments with Generate and Check programs using the cardinality constraints, run run_clingo_with_cardinality_constraints.sh (but note that DLV will not be executed).

Then, run

python3 conquesto/generate_figures.py

to create the corresponding figures.

Depending on what you want to generate, you may need to modify the Python scripts. More precisely, in the main part of generate_statistics.py and generate_figures.py, you can set different parameters. We do not offer a command-line interface to set these parameters.

However, the Bash script does take four arguments, in this order:

The program to use ("clingo" or "dlv")
The database type ("yes", "no", or "random")
The minimal value for the parametric database size which controls the size of the generated databases.
The maximal value for the parametric database size. Also, the script has one optional argument (that must come after all the others):
The time limit, in seconds. By default, it is 100 seconds.

For example:

./run_programs.sh dlv no 40 70

will run DLV on the no-dbs with a parametric database size between 40 et 70. To change the time limit to 10 seconds, write the following line:

./run_programs.sh dlv no 40 70 10

Generated datasets

For each experiment, a dataset is generated. The first row of the file gives a short description of the dataset:

The database type (yes, no, or random).
The time limit, in seconds.
The number of queries (note that this depends on the schema used).
The number of relations, followed by the number of positions in each relation.

Each of the following rows of the file gives the results of both Generate and Check and first-order rewriting for a given query and database. Each row contains 21 columns, in this order:

The parametric database size.
The actual database size.
The query ID (starting at zero).
Then, we have 9 columns for Generate and Check, followed by the same 9 columns for the first-order rewriting:
- Whether the program reached the time limit.
- The satisfiability of the program (False, or True).
- The CPU time taken.
- The number of choices.
- The number of rules.
- The number of atoms.
- The number of bodies.
- The number of variables.
- The number of constraints.

For the definitions of choices, rules, and so on, see the documentation of clingo.

Note that dlv does not give the number of atoms, bodies, and variables. The columns are filled with zero in this case, but they should not be considered in a statistical analysis.

Contributors

Jonathan Joertz
Dorian Labeeuw
Gaëtan Staquet
Jef Wijsen

License

Conquesto is distributed under the BSD-3-Clause license. See LICENSE for more information.

DocSkellington/Conquesto