Bayesian Network - SAT-based Local Improvement Method

These instructions have been tested on linux (last updated: 17th June 2022)

Originally accepted at AAAI-21. The tags [cwidth] and [expert] refer to the two follow-up versions accepted at NeurIPS-21 and UAI-22 resp. See Results section for publication details.

Required programming languages and software

Java SDK (required to run BLIP)
- Maven (required to build BLIP)
Python 3.6 or higher (required to run BN-SLIM)
C++ (required to run Merlin) [cwidth]
Make (required to build Merlin and UWrMaxSat)
git (required to obtain ETL source code)
Python 2.7 (required to run ETL)

Note: Before you start, move all python (.py) files into the slim directory: (See Section: Directory Structure)

External tools

UWrMaxSat (`setup-uwrmaxsat.sh`)

Download and unzip source from MaxSAT Evaluation 2019

pushd solvers
wget https://maxsat-evaluations.github.io/2019/mse19-solver-src/complete/UWrMaxSat-1.0.zip
unzip UWrMaxSat-1.0.zip

Apply patch uwrmaxsat.patch to fix minor bug

pushd UWrMaxSat-1.0/code
patch -ub uwrmaxsat/MsSolver.cc -i ../../uwrmaxsat.patch

Build UWrMaxSat

chmod 777 starexec_build
./starexec_build
popd

Place built binary in solvers directory

cp UWrMaxSat-1.0/bin/uwrmaxsat . -v
popd

(Optional) Test

solvers/uwrmaxsat -m test/test.cnf -v1 -cpu-lim=2

BLIP (`setup-blip.sh`)

Download and unzip source from BLIP

pushd solvers
wget https://ipg.idsia.ch/upfiles/blip-publish.zip --no-check-certificate
unzip blip-publish.zip

Apply patch blip.patch

pushd blip-publish
patch -ub -p1 -i ../blip.patch

Compile BLIP [cwidth] and move jar to solvers/blip.jar
```
./compile-blip.sh
cp blip.jar ../blip.jar -v
```

Undo patch blip.patch and apply blip-con.patch

patch -u -p1 -i ../blip.patch --reverse
patch -ub -p1 -i ../blip-con.patch

Compile BLIP [expert] and move jar to solvers/blip-con.jar

./compile-blip.sh
cp blip.jar ../blip-con.jar -v
popd
popd

(Optional) Tests
- [cwidth] check if java -jar solvers/blip.jar solver.kg.adv contains -cw
- [expert] check if java -jar solvers/blip.jar solver.kg.adv contains -con
- Check if tmp.res contains elim-order field after running
```
java -jar solvers/blip.jar solver.kmax -j test/test.jkl -r tmp.res -w 5 -v 1
```

Merlin (`setup-merlin.sh`) [cwidth]

Download and unzip source from Merlin

pushd solvers
wget https://github.com/radum2275/merlin/tree/008d910307a2043f9446676f28010079ea49ec15 --no-check-certificate -O merlin-src.zip
unzip merlin-src.zip

Apply patch merlin.patch to enable elim-order output

pushd merlin-src
patch -ub -p1 -i ../merlin.patch

Compile Merlin
```
make
popd
```
Place executable in solvers directory
```
cp merlin-src/bin/merlin . -v
popd
```

(Optional) Test and check if tmp.PR.json contains value of -7.730569 after running

solvers/merlin -t PR -a wmb -M lex -o tmp -O json -i 4 -f test/test.uai -e test/test.evid

ET-Learn (`download-etlearn.sh`)

Download and unzip source from et-learn or clone the git repo

pushd solvers
git clone https://github.com/marcobb8/et-learn.git et_learn

Follow instructions for installing et-learn from readme

Note: et-learn requires Python 2.7 while BN-SLIM requires Python 3.6+ and these are not mutually compatible. Hence it is recommended to use virtual environments (also refer to this) and keep the two softwares separate.

Copy driver file hcet.py to same directory
```
mv ../hcet.py et_learn
```

(Optional) Test and check if tmp.res is generated after running

cd solvers/et_learn
python hcet.py ../../test/test.data 5 -p -o tmp

Datasets and Experiment Data

The datasets are available in the datasets folder, which contains subfolders dat, jkl and con (see Section on File formats for more details).

The Experimental results are contained in the experiments folder. In addition to some plots, the raw data from the experiments are provided as .csv files.

Requirements for BN-SLIM

To install requirements:

pip install -r requirements.txt

To install optional requirements that allow drawing graphs etc.:

apt install graphviz
pip install -r optional_requirements.txt

Directory Structure

The directory structure should now be as follows (.zip files hidden for brevity, * denotes the generated files)

bnslim
├── README.md
├── setup-uwrmaxsat.sh
├── setup-blip.sh
├── setup-merlin.sh
├── download-etlearn.sh
├── test
│   ├── test.cnf
│   ├── test.jkl
│   ├── test.dat
│   ├── test.uai
│   ├── test.evid
│   ├── test.data
│   ├── demo.res
|   └── demo.net
├── requirements.txt
├── optional_requirements.txt
├── solvers
│   ├── uwrmaxsat.patch
│   ├── blip.patch
│   ├── merlin.patch
│   ├── UWrMaxSat-1.0*
│   ├── uwrmaxsat*
│   ├── blip-publish*
|   ├── blip.jar*
│   ├── merlin-src*
│   ├── merlin*
|   └── et_learn*
|       ├── hcet.py
│       └── ... 
├── datasets
|   ├── dat/
|   ├── jkl/
|   └── con/
├── experiments
|   ├── cwidth/
|   └── expert/
└── slim
    ├── blip.py
    ├── bnio.py
    ├── constrained_encoding.py
    ├── complexity_encoding.py
    ├── berg_encoding.py
    ├── eval_model.py
    ├── generate_constraints.py
    ├── samer_veith.py
    ├── slim.py
    ├── utils.py
    └── verify_cwidth.py

Running BN-SLIM

cd slim
# bounded treewidth
python slim.py ../test/test.jkl 5 -v -u kmax -t 60
# bounded msss [cwidth]
python slim.py ../test/test.jkl 0 -v -u kmax -t 60 -b 0 -d ../test/test.dat --start-with ../test/demo.res -w --feasible-cw --feasible-cw-threshold 108
# constraints + bounded treewidth [expert]
#   - first generate starting network through blip-con.jar
java -jar ../solvers/blip-con.jar solver.kg.adv -v 1 -con ../test/test.con -j ../test/test.jkl -w 5 -r ../test/demo2.res -t 60 -seed 1
#   - then run slim on it
python slim.py ../test/test.jkl 5 -v -u kg.adv -t 60 -b 10 -s 5 --constraint-file ../test/test.con --start-with ../test/demo2.res -r 1

Run python slim.py --help for description of available options

For larger networks it is recommended to invoke as python -O slim.py ...

File formats

Since most of the file formats used/required for this software and its auxiliary helper softwares to run aren't standard, we provide a brief description of some of these file formats.

`.res` file type

A .res file contains the DAG expressed by means of the parent sets as well as a field indicating the score of the DAG and optionally the elimination ordering used to bound the treewidth of the DAG.

See test/demo.res for an example.

The patch applied to BLIP allows it to output .res files with the elimination ordering which isn't supported by it out-of-the-box. Additionally, it adds support for working with maximum state space size instead of treewidth.

[cwidth] The patch applied to Merlin allows us to customize the elimination ordering technique/heuristic of the variables via the -M command line argument. In our use-case, we pre-set the ordering in the uai file and supply -M lex to invoke the lexicographic ordering, thereby preserving and respecting our pre-supplied ordering.

`.dat`/`.data` file types

These files contain the samples drawn from the Bayesian Networks. This isn't a standard format and hence there exist discrepancies in the way different programs and applications treat them. In general, these are space-separated (or comma-separated) files with or without the header. The header refers to two lines at the beginning of the file. The first line of the header contains the names of the random variables (space-separated). The second line contains the domain sizes of the variables in the same order.

`.jkl` file type

A .jkl file contains the pre-computed parent set score function caches. Usually these are computed via the blip.jar scorer.is ....

`.uai`/`.net` file types

These are somewhat standard file formats for storing Bayesian Networks. The .uai format is required by Merlin and the .net format is used as an intermediate format to go from .res -> .uai.

`.evid` file type [cwidth]

File format for supplying evidence to Merlin.

The provided python script hcet.py allows one to obtain the solution of the ETL algorithm(s) in the form of a compliant .res file which can be supplied to slim.py through the --start-with option. Run python et_learn/hcet.py --help for more details.

Note: Since the DAG computed by ETL could contain parent sets that are absent from the pre-computed score function caches in datasets/jkl, one might need to "merge" the jkl files output when running hcet.py with the one from dataset/jkl to obtain a new jkl file which can then be used as input for BN-SLIM

`.con` file type [expert]

Non-standard file format for representing expert constraints. It contains one constraint per line in the format <type> <from> <to>, where <from> and <to> are the endpoints of the constraint of type <type>. The supported constraint types are: positive/negative/undirected arc (denoting direct dependence) and positive/negative ancestry denoting (indirect dependence).

Results

AAAI 2020 paper (∆BIC analysis)

Turbocharging Treewidth-Bounded Bayesian Network Structure Learning
Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
Proceeding of AAAI-21, the Thirty-Fifth AAAI Conference on Artificial Intelligence, pages 3895–3903, 2021, AAAI Press.)
[bibtex] [url] [pdf]

BN-SLIM(ETLd) vs ETLd

∆BIC	tw 2	tw 5	tw 8
extremely positive	89	77	75
strongly positive	3	2	4
positive	0	4	3
neutral	5	14	15

BN-SLIM(ETLp) vs ETLp

∆BIC	tw 2	tw 5	tw 8
extremely positive	91	80	83
strongly positive	0	2	5
positive	0	1	1
neutral	6	14	8

BN-SLIM(k-MAX) vs k-MAX

∆BIC	tw 2	tw 5	tw 8
extremely positive	94	74	79
strongly positive	0	1	0
positive	0	1	1
neutral	0	6	8
negative	0	0	0
strongly negative	0	1	0
extremely negative	0	11	6

Above tables obtained by running with the following configuration:

python -O slim.py <dataset> <treewidth> -r<seed> -t1800 -u kmax -vx --start-with=<sol.res>

Where sol.res is the initial heuristic solution which was computed before-hand and supplied using the --start-with option.

If that doesn't work, try running with a aaai tagged commit.

NeurIPS 2021 paper

Learning fast-inference Bayesian networks
Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
Proceedings of NeurIPS 2021, the Thirty-fifth Conference on Neural Information Processing Systems (M. Ranzato, A. Beygelzimer, K. Nguyen, P.S. Liang, J.W. Vaughan, Y. Dauphin, eds.), pages 17852–17863, 2021.
[bibtex] [url] [pdf]

experiments/cwidth/baseline_data.csv and experiments/cwidth/heuristic_data.csv contain all the data for the 16 datasets from datasets.zip, for multiple treewidths, msss bounds and random seeds run with the following configuration:

python -O slim.py <dataset> <treewidth> -t5400 -u <heuristic> -d <datfile.dat> --start-with <heur_sol.res> -w --feasible-cw --feasible-cw-threshold <msssbound> -r<seed> --budget 0 -v"

Where heur_sol.res is the initial heuristic solution which was computed before-hand and supplied using the --start-with option, and datfile.dat is the datfile corresponding to the supplied dataset.

UAI 2022 paper

Learning Large Bayesian Networks with Expert Constraints
Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), Endhoven, Netherlands, August 1–5, 2022 (James Cussens, Kun Zhang, eds.), 2022.
[bibtex] [url] [pdf]

Con-BN-SLIM vs Con-k-greedy

∆BIC	count
extremely positive	127
strongly positive	0
positive	0
neutral	14
negative	1
strongly negative	0
extremely negative	7

The data from the experimental evaluation for the paper are available as <method>_data.csv files for the scores and <method>_activity.csv files for the improvement activity within the experiments/expert folder.

The script generate_constraints.py was used to randomly select a subset of constraints (an equal number for each type) from the ground truth networks.

aditya95sriram/bn-slim