Author : Boltenhagen Mathilde with Thomas Chatain and Josep Carmona
Date : 09.2019
This project implements Process Mining algorithms with SAT encodings to get optimal results for verification problems.
Boolean formulas are first created, then converted into CNF form and solved with SAT solvers, thanks to pysat
.
This librairy used pm4py
Objects.
The project is a translation of the Ocaml version darksider
created by Thomas Chatain and Mathilde Boltenhagen.
- Encoding Conformance Checking Artefacts in SAT by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
- Anti-alignments in conformance checking–the dark side of process models by Thomas Chatain, Josep Carmona
- Generalized Alignment-Based Trace Clustering of Process Behavior by Mathilde Boltenhagen, Thomas Chatain, Josep Carmona
python 3.7.x
Simply run :
pip install da4py
(https://pypi.org/project/da4py/0.0.1/)
The librairy uses pm4py.
pm4py.objects.petri import importer
pm4py.objects.log.importer.xes import factory as xes_importer
from da4py.src.main.conformanceArtefacts import ConformanceArtefacts
# get the data with pm4py
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
Formal definition : Given a finite collection
$L$ of log traces and a model$N$ , an anti-alignment is a run$u \in Runs(N)$ which maximizes its distance$\min_{\sigma \in L} dist(\sigma,u)$ to the log.
This launches the main module. This object, the model and the traces must be reloaded for each experimentation. This is an issue that will be fix soon.
artefacts = ConformanceArtefacts()
We can to set the size of the anti-alignment we want (usefull for prefix) :
artefacts.setSize_of_run(10)
For execution times or memory problems, we can set the maximum number of difference that will be tried.
artefacts.setMax_d(10)
Two types of distances are available :
- Hamming distance
- Edit distance
artefacts.setDistance_type("edit")
Then an anti-alignment can be found by running :
artefacts.antiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())
Then we can compute precision :
print(artefacts.getPrecision())
One can add silent transition label that will not cost in the distances :
artefacts.setSilentLabel("tau")
We can also compute sum instead of min :
artefacts.setOptimizeMin(False)
The same features (not precision) also work for multi-alignment:
model, m0, mf = importer.pnml.import_net('<PATH_TO_MODEL>')
traces = xes_importer.import_log('<PATH_TO_LOG>')
artefacts = ConformanceArtefacts()
artefacts.setSilentLabel("tau")
artefacts.setDistance_type("hamming")
artefacts.setOptimizeMin(True)
artefacts.setSize_of_run(10)
artefacts.setMax_d(10)
# run a multi-Alignment
artefacts.multiAlignment(model,m0,mf,traces)
print(artefacts.getRun())
print(artefacts.getTracesWithDistances())
AMSTC is a trace clustering method that allows one to extract subnet centroids from a process model. The input is then a log and a model and it outputs a set of subnets and associated clustered traces. The method is implemented in SAT but a sampling method allows to run large logs.
# process model
model, m0, mf = importer.pnml.import_net('examples/medium/model2.pnml')
# log traces
traces = xes_importer.import_log('examples/medium/model2.xes')
# sampleSize : number of traces that are used in the sampling method
sampleSize= 5
# sizeOfRun : maximal length requested to compute alignment
sizeOfRun = 8
# maxNbC : maximal number of transitions per cluster to avoid to get a unique centroid
maxNbC = 5
# m : number of cluster that will be searching at each AMSTC of the sampling method. Understand that more than m cluster can
be returned.
m = 2
# maxCounter : as this is a sampling method, maxCounter is the number of fails of AMSTC before the sampling method stops
# silent_label : every transition that contains this string will not cost in alignment
clustering=samplingVariantsForAmstc(net,m0,mf,log,sampleSize,sizeOfRun,maxD,maxNbC,m,maxCounter=1,silent_label="tau")
The clustering can then be used like :
from pm4py.visualization.petrinet import factory as vizu
for (centroid, traces) in clustering:
if type(centroid) is tuple:
net, m0,mf=centroid
vizu.apply(net, m0, mf).view()
print(traces)
The tool first constructs SAT formulas using operator classes AND and OR of da4py.utils.formulas.py. Those formulas are fully described in the published related papers.
AND( [], [],
AND( [m_ip [0, 0]], [m_ip [0, 1]],
AND( [], [],
OR( [], [],
AND( [tau_it [1, 0]], [tau_it [1, 1], tau_it [1, 2]], )
AND( [tau_it [1, 1]], [tau_it [1, 0], tau_it [1, 2]], )
AND( [tau_it [1, 2]], [tau_it [1, 0], tau_it [1, 1]], ))
OR( [], [tau_it [1, 0]],
AND( [], [],
OR( [], [],
AND( [m_ip [1, 0], m_ip [0, 0]], [], )
AND( [], [m_ip [1, 0], m_ip [0, 0]], ))
AND( [m_ip [1, 1], m_ip [0, 1]], [], )))
...
Then, the formula is translated to a WCNF form which is solved with pysat
library.
[[2], [-1], [7, -82], [-8, -82], [-9, -82], [8, -83], [82, 83, 84], [3, -86]...]
Affiliations : LSV, CNRS, ENS Paris-Saclay, Inria, Université Paris-Saclay