This is the implementation of several methods to solve the most probable paths problem of Stochastic context-free path querying (SCFPQ). SCFPQ --- a way to specify path constraints in terms of stochastic context-free grammars. Our main task here to solve most probable path problem. Most probable paths problem is for a given stochastic context-free grammars and a labeled directed graph, to find the maximum probabilities between all pairs of nodes.
Our solution is a reduction to systems of nonlinear matrix equation solving. Two methods to solve this systems is proposed:
- Naive Iterative Method
- Newton-Krylov Method
SCFPQ requires the following to run:
- Python v3.7+
- CUDA Toolkit 10.2+
- cupy
from main_structures import Grammar, Graph, Equation
You can download grammar and graph from file:
grm = Grammar(grammars_path)
grh = Graph(graph_path)
grh.fill()
And then construct the equation with:
eq = Equation(grm, grh)
Now you can solve the obtained system with two different methods:
- Naive Iterative Method
res = eq.naive_iteration()
- Newton-Krylov Method
res = eq.newton_krylov(equation)
Or you can also choose parameters, such as initial guess for iterative process and tolerance:
res = eq.newton_krylov(equation=another_function, initial_guess=x0, tol=10e-6, info=True)
The graph data is selected from CFPQ_Data dataset. Graphs related to RDF analysis problems was chosen.
The results of the most probable paths problem on graphs related to RDF analysis are listed below.
For query g1:
S -> subClassOf_r S subClassOf (0.4)
S -> subClassOf_r subClassOf (0.25)
S -> type_r S type (0.2)
S -> type_r type (0.15)
Graph | V | E | nnz | naive-iteration | newton-krylov | ||
---|---|---|---|---|---|---|---|
it | time(sec) | it | time(sec) | ||||
generations | 129 | 273 | 2164 | 5 | 0.02 | 2 | 0.4 |
travel | 131 | 277 | 2499 | 11 | 0.05 | 3 | 0.6 |
funding | 778 | 1086 | 17634 | 9 | 0.04 | 3 | 0.7 |
wine | 733 | 1839 | 66572 | 11 | 0.06 | 3 | 0.7 |
pizza | 671 | 1980 | 15195 | 15 | 0.09 | 4 | 0.9 |
core | 1323 | 2752 | 204 | 50 | 0.3 | 4 | 0.9 |
pathways | 6238 | 12363 | 884 | 9 | 0.2 | 4 | 2.1 |
enzyme | 48815 | 86543 | 396 | 9 | 0.2 | 4 | 5.2 |
eclass | 239111 | 360248 | 90994 | 11 | 19.6 | - | - |
go-hierarchy | 45007 | 490109 | 588976 | 17 | 8.3 | 10 | 144.2 |
geospecies | 450609 | 2201532 | 91 | 3 | 0.2 | 3 | 9.6 |
For query g2:
S -> subClassOf_r S subClassOf (0.45)
S -> subClassOf (0.55)
Graph | V | E | nnz | naive-iteration | newton-krylov | ||
---|---|---|---|---|---|---|---|
it | time(sec) | it | time(sec) | ||||
generations | 129 | 273 | 0 | 0 | <0.01 | 0 | <0.01 |
travel | 131 | 277 | 63 | 7 | 0.02 | 2 | 0.3 |
funding | 778 | 1086 | 1158 | 7 | 0.02 | 2 | 0.4 |
wine | 733 | 1839 | 133 | 7 | 0.02 | 2 | 0.3 |
pizza | 671 | 1980 | 1262 | 13 | 0.04 | 3 | 0.5 |
core | 1323 | 2752 | 214 | 7 | 0.02 | 2 | 0.3 |
pathways | 6238 | 12363 | 3117 | 9 | 0.02 | 4 | 0.8 |
enzyme | 48815 | 86543 | 8163 | 9 | 0.05 | 6 | 1.5 |
eclass | 239111 | 360248 | 96163 | 9 | 5.3 | 5 | 213 |
go-hierarchy | 45007 | 490109 | 738937 | 17 | 3.6 | 9 | 77.0 |
geospecies | 450609 | 2201532 | 0 | 0 | <0.01 | 0 | <0.01 |