SamyaDaleh/CL-Toolbox

Prune Trees: Do We Need All of This?

Closed this issue · 5 comments

All of these tasks take too long. Check for all of them if we really need all the trees generated or if we can prune something for a better performance. Feel free to checkout the tasks for difficulty 10 first.

Difficulty 50

TD:
G = <N, T, S, P>
N = {N2, N3, N31}
T = {t0, t1, t2}
S = N3
P = {N2 -> N3, N31 -> ε, N31 -> N3 N31, N3 -> t2 N31, N3 -> t0 t0 t2 N3 N31, N3 -> t0 N3 N31, N3 -> t2 t2 t1 N3 N31, N3 -> t0 N31}
w = t2 t0 t0 t2 t0 t2 t2 t1 t0

CYK:
G = <N, T, S, P>
N = {N2, X1, X2, Y1, Y2, N4}
T = {t0, t1}
S = N4
P = {N4 -> t0, Y1 -> t0, N4 -> Y1 X1, X1 -> Y1 N4, Y2 -> t1, N4 -> Y2 Y2, N2 -> N4 X2, X2 -> Y2 Y1, N4 -> N2 N4, N4 -> t1, N4 -> N4 N4}
w = t1 t1 t1 t1 t0 t0 t0 t0 t0

Earley:
G = <N, T, S, P>
N = {S, N1}
T = {t0, t1}
S = S
P = {S -> ε, S -> t0 t0, S -> t1 t1, N1 -> S S, S -> N1 S}
w = t1 t1 t0 t0

Difficulty 10

TD:
G = <N, T, S, P>
N = {N1, N2, N21}
T = {t0, t1, t2}
S = N2
P = {N1 -> N2, N1 -> t2 t1, N2 -> t0 N2, N2 -> t2 t1 t0 N2 N1 N21, N21 -> t0 N2 N1 N21, N2 -> t2 t1 t0 N1 N21, N21 -> t0 N1 N21, N2 -> t2 t1 t0 N2 N21, N21 -> t0 N2 N21, N2 -> t2 t1 t0 N21, N21 -> t0 N21, N2 -> t0 N21, N2 -> t0 N21, N2 -> t0 N2 N21, N2 -> t0 N2 N21, N2 -> t0 N1 N21, N2 -> t0 N2 N1 N21, N2 -> N21, N2 -> t0 t0 N21}
w = t2 t1 t0 t0 t0

Earley:
G = <N, T, S, P>
N = {S, N1}
T = {t0, t1, t2}
S = S
P = {S -> ε, S -> t1 t0, S -> t2, N1 -> S S, S -> N1 S}
w = t2 t1 t0

I did some improvements to Earley based on the simple example.

I hate to break it to you, but the TD example for difficulty 10 is really difficult (and much higher than 10). The chart is highly recursive with 299 entries of which a lot of have around 10-20 trees. Nothing out of the ordinary but it is a nice example for a difficult task tbh. Maybe add it to the set of test data. Runs about 10 seconds.

The Earley 50 example above is fast now.

CYK 50 is highly recursive and therefore generates many trees. It took over a minute until I stopped execution, no idea how long it takes. Also a good candidate for the test set.

TD 50 is also highly recursive and contains item trees of arbitrary strucutes with many epsilons, but no repititions. Good example, but there is nothing I can do for you.