gap-system/gap

Evaluation of SLPs is slow

danielrademacher opened this issue · 3 comments

Observed behaviour

Evaluating SLPs over matrices seems to be quite slow. As a baseline, note that multiplying two 50x50 matrices over F_5 50,000 times takes 163 ms:

gap> G:=SL(50,5);
SL(50,5)
gap> for i in [1..50000] do a:=G.1*G.2; od; time;
163

On the other hand, evaluating an SLP with roughly 10,000 lines on 50x50 matrices over F_5 is much slower:

Lines.txt

Read("Lines.txt");
slp := StraightLineProgram(lines,2);;
ResultOfStraightLineProgram(slp,GeneratorsOfGroup(SL(50,5)));;
time;
921

Even though the SLP performs five times fewer multiplications of 50x50 matrices, its evaluation is roughly six times slower than the plain multiplication loop above.
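For readers without the attached Lines.txt: an SLP in GAP is given as a list of integer lines over the inputs, and the snippet below is a minimal, made-up program in the same format (the names lines_demo and slp_demo and the line contents are purely illustrative, not taken from Lines.txt); it only shows how such a program is built and evaluated.

# illustrative only: r3 := g1*g2, r4 := r3*g1, result := r4*g2
lines_demo := [ [1,1,2,1], [3,1,1,1], [4,1,2,1] ];;
slp_demo := StraightLineProgram(lines_demo, 2);;
gens := GeneratorsOfGroup(SL(50,5));;
ResultOfStraightLineProgram(slp_demo, gens);;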

Expected behaviour

I would expect the evaluation to be much faster, especially since it essentially just multiplies matrices.

Copy and paste GAP banner (to tell us about your setup)

 ┌───────┐   GAP 4.12.2 of 2022-12-18
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: aarch64-apple-darwin23-default64-kv8
 Configuration:  gmp 6.3.0, GASMAN, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.6, AutPGrp 1.11, Browse 1.8.19, 
             CaratInterface 2.3.4, CRISP 1.4.6, Cryst 4.1.25, CrystCat 1.1.10, CTblLib 1.3.4, 
             curlInterface 2.3.1, FactInt 1.6.3, FGA 1.4.0, Forms 1.2.9, GAPDoc 1.6.6, 
             genss 1.6.8, IO 4.8.0, IRREDSOL 1.4.4, LAGUNA 3.9.5, orb 4.9.0, Polenta 1.3.10, 
             Polycyclic 2.16, PrimGrp 3.4.3, RadiRoot 2.9, recog 1.4.2DEV, ResClasses 4.7.3, 
             SmallGrp 1.5.1, Sophus 1.27, SpinSym 1.5.2, TomLib 1.2.9, TransGrp 3.6.3, 
             utils 0.81

Not a fix, but for SLPs that we evaluate multiple times, we should probably use SlotUsagePattern to make the evaluation a bit faster:

gap> Read("Lines.txt");
gap> slp := StraightLineProgram(lines,2);;
gap> gens:=GeneratorsOfGroup(SL(50,5));;
gap> ResultOfStraightLineProgram(slp,gens);;time;
643
gap> ResultOfStraightLineProgram(slp,gens);;time;
648

versus

gap> SlotUsagePattern(slp);;
gap> ResultOfStraightLineProgram(slp,gens);;time;
566
gap> ResultOfStraightLineProgram(slp,gens);;time;
578
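Side note: if I recall the library correctly, SlotUsagePattern is an attribute, so the single call above stores the pattern in slp and later calls to ResultOfStraightLineProgram reuse it automatically, which would explain why the otherwise unchanged calls get faster. One can check that it was cached:

HasSlotUsagePattern(slp);   # expected to be true once the attribute has been stored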

It seems the SLPs are in fact innocent; rather, something else is going on. Compare

gap> for i in [1..13000] do x:=gens[1]*gens[2]; od; time;
59

versus

gap> x:=gens[1];; for i in [1..13000] do x:=x*gens[2];; od; time;
751

But there is more:

gap> x:=gens[1]; for i in [1..13000] do x:=x*x; od; time;
< immutable compressed matrix 50x50 over GF(5) >
53
gap> x:=gens[2]; for i in [1..13000] do x:=x*x; od; time;
< immutable compressed matrix 50x50 over GF(5) >
657
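One plausible explanation, consistent with the kernel observation below, is that the accumulated matrix x quickly loses its zero entries once gens[2] is involved, while powers of gens[1] stay sparse. A rough, ad-hoc way to check this (the helper zeros is made up here, and the exact counts depend on the generators GAP returns):

gens := GeneratorsOfGroup(SL(50,5));;
zeros := m -> Sum(m, row -> Number(row, IsZero));;   # count zero entries of a matrix
zeros(gens[1]); zeros(gens[2]);   # both generators contain many zeros
x := gens[2];; for i in [1..10] do x := x*x; od;
zeros(x);   # if densification is the culprit, few (if any) zeros remain here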

Looking at the kernel implementation, the central workhorse is the function ProdVec8BitMat8Bit (in particular, we only implement naive schoolbook matrix multiplication...). In any case, it has some code in its core that avoids work for zero entries:

    bptr = CONST_BYTES_VEC8BIT(vec);
    for (i = 0; i + elts < len; i += elts, bptr++) {
        if ((byte = *bptr)) {
...

So multiplying with the identity matrix is a lot cheaper than multiplying with a "dense" matrix.
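If I read ProdVec8BitMat8Bit correctly, the skipping applies to the entries of the rows of the left factor. Here is a minimal sketch to see the effect directly, independent of any SLP (timings omitted since they vary by machine; RandomMat is only used to get a matrix with essentially no zero entries):

id := One(SL(50,5));;                # compressed identity: rows are almost entirely zero
r := RandomMat(50, 50, GF(5));;      # a matrix with (almost) no zero entries
ConvertToMatrixRep(r, 5);;           # compress it into the same 8-bit representation
for i in [1..13000] do a := id * r; od; time;   # sparse rows on the left: fast
for i in [1..13000] do a := r * r; od; time;    # dense rows on the left: slow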

I also had a quick look at the application of straight line programs in GAP's line-by-line profiler, and nothing looks unreasonable or particularly expensive. About a third of the time is spent in matrix multiplication, and the rest is spent going through lots of little lists in reasonable ways.