Evaluation of SLPs slow
danielrademacher opened this issue · 3 comments
Observed behaviour
Evaluating SLPs over matrices seems to be quite slow. For comparison, note that multiplying two 50x50 matrices over F_5 50,000 times takes 163 ms.
gap> G:=SL(50,5);
SL(50,5)
gap> for i in [1..50000] do a:=G.1*G.2; od; time;
163
On the other hand, if we take an SLP with roughly 10,000 lines and evaluate it with 50x50 matrices over F_5, this is much slower:
Read("Lines.txt");
slp := StraightLineProgram(lines,2);;
ResultOfStraightLineProgram(slp,GeneratorsOfGroup(SL(50,5)));;
time;
921
Even though this performs five times fewer 50x50 matrix multiplications, the evaluation is roughly six times slower than the plain multiplications above.
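Since Lines.txt is not attached here, a similar measurement can be reproduced with a synthetic SLP of comparable size (a hypothetical construction, not the original program, so the timing will differ somewhat):
gap> n := 10000;;
gap> lines := List([1..n], i -> [ [ i, 1, i+1, 1 ], i+2 ]);;   # slot i+2 := slot i * slot i+1
gap> slp := StraightLineProgram(lines, 2);;
gap> ResultOfStraightLineProgram(slp, GeneratorsOfGroup(SL(50,5)));; time;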
Expected behaviour
I would expect the evaluation to be much faster, especially since this is basically just multiplying matrices.
Copy and paste GAP banner (to tell us about your setup)
┌───────┐ GAP 4.12.2 of 2022-12-18
│ GAP │ https://www.gap-system.org
└───────┘ Architecture: aarch64-apple-darwin23-default64-kv8
Configuration: gmp 6.3.0, GASMAN, readline
Loading the library and packages ...
Packages: AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.6, AutPGrp 1.11, Browse 1.8.19,
CaratInterface 2.3.4, CRISP 1.4.6, Cryst 4.1.25, CrystCat 1.1.10, CTblLib 1.3.4,
curlInterface 2.3.1, FactInt 1.6.3, FGA 1.4.0, Forms 1.2.9, GAPDoc 1.6.6,
genss 1.6.8, IO 4.8.0, IRREDSOL 1.4.4, LAGUNA 3.9.5, orb 4.9.0, Polenta 1.3.10,
Polycyclic 2.16, PrimGrp 3.4.3, RadiRoot 2.9, recog 1.4.2DEV, ResClasses 4.7.3,
SmallGrp 1.5.1, Sophus 1.27, SpinSym 1.5.2, TomLib 1.2.9, TransGrp 3.6.3,
utils 0.81
Not a fix, but for SLPs that we evaluate multiple times, we should probably use SlotUsagePattern
to make evaluation a bit faster:
gap> Read("Lines.txt");
gap> slp := StraightLineProgram(lines,2);;
gap> gens:=GeneratorsOfGroup(SL(50,5));;
gap> ResultOfStraightLineProgram(slp,gens);;time;
643
gap> ResultOfStraightLineProgram(slp,gens);;time;
648
versus
gap> SlotUsagePattern(slp);;
gap> ResultOfStraightLineProgram(slp,gens);;time;
566
gap> ResultOfStraightLineProgram(slp,gens);;time;
578
It seems the SLPs are in fact innocent; rather, something else is going on. Compare
gap> for i in [1..13000] do x:=gens[1]*gens[2]; od; time;
59
versus
gap> x:=gens[1];; for i in [1..13000] do x:=x*gens[2];; od; time;
751
But there is more:
gap> x:=gens[1]; for i in [1..13000] do x:=x*x; od; time;
< immutable compressed matrix 50x50 over GF(5) >
53
gap> x:=gens[2]; for i in [1..13000] do x:=x*x; od; time;
< immutable compressed matrix 50x50 over GF(5) >
657
Looking at the kernel implementation, the central workhorse is the function ProdVec8BitMat8Bit
(note that we only implement naive schoolbook matrix multiplication...). In any case, it has some code in its core that avoids work for zero entries:
bptr = CONST_BYTES_VEC8BIT(vec);
for (i = 0; i + elts < len; i += elts, bptr++) {
    /* skip this byte entirely if all 'elts' field elements packed into it are zero */
    if ((byte = *bptr)) {
        ...
So multiplying with the identity matrix is a lot cheaper than multiplying with a "dense" matrix.
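To check this explanation against the timings above, one can count the nonzero entries of the matrices involved; here is a quick sketch (nonzero is an ad-hoc helper, and the actual counts depend on the generators GAP picks for SL(50,5)):
gap> nonzero := m -> Sum(m, row -> Number(row, x -> not IsZero(x)));;
gap> nonzero(gens[1]); nonzero(gens[2]);
gap> x := gens[1];; for i in [1..20] do x := x*x; od; nonzero(x);   # expected to stay small
gap> x := gens[2];; for i in [1..20] do x := x*x; od; nonzero(x);   # expected to approach 50*50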
I also had a quick look at the evaluation of straight line programs in GAP's line-by-line profiler, and nothing looks unreasonable or particularly expensive. About a third of the time is spent in matrix multiplication, and the rest is spent going through lots of little lists in reasonable ways.
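For reference, a minimal sketch of how to reproduce such a line-by-line profile (file and directory names are placeholders, and this assumes the profiling package is installed):
gap> ProfileLineByLine("slp.prof.gz");;
gap> ResultOfStraightLineProgram(slp, gens);;
gap> UnprofileLineByLine();;
gap> LoadPackage("profiling");;
gap> OutputAnnotatedCodeCoverageFiles("slp.prof.gz", "slp-profile-html");;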