Performance issues causing CI builds to fail
Closed this issue · 7 comments
This pipeline, for instance, fails because it runs out of memory/time trying to compile parts of the standard library. Known culprits include:
theories/Init/Byte.v
theories/Strings/Byte.v
theories/setoid_ring/Field_theory.v
The performance hit may be caused by the extra checks in Typeops.execute
, in particular for fixpoints, and possibly due to all the reductions that are done for fixpoint types.
EDIT: The reduction of fixpoint types should only be done once per fixpoint right now. Maybe #7 has something to do with this.
Some profiling data for theories/Init/Byte.v
without the to_bits_of_bits
lemma:
Function name Own time Tot. time Own alloc Tot. alloc Calls
execute 2.08 15.81 9672585 936866856 269 186514
Constraints.union 6.69 6.69 352449014 352449014 77070 0
next_annots 4.06 4.06 511612517 511612517 8471 0
check_branch_types 0.18 3.66 68391 144180532 387 19646
conv_leq 1.29 2.20 51578767 59492714 22433 57817
type_of_apply 0.30 1.35 3155051 20589385 9109 83037
add_constraint_from_ind 0.56 0.98 0 8255406 22672 38288
others 5.26 21.07 1164744796 2101611658 192786
Est. overhead/total 9.83 30.90 13181091 2114792749
Time in seconds and allocation in words (1 word = 8 bytes)
make[1]: Leaving directory 'coq'
make theories/Init/Byte.vo 188.31s user 19.41s system 98% cpu 3:31.75 total
Looks like a lot of the performance issues come from how constraints are combined. Constraints.union
creates a new graph and adds every edge in the list of graphs to it, which is inefficient. Maybe we could collect all the edges in a Set
, then transform the set into a graph when needed.
UPDATE: aaa90d3 fixes this particular problem with next_annots
and union_list
, but there seems to be other sources of performance issues based on the unchanged CI failure.
UPDATE: That commit does not fix the problem. Set union is also slow:
COQC -noinit theories/Init/Byte.v
Function name Own time Tot. time Own alloc Tot. alloc Calls
Constraints.union 17.76 17.76 2150241850 2150241850 296937 0
State.next_annots 4.21 4.21 684627588 684627588 21879 0
Constraints.add 2.15 2.15 32916866 32916866 342921 0
State.next 0.44 0.44 9382117 9382117 69779 0
annots_to_svars 0.19 0.19 185940 185940 34011 0
RecCheck.rec_check 0.00 0.00 0 0 0 0
RecCheck.closure 0.00 0.00 0 0 0 0
RecCheck.bellman_ford 0.00 0.00 0 0 0 0
RecCheck.sub 0.00 0.00 0 0 0 0
RecCheck.sup 0.00 0.00 0 0 0 0
RecCheck.contains 0.00 0.00 0 0 0 0
RecCheck.remove 0.00 0.00 0 0 0 0
RecCheck.insert 0.00 0.00 0 0 0 0
RecCheck.of_graph 0.00 0.00 0 0 0 0
RecCheck.to_graph 0.00 0.00 0 0 0 0
Constraints.union_list 0.00 0.00 0 0 0 0
SVars.union_list 0.00 0.00 0 0 0 0
others 26.16 50.89 3539586863 6416941224 765527
Est. overhead/total 35.97 86.86 50409256 6467350480
Time in seconds and allocation in words (1 word = 8 bytes)
make[1]: Leaving directory 'coq'
Are you reusing some existing representation for constraints? I know Coq must have one for, e.g., unification constraints in type inference.
No, I'm using a simple Set
on int * int * int
(during constraint collection), and later (during actual cycle checking) a graph implementation I lifted from ocamlgraph
. I'll look into what Coq uses for type constraints, but I don't think anything existing will be exactly what I need, since I need a graph that's both weighted and directed. Universe levels, for instance, use only an acyclic graph (no weights), but that graph module seems pretty coupled with universes, rather than being a general-purpose graph.
Right. I think we might have talked about this in a meeting sometime and I forgot. Still might be worth seeing if there's a pattern to abstract/steal from what Coq already does. I imagine they've tried to optimize it. (...assuming I didn't already suggest that and it didn't pan out)
69643cf fixes most performance issues with Byte.v
(I'm no longer too impatient to wait for them to compile), but Field_theory.v
still runs out of memory. It seems like this occurs during Qed.
in Fnorm_FEeval_PEeval
.
d9b3922 is a workaround for the memory issue, mainly caused by constants having a lot of stage annotations (each split
used within Field_theory.v
, for instance, is annotated with 1064 stages). It does not give Const
s any stage annotations if sized typing is off. The consequences of this and an example of its effects can be found here.
There's still the issue of whether the same should be done for Var
s. In the kernel, Var
s are implemented similarly to Rel
s, but on the user side, they're used as Let
s, similarly to Definition
s (which are Const
s). Semantically, Let
s are similar to let
s, but syntactically, they look at lot more like global definitions. In terms of memory, it's fine not to remove stage annotations for Var
s.