Performance issues causing CI builds to fail

Question

Performance issues causing CI builds to fail

Closed this issue 4 years ago · 7 comments

This pipeline, for instance, fails because it runs out of memory/time trying to compile parts of the standard library. Known culprits include:

~~theories/Init/Byte.v~~
~~theories/Strings/Byte.v~~
theories/setoid_ring/Field_theory.v

The performance hit may be caused by the extra checks in Typeops.execute, in particular for fixpoints, and possibly due to all the reductions that are done for fixpoint types.

EDIT: The reduction of fixpoint types should only be done once per fixpoint right now. Maybe #7 has something to do with this.

Answer 1 · 2020-05-29T21:43:22.000Z

Some profiling data for theories/Init/Byte.v without the to_bits_of_bits lemma:

Function name             Own time Tot. time  Own alloc Tot. alloc     Calls 
execute                      2.08     15.81    9672585  936866856    269 186514
Constraints.union            6.69      6.69  352449014  352449014  77070      0
next_annots                  4.06      4.06  511612517  511612517   8471      0
check_branch_types           0.18      3.66      68391  144180532    387  19646
conv_leq                     1.29      2.20   51578767   59492714  22433  57817
type_of_apply                0.30      1.35    3155051   20589385   9109  83037
add_constraint_from_ind      0.56      0.98          0    8255406  22672  38288
others                       5.26     21.07 1164744796 2101611658        192786
Est. overhead/total          9.83     30.90   13181091 2114792749
Time in seconds and allocation in words (1 word = 8 bytes)
make[1]: Leaving directory 'coq'
make theories/Init/Byte.vo  188.31s user 19.41s system 98% cpu 3:31.75 total

Answer 2 · 2020-05-30T18:53:35.000Z

Looks like a lot of the performance issues come from how constraints are combined. Constraints.union creates a new graph and adds every edge in the list of graphs to it, which is inefficient. Maybe we could collect all the edges in a Set, then transform the set into a graph when needed.

UPDATE: aaa90d3 fixes this particular problem with next_annots and union_list, but there seems to be other sources of performance issues based on the unchanged CI failure.

UPDATE: That commit does not fix the problem. Set union is also slow:

COQC -noinit theories/Init/Byte.v

Function name             Own time Tot. time  Own alloc Tot. alloc     Calls 
Constraints.union           17.76     17.76 2150241850 2150241850 296937      0
State.next_annots            4.21      4.21  684627588  684627588  21879      0
Constraints.add              2.15      2.15   32916866   32916866 342921      0
State.next                   0.44      0.44    9382117    9382117  69779      0
annots_to_svars              0.19      0.19     185940     185940  34011      0
RecCheck.rec_check           0.00      0.00          0          0      0      0
RecCheck.closure             0.00      0.00          0          0      0      0
RecCheck.bellman_ford        0.00      0.00          0          0      0      0
RecCheck.sub                 0.00      0.00          0          0      0      0
RecCheck.sup                 0.00      0.00          0          0      0      0
RecCheck.contains            0.00      0.00          0          0      0      0
RecCheck.remove              0.00      0.00          0          0      0      0
RecCheck.insert              0.00      0.00          0          0      0      0
RecCheck.of_graph            0.00      0.00          0          0      0      0
RecCheck.to_graph            0.00      0.00          0          0      0      0
Constraints.union_list       0.00      0.00          0          0      0      0
SVars.union_list             0.00      0.00          0          0      0      0
others                      26.16     50.89 3539586863 6416941224        765527
Est. overhead/total         35.97     86.86   50409256 6467350480
Time in seconds and allocation in words (1 word = 8 bytes)
make[1]: Leaving directory 'coq'

Answer 3 · 2020-05-30T21:49:21.000Z

Are you reusing some existing representation for constraints? I know Coq must have one for, e.g., unification constraints in type inference.

Answer 4 · 2020-05-30T21:57:41.000Z

No, I'm using a simple Set on int * int * int (during constraint collection), and later (during actual cycle checking) a graph implementation I lifted from ocamlgraph. I'll look into what Coq uses for type constraints, but I don't think anything existing will be exactly what I need, since I need a graph that's both weighted and directed. Universe levels, for instance, use only an acyclic graph (no weights), but that graph module seems pretty coupled with universes, rather than being a general-purpose graph.

Answer 5 · 2020-05-30T22:04:37.000Z

Right. I think we might have talked about this in a meeting sometime and I forgot. Still might be worth seeing if there's a pattern to abstract/steal from what Coq already does. I imagine they've tried to optimize it. (...assuming I didn't already suggest that and it didn't pan out)

Answer 6 · 2020-06-02T02:28:23.000Z

69643cf fixes most performance issues with Byte.v (I'm no longer too impatient to wait for them to compile), but Field_theory.v still runs out of memory. It seems like this occurs during Qed. in Fnorm_FEeval_PEeval.

Answer 7 · 2020-06-10T02:52:58.000Z

d9b3922 is a workaround for the memory issue, mainly caused by constants having a lot of stage annotations (each split used within Field_theory.v, for instance, is annotated with 1064 stages). It does not give Consts any stage annotations if sized typing is off. The consequences of this and an example of its effects can be found here.

There's still the issue of whether the same should be done for Vars. In the kernel, Vars are implemented similarly to Rels, but on the user side, they're used as Lets, similarly to Definitions (which are Consts). Semantically, Lets are similar to lets, but syntactically, they look at lot more like global definitions. In terms of memory, it's fine not to remove stage annotations for Vars.