Nixprof is a tool for profiling and visualizing build times spent on a Nix build graph.
Nixprof requires Nix 2.4+ for its internal-json
logging format and some nix
command functionality
...so we may as well use flakes
nix shell github:Kha/nixprof
First you need to record Nix building the desired graph once:
nixprof record nix-build ... # or `nix build ...`
This will save the Nix log annotated with timestamps in nixprof.log
.
You can use -o
to select a different output filename, which I like to use to record the specific commit and number of cores:
nixprof record -o $(git rev-parse @).json.16 nix build .#lean-all --option max-jobs 16
Make sure that the parts you are interested in are actually (re)built (and not fetched from a substituter), e.g. by introducing trivial changes in the source.
echo "-- $(date)" >> src/Init/Prelude.lean
Now you can use nixprof report
on that file to create various reports:
Usage: nixprof report [OPTIONS]
Options:
-i, --in FILENAME log input filename
-t, --tred remove transitive edges (can speed up and
declutter dot graph display)
-p, --print-crit-path print critical (longest) path
-a, --print-avg-crit print average contribution to critical paths
-s, --print-sim-times print simulated build times by processor count
up to optimal count
-d, --save-dot FILENAME write dot graph to file
-c, --save-chrome-trace TEXT write `chrome://tracing`'s `trace_event`
format to file. When combined with `-s`, also
write simulated traces to files with processor
count as suffix.
--all print all analyses, write all output files
--merge-into-pred TEXT for each derivation with exactly one
predecessor (dependency) and whose name
matches the given regex, merge build time and
dependents into that predecessor
--merge-into-succ TEXT for each derivation with exactly one successor
(dependent) and whose name matches the given
regex, merge build time and dependencies into
that successor
--help Show this message and exit.Usage: nixprof report [OPTIONS]
NOTE: In addition to the input log file, report
will also use nix path-info --json
to recover the dependency edges, so the build results still need to be available in the Nix store when running Nixprof.
The following examples are from a flake build of leanprover/lean4.
The critical path answers the question "Given sufficient parallelism, what is the chain of build steps that determines the total build time?"
$ nixprof report -p
Critical path
time [s] [cum] drv
---------- ---- ------- ------ ---------------------------------------------
1.6 1.3% 1.6 1.3% Init.Prelude
3.6 2.9% 5.2 4.3% Init.Notation
0.1 0.1% 5.3 4.3% Init.SizeOf
1.1 0.9% 6.4 5.3% Init.Core
0.2 0.2% 6.6 5.4% Init.SimpLemmas
0.7 0.6% 7.4 6.0% Init.Data.Nat.Basic
0.2 0.2% 7.6 6.2% Init.WF
0.2 0.2% 7.8 6.4% Init.Data.Nat.Div
0.1 0.1% 7.9 6.5% Init.Data.Nat.Bitwise
0.2 0.2% 8.1 6.7% Init.Data.Fin.Basic
0.4 0.3% 8.5 7.0% Init.Data.UInt
0.1 0.1% 8.6 7.1% Init.Data.Char.Basic
0.6 0.5% 9.2 7.6% Init.Data.String.Basic
0.5 0.4% 9.7 8.0% Init.Data.Format.Basic
0.4 0.3% 10.1 8.2% Init.Data.Repr
0.2 0.2% 10.3 8.4% Init.Data.ToString.Basic
0.1 0.1% 10.4 8.5% Init.Util
1.3 1.1% 11.7 9.6% Init.Data.Array.Basic
2.1 1.7% 13.8 11.3% Init.Meta
0.1 0.1% 13.9 11.4% Init.Data.ToString.Macro
0.1 0.1% 13.9 11.4% Init.Data.ToString
1.4 1.2% 15.4 12.6% Init.NotationExtra
0.3 0.2% 15.6 12.8% Init.Classical
0.2 0.2% 15.8 13.0% Init.System.ST
0.9 0.8% 16.7 13.7% Init.System.IO
0.1 0.1% 16.9 13.8% Init.Control.StateRef
1.3 1.1% 18.2 14.9% Init.Control.Lawful
0.2 0.2% 18.4 15.1% Init.Control.StateCps
0.1 0.1% 18.5 15.2% Init.Control
0.1 0.1% 18.6 15.3% Init
0.8 0.6% 19.4 15.9% Std.Data.AssocList
0.7 0.6% 20.1 16.5% Std.Data.HashMap
0.6 0.5% 20.7 16.9% Lean.Data.SMap
0.4 0.3% 21.0 17.2% Lean.Data.SSet
0.5 0.4% 21.5 17.6% Lean.Data.Name
0.4 0.3% 21.9 18.0% Lean.Data.KVMap
0.4 0.4% 22.4 18.3% Lean.Data.Options
0.2 0.2% 22.6 18.5% Lean.Data.Format
0.4 0.3% 23.0 18.8% Lean.Hygiene
1.2 0.9% 24.1 19.8% Lean.Level
2.3 1.9% 26.5 21.7% Lean.Expr
1.3 1.1% 27.8 22.8% Lean.LocalContext
4.2 3.4% 32.0 26.3% Lean.MetavarContext
0.2 0.2% 32.2 26.4% Lean.Util.PPExt
0.8 0.6% 33.0 27.1% Lean.Message
0.3 0.2% 33.3 27.3% Lean.Exception
0.8 0.7% 34.1 28.0% Lean.MonadEnv
0.6 0.5% 34.7 28.4% Lean.Util.Trace
0.5 0.4% 35.2 28.9% Lean.CoreM
1.1 0.9% 36.3 29.8% Lean.Attributes
0.7 0.6% 37.0 30.3% Lean.Class
3.6 2.9% 40.6 33.3% Lean.Meta.Basic
2.9 2.3% 43.4 35.6% Lean.Meta.InferType
3.8 3.1% 47.2 38.7% Lean.Meta.LevelDefEq
3.5 2.8% 50.7 41.6% Lean.Meta.WHNF
1.7 1.4% 52.4 43.0% Lean.Meta.DiscrTree
0.8 0.6% 53.2 43.6% Lean.Meta.Instances
2.8 2.3% 56.0 45.9% Lean.Meta.SynthInstance
2.1 1.7% 58.1 47.6% Lean.Meta.AppBuilder
1.1 0.9% 59.2 48.5% Lean.Meta.Offset
7.2 5.9% 66.3 54.4% Lean.Meta.ExprDefEq
2.4 2.0% 68.7 56.3% Lean.Meta.RecursorInfo
2.4 2.0% 71.1 58.3% Lean.Meta.Tactic.Induction
2.6 2.1% 73.7 60.4% Lean.Meta.Tactic.Cases
1.3 1.1% 75.0 61.5% Lean.Meta.Tactic.Contradiction
5.3 4.3% 80.3 65.8% Lean.Meta.Match.Match
3.4 2.8% 83.7 68.6% Lean.Meta.Match.MatchEqs
1.2 1.0% 84.9 69.6% Lean.Meta.Tactic.Split
0.3 0.3% 85.2 69.9% Lean.Meta.Tactic
5.6 4.6% 90.9 74.5% Lean.Meta.IndPredBelow
0.5 0.4% 91.4 74.9% Lean.Meta
8.3 6.8% 99.7 81.7% Lean.PrettyPrinter.Delaborator.TopDownAnalyze
1.8 1.5% 101.5 83.2% Lean.PrettyPrinter.Delaborator.Basic
7.2 5.9% 108.7 89.1% Lean.PrettyPrinter.Delaborator.Builtins
0.2 0.2% 108.9 89.3% Lean.PrettyPrinter.Delaborator
0.4 0.4% 109.3 89.7% Lean.PrettyPrinter
0.7 0.6% 110.1 90.2% Lean.Widget.InteractiveCode
1.7 1.4% 111.8 91.7% Lean.Widget.InteractiveGoal
1.2 1.0% 113.0 92.6% Lean.Widget.InteractiveDiagnostic
0.5 0.4% 113.5 93.1% Lean.Server.Snapshots
0.4 0.3% 113.9 93.4% Lean.Server.FileWorker.Utils
0.5 0.4% 114.5 93.8% Lean.Server.Requests
2.6 2.1% 117.0 96.0% Lean.Server.FileWorker.RequestHandling
0.9 0.8% 117.9 96.7% Lean.Server.FileWorker.WidgetRequests
1.5 1.3% 119.5 98.0% Lean.Server.FileWorker
0.3 0.2% 119.7 98.2% Lean.Server
0.3 0.3% 120.1 98.4% Lean
0.0 0.0% 120.1 98.5% Lean-cc
0.1 0.1% 120.2 98.5% Lean-lib
1.0 0.8% 121.1 99.3% leanshared
0.3 0.2% 121.4 99.6% leanc
0.1 0.0% 121.5 99.6% leanc
0.3 0.3% 121.8 99.9% lean
0.1 0.1% 121.9 99.9% lean-stage1
0.1 0.1% 122.0 100.0% lean
This chain gives you an idea of how to speed up the build by speeding up or avoiding an element of it. However, there is no guarantee that this path is still the critical one after optimization, so the speedup might be less than expected.
The critical path demonstrates contributions to the total build time, but what about partial rebuilds? If we assume that the sources of every derivation in the build graph are equally likely to be modified, we can compute the average contribution to a partial rebuild by averaging contributions to the critical paths starting from each derivation.
$ nixprof report -a
Average contribution to critical paths
time [s] [cum] drv
---------- ---- ------- ------ ---------------------------------------------
1.9 8.1% 1.9 8.1% Lean.PrettyPrinter.Delaborator.TopDownAnalyze
1.7 7.1% 3.6 15.2% Lean.PrettyPrinter.Delaborator.Builtins
1.2 5.1% 4.8 20.2% Lean.Meta.IndPredBelow
1.1 4.7% 5.9 25.0% Lean.Meta.ExprDefEq
1.0 4.2% 6.9 29.2% Lean.Meta.Match.Match
0.9 3.6% 7.7 32.8% leanshared
0.7 2.9% 8.4 35.7% Lean.Meta.Match.MatchEqs
0.7 2.8% 9.0 38.5% Lean.Server.FileWorker.RequestHandling
0.5 2.2% 9.6 40.7% Lean.Meta.LevelDefEq
0.5 2.1% 10.1 42.9% Lean.Meta.WHNF
0.5 2.0% 10.5 44.9% Lean.Meta.Basic
0.4 1.8% 11.0 46.7% Lean.Meta.Tactic.Cases
0.4 1.8% 11.4 48.5% Lean.MetavarContext
0.4 1.8% 11.8 50.3% Lean.PrettyPrinter.Delaborator.Basic
0.4 1.8% 12.2 52.1% Lean.Widget.InteractiveGoal
0.4 1.8% 12.6 53.9% Lean.Meta.SynthInstance
0.4 1.7% 13.0 55.6% Lean.Server.FileWorker
0.4 1.6% 13.4 57.2% Lean.Meta.Tactic.Induction
0.4 1.6% 13.8 58.8% Lean.Meta.InferType
0.4 1.6% 14.2 60.4% Lean.Meta.RecursorInfo
0.4 1.5% 14.5 62.0% Lean.Elab.Structure
0.3 1.4% 14.9 63.4% Lean.Meta.AppBuilder
0.3 1.3% 15.2 64.6% Lean.Widget.InteractiveDiagnostic
0.3 1.3% 15.5 65.9% Lean.Elab.MutualDef
0.3 1.3% 15.8 67.1% lean
0.3 1.2% 16.0 68.3% Lean.Elab.Inductive
0.3 1.1% 16.3 69.4% leanc
0.3 1.1% 16.6 70.5% Lean.Elab.Term
0.2 1.0% 16.8 71.6% Lean.Meta.DiscrTree
0.2 1.0% 17.0 72.6% Lean.Meta.Tactic.Split
0.2 1.0% 17.3 73.6% Lean.Server.FileWorker.WidgetRequests
0.2 1.0% 17.5 74.6% Lean.Expr
0.2 1.0% 17.7 75.6% Lean.Meta.Tactic.Contradiction
0.2 0.8% 17.9 76.3% Lean.Elab.Binders
0.2 0.7% 18.1 77.1% Lean.Widget.InteractiveCode
0.2 0.7% 18.2 77.8% Lean.Meta.Offset
0.2 0.7% 18.4 78.4% Lean.Elab.Command
0.1 0.6% 18.6 79.1% Lean.Elab.PreDefinition.Structural.IndPred
0.1 0.6% 18.7 79.6% Lean.Attributes
0.1 0.6% 18.8 80.2% Lean.Server.Requests
0.1 0.6% 19.0 80.8% Lean
0.1 0.6% 19.1 81.4% Lean.Server.Snapshots
0.1 0.6% 19.2 82.0% Lean.LocalContext
0.1 0.6% 19.4 82.5% Lake.Build.Package
0.1 0.5% 19.5 83.0% Lean.Elab.PreDefinition.Main
0.1 0.5% 19.6 83.6% Lean.Elab.Declaration
0.1 0.5% 19.7 84.1% Lean.Meta.Instances
0.1 0.5% 19.8 84.5% Lean.Meta
0.1 0.5% 19.9 85.0% Lean.Level
0.1 0.5% 20.1 85.5% Lean.Elab.Tactic.Basic
0.1 0.4% 20.2 85.9% Lean.Elab.SyntheticMVars
0.1 0.4% 20.3 86.4% Lean.PrettyPrinter
0.1 0.4% 20.4 86.8% Lean.Server.FileWorker.Utils
0.1 0.4% 20.5 87.2% Lean.MonadEnv
0.1 0.4% 20.5 87.6% Lean.Message
0.1 0.4% 20.6 87.9% lean-stage1
0.1 0.4% 20.7 88.3% Lean.Elab.PreDefinition.Structural.Main
0.1 0.4% 20.8 88.6% Lean.Parser.Term
0.1 0.3% 20.9 89.0% Lean.Class
0.1 0.3% 21.0 89.3% Lake.CLI.Main
0.1 0.3% 21.0 89.6% Lean-lib
0.1 0.3% 21.1 90.0% Lean.Parser.Command
0.1 0.3% 21.2 90.3% Lean.Meta.Tactic
0.1 0.3% 21.3 90.6% Lean.Server
0.1 0.3% 21.3 90.9% lean
0.1 0.3% 21.4 91.1% Lean.Elab.PreDefinition.Basic
0.1 0.3% 21.4 91.4% Lean.Util.Trace
0.1 0.3% 21.5 91.7% Std.Data.AssocList
0.1 0.3% 21.6 91.9% Lean.Elab.DefView
0.1 0.3% 21.6 92.2% Lean.CoreM
0.1 0.3% 21.7 92.5% Init.Control.Lawful
0.1 0.2% 21.8 92.7% Std.Data.HashMap
0.1 0.2% 21.8 92.9% Init.Meta
0.1 0.2% 21.9 93.2% Lean.PrettyPrinter.Delaborator
0.1 0.2% 21.9 93.4% leanc
0.1 0.2% 22.0 93.6% Lean.Parser.Syntax
0.0 0.0% 23.5 100.0% [total]
Note: the table is sorted by descending time, with items < 0.05s omitted.
This output means that, since in this build graph most derivations correspond to a single source file, if I change a random file, 1.9s out of 23.5s are spent on rebuilding Lean.PrettyPrinter.Delaborator.TopDownAnalyze
on average.
The above analyses are independent of the number of available cores or parallel jobs configured in Nix, or how many of them were used when recording; you can think of them assuming an infinitely parallel build. Nixprof can give you an idea of the parallelism of your build by simulating build times for a given number of parallel jobs based on the recorded build graph and times.
$ nixprof report -s
Simulated build times by processor count up optimal power of two
#CPUs time [s] CPU% [avg]
------- ---------- ------------
1 601.733758 100%
2 310.011294 194%
4 178.041022 338%
8 130.940682 460%
16 123.067779 489%
32 121.965845 493%
In this case, while the simulated build time continuously improves up to a number of 32 cores, we see that after 8 cores there is barely any improvement. And even then, only slightly more than half the cores are actually in use on average.
Caveats:
- We assume all build steps are single-threaded.
- There is no accounting of parallelism overhead.
- The simulation is a simple first-fit scheduling algorithm: whenever a task is finished, we schedule any ready dependencies on any free cores, using the order of the recorded build as a tie-breaker. This might not be the optimal scheduling, but as much is true for any real build.
$ nixprof report -d --tred --merge-into-succ '-cc|-depRoot' # see above for flags
# display e.g. using `xdot`
$ nix run nixpkgs#xdot nixprof.dot
# or convert to e.g. svg
$ nix shell nixpkgs#graphviz -c dot -Tsvg < nixprof.dot > nixprof.svg
Box heights are scaled by build time. The critical path is shown in red. My creativity was exhausted after that.
$ nixprof report -c nixprof.trace_event
Display in Chrome's chrome://tracing
or on perfetto.dev in any browser:
This trace shows the actual start and end times of each build step, though the assignment to threads is simulated using the same scheduling algorithm as described above. Gaps in the trace, especially at the beginning, show Nix being busy with other stuff than strictly building.
...jack of all trades, professor of nix