filecoin-project/lotus

Benchmark nv24 migration

rjan90 opened this issue · 7 comments

Tasks

Notes

  • "offline" and "online" modes are described in #12432
  • Use the same "benchmark template" found in #12432 for commenting on this issue with results.
  • We should do the benchmark on a "lower powered machine" so we get a better sense of the worst case.

Spec: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 128GiB RAM, SSD.

In offline mode following the tutorial here:

./lotus-shed migrate-state --repo=/mnt/lotuschain/.lotus 24 bafy2bzacebyo2oz2sepnshikeuzncs56ehvyfme5weueebxmnuyo7gvs2tihc
REMINDER: If you are running this, you likely want to ALSO run the continuity testing tool!
2024-10-07T09:11:44.399+0200	INFO	bundle	bundle/bundle.go:60	manifest cid: bafy2bzaceakwje2hyinucrhgtsfo44p54iw4g6otbv5ghov65vajhxgntr53u
----------
completed round actual (without cache), took  1m1.439523705s
----------
completed premigration, took  45.345828372
completed round actual (with cache), took  33.843354743s

Max Memory usage was 15GiB
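
For anyone collecting these numbers into a table: the `took` values above are printed in Go's duration format (e.g. `1m1.439523705s`), which is awkward to compare directly. A minimal sketch of a converter to plain seconds follows. The function name `go_duration_to_seconds` is hypothetical and not part of lotus-shed, and it deliberately rejects a value with no unit suffix rather than guessing one.

```python
import re

# One "<number><unit>" component of a Go time.Duration string.
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}


def go_duration_to_seconds(text: str) -> float:
    """Convert a Go-style duration such as '1m1.439523705s' to seconds."""
    text = text.strip()
    parts = _COMPONENT.findall(text)
    # Reject input that is not made up entirely of valid components,
    # for example a bare number without a unit.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


if __name__ == "__main__":
    for label, value in [
        ("round actual (without cache)", "1m1.439523705s"),
        ("round actual (with cache)", "33.843354743s"),
    ]:
        print(f"{label}: {go_duration_to_seconds(value):.3f} s")
```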

In online mode following the tutorial here:

Pre-Migration in "online-mode":

2024-10-08T10:02:00.208+0200	WARN	statemgr	stmgr/forks.go:250	STARTING pre-migration
2024-10-08T10:02:00.408+0200	INFO	fil-consensus	filcns/upgrades.go:2887	Creating migration jobs
2024-10-08T10:02:38.350+0200	INFO	fil-consensus	filcns/upgrades.go:2887	Done creating 3223497 migration jobs after 37.941213355s
2024-10-08T10:02:53.442+0200	WARN	statemgr	stmgr/forks.go:263	COMPLETED pre-migration	{"duration": 53.233990376}

And the actual migration in "online-mode":

2024-10-08T11:02:34.350+0200	WARN	statemgr	stmgr/forks.go:202	STARTING migration	{"height": "4335724", "from": "bafy2bzaceb4c6vqjwuagpx5j7popedibq2jquyadga3aoldftruxsulnbd3uw"}
2024-10-08T11:02:34.350+0200	INFO	fil-consensus	filcns/upgrades.go:2887	Creating migration jobs
2024-10-08T11:03:11.555+0200	INFO	fil-consensus	filcns/upgrades.go:2887	Done creating 3223746 migration jobs after 37.204443758s
2024-10-08T11:03:18.768+0200	WARN	statemgr	stmgr/forks.go:211	COMPLETED migration	{"height": "4335724", "from": "bafy2bzaceb4c6vqjwuagpx5j7popedibq2jquyadga3aoldftruxsulnbd3uw", "to": "bafy2bzaceds3g6uqjrq3ntl5hqcyt3rq4rgvzgyoqklubedhv46bslgpewgew", "duration": 44.418005785}

Max memory usage observed during the migration was 15GiB.
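
As a sanity check on the online-mode logs, the reported `duration` can be compared against the wall-clock gap between the STARTING and COMPLETED timestamps. This is only a sketch: `elapsed_seconds` is a hypothetical helper, and it assumes the timestamp layout seen in the log lines above and Python 3.7+ (where `%z` accepts both `+0200` and `Z`).

```python
from datetime import datetime

# Timestamp layout as it appears in the pasted log lines.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps such as '2024-10-08T10:02:00.208+0200'."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


if __name__ == "__main__":
    # Pre-migration: STARTING and COMPLETED timestamps, logged duration 53.233990376
    pre = elapsed_seconds("2024-10-08T10:02:00.208+0200", "2024-10-08T10:02:53.442+0200")
    # Migration: STARTING and COMPLETED timestamps, logged duration 44.418005785
    mig = elapsed_seconds("2024-10-08T11:02:34.350+0200", "2024-10-08T11:03:18.768+0200")
    print(f"pre-migration wall clock: {pre:.3f} s")
    print(f"migration wall clock:     {mig:.3f} s")
```

Both gaps agree with the logged durations to within a millisecond, so the `duration` field can be used directly when reporting results.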

Spec: CPU: AMD EPYC 7F32 8-Core Processor, RAM: 512GiB, NVMe RAID

In offline mode following the tutorial here:

./lotus-shed migrate-state --repo=/mnt/nvmeraid0/daemon 24 bafy2bzaced2jgqbrv5bvoh3itk64xrnb4rbdxszbaylxjab2tkt74svkuffgw
2024-10-07T09:10:00.691Z	INFO	bundle	bundle/bundle.go:60	manifest cid: bafy2bzacecbueuzsropvqawsri27owo7isa5gp2qtluhrfsto2qg7wpgxnkba
----
completed round actual (without cache), took  27.183082707s
----
completed premigration, took  35.814424291s
completed round actual (with cache), took  24.831157465s

Max Memory usage was 15GiB

In online mode following the tutorial here:

Pre-Migration in "online-mode":

2024-10-07T09:45:00.218Z	WARN	statemgr	stmgr/forks.go:250	STARTING pre-migration
2024-10-07T09:45:00.222Z	INFO	fil-consensus	filcns/upgrades.go:2861	Creating migration jobs
2024-10-07T09:45:34.176Z	INFO	fil-consensus	filcns/upgrades.go:2861	Done creating 3222436 migration jobs after 33.953968051s
2024-10-07T09:45:44.846Z	WARN	statemgr	stmgr/forks.go:263	COMPLETED pre-migration	{"duration": 44.627655087}

And the actual migration in "online-mode":

2024-10-07T10:45:33.603Z	WARN	statemgr	stmgr/forks.go:202	STARTING migration	{"height": "4333050", "from": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq"}
2024-10-07T10:45:39.432Z	INFO	fil-consensus	filcns/upgrades.go:2861	Creating migration jobs
2024-10-07T10:46:00.441Z	INFO	fil-consensus	filcns/upgrades.go:2861	Done creating 3222582 migration jobs after 21.009787798s
2024-10-07T10:46:06.854Z	WARN	statemgr	stmgr/forks.go:211	COMPLETED migration	{"height": "4333050", "from": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "to": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "duration": 33.25062751}

Max memory usage observed during the migration was 15GiB.
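
If you want to pull the structured fields out of these lines instead of copying them by hand, a rough sketch is below. It assumes the lines are tab-separated in the order timestamp, level, logger, caller, message, optional JSON fields, which matches the output pasted above. `parse_log_line` is a hypothetical helper, not a Lotus tool.

```python
import json


def parse_log_line(line: str) -> dict:
    """Split one tab-separated log line into named parts.

    Assumed column order: timestamp, level, logger, caller, message,
    followed by an optional JSON object with structured fields.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 5:
        raise ValueError("expected at least 5 tab-separated columns")
    timestamp, level, logger, caller, message = parts[:5]
    fields = json.loads(parts[5]) if len(parts) > 5 else {}
    return {
        "timestamp": timestamp,
        "level": level,
        "logger": logger,
        "caller": caller,
        "message": message,
        "fields": fields,
    }


if __name__ == "__main__":
    line = (
        "2024-10-07T09:45:44.846Z\tWARN\tstatemgr\tstmgr/forks.go:263\t"
        'COMPLETED pre-migration\t{"duration": 44.627655087}'
    )
    parsed = parse_log_line(line)
    print(parsed["message"], parsed["fields"]["duration"])
```

The same parsed `fields` can be used to compare the `from` and `to` state roots of a COMPLETED migration line. In the line as pasted above the two CIDs are identical, which may be a copy-paste slip, so it is worth re-checking against the node's own log.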

Param finalization: based on the time and memory usage of the experiments, we need to settle on the correct number of epochs for premigration

Given the current numbers on the splitstore-enabled nodes, we could potentially drop the pre-migration start to 60 epochs before the network upgrade. But given that we have not tested this on an archival node, I feel like the default of 120 epochs is fine.
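
To put the 60 vs 120 epoch choice in wall-clock terms, here is a small back-of-the-envelope sketch. It assumes the 30-second Filecoin epoch and uses the slowest pre-migration observed above (53.233990376 s on the Xeon Silver machine). The helper names are made up for illustration.

```python
# Assumption: Filecoin epochs are 30 seconds long.
EPOCH_SECONDS = 30


def premigration_lead_seconds(epochs_before_upgrade: int) -> int:
    """Wall-clock time between the pre-migration start and the upgrade epoch."""
    return epochs_before_upgrade * EPOCH_SECONDS


def headroom_factor(epochs_before_upgrade: int, observed_premigration_seconds: float) -> float:
    """How many times the observed pre-migration fits into the lead time."""
    return premigration_lead_seconds(epochs_before_upgrade) / observed_premigration_seconds


if __name__ == "__main__":
    slowest_observed = 53.233990376  # seconds, from the online-mode log above
    for epochs in (60, 120):
        lead = premigration_lead_seconds(epochs)
        factor = headroom_factor(epochs, slowest_observed)
        print(f"{epochs} epochs = {lead // 60} min lead, about {factor:.0f}x the observed pre-migration")
```

Under these assumptions, 120 epochs gives a 60 minute window (roughly 68x the slowest observed pre-migration) and 60 epochs gives 30 minutes (roughly 34x). These figures only cover the splitstore-enabled nodes benchmarked here. An archival node could take much longer, which is the reason for staying at 120.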

Expected durations (premigration and migration) and memory requirements are added to Lotus CHANGELOG.

I have added the expected durations and memory requirements to the Changelog in the v1.30.0-rc1 prep here: #12564

Calibration migration benchmark with Spec: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 128GiB RAM, SSD.

In offline mode following the tutorial here:

migration height  2035235
old cid  bafy2bzacedk7jubgu3c33nflomgphmto2ajec7edcvrszuedoenzv6woaibe6
new cid  bafy2bzacecza56ewlqcpqcf7oz2nwcveiigxgcbsupuv46qyrza2fp7hejiks
completed round actual (without cache), took  1.25078817s
---
completed premigration, took  1.520957691s
completed round actual (with cache), took  1.07015202s
---

In online mode following the tutorial here:

Pre-Migration in "online-mode":

2024-10-08T10:44:00.029Z	WARN	statemgr	stmgr/forks.go:250	STARTING pre-migration
2024-10-08T10:44:00.075Z	INFO	fil-consensus	filcns/upgrades.go:2887	Creating migration jobs
2024-10-08T10:44:01.289Z	INFO	fil-consensus	filcns/upgrades.go:2887	Done creating 142017 migration jobs after 1.213533295s
2024-10-08T10:44:01.616Z	WARN	statemgr	stmgr/forks.go:263	COMPLETED pre-migration	{"duration": 1.586977954}

And the actual migration in "online-mode":

2024-10-08T11:44:46.761Z	WARN	statemgr	stmgr/forks.go:202	STARTING migration	{"height": "2035382", "from": "bafy2bzacedfcyxdwxtqkxp3bndnxmbreqavqgjx3rxxfykymxqne2aogh6zte"}
2024-10-08T11:44:46.761Z	INFO	fil-consensus	filcns/upgrades.go:2887	Creating migration jobs
2024-10-08T11:44:47.602Z	INFO	fil-consensus	filcns/upgrades.go:2887	Done creating 142080 migration jobs after 840.562052ms
2024-10-08T11:44:47.798Z	WARN	statemgr	stmgr/forks.go:211	COMPLETED migration	{"height": "2035382", "from": "bafy2bzacedfcyxdwxtqkxp3bndnxmbreqavqgjx3rxxfykymxqne2aogh6zte", "to": "bafy2bzacebwatu7bvobn65nkuhgvrn3jvkjyiwrrvim4brbn5knl4vr6k2hym", "duration": 1.036916135}

Closing this ticket as all the tasks have been completed.