Benchmark nv24 migration
rjan90 opened this issue · 7 comments
Notes
Spec: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 128GiB RAM, SSD.
In offline mode following the tutorial here:
./lotus-shed migrate-state --repo=/mnt/lotuschain/.lotus 24 bafy2bzacebyo2oz2sepnshikeuzncs56ehvyfme5weueebxmnuyo7gvs2tihc
REMINDER: If you are running this, you likely want to ALSO run the continuity testing tool!
2024-10-07T09:11:44.399+0200 INFO bundle bundle/bundle.go:60 manifest cid: bafy2bzaceakwje2hyinucrhgtsfo44p54iw4g6otbv5ghov65vajhxgntr53u
----------
completed round actual (without cache), took 1m1.439523705s
----------
completed premigration, took 45.345828372s
completed round actual (with cache), took 33.843354743s
Max Memory usage was 15GiB
In online mode following the tutorial here:
Pre-Migration in "online-mode":
2024-10-08T10:02:00.208+0200 WARN statemgr stmgr/forks.go:250 STARTING pre-migration
2024-10-08T10:02:00.408+0200 INFO fil-consensus filcns/upgrades.go:2887 Creating migration jobs
2024-10-08T10:02:38.350+0200 INFO fil-consensus filcns/upgrades.go:2887 Done creating 3223497 migration jobs after 37.941213355s
2024-10-08T10:02:53.442+0200 WARN statemgr stmgr/forks.go:263 COMPLETED pre-migration {"duration": 53.233990376}
And the actual migration in "online-mode":
2024-10-08T11:02:34.350+0200 WARN statemgr stmgr/forks.go:202 STARTING migration {"height": "4335724", "from": "bafy2bzaceb4c6vqjwuagpx5j7popedibq2jquyadga3aoldftruxsulnbd3uw"}
2024-10-08T11:02:34.350+0200 INFO fil-consensus filcns/upgrades.go:2887 Creating migration jobs
2024-10-08T11:03:11.555+0200 INFO fil-consensus filcns/upgrades.go:2887 Done creating 3223746 migration jobs after 37.204443758s
2024-10-08T11:03:18.768+0200 WARN statemgr stmgr/forks.go:211 COMPLETED migration {"height": "4335724", "from": "bafy2bzaceb4c6vqjwuagpx5j7popedibq2jquyadga3aoldftruxsulnbd3uw", "to": "bafy2bzaceds3g6uqjrq3ntl5hqcyt3rq4rgvzgyoqklubedhv46bslgpewgew", "duration": 44.418005785}
Max memory usage observed during the migration was 15GiB.
Spec: CPU: AMD EPYC 7F32 8-Core Processor, RAM: 512GiB, NVMe RAID
In offline mode following the tutorial here:
./lotus-shed migrate-state --repo=/mnt/nvmeraid0/daemon 24 bafy2bzaced2jgqbrv5bvoh3itk64xrnb4rbdxszbaylxjab2tkt74svkuffgw
2024-10-07T09:10:00.691Z INFO bundle bundle/bundle.go:60 manifest cid: bafy2bzacecbueuzsropvqawsri27owo7isa5gp2qtluhrfsto2qg7wpgxnkba
----
completed round actual (without cache), took 27.183082707s
----
completed premigration, took 35.814424291s
completed round actual (with cache), took 24.831157465s
Max Memory usage was 15GiB
In online mode following the tutorial here:
Pre-Migration in "online-mode":
2024-10-07T09:45:00.218Z WARN statemgr stmgr/forks.go:250 STARTING pre-migration
2024-10-07T09:45:00.222Z INFO fil-consensus filcns/upgrades.go:2861 Creating migration jobs
2024-10-07T09:45:34.176Z INFO fil-consensus filcns/upgrades.go:2861 Done creating 3222436 migration jobs after 33.953968051s
2024-10-07T09:45:44.846Z WARN statemgr stmgr/forks.go:263 COMPLETED pre-migration {"duration": 44.627655087}
And the actual migration in "online-mode":
2024-10-07T10:45:33.603Z WARN statemgr stmgr/forks.go:202 STARTING migration {"height": "4333050", "from": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq"}
2024-10-07T10:45:39.432Z INFO fil-consensus filcns/upgrades.go:2861 Creating migration jobs
2024-10-07T10:46:00.441Z INFO fil-consensus filcns/upgrades.go:2861 Done creating 3222582 migration jobs after 21.009787798s
2024-10-07T10:46:06.854Z WARN statemgr stmgr/forks.go:211 COMPLETED migration {"height": "4333050", "from": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "to": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "duration": 33.25062751}
Max memory usage observed during the migration was 15GiB.
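For anyone comparing runs across machines, the `duration` values can be pulled out of the `COMPLETED` log lines with a few lines of Python. This is only a sketch: it assumes the log format shown above, where the line ends in a JSON object containing a `duration` field in seconds. The function name `completed_durations` and the regex are mine, not part of Lotus.

```python
import json
import re

# Assumption: COMPLETED lines end with a JSON object, as in the logs pasted above.
COMPLETED = re.compile(r"COMPLETED (pre-migration|migration) (\{.*\})\s*$")


def completed_durations(lines):
    """Return (phase, seconds) for every COMPLETED pre-migration/migration line."""
    results = []
    for line in lines:
        match = COMPLETED.search(line)
        if match is None:
            continue  # STARTING lines and job-creation lines carry no duration
        fields = json.loads(match.group(2))
        results.append((match.group(1), float(fields["duration"])))
    return results


# Lines copied from the EPYC 7F32 online-mode run above.
SAMPLE = [
    '2024-10-07T09:45:00.218Z WARN statemgr stmgr/forks.go:250 STARTING pre-migration',
    '2024-10-07T09:45:44.846Z WARN statemgr stmgr/forks.go:263 COMPLETED pre-migration {"duration": 44.627655087}',
    '2024-10-07T10:46:06.854Z WARN statemgr stmgr/forks.go:211 COMPLETED migration {"height": "4333050", "from": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "to": "bafy2bzacecbsegqp2rzlx2cvaleo3fdgurw32n2meo7ac23iddtlhsbdvooeq", "duration": 33.25062751}',
]

if __name__ == "__main__":
    for phase, seconds in completed_durations(SAMPLE):
        print(f"{phase}: {seconds:.2f}s")
```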
Param finalization: based on the time and memory usage in these experiments, we need to settle on the correct number of epochs for premigration.
Given the current numbers on the splitstore-enabled nodes, we could potentially start the pre-migration 60 epochs before the network upgrade instead. However, since we have not tested this on an archival node, I think the default of 120 epochs is fine.
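To put the 60 vs. 120 epoch choice in numbers, here is a rough sketch of the time budget each option leaves for the pre-migration. It assumes the 30-second Filecoin epoch and uses only the online-mode pre-migration durations reported above. The helper names are hypothetical, and archival nodes are not covered because none were measured.

```python
EPOCH_SECONDS = 30  # assumption: Filecoin epoch duration

# Online-mode pre-migration durations reported in this issue, in seconds.
OBSERVED_PREMIGRATION_SECONDS = {
    "Xeon Silver 4114, SSD": 53.233990376,
    "EPYC 7F32, NVMe RAID": 44.627655087,
}


def premigration_window_seconds(epochs_before_upgrade):
    """Wall-clock time between pre-migration start and the upgrade epoch."""
    return epochs_before_upgrade * EPOCH_SECONDS


def headroom(epochs_before_upgrade, duration_seconds):
    """How many times the observed pre-migration fits into the window."""
    return premigration_window_seconds(epochs_before_upgrade) / duration_seconds


if __name__ == "__main__":
    for epochs in (60, 120):
        window = premigration_window_seconds(epochs)
        for machine, seconds in OBSERVED_PREMIGRATION_SECONDS.items():
            print(f"{epochs} epochs ({window}s window), {machine}: "
                  f"{headroom(epochs, seconds):.1f}x headroom")
```

Both options leave a large margin on the tested machines, so the decision mostly depends on how much slower an untested archival node might be.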
Expected durations (premigration and migration) and memory requirements are added to Lotus CHANGELOG.
I have added the expected durations and memory requirements to the Changelog in the v1.30.0-rc1 prep here: #12564
@rjan90 : I pasted in the suggested issue tasks as directed from https://docs.google.com/document/d/1KKJj2COb0vIqAQh-4F7fflJjxiTgGrU9oT9B461nsgg/edit (which I backported to https://docs.google.com/document/d/1-KVWo7O_WwdalherQzvAfU9tWOUizemV_0NNJgAWmPo/edit)
Calibration migration benchmark with Spec: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 128GiB RAM, SSD.
In offline mode following the tutorial here:
migration height 2035235
old cid bafy2bzacedk7jubgu3c33nflomgphmto2ajec7edcvrszuedoenzv6woaibe6
new cid bafy2bzacecza56ewlqcpqcf7oz2nwcveiigxgcbsupuv46qyrza2fp7hejiks
completed round actual (without cache), took 1.25078817s
---
completed premigration, took 1.520957691s
completed round actual (with cache), took 1.07015202s
---
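The offline tool prints Go-style durations (`1m1.439523705s`, `840.562052ms`), which are awkward to compare by eye. Below is a minimal sketch of a parser for those strings, plus the with-cache vs. without-cache ratio for the mainnet Xeon run. `parse_go_duration` is a hypothetical helper, not part of lotus-shed, and it only handles non-negative durations.

```python
import re

# Seconds per unit for the suffixes Go's time.Duration prints.
UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}

# Longer suffixes come first so "ms" is not read as "m" followed by junk.
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|ms|µs|us|ns|m|s)")


def parse_go_duration(text):
    """Convert a Go duration string such as '1m1.439523705s' to seconds."""
    position = 0
    total = 0.0
    for match in TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total += float(match.group(1)) * UNITS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"not a Go duration: {text!r}")
    return total


if __name__ == "__main__":
    # Values from the mainnet offline run on the Xeon Silver 4114 above.
    without_cache = parse_go_duration("1m1.439523705s")
    with_cache = parse_go_duration("33.843354743s")
    print(f"without cache: {without_cache:.2f}s, with cache: {with_cache:.2f}s")
    print(f"ratio: {without_cache / with_cache:.2f}x")
```

A value with no unit suffix is rejected rather than guessed at.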
In online mode following the tutorial here:
Pre-Migration in "online-mode":
2024-10-08T10:44:00.029Z WARN statemgr stmgr/forks.go:250 STARTING pre-migration
2024-10-08T10:44:00.075Z INFO fil-consensus filcns/upgrades.go:2887 Creating migration jobs
2024-10-08T10:44:01.289Z INFO fil-consensus filcns/upgrades.go:2887 Done creating 142017 migration jobs after 1.213533295s
2024-10-08T10:44:01.616Z WARN statemgr stmgr/forks.go:263 COMPLETED pre-migration {"duration": 1.586977954}
And the actual migration in "online-mode":
2024-10-08T11:44:46.761Z WARN statemgr stmgr/forks.go:202 STARTING migration {"height": "2035382", "from": "bafy2bzacedfcyxdwxtqkxp3bndnxmbreqavqgjx3rxxfykymxqne2aogh6zte"}
2024-10-08T11:44:46.761Z INFO fil-consensus filcns/upgrades.go:2887 Creating migration jobs
2024-10-08T11:44:47.602Z INFO fil-consensus filcns/upgrades.go:2887 Done creating 142080 migration jobs after 840.562052ms
2024-10-08T11:44:47.798Z WARN statemgr stmgr/forks.go:211 COMPLETED migration {"height": "2035382", "from": "bafy2bzacedfcyxdwxtqkxp3bndnxmbreqavqgjx3rxxfykymxqne2aogh6zte", "to": "bafy2bzacebwatu7bvobn65nkuhgvrn3jvkjyiwrrvim4brbn5knl4vr6k2hym", "duration": 1.036916135}
Closing this ticket as all the tasks have been completed.