prysmaticlabs/prysm

Slow startup - blob cache

Closed this issue · 9 comments

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=6m9.5552129s

Is there a way to speed up this process? Some flags or something I can experiment with?

Windows, running version 5.1.2

@Chaz27 can you tell us more about your system? Are you using an SSD? Which one?

When was this node first synced? It sounds like the filesystem was quite slow as prysm traversed the blob storage to remove old blobs.

@prestonvanloon

SSD is a 2TB Samsung 980 pro. I believe the node was first synced in Nov 2022.

I ran a couple more tests today. Rebooting the node after about 12 hours of running resulted in the same issue:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=6m18.895584s

I let the node sync fully, then rebooted it after about 10 seconds and got a totally different result:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=3.4767708s

So is it some kind of cleanup after the node has been running for a while? Looking at the SSD performance, it sits at about 11% (20 MB/s) read during that 6-minute load.

@Chaz27 to make sure I understand - in these two instances where you use the term reboot:

  • *Rebooting* the node after about 12 hours of running
  • I let the node sync fully, then *rebooted* it after about 10 seconds

Does reboot here mean reboot/restart the computer/Windows, or restarting the beacon node program? If it's the latter, I'm guessing the second fast start could be due to Windows caching filesystem data in memory.

@kasey sorry, both times I simply shut the beacon node down and started it again. No reboot of Windows in either case.

I did another test just now after my previous comment which was ~2 hours ago. Same result. Clean shutdown of beacon node console window. Load beacon node, ~6min cache warmup. Run for a couple of slots after synced to head, clean shutdown again. Load beacon node, ~3 second cache warm up.

@Chaz27 This might be Windows-specific behavior. Unfortunately, our knowledge of Windows systems is very limited. We believe it may have something to do with the filesystem cache. Perhaps there is something we can do differently in Prysm for Windows, but we would need an external contributor with knowledge of Windows systems to understand the root cause of the problem and propose a reasonable solution.

I found this page a bit helpful: https://learn.microsoft.com/en-us/windows/win32/fileio/file-caching

I don't think there is much that you can do differently... unless there is some Windows setting?? Not sure, sorry!

@prestonvanloon No worries, how much time is normal for cache warm up on Linux?

@Chaz27 I just restarted my personal beacon node.

Oct 24 21:34:25 beacon-chain prysm.sh[60213]: time="2024-10-24 21:34:25" level=info msg="Blob filesystem cache warm-up complete." elapsed=50.003637232s prefix=filesystem  

This machine had 85213 directories in the blob folder.

OK, I found the issue while attempting to write a quick C# console app to replicate it. It was Windows Defender. I excluded the blob directory in the scan settings and got a much better result:

INFO filesystem: Blob filesystem cache warm-up complete. elapsed=32.5777581s

Now, does this open me up to any viruses coming through blobs? 😆

@Chaz27 glad to hear that successful outcome!

> Now, does this open me up to any viruses coming through blobs? 😆

Stay skeptical 🕵️