Provide configuration guidelines for BufferedStorageBackend settings
What problem does your feature solve?
In #4911, we added an option for Horizon to reingest historical ledgers from the datastore using BufferedStorageBackend.
BufferedStorageBackend has a BufferSize setting that specifies the number of ledgers to hold in memory and a NumWorkers setting that specifies the number of parallel download workers. We need to run tests and benchmarks to identify the optimal configuration values.
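To make the two settings concrete, here is a simplified Go model of a buffered, parallel downloader. It is a sketch under stated assumptions, not the actual ledgerbackend implementation: `bufferConfig`, `fetchInOrder`, and the `fetch`/`consume` callbacks are hypothetical names, and each sequence number stands for one downloaded item. NumWorkers goroutines download concurrently, at most BufferSize items may be downloaded ahead of the consumer, and items are handed to the consumer strictly in sequence order.

```go
package main

import (
	"errors"
	"sync"
)

// bufferConfig is a hypothetical stand-in for the two knobs discussed above;
// it is not the SDK's BufferedStorageBackend config type.
type bufferConfig struct {
	BufferSize uint32 // max items downloaded ahead of the consumer
	NumWorkers uint32 // number of parallel download goroutines
}

// fetched pairs a sequence number with its downloaded payload.
type fetched struct {
	seq  uint32
	data string
}

// fetchInOrder downloads items start..end with cfg.NumWorkers workers, keeps
// at most cfg.BufferSize unconsumed items in flight, and calls consume in
// ascending sequence order.
func fetchInOrder(cfg bufferConfig, start, end uint32,
	fetch func(seq uint32) string, consume func(seq uint32, data string)) error {
	if cfg.BufferSize == 0 || cfg.NumWorkers == 0 {
		return errors.New("BufferSize and NumWorkers must be at least 1")
	}
	tasks := make(chan uint32)
	results := make(chan fetched, cfg.BufferSize)
	// Each scheduled item holds one slot until it is consumed.
	slots := make(chan struct{}, cfg.BufferSize)

	var wg sync.WaitGroup
	for i := uint32(0); i < cfg.NumWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for seq := range tasks {
				results <- fetched{seq, fetch(seq)}
			}
		}()
	}

	// Producer: schedule the next sequence only when a buffer slot is free.
	go func() {
		for s := uint64(start); s <= uint64(end); s++ {
			slots <- struct{}{}
			tasks <- uint32(s)
		}
		close(tasks)
		wg.Wait()
		close(results)
	}()

	// Consumer: workers may finish out of order, so park results until the
	// next expected sequence arrives, then release its slot.
	pending := make(map[uint32]string)
	next := uint64(start)
	for r := range results {
		pending[r.seq] = r.data
		for {
			data, ok := pending[uint32(next)]
			if !ok {
				break
			}
			consume(uint32(next), data)
			delete(pending, uint32(next))
			<-slots
			next++
		}
	}
	return nil
}
```

In this model, adding workers only helps while downloads are slower than the consumer; once the consumer (for example, ingestion into PostgreSQL) is the bottleneck, extra workers and a larger buffer just hold more data in memory. That is one possible reading of why 10 or 20 workers were not faster than 5 in the results below, not a confirmed explanation.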
What would you like to see?
Configuration guidelines for BufferedStorageBackend.
What alternatives are there?
Let users find the best settings through experimentation.
Reingestion times for 10,000 ledgers with various configurations of LedgersPerFile, Buffer Size, and Num Workers:
LedgersPerFile | Buffer Size | Num Workers | Time Taken |
---|---|---|---|
1 | 1 | 1 | 34m58s |
1 | 5 | 1 | 30m4s |
1 | 10 | 1 | 27m50s |
1 | 100 | 1 | 20m41s |
1 | 500 | 1 | 19m42s |
1 | 5 | 5 | 17m20s |
1 | 10 | 5 | 17m20s |
1 | 100 | 5 | 17m1s |
1 | 500 | 5 | 16m55s |
1 | 10 | 10 | 18m26s |
1 | 100 | 10 | 22m2s |
1 | 500 | 10 | 19m40s |
1 | 100 | 20 | 18m10s |
1 | 500 | 20 | 19m24s |
100 | 1 | 1 | 18m49s |
100 | 5 | 1 | 17m29s |
100 | 10 | 1 | 17m10s |
100 | 100 | 1 | 18m24s |
100 | 500 | 1 | 17m25s |
100 | 5 | 5 | 16m56s |
100 | 10 | 5 | 16m54s |
100 | 100 | 5 | 17m53s |
100 | 500 | 5 | 18m3s |
100 | 10 | 10 | 19m43s |
100 | 100 | 10 | 19m9s |
100 | 500 | 10 | 20m0s |
100 | 100 | 20 | 18m51s |
100 | 500 | 20 | 19m46s |
1000 | 1 | 1 | 23m13s |
1000 | 5 | 1 | 22m51s |
1000 | 10 | 1 | 22m24s |
1000 | 100 | 1 | 22m52s |
1000 | 500 | 1 | 22m54s |
1000 | 5 | 5 | 22m47s |
1000 | 10 | 5 | 22m54s |
1000 | 100 | 5 | 22m57s |
1000 | 500 | 5 | 24m6s |
1000 | 10 | 10 | 26m0s |
1000 | 100 | 10 | 23m32s |
1000 | 500 | 10 | 23m47s |
1000 | 20 | 10 | 23m39s |
1000 | 500 | 20 | 24m15s |
Summary
- LedgersPerFile: 1
  - Best performance: Buffer Size: 500, Num Workers: 5
  - Time taken: 16m55s
- LedgersPerFile: 100
  - Best performance: Buffer Size: 10, Num Workers: 5
  - Time taken: 16m54s
- LedgersPerFile: 1000
  - Best performance: Buffer Size: 10, Num Workers: 1
  - Time taken: 22m24s
Recommendations
- For a small LedgersPerFile value (1 ledger per file):
  - Use Buffer Size: 500 and Num Workers: 5 for the fastest processing time.
- For a medium LedgersPerFile value (100 ledgers per file):
  - Use Buffer Size: 10 and Num Workers: 5 for the fastest processing time.
- For a large LedgersPerFile value (1000 ledgers per file):
  - Use Buffer Size: 10 and Num Workers: 1 for the fastest processing time.
Note: Logically, for larger LedgersPerFile values, 1 worker and a buffer size of 2 should be sufficient. However, I tested with a buffer size of 1 and then directly with 10. I will run a test with 1 worker and a buffer size of 2 and update the results.
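If these recommendations are baked into tooling, one option is a lookup that returns only the measured values and refuses to guess for untested LedgersPerFile values. This is a hypothetical sketch: `bufferedSettings` and `recommendedSettings` are illustrative names rather than SDK types, and the numbers are the local results above for 10,000 ledgers, which may not transfer to other hardware.

```go
package main

// bufferedSettings holds the two BufferedStorageBackend knobs discussed here.
// The struct is illustrative, not the SDK's config type.
type bufferedSettings struct {
	BufferSize uint32
	NumWorkers uint32
}

// measuredBest maps each benchmarked LedgersPerFile value to the fastest
// settings observed locally.
var measuredBest = map[uint32]bufferedSettings{
	1:    {BufferSize: 500, NumWorkers: 5},
	100:  {BufferSize: 10, NumWorkers: 5},
	1000: {BufferSize: 10, NumWorkers: 1},
}

// recommendedSettings returns the measured settings for a datastore whose
// files hold ledgersPerFile ledgers; ok is false for values that were not
// benchmarked, instead of interpolating between measurements.
func recommendedSettings(ledgersPerFile uint32) (s bufferedSettings, ok bool) {
	s, ok = measuredBest[ledgersPerFile]
	return s, ok
}
```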
Additional tests for LedgersPerFile 1000 with a smaller buffer size did not improve performance:
- Buffer size: 2, Num workers: 1, Time taken: 23m58s
- Buffer size: 2, Num workers: 2, Time taken: 23m52s
Following Tamir's advice, I moved the tests to a dev EC2 instance. Performance was much worse: a test that takes 30 minutes locally for 10,000 ledgers took over 7 hours on EC2. We found that I/O was the bottleneck, so at Ops' suggestion we upgraded to a larger gp3 volume and I moved PostgreSQL onto it. This brought the time down to about 1 hour for smaller buffer sizes, but larger buffers caused out-of-memory (OOM) errors.
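A rough way to reason about the OOM errors is that the buffer's memory grows linearly with BufferSize. The sketch below is a back-of-the-envelope estimate under loudly stated assumptions: `ledgersPerSlot` is 1 if the buffer counts ledgers (as described at the top of this issue) or LedgersPerFile if it actually holds whole files, `avgLedgerBytes` is a hypothetical value you would have to measure, and Go GC overhead plus the rest of Horizon's memory are ignored. For scale, a t2.medium has 4 GiB of RAM in total, shared with everything else on the box.

```go
package main

// estimateBufferBytes approximates memory held by the download buffer:
// bufferSize slots, each holding ledgersPerSlot ledgers of about
// avgLedgerBytes each once decompressed. All inputs are assumptions.
func estimateBufferBytes(bufferSize, ledgersPerSlot, avgLedgerBytes uint64) uint64 {
	return bufferSize * ledgersPerSlot * avgLedgerBytes
}

// maxBufferSize returns the largest bufferSize whose estimate fits within
// budget bytes. It returns 0 when the per-slot size is unknown (zero).
func maxBufferSize(budget, ledgersPerSlot, avgLedgerBytes uint64) uint64 {
	perSlot := ledgersPerSlot * avgLedgerBytes
	if perSlot == 0 {
		return 0
	}
	return budget / perSlot
}
```

For example, with a purely hypothetical 1 MiB per ledger and the whole 4 GiB as budget, `maxBufferSize(4<<30, 1, 1<<20)` allows 4096 single-ledger slots but only 4 slots if each slot holds a 1000-ledger file, which illustrates why the file-vs-ledger question matters when sizing BufferSize.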
I then modified the tests to use BufferedStorageBackend directly to download and extract ledgers with varying buffer sizes and numbers of parallel workers, without reingesting through Horizon (and thus without PostgreSQL). This cut the processing time to about 20 minutes for 10,000 ledgers, but that is still much slower than the 3 minutes it takes locally. This suggests that the dev EC2 instances (t2.medium) are too small for these tests.
I’ve also run the same BufferedStorageBackend tests locally with different settings. The results match those from the previous Horizon-based tests. I’ll review all the data and provide a summary with configuration recommendations.