gcarq/rusty-blockparser

Improving Processing speed

Goro2030 opened this issue · 9 comments

No matter what I do, I can't get the "balances" option to process data from the blockchain at more than 24 megaBYTES per second.

I know the process itself is single-threaded, and by monitoring the CPU it is clearly not a CPU bottleneck, even though it's using around 140% of CPU on an 8-core/16-thread Ryzen 7 3700X (shouldn't it max out at 100% if it's single-threaded?).

Memory for sure is not a bottleneck ...

The next one would be the disk.
I have the whole blockchain on an SSD that can do 532 MB/s in sequential reads across 8 queues on 1 thread (as measured by CrystalDiskMark 7). Worst-case scenario, if the blockparser is jumping around files non-sequentially, the SSD can still read 37 MB/s at a queue depth of 1, 1 thread, for 4 KiB blocks.

My thinking is that the software may not be taking full advantage of sequential reads from the disk?

Also, does the software take any advantage of having the blockchain indexed (txindex=1 in bitcoin.conf)?

If I were using Python, I would run a profiler to see where cycles are being spent or lost and try to optimize that a little. Maybe rewrite some subroutines in C++, closer to the hardware?

gcarq commented

Can you pull the latest changes from yesterday? I would be interested in how it behaves for you, especially the changes from #64. With that change, all transactions are processed with multiple threads. I did a quick test with simplestats and the disk utilization is now far better with the same amount of memory.

txindex=1 shouldn't make a difference; only Bitcoin's block index is used right now.

Profiling would be a great addition, but I haven't looked much into that. Anything that works with LLVM could be used: perf, valgrind, etc. Rust is pretty close to the hardware; it's comparable to C and C++.

gcarq commented

My blockdata is on a 5400 RPM HDD and with the changes mentioned above I max out the IOPS most of the time.

With the IOPS slowness taken out of the way (since I'm running on an SSD), the bottleneck should be elsewhere ...

I'll take the new version for a spin and report back.

@gcarq, do you want me to test what's committed in master (0.8.0), or the draft code in your pull request?

gcarq commented

The master branch up to the latest commit (568d2af). master is a bit ahead of 0.8.0. Performance should be better, especially for block heights above 300000.

I have no idea how to do a git clone up to a specific commit ... I'm running the code that's in master (which says 0.8.0 in the TOML file), and it's running WAY faster already ... went from ingesting the blockchain (balances function) at 24 MB/s to 31 MB/s, roughly a 30% improvement! Not sure what you did, but it's working 👍

Looking at CPU usage, I can see that you're running 1 main thread and 8 auxiliary threads. Is the -t option back so I can try raising that?

gcarq commented

Nice to hear! Sorry for the confusion, I meant the HEAD commit of the master branch. I only bump the version in the TOML file when I create a new tag, so it can be misleading when talking about branches.

I added rayon to process transactions in parallel, since a lot of time was spent in the tx script evaluation. The code changes for that were minimal; all it does is create a parallel iterator instead of a single-threaded one.

The -t option is not back; rayon automatically decides the number of threads based on the number of CPU cores. Of course there is a way to configure that more granularly, but I haven't looked into it yet. I'll tinker a bit first to see where else we could benefit from it besides txs.

gcarq commented

Processing speed got a lot better and things have changed quite a bit. Closing this for now.