Increase performance of crawler

Question

Closed this issue 3 years ago · 3 comments

I have a Theta node syncing from the genesis block, and the crawler is pointed at this node, but the crawler is progressing much more slowly than the node. Right now it crawls around 6,000 blocks per hour, and a quick calculation suggests it will take around 11 more weeks to go through the whole blockchain.
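
The 11-week figure can be checked with a little arithmetic. The total number of remaining blocks is not stated above, so this sketch backs out the backlog implied by the two quoted numbers (about 6,000 blocks per hour for about 11 weeks) rather than reading it from the chain; `weeks_remaining` is a hypothetical helper.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weeks_remaining(blocks_left: int, blocks_per_hour: float) -> float:
    """Estimated weeks to crawl `blocks_left` blocks at a constant rate."""
    return blocks_left / blocks_per_hour / HOURS_PER_WEEK


# Backlog implied by "~6,000 blocks/hour for ~11 weeks" (derived, not measured).
implied_backlog = 6_000 * 11 * HOURS_PER_WEEK

print(implied_backlog)                                      # 11088000
print(round(weeks_remaining(implied_backlog, 6_000), 1))    # 11.0
print(round(weeks_remaining(implied_backlog, 18_000), 1))   # 3.7
```

The same helper shows how much a higher crawl rate shortens the backfill, assuming the rate stays constant.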

It barely uses any resources. I am on Kubernetes, and the crawler is currently using 0.1 CPU and 120 MB of RAM, although its requests are set to 1 CPU and 2 GB of RAM. I am also running a large DocDB (Amazon DocumentDB) instance to satisfy the MongoDB requirement and squeeze the most out of the crawler, but it didn't make any difference.

Is there something I can do to speed this up? I saw this line in the MongoDB client implementation and would like to understand why this max concurrency was set and what value would be acceptable without breaking the Theta node.
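
A max-concurrency limit caps how many block requests the crawler has in flight against the node at the same time. The crawler's actual code is not shown in this thread, so the following is only a minimal Python sketch of the general technique using `asyncio.Semaphore`; `fetch_block` and `crawl` are hypothetical names, and the node RPC call is simulated with a short sleep.

```python
import asyncio


async def fetch_block(height: int) -> dict:
    """Hypothetical stand-in for one RPC call to the Theta node."""
    await asyncio.sleep(0.01)  # simulated network + node latency
    return {"height": height}


async def crawl(start: int, end: int, max_concurrency: int) -> list:
    """Fetch blocks [start, end) with at most `max_concurrency` requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(height: int) -> dict:
        async with sem:  # waits while max_concurrency fetches are already running
            return await fetch_block(height)

    # gather returns results in input order, so the list is sorted by height.
    return await asyncio.gather(*(bounded_fetch(h) for h in range(start, end)))


if __name__ == "__main__":
    blocks = asyncio.run(crawl(0, 100, max_concurrency=10))
    print(len(blocks), blocks[0], blocks[-1])
```

While the node can serve requests in parallel, raising `max_concurrency` shortens wall-clock time roughly in proportion; once the node or the database saturates, extra in-flight requests mostly queue up or time out, which is presumably why an upper limit is recommended.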

The Theta node itself isn't progressing very fast, but it is between 2x and 4x faster than the crawler.

Answer 1 · 2021-05-20T20:53:44.000Z

You can change this variable to make the crawler process faster. Any number below 50 should be fine.

Answer 2 · 2021-05-20T21:35:09.000Z

Can this variable be set in the config file? That way I can build from this repository and don't have to run my fork.
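
A common way to move a hard-coded value like this into configuration is to read it from the config file and fall back to the previous constant when the key is absent. The JSON format, the key name `maxConcurrency`, and the default of 10 below are all hypothetical; the crawler's real config key may differ.

```python
import json

DEFAULT_MAX_CONCURRENCY = 10  # hypothetical default, not the crawler's real value


def load_max_concurrency(config_text: str, default: int = DEFAULT_MAX_CONCURRENCY) -> int:
    """Read a concurrency limit from JSON config text, falling back to `default`."""
    config = json.loads(config_text)
    value = config.get("maxConcurrency", default)  # hypothetical key name
    # Reject non-integers (including booleans) and values that would stall the crawler.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"maxConcurrency must be a positive integer, got {value!r}")
    return value


print(load_max_concurrency('{"maxConcurrency": 50}'))  # 50
print(load_max_concurrency("{}"))                      # 10
```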

Answer 3 · 2021-06-01T01:43:26.000Z

Thank you for adding support for this in the config file, @zhenyang-sliver. I am using 50 and it is crawling much faster now!

It sped up a lot, but it is still slower than the node. Why should the value stay below 50? Is it only about CPU and memory? I'd like it to go a bit faster and thought about raising it to around 150 so the crawler can keep up with the rate at which the node fetches new blocks.

Right now the node fetches around 40k blocks per hour and the crawler around 18k blocks per hour. Before these changes it was crawling around 6k blocks per hour.
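
Little's law (average requests in flight = throughput × average time per request) gives a rough way to reason about the jump to 150. The sketch below plugs in the numbers above, assuming all 50 slots stay busy; both helper names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def implied_latency_s(concurrency: int, blocks_per_hour: float) -> float:
    """Little's law W = L / lambda: average seconds per block if all slots are busy."""
    return concurrency / (blocks_per_hour / SECONDS_PER_HOUR)


def projected_rate(blocks_per_hour: float, old_concurrency: int, new_concurrency: int) -> float:
    """Naive linear projection that assumes per-block latency does not change with load."""
    return blocks_per_hour * new_concurrency / old_concurrency


print(implied_latency_s(50, 18_000))     # 10.0
print(projected_rate(18_000, 50, 150))   # 54000.0
```

If the 50 slots really are kept busy, each block takes about 10 s end to end, which suggests the bottleneck is per-request latency (node RPC or database writes) rather than the crawler's own CPU; if slots sit idle part of the time, the real per-block time is lower. The 54k blocks per hour projection is an optimistic upper bound, since latency usually rises once the node or database saturates, and once the crawler catches up with the node's tip it cannot go faster than the node itself (around 40k blocks per hour here).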