pg native box/cube indexers have scaling issues
tracyhenry opened this issue · 6 comments
Performance results show that at 100M points, the response times of these two indexers are on the order of several minutes.
The current diagnosis is that the spatial indexes being built are not clustered. The fix is either to use Postgres's CLUSTER command or to sort the data before insertion.
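For concreteness, a minimal sketch of both fixes, assuming a hypothetical `points` table with a GiST index on a `bbox` column (names are illustrative, not the actual indexer schema):

```sql
-- Fix 1: physically reorder the heap along the spatial index.
-- GiST indexes are clusterable, so CLUSTER can use one directly.
CREATE INDEX points_bbox_idx ON points USING gist (bbox);
CLUSTER points USING points_bbox_idx;
ANALYZE points;  -- refresh planner statistics after the rewrite

-- Fix 2: sort the rows before insertion so the heap is laid out in
-- roughly spatial order from the start.
INSERT INTO points
SELECT * FROM staging_points
ORDER BY x, y;  -- crude; a space-filling-curve key would cluster better
```

Note that CLUSTER takes an exclusive lock and rewrites the whole table, so it's best as a one-time step after the bulk load.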
An interesting note: I was trying to increase maintenance_work_mem to speed up CREATE INDEX, but the performance of CREATE INDEX got even worse, and varied a lot. After some investigation, I found that it was autovacuum workers that were contending for memory with CREATE INDEX. From this blog post:
"...Note that the system will run as many autovacuum workers at a time as specified by autovacuum_max_workers, and each one will allocate an amount of memory equal to maintenance_work_mem. Currently, it always allocates the full amount; hopefully, that will be fixed in the future. So, if you have the default setting of autovacuum_max_workers = 3, and you set a value like maintenance_work_mem = 10GB, you can and very often will have 30GB of RAM dedicated just to autovacuum, not counting any foreground VACUUM or CREATE INDEX operations that you may run. You can very easily run a small system out of memory this way, and even a large system may have some performance problems...."
In fact, it's this UPDATE query that triggered autovacuum. And we don't need it: we can just compute it in the big SELECT query together with the other attributes. After doing that, the performance of CREATE INDEX became normal and stable.
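To illustrate the shape of the change (column names are made up; the real queries live in the indexer code):

```sql
-- Before: load the table, then a second pass that rewrites every row.
-- The UPDATE leaves a dead tuple per row, which is what wakes autovacuum.
-- UPDATE points
-- SET bbox = box(point(cx - w / 2, cy - h / 2),
--                point(cx + w / 2, cy + h / 2));

-- After: compute the column inside the one big SELECT that loads the
-- table, so every row is written exactly once.
INSERT INTO points (id, cx, cy, w, h, bbox)
SELECT id, cx, cy, w, h,
       box(point(cx - w / 2, cy - h / 2), point(cx + w / 2, cy + h / 2))
FROM raw_points;
```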
Cool, thanks for sharing this! I definitely did not know that. How did you know that the UPDATE query triggers autovacuum? And did you both lower maintenance_work_mem (to ~1 GB, as the blog post suggests) and combine the UPDATE query with the SELECT query?
I knew that UPDATE triggers autovacuum from this blog post. I also verified it empirically: with the UPDATE there were lots of autovacuum workers, but none once the UPDATE was gone.
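(For anyone reproducing the check: on Postgres 10+ the workers are visible in pg_stat_activity, e.g.:)

```sql
-- Autovacuum workers show up as their own backend_type; their query
-- text also starts with 'autovacuum:'.
SELECT pid, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```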
The blog post suggested a low maintenance_work_mem because of the behavior of autovacuum. Now that we don't have this issue, I think we can set it to a higher value. I set it to 10GB, but I haven't tested what the performance would be with smaller or even bigger values.
@peterg17 sorry, didn't see the latter part of your message.
Yes I did, 2GB. The performance still varied on a 100M-dot dataset: CREATE INDEX took 2,301 seconds on the top canvas but 3,768 seconds on the bottom canvas. Not exactly sure why.
Ok cool, and I assume you just ran perf numbers for the native box indexer? We can apply the same principle of using a lower maintenance_work_mem to the native cube indexer too, though I don't think there is an UPDATE query in the native cube indexer. Something interesting: I was using 20GB as the value for maintenance_work_mem, but with 3 autovacuum workers that's 60GB of total memory, and the instance only has 26GB of RAM. So how does Postgres interpret this impossible number?
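One way to sidestep the multiplication entirely, sketched below, is to cap autovacuum's memory separately via autovacuum_work_mem (available since Postgres 9.4); the values are illustrative:

```sql
-- With autovacuum_work_mem set, each autovacuum worker uses that cap
-- instead of maintenance_work_mem, so a big maintenance_work_mem no
-- longer multiplies by autovacuum_max_workers.
ALTER SYSTEM SET maintenance_work_mem = '10GB';
ALTER SYSTEM SET autovacuum_work_mem = '256MB';
SELECT pg_reload_conf();  -- apply without a server restart
```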
Yes, I was running 2D, as well as my WIP partition indexer.
I don't really know how memory allocation happens when there isn't enough. I haven't run that kind of experiment yet. My instance happened to have 30GB of memory...