postgrespro/vops

Performance of populate function

Mad-Apes opened this issue · 1 comments

Hello,
Through recent practice, I found out that query performance has greatly improved, but populate function consumes a lot of time.
The populate function will take more than 200 seconds when normal table contains 2000,000 rows.
So, what should I do to make it better?

In principle, you can populate data in parallel by spawning several concurrent populate statements with different predicates:

select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 0 and column_int2 < 1');
select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 1 and column_int2 < 2');
...

It will allow to load all CPU cores and perform import up to N times faster.
But please take in account that performing parallel insert will violate sequential order of records in vops_table_gb table which may have negative impact on performance of subsequent select queries.
This was one of the reasons why I have not implemented in VOPS parallel load.