citusdata/cstore_fdw

Unexpectedly bad performance

Spindel opened this issue · 8 comments

cstore_fdw 1.6.0 on postgresql-10

So, I've got a lot of time-series data in a table that looks like this:

spider=> create foreign table history_y2017 (
    itemid bigint not null,
    clock integer not null,
    value numeric(20,0) not null,
    ns integer default 0 not null,
    constraint check_clock CHECK ( clock >=  1483228800 AND clock < 1514764800)
)
SERVER cstore_server OPTIONS(compression 'pglz');
insert into history_y2017 select * from history where clock >= 1483228800 AND clock < 1514764800 order by itemid,clock,ns;
analyze history_y2017;
 spider=> explain analyze verbose select * from history_y2017 where itemid=1464 and clock 1497582092 and clock <= 1497668492;
                                                              QUERY PLAN                                                             
 ------------------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.history_y2017  (cost=0.00..12087188.56 rows=851 width=20) (actual time=995.497..1276.525 rows=1440 loops=1)
    Output: itemid, clock, value, ns
    Filter: ((history_y2017.clock > 1497582092) AND (history_y2017.clock <= 1497668492) AND (history_y2017.itemid = 1464))
    Rows Removed by Filter: 28560
    CStore File: /var/lib/pgsql/10/data/cstore_fdw/16385/17191
    CStore File Size: 3549773603
  Planning time: 0.408 ms
  Execution time: 1278.834 ms
 

Benchmarking it on a select benchmark gives me ~3 tps...

Changing the block_row_count from default to '1000' only serves to make the matter four times worse.

Some slowdown is expected when you have such low block row count.
Reducing block row count causes more disk accesses when writing and reading.

Why do you need to change this ?

That was an experiment, I'm trying to get this query up to at least approximate the time it takes on a normal table with a brin index.

Perhaps the question I should ask is this:

  • A normal table partition, brin index ( blocks per range =16) on this dataset gives a tps of ~ 130. Not too great, but acceptable.
  • The same in cstore_fdw (default settings) gives a tps of 3.1 .

So, is my data layout "wrong" for this problem/ query style, and should I simply look at some other solution for effectively storing timeseries data, or is there something I have missed in either my data layout (sorting, etc) that would improve this?

The data table is approximately 25 gigs uncompressed, compresses down to a rather respectable 3-5 gigabytes in size.
This is a small subset of 1 year data, 2000 sensors of variable time slice.

Basically, we have a "hot" section of data (last few weeks/months) And then colder storage data. Different datatypes in different tables, etc.
Currently I'm measuring integers, since those are most common, (by number) and by size the largest.

Well, we are (in live) already disk IO bound, and hitting massive slowdowns there, because a single seek will blow much of the dcache, etc.

However, the current "comparision" for cstore when it comes to compression on this dataset is a btrfs partition with compression enabled + brin index (pages per block at 16)

This combination uses ~4% more disk than cstore for the same dataset, while being approximately 700x faster for queries.

It's that difference that's really causing me some confusion on the performance here.

So, testing shows that for my workload:

  • Order by the most common selector when inserting (itemid,clock) rather than the natural insert order (clock, itemid)

Causes the difference between extra rows returned to go (for a 24-hour query) from 1828560 to 28560

This change, together with increasing stripe_row_count to 500000 brings the query plan down to <0.3 seconds

So, summarizing the tests:

type data compressed block_row_count stripe_row_count sorting select-tps
cstore_fdw 75GB 9.4 GB 50000 5000000 itemid,clock 11.29
btrfs+brin 75GB 11.9GB ----------------- -------------------- itemid,clock( insert order) 166.52
xfs+brin 75GB 75GB ------------------ -------------------- itemid,clock 187

Benchmark ran:

\set itemid random(1, 5000)
\set stop random(1445802188, 1514922128)
\set interval random(1, 24)
\set start :stop - (3600 * :interval)
\set peek 1514828632
SELECT clock,ns,value from history where itemid = :itemid and clock > :start and clock <= :stop;
SELECT clock,ns,value from history where itemid = :itemid and clock > :peek  order by clock desc limit 1;

So, is there something obvious other than the stripe_row_count that I've missed? I ran tests with it at 750 000 as well, but there wasn't a discernable difference other than a slightly higher compression degree.

Bouncing block_row_count up to 50k increased performance of selects but returned more rows by the filters, so I think that's a mis-optimization on this workload ( We don't usually paginate more than 24-hour blocks due to the current design issues)

These are still synthetic benchmarks, and on datasets that aren't in the same dimension as our production sets, where a single query of this type takes > 60 seconds and completly thrashes the cache and binds up the system in disk page-in/out.

And honestly, since cstore_fdw is available for us without a kernel upgrade, that may be a more useful step anyhow, but I'm still trying to figure out why the performance is below where I'd expect it to be.