citusdata/cstore_fdw

How to make use of parallelization with cstore_fdw?

njanakiev opened this issue · 2 comments

psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
cstore_fdw 1.7.0

Thank you for this project! While working through the example in PostgreSQL Columnar Store for Analytic Workloads, I wanted to compare the query with and without cstore_fdw, and I noticed that the cstore_fdw version does not use parallelization for the following query:

EXPLAIN SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM
    customer_reviews_cstore
WHERE
    product_group = 'Book'
GROUP BY
    title_length_bucket
ORDER BY
    title_length_bucket;

Here is the output:

Sort  (cost=25379.78..25379.78 rows=1 width=44)
  Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  HashAggregate  (cost=25379.75..25379.77 rows=1 width=44)
        Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
        ->  Foreign Scan on customer_reviews_cstore  (cost=0.00..25313.65 rows=8813 width=8)
              Filter: (product_group = 'Book'::text)
              CStore File: /var/lib/postgresql/12/main/cstore_fdw/16387/74837
              CStore File Size: 105383392

For the same table without cstore_fdw I get the following output:

Finalize GroupAggregate  (cost=72554.51..79387.78 rows=25229 width=44)
  Group Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
  ->  Gather Merge  (cost=72554.51..78441.69 rows=50458 width=44)
        Workers Planned: 2
        ->  Sort  (cost=71554.49..71617.56 rows=25229 width=44)
              Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
              ->  Partial HashAggregate  (cost=69268.39..69709.90 rows=25229 width=44)
                    Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
                    ->  Parallel Seq Scan on customer_reviews  (cost=0.00..65552.06 rows=495510 width=8)
                          Filter: (product_group = 'Book'::text)

Is there a setting that I have missed? I saw that foreign scans were marked as parallel safe in #198; is this supported in PostgreSQL 12?
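For what it's worth, these are the stock PostgreSQL planner settings that gate parallel plans; if they have their default values (as below), the lack of parallelism is presumably coming from the FDW rather than from configuration:

```sql
-- Planner settings that control parallel query in PostgreSQL 12.
-- Defaults shown in comments; any of these can suppress parallel plans.
SHOW max_parallel_workers_per_gather;  -- default 2; 0 disables parallel query
SHOW max_parallel_workers;             -- default 8
SHOW min_parallel_table_scan_size;     -- default 8MB; smaller scans stay serial
SHOW parallel_setup_cost;              -- default 1000
SHOW parallel_tuple_cost;              -- default 0.1
```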

Data used is from http://examples.citusdata.com/customer_reviews_1998.csv.gz and http://examples.citusdata.com/customer_reviews_1999.csv.gz. Creation of cstore table:

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE customer_reviews_cstore ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[] )
SERVER cstore_server OPTIONS(compression 'pglz');

COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

Creation of table without cstore:

CREATE TABLE customer_reviews ( 
    customer_id TEXT, 
    review_date DATE, 
    review_rating INTEGER, 
    review_votes INTEGER, 
    review_helpful_votes INTEGER, 
    product_id CHAR(10), 
    product_title TEXT, 
    product_sales_rank BIGINT, 
    product_group TEXT, 
    product_category TEXT, 
    product_subcategory TEXT, 
    similar_product_ids CHAR(10)[]
);
    
COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews FROM '/home/user/customer_reviews_1999.csv' WITH CSV;

AFAICT, you will have to make use of partitions to see parallelism.
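A partitioned layout for this data might look like the following sketch. The table and partition names are hypothetical, and the column list is abbreviated to the columns the query touches; note that whether the planner can actually dispatch these foreign partitions to parallel workers (e.g. via Parallel Append) still depends on the FDW marking its scans parallel safe:

```sql
-- Hypothetical sketch: one cstore foreign table per year, attached as
-- range partitions of a regular partitioned table.
CREATE TABLE customer_reviews_part (
    review_date DATE,
    review_rating INTEGER,
    product_title TEXT,
    product_group TEXT
    -- remaining columns omitted for brevity
) PARTITION BY RANGE (review_date);

CREATE FOREIGN TABLE customer_reviews_part_1998
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1998-01-01') TO ('1999-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');

CREATE FOREIGN TABLE customer_reviews_part_1999
    PARTITION OF customer_reviews_part
    FOR VALUES FROM ('1999-01-01') TO ('2000-01-01')
    SERVER cstore_server OPTIONS (compression 'pglz');
```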