How to make use of parallelization with cstore_fdw?
njanakiev opened this issue · 2 comments
psql (PostgreSQL) 12.3 (Ubuntu 12.3-1.pgdg20.04+1)
cstore_fdw 1.7.0
Thank you for this project! While using the example in PostgreSQL Columnar Store for Analytic Workloads i wanted to compare the query to using cstore_fdw
and without and I noticed that the query was not utilizing parallelization for the following query:
EXPLAIN SELECT
width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
round(avg(review_rating), 2) AS review_average,
count(*)
FROM
customer_reviews_cstore
WHERE
product_group = 'Book'
GROUP BY
title_length_bucket
ORDER BY
title_length_bucket;
Here is the output:
Sort (cost=25379.78..25379.78 rows=1 width=44)"
Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))"
-> HashAggregate (cost=25379.75..25379.77 rows=1 width=44)"
Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)"
-> Foreign Scan on customer_reviews_cstore (cost=0.00..25313.65 rows=8813 width=8)"
Filter: (product_group = 'Book'::text)"
CStore File: /var/lib/postgresql/12/main/cstore_fdw/16387/74837"
CStore File Size: 105383392"
For the same table without cstore_fdw
I get the following output:
Finalize GroupAggregate (cost=72554.51..79387.78 rows=25229 width=44)"
Group Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))"
-> Gather Merge (cost=72554.51..78441.69 rows=50458 width=44)"
Workers Planned: 2"
-> Sort (cost=71554.49..71617.56 rows=25229 width=44)"
Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))"
-> Partial HashAggregate (cost=69268.39..69709.90 rows=25229 width=44)"
Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)"
-> Parallel Seq Scan on customer_reviews (cost=0.00..65552.06 rows=495510 width=8)"
Filter: (product_group = 'Book'::text)"
Is there some setting that I have missed? I saw that foreign scans are marked as parallel safe in #198, is this supported in PostgreSQL 12?
Data used is from http://examples.citusdata.com/customer_reviews_1998.csv.gz and http://examples.citusdata.com/customer_reviews_1999.csv.gz. Creation of cstore table:
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE customer_reviews_cstore (
customer_id TEXT,
review_date DATE,
review_rating INTEGER,
review_votes INTEGER,
review_helpful_votes INTEGER,
product_id CHAR(10),
product_title TEXT,
product_sales_rank BIGINT,
product_group TEXT,
product_category TEXT,
product_subcategory TEXT,
similar_product_ids CHAR(10)[] )
SERVER cstore_server OPTIONS(compression 'pglz');
COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1998.csv' WITH CSV;
COPY customer_reviews_cstore FROM '/home/user/customer_reviews_1999.csv' WITH CSV;
Creation of table without cstore:
CREATE TABLE customer_reviews (
customer_id TEXT,
review_date DATE,
review_rating INTEGER,
review_votes INTEGER,
review_helpful_votes INTEGER,
product_id CHAR(10),
product_title TEXT,
product_sales_rank BIGINT,
product_group TEXT,
product_category TEXT,
product_subcategory TEXT,
similar_product_ids CHAR(10)[]
);
COPY customer_reviews FROM '/home/user/customer_reviews_1999.csv' WITH CSV;
COPY customer_reviews FROM '/home/user/customer_reviews_1999.csv' WITH CSV;
AFAICT, you will have to make use of partitions to see parallelism.