mla/pg_sample

pg_sample doesn't correctly handle generated columns

nicholasd-ff opened this issue · 6 comments

If you create a sample database with a single table as follows:

CREATE TABLE some_numbers(
    id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY
    , important_value int
    , double_value int GENERATED ALWAYS AS (important_value*2) STORED
);

INSERT INTO some_numbers(important_value) (SELECT * FROM generate_series(1,100));

Then dump the table using:

 pg_sample --limit="some_numbers=10%" --file=pg_sample_generated_bug.sql

You get a dump file with the generated columns in, which you can't import because the schema prevents it:

 \i dump_files/pg_sample_generated_bug.sql 
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
SET
SET
SET
SET
SET
ALTER TABLE
psql:dump_files/pg_sample_generated_bug.sql:85: ERROR:  extra data after last expected column
CONTEXT:  COPY some_numbers, line 1: "14        14      28"
ALTER TABLE

Note that this bug doesn't occur with raw pg_dump, which correctly omits the generated column.

Looks like we can omit the generated columns by querying the information schema:

SELECT column_name, is_generated FROM information_schema.columns WHERE table_name='some_numbers'
column_name   | is_generated 
-----------------+--------------
 id              | NEVER
 important_value | NEVER
 double_value    | ALWAYS

I can try and make the change and make a PR if that's useful? I've not written much Perl lately but I think the change would only be a couple of lines around line 696 to extract columns that are not generated:

  # Only extract non-generated columns
  my ($cols_to_copy) = $dbh->selectrow_array(qq{
       SELECT string_agg(column_name, ', ')
       FROM information_schema.columns 
       WHERE table_name=? 
       AND table_schema=?
       AND is_generated='NEVER' 
  }, undef, ($table->table, $table->schema) );
  notice "only copying cols [$cols_to_copy] ";

  $dbh->do(qq{
    CREATE $unlogged TABLE $sample_table AS
            SELECT $cols_to_copy
         FROM ONLY $table
      $tablesample
             WHERE $where
            $order
            $limit
  });
mla commented

Thank you, @nicholasd-ff ! I will try to review your changes tomorrow.

@mla I wrote a test and found a bug, so I made a PR with the proper fix for you. Hope that's OK!

It currently depends on my earlier PR to update the base image, but if that's a problem I can unstack them for you.

Hm, and I've encountered another bug with the referential integrity stuff I thought I'd covered that with the test but obviously not. I'll investigate and update the PR because I think my generated table needs a FK constraint to properly check the functionality.

And the bug is now fixed, and the PR updated.