docker-library/postgres

postgres version 16 onwards is consuming entire disk space within container (possible disk spill is happenning)

Closed this issue · 5 comments

I have created a stored procedure:

CREATE OR REPLACE PROCEDURE TEL1PrepUploadTables(IN Dataset text)
    LANGUAGE PLPGSQL
    AS \$\$
    BEGIN
        EXECUTE format('CREATE TABLE "add_cell_raw_%s" PARTITION OF add_cell_raw FOR VALUES IN (''%s'');',Dataset,Dataset);

        EXECUTE format('CREATE TABLE "del_cell_id_raw_%s" PARTITION OF del_cell_id_raw FOR VALUES IN (''%s'');',Dataset,Dataset);

        EXECUTE format('CREATE OR REPLACE VIEW add_cell_raw_latest AS
        SELECT * FROM "add_cell_raw_%s";',Dataset);

        EXECUTE format('CREATE OR REPLACE VIEW del_cell_id_raw_latest AS SELECT * FROM "del_cell_id_raw_%s";',
            Dataset);
    END;
    \$\$;

in this procedure there is a step to create or replace view add_cell_raw_latest...
which takes data from a raw table add_cell_raw_%s (actual value add_cell_raw_c169cwea01_export_LTECell_v4_20250208073005)...this raw table contains roughly 750 MB

TEL1_WEA_4G=# SELECT pg_size_pretty(pg_table_size('"add_cell_raw_c169cwea01_export_LTECell_v4_20250208073005"'));
 pg_size_pretty
----------------
 737 MB
(1 row)

Now when this CREATE VIEW runs internally postgres runs the following step to create the VIEW

TEL1_WEA_4G=# SELECT pid, query, state, wait_event, wait_event_type FROM pg_stat_activity WHERE state <> 'idle';
 pid |                                                                                                                                        query                                                                                                                                        | state  | wait_event | wait_event_type
-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+------------+-----------------
 173 | COPY "add_cell_raw_c169cwea01_export_LTECell_v4_20250208073005" ("Geometry", "dataset", "DateTime", "UploadTime", "mcc", "mnc", "cid", "cellType", "cellName", "antennaLat", "antennaLon", "antennaOrientation", "antennaOpening", "antennaHorizontalRange", "geoType") FROM STDIN; | active |            |

Now the ISSUE is i have created a kubernetes container using
postgres images which are version 15 or lower and postgres images which are version 16 or higher (with volume allocated to container 50 GB)..

Observation

  1. when I am running container based on postgres image versions 15 or lower the space is being occupied in a reasonable way
wmps-postgres-deployment-b6465cb7b-rj9lw:/# df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 154.9G    102.3G     52.5G  66% /
tmpfs                    64.0M         0     64.0M   0% /dev
/dev/vda1               154.9G    102.3G     52.5G  66% /docker-entrypoint-initdb.d
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/hosts
/dev/vda1               154.9G    102.3G     52.5G  66% /dev/termination-log
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/hostname
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/resolv.conf
shm                      64.0M      1.0M     63.0M   2% /dev/shm
**kadalu:kadalu-storage-pool
                         50.0G      3.8G     46.2G   8% /var/lib/postgresql/data**
  1. HOWEVER same STEP running on postgres image versions 16 or higher leads to immediate disk usage consuming entire 50 GB diskspace ... corrupting the container making it unusable
wmps-postgres-deployment-b6465cb7b-rj9lw:/# df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 154.9G    102.3G     52.5G  66% /
tmpfs                    64.0M         0     64.0M   0% /dev
/dev/vda1               154.9G    102.3G     52.5G  66% /docker-entrypoint-initdb.d
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/hosts
/dev/vda1               154.9G    102.3G     52.5G  66% /dev/termination-log
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/hostname
/dev/vda1               154.9G    102.3G     52.5G  66% /etc/resolv.conf
shm                      64.0M      1.0M     63.0M   2% /dev/shm
**kadalu:kadalu-storage-pool
                         50.0G      50G     0G   100% /var/lib/postgresql/data**

As I said this issue coming from postgres images 16 and onwards... I am reading some articles where to increase the query performance could lead to disk spill of the data in postgres... please look into it and fix it

Can you run \l+ in psql to see the size of the db?

> psql -c "\l+"
      Name      |  Owner   | Encoding | Locale Provider |  Collate   |   Ctype    | Locale | ICU Rules |   Access privileges   |  Size   | Tablespace |                Description                 
----------------+----------+----------+-----------------+------------+------------+--------+-----------+-----------------------+---------+------------+--------------------------------------------
 postgres       | postgres | UTF8     | libc            | en_US.utf8 | en_US.utf8 |        |           |                       | 7475 kB | pg_default | default administrative connection database
 template0      | postgres | UTF8     | libc            | en_US.utf8 | en_US.utf8 |        |           | =c/postgres          +| 7321 kB | pg_default | unmodifiable empty database
                |          |          |                 |            |            |        |           | postgres=CTc/postgres |         |            | 
 template1      | postgres | UTF8     | libc            | en_US.utf8 | en_US.utf8 |        |           | =c/postgres          +| 7321 kB | pg_default | default template for new databases
                |          |          |                 |            |            |        |           | postgres=CTc/postgres |         |            | 
(3 rows)

Image

This is the screen shot of version 17 postgres image... but the database is still in healthy state... i can't give you the same data when it gets corrupted as when the container goes bad and i cannot run this command.

How many time has the corrupted db pod restarted?

I waited

How many time has the corrupted db pod restarted?

I waited for couple of hours for it to restore... But it was in same state