nix-postgres-docker

Notes on running wal-g with postgres in a container.

Preface

I'd highly recommend reading the official docs, and treating this doc only as a supplement on how some of these things fit together in the context of wal-g.

While I think I should already be using a managed pg or one of those baked distributions, self-hosting pg for my homelab has taught me many lessons I wouldn't have learned otherwise, so I guess we shall continue doing so "for now". (That too, in a container.)

Deploying postgres in containers is definitely a matter of debate. One of the issues I faced was Docker interfering with postgres's dynamic shared memory configuration, but otherwise I've been happy running postgres in a container. (I've personally been using Nomad so far.)

How to deploy, how to add a connection pooler, etc. are concerns out of the scope of this repo at the moment. I'll keep this document focused on "how to set up WAL archival using WAL-G" when running postgres in a container.

Everything mentioned in this document is very much my experience playing around with wal-g in the particular scenario I had to use it in, so most things may change based on the scenario/scale you need it for.

For actual commands that I’ve tested, check the ./Justfile.wal-g.example file.


Context and the problem

To give you the context: I've been happily running postgres in my homelab without a backup (who needs one anyway), but then I planned a user-facing side project for which I decided to use the same pg cluster. Now, even though I'll get only 2 users, it's my duty to ensure that their data remains intact despite the unfathomable amount of fuckery I am about to unleash on my homelab servers. So, in conclusion, I need a backup policy for the database.

I actually already have one: pg_dump > restic > Backblaze B2, though it backs up more than just the db. But it is not ideal (for various reasons) for a database with a continuous flow of writes. For that, you need what postgres folks call "continuous archiving" (many ways to do it, see the aside section below). For sqlite, there's litestream, which does a similar thing pretty brilliantly.

So after comparing various backup solutions/methods, I decided to go with wal-g. You could actually do it without any 3rd-party tool, but they definitely help. (See below for the comparison.)

Previously I'd been running a custom docker image based on the official postgres docker image to install a required extension. It'd be nice to extend it! In the case of wal-g, the running postgres service (the background archiver process) needs to call wal-g from the archive_command postgres configuration. (This gets triggered every time there's a new WAL segment, which can in turn be forced by setting archive_timeout in case your db doesn't make enough commits to generate a new segment but you still want continuous archival.) Basically, you need wal-g and postgres in the same runtime, with wal-g callable from $PATH.
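For reference, the relevant postgresql.conf knobs look something like this (a minimal sketch; the timeout value is a placeholder to tune to your write volume):

```conf
archive_mode = on                       # start the background archiver process
archive_command = 'wal-g wal-push %p'   # %p = path of the completed WAL segment
archive_timeout = 300                   # placeholder: force a segment switch every 5 min on low-write dbs
```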

Now to get wal-g running, I just need to install it into the docker image (which I already have); that's one apt-get away. Simple, right? Yes! There are even some off-the-shelf community repos online for this exact use case.

But I am stupid, so I decided to make my life harder. I have the rest of my infra wired nicely into nix flakes, and I run things on an arm machine as well as locally (x86), so I thought to myself it'd be really nice to have "most" things in a flake. Honestly, I would've gone the first way if I'd known that doing it the nix way would put me in the trenches.

So now finally our problem becomes:

"Create a multi-arch, production-ready (wal-g + compatibility with the official image + custom extensions) postgres docker image using nixpkgs dockerTools, AND make sure the auxiliary processes (e.g. base backup shipping, cleanup, etc.) run as expected."

Solution

Creating the docker image / Breakdown of the postgres docker image

I initially attempted to base the image off the official postgres docker image; this was not successful. The alternative is building the image directly from Nix definitions. This is usually pretty simple, but for postgres things are a bit more involved: you need the postgres user, postgres has initdb scripts, the official image makes use of gosu, etc., which I don't fully understand.

But it was doable, I did it in two different ways:

  1. Based on the official postgres docker image: Most things work nicely, but I am not sure how I would install postgres extensions into it. It should be possible, just not super straightforward.
  2. Based on nixpkgs postgres: Installing wal-g and other things, along with postgres extensions, into the docker container is pretty straightforward.

See ./flake.nix

Additionally, a GitHub Actions workflow is set up at .github/workflows to build this image for both amd64 and arm64.

Setting up wal-g & references

GitLab has an amazing runbook for wal-g.

My impression of wal-g: while I think it's great software, it lacks heavily in documentation. I am not sure why such a widely adopted tool has such poor docs (lots of undocumented CLI features). A lot of conceptual things which are important for something like managing backups are not mentioned at all. Fortunately, they have a super friendly Telegram group where the developers hang around and help resolve issues. I'd personally like to work on the docs, but I'll let things sink in for a while.

Command reference

Following are wal-g commands, what they do and when to run them.

| context | wal-g command | what it does | when to run? |
| --- | --- | --- | --- |
| base-backup | backup-list | List of successful "base backups" (delta backups show up here too). backup-list --detail --pretty is useful as well. | manually, to check (this won't list anything until your first base backup, even if archive_command is shipping WALs) |
| base-backup | backup-push | Create a full $PGDATA backup. backup-push decides on its own whether to do a delta or a full backup. A base backup is always made in reference to a WAL LSN, which is helpful in many ways (restore, deletion). | periodic (auto), via systemd timer |
| base-backup | backup-fetch | Restore the PostgreSQL data directory from a full backup. | manually, when shit hits the fan |
| delete | garbage (WAL) | Removes unnecessary WALs, leaving only what can be used for restore. | periodic (auto), via systemd timer |
| delete | retain (base-backup) | Can be FULL or FIND_FULL; deletes older "base backups" and their dependent WALs. | periodic (auto), via systemd timer (I just keep the last 2 full backups: wal-g delete retain FULL 2) |
| delete | target (base-backup) | Delete a specific base backup; does not delete WALs. | I don't run this |
| delete | before (base-backup) | Delete base backups before a certain base backup. | I don't run this |
| delete | everything | Deletes everything except permanent backups (--permanent). | I don't run this, except for debugging |
| wal | wal-push | Pushes WALs as part of archive_command. | never run manually |
| wal | wal-fetch | Fetches WALs as part of restore_command. | never run manually, only during restoration. The GitLab runbook mentions that they no longer use this and instead use streaming replication directly. |
| replication | wal-receive | I haven't explored this yet, but it can be an alternative to wal-push. | |
| replication | catchup-* | I haven't explored these yet. | |
| verify | | There are verify and checksum-check related commands as well. | |
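To get a feel for day-to-day usage, the invocations look roughly like this (a sketch; it assumes the WALG_S3_PREFIX/AWS_*/PG* env vars from the sections below are already exported, and note that wal-g's delete subcommands are a dry run unless you pass --confirm):

```sh
wal-g backup-list --detail --pretty    # inspect existing base backups
wal-g backup-push "$PGDATA"            # take a base backup (wal-g decides full vs delta)
wal-g delete retain FULL 2 --confirm   # keep the last 2 full backups
wal-g delete garbage --confirm         # drop leftovers no retained backup needs
```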

Making base backup

  • This could also be done as a sidecar job in cloud-native environments, but since I have the machine to myself I like to set up a systemd timer. Other than the timer, I can now ssh into the machine and manually trigger the backup with sudo systemctl start walg-backup-push. The systemd service looks something like the block below.
  • wal-g has an issue that makes it a little inconvenient to run base backups from the host when postgres runs in a container: wal-g/wal-g#1782
```nix
# walg-backup-push will do a full backup and clean unnecessary WAL files from the backup
systemd.services."walg-backup-push" = {
  script = ''
    set -eu
    export AWS_ENDPOINT=
    export WALG_S3_PREFIX=s3://<bucket_name>/<custom_suffix>
    export AWS_ACCESS_KEY_ID=<agenix+cat>
    export AWS_SECRET_ACCESS_KEY=<agenix+cat>
    export PGPASSWORD=<agenix>

    # NOTE: wal-g delete is a dry run without --confirm
    ${pkgs.wal-g}/bin/wal-g backup-push $WALG_PGDATA && ${pkgs.wal-g}/bin/wal-g delete garbage --confirm
  '';
  serviceConfig = {
    Type = "oneshot";
    User = "root";
  };
  environment = {
    WALG_PGDATA = "/var/lib/postgres/data";
    PGHOST = "localhost";
    PGPORT = "5432";
    PGUSER = "postgres";
  };
};
systemd.timers."walg-backup-push" = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "weekly"; # my db updates infrequently, OK with 1 week worth of WAL
    Persistent = true;
    Unit = "walg-backup-push.service";
  };
};
```
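With the units in place, triggering and inspecting a run from the host looks like this (a sketch):

```sh
sudo systemctl start walg-backup-push       # trigger a base backup right now
systemctl list-timers 'walg-backup-push*'   # when does the next scheduled run fire?
journalctl -u walg-backup-push -n 50        # output of the most recent run
```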

Running WAL archival

  • Just set archive_command = 'wal-g wal-push %p' and we're all set. It needs the env vars AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT, and WALG_S3_PREFIX if you're using an s3 backend (Backblaze). You can set them however you want (e.g. I set them using nomad env vars).
  • If WAL shipping (archive_command) fails for some reason, WAL files will pile up on the server until the disk runs full! (See the check below.)
  • There are a number of cases where the same WAL segment can be pushed more than once, especially after failures where postgres is not sure that the command completed.
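Since a silently failing archive_command will eventually fill the disk, it's worth keeping an eye on the built-in pg_stat_archiver view; a sketch:

```sh
# a growing failed_count / advancing last_failed_wal means archive_command is broken
psql -U postgres -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"
```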

Restoration/Standby

Restoration and standby are pretty involved; I've created a separate higher-level section for them.

postgres restore related command reference
  • The official docs (pg 16.4) are a much better reference; I'm just dumping this here for an overview.
  • As of pg17, only the date/time and named-restore-point options are very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.
  • date +'%Y-%m-%d %H:%M:%S.%N' | cut -b 1-23 gives a millisecond-precision timestamp suitable for recovery_target_time (mind the UTC offset).
  • pg_controldata -D $PGDATA
  • pg_resetwal: see "postgresql error PANIC: could not locate a valid checkpoint record" on Stack Overflow
| Context | Name | Description | Crash recovery mode | Standby mode |
| --- | --- | --- | --- | --- |
| | restore_command | Command used to restore an archived WAL file. | Required | Optional; can be combined with Streaming Replication |
| Recovery Target ("till this") | recovery_target | Only allowed value is immediate: recover till a consistent state. | Yes (one of) | N/A |
| | recovery_target_name | If pg_create_restore_point was used. | Yes (one of) | N/A |
| | recovery_target_time | Timestamp (with numeric UTC offset) up to which recovery will proceed. | Yes (one of) | N/A |
| | recovery_target_lsn | If you know the exact LSN. | Yes (one of) | N/A |
| | recovery_target_xid | If you know the exact tx_id. | Yes (one of) | N/A |
| | recovery_target_inclusive | One-off adjustment to recovery_target_[time/lsn/xid]. | Yes (default: on) | N/A |
| | recovery_target_timeline | The recovery_target_* can only belong to one timeline_id. | Yes (default: latest WAL timeline) | N/A |
| | recovery_target_action | What happens when restore completes (pause/shutdown/promote); see docs for details. | Yes (default: pause) | N/A |
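Pulling the table together, the postgresql.conf additions for a PITR run might look like this (a sketch; the timestamp is a placeholder, and recovery.signal still needs to be created as described later):

```conf
restore_command = 'wal-g wal-fetch %f %p'         # %f = WAL file name, %p = destination path
recovery_target_time = '2024-12-10 17:10:48.788'  # placeholder; one of the "one of" targets above
recovery_target_action = 'promote'                # or 'pause' (default) / 'shutdown'
```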

Cleanup and Maintenance

  • If you make a backup-push with the --permanent flag (or use backup-mark), it will not be picked up by the delete command.
  • wal-g delete garbage can be run periodically to remove WAL files that are no longer needed by any retained base backup.
  • I think if you run wal-g delete retain you don't really need to run garbage, but you can if there are leftovers.

Restoration, Crash Recovery and Standby

  • All of these need a base backup, or need to be configured to use pg_rewind; I personally have not used pg_rewind yet.
  • Wherever streaming replication applies, it can be either sync or async. Sync replication is not possible with WAL archival: sync replication operates at the commit level, while WAL archival works at the WAL segment level.
  • .history errors in the logs are expected when doing crash recovery: wal-g/wal-g#628
  • Order of operations matters: Failed Postgres Replication from master to standby
  • hot_standby is now on by default, so if you create a standby server you'll be able to do reads on it as archives are fetched. Essentially a read replica.
| Combination | SR protocol | WAL Archival (PITR) / file-based / log shipping | Notes | Doubts |
| --- | --- | --- | --- | --- |
| Warm Standby | No | Yes, standby.signal | primary & standby (archive_command <=> restore_command) | |
| Hot Standby | No | Yes, standby.signal + hot_standby=on (default) | warm standby + read-only | |
| Warm Standby | Yes | No | This is no backup, just replicas (primary_conninfo et al.) | |
| Hot Standby | Yes | No | This is no backup, just replicas (primary_conninfo et al.) | |
| Warm Standby | Yes | Yes | This is ideal (primary_conninfo et al. + restore_command) | |
| Hot Standby | Yes | Yes | This is also ideal (primary_conninfo et al. + restore_command) | |
| Crash Recovery (normal) | No | Yes, use recovery.signal | This is the most basic scenario (restore_command) | Do we really need to disable archive_mode here during recovery? |
| Crash Recovery (w/ standby) | No | Yes, use both recovery.signal and standby.signal | In this scenario, standby takes precedence (restore_command) | |
| Crash Recovery (w/ standby, w/ SR) | Yes | Yes | WAL archive + SR + instant promotion | I don't see this scenario being very useful/intuitive/practical |

Standby Servers (Async)

These can be either warm or hot

via Streaming Replication (TODO)

  • I’ve not done this yet
  • You need to set primary_conninfo. restore_command is not used here.

via WAL Archival (PITR)

This section is about running a “standby server” with restore_command + standby.signal

Fetch the base backup
```sh
export AWS_ENDPOINT= # optional
export WALG_S3_PREFIX= # eg. s3://<bucket_name>/app_pg
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_REGION= # optional if region already specified in endpoint
export PATH_TO_PGDATA=/srv/data/pg_data

wal-g backup-fetch $PATH_TO_PGDATA LATEST
```
  • For ~100GB of data, it took about 15 min at 30MB/s, plus extraction.
  • This is more of a one-time manual thing.
  • Find a place where you want to restore the base backup to; this will usually be the $PGDATA of the new postgres server to be restored / standby to be created.
  • Use wal-g backup-fetch to fetch the base backup you want.
  • In case of running postgres in a container:
    • You'd want to run backup-fetch on the host into whatever the $PGDATA directory should be, and when that $PGDATA gets mounted into the container, make sure it's given appropriate permissions etc. (see the sketch below).
    • When using the official postgres image in a non-rootless (rootful) environment, postgres uses gosu, so you can in fact run backup-fetch as root and you should be fine. Whether this is a good idea? ¯\_(ツ)_/¯
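For the permissions point above, something like this is what I mean (a sketch; uid/gid 999 is an assumption based on what the official Debian-based postgres image uses for the postgres user, so adjust for your image):

```sh
export PATH_TO_PGDATA=/srv/data/pg_data      # host path that will be mounted as $PGDATA
wal-g backup-fetch "$PATH_TO_PGDATA" LATEST
sudo chown -R 999:999 "$PATH_TO_PGDATA"      # assumption: 999 = postgres uid/gid inside the image
sudo chmod 700 "$PATH_TO_PGDATA"             # postgres refuses a group/world-accessible $PGDATA
```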
Special use case: restore with only the base backup

  • Usually you'd want fresh WAL etc. (that's the traditional continuous backup).
  • But say you wanted more of a pg_restore-like behavior with a wal-g base backup:
    • Even if you're running in standby mode, you can't run pg_ctl promote; it'll error out saying the server is not running in standby.
  • The restore_command = 'wal-g wal-fetch %f %p' is useless here, as it's meant for continuous archival and we're not doing that in this case.
  • If you don't create standby.signal, you'd get the following:

```
2024-12-10 17:10:48.788 UTC [28] HINT:  If you are restoring from a backup, touch "/srv/data/postgres/recovery.signal" and add required recovery options.
If you are not restoring from a backup, try removing the file "/srv/data/postgres/backup_label".
Be careful: removing "/srv/data/postgres/backup_label" will result in a corrupt cluster if restoring from a backup.
```

  • BUT
    • The base backup itself needs WAL for restoring too, and WAL-G operates the same way. WAL is used for PITR, but even if you're restoring to the time of the backup (without PITR), you still need WAL.

So in summary, this use case is not possible, IIUC.

Make changes to postgresql.conf and $PGDATA for standby

  • Add standby.signal inside $PGDATA; this tells postgres to start in standby mode. If you don't add it and yet have a restore_command, pg will safely error out. (See the sketch after this list.)
  • Update standby settings in postgresql.conf:

```diff
--- a/workloads/nomad/pg/postgresql.conf
+++ b/workloads/nomad/pg/postgresql.conf
-archive_mode = on		# enables archiving; off, on, or always
+archive_mode = off		# enables archiving; off, on, or always
-#restore_command = ''		# command to use to restore an archived WAL file
+restore_command = 'wal-g wal-fetch %f %p'
```

  • DO NOT SET any of the other recovery_* config(s); those are meant to be set during recovery and will mess things up when setting up a standby.
  • Start the postgres service (the container, if running via container).
  • At this point it may take a while, depending on how long ago the base backup was taken and how many WALs it needs to catch up on. Once it catches up, it'll remain in standby mode.
  • Remaining in standby mode means postgres keeps looking for the next WAL segment, which has probably not yet been pushed by the primary's archive_command; as soon as the next segment is available, it loads it in.
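Concretely, the standby setup on the host boils down to something like this (a sketch, reusing $PATH_TO_PGDATA from the fetch step):

```sh
touch "$PATH_TO_PGDATA/standby.signal"   # marker file: start in standby mode
# plus the postgresql.conf changes above (archive_mode = off, restore_command = 'wal-g wal-fetch %f %p'),
# then start the postgres service/container and watch the logs while it replays WALs
```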
Promote as needed

  • Once you know that the "current" primary is no longer going to push new WAL logs/data, you can simply shut down the primary and promote the standby as needed.
  • You can promote a standby by running pg_ctl promote or calling pg_promote(). This also deletes the standby.signal file. If you're running postgres in a container, exec into the container as the postgres user and run pg_ctl promote (see the sketch below).
  • After promoting, you may want to remove/comment out the restore_command config; it won't be picked up by postgres anyway, but better to comment it out IMO.
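When postgres runs in a container, promotion looks something like this (a sketch; the container name is a placeholder):

```sh
# exec into the container as the postgres user and promote
docker exec -it -u postgres <container_name> bash -c 'pg_ctl promote -D "$PGDATA"'
# or, from any client connection:
psql -U postgres -c "SELECT pg_promote();"
```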

Standby Servers (Sync) (ONLY via Streaming Replication) (TODO)

  • Sync replication happens at the commit level.
  • Synchronous standby servers cannot work without the postgres streaming replication protocol, hence this won't work with WAL archival.
  • These can be either warm or hot.
  • I've not done this yet.

Crash Recovery (PITR)

  • PITR, as I understand it, is not one single thing. It's the combination of the different recovery targets, the mechanisms for doing the backups, the concept of the postgres timeline, etc. With timeline IDs, it is possible for the same LSN or the same WAL segments to exist in multiple timelines.
  • When using a recovery_* target other than recovery_target (and depending on recovery_target_action), you can use pg_wal_replay_resume() to play/pause the recovery process, e.g. to manually check whether some data has been recovered, a bit like stepping through with a debugger. Usually this is not needed; simply using recovery_target is enough for me. (See the sketch after this list.)
  • During recovery mode, if recovery is terminated because of an external error, the server can simply be restarted and it will continue recovery.
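The pause/resume machinery referred to above is just the stock postgres functions; a sketch:

```sh
psql -U postgres -c "SELECT pg_is_wal_replay_paused();"  # is replay currently paused?
psql -U postgres -c "SELECT pg_wal_replay_resume();"     # continue replay after poking around
```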

via Streaming Replication (TODO)

  • I've not done this yet.
  • I don't completely understand this scenario, because after recovery the recovering pg instance would implicitly be promoted, but doing this via streaming replication means the primary is live. So in which case do we even need this?

via WAL Archival (PITR)

Differences with standby mode/servers

This is similar to standby servers, with the following gotchas/differences (there'll be more, of course):

  • The settings (postgresql.conf) here are only needed for the duration of the restore; once done, they should be reset.
  • Instead of standby.signal, we add recovery.signal.
  • Unlike standby mode, where the server stays in standby until promoted, in archival/crash recovery the postgres server gets implicitly promoted as soon as recovery succeeds.
  • Since recovery mode does implicit promotion, you cannot really run multiple recoveries; if that's a use case, you probably need a standby instead of a recovery. See this issue for more info.
Fetch the base backup

This is the same as for standby servers.

Make changes to postgresql.conf and $PGDATA for recovery

  • Add recovery.signal inside $PGDATA; this tells postgres to start in recovery mode.
    • The server will enter targeted recovery and keep replaying until the end of archived WAL (or the recovery_target) is reached, after which it'll implicitly promote itself / do whatever recovery_target_action is set to.
    • The recovery_* configuration controls how this behaves (e.g. you can choose an earlier stopping point than the end of WAL).
  • Update recovery settings in postgresql.conf:

```diff
--- a/workloads/nomad/postgres/postgresql.conf
+++ b/workloads/nomad/postgres/postgresql.conf
-archive_mode = on      # enables archiving; off, on, or always
+archive_mode = off     # enables archiving; off, on, or always
-#restore_command = ''  # command to use to restore an archived WAL file
+restore_command = 'wal-g wal-fetch %f %p'
```

  • After adding these changes, one should be able to successfully recover from the WAL archive (a quick check follows below).
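To tell whether the server is still replaying or has already promoted itself, pg_is_in_recovery() is handy; a sketch:

```sh
# 't' while WAL is still being replayed, 'f' once the server has promoted itself
psql -U postgres -c "SELECT pg_is_in_recovery();"
```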

Crash Recovery + Standby (TODO)

From PostgreSQL: Documentation: 16: 20.5. Write Ahead Log:

“To start the server in targeted recovery mode, create a file called recovery.signal in the data directory. If both standby.signal and recovery.signal files are created, standby mode takes precedence. Targeted recovery mode ends when the archived WAL is fully replayed, or when recovery_target is reached. In this mode, the parameters from both this section and Section 20.5.6 will be used.”

This use case is somewhat tricky to me, and it can be combined with other things like streaming replication as well.

Related concepts

Failover

See PostgreSQL: Documentation: 16: 27.3. Failover. I am just dumping important points from there plus some personal notes.

  • STONITH (Shoot The Other Node In The Head): If the primary server fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for informing the old primary that it is no longer the primary. This is necessary to avoid situations where both systems think they are the primary, which will lead to confusion and ultimately data loss.
  • PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server.
  • To trigger failover of a log-shipping standby server, run pg_ctl promote or call pg_promote().

Promotion & Timeline

  • A successful promotion puts the server on a new timeline; this is true for both crash recovery (implicit promotion) and standby mode.
  • Each timeline has its own file naming scheme, so they can't overlap.
  • Once promotion is done, you cannot go back. Basically, do not promote a standby to primary if you want that standby to still be able to apply WALs from another primary.
  • There should be only one valid "current" timeline.
  • SELECT timeline_id FROM pg_control_checkpoint();
  • For recovery:
    • postgres implicitly (and automatically) gets promoted, and the recovery.signal file is deleted on successful restore, unless recovery_target_action is set to shutdown.
  • For standby:
    • postgres will NOT automatically promote anything; you'd need to manually run pg_ctl promote.

Whether to set archive_mode on a recovery/standby server

There is some confusion around whether to keep archive_mode on when running a recovery/standby server.

  • To be safe, I unset archive_command or set archive_mode=off on the standby/recovery postgres instance.
    • I don't want it pushing via archive_command immediately after getting restored (I'd want to verify things are working first).
  • After I verify, I manually re-set archive_command and re-trigger a base backup from the new replica/recovered instance.
  • I am not sure what the best approach is; this is just something that works for me.
  • Confusion: there's an instance where setting archive_command messed things up, but also a discussion recommending against disabling it. (I think the discussion is about disabling archive_command on the primary, whereas in my case I am talking about not having archive_command on the restore/standby instance.)

WAL generation

  • WAL segment size is 16MB by default (fixed at initdb time).
  • What happens when archive_mode is set to on?
    • high iops?

Other notes/references

Aside on Backup ecosystem in Postgres (TODO)

This section is todo, I’ll update later.

Now, I observe the postgres community from the sidelines and I don't think I have any "real" postgres experience; however, this is what I've figured out from reading the manual and the opinions of other folks on the interwebs.

“pgBackRest is more conservative and kind of reliable, wal-g is more performant, both are much better than Barman”

I won't mention who said it; my real surname is Barman (check my github, fr) and I am not offended.

Replication vs Backup

Everything works except archive_command!

This is a short debugging story that finally had a happy ending. :)

Creating the plain postgres container image from nixpkgs using dockerTools was simple. It builds off nixpkgs 24.05, which has a pinned postgres version (postgres 16.4.x, built against glibc 2.39), but I was already running postgres using the official Debian-based docker image. When I deployed the custom image to replace the official one, I got a collation version mismatch similar to this issue. I initially thought it was related to locale, but later confirmed it was related to the glibc version the postgres binary was compiled against and run with. The usually suggested way out of this is to REINDEX, but my tables didn't even have a backup yet, so I didn't want to take any risk whatsoever.

```
WARNING:  database "<name>" has a collation version mismatch
DETAIL:  The database was created using collation version 2.36, but the operating system provides version 2.39.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE <name> REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
```

There were many ways to fix this, but I wanted to keep things simple, so I just brought back a previously failed attempt: instead of building from nixpkgs, I'd use fromImage with the official postgres image as the base and install nixpkgs wal-g on top of it.

This initially failed because of this issue (the docker_entrypoint.sh in the official pg image has a /usr/bin/env shebang), but we resolved it using dockerTools.buildLayeredImage's fakeRootCommands. So now everything is working!

The old official postgres image is replaced by the custom docker image, there's no collation mismatch, and I am able to exec into the container where postgres is running and access the wal-g binary on $PATH. All G!

Now things were straightforward: I just had to set archive_command to wal-g wal-push %p and I'd be good. So I did that.

I got hit by the dreaded boi, exit code 127:

```
archive command failed with exit code 127
```

This drove me nuts. Initially I thought the forked archiver process was somehow not able to access wal-g, so I tried a plain cp as the archive_command, as that's the simplest. That was failing too!

Then I tried:

  • archive_command = 'echo test > /tmp/archive_test.txt' : fail
  • archive_command = 'true' : fail!

Now, since /bin/true was failing, I realized I might have fu*ked up big time. To confirm that this was in fact my image, I tried setting archive_command = 'true' in the official postgres image, and it worked. So I was sure that whatever was happening was my doing.

After some googling and Claude, I found that you can trigger archival on demand by manually doing a WAL switch with SELECT pg_switch_wal();. So now I could observe the postgres process at the syscall level exactly when it happens and not get lost in a sea of syscalls. Let's bring in our old friend strace.

  • docker inspect -f '{{.State.Pid}}' <container id> : get the pid of the running postgres container (main process)
  • sudo strace -f -p <pid> -s 1024 -o /tmp/postgres_strace.log
  • Then manually trigger the WAL switch (and hence archive_command) and exit out of strace.
  • Inspecting the log, I see:

```
1053745 execve("/bin/sh", ["sh", "-c", "true"], 0xaaaaecdfef10 /* 65 vars */ <unfinished ...>
1053746 close(3)                        = 0
1053745 <... execve resumed>)           = -1 ENOENT (No such file or directory)
1053746 signalfd4(-1, [URG], 8, SFD_CLOEXEC|SFD_NONBLOCK <unfinished ...>
1053745 exit_group(127)                 = ?
```
  • It's not able to find true! Upon looking it up, /bin/true exists. Hmm.
  • Look closer.
  • It's not able to find sh! Upon looking it up, /bin/sh does not exist :)
  • This is the same nix issue as not finding env, mentioned previously.

So I added the fix in a similar manner, using ln -sfn "${pkgs.bash}/bin/sh" /bin/sh, and things finally started working.
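For reference, the shape of the fix in the image definition is roughly this (a sketch of the fakeRootCommands approach; paths are relative to the image root here, whereas with enableFakechroot you'd use absolute ones; see ./flake.nix for the real thing):

```nix
pkgs.dockerTools.buildLayeredImage {
  # ... name, tag, fromImage, contents elided ...
  fakeRootCommands = ''
    mkdir -p ./bin ./usr/bin
    ln -sfn ${pkgs.bash}/bin/sh ./bin/sh            # archive_command is executed via `sh -c '...'`
    ln -sfn ${pkgs.coreutils}/bin/env ./usr/bin/env # the entrypoint's shebang needs env
  '';
}
```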

Previous attempts

Creating normal Dockerfile (official docker image)

This image has an extension that I use, but doesn't have wal-g in it.

```dockerfile
FROM postgres:16.2-bookworm AS builder

RUN apt-get update \
    && apt-get -y upgrade \
    && apt-get install -y --no-install-recommends \
    curl ca-certificates git \
    build-essential libpq-dev postgresql-server-dev-all
RUN update-ca-certificates

WORKDIR /srv
RUN git clone https://github.com/fboulnois/pg_uuidv7.git .
RUN for v in `seq 16`; do pg_buildext build-$v $v; done

# create tarball and checksums
RUN cp sql/pg_uuidv7--1.5.sql . && TARGETS=$(find * -name pg_uuidv7.so) \
  && tar -czvf pg_uuidv7.tar.gz $TARGETS pg_uuidv7--1.5.sql pg_uuidv7.control \
  && sha256sum pg_uuidv7.tar.gz $TARGETS pg_uuidv7--1.5.sql pg_uuidv7.control > SHA256SUMS

FROM postgres:16.2-bookworm AS runner

COPY --from=builder /srv/pg_uuidv7.tar.gz /srv/SHA256SUMS /srv/
COPY --from=builder /srv/${PG_MAJOR}/pg_uuidv7.so /usr/lib/postgresql/${PG_MAJOR}/lib
COPY --from=builder /srv/pg_uuidv7.control /usr/share/postgresql/${PG_MAJOR}/extension
COPY --from=builder /srv/pg_uuidv7--1.5.sql /usr/share/postgresql/${PG_MAJOR}/extension
```

Using dockerTools.fromImage (official docker image)

Update! I think I've found the issue mentioned in the following text; it's related to the shebang at the top of the entrypoint file. Either way, I have dropped the idea of using fromImage for now and am going all in on building it from nix derivations.

Inconsistent treatment of /usr/bin/env in build sandbox vs. NixOS · Issue #1205 · NixOS/nix · GitHub

Update 2! Because I was having difficulty building pg 16.2 from nixpkgs, I ended up using this approach after all, as I had an existing db on pg 16.2, faced the collation issue due to the glibc version, and didn't want to REINDEX yet.

```nix
# file: flake.nix
let
  # nix run nixpkgs#nix-prefetch-docker -- postgres --image-tag 16.2-bookworm --arch amd64 --os linux
  pg_amd64 = pkgs.dockerTools.pullImage {
    imageName = "postgres";
    imageDigest = "sha256:4aea012537edfad80f98d870a36e6b90b4c09b27be7f4b4759d72db863baeebb";
    sha256 = "1rizfs2f6l834cgym0jpp88g3r3mcrxn9fd58np91ny9fy29zyck";
    finalImageName = "postgres";
    finalImageTag = "16.2-bookworm";
    os = "linux";
    arch = "amd64";
  };
in {
  packages = {
    nix_postgres_docker = pkgs.dockerTools.buildLayeredImage {
      name = builtins.getEnv "IMAGE_NAME";
      tag = builtins.getEnv "IMAGE_TAG";
      fromImage = pg_amd64; # TODO make conditional
      contents = with pkgs; [ cacert postgresql16Packages.pg_uuidv7 ];
      config = {
        Cmd = [ "postgres" ];
        Entrypoint = [ "docker-entrypoint.sh" ];
      };
    };
  };
}
```

While this seems like it works on the face of it, unfortunately (or fortunately) it doesn't. @NoobZ and @ManoftheSea from the unofficial NixOS discord channel talked me out of this and I stopped trying to make it work. If someone is interested in making it work, here is where I got stuck: "no such file or directory".

```
λ just docker-build # success
λ just docker-load # success
λ just docker-run # fail
exec /usr/local/bin/docker-entrypoint.sh: no such file or directory
error: Recipe `docker-run` failed on line 36 with exit code 1
```

I am not sure why this happened; the arch is what it should be, and the base image is a regular docker image, so it should not be NixOS FHS issues.

Trying to install nixpkgs pg extensions into the correct location in such an image would've been a real hassle.

To explore later

  • [ ] Streaming replication
  • [ ] Compression of WAL logs, especially when archive_timeout fires before a WAL segment hits the 16MB size. It seems like mostly-empty WAL segments (due to an early archive_timeout) still consume the full 16MB.