nix-postgres-docker

Notes on running wal-g with postgres in a container.

Preface

I'd highly recommend reading the official docs, and treating this doc only as a supplement on how some of these things fit together in the context of wal-g.

While I think I should already be using a managed pg or one of those baked distributions, self-hosting pg for my homelab has taught me many lessons I wouldn't have learned otherwise, so I guess we shall continue doing so "for now". (That too, in a container.)

Deploying postgres in containers is definitely a matter of debate. One of the issues I faced was Docker interfering with postgres's dynamic shared memory configuration, but otherwise I've been happy running postgres in a container. (I've personally been using Nomad so far.)

How to deploy, how to add a connection pooler, etc. are concerns out of the scope of this repo at the moment. I'll keep this document focused on "how to set up WAL archival using WAL-G" when running postgres in a container.

Everything mentioned in this document is very much my experience playing around with wal-g in the particular scenario I had to use it in, so most things may change based on the scenario/scale you need it for.

For actual commands that I’ve tested, check the ./Justfile.wal-g.example file.


Context and the problem

To give you the context: I've been happily running postgres in my homelab without a backup (who needs one anyway), but then I planned a user-facing side project for which I decided to use the same pg cluster. Now, even though I'll get only 2 users, it's my duty to ensure that their data remains intact despite the unfathomable amount of fuckery I am about to unleash on my homelab servers. So, in conclusion, I need a backup policy for the database.

I actually already have one: pg_dump > restic > Backblaze B2, though it backs up more than just the db. But it is not ideal (for various reasons) for a database with a continuous flow of writes. For that, you need what postgres folks call "continuous archiving" (many ways to do it, see the aside section below). For sqlite, there's litestream, which does a similar thing pretty brilliantly.

So after comparing various backup solutions/methods, I decided to go with wal-g. You could actually do it without any 3rd-party tool, but they definitely help. (See below for the comparison.)

Previously I'd been running a custom docker image based on the official postgres docker image to install a required extension. It'd be nice to extend it! In the case of wal-g, the running postgres service (the background archiver process) needs to call wal-g from the archive_command postgres configuration. (This gets triggered every time there's a new WAL segment, which can in turn be forced by setting archive_timeout in case your db doesn't make enough commits to generate a new segment but you still want continuous archival.) Basically, you need wal-g and postgres in the same runtime, with wal-g callable from $PATH.
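For reference, the relevant postgresql.conf knobs look something like this (a minimal sketch; the timeout value is a placeholder to tune to your write volume):

```conf
archive_mode = on                       # start the background archiver process
archive_command = 'wal-g wal-push %p'   # %p = path of the completed WAL segment
archive_timeout = 300                   # placeholder: force a segment switch every 5 min on low-write dbs
```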

Now to get wal-g running, I just need to install it into the docker image (which I already have); that's one apt-get away. Simple, right? Yes! There are even some off-the-shelf community repos online for this exact use case.

But I am stupid, so I decided to make my life harder. I have the rest of my infra wired nicely into nix flakes, and I run things on an arm machine as well as locally (x86), so I thought to myself it'd be really nice to have "most" things in a flake. Honestly, I would've gone the first way if I'd known that doing it the nix way would put me in the trenches.

So now finally our problem becomes:

"Create a multi-arch, production-ready (wal-g + compatibility with the official image + custom extensions) postgres docker image using nixpkgs dockerTools, AND make sure the auxiliary processes (e.g. base backup shipping, cleanup, etc.) run as expected."

Solution

Creating the docker image / Breakdown of the postgres docker image

I initially attempted to base the image off the official postgres docker image; this was not successful. The alternative is building the image directly from Nix definitions. This is usually pretty simple, but for postgres things are a bit more involved: you need the postgres user, postgres has initdb scripts, the official image makes use of gosu, etc., which I don't fully understand.

But it was doable, I did it in two different ways:

  1. Based on the official postgres docker image: Most things work nicely, but I am not sure how I would install postgres extensions into it. It should be possible, just not super straightforward.
  2. Based on nixpkgs postgres: Installing wal-g and other things, along with postgres extensions, into the docker container is pretty straightforward.

See ./flake.nix

Additionally, a GitHub Actions workflow is set up at .github/workflows to build this image for both amd64 and arm64.

Setting up wal-g & references

GitLab has an amazing runbook for wal-g.

My impression of wal-g: while I think it's great software, it lacks heavily in documentation. I am not sure why such a widely adopted tool has such poor docs (lots of undocumented CLI features). A lot of conceptual things which are important for something like managing backups are not mentioned at all. Fortunately, they have a super friendly Telegram group where the developers hang around and help resolve issues. I'd personally like to work on the docs, but I'll let things sink in for a while.

Command reference

Following are wal-g commands, what they do and when to run them.

| context | wal-g command | what it does | when to run? |
| --- | --- | --- | --- |
| base-backup | backup-list | List of successful "base backups" (delta backups show up here too). backup-list --detail --pretty is useful as well. | manually, to check (this won't list anything until your first base backup, even if archive_command is shipping WALs) |
| base-backup | backup-push | Create a full $PGDATA backup. backup-push decides on its own whether to do a delta or a full backup. A base backup is always made in reference to a WAL LSN, which is helpful in many ways (restore, deletion). | periodic (auto), via systemd timer |
| base-backup | backup-fetch | Restore the PostgreSQL data directory from a full backup. | manually, when shit hits the fan |
| delete | garbage (WAL) | Removes unnecessary WALs, leaving only what can be used for restore. | periodic (auto), via systemd timer |
| delete | retain (base-backup) | Can be FULL or FIND_FULL; deletes older "base backups" and their dependent WALs. | periodic (auto), via systemd timer (I just keep the last 2 full backups: wal-g delete retain FULL 2) |
| delete | target (base-backup) | Delete a specific base backup; does not delete WALs. | I don't run this |
| delete | before (base-backup) | Delete base backups before a certain base backup. | I don't run this |
| delete | everything | Deletes everything except permanent backups (--permanent). | I don't run this, except for debugging |
| wal | wal-push | Pushes WALs as part of archive_command. | never run manually |
| wal | wal-fetch | Fetches WALs as part of restore_command. | never run manually, only during restoration. The GitLab runbook mentions that they no longer use this and instead use streaming replication directly. |
| replication | wal-receive | I haven't explored this yet, but it can be an alternative to wal-push. | |
| replication | catchup-* | I haven't explored these yet. | |
| verify | | There are verify and checksum-check related commands as well. | |
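To get a feel for day-to-day usage, the invocations look roughly like this (a sketch; it assumes the WALG_S3_PREFIX/AWS_*/PG* env vars from the sections below are already exported, and note that wal-g's delete subcommands are a dry run unless you pass --confirm):

```sh
wal-g backup-list --detail --pretty    # inspect existing base backups
wal-g backup-push "$PGDATA"            # take a base backup (wal-g decides full vs delta)
wal-g delete retain FULL 2 --confirm   # keep the last 2 full backups
wal-g delete garbage --confirm         # drop leftovers no retained backup needs
```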

Making base backup

  • This could also be done as a sidecar job in cloud-native environments, but since I have the machine to myself I like to set up a systemd timer. Other than the timer, I can now ssh into the machine and manually trigger the backup with sudo systemctl start walg-backup-push. The systemd service looks something like the block below.
  • wal-g has an issue that makes it a little inconvenient to run base backups from the host when postgres runs in a container: wal-g/wal-g#1782
```nix
# walg-backup-push will do a full backup and clean unnecessary WAL files from the backup
systemd.services."walg-backup-push" = {
  script = ''
    set -eu
    export AWS_ENDPOINT=
    export WALG_S3_PREFIX=s3://<bucket_name>/<custom_suffix>
    export AWS_ACCESS_KEY_ID=<agenix+cat>
    export AWS_SECRET_ACCESS_KEY=<agenix+cat>
    export PGPASSWORD=<agenix>

    # NOTE: wal-g delete is a dry run without --confirm
    ${pkgs.wal-g}/bin/wal-g backup-push $WALG_PGDATA && ${pkgs.wal-g}/bin/wal-g delete garbage --confirm
  '';
  serviceConfig = {
    Type = "oneshot";
    User = "root";
  };
  environment = {
    WALG_PGDATA = "/var/lib/postgres/data";
    PGHOST = "localhost";
    PGPORT = "5432";
    PGUSER = "postgres";
  };
};
systemd.timers."walg-backup-push" = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnCalendar = "weekly"; # my db updates infrequently, OK with 1 week worth of WAL
    Persistent = true;
    Unit = "walg-backup-push.service";
  };
};
```
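With the units in place, triggering and inspecting a run from the host looks like this (a sketch):

```sh
sudo systemctl start walg-backup-push       # trigger a base backup right now
systemctl list-timers 'walg-backup-push*'   # when does the next scheduled run fire?
journalctl -u walg-backup-push -n 50        # output of the most recent run
```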

Running WAL archival

  • Just set archive_command = 'wal-g wal-push %p' and we're all set. It needs the env vars AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_ENDPOINT, and WALG_S3_PREFIX if you're using an s3 backend (Backblaze). You can set them however you want (e.g. I set them using nomad env vars).
  • If WAL shipping (archive_command) fails for some reason, WAL files will pile up on the server until the disk runs full! (See the check below.)
  • There are a number of cases where the same WAL segment can be pushed more than once, especially after failures where postgres is not sure that the command completed.
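Since a silently failing archive_command will eventually fill the disk, it's worth keeping an eye on the built-in pg_stat_archiver view; a sketch:

```sh
# a growing failed_count / advancing last_failed_wal means archive_command is broken
psql -U postgres -c "SELECT archived_count, last_archived_wal, failed_count, last_failed_wal FROM pg_stat_archiver;"
```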

Restoration/Standby

Restoration and standby are pretty involved; I've created a separate higher-level section for them.

postgres restore related command reference
  • The official docs (pg 16.4) are a much better reference; I'm just dumping this here for an overview.
  • As of pg17, only the date/time and named-restore-point options are very usable, since there are no tools to help you identify with any accuracy which transaction ID to use.
  • date +'%Y-%m-%d %H:%M:%S.%N' | cut -b 1-23 gives a millisecond-precision timestamp suitable for recovery_target_time (mind the UTC offset).
  • pg_controldata -D $PGDATA
  • pg_resetwal: see "postgresql error PANIC: could not locate a valid checkpoint record" on Stack Overflow
| Context | Name | Description | Crash recovery mode | Standby mode |
| --- | --- | --- | --- | --- |
| | restore_command | Command used to restore an archived WAL file. | Required | Optional; can be combined with Streaming Replication |
| Recovery Target ("till this") | recovery_target | Only allowed value is immediate: recover till a consistent state. | Yes (one of) | N/A |
| | recovery_target_name | If pg_create_restore_point was used. | Yes (one of) | N/A |
| | recovery_target_time | Timestamp (with numeric UTC offset) up to which recovery will proceed. | Yes (one of) | N/A |
| | recovery_target_lsn | If you know the exact LSN. | Yes (one of) | N/A |
| | recovery_target_xid | If you know the exact tx_id. | Yes (one of) | N/A |
| | recovery_target_inclusive | One-off adjustment to recovery_target_[time/lsn/xid]. | Yes (default: on) | N/A |
| | recovery_target_timeline | The recovery_target_* can only belong to one timeline_id. | Yes (default: latest WAL timeline) | N/A |
| | recovery_target_action | What happens when restore completes (pause/shutdown/promote); see docs for details. | Yes (default: pause) | N/A |
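Pulling the table together, the postgresql.conf additions for a PITR run might look like this (a sketch; the timestamp is a placeholder, and recovery.signal still needs to be created as described later):

```conf
restore_command = 'wal-g wal-fetch %f %p'         # %f = WAL file name, %p = destination path
recovery_target_time = '2024-12-10 17:10:48.788'  # placeholder; one of the "one of" targets above
recovery_target_action = 'promote'                # or 'pause' (default) / 'shutdown'
```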

Cleanup and Maintenance

  • If you make a backup-push with the --permanent flag (or use backup-mark), it will not be picked up by the delete command.
  • wal-g delete garbage can be run periodically to remove WAL files that are no longer needed by any retained base backup.
  • I think if you run wal-g delete retain you don't really need to run garbage, but you can if there are leftovers.

Restoration, Crash Recovery and Standby

  • All of these need a base backup, or need to be configured to use pg_rewind; I personally have not used pg_rewind yet.
  • Wherever streaming replication applies, it can be either sync or async. Sync replication is not possible with WAL archival: sync replication operates at the commit level, while WAL archival works at the WAL segment level.
  • .history errors in the logs are expected when doing crash recovery: wal-g/wal-g#628
  • Order of operations matters: Failed Postgres Replication from master to standby
  • hot_standby is now on by default, so if you create a standby server you'll be able to do reads on it as archives are fetched. Essentially a read replica.
| Combination | SR protocol | WAL Archival (PITR) / file-based / log shipping | Notes | Doubts |
| --- | --- | --- | --- | --- |
| Warm Standby | No | Yes, standby.signal | primary & standby (archive_command <=> restore_command) | |
| Hot Standby | No | Yes, standby.signal + hot_standby=on (default) | warm standby + read-only | |
| Warm Standby | Yes | No | This is no backup, just replicas (primary_conninfo et al.) | |
| Hot Standby | Yes | No | This is no backup, just replicas (primary_conninfo et al.) | |
| Warm Standby | Yes | Yes | This is ideal (primary_conninfo et al. + restore_command) | |
| Hot Standby | Yes | Yes | This is also ideal (primary_conninfo et al. + restore_command) | |
| Crash Recovery (normal) | No | Yes, use recovery.signal | This is the most basic scenario (restore_command) | Do we really need to disable archive_mode here during recovery? |
| Crash Recovery (w/ standby) | No | Yes, use both recovery.signal and standby.signal | In this scenario, standby takes precedence (restore_command) | |
| Crash Recovery (w/ standby, w/ SR) | Yes | Yes | WAL archive + SR + instant promotion | I don't see this scenario being very useful/intuitive/practical |

Standby Servers (Async)

These can be either warm or hot

via Streaming Replication (TODO)

  • I’ve not done this yet
  • You need to set primary_conninfo. restore_command is not used here.

via WAL Archival (PITR)

This section is about running a “standby server” with restore_command + standby.signal

Fetch the base backup
```sh
export AWS_ENDPOINT= # optional
export WALG_S3_PREFIX= # eg. s3://<bucket_name>/app_pg
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_REGION= # optional if region already specified in endpoint
export PATH_TO_PGDATA=/srv/data/pg_data

wal-g backup-fetch $PATH_TO_PGDATA LATEST
```
  • For ~100GB of data, it took about 15 min at 30MB/s, plus extraction.
  • This is more of a one-time manual thing.
  • Find a place where you want to restore the base backup to; this will usually be the $PGDATA of the new postgres server to be restored / standby to be created.
  • Use wal-g backup-fetch to fetch the base backup you want.
  • In case of running postgres in a container:
    • You'd want to run backup-fetch on the host into whatever the $PGDATA directory should be, and when that $PGDATA gets mounted into the container, make sure it's given appropriate permissions etc. (see the sketch below).
    • When using the official postgres image in a non-rootless (rootful) environment, postgres uses gosu, so you can in fact run backup-fetch as root and you should be fine. Whether this is a good idea? ¯\_(ツ)_/¯
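For the permissions point above, something like this is what I mean (a sketch; uid/gid 999 is an assumption based on what the official Debian-based postgres image uses for the postgres user, so adjust for your image):

```sh
export PATH_TO_PGDATA=/srv/data/pg_data      # host path that will be mounted as $PGDATA
wal-g backup-fetch "$PATH_TO_PGDATA" LATEST
sudo chown -R 999:999 "$PATH_TO_PGDATA"      # assumption: 999 = postgres uid/gid inside the image
sudo chmod 700 "$PATH_TO_PGDATA"             # postgres refuses a group/world-accessible $PGDATA
```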
Special use case: restore with only the base backup

  • Usually you'd want fresh WAL etc. (that's the traditional continuous backup).
  • But say you wanted more of a pg_restore-like behavior with a wal-g base backup:
    • Even if you're running in standby mode, you can't run pg_ctl promote; it'll error out saying the server is not running in standby.
  • The restore_command = 'wal-g wal-fetch %f %p' is useless here, as it's meant for continuous archival and we're not doing that in this case.
  • If you don't create standby.signal, you'd get the following:

```
2024-12-10 17:10:48.788 UTC [28] HINT:  If you are restoring from a backup, touch "/srv/data/postgres/recovery.signal" and add required recovery options.
If you are not restoring from a backup, try removing the file "/srv/data/postgres/backup_label".
Be careful: removing "/srv/data/postgres/backup_label" will result in a corrupt cluster if restoring from a backup.
```

  • BUT
    • The base backup itself needs WAL for restoring too, and WAL-G operates the same way. WAL is used for PITR, but even if you're restoring to the time of the backup (without PITR), you still need WAL.

So in summary, this use case is not possible, IIUC.

Make changes to postgresql.conf and $PGDATA for standby

  • Add standby.signal inside $PGDATA; this tells postgres to start in standby mode. If you don't add it and yet have a restore_command, pg will safely error out. (See the sketch after this list.)
  • Update standby settings in postgresql.conf:

```diff
--- a/workloads/nomad/pg/postgresql.conf
+++ b/workloads/nomad/pg/postgresql.conf
-archive_mode = on		# enables archiving; off, on, or always
+archive_mode = off		# enables archiving; off, on, or always
-#restore_command = ''		# command to use to restore an archived WAL file
+restore_command = 'wal-g wal-fetch %f %p'
```

  • DO NOT SET any of the other recovery_* config(s); those are meant to be set during recovery and will mess things up when setting up a standby.
  • Start the postgres service (the container, if running via container).
  • At this point it may take a while, depending on how long ago the base backup was taken and how many WALs it needs to catch up on. Once it catches up, it'll remain in standby mode.
  • Remaining in standby mode means postgres keeps looking for the next WAL segment, which has probably not yet been pushed by the primary's archive_command; as soon as the next segment is available, it loads it in.
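Concretely, the standby setup on the host boils down to something like this (a sketch, reusing $PATH_TO_PGDATA from the fetch step):

```sh
touch "$PATH_TO_PGDATA/standby.signal"   # marker file: start in standby mode
# plus the postgresql.conf changes above (archive_mode = off, restore_command = 'wal-g wal-fetch %f %p'),
# then start the postgres service/container and watch the logs while it replays WALs
```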
Promote as needed

  • Once you know that the "current" primary is no longer going to push new WAL logs/data, you can simply shut down the primary and promote the standby as needed.
  • You can promote a standby by running pg_ctl promote or calling pg_promote(). This also deletes the standby.signal file. If you're running postgres in a container, exec into the container as the postgres user and run pg_ctl promote (see the sketch below).
  • After promoting, you may want to remove/comment out the restore_command config; it won't be picked up by postgres anyway, but better to comment it out IMO.
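When postgres runs in a container, promotion looks something like this (a sketch; the container name is a placeholder):

```sh
# exec into the container as the postgres user and promote
docker exec -it -u postgres <container_name> bash -c 'pg_ctl promote -D "$PGDATA"'
# or, from any client connection:
psql -U postgres -c "SELECT pg_promote();"
```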

Standby Servers (Sync) (ONLY via Streaming Replication) (TODO)

  • Sync replication happens at the commit level.
  • Synchronous standby servers cannot work without the postgres streaming replication protocol, hence this won't work with WAL archival.
  • These can be either warm or hot.
  • I've not done this yet.

Crash Recovery (PITR)

  • PITR, as I understand it, is not one single thing. It's the combination of the different recovery targets, the mechanisms for doing the backups, the concept of the postgres timeline, etc. With timeline IDs, it is possible for the same LSN or the same WAL segments to exist in multiple timelines.
  • When using a recovery_* target other than recovery_target (and depending on recovery_target_action), you can use pg_wal_replay_resume() to play/pause the recovery process, e.g. to manually check whether some data has been recovered, a bit like stepping through with a debugger. Usually this is not needed; simply using recovery_target is enough for me. (See the sketch after this list.)
  • During recovery mode, if recovery is terminated because of an external error, the server can simply be restarted and it will continue recovery.
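The pause/resume machinery referred to above is just the stock postgres functions; a sketch:

```sh
psql -U postgres -c "SELECT pg_is_wal_replay_paused();"  # is replay currently paused?
psql -U postgres -c "SELECT pg_wal_replay_resume();"     # continue replay after poking around
```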

via Streaming Replication (TODO)

  • I've not done this yet.
  • I don't completely understand this scenario, because after recovery the recovering pg instance would implicitly be promoted, but doing this via streaming replication means the primary is live. So in which case do we even need this?

via WAL Archival (PITR)

Differences with standby mode/servers

This is similar to standby servers, with the following gotchas/differences (there'll be more, of course):

  • The settings (postgresql.conf) here are only needed for the duration of the restore; once done, they should be reset.
  • Instead of standby.signal, we add recovery.signal.
  • Unlike standby mode, where the server stays in standby until promoted, in archival/crash recovery the postgres server gets implicitly promoted as soon as recovery succeeds.
  • Since recovery mode does implicit promotion, you cannot really run multiple recoveries; if that's a use case, you probably need a standby instead of a recovery. See this issue for more info.
Fetch the base backup

This is the same as for standby servers.

Make changes to postgresql.conf and $PGDATA for recovery

  • Add recovery.signal inside $PGDATA; this tells postgres to start in recovery mode.
    • The server will enter targeted recovery and keep replaying until the end of archived WAL (or the recovery_target) is reached, after which it'll implicitly promote itself / do whatever recovery_target_action is set to.
    • The recovery_* configuration controls how this behaves (e.g. you can choose an earlier stopping point than the end of WAL).
  • Update recovery settings in postgresql.conf:

```diff
--- a/workloads/nomad/postgres/postgresql.conf
+++ b/workloads/nomad/postgres/postgresql.conf
-archive_mode = on      # enables archiving; off, on, or always
+archive_mode = off     # enables archiving; off, on, or always
-#restore_command = ''  # command to use to restore an archived WAL file
+restore_command = 'wal-g wal-fetch %f %p'
```

  • After adding these changes, one should be able to successfully recover from the WAL archive (a quick check follows below).
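To tell whether the server is still replaying or has already promoted itself, pg_is_in_recovery() is handy; a sketch:

```sh
# 't' while WAL is still being replayed, 'f' once the server has promoted itself
psql -U postgres -c "SELECT pg_is_in_recovery();"
```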

Crash Recovery + Standby (TODO)

From PostgreSQL: Documentation: 16: 20.5. Write Ahead Log:

“To start the server in targeted recovery mode, create a file called recovery.signal in the data directory. If both standby.signal and recovery.signal files are created, standby mode takes precedence. Targeted recovery mode ends when the archived WAL is fully replayed, or when recovery_target is reached. In this mode, the parameters from both this section and Section 20.5.6 will be used.”

This use case is somewhat tricky to me, and it can be combined with other things like streaming replication as well.

Related concepts

Failover

See PostgreSQL: Documentation: 16: 27.3. Failover. I am just dumping important points from there plus some personal notes.

  • STONITH (Shoot The Other Node In The Head): If the primary server fails and the standby server becomes the new primary, and then the old primary restarts, you must have a mechanism for informing the old primary that it is no longer the primary. This is necessary to avoid situations where both systems think they are the primary, which will lead to confusion and ultimately data loss.
  • PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server.
  • To trigger failover of a log-shipping standby server, run pg_ctl promote or call pg_promote().

Promotion & Timeline

  • A successful promotion puts the server on a new timeline; this is true for both crash recovery (implicit promotion) and standby mode.
  • Each timeline has its own file naming scheme, so they can't overlap.
  • Once promotion is done, you cannot go back. Basically, do not promote a standby to primary if you want that standby to still be able to apply WALs from another primary.
  • There should be only one valid "current" timeline.
  • SELECT timeline_id FROM pg_control_checkpoint();
  • For recovery:
    • postgres implicitly (and automatically) gets promoted, and the recovery.signal file is deleted on successful restore, unless recovery_target_action is set to shutdown.
  • For standby:
    • postgres will NOT automatically promote anything; you'd need to manually run pg_ctl promote.

Whether to set archive_mode on a recovery/standby server

There is some confusion around whether to keep archive_mode on when running a recovery/standby server.

  • To be safe, I unset archive_command or set archive_mode=off on the standby/recovery postgres instance.
    • I don't want it pushing via archive_command immediately after getting restored (I'd want to verify things are working first).
  • After I verify, I manually re-set archive_command and re-trigger a base backup from the new replica/recovered instance.
  • I am not sure what the best approach is; this is just something that works for me.
  • Confusion: there's an instance where setting archive_command messed things up, but also a discussion recommending against disabling it. (I think the discussion is about disabling archive_command on the primary, whereas in my case I am talking about not having archive_command on the restore/standby instance.)

WAL generation

  • WAL segment size is 16MB by default (fixed at initdb time).
  • What happens when archive_mode is set to on?
    • high iops?

Other notes/references

Aside on Backup ecosystem in Postgres (TODO)

This section is todo, I’ll update later.

Now, I observe the postgres community from the sidelines and I don't think I have any "real" postgres experience; however, this is what I've figured out from reading the manual and the opinions of other folks on the interwebs.

“pgBackRest is more conservative and kind of reliable, wal-g is more performant, both are much better than Barman”

I won't mention who said it; my real surname is Barman (check my github, fr) and I am not offended.

Replication vs Backup

Everything works except archive_command!

This is a short debugging story that finally had a happy ending. :)

Creating the plain postgres container image from nixpkgs using dockerTools was simple. It builds off nixpkgs 24.05, which has a pinned postgres version (postgres 16.4.x, built against glibc 2.39), but I was already running postgres using the official Debian-based docker image. When I deployed the custom image to replace the official one, I got a collation version mismatch similar to this issue. I initially thought it was related to locale, but later confirmed it was related to the glibc version the postgres binary was compiled against and run with. The usually suggested way out of this is to REINDEX, but my tables didn't even have a backup yet, so I didn't want to take any risk whatsoever.

```
WARNING:  database "<name>" has a collation version mismatch
DETAIL:  The database was created using collation version 2.36, but the operating system provides version 2.39.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE <name> REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
```

There were many ways to fix this, but I wanted to keep things simple, so I just brought back a previously failed attempt: instead of building from nixpkgs, I'd use fromImage with the official postgres image as the base and install nixpkgs wal-g on top of it.

This initially failed because of this issue (the docker_entrypoint.sh in the official pg image has a /usr/bin/env shebang), but we resolved it using dockerTools.buildLayeredImage's fakeRootCommands. So now everything is working!

The old official postgres image is replaced by the custom docker image, there's no collation mismatch, and I am able to exec into the container where postgres is running and access the wal-g binary on $PATH. All G!

Now things were straightforward: I just had to set archive_command to wal-g wal-push %p and I'd be good. So I did that.

I got hit by the dreaded boi, exit code 127:

```
archive command failed with exit code 127
```

This drove me nuts. Initially I thought the forked archiver process was somehow not able to access wal-g, so I tried a plain cp as the archive_command, as that's the simplest. That was failing too!

Then I tried:

  • archive_command = 'echo test > /tmp/archive_test.txt' : fail
  • archive_command = 'true' : fail!

Now, since /bin/true was failing, I realized I might have fu*ked up big time. To confirm that this was in fact my image, I tried setting archive_command = 'true' in the official postgres image, and it worked. So I was sure that whatever was happening was my doing.

After some googling and Claude, I found that you can trigger archival on demand by manually doing a WAL switch with SELECT pg_switch_wal();. So now I could observe the postgres process at the syscall level exactly when it happens and not get lost in a sea of syscalls. Let's bring in our old friend strace.

  • docker inspect -f '{{.State.Pid}}' <container id> : get the pid of the running postgres container (main process)
  • sudo strace -f -p <pid> -s 1024 -o /tmp/postgres_strace.log
  • Then manually trigger the WAL switch (and hence archive_command) and exit out of strace.
  • Inspecting the log, I see:

```
1053745 execve("/bin/sh", ["sh", "-c", "true"], 0xaaaaecdfef10 /* 65 vars */ <unfinished ...>
1053746 close(3)                        = 0
1053745 <... execve resumed>)           = -1 ENOENT (No such file or directory)
1053746 signalfd4(-1, [URG], 8, SFD_CLOEXEC|SFD_NONBLOCK <unfinished ...>
1053745 exit_group(127)                 = ?
```
  • It's not able to find true! Upon looking it up, /bin/true exists. Hmm.
  • Look closer.
  • It's not able to find sh! Upon looking it up, /bin/sh does not exist :)
  • This is the same nix issue as not finding env, mentioned previously.

So I added the fix in a similar manner, using ln -sfn "${pkgs.bash}/bin/sh" /bin/sh, and things finally started working.
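For reference, the shape of the fix in the image definition is roughly this (a sketch of the fakeRootCommands approach; paths are relative to the image root here, whereas with enableFakechroot you'd use absolute ones; see ./flake.nix for the real thing):

```nix
pkgs.dockerTools.buildLayeredImage {
  # ... name, tag, fromImage, contents elided ...
  fakeRootCommands = ''
    mkdir -p ./bin ./usr/bin
    ln -sfn ${pkgs.bash}/bin/sh ./bin/sh            # archive_command is executed via `sh -c '...'`
    ln -sfn ${pkgs.coreutils}/bin/env ./usr/bin/env # the entrypoint's shebang needs env
  '';
}
```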

Previous attempts

Creating normal Dockerfile (official docker image)

This image has an extension that I use, but doesn't have wal-g in it.

```dockerfile
FROM postgres:16.2-bookworm AS builder

RUN apt-get update \
    && apt-get -y upgrade \
    && apt-get install -y --no-install-recommends \
    curl ca-certificates git \
    build-essential libpq-dev postgresql-server-dev-all
RUN update-ca-certificates

WORKDIR /srv
RUN git clone https://github.com/fboulnois/pg_uuidv7.git .
RUN for v in `seq 16`; do pg_buildext build-$v $v; done

# create tarball and checksums
RUN cp sql/pg_uuidv7--1.5.sql . && TARGETS=$(find * -name pg_uuidv7.so) \
  && tar -czvf pg_uuidv7.tar.gz $TARGETS pg_uuidv7--1.5.sql pg_uuidv7.control \
  && sha256sum pg_uuidv7.tar.gz $TARGETS pg_uuidv7--1.5.sql pg_uuidv7.control > SHA256SUMS

FROM postgres:16.2-bookworm AS runner

COPY --from=builder /srv/pg_uuidv7.tar.gz /srv/SHA256SUMS /srv/
COPY --from=builder /srv/${PG_MAJOR}/pg_uuidv7.so /usr/lib/postgresql/${PG_MAJOR}/lib
COPY --from=builder /srv/pg_uuidv7.control /usr/share/postgresql/${PG_MAJOR}/extension
COPY --from=builder /srv/pg_uuidv7--1.5.sql /usr/share/postgresql/${PG_MAJOR}/extension
```

Using dockerTools.fromImage (official docker image)

Update! I think I've found the issue mentioned in the following text; it's related to the shebang at the top of the entrypoint file. Either way, I have dropped the idea of using fromImage for now and am going all in on building it from nix derivations.

Inconsistent treatment of /usr/bin/env in build sandbox vs. NixOS · Issue #1205 · NixOS/nix · GitHub

Update 2! Because I was having difficulty building pg 16.2 from nixpkgs, I ended up using this approach after all, as I had an existing db on pg 16.2, faced the collation issue due to the glibc version, and didn't want to REINDEX yet.

```nix
# file: flake.nix
let
  # nix run nixpkgs#nix-prefetch-docker -- postgres --image-tag 16.2-bookworm --arch amd64 --os linux
  pg_amd64 = pkgs.dockerTools.pullImage {
    imageName = "postgres";
    imageDigest = "sha256:4aea012537edfad80f98d870a36e6b90b4c09b27be7f4b4759d72db863baeebb";
    sha256 = "1rizfs2f6l834cgym0jpp88g3r3mcrxn9fd58np91ny9fy29zyck";
    finalImageName = "postgres";
    finalImageTag = "16.2-bookworm";
    os = "linux";
    arch = "amd64";
  };
in {
  packages = {
    nix_postgres_docker = pkgs.dockerTools.buildLayeredImage {
      name = builtins.getEnv "IMAGE_NAME";
      tag = builtins.getEnv "IMAGE_TAG";
      fromImage = pg_amd64; # TODO make conditional
      contents = with pkgs; [ cacert postgresql16Packages.pg_uuidv7 ];
      config = {
        Cmd = [ "postgres" ];
        Entrypoint = [ "docker-entrypoint.sh" ];
      };
    };
  };
}
```

While this seems like it works on the face of it, unfortunately (or fortunately) it doesn't. @NoobZ and @ManoftheSea from the unofficial NixOS discord channel talked me out of this and I stopped trying to make it work. If someone is interested in making it work, here is where I got stuck: "no such file or directory".

```
λ just docker-build # success
λ just docker-load # success
λ just docker-run # fail
exec /usr/local/bin/docker-entrypoint.sh: no such file or directory
error: Recipe `docker-run` failed on line 36 with exit code 1
```

I am not sure why this happened; the arch is what it should be, and the base image is a regular docker image, so it should not be NixOS FHS issues.

Trying to install nixpkgs pg extensions into the correct location in such an image would've been a real hassle.

To explore later

  • [ ] Streaming replication
  • [ ] Compression of WAL logs, especially when archive_timeout fires before a WAL segment hits the 16MB size. It seems like mostly-empty WAL segments (due to an early archive_timeout) still consume the full 16MB.