Postgres fails to start : Read-only file system (podman)
daufinsyd opened this issue · 10 comments
Hi :)
I tried to deploy dyrector-io on the following system:
podman 3.4.2
go version go1.19.5 linux/amd64
Debian 11
using the following command:
DOCKER_HOST=unix:///var/run/podman/podman.sock go/bin/dyo --disable-podman-checks up
(without --disable-podman-checks it fails:
10:43AM FTL Podman command execution error error="exit status 125"
though even with --debug I can't get more verbose output than that)
The pods start but postgres fails with:
initdb: error: could not change permissions of directory "/var/lib/postgresql/data": Read-only file system
An inspect gives the following:
"Mounts": [
{
"Type": "volume",
"Name": "dyrectorio-stack_kratos-postgres-data",
"Source": "/var/lib/containers/storage/volumes/dyrectorio-stack_kratos-postgres-data/_data",
"Destination": "/var/lib/postgresql/data",
"Driver": "local",
"Mode": "",
"Options": [
"nosuid",
"nodev",
"rbind"
],
"RW": false,
"Propagation": "rprivate"
}
],
Could it occur because of "RW": false?
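The check in question can be scripted. Below is a minimal Python sketch, assuming the input is the JSON array printed by `podman inspect <container>` and that each entry carries a `Mounts` list shaped like the excerpt above. The function name `read_only_mounts` and the container name `example-container` are hypothetical and used only for illustration.

```python
import json


def read_only_mounts(inspect_output: str) -> list[tuple[str, str]]:
    """Return (container name, destination) pairs for mounts that are not writable.

    Assumes `inspect_output` is the JSON array printed by `podman inspect`.
    """
    found = []
    for container in json.loads(inspect_output):
        for mount in container.get("Mounts", []):
            # Assumption: a missing "RW" key is treated as writable.
            if not mount.get("RW", True):
                found.append((container.get("Name", "?"), mount["Destination"]))
    return found


# Sample input reduced to the fields used here; "example-container" is a placeholder.
sample = """
[
  {
    "Name": "example-container",
    "Mounts": [
      {
        "Type": "volume",
        "Name": "dyrectorio-stack_kratos-postgres-data",
        "Destination": "/var/lib/postgresql/data",
        "RW": false
      }
    ]
  }
]
"""

print(read_only_mounts(sample))
```

This only reports what podman recorded for the mount; it does not explain why the volume ended up read-only.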
Hi @daufinsyd! 👋
Yes, that looks suspicious. A lot of things could be going on here.
From podman's docs:
By default, the volumes are mounted read-write. See examples.
Out of curiosity: Do you intentionally use a non-rootless setup of podman? I am asking because that somewhat goes against its killer feature.
How does your podman installation differ from a default one? What changes were introduced?
Hi @nandor-magyar
Thank you for your kind response :)
I installed podman using the official deb repo for Debian 11 (since the podman version from the official repo is pretty outdated).
Apart from that I didn't change much; I added a registry and that should be about it.
Out of curiosity: Do you intentionally use a non-rootless setup of podman? I am asking because that somewhat goes against its killer feature.
Yes, I didn't have much time to set it up in rootless mode (plus some containers don't like running in non-root mode).
Is there a way to check what dyo tries to do? The --debug option isn't very verbose.
I'll try with another VM.
I created a brand new Debian 11 VM and installed podman from the backports.
It fails without --disable-podman-checks,
and with the option it fails a bit later:
5:06PM FTL error="failde to create networks"
I guess podman 3.0.1 is too old for dyo.
With podman 3.4.2 (from stable podman repo: http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_11/)
I got the same RO error for postgres.
I noticed that the /etc/containers/policy.json for golang conflicts with the file from podman (containers-common package).
Though I don't think it's the culprit:
{
"default": [
{
"type": "insecureAcceptAnything"
}
],
"transports":
{
"docker-daemon":
{
"": [{"type":"insecureAcceptAnything"}]
}
}
}
What is the minimum version of podman required by dyo?
Maybe up to a specific version podman mounts the volumes as RO.
To the best of my knowledge, podman has to be at least 4.0 with the aardvark-dns module, since we rely on some of the networking changes introduced there (netavark). I highly recommend using a more recent version of podman, for the security fixes and the new features.
deb https://download.opensuse.org/repositories/home:/alvistack/Debian_11/ /
You can use this repo; I'm using it myself on one of my VPS machines. For the record, I have not tried the dyrectorio stack with this repo, but it contains everything we need.
The --disable-podman-checks flag is mainly used in our pipeline, where podman is not available and the checks would fail otherwise.
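The version requirement mentioned above (at least 4.0) can be checked before starting the stack. The following is a rough Python sketch, not how dyo itself performs its checks: it assumes text in the style printed by `podman version` and takes the first dotted number it finds. The names `parse_version`, `is_supported`, and `MINIMUM` are hypothetical.

```python
import re

# Taken from the comment above: "at least 4.0".
MINIMUM = (4, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from `podman version` style output."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported(text: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    """Compare the parsed version against the minimum, component by component."""
    return parse_version(text) >= minimum


for line in ("Version: 3.0.1", "Version: 3.4.2", "Version: 4.4.1"):
    print(line, "->", "ok" if is_supported(line) else "too old")
```

Comparing tuples of integers avoids the pitfalls of string comparison, where "10.0" would sort before "4.0".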
Thank you, indeed that was the issue!
Maybe a note in the readme, or having podman-checks respond with an "incompatible version" message, would be really helpful :)
May I ask whether the dyrector-io podman network should have the following option
"options": {
"isolate": "true"
},
? (I'm now fighting with it)
Version checks for Podman, as well as for Docker, should land in the coming weeks.
I'm not sure whether it should or not, but I have the same option in my Podman network and it works just fine.
Please describe in detail what kind of issues you are experiencing, so we can investigate better.
(I just don't want to mix the bug reports)
It seems that the ports exposed by dyo can't be reached through the host interface (using the pod's IP from podman does work).
To be sure, I created an httpd container in the same dyrectorio-stack network, and it can be reached.
$ curl localhost:8082
<html><body><h1>It works!</h1></body></html>
But for dyo Containers it doesn't work:
curl localhost:8080 -vi
* Trying ::1:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
(I tried other ports too: 3000, 8000, ...)
This does work (I omitted the redirect, but it's traefik's page):
$ curl 10.89.0.10:8080 -vi
* Trying 10.89.0.10:8080...
* Connected to 10.89.0.10 (10.89.0.10) port 8080 (#0)
> GET / HTTP/1.1
> Host: 10.89.0.10:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< Location: http://10.89.0.10:8080/dashboard/
Location: http://10.89.0.10:8080/dashboard/
< Date: Mon, 27 Feb 2023 13:48:40 GMT
Date: Mon, 27 Feb 2023 13:48:40 GMT
< Content-Length: 17
Content-Length: 17
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 10.89.0.10 left intact
I set iptables to allow all:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
(For reference, these are the rules created by default; with them it doesn't work either, except for httpd.)
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
NETAVARK_ISOLATION_1 all -- anywhere anywhere
NETAVARK_FORWARD all -- anywhere anywhere /* netavark firewall plugin rules */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain NETAVARK_FORWARD (1 references)
target prot opt source destination
ACCEPT all -- anywhere 10.89.0.0/24 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.89.0.0/24 anywhere
ACCEPT all -- anywhere 10.88.0.0/16 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.88.0.0/16 anywhere
Chain NETAVARK_ISOLATION_1 (1 references)
target prot opt source destination
NETAVARK_ISOLATION_2 all -- anywhere anywhere
Chain NETAVARK_ISOLATION_2 (1 references)
target prot opt source destination
DROP all -- anywhere anywhere
Ports are exposed on the host:
a4aacd629d9a docker.io/library/traefik:v2.9 --log.level=INFO ... 17 minutes ago Up 17 minutes 0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp dyrectorio-stack_traefik
...
I must say that I'm a bit lost. If it works on your side, the issue must be somewhere in my configuration, but I really can't see any clues. I would really appreciate your insights.
Btw, I'm now on podman
Version: 4.4.1
API Version: 4.4.1
Go Version: go1.20.1
Port 8000 should expose traefik, which seems to work.
* Trying ::1:8080...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
This message can be misleading, since the "connection refused" applies only to IPv6. If this is all the output you get there, I suggest checking the crux-ui service, which is bound to port 3000 by default; traefik routes almost everything there. One of the exceptions is the crux backend service, which can be checked by curling the /api/status path, because /api routes to this service (exposed on ports 5000 and 5001, where 5001 is the API).
If you are more comfortable with docker compose files, you can check out PR #483, where you can make a public-facing instance with that change.
Yes, that was because I used localhost instead of its IPv4 notation.
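To see which address families are involved when a name like localhost is used, each resolved address can be probed separately. This is a small Python sketch using only the standard `socket` module; the function name `probe` is hypothetical, and which addresses a name resolves to depends on the local resolver configuration.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> list[tuple[str, str]]:
    """Try a TCP connection to every address `host` resolves to.

    Returns (address, outcome) pairs in the order the resolver returned them.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except OSError as exc:
            # Covers refused connections, timeouts and unsupported address families.
            outcome = f"failed: {exc}"
        results.append((sockaddr[0], outcome))
    return results


# Example call (the result depends on the machine it runs on):
# for address, outcome in probe("localhost", 8080):
#     print(address, outcome)
```

A "connected" result only means the TCP handshake succeeded; it says nothing about whether the service behind the port ever sends an HTTP response.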
The stack itself seems fine:
curl http://10.89.0.17:3000/api/status
{"crux":{"status":"operational","version":"0.3.4","lastMigration":"20230220120015_token_constraint"},"kratos":{"status":"operational","version":"v0.11.0"},"database":{"status":"operational","version":"20230220120015"},"app":{"status":"operational","version":"0.3.4"}}
curl http://127.0.0.1:3000/api/status -vi
* Trying 127.0.0.1:3000...
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /api/status HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.74.0
> Accept: */*
>
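The `/api/status` body shown above is plain JSON, so a health check can be reduced to a few lines. This Python sketch assumes every top-level entry has a `status` field, as in that response; the function name `degraded_components` is hypothetical.

```python
import json


def degraded_components(status_body: str) -> list[str]:
    """Return the names of components whose status is not "operational"."""
    status = json.loads(status_body)
    return [
        name
        for name, info in status.items()
        if info.get("status") != "operational"
    ]


# Body copied from the response above.
body = (
    '{"crux":{"status":"operational","version":"0.3.4",'
    '"lastMigration":"20230220120015_token_constraint"},'
    '"kratos":{"status":"operational","version":"v0.11.0"},'
    '"database":{"status":"operational","version":"20230220120015"},'
    '"app":{"status":"operational","version":"0.3.4"}}'
)

print(degraded_components(body))
```

An empty list means every component reported itself as operational.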
I'll check the PR you mentioned.
Indeed, I'm not knowledgeable about Go, but with the compose file I can experiment by trial and error if it still doesn't work.
I think we can close the issue, since the DB works with a more recent podman version. I can create another one should I find a bug, but given that it's working for you, the issue must be somewhere on my end.
Anyway, thank you very much for your help !
All right, do not hesitate to open another issue if you need help or have any questions! Good luck!