dyrector-io/dyrectorio

Postgres fails to start : Read-only file system (podman)

daufinsyd opened this issue · 10 comments

Hi :)

I tried to deploy dyrector-io on the following system:

podman 3.4.2
go version go1.19.5 linux/amd64
Debian 11

using the following command

DOCKER_HOST=unix:///var/run/podman/podman.sock go/bin/dyo --disable-podman-checks up

(without --disable-podman-checks it fails:
10:43AM FTL Podman command execution error error="exit status 125"
though even with --debug I can't get more verbose output than that)

The pods start but postgres fails with:

initdb: error: could not change permissions of directory "/var/lib/postgresql/data": Read-only file system

An inspect gives the following:

        "Mounts": [
            {
                "Type": "volume",
                "Name": "dyrectorio-stack_kratos-postgres-data",
                "Source": "/var/lib/containers/storage/volumes/dyrectorio-stack_kratos-postgres-data/_data",
                "Destination": "/var/lib/postgresql/data",
                "Driver": "local",
                "Mode": "",
                "Options": [
                    "nosuid",
                    "nodev",
                    "rbind"
                ],
                "RW": false,
                "Propagation": "rprivate"
            }
        ],

Could it occur because of "RW": false?
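A quick way to spot such mounts without reading the whole inspect dump is sketched below. It is only an illustration, assuming the inspect output is a JSON array of container objects, each with a "Mounts" list like the one shown above; the function name `read_only_mounts` and the trimmed sample are made up for this example.

```python
import json
from typing import List


def read_only_mounts(inspect_output: str) -> List[str]:
    """Return the destinations of all mounts whose "RW" flag is false."""
    containers = json.loads(inspect_output)
    destinations = []
    for container in containers:
        for mount in container.get("Mounts", []):
            # A missing "RW" key is treated as read-write (assumption).
            if not mount.get("RW", True):
                destinations.append(mount["Destination"])
    return destinations


# Trimmed sample modelled on the inspect output quoted in this issue.
sample = json.dumps([
    {
        "Mounts": [
            {
                "Type": "volume",
                "Name": "dyrectorio-stack_kratos-postgres-data",
                "Destination": "/var/lib/postgresql/data",
                "RW": False,
            }
        ]
    }
])

print(read_only_mounts(sample))
```

In practice the string would come from saving the inspect output of the postgres container to a file and reading it back.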

Hi @daufinsyd! 👋

Yes, that looks suspicious, though several things could cause it.
From podman's docs:

By default, the volumes are mounted read-write. See examples.

Out of curiosity: Do you intentionally use a non-rootless setup of podman? I am asking because that somewhat goes against its killer feature.

How does your podman installation differ from a default one? What changes did you make?

Hi @nandor-magyar
Thank you for your kind response :)

I installed podman from the official podman deb repo for Debian 11 (since the podman version in Debian's own repo is pretty outdated).
Apart from that I didn't change much: I added a registry, and that should be about it.

Out of curiosity: Do you intentionally use a non-rootless setup of podman? I am asking because that somewhat goes against its killer feature.

Yes, I didn't have much time to set it up in rootless mode (plus some containers don't like running in non-root mode).

Is there a way to check what dyo tries to do? The --debug option isn't very verbose.
I'll try with another VM.

I created a brand new Debian 11 VM and installed podman from backports.
It fails without --disable-podman-checks,
and with the option it fails a bit later:

5:06PM FTL error="failde to create networks"

I guess podman 3.0.1 is too old for dyo.

With podman 3.4.2 (from the stable podman repo: http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_11/), I got the same read-only error for postgres.

I noticed that /etc/containers/policy.json for golang conflicts with the file from podman (containers-common package),
though I don't think it's the culprit:

{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ],
    "transports":
        {
            "docker-daemon":
                {
                    "": [{"type":"insecureAcceptAnything"}]
                }
        }
}

What is the minimum version of podman required by dyo?
Maybe podman mounted volumes read-only up until a specific version.

To the best of my knowledge, podman has to be at least 4.0 with the aardvark-dns module, since we rely on some of the networking changes it introduced (netavark). I highly recommend using a more recent version of podman anyway, for the security fixes and new features.

deb https://download.opensuse.org/repositories/home:/alvistack/Debian_11/ /

You can use this repo; I'm using it myself on one of my VPS machines. For the record, I have not tried the dyrectorio stack with this repo, but it contains everything we need.

The --disable-podman-checks flag is mainly used in our pipeline, where podman is not available and the checks would fail otherwise.
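For anyone who wants to check a host before running dyo, the version requirement above can be sketched as a small comparison. This is not how dyo's own checks work; it is a minimal illustration that assumes the version string has the form `podman version X.Y.Z` and takes the 4.0 minimum from the comment above. The names `MIN_PODMAN`, `parse_podman_version` and `is_supported` are made up for this example.

```python
import re
from typing import Tuple

# Minimum taken from the comment above ("at least 4.0"); an assumption,
# not a value read from dyo itself.
MIN_PODMAN = (4, 0, 0)


def parse_podman_version(text: str) -> Tuple[int, ...]:
    """Extract the first X.Y.Z triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(text: str) -> bool:
    """Compare the parsed version against the assumed minimum."""
    return parse_podman_version(text) >= MIN_PODMAN


# The three versions that appear in this issue.
for reported in ("podman version 3.0.1",
                 "podman version 3.4.2",
                 "podman version 4.4.1"):
    verdict = "ok" if is_supported(reported) else "too old"
    print(f"{reported} -> {verdict}")
```

Comparing integer tuples rather than strings avoids mistakes such as treating "10.0.0" as older than "4.0.0".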

Thank you, that was indeed the issue!

Maybe a note in the readme, or having the podman checks respond with an "incompatible version" message, would be really helpful :)

May I ask if you know whether the dyrector-io podman network should have the following option?

          "options": {
               "isolate": "true"
          },

(I'm now fighting with it.)

Version checks for Podman, as well as for Docker, should land in the coming weeks.

I'm not sure whether it should, but I have the same option in my Podman network and it works just fine.
Please describe in detail what kind of issues you are experiencing, so we can investigate better.

(I just didn't want to mix bug reports.)

It seems that the ports exposed by dyo can't be reached through the host interface (using the Pod's IP from podman does work).

To be sure, I created an httpd container in the same dyrectorio-stack network, and it can be reached:

$ curl localhost:8082
<html><body><h1>It works!</h1></body></html>

But for the dyo containers it doesn't work:

curl localhost:8080 -vi
*   Trying ::1:8080...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>

(I tried other ports too: 3000, 8000, ...)

This does work (I omitted the redirection, but it's traefik's page):

$ curl 10.89.0.10:8080 -vi
*   Trying 10.89.0.10:8080...
* Connected to 10.89.0.10 (10.89.0.10) port 8080 (#0)
> GET / HTTP/1.1
> Host: 10.89.0.10:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
< Location: http://10.89.0.10:8080/dashboard/
Location: http://10.89.0.10:8080/dashboard/
< Date: Mon, 27 Feb 2023 13:48:40 GMT
Date: Mon, 27 Feb 2023 13:48:40 GMT
< Content-Length: 17
Content-Length: 17
< Content-Type: text/plain; charset=utf-8
Content-Type: text/plain; charset=utf-8

<
* Connection #0 to host 10.89.0.10 left intact

I set iptables to allow all:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

(For reference, these are the rules created by default; with them it also doesn't work, except for httpd.)

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
NETAVARK_ISOLATION_1  all  --  anywhere             anywhere
NETAVARK_FORWARD  all  --  anywhere             anywhere             /* netavark firewall plugin rules */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain NETAVARK_FORWARD (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             10.89.0.0/24         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.89.0.0/24         anywhere
ACCEPT     all  --  anywhere             10.88.0.0/16         ctstate RELATED,ESTABLISHED
ACCEPT     all  --  10.88.0.0/16         anywhere

Chain NETAVARK_ISOLATION_1 (1 references)
target     prot opt source               destination
NETAVARK_ISOLATION_2  all  --  anywhere             anywhere

Chain NETAVARK_ISOLATION_2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere

Ports are exposed on the host:

a4aacd629d9a  docker.io/library/traefik:v2.9                     --log.level=INFO ...  17 minutes ago  Up 17 minutes  0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp            dyrectorio-stack_traefik
...

I must say that I'm a bit lost. If it works on your side, the issue must be somewhere in my configuration, but I really can't see any clues. I would really appreciate your insights.

Btw, I'm now on podman:

Version:      4.4.1
API Version:  4.4.1
Go Version:   go1.20.1

The port 8000 should expose traefik, which seems to work.

*   Trying ::1:8080...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>

This message can be misleading, since the "Connection refused" applies only to IPv6. If this is all the output you get, I suggest checking the crux-ui service, which is bound to port 3000 by default; traefik routes almost everything there. One exception is the crux backend service, which can be checked by curling the /api/status path, because /api routes to this service (exposed on ports 5000 and 5001, where 5001 is the API).

If you are more comfortable with docker compose files, you can check out PR #483, which lets you set up a public-facing instance.
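To see the IPv4/IPv6 difference explicitly instead of reading it out of curl's verbose log, a per-address probe along these lines could help. It is a rough sketch using only Python's standard `socket` module; the function name `probe` is made up here, and a successful TCP connect only shows that something accepted the connection, not that the service behind it answers HTTP.

```python
import socket
from typing import Dict


def probe(host: str, port: int, timeout: float = 2.0) -> Dict[str, str]:
    """Try a TCP connect to every address that `host` resolves to."""
    results = {}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # e.g. "::1" or "127.0.0.1"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[address] = "connected"
        except OSError as exc:
            # Covers refused connections, timeouts and unsupported families.
            results[address] = f"failed: {exc}"
    return results
```

Calling `probe("localhost", 8080)` on a host like the one in this issue would be expected to list `::1` as failed and `127.0.0.1` as connected, matching the curl output quoted above.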

Yes, that was because I used localhost instead of its IPv4 notation.
The stack itself seems fine:

curl http://10.89.0.17:3000/api/status
{"crux":{"status":"operational","version":"0.3.4","lastMigration":"20230220120015_token_constraint"},"kratos":{"status":"operational","version":"v0.11.0"},"database":{"status":"operational","version":"20230220120015"},"app":{"status":"operational","version":"0.3.4"}}
curl http://127.0.0.1:3000/api/status -vi
*   Trying 127.0.0.1:3000...
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /api/status HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.74.0
> Accept: */*
>
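The status response above is easy to check by hand, but with more services a small helper could flag anything that is not operational. The sketch below assumes the response keeps the shape shown above (one object per service with a "status" field); `failing_services` is a made-up name, and the body is copied from the curl output in this comment.

```python
import json
from typing import List


def failing_services(status_body: str) -> List[str]:
    """Return the names of services whose status is not "operational"."""
    status = json.loads(status_body)
    return [
        name
        for name, info in status.items()
        if info.get("status") != "operational"
    ]


# Body copied from the /api/status response quoted above.
body = (
    '{"crux":{"status":"operational","version":"0.3.4",'
    '"lastMigration":"20230220120015_token_constraint"},'
    '"kratos":{"status":"operational","version":"v0.11.0"},'
    '"database":{"status":"operational","version":"20230220120015"},'
    '"app":{"status":"operational","version":"0.3.4"}}'
)

print(failing_services(body))  # prints []
```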

I'll check the PR you mentioned.
I'm not knowledgeable about Go, but with the compose file I can experiment by trial and error if it still doesn't work.
I think we can close the issue, since the DB works with a more recent version of podman. I can create another one should I find a bug, but given that it's working for you, the issue must be somewhere on my end.

Anyway, thank you very much for your help!

All right, do not hesitate to open another issue if you need help or have any questions! Good luck!