grafana/grafana-docker

Can't start Grafana on Kubernetes 1.7.14, 1.8.9, or 1.9.4

dghubble opened this issue · 27 comments

Kubernetes just patched releases to enforce that volumes mounted from ConfigMaps are read-only kubernetes/kubernetes#58720 (deplorable that this happened in a patch release). Grafana attempts to chown its data directory which might ordinarily be fine, but users are supposed to mount dashboard configs in there too. As a result, on these Kubernetes clusters, Grafana can't start:

chown: changing ownership of '/var/lib/grafana/dashboards/kubernetes-resource-requests-dashboard.json': Read-only file system
...
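For illustration, the failing layout looks roughly like this: a ConfigMap of dashboard JSON mounted inside Grafana's data directory, which the startup chown then walks into. This pod spec fragment is hypothetical (names and image tag are illustrative), but it matches the pattern described above:

```yaml
# Hypothetical pod spec fragment: dashboards ConfigMap mounted inside the data dir
containers:
  - name: grafana
    image: grafana/grafana:5.0.0
    volumeMounts:
      - name: grafana-data
        mountPath: /var/lib/grafana
      - name: dashboards                       # ConfigMap volumes are now forced read-only
        mountPath: /var/lib/grafana/dashboards
volumes:
  - name: grafana-data
    emptyDir: {}
  - name: dashboards
    configMap:
      name: grafana-dashboards
```

The recursive `chown -R` over /var/lib/grafana reaches the read-only ConfigMap mount at /var/lib/grafana/dashboards and fails.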
xlson commented

Hi @dghubble,
Thanks for reporting this issue. We have been planning on reworking the grafana docker image to not chown the directories as well as support configuring what user to run as. Hoping to get started on it soon.

We hit this issue a while ago and have been building our own image. I realize the way we build it might not work for everyone, but I think only a little work is needed to get it there.

This is where we build our image from: https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/grafana-image

@bergquist we've talked about this one before 🙂, let's make it happen and get this container not to do this.

For the time being, I rolled Grafana back to v4.6.3 since the set of mounts was different and this issue doesn't occur on Kubernetes v1.9.4. So that was the temporary fix. I'd like to stay on the official grafana image and get to v5.x.y whenever this is resolved.

poseidon/typhoon@c59a9c6

Ran into the same problem. Worked around it by setting the Grafana container command to:
["gosu", "grafana", "/usr/sbin/grafana-server", "--homepath=/usr/share/grafana", "--config=/etc/grafana/grafana.ini", "cfg:default.log.mode=console", "cfg:default.paths.data=/var/lib/grafana", "cfg:default.paths.logs=/var/log/grafana", "cfg:default.paths.plugins=/var/lib/grafana/plugins", "cfg:default.paths.provisioning=/etc/grafana/provisioning"]
(taken from the run.sh script).
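In a Kubernetes manifest, that override sits on the container spec roughly like this (the container name and image tag are illustrative; the command array is the one quoted above):

```yaml
# Hypothetical container spec fragment applying the workaround command,
# bypassing run.sh and its chown entirely
containers:
  - name: grafana
    image: grafana/grafana:5.0.0
    command: ["gosu", "grafana", "/usr/sbin/grafana-server",
              "--homepath=/usr/share/grafana",
              "--config=/etc/grafana/grafana.ini",
              "cfg:default.log.mode=console",
              "cfg:default.paths.data=/var/lib/grafana",
              "cfg:default.paths.logs=/var/log/grafana",
              "cfg:default.paths.plugins=/var/lib/grafana/plugins",
              "cfg:default.paths.provisioning=/etc/grafana/provisioning"]
```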

If you need a version of run.sh that works, you can use or copy this gist:

https://gist.github.com/kavehmz/61419af3ddc685b18553c05299d78c9d

(Only if the simpler command that @wieslaw-gat mentioned didn't work for you.)

xlson commented

I'm working on a new image to solve the issues mentioned here as well as some others. The work is happening on this branch: https://github.com/grafana/grafana-docker/tree/image-improvements

@brancz Thanks for sharing your image. I'm using it as a base for redoing the default image. Regarding the user to run Grafana as, do you feel it's better to use the nobody user or to create a grafana user with a high (and pinned) id instead? I have yet to try using the slim debian image as a base. Have you had any issues with it?

@wieslaw-gat, @kavehmz: thanks for sharing your workarounds.

xlson commented

I've just merged PR #142 which should act as a temporary fix to the chown issues while we continue working on the new image (chown errors are now ignored). I don't have a Kubernetes cluster set up, so I would love to hear if this solves your issues. There is currently no published image on Docker Hub with this fix, but the next build of master should include it in grafana/grafana:master.
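The gist of such a fix, assuming it works like a typical entrypoint tweak (I haven't reproduced PR #142's exact diff here), is to let the chown fail without aborting the startup script:

```shell
# Sketch of tolerating chown failures in an entrypoint script (assumption:
# this mirrors the approach of PR #142; paths/users as in the official image).
# The || branch keeps the script's exit status at 0 when the chown hits a
# read-only mount, instead of killing the container.
chown -R grafana:grafana /var/lib/grafana 2>/dev/null || \
  echo "WARN: could not chown /var/lib/grafana (read-only mount?)" >&2
```

With `set -e` in effect (as is common in entrypoint scripts), the `|| ...` fallback is what prevents the failed chown from terminating the script.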

@xlson - using grafana/grafana:master resolved the chown issues in my k8s cluster.
Thanks!

xlson commented

We've just released Grafana 5.0.4 with the fix, it's available from Docker Hub (grafana/grafana:5.0.4). In 5.1 we will remove chown completely.

Thanks!

siwyd commented

@xlson Will there be a fix for v4 as well by any chance?

xlson commented

@siwyd We haven't planned for it, but if it's requested we might do it. I presume your team hasn't upgraded to 5 yet then?

siwyd commented

@xlson No, but it's good to be pushed to do exactly that ;) Thanks for the consideration, but no need to on our account.

@xlson, have you by chance changed your plans and decided to back-port the fix to 4.x? thank you!

xlson commented

@zanitete Not yet, no one else has requested it. Are you stuck not being able to update?

If we were to do that we would definitely want to use semver build metadata https://semver.org/#spec-item-10 to avoid modifying old tags in ways that might break existing deployments

@xlson, let's say the update was not planned in the short term, but if nobody else requested it I can understand.

xlson commented

@DanCech agreed, good suggestion. We really don't want to break any existing installations.

@zanitete I'll look into creating an updated image, not promising anything though given current priorities.

Thanks for the quick feedback! For the moment I built a custom image for 4.3.6 using the latest version of the Dockerfile, and it seems to work fine. Here are the small changes I made to the build script; if you're interested I can create a PR. zanitete@70026bd

xlson commented

@zanitete interesting. Could you describe exactly what your use case is? I presumed that you wanted a Docker build of 4.x grafana with the container from 5.0.4 or 5.1. But it seems like what you want is the 5.1 container with the ability to choose id/gid of Grafana at build time?

What I needed was a Docker image with Grafana 4.3.6 that would include the fix for the failing chown command at startup, so that we could migrate to k8s 1.9 without having to migrate (now) to Grafana 5.1.

xlson commented

@zanitete okay. How do the changes you have made in that commit play into that issue?

Because the volumes attached to existing deployments of Grafana 4.x expect the grafana user/group ids to be 104/107, I needed to override the build args.
The changes to the build script are not strictly needed (I could have built the image without them), but if you want to use it to build a 4.x backward-compatible version of the image you need to override the default UID/GID, no?
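Under that reading, a backward-compatible 4.x build pins the ids at build time, for example (this invocation is illustrative: the GF_UID/GF_GID arg names assume the Dockerfile declares matching `ARG`s, and the tag is hypothetical):

```shell
# Hypothetical build: keep the legacy 4.x grafana uid/gid (104/107) so
# ownership on existing volumes still lines up after the upgrade.
docker build \
  --build-arg GRAFANA_VERSION=4.3.6 \
  --build-arg GF_UID=104 \
  --build-arg GF_GID=107 \
  -t my-org/grafana:4.3.6 .
```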

xlson commented

@zanitete you're quite right. That's not something we want to break when fixing the chown problem. Please send in the PR :)

xlson commented

@zanitete The one thing I'm not entirely sure about is if we really want to use the new container to fix the issue. We might just use the fix from #142 as that is much closer to the original container.

I see. I had the same doubt, but I tested the image built with the Dockerfile in master and it seems to work fine (at least for our limited use cases). Would you recommend checking out https://github.com/grafana/grafana-docker/tree/df72f7243afda7de0fc30d0d10dc00243e152706 and building from there instead? In that case my PR is not relevant, since the build script in master would only be used to build 5.1 images, right?

xlson commented

@zanitete That's the commit we'll use if we make a patch release for any previous versions of Grafana, as it is the least likely to cause issues. If the latest container works for you, go for it.

Quite right, we'll skip your PR for now. Thanks though.