grafana/grafana-docker

can't start Grafana 5.1.3 on Kubernetes 1.9.6

asubmani opened this issue · 37 comments

I am hitting an issue similar to closed issue #140.

Using AKS, Kubernetes version 1.9.6
Using AzureDisk as the PVC
Deploying via the helm chart: the Azure LB svc and PVC get deployed, but the pod deployment fails.

kubectl logs pod/grafanademo-5c4ff67949-pvcrs

GF_PATHS_DATA='/var/lib/grafana' is not writable. You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

I can't get into the container to do a chown
I tried pulling grafana:master, but got the same issue.

I am not a container expert, so I would appreciate it if someone could point me to a workaround: run the official image in Docker, patch it, and then use a YAML that pulls it from a local folder (if possible).

I have a similar issue on GKE (1.10.2). It seems it could be fixed with fsGroup or something similar. Hope the devs can help us.

Same here on k8s v1.10.1 on AWS with an EBS disk.

The problem appeared with version 5.1.0, so I deployed 5.0.0 and it worked.

It works with 5.1.3 when I use the community image monitoringartist/grafana-xxl:latest. Unfortunately, I don't know enough Docker to understand what I need to change here.
However, I am unable to see the Azure Monitor plugin even after adding it using
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} plugins install grafana-azure-monitor-datasource
Seems I have to build my own image, push it to a private repo and then try that, OR use 5.0.0.

Experiencing the same issue on K8S 1.10.2 on Bare Metal (kubespray) with a rook-block pv.

Do any maintainers have suggested steps for further troubleshooting? The log suggests that we're migrating, but I have a feeling everyone here is using a fresh install.

This problem only seems to occur when persistence is enabled. What's the story around fixing permissions on the PVC when it is enabled? It seems the image assumes it's fine not to run chown because it sets the permissions in its build script, but once you volume mount a fresh PVC that doesn't have that ownership, it becomes a problem.
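
For what it's worth, the same failure is easy to reproduce outside of Kubernetes (a rough sketch; the host path is just an example): bind mount a freshly created directory that is not owned by uid 472 and the container hits the same error.

mkdir /tmp/grafana-data                                        # freshly created, not owned by uid 472
docker run --rm -v /tmp/grafana-data:/var/lib/grafana grafana/grafana:5.1.3
# -> GF_PATHS_DATA='/var/lib/grafana' is not writable. ... mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied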

xlson commented

In Grafana 5.1 we switched to a new Docker container where all files are owned by uid/gid 472 (the grafana user/group), and the container is also started with this uid/gid. In previous versions the container started as root, changed ownership of the necessary files to the uid/gid of the grafana user, and then switched to the grafana user to run the binary.

My guess would be that the problems you are seeing are somehow related to the fact that we no longer start the container as root. If possible I would suggest trying to configure the volumes/disks to be owned by id 472. Unfortunately, I know very little of kubernetes. But I will try to dig into this on my end.

You might want to try the approach outlined at https://serverfault.com/questions/906083/how-to-mount-volume-with-specific-uid-in-kubernetes-pod to set the filesystem permissions on the PVC before the main container starts. If needed, you can also use a securityContext to specify which uid/gid Grafana should run under: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
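
For illustration, here is roughly what that could look like in a Deployment (only a sketch; the names are placeholders, the initContainer follows the serverfault answer above, and the securityContext uses the 472 uid/gid from the new image):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        runAsUser: 472   # grafana user in the 5.1+ image
        fsGroup: 472     # volume gets group ownership gid 472
      # Alternatively (or additionally), chown the volume before the main container starts:
      initContainers:
        - name: init-chown-data
          image: busybox:1.28
          command: ["chown", "-R", "472:472", "/var/lib/grafana"]
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: storage
              mountPath: /var/lib/grafana
      containers:
        - name: grafana
          image: grafana/grafana:5.1.3
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: storage
              mountPath: /var/lib/grafana
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: grafana-pvc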

Thanks @xlson and @DanCech. FYI I am not a developer and am actually new to Docker and Kubernetes :). I was using helm charts to deploy to Kubernetes as it is easy.
Since I don't have a YAML to deploy my Grafana container on Kubernetes, I am using the helm chart. I have to figure out a way to modify the official Grafana helm chart to include the chown commands you mentioned.
@RyanHartje Would you have a sample YAML I can reference, preferably one that uses PVCs?

@DanCech I feel like the suggestion of manual outside steps in order to preserve data really diminishes the value of the grafana chart.

If I figure out some way to work that out with helm, I'll open a PR for the chart though. Just wanted to point out that these recent changes make a worse story for preserving Grafana's data.

Perhaps that's not an issue though for chart users, since they can define anything they'd need to persist into their chart config.

@asubmani, DanCech's suggestion above can be done while the grafana container is in its CrashLoopBackOff or whichever failure mode it was in, but I'm just going to use an older version until I or someone else addresses this issue within the chart itself.

@RyanHartje My container is in CrashLoopBackOff. When I try to get inside it, I get the error below.

kubectl exec grafanademo-5c4ff67949-2jwgj -c grafana sh -

error: unable to upgrade connection: container not found ("grafana")

I am trying to chown -R 472:472 in the container, but can't get in as the container doesn't start.

I also added pv.beta.kubernetes.io/gid: "472" under annotations: in the persistence section of the values.yaml for the helm chart. My storage/PVC gets deployed successfully, but the pod is unable to attach it due to access issues.
I'll use 5.0.0 for now.
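
For anyone else pinning the version the same way, this is the relevant part of values.yaml for the chart (a sketch; the exact key layout depends on the chart version):

image:
  repository: grafana/grafana
  tag: 5.0.0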

As I've already mentioned above, setting securityContext seems to help.
After the containers: section, try adding a section like this:

securityContext:
    fsGroup: 472

Works for me.

@unb9rn I'm using helm, so I edited deployment.yaml inside templates/. I put securityContext after the containers: section but still get the error :(

For example:

containers:
    ....
    securityContext:
        fsGroup: 472

You can use the official image with the 5.0.0 tag and it will work. I think there is a bug with persistent data in the newer version.

I can no longer reproduce this from the most recent chart.

@RyanHartje I've just reproduced it with the most recent chart (image version is grafana/grafana:5.1.3).

@cmorent any chance you could try installing with my patch here:
https://github.com/ryanhartje/charts/tree/grafana-docker-167 ?

I think this should solve the issue, but I'm not able to confirm since I can't replicate.

What is your storage solution if you don't mind me asking?

@RyanHartje I have the same problem with the latest chart and 5.1.3. My storage solution is rook with ceph.

I'm running into the same issue here. Default stable/grafana chart using a PVC on Azure.

Same issue with PVC using Azure files

@smeeklai I'm using a helm chart; if you add the following:

 securityContext:
    runAsUser: 472
    fsGroup: 472

to the line beneath the pod spec (the first spec: after metadata:) in your template, it should work.

I was placing that in the containerSpec.
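
In other words, it goes at the pod spec level, as a sibling of containers: rather than nested inside it (a trimmed sketch):

spec:
  template:
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
      containers:
        - name: grafana
          image: grafana/grafana:5.1.3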

I opened a PR to resolve this in the chart for helm users:
helm/charts#6428

@ajmulhollan1 It does not seem to make any difference, at least when using Azure Files. Do you mount your shares with specific parameters, like gid or uid?

@brondum the 472 above is the grafana user uid:

▶ docker run --entrypoint "id" grafana/grafana
uid=472(grafana) gid=472(grafana) groups=472(grafana)

@brondum The defaults for Azure Files have been reported to be too restrictive in the past, maybe setting them to 755/644 for folders/files is possible?
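
One place those can be set is via mountOptions on the StorageClass, which also allows uid/gid (a sketch assuming the in-tree azure-file provisioner; the class name and skuName are only examples):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-grafana
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0755
  - file_mode=0644
  - uid=472
  - gid=472
parameters:
  skuName: Standard_LRS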

@RyanHartje Thanks for the tip, I have tried the mount options, but will investigate further :)

Feel free to reach out in the Kubernetes Slack if I can help.

@RyanHartje Which volume are you making persistent? My persistent volume is mounted at /var, but each time the Grafana pod gets re-created I lose all my data.

@mightwork I was using the grafana chart, which uses /var/lib/grafana
https://github.com/helm/charts/blob/master/stable/grafana/templates/deployment.yaml#L50

Having the same issue using version 5.2.2 in Azure with PVC. Rolling back to 5.0.4 until someone finds a solution.

Try adding this to your deployment:

securityContext:
  runAsUser: 0

It worked for me!

@santiagopoli That makes your pod run as a privileged user (root). The whole reason this "issue" comes up is that Grafana updated their image to follow better security practices, such as running as a non-privileged user. While your suggestion functionally works, you're making your Grafana instance much more vulnerable in the event of compromise by running as root instead of the grafana user.

spali commented

Seems to be a general problem when mounting the volume.
Mounting an empty cifs share with the cifs driver results in

GF_PATHS_DATA='/var/lib/grafana' is not writable.
You may have issues with file permissions, more information here: http://docs.grafana.org/installation/docker/#migration-from-a-previous-version-of-the-docker-container-to-5-1-or-later
mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied

But the plugin dir gets created by Grafana anyway... just empty.
No suitable workaround found so far.
Every other container I have with the same volume settings for mounting cifs shares works, but they probably haven't hardened their containers yet.

reproducible with the following docker stack compose file:

services:
  grafana:
    # Full tag list: https://hub.docker.com/r/grafana/grafana/tags/
    image: grafana/grafana:5.2.2
    environment:
      #GF_INSTALL_PLUGINS: natel-influx-admin-panel,vonage-status-panel,grafana-clock-panel,grafana-simple-json-datasource
      GF_SECURITY_ADMIN_PASSWORD: mypw
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_AUTH_DISABLE_LOGIN_FORM: 'true'
      GF_AUTH_DISABLE_SIGNOUT_MENU: 'true'
      GF_AUTH_ANONYMOUS_ENABLED: 'true'
      GF_AUTH_ANONYMOUS_ORG_NAME: 'Main Org.'
      GF_AUTH_ANONYMOUS_ORG_ROLE: 'Admin'
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: on-failure
    volumes:
      - grafana_data:/var/lib/grafana
      - grafana_conf:/etc/grafana
    ports:
      - "3000:3000"

volumes:
  grafana_data:
    driver: cifs
    driver_opts:
      share: myserver/grafana_data
      username: myuser
      password: mypw
      domain: mydomain
  grafana_conf:
    driver: cifs
    driver_opts:
      share: myserver/grafana_conf
      username: myuser
      password: mypw
      domain: mydomain

Plugins are commented out, but if I enable them, the logs just complain about permission denied during plugin installation.

What I don't understand: even though the filesystem permissions are wrong initially, the plugin folder can be created by Grafana, but nothing else can.

@RyanHartje Yes, I know it's less secure and not a very good idea, but it seems it's the only "solution" right now when using Persistent Volumes on AWS. I put this solution here because none of the other solutions described in this thread worked for me, and I think it could help other people.

Having said that, thanks for your comment, as I forgot to state the security considerations of the workaround in my original comment.

spali commented

Got Grafana to at least start successfully with:

volumes:
  grafana_data:
    driver: cifs
    driver_opts:
      share: myserver/grafana_data
      username: myuser
      password: mypw
      domain: mydomain
      cifsopts: "uid=472,gid=472,nobrl"

uid and gid make the files owned by the grafana user and group in the container (id 472), which resolves the general permission problems.
And second, nobrl resolves an sqlite file locking problem on cifs shares.

You can probably adapt this somehow to your problems in the cloud.

Was able to solve the issue by getting inside the previous container and changing permissions on the grafana folder:
chown -R 472:472 /var/lib/grafana
After that I was able to run the new version.
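
If the pod is running one of the older (pre-5.1, root-started) images, the same workaround can be applied through kubectl (pod name is a placeholder):

kubectl exec -it <pod-name> -c grafana -- chown -R 472:472 /var/lib/grafana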

xlson commented

Closing this issue as the Grafana docker image has moved to the main Grafana repository. Now tracked: grafana/grafana#13187