Galaxy Helm Chart (v6)

Galaxy is a data analysis platform focusing on accessibility, reproducibility, and transparency of primarily bioinformatics data. This repo contains a Helm chart for easily deploying Galaxy on top of Kubernetes. The chart allows application configuration changes, updates, upgrades, and rollbacks.

Supported software versions

Kubernetes 1.27+
Helm 3.5+

Kubernetes cluster

You will need kubectl and Helm installed; follow each tool's official installation instructions.

Running Galaxy locally in a dev environment

For testing and development purposes, an easy option to get Kubernetes running is to install Rancher Desktop. Once you have it installed, you will also need to set up an ingress controller. Rancher Desktop uses Traefik as its default ingress controller, so disable it first by unchecking Enable Traefik on the Kubernetes Settings page. Then deploy the NGINX ingress controller:

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace

Dependency charts

This chart relies on the features of other charts for common functionality:

postgres-operator for the database.
galaxy-cvmfs-csi for linking reference data to Galaxy and jobs based on CVMFS (default).
csi-s3 for linking reference data to Galaxy and jobs based on S3FS (optional/alternative to CVMFS).
rabbitmq-cluster-operator for deploying the message queue.

In a production setting, especially if the intention is to run multiple Galaxies in a single cluster, we recommend installing the dependency charts separately once per cluster. For convenience, we provide a galaxy-deps helm chart that will install all of these general dependencies (often installable cluster-wide) for you. Simply install using helm install --create-namespace -n galaxy-deps galaxy-deps galaxyproject/galaxy-deps.

Installing the chart

Using the chart from the packaged chart repo

The chart is automatically packaged, versioned and uploaded to a helm repository on each accepted PR. Therefore, the latest version of the chart can be downloaded from this repository.

helm repo add cloudve https://raw.githubusercontent.com/CloudVE/helm-charts/master/
helm repo update

Install global dependencies such as the postgres operator.

helm install --create-namespace -n galaxy-deps galaxy-deps galaxyproject/galaxy-deps

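Install the chart with the release name my-galaxy-release into the my-namespace namespace (add --create-namespace if the namespace does not exist yet). It is not advisable to install Galaxy in the default namespace.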

helm install -n my-namespace my-galaxy-release cloudve/galaxy

Using the chart from GitHub repo

Clone this repository:

git clone https://github.com/galaxyproject/galaxy-helm.git

Setup cluster-wide operators and dependencies:

cd galaxy-helm/galaxy-deps
helm dependency update
helm install --create-namespace -n galaxy-deps galaxy-deps .

Install the chart with the release name my-galaxy. See the Data persistence section below for the persistence flag that suits your Kubernetes environment.

cd ../galaxy
helm dependency update
helm install --create-namespace -n galaxy my-galaxy . --set persistence.accessMode="ReadWriteOnce"

In several minutes, Galaxy will be available at the /galaxy/ URL of your Kubernetes cluster. If you are running the local development setup described above, Galaxy will be available at http://localhost/galaxy/ (note the trailing slash).

Uninstalling the chart

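To uninstall/delete the my-galaxy deployment, run the following, passing the namespace used at install time (galaxy in the GitHub-based example above):

helm delete -n galaxy my-galaxy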

If you no longer require cluster-wide operators, you can optionally uninstall them, although, in general, we recommend installing them once and leaving them as is.

helm delete -n galaxy-deps galaxy-deps

Configuration

The following table lists the configurable parameters of the Galaxy chart. The current default values can be found in the values.yaml file.

Parameters	Description
`nameOverride`	Override the name of the chart used to prefix resource names. Defaults to `{{.Chart.Name}}` (e.g., `galaxy`)
`fullnameOverride`	Override the full name used to prefix resource names. Defaults to `{{.Release.Name}}-{{.Values.nameOverride}}`
`image.pullPolicy`	Galaxy image pull policy. See the Kubernetes documentation for more info
`image.repository`	The repository and name of the Docker image for Galaxy, searches Docker Hub by default
`image.tag`	Galaxy Docker image tag (generally corresponds to the desired Galaxy version)
`imagePullSecrets`	Secrets used to access a Galaxy image from a private repository
`persistence.enabled`	Enable persistence using PVC
`persistence.size`	PVC storage request for the Galaxy volume, as a Kubernetes quantity (e.g., `50Gi`)
`persistence.accessMode`	PVC access mode for the Galaxy volume
`persistence.annotations.{}`	Dictionary of annotations to add to the persistent volume claim's metadata
`persistence.existingClaim`	Use existing Persistent Volume Claim instead of creating one
`persistence.storageClass`	Storage class to use for provisioning the Persistent Volume Claim
`persistence.name`	Name of the PVC
`persistence.mountPath`	Path where to mount the Galaxy volume
`useSecretConfigs`	Enable Kubernetes Secrets for all config maps
`configs.{}`	Galaxy configuration files and values for each of the files. The provided value represents the entire content of the given configuration file
`jobs.priorityClass.enabled`	Assign a priorityClass to the dispatched jobs.
`jobs.rules`	Galaxy dynamic job rules. See `values.yaml`
`jobs.priorityClass.existingClass`	Use an existing priorityClass to assign if `jobs.priorityClass.enabled=true`
`refdata.enabled`	Whether or not to mount cloud-hosted Galaxy reference data and tools.
`refdata.type`	`s3csi` or `cvmfs`, determines the CSI to use for mounting reference data. `cvmfs` is the default type for reference data.
`cvmfs.enabled`	Enable use of CVMFS in configs, and deployment of CVMFS Persistent Volume Claims for Galaxy
`cvmfs.pvc.{}`	Persistent Volume Claim to deploy for CVMFS repositories. See `values.yaml` for examples.
`setupJob.ttlSecondsAfterFinished`	Sets `ttlSecondsAfterFinished` for the initialization jobs. See the Kubernetes documentation for more details.
`setupJob.downloadToolConfs.enabled`	Download configuration files and the `tools` directory from an archive via a job at startup
`setupJob.downloadToolConfs.archives.startup`	A URL to a publicly accessible `tar.gz` archive containing AT LEAST conf files and XML tool wrappers. Meant to be enough for Galaxy handlers to start up.
`setupJob.downloadToolConfs.archives.running`	A URL to a `tar.gz` publicly accessible archive containing AT LEAST confs, tool wrappers, and tool scripts but excluding test data. Meant to be enough for Galaxy handlers to run jobs.
`setupJob.downloadToolConfs.archives.full`	A URL to a `tar.gz` publicly accessible archive containing the full `tools` directory, including each tool's test data. Meant to be enough to run automated tool-tests, fully mimicking CVMFS repositories
`setupJob.downloadToolConfs.volume.mountPath`	Path at which to mount the unarchived confs in each handler (should match the path set in the tool confs)
`setupJob.downloadToolConfs.volume.subPath`	Name of subdirectory on Galaxy's shared filesystem to use for the unarchived configs
`setupJob.createDatabase`	Deploy a job to create a Galaxy database from scratch (does not affect subsequent upgrades, only first startup)
`ingress.path`	Path where Galaxy application will be hosted
`ingress.annotations.{}`	Dictionary of annotations to add to the ingress's metadata at the deployment level
`ingress.hosts`	Hosts for the Galaxy ingress
`ingress.canary.enabled`	This will create an additional ingress for detecting activity on Galaxy. Useful for autoscaling on activity.
`ingress.enabled`	Enable Kubernetes ingress
`ingress.tls`	Ingress configuration with HTTPS support
`service.nodePort`	If `service.type` is set to `NodePort`, then this can be used to set the port at which Galaxy will be available on all nodes' IP addresses
`service.port`	Kubernetes service port
`service.type`	Kubernetes Service type
`serviceAccount.annotations.{}`	Dictionary of annotations to add to the service account's metadata
`serviceAccount.create`	The serviceAccount will be created if it does not exist.
`serviceAccount.name`	The name of the serviceAccount to use.
`rbac.enabled`	Enable Galaxy job RBAC. This will grant the service account the necessary permissions/roles to view jobs and pods in this namespace. Defaults to true.
`webHandlers.{}`	Configuration for the web handlers (See table below for all options)
`jobHandlers.{}`	Configuration for the job handlers (See table below for all options)
`workflowHandlers.{}`	Configuration for the workflow handlers (See table below for all options)
`resources.limits.memory`	The maximum memory that can be allocated.
`resources.requests.memory`	The requested amount of memory.
`resources.limits.cpu`	The maximum CPU that can be allocated.
`resources.limits.ephemeral-storage`	The maximum ephemeral storage that can be allocated.
`resources.requests.cpu`	The requested amount of CPU (as time or number of cores)
`resources.requests.ephemeral-storage`	The requested amount of ephemeral storage
`securityContext.fsGroup`	The group for any files created.
`tolerations`	Define the `taints` that are tolerated.
`extraFileMappings.{}`	Add extra files mapped as configMaps or Secrets at arbitrary paths. See `values.yaml` for examples.
`extraInitCommands`	Extra commands that will be run during initialization.
`extraInitContainers.[]`	A list of extra init containers for the handler pods
`extraVolumeMounts.[]`	List of volumeMounts to add to all handlers
`extraVolumes.[]`	List of volumes to add to all handlers
`postgresql.enabled`	Enable the postgresql condition in the requirements.yml.
`influxdb.username`	Influxdb user name.
`influxdb.url`	The connection URL of the `influxdb` instance
`influxdb.enabled`	Enable the `influxdb` used by the metrics scraper.
`influxdb.password`	Password for the influxdb user.
`metrics.podAnnotations.{}`	Dictionary of annotations to add to the metrics deployment's metadata at the pod level
`metrics.image.repository`	The location of the galaxy-metrics-scraping image to use.
`metrics.image.pullPolicy`	Define the pull policy, that is, when Kubernetes will pull the image.
`metrics.podSpecExtra.{}`	Dictionary to add to the metrics deployment's pod template under `spec`
`metrics.image.tag`	The image version to use.
`metrics.annotations.{}`	Dictionary of annotations to add to the metrics deployment's metadata at the deployment level
`metrics.enabled`	Enable metrics gathering. The influxdb setting must be specified when using this setting.
`nginx.conf.client_max_body_size`	Requests larger than this size will result in a `413 Payload Too Large`.
`nginx.image.tag`	The Nginx version to pull.
`nginx.image.repository`	Where to obtain the Nginx container.
`nginx.image.pullPolicy`	When Kubernetes will pull the Nginx image from the repository.
`nginx.galaxyStaticDir`	Location at which to copy Galaxy static content in the NGINX pod init container, for direct serving. Defaults to `/galaxy/server/static`

Handlers

Galaxy defines three handler types: jobHandlers, webHandlers, and workflowHandlers. All three handler types share common configuration options.

Parameter	Description
`replicaCount`	The number of handlers to be spawned.
`startupDelay`	Delay in seconds for handler startup. Used to offset handlers and avoid race conditions at first startup
`annotations`	Dictionary of annotations to add to this handler's metadata at the deployment level
`podAnnotations`	Dictionary of annotations to add to this handler's metadata at the pod level
`podSpecExtra`	Dictionary to add to this handler's pod template under `spec`
`startupProbe`	Probe used to determine if a pod has started. Other probes wait for the startup probe. See table below for all probe options
`livenessProbe`	Probe used to determine if a pod should be restarted. See table below for all probe options
`readinessProbe`	Probe used to determine if the pod is ready to accept workloads. See table below for all probe options

Probes

Kubernetes uses probes to determine the state of a pod. Pods are not considered to have started up, and hence other probes are not run, until the startup probes have succeeded. Pods that fail the livenessProbe will be restarted and work will not be dispatched to the pod until the readinessProbe returns successfully. A pod is ready when all of its containers are ready.

Liveness and readiness probes share the same configuration options.

Parameter	Description
`enabled`	Enable/Disable the probe
`initialDelaySeconds`	How long to wait before starting the probe.
`periodSeconds`	How frequently Kubernetes will check the probe.
`failureThreshold`	The number of consecutive probe failures Kubernetes tolerates before giving up.
`timeoutSeconds`	How long Kubernetes will wait for a probe response before it times out.

Examples

jobHandlers:
  replicaCount: 2
  livenessProbe:
    enabled: false
  readinessProbe:
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 30
    timeoutSeconds: 5
    failureThreshold: 3

Additional Configurations

Extra File Mappings

The extraFileMappings field can be used to inject files to arbitrary paths in the nginx deployment, as well as any of the job, web, or workflow handlers, and the init jobs.

The contents of the file can be specified directly in the values.yaml file with the content attribute.

The tpl flag will determine whether these contents are run through the helm templating engine.

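Note: when running with tpl: true, braces ({{ }}) not meant for Helm must be escaped. One way of escaping is to wrap the text in a double-quoted template string: {{ "{{ my-non-helm-content }}" }}. Single quotes denote character constants in Go templates, so they cannot hold a multi-character string.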

extraFileMappings:
  /galaxy/server/static/welcome.html:
    applyToWeb: true
    applyToJob: false
    applyToWorkflow: false
    applyToNginx: true
    applyToSetupJob: false
    tpl: false
    content: |
      <!DOCTYPE html>
      <html>...</html>
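
As a rough sketch of templated content, the snippet below writes a values file in which one expression is meant for Helm and another must reach the rendered file literally. The file name tpl-values.yaml and the page content are made up for illustration. It uses a double-quoted Go template string for the escape, and it assumes the chart renders content with the usual release context so that .Release.Name resolves; check the chart templates before relying on that.

```shell
# Write a hypothetical values file. The quoted heredoc delimiter ('EOF')
# stops the shell from interpreting anything inside the file body.
cat > tpl-values.yaml <<'EOF'
extraFileMappings:
  /galaxy/server/static/welcome.html:
    applyToWeb: true
    applyToNginx: true
    tpl: true
    content: |
      <p>Release: {{ .Release.Name }}</p>
      <p>{{ "{{ my-non-helm-content }}" }}</p>
EOF

# Show the command that would apply it (printed, not executed).
echo "helm upgrade my-galaxy . -n galaxy --reuse-values -f tpl-values.yaml"
```

With tpl: true, the first paragraph should be rendered with the release name, while the second should come out as the literal text {{ my-non-helm-content }}.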

NOTE: For security reasons, Helm will not load files from outside the chart, so the path must be relative to a location inside the chart directory. This will change when helm#3276 is resolved. In the interim, files can be loaded from external locations by:

Creating a symbolic link in the chart directory to the external file, or
using --set-file to specify the contents of the file. E.g: helm upgrade --install galaxy cloudve/galaxy -n galaxy --set-file extraFileMappings."/galaxy/server/static/welcome\.html".content=/home/user/data/welcome.html --set extraFileMappings."/galaxy/server/static/welcome\.html".applyToWeb=true

Alternatively, if many applyTo flags need to be set, put them under the file's extraFileMappings entry in your values.yaml and omit the content: key, since the content is still supplied through --set-file on the command line:

extraFileMappings:
  /galaxy/server/static/welcome.html:
    applyToJob: false
    applyToWeb: true
    applyToSetupJob: false
    applyToWorkflow: false
    applyToNginx: false
    tpl: false

Setting parameters on the command line

Specify each parameter using the --set key=value[,key=value] argument to helm install or helm upgrade. For example,

helm install my-galaxy . --set persistence.size=50Gi

The above command sets the size of the Galaxy persistent volume to 50 GiB.

Setting Galaxy configuration file values requires the key name to be escaped. In this example, we are upgrading an existing deployment.

helm upgrade my-galaxy . --set "configs.galaxy\.yml.brand"="Hello World"
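
The escaping matters because Helm splits --set keys on dots, so an unescaped galaxy.yml would be read as a yml key nested under galaxy. A small helper along these lines can build the escaped key; escape_key is a hypothetical name, not part of Helm or this chart.

```shell
# Escape every dot in a config file name so Helm treats the name as one key
# instead of a path of nested keys.
escape_key() {
  printf '%s' "$1" | sed 's/\./\\./g'
}

# Build the --set key for one option inside one config file.
# The option (brand) and value are only illustrative.
key="configs.$(escape_key galaxy.yml).brand"

# printf is used instead of echo so the backslash is printed verbatim.
printf '%s\n' "--set \"$key\"=\"Hello World\""
```

The printed argument can be pasted after helm upgrade my-galaxy . as in the command above.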

You can also set the galaxy configuration file in its entirety with:

helm install my-galaxy . --set-file "configs.galaxy\.yml"=/path/to/local/galaxy.yml

To unset an existing file and revert to the container's default version:

helm upgrade my-galaxy . --set "configs.job_conf\.xml"=null

Alternatively, any number of YAML files that specify the values of the parameters can be provided when installing the chart. For example,

helm install my-galaxy . -f values.yaml -f new-values.yaml

To unset a config file in a values file, use the YAML null type:

configs:
  job_conf.xml: ~

Data persistence

By default, the Galaxy handlers store all user data under the /galaxy/server/database/ path in each container. This path can be changed via the persistence.mountPath variable. Persistent Volume Claims (PVCs) are used to persist the data across deployments. It is possible to specify an existing PVC via persistence.existingClaim. Alternatively, a value for persistence.storageClass can be supplied to designate a desired storage class for dynamic provisioning of the necessary PVCs. If neither value is supplied, the default storage class for the K8s cluster will be used.

For multi-node scenarios, we recommend a storage class that supports ReadWriteMany, such as the nfs-provisioner, because the data must be available to all nodes in the cluster.

In single-node scenarios, you must use --set persistence.accessMode="ReadWriteOnce".
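
The choice between the two access modes can be sketched as a small shell helper. access_mode_for_nodes is a hypothetical function name, and the node count is passed in by hand here; this only mirrors the guidance above and does not check whether your storage class actually supports ReadWriteMany.

```shell
# Pick a PVC access mode from the node count:
# one node -> ReadWriteOnce, several nodes -> ReadWriteMany.
access_mode_for_nodes() {
  if [ "$1" -le 1 ]; then
    echo "ReadWriteOnce"
  else
    echo "ReadWriteMany"
  fi
}

# Example with a literal node count of 1. On a live cluster the count
# could come from: kubectl get nodes --no-headers | wc -l
mode="$(access_mode_for_nodes 1)"

# Print the install command for review instead of running it.
echo "helm install --create-namespace -n galaxy my-galaxy . --set persistence.accessMode=\"$mode\""
```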

Note about persistent deployments and restarts

If you wish to make your deployment persistent or restartable (bring the deployment down, keep the state on disk, then bring it up again later), you should create PVCs for Galaxy and Postgres and use the persistence.existingClaim variable to point to them, as explained in the previous section. In addition, you must set the postgresql.galaxyDatabasePassword variable; otherwise, it will be autogenerated and will not match when restoring.
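
A minimal sketch of such a setup, assuming a pre-created PVC: the file name persistent-values.yaml, the claim name galaxy-data-pvc, and the password are all placeholders. Only the two keys named above are shown; how the Postgres claim is wired in is not covered here, so consult values.yaml for that part.

```shell
# Write a hypothetical values file for a restartable deployment.
cat > persistent-values.yaml <<'EOF'
persistence:
  enabled: true
  # A PVC created ahead of time, so it is not tied to one Helm release
  existingClaim: galaxy-data-pvc
postgresql:
  # Fixed password so a restored deployment can still reach its database
  galaxyDatabasePassword: change-me
EOF

# Print the install command for review instead of running it.
echo "helm install --create-namespace -n galaxy my-galaxy . -f persistent-values.yaml"
```

Reusing the same values file on a later install keeps both the claim and the database password stable across restarts. In a real deployment the password would more likely be supplied from a secret store than from a plain file.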

Making Interactive Tools work on localhost

In general, Interactive Tools should work out of the box as long as you have a wildcard DNS mapping to *.its.<host_name>. To make Interactive Tools work on localhost, you can use dnsmasq or similar to handle wildcard DNS mappings for *.localhost.

For Linux: follow the instructions here to configure dnsmasq: https://superuser.com/a/1718296

For macOS:

  $ brew install dnsmasq
  $ cp /usr/local/opt/dnsmasq/dnsmasq.conf.example /usr/local/etc/dnsmasq.conf
  # Resolve localhost and every subdomain of it to 127.0.0.1
  $ echo "address=/localhost/127.0.0.1" >> /usr/local/etc/dnsmasq.conf
  $ sudo brew services start dnsmasq
  $ sudo mkdir -p /etc/resolver
  # Send lookups for the localhost domain to the local dnsmasq
  $ sudo sh -c 'echo "nameserver 127.0.0.1" > /etc/resolver/localhost'
  $ sudo brew services restart dnsmasq

For Debian and Ubuntu:

  $ sudo apt update
  $ sudo apt install -y dnsmasq
  $ sudo cp /etc/dnsmasq.conf /etc/dnsmasq.conf.backup
  $ sudo sh -c 'echo "address=/localhost/127.0.0.1" >> /etc/dnsmasq.conf'
  $ sudo systemctl start dnsmasq
  $ sudo systemctl enable dnsmasq
  $ sudo mkdir -p /etc/resolver
  $ sudo sh -c 'echo "nameserver 127.0.0.1" > /etc/resolver/localhost'
  $ sudo systemctl restart dnsmasq

For RHEL, Fedora, and Rocky Linux:

  $ sudo dnf install dnsmasq -y
  $ sudo cp /etc/dnsmasq.conf /etc/dnsmasq.conf.backup
  $ sudo sh -c 'echo "address=/localhost/127.0.0.1" >> /etc/dnsmasq.conf'
  $ sudo systemctl start dnsmasq
  $ sudo systemctl enable dnsmasq
  $ sudo mkdir -p /etc/resolver
  $ sudo sh -c 'echo "nameserver 127.0.0.1" > /etc/resolver/localhost'
  $ sudo systemctl restart dnsmasq

This should make all *.localhost and *.its.localhost map to 127.0.0.1, and ITs should work with a regular helm install on localhost.

Horizontal scaling

The Galaxy application can be horizontally scaled for the web, job, or workflow handlers by setting the desired values of the webHandlers.replicaCount, jobHandlers.replicaCount, and workflowHandlers.replicaCount configuration options.
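
For instance, the three replica counts can be turned into --set arguments with a small helper. scale_args is a hypothetical name, and the release name, namespace, and counts are illustrative.

```shell
# Turn three replica counts (web, job, workflow) into --set arguments.
scale_args() {
  printf '%s' "--set webHandlers.replicaCount=$1 "
  printf '%s' "--set jobHandlers.replicaCount=$2 "
  printf '%s' "--set workflowHandlers.replicaCount=$3"
}

# Print the upgrade command for review instead of running it.
echo "helm upgrade my-galaxy . -n galaxy --reuse-values $(scale_args 2 3 1)"
```

Here --reuse-values keeps the previously supplied settings, so only the replica counts change.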

Cron jobs

Two cron jobs are defined by default: one to clean up Galaxy's database and one to clean up the tmp directory. By default, these jobs run at 02:05 (the database maintenance script) and 02:15 (tmp directory cleanup). Users can change when the cron jobs run by changing the schedule field in the values.yaml file:

cronJobs:
  maintenance:
    schedule: "30 6 * * *" # Execute the cron job at 6:30 UTC

or by specifying the schedule on the command line when installing Galaxy:

# Schedule the maintenance job to run at 06:30 on the first day of each month
helm install galaxy -n galaxy galaxy/galaxy --set cronJobs.maintenance.schedule="30 6 1 * *"

To disable a cron job after Galaxy has been deployed, set the enabled flag for that job to false:

helm upgrade galaxy -n galaxy galaxy/galaxy --reuse-values --set cronJobs.maintenance.enabled=false

Run a CronJob manually

Cron jobs can be invoked manually with tools such as OpenLens or from the command line with kubectl:

kubectl create job --namespace <namespace> <job name> --from cronjob/galaxy-cron-maintenance

This will run the cron job regardless of the schedule that has been set.

Note: the name of the cron job will be {{ .Release.Name }}-cron-<job name> where the <job name> is the name (key) used in the values.yaml file.
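
The naming rule can be sketched as a one-line helper. cron_resource_name is a hypothetical name, the release galaxy and key maintenance are taken from the examples above, and manual-maintenance is an arbitrary name for the one-off Job.

```shell
# Compose the CronJob resource name from the release name and the key
# used under cronJobs in values.yaml: <release>-cron-<key>.
cron_resource_name() {
  printf '%s-cron-%s' "$1" "$2"
}

release="galaxy"
job_key="maintenance"

# Print the kubectl command for review instead of running it.
echo "kubectl create job --namespace galaxy manual-$job_key --from cronjob/$(cron_resource_name "$release" "$job_key")"
```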

CronJob configuration

The following fields can be specified when defining cron jobs.

Name	Definition	Required
enabled	`true` or `false`. If `false` the cron job will not be run. Default is `true`	Yes
schedule	When the job will be run. Use tools such as crontab.guru for assistance determining the proper schedule string	Yes
defaultEnv	`true` or `false`. See the `galaxy.podEnvVars` macro in `_helpers.tpl` for the list of variables that will be defined. Default is `false`	No
extraEnv	Define extra environment variables that will be available to the job	No
securityContext	Specifies a `securityContext` for the job. Typically used to set `runAsUser`	No
image	Specify the Docker container used to run the job	No
command	The command to run	Yes
args	Any command line arguments that should be passed to the `command`	No
extraFileMappings	Allow arbitrary files to be mounted from config maps	No

Notes

If specifying the Docker image, both the repository and tag MUST be specified.

  image:
    repository: quay.io/my-organization/my-image
    tag: "1.0"

The extraFileMappings block is similar to the global extraFileMappings except the file will only be mounted for that cron job. The following fields can be specified for each file.

Name	Definition	Required
mode	The file mode (permissions) assigned to the file	No
tpl	If set to `true` the file contents will be run through Helm's templating engine. Defaults to `false`	No
content	The contents of the file	Yes

See the example cron job included in the values.yaml file for a full example.

Upgrading

From v5 to v6

v6 splits all global dependencies, such as the postgres and rabbitmq operators, into a separate galaxy-deps chart, in contrast to v5, which had all dependencies bundled in for convenience. This bundling caused problems during uninstallation in particular because, for example, the postgres operator could be uninstalled before postgres itself was uninstalled, leaving various artefacts behind. This made reinstallation particularly tricky, as all such left-over resources had to be cleaned up manually. Therefore, our production installation notes specified installing these dependencies separately anyway. v6 makes this separation explicit by debundling these dependencies into a separate chart.

If upgrading in production scenarios, you may simply omit installing the galaxy-deps chart and continue as usual. If upgrading in development scenarios, there is no straightforward upgrade path. The galaxy chart will have to be uninstalled, the galaxy-deps chart installed, and subsequently, galaxy can be reinstalled.
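
The development-scenario sequence can be sketched as a dry-run script. The release name my-galaxy, the namespace galaxy, and the chart references are copied from the installation examples earlier in this document and are assumptions about your setup. run and upgrade_v5_to_v6 are hypothetical helper names. Back up any data you need first, since whether volumes survive the uninstall depends on how persistence was configured.

```shell
# Echo each command by default; set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_v5_to_v6() {
  # 1. Remove the v5 release (assumed: my-galaxy in namespace galaxy).
  run helm delete -n galaxy my-galaxy
  # 2. Install the cluster-wide dependencies from the separate chart.
  run helm install --create-namespace -n galaxy-deps galaxy-deps galaxyproject/galaxy-deps
  # 3. Reinstall Galaxy using the v6 chart.
  run helm install --create-namespace -n galaxy my-galaxy cloudve/galaxy
}

# With the default DRY_RUN=1 this only prints the three commands.
upgrade_v5_to_v6
```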