/fleetiq-adapter-for-agones

For running containerized game servers on Spot instances reliably and safely

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Introduction

This project allows you to run containerized game servers on Spot Instances while decreasing the likelihood of Spot interruptions. Interruptions are minimized by using GameLift FleetIQ, which periodically adjusts the instance types used by an AWS Auto Scaling group (ASG) based on an algorithm that assesses an instance's viability. Instances with claimed game servers are temporarily protected from termination.

Components

Agones

Agones provides lifecycle management operations for running containerized game servers on Kubernetes. This project was specifically designed to work with Agones running on Amazon EKS or a self-managed Kubernetes cluster running in the AWS Cloud.

The daemonset

The daemonset is an "agent" that runs on worker nodes that have been designated to run containerized game servers, i.e., instances with the role=game-servers label. On EKS, labels can be added to instances automatically by modifying the kubelet parameters in the instance's user data or by modifying the launch template referenced by the ASG for the game server node group.

When the daemonset starts, it immediately registers the instance with GameLift FleetIQ, runs ClaimGameServer, and then calls UpdateGameServer once per minute to maintain the instance's health. It also begins listening on a Redis channel for changes in the instance's viability. When an instance's status changes from ACTIVE to DRAINING, the daemon cordons the node to prevent new game servers from being scheduled onto it. It then adds a toleration to all allocated game servers. Afterwards, it taints the node, which forces pods that do not tolerate the taint, i.e., unallocated game servers, to be evicted. When the last allocated game server shuts down, the daemon calls DeregisterGameServer, which deregisters the instance from FleetIQ, and then waits for the instance to be terminated.
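
The ordering of the draining steps matters: the toleration has to be in place before the taint is applied, otherwise allocated game servers would be evicted along with the unallocated ones. The following is a minimal sketch of that decision logic only, not the daemon's actual code. The function names and step labels are hypothetical, and the real daemon would carry out each step through the Kubernetes and GameLift APIs.

```python
from typing import List

# Hypothetical step labels; the real daemon performs these through the
# Kubernetes API. The order mirrors the sequence described above.
DRAIN_STEPS = [
    "cordon_node",
    "add_toleration_to_allocated_game_servers",
    "taint_node",
]


def plan_actions(previous_status: str, current_status: str) -> List[str]:
    """Return the ordered steps to run for a status transition.

    Only the ACTIVE -> DRAINING transition triggers the drain sequence;
    every other transition yields no steps in this sketch.
    """
    if previous_status == "ACTIVE" and current_status == "DRAINING":
        return list(DRAIN_STEPS)
    return []


def should_deregister(current_status: str, allocated_game_servers: int) -> bool:
    """True once a draining node has no allocated game servers left."""
    return current_status == "DRAINING" and allocated_game_servers == 0
```

Keeping the decision separate from the side effects makes the sequence easy to check without a cluster.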

The pubsub application

The pubsub application runs a loop that calls DescribeGameServerInstances, parses the results, and publishes the status of each instance to a Redis channel for that instance. Although we could have built the daemon to call DescribeGameServerInstances directly, we chose a pub/sub model to avoid exceeding the rate limit for the GameLift APIs.
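
One iteration of that loop can be sketched as follows. This is an illustration under assumptions rather than the application's actual code: the function names are hypothetical, the channel is assumed to be named after the instance ID, and the publish step is passed in as a callable so the sketch does not depend on a Redis client. The response keys (GameServerInstances, InstanceId, InstanceStatus) follow the shape returned by DescribeGameServerInstances.

```python
from typing import Callable, List, Tuple


def statuses_from_response(response: dict) -> List[Tuple[str, str]]:
    """Extract (instance ID, status) pairs from a DescribeGameServerInstances response."""
    return [
        (instance["InstanceId"], instance["InstanceStatus"])
        for instance in response.get("GameServerInstances", [])
    ]


def publish_statuses(response: dict, publish: Callable[[str, str], object]) -> int:
    """Publish each instance's status to its own channel.

    Assumption: the channel name is the instance ID. Returns the number
    of messages published.
    """
    count = 0
    for instance_id, status in statuses_from_response(response):
        publish(instance_id, status)
        count += 1
    return count


# Sample response with made-up instance IDs, for illustration only.
sample_response = {
    "GameServerInstances": [
        {"InstanceId": "i-0aaaaaaaaaaaaaaaa", "InstanceStatus": "ACTIVE"},
        {"InstanceId": "i-0bbbbbbbbbbbbbbbb", "InstanceStatus": "DRAINING"},
    ]
}
```

With boto3 and redis-py, the response would come from a GameLift client's describe_game_server_instances call and the publish argument could be a Redis client's publish method. Because a single caller fans results out to every node, the number of GameLift API calls stays constant as the number of daemons grows.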

The pubsub application supports n game server groups. On startup, the application reads the list of game server groups from the fleetiqconfig ConfigMap.

kind: ConfigMap
apiVersion: v1
metadata:
  name: fleetiqconfig
  namespace: default
data:
  fleetiq.conf: '{"GameServerGroups": [ "agones-game-servers" ]}'
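
The value of fleetiq.conf is a JSON document, so reading the list of game server groups is a matter of parsing it and pulling out the GameServerGroups key. A minimal sketch follows, assuming the contents of the key have already been read into a string (for example from a mounted volume). The function name is hypothetical.

```python
import json
from typing import List


def load_game_server_groups(conf_text: str) -> List[str]:
    """Parse the fleetiq.conf JSON and return the game server group names."""
    conf = json.loads(conf_text)
    groups = conf.get("GameServerGroups", [])
    # Fail early on a malformed config rather than polling with bad names.
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValueError("GameServerGroups must be a list of strings")
    return groups


# The same value as in the ConfigMap above.
example_conf = '{"GameServerGroups": [ "agones-game-servers" ]}'
groups = load_game_server_groups(example_conf)
```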

The instructions for installing the pubsub application, along with Redis, can be found here.

The pubsub application and Redis should be installed before the GameLift daemonset.

Redis

Redis is used to publish InstanceStatus to a channel for each instance. We elected to use Redis instead of SNS to avoid taking a dependency on another AWS service. That said, you can use Amazon ElastiCache for Redis as your Redis endpoint, or you can run Redis locally in your Kubernetes cluster. The Redis endpoint is configured by setting the REDIS_URL environment variable for both the pubsub application and the daemonset.
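
As an illustration of how a component might resolve the endpoint from REDIS_URL, here is a minimal sketch. It assumes the variable holds a redis:// style URL; the function name, the localhost fallback, and the default port of 6379 are assumptions made for this example, not values taken from the project.

```python
import os
from typing import Mapping, Tuple
from urllib.parse import urlparse


def redis_endpoint(
    env: Mapping[str, str] = os.environ,
    default_url: str = "redis://localhost:6379",  # assumed fallback
) -> Tuple[str, int]:
    """Return (host, port) parsed from the REDIS_URL environment variable."""
    url = urlparse(env.get("REDIS_URL", default_url))
    if not url.hostname:
        raise ValueError("REDIS_URL must include a host")
    # Fall back to the conventional Redis port when none is given.
    return url.hostname, url.port or 6379
```

Passing the environment in as a mapping makes it possible to check the parsing without changing the process environment.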

Installation

Please follow the instructions in the FleetIQ EKS Agones Integration Guide to install the solution.

We recommend that you build the images for the daemonset and the pubsub application from the Dockerfiles in this repository. If you do, you will need to update the daemonset and deployment manifests with the appropriate image URIs. Both charts allow you to override the default image and tag with your own values.

Issues

If you have an issue with the Guide or with any of the solution's components, please file an issue.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.