- Introduction
- Architecture
- Prerequisites
- Deployment
- Validation
- Tear Down
- Troubleshooting
- Relevant Material
This guide demonstrates a series of best practices that will allow the user to improve the security of their containerized applications deployed to Kubernetes Engine.
The principle of least privilege is widely recognized as an important design consideration in enhancing the protection of critical systems from faults and malicious behavior. It suggests that every component must be able to access only the information and resources that are necessary for its legitimate purpose. This guide shows how to improve a container's security by systematically removing unnecessary privileges.
At their core, containers make implementing security best practices easier by giving the user an easy interface to run processes in a chroot environment as an unprivileged user and to remove all but the kernel capabilities needed to run the application. By default, all containers are run in the root user namespace, so running containers as a non-root user is important.
On occasion, an application will need to access a kernel resource that requires special privileges normally granted only to the root user. However, running the application as a user with root privileges is a bad solution as it provides the application with access to the entire system. Instead, the kernel provides a set of capabilities that can be granted to a process to allow it coarse-grained access to only the kernel resources it needs and nothing more.
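To see which capabilities a process actually holds, Linux exposes the effective set as a hexadecimal bitmask on the CapEff line of /proc/<pid>/status. The following is a minimal sketch and not part of this demo's scripts; the helper names cap_is_set and current_cap_mask are hypothetical, and bit 10 corresponds to CAP_NET_BIND_SERVICE in the kernel's capability numbering.

```shell
# cap_is_set MASK BIT
# Succeeds when bit BIT is set in the hexadecimal capability mask MASK.
# (Hypothetical helper, for illustration only.)
cap_is_set() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print the effective capability mask (CapEff) of the current process.
current_cap_mask() {
  awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Example: bit 10 (CAP_NET_BIND_SERVICE) is set in the mask 0x400.
if cap_is_set 0000000000000400 10; then
  echo "CAP_NET_BIND_SERVICE is present"
fi
```

Inside a running container, something like cap_is_set "$(current_cap_mask)" 10 would report whether that capability survived the restrictions applied to the process.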
Using kernel modules such as AppArmor, Kubernetes provides an easy interface to both run the containerized application as a non-root user in the process namespace and restrict the set of capabilities granted to the process.
This demonstration will deploy five containers in a private cluster:
- A container run as the root user in the container in the Dockerfile
- A container run as a user created in the container in the Dockerfile
- A container that Kubernetes started as a non-root user despite the Dockerfile not specifying it be run as a non-root user
- A container with a lenient AppArmor profile that allows all non-root permissions
- A container with an AppArmor profile applied to prevent the /proc/cpuinfo endpoint from being properly read
Each container will be exposed outside the cluster through an internal load balancer.
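The third container in the list, which Kubernetes starts as a non-root user regardless of the Dockerfile, relies on the securityContext of the pod specification. As a rough sketch (the pod name, container name, and image argument below are placeholders, not the manifests this demo ships), a helper that renders such a manifest might look like this:

```shell
# render_override_pod IMAGE [UID]
# Print a pod manifest that forces the container to run as UID
# (default 65534, the nobody user). Names here are placeholders.
render_override_pod() {
  local image="$1" uid="${2:-65534}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hello-override-example
spec:
  containers:
  - name: hello
    image: ${image}
    securityContext:
      runAsUser: ${uid}
      runAsNonRoot: true
EOF
}
```

The rendered manifest could then be piped to kubectl apply -f - against a test cluster. Setting runAsNonRoot alongside runAsUser asks the kubelet to refuse to start the container if it would otherwise run as UID 0.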
The containers themselves are running a simple Go web server with five endpoints. The endpoints differ in terms of the privileges they need to complete the request. A non-root user cannot read a file owned by root. The nobody user cannot read /proc/cpuinfo when that privilege is being blocked by AppArmor.
- An endpoint to get the container's hostname
- An endpoint to get the username, UID, and GID of the identity running the server
- An endpoint to read a file owned by the root user
- An endpoint to read a file owned by the nobody user
- An endpoint to read the /proc/cpuinfo file
In order to use the code in this demo you will need access to the following tools:
- A bash, or bash-compatible, shell
- GNU Make 3.x or later
- Access to an existing Google Cloud project with the Kubernetes Engine service enabled
- If you do not have a Google Cloud Platform account, you can sign up for one and get 300 dollars of free credit on your new account.
- Google Cloud SDK (200.0.0 or later)
- HashiCorp Terraform v0.11.7
- gcloud
- kubectl (comes with gcloud)
- Terraform v0.11.7
- gcloud v206.0.0
- kubectl v1.10.4
- Kubernetes Engine v1.10
The steps below walk you through using Terraform to deploy a Kubernetes Engine cluster that you will then use to explore multiple types of container security configurations.
Prior to running this demo, ensure you have authenticated your gcloud client by running the following command:
gcloud auth application-default login
Run gcloud config list and make sure that compute/zone, compute/region and core/project are populated with values that work for you. You can set their values with the following commands:
# Where the region is us-east1
gcloud config set compute/region us-east1
Updated property [compute/region].
# Where the zone inside the region is us-east1-c
gcloud config set compute/zone us-east1-c
Updated property [compute/zone].
# Where the project name is my-project-name
gcloud config set project my-project-name
Updated property [core/project].
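Before continuing, it can help to confirm that all three properties are populated. The sketch below assumes a hypothetical helper named check_gcloud_defaults; it reads each property with gcloud config get-value, and the GET_VALUE variable is only there so the check can be exercised without gcloud installed.

```shell
# Command used to read a single property; overridable for dry runs.
GET_VALUE="${GET_VALUE:-gcloud config get-value}"

# Report every required property that is empty or unset.
# Returns non-zero if at least one is missing.
check_gcloud_defaults() {
  local missing=0 prop value
  for prop in compute/region compute/zone core/project; do
    value="$($GET_VALUE "$prop" 2>/dev/null || true)"
    if [ -z "$value" ] || [ "$value" = "(unset)" ]; then
      echo "missing: $prop"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_gcloud_defaults with no output and a zero exit status suggests the defaults are ready for the next step.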
This project requires the following Google Cloud Service APIs to be enabled:
compute.googleapis.com
container.googleapis.com
cloudbuild.googleapis.com
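If you prefer to enable these APIs by hand, gcloud services enable accepts each service name. The following is a sketch only; the function name and the DRY_RUN switch are assumptions made for illustration.

```shell
# APIs listed in this guide.
REQUIRED_APIS="compute.googleapis.com container.googleapis.com cloudbuild.googleapis.com"

# Enable each required API. With DRY_RUN=1, only print the commands
# so they can be reviewed first.
enable_required_apis() {
  local api
  for api in $REQUIRED_APIS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gcloud services enable $api"
    else
      gcloud services enable "$api"
    fi
  done
}
```

Setting DRY_RUN=1 before calling enable_required_apis prints the three commands without contacting Google Cloud.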
In addition, the terraform configuration takes three parameters to determine where the Kubernetes Engine cluster should be created:
project
region
zone
For simplicity, these parameters are to be specified in a file named terraform.tfvars, in the terraform directory. To ensure the appropriate APIs are enabled and to generate the terraform/terraform.tfvars file based on your gcloud defaults, run:
make setup-project
This will enable the necessary Service APIs, and it will also generate a terraform/terraform.tfvars file with the following keys. The values themselves will match the output of gcloud config list:
$ cat terraform/terraform.tfvars
project="YOUR_PROJECT"
region="YOUR_REGION"
zone="YOUR_ZONE"
If you need to override any of the defaults, simply replace the desired value(s) to the right of the equals sign(s). Be sure your replacement values are still double-quoted.
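The file format is simple enough to produce by hand if needed. The sketch below uses a hypothetical helper named render_tfvars; it is not necessarily how make setup-project generates the file.

```shell
# render_tfvars PROJECT REGION ZONE
# Print the three double-quoted key/value pairs Terraform expects.
render_tfvars() {
  printf 'project="%s"\nregion="%s"\nzone="%s"\n' "$1" "$2" "$3"
}
```

For example, render_tfvars my-project-name us-east1 us-east1-c > terraform/terraform.tfvars would write a file using the sample values from earlier in this guide.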
Next, apply the terraform configuration with:
# From within the project root, use make to apply the terraform
make tf-apply
When prompted to deploy the plan, review the generated plan and enter yes to deploy the environment. This will take a few minutes to complete. The following are the last few lines of successful output.
...snip...
google_container_cluster.primary: Still creating... (2m20s elapsed)
google_container_cluster.primary: Still creating... (2m30s elapsed)
google_container_cluster.primary: Still creating... (2m40s elapsed)
google_container_cluster.primary: Still creating... (2m50s elapsed)
google_container_cluster.primary: Still creating... (3m0s elapsed)
google_container_cluster.primary: Still creating... (3m10s elapsed)
google_container_cluster.primary: Still creating... (3m20s elapsed)
google_container_cluster.primary: Still creating... (3m30s elapsed)
google_container_cluster.primary: Still creating... (3m40s elapsed)
google_container_cluster.primary: Creation complete after 3m44s (ID: gke-security-best-practices)
Apply complete! Resources: 7 added, 0 changed, 0 destroyed.
Once that has completed, remote into the bastion instance using SSH:
gcloud compute ssh gke-tutorial-bastion
Apply the manifests for the cluster using the deployment script:
./scripts/deploy.sh
This will take a minute or two to complete. The final output should be similar to:
namespace/apparmor created
configmap/apparmor-profiles created
daemonset.apps/apparmor-loader created
deployment.apps/armored-hello-user created
service/armored-hello-user created
deployment.apps/armored-hello-denied created
service/armored-hello-denied created
deployment.apps/hello-override created
service/hello-override created
deployment.apps/hello-root created
service/hello-root created
deployment.apps/hello-user created
service/hello-user created
...snip...
Service hello-root has not allocated an IP yet.
Service hello-root has not allocated an IP yet.
Service hello-root IP has been allocated
Service hello-user has not allocated an IP yet.
Service hello-user has not allocated an IP yet.
Service hello-user has not allocated an IP yet.
Service hello-user has not allocated an IP yet.
Service hello-user IP has been allocated
Service hello-override IP has been allocated
Service armored-hello-user IP has been allocated
Service armored-hello-denied IP has been allocated
At this point, the environment should be completely set up.
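The "has not allocated an IP yet" messages above come from polling each service until its load balancer address appears. A minimal sketch of such a loop follows, assuming hypothetical helper names; it is not the contents of deploy.sh. It reads the address with kubectl's jsonpath output, and GET_IP can be overridden to try the loop without a cluster.

```shell
# Print the load balancer IP of a service, or nothing if none is assigned.
get_service_ip() {
  kubectl get service "$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
GET_IP="${GET_IP:-get_service_ip}"

# wait_for_service_ip SERVICE [TRIES] [DELAY_SECONDS]
# Poll until the service reports an IP, print it, and return 0.
# Return 1 if no IP appears after TRIES attempts.
wait_for_service_ip() {
  local svc="$1" tries="${2:-30}" delay="${3:-5}" ip="" i=0
  while [ "$i" -lt "$tries" ]; do
    ip="$($GET_IP "$svc" 2>/dev/null || true)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    echo "Service $svc has not allocated an IP yet." >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Bounding the number of attempts keeps a misconfigured service from blocking the script indefinitely.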
To test all of the services in one command, run the validation script from the scripts directory of the bastion host:
./scripts/validate.sh
This script queries each of the services to get:
- the hostname of the pod being queried
- the username, UID, and GID of the process the pod's web server is running as
- the contents of a file owned by root
- the contents of a file owned by a non-root user
- the first 5 lines of content from /proc/cpuinfo
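The last query in the list can be sketched as a fetch piped through head. The endpoint path /procfile and the function name below are assumptions for illustration; the real routes served by the demo's web server may differ. Passing -s to curl silences its progress meter, so only the response body is printed.

```shell
# Command used to fetch a URL; overridable so the pipeline can be tried offline.
FETCH="${FETCH:-curl -s}"

# query_cpuinfo HOST:PORT
# Print the first five lines returned by the (hypothetical) cpuinfo endpoint.
query_cpuinfo() {
  $FETCH "http://$1/procfile" | head -n 5
}
```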
The first service, hello-root, has an output similar to:
Querying service running natively as root
You are querying host hello-root-54fdf49bf7-8bjmm
User: root
UID: 0
GID: 0
You have read the root.txt file.
You have read the user.txt file.
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU @ 2.30GHz
and it clearly shows that it is running as root and can perform all actions.
The second service, hello-user, has an output similar to:
Querying service containers running natively as user
You are querying host hello-user-76957b5645-hvfw2
User: nobody
UID: 65534
GID: 65534
unable to open root.txt: open root.txt: permission denied
You have read the user.txt file.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 804 100 804 0 0 156k 0 --:--:-- --:--:-- --:--:-- 196k
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
which shows that it is running as nobody (65534) and therefore can read user.txt but not root.txt.
The third service, hello-override, has an output similar to:
Querying service containers normally running as root but overridden by Kubernetes
You are querying host hello-override-7c6c4b6c4-szmrh
User: nobody
UID: 65534
GID: 65534
unable to open root.txt: open root.txt: permission denied
You have read the user.txt file.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 804 100 804 0 0 144k 0 --:--:-- --:--:-- --:--:-- 157k
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
and it shows that the container is running as the nobody user with ID 65534. Therefore, it can again read the user.txt file and read from /proc/cpuinfo.
The fourth service, armored-hello-user, has an output similar to:
Querying service containers with an AppArmor profile allowing reading /proc/cpuinfo
You are querying host armored-hello-user-5645cd4496-qls6q
User: nobody
UID: 65534
GID: 65534
unable to open root.txt: open root.txt: permission denied
You have read the user.txt file.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 804 100 804 0 0 148k 0 --:--:-- --:--:-- --:--:-- 157k
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
and it shows that the leniently armored container still has the default access of the nobody user.
The fifth and final service, armored-hello-denied, has an output similar to:
Querying service containers with an AppArmor profile blocking the reading of /proc/cpuinfo
You are querying host armored-hello-denied-6fccb988dd-sxhmz
User: nobody
UID: 65534
GID: 65534
unable to open root.txt: open root.txt: permission denied
You have read the user.txt file.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 63 100 63 0 0 12162 0 --:--:-- --:--:-- --:--:-- 12600
unable to open root.txt: open /proc/cpuinfo: permission denied
and it shows that the container is prohibited by AppArmor policy from reading /proc/cpuinfo, while user.txt remains readable.
To tear down the environment, use:
./scripts/teardown.sh
Its output should look like the following:
daemonset.apps "apparmor-loader" deleted
configmap "apparmor-profiles" deleted
namespace "apparmor" deleted
deployment.apps "armored-hello-user" deleted
service "armored-hello-user" deleted
deployment.apps "armored-hello-denied" deleted
service "armored-hello-denied" deleted
deployment.apps "hello-override" deleted
service "hello-override" deleted
deployment.apps "hello-root" deleted
service "hello-root" deleted
deployment.apps "hello-user" deleted
service "hello-user" deleted
After that script completes, log out of the bastion host and run the following to destroy the environment:
make tf-destroy
After answering yes, Terraform will destroy the environment and indicate when it has completed:
...snip...
module.network.google_compute_subnetwork.cluster-subnet: Destroying... (ID: us-east1/kube-net-subnet)
google_service_account.admin: Destruction complete after 0s
module.network.google_compute_subnetwork.cluster-subnet: Still destroying... (ID: us-east1/kube-net-subnet, 10s elapsed)
module.network.google_compute_subnetwork.cluster-subnet: Still destroying... (ID: us-east1/kube-net-subnet, 20s elapsed)
module.network.google_compute_subnetwork.cluster-subnet: Destruction complete after 25s
module.network.google_compute_network.gke-network: Destroying... (ID: kube-net)
module.network.google_compute_network.gke-network: Still destroying... (ID: kube-net, 10s elapsed)
module.network.google_compute_network.gke-network: Still destroying... (ID: kube-net, 20s elapsed)
module.network.google_compute_network.gke-network: Destruction complete after 25s
Destroy complete! Resources: 7 destroyed.
Error: Error applying plan:
1 error(s) occurred:
* module.network.google_compute_network.gke-network (destroy): 1 error(s) occurred:
* google_compute_network.gke-network: The network resource 'projects/seymourd-sandbox/global/networks/kube-net' is already being used by 'projects/seymourd-sandbox/global/firewalls/k8s-29e43f3a2accf594-node-hc'
Terraform does not automatically rollback in the face of errors. Instead, your Terraform state file has been partially updated with any resources that successfully completed. Please address the error above and apply again to incrementally change your infrastructure.
Solution: the cluster does not always cleanly remove all of the GCP resources associated with a service before the cluster is deleted. You will need to manually clean up the remaining resources using either the Cloud Console or gcloud.
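When cleaning up with gcloud, it is safer to review the delete commands before running them. The helper below is a hypothetical sketch: it reads firewall rule names on standard input and only prints the corresponding gcloud compute firewall-rules delete commands.

```shell
# Read firewall rule names on stdin, one per line, and print (not run)
# the delete command for each, skipping blank lines.
print_firewall_cleanup() {
  local name
  while IFS= read -r name; do
    if [ -n "$name" ]; then
      echo "gcloud compute firewall-rules delete $name --quiet"
    fi
  done
}
```

One way to feed it is gcloud compute firewall-rules list --format="value(name)", narrowed to the kube-net network with a --filter expression; the exact filter is left as an assumption to verify against your project before deleting anything.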
The credentials that Terraform is using do not provide the necessary permissions to create resources in the selected projects. Ensure that the account listed in gcloud config list has the necessary permissions to create resources. If it does, regenerate the application default credentials using gcloud auth application-default login.
Terraform occasionally complains about an invalid fingerprint when updating certain resources. If you see this error, simply re-run the command.
This is not an officially supported Google product.