Task inheriting parent cluster GCP account info
The results of our task can be seen here: https://hush-house.pivotal.io/teams/PE/pipelines/kibosh/jobs/delete-gke-cluster-and-registry-images/builds/1. Our pipeline creates and deletes a GKE cluster using the service account key we provide. On our delete step, we forgot to add the `--project` parameter. The result was an attempt to delete a GKE cluster in the cf-concourse-production project.
We have fixed our pipeline to always reference the GCP project, but wanted to let the group know.
Thanks for letting us know, @jkjell!
Assuming that this is coming through the metadata server (I might be wrong): since Garden properties can be configured through environment variables, we can block the containers that run as steps/checks on our machines from reaching Google's internal metadata server:

```yaml
worker:
  replicas: 1
  env:
    - name: CONCOURSE_GARDEN_DENY_NETWORK
      value: "169.254.169.254/32"
```
Regarding the permissions granted to those VMs, we're not able to fully remove all of the current permissions, as we need them for provisioning new disks and other administrative functions. We could perhaps reduce them, but not go all the way to "no permissions granted".
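As one illustrative knob (not something we've tried here, and the names are made up): GKE lets you create a node pool with a reduced OAuth scope set, so worker nodes could run with less than the legacy defaults:

```sh
# sketch: a dedicated node pool for workers with only the minimal gke-default scopes
gcloud container node-pools create restricted-workers \
  --cluster our-cluster \
  --scopes gke-default
```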
I'll try to validate that soon and follow up with a PR 👍
thx!
Update: we added the deny rule to the workers. We'd now need to ensure that it's indeed blocking and does what we expect (see `hush-house/deployments/with-creds/worker/values.yaml`, lines 34 to 35 at fb44c29).
We confirmed that the deny rule blocks the containers from accessing the metadata server that the underlying host can reach. That rule is now applied to all shared GCP workers on Hush House, so this shouldn't happen anymore!
We provisioned a new worker without the rule turned on, and when we ran `gcloud info` we could reproduce this behaviour, which explains why, without the `--project` flag, the gcloud CLI was falling through to the service account credentials used to provision the GKE cluster:

```
Account: [secret-account-id@developer.gserviceaccount.com]
Project: [cf-concourse-production]

Current Properties:
  [core]
    project: [cf-concourse-production]
    account: [secret-account-id@developer.gserviceaccount.com]
    disable_usage_reporting: [True]
```
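That fallback is just the standard GCE metadata lookup: when no project or account is configured, gcloud queries the instance's metadata server, e.g.:

```sh
# the kind of lookups gcloud falls back to on a GCE/GKE host
curl -H "Metadata-Flavor: Google" \
  http://169.254.169.254/computeMetadata/v1/project/project-id
curl -H "Metadata-Flavor: Google" \
  http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email
```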
After applying the rule, the `gcloud info` output is much more locked down:

```
Account: [None]
Project: [None]

Current Properties:
  [core]
    disable_usage_reporting: [True]
```
As of now, running `fly execute` with

```yaml
---
platform: linux

image_resource:
  type: registry-image
  source:
    repository: platforminsightsteam/base-ci-image

run:
  path: gcloud
  args: ["info"]
```

shows `Project: [cf-concourse-production]`.
We intercepted a similar container and were able to `curl 169.254.169.254` with no errors. We inspected `/proc/$(pgrep gdn)/cmdline` on multiple worker pods and saw that `--deny-network169.254.169.254/32` does indeed appear (arguments in `cmdline` are NUL-separated, which is why the flag and its value run together). Something fishy is going on here.
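For the record, the check on the worker pods looked roughly like this (the pod name is illustrative; the `tr` splits the NUL-separated `cmdline`):

```sh
# print gdn's arguments one per line and look for the deny rule
kubectl exec -it worker-0 -- sh -c \
  "tr '\0' '\n' < /proc/\$(pgrep gdn)/cmdline | grep -A1 -- --deny-network"
```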
We have confirmed, as @cirocosta suspected, that concourse/concourse#5159 is the culprit here. We used the following docker-compose.yml:
```yaml
version: '3'

services:
  concourse-db:
    image: postgres
    environment:
      POSTGRES_DB: concourse
      POSTGRES_PASSWORD: concourse_pass
      POSTGRES_USER: concourse_user
      PGDATA: /database

  concourse:
    # digests:
    #   before PR: sha256:488638b0651e1e6cc884876499970a181ef63f1b2b02b6b9718ca1383c51a0b4
    #   (https://ci.concourse-ci.org/teams/main/pipelines/concourse/jobs/build-rc-image/builds/86)
    #   after PR: sha256:49837094a16050e64a02e8f100a1992084f89505fdddd98e48aae8aa5355b4b4
    #   (https://ci.concourse-ci.org/teams/main/pipelines/concourse/jobs/build-rc-image/builds/87)
    image: concourse/concourse-rc@<digest>
    command: quickstart
    privileged: true
    depends_on: [concourse-db]
    ports: ["8080:8080"]
    environment:
      CONCOURSE_POSTGRES_HOST: concourse-db
      CONCOURSE_POSTGRES_USER: concourse_user
      CONCOURSE_POSTGRES_PASSWORD: concourse_pass
      CONCOURSE_POSTGRES_DATABASE: concourse
      CONCOURSE_EXTERNAL_URL: http://localhost:8080
      CONCOURSE_ADD_LOCAL_USER: test:test
      CONCOURSE_MAIN_TEAM_LOCAL_USER: test
      CONCOURSE_WORKER_BAGGAGECLAIM_DRIVER: overlay
      CONCOURSE_GARDEN_DENY_NETWORK: 172.217.1.174/32 # google.com
```
and ran `fly execute` against both versions using this `task.yml`:

```yaml
---
platform: linux

image_resource:
  type: registry-image
  source:
    repository: appropriate/curl

run:
  path: curl
  args: ["google.com"]
```
Before the PR we got `curl: (7) Failed to connect to google.com port 80: Connection refused`, but after the PR we got:

```html
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
```
Given that the change in concourse/concourse#5159 causes Garden (via kawasaki) to prepend iptables rules, while setting `--deny-network` appends them, we feel a bit discouraged when deciding how to address this "leaking GCP metadata" use case. Admittedly, neither @pivotal-jamie-klassen nor I know a whole lot about iptables, so maybe there is some place we can configure Garden to reliably block traffic to GCP's metadata server while still allowing outbound traffic from containers running in greenhouse (Windows containers).
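To illustrate why the ordering matters (a minimal sketch; the chain and rules are made up, not Garden's actual ones): iptables evaluates a chain top-down and stops at the first match, so a rule prepended later shadows a previously appended deny:

```sh
iptables -N demo
# what --deny-network does: append a deny rule
iptables -A demo -d 169.254.169.254/32 -j REJECT
# what the #5159 change effectively does: prepend a rule above it
iptables -I demo 1 -j ACCEPT
# first match wins: the ACCEPT at position 1 is hit before the REJECT,
# so traffic to 169.254.169.254 gets through after all
iptables -nL demo --line-numbers
```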