cloudfoundry/cloud_controller_ng

CAPI 1.185.0 - Docker Push bug

ChrisMcGowan opened this issue · 7 comments

Issue

When CAPI was upgraded from 1.183.0 to 1.185.0, certain cf push commands that reference a Docker image by its SHA256 digest failed. Rolling back to 1.183.0 resolved the issue temporarily for our users.

Context

After moving to the latest cf-deployment release, which contains CAPI 1.185.0, pushing a Docker image referenced by its sha256 digest returns an error:

failed to create container: running image plugin create: fetching image reference: creating image: parsing url failed: invalid reference format

Steps to Reproduce

While running CAPI 1.185.0, do a simple push of the following public image, referenced by its SHA256 digest: cf push foo --docker-image gsatts/usagov-2021@sha256:b6a34d1afc391dfff44a43aa10c4a0e80c50f8c11b63df9485aa607320c6e7d2

Expected result

A running image on CF

Current result

It fails to stage:

failed to create container: running image plugin create: fetching image reference: creating image: parsing url failed: invalid reference format

I was able to reproduce this issue using CAPI 1.183.0 and CAPI 1.185.0 on a bbl test environment. It might be related to the introduction of cloud native buildpacks in #3778 (ping @modulo11 @pbusko @c0d1ngm0nk3y @nicolasbender).

I'll add a warning to the release notes.

@johha @ChrisMcGowan Using Cloud Controller commit 706b043, we can see that the Cloud Controller successfully sends the TaskDefinition request to Diego for the staging process (this exact case is also tested here: https://github.com/cloudfoundry/cloud_controller_ng/blob/main/spec/unit/lib/utils/uri_utils_spec.rb#L161-L163):

{
  "task_definition": {
    "rootfs": "preloaded:cflinuxfs4",
    "action": {
      "timeout": {
        "action": {
          "emit_progress": {
            "action": {
              "run": {
                "path": "/tmp/lifecycle/builder",
                "args": [
                  "-outputMetadataJSONFilename=/tmp/result.json",
                  "-dockerRef=gsatts/usagov-2021@sha256:b6a34d1afc391dfff44a43aa10c4a0e80c50f8c11b63df9485aa607320c6e7d2"
                ],
                "env": [
                  {
                    "name": "VCAP_APPLICATION",
                    "value": "{\"cf_api\":\"http://localhost\",\"limits\":{\"fds\":16384,\"mem\":1024,\"disk\":1024},\"application_name\":\"foo\",\"application_uris\":[\"foo.customer-app-domain1.com\"],\"name\":\"foo\",\"space_name\":\"test\",\"space_id\":\"68d395be-ebb9-4293-ac14-2ac45af2cdfc\",\"organization_id\":\"b4b0f1ed-b659-4889-81b7-13cd9dc03a2b\",\"organization_name\":\"the-system_domain-org-name\",\"uris\":[\"foo.customer-app-domain1.com\"],\"users\":null,\"application_id\":\"7fc4f327-5f3d-4088-86d9-8dd0ec70edbf\",\"version\":\"f26b9b56-c46c-43ab-953b-f98a94922b2f\",\"application_version\":\"f26b9b56-c46c-43ab-953b-f98a94922b2f\"}"
                  },
                  {
                    "name": "MEMORY_LIMIT",
                    "value": "1024m"
                  },
                  {
                    "name": "VCAP_SERVICES",
                    "value": "{}"
                  }
                ],
                "resource_limits": {
                  "nofile": 42
                },
                "user": "vcap",
                "suppress_log_output": false
              }
            },
            "start_message": "Staging...",
            "success_message": "Staging Complete",
            "failure_message_prefix": "Staging Failed"
          }
        },
        "timeout_ms": 42000
      }
    },
    "disk_mb": 1024,
    "memory_mb": 1024,
    "cpu_weight": 50,
    "privileged": false,
    "log_source": "STG",
    "log_guid": "7fc4f327-5f3d-4088-86d9-8dd0ec70edbf",
    "metrics_guid": "",
    "result_file": "/tmp/result.json",
    "completion_callback_url": "https://api.internal.cf:8182/internal/v3/staging/df485df4-432b-4ff2-878d-fba030c013cb/build_completed?start=false",
    "cached_dependencies": [
      {
        "name": "",
        "from": "http://file-server.service.cf.internal:8080/v1/static/docker_app_lifecycle/docker_app_lifecycle.tgz",
        "to": "/tmp/lifecycle",
        "cache_key": "docker-lifecycle",
        "log_source": ""
      }
    ],
    "legacy_download_user": "vcap",
    "trusted_system_certificates_path": "/etc/cf-system-certificates",
    "network": {
      "properties": {
        "app_id": "7fc4f327-5f3d-4088-86d9-8dd0ec70edbf",
        "container_workload": "staging",
        "org_id": "b4b0f1ed-b659-4889-81b7-13cd9dc03a2b",
        "policy_group_id": "7fc4f327-5f3d-4088-86d9-8dd0ec70edbf",
        "ports": "8080",
        "space_id": "68d395be-ebb9-4293-ac14-2ac45af2cdfc"
      }
    },
    "max_pids": 2048,
    "certificate_properties": {
      "organizational_unit": [
        "organization:b4b0f1ed-b659-4889-81b7-13cd9dc03a2b",
        "space:68d395be-ebb9-4293-ac14-2ac45af2cdfc",
        "app:7fc4f327-5f3d-4088-86d9-8dd0ec70edbf"
      ]
    },
    "image_username": "",
    "image_password": "",
    "log_rate_limit": {
      "bytes_per_second": 1048576
    },
    "metric_tags": {
      "app_id": {
        "static": "7fc4f327-5f3d-4088-86d9-8dd0ec70edbf"
      },
      "app_name": {
        "static": "foo"
      },
      "organization_id": {
        "static": "b4b0f1ed-b659-4889-81b7-13cd9dc03a2b"
      },
      "organization_name": {
        "static": "the-system_domain-org-name"
      },
      "source_id": {
        "static": "7fc4f327-5f3d-4088-86d9-8dd0ec70edbf"
      },
      "space_id": {
        "static": "68d395be-ebb9-4293-ac14-2ac45af2cdfc"
      },
      "space_name": {
        "static": "test"
      }
    }
  },
  "task_guid": "df485df4-432b-4ff2-878d-fba030c013cb",
  "domain": "cf-app-staging"
}

Also, the error looks very similar to the errors thrown by the go-containerregistry Go library. Could the error originate from one of the other components that were also updated as part of the cf-deployment bump?
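To check which image reference actually reaches Diego, the `-dockerRef` argument can be pulled out of a captured staging request like the one above. The following is only a minimal sketch: the helper names `iter_run_actions` and `docker_refs` are hypothetical, the sample payload is a heavily trimmed copy of the request shown above, and it assumes the request has already been captured as JSON.

```python
import json
from typing import Any, Iterator, List


def iter_run_actions(node: Any) -> Iterator[dict]:
    """Yield every "run" action found anywhere inside a nested structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "run" and isinstance(value, dict):
                yield value
            else:
                yield from iter_run_actions(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_run_actions(item)


def docker_refs(task_request: dict) -> List[str]:
    """Collect the values of all -dockerRef=... arguments in the request."""
    prefix = "-dockerRef="
    refs = []
    for run in iter_run_actions(task_request):
        for arg in run.get("args", []):
            if arg.startswith(prefix):
                refs.append(arg[len(prefix):])
    return refs


# Trimmed-down version of the staging request shown above (illustration only).
SAMPLE = json.loads("""
{
  "task_definition": {
    "action": {
      "timeout": {
        "action": {
          "emit_progress": {
            "action": {
              "run": {
                "path": "/tmp/lifecycle/builder",
                "args": [
                  "-outputMetadataJSONFilename=/tmp/result.json",
                  "-dockerRef=gsatts/usagov-2021@sha256:b6a34d1afc391dfff44a43aa10c4a0e80c50f8c11b63df9485aa607320c6e7d2"
                ]
              }
            }
          }
        }
      }
    }
  },
  "domain": "cf-app-staging"
}
""")

print(docker_refs(SAMPLE))
```

If the printed reference still carries the full `@sha256:...` digest, the staging request itself is intact, which is consistent with the observation above that staging is requested successfully.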

I used a bbl environment with cf-deployment v41.0.0, which comes with CAPI 1.185.0. There, cf push with a sha256 reference fails. After that I downgraded only CAPI to 1.183.0, and cf push succeeds.

Correct @johha - that is the same setup we used. The only addition was some manual CCDB schema cleanup for the DB migration scripts that were part of CAPI 1.184/185 so that CAPI 1.183 would run - the rest of the cf-deployment components in v41.0.0 were left as is.

I could reproduce the problem with the CATs docker_lifecycle test. Fails with capi-release 1.185.0 and succeeds with 1.183.0.
I've proposed https://github.com/cloudfoundry/relint-envs/pull/40 as regression test for the cf-deployment validation.

I created some dev releases and can confirm that the issue is related to the introduction of CNB. With commit 60e06534481d9e584f9490d05aa651d8e751047a, just before #3778, cf push foo --docker-image images@sha256:1234 succeeds. After the CNB commits were merged (the last one is daffedca6dd499c2b8edef61d396c42c92714353), cf push foo --docker-image images@sha256:1234 fails.

If a tag is used instead of the sha256 reference cf push is successful for all versions.
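For readers unfamiliar with the two reference styles, the difference between a tag reference and a digest reference can be sketched as follows. This is a deliberately simplified illustration, not the grammar used by go-containerregistry or by Cloud Controller; `ImageRef` and `split_image_ref` are hypothetical names, and `registry.example:5000/team/app:1.2` is a made-up example reference.

```python
import re
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]


def split_image_ref(ref: str) -> ImageRef:
    """Split "name[:tag][@algorithm:hex]" into its parts (simplified)."""
    name, at_sign, digest = ref.partition("@")
    if at_sign and not re.fullmatch(r"[a-z0-9]+:[0-9a-f]+", digest):
        raise ValueError(f"invalid digest in reference: {ref!r}")

    # A colon after the last slash separates the tag; a colon before the
    # last slash belongs to a registry host:port and is left in the name.
    tag = None
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")

    if not name:
        raise ValueError(f"missing image name in reference: {ref!r}")
    return ImageRef(name, tag, digest if at_sign else None)


# Digest reference from this issue: no tag, digest set.
print(split_image_ref(
    "gsatts/usagov-2021@sha256:"
    "b6a34d1afc391dfff44a43aa10c4a0e80c50f8c11b63df9485aa607320c6e7d2"
))
# Tag reference with a registry port: tag set, no digest.
print(split_image_ref("registry.example:5000/team/app:1.2"))
```

The sketch only shows that the `@sha256:...` suffix is a separate component from the tag, so code that handles one form correctly does not necessarily handle the other.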

I couldn't find any differences in the database (package, droplet) between the successful and failing apps. The Cloud Controller logs did not provide any further information either.

Fixed with #3889 and shipped with CAPI 1.186.0