openshift/ansible-service-broker

async bind/unbinding fails often when more than one binding created

gtema opened this issue · 4 comments

gtema commented


Bug:

main.yaml:

- name: "Update last operation"
  asb_last_operation:
    description: "0%: Starting"
  when: in_cluster

- name: 'Set facts'
  set_fact:
    cluster: '{{ "openshift" if "openshift" in lookup("k8s", cluster_info="version") else "kubernetes" }}'

- name: 'Include variables based on ansible version'
  include_vars: ansible_26.yml
  when: ansible_version.full is version('2.6', '>=')

- when: apb_action == 'provision'
  block:
    - name: encode bind credentials
      asb_encode_binding:
        fields:
          dummy: dummy

- when: apb_action == 'bind'
  block:

    - name: encode bind credentials
      asb_encode_binding:
        fields:
          dummy: "{{ _apb_provision_creds.dummy }}"
          openstack_keypair_id: "some value"

apb.yaml:

# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
---
version: 1.0
name: openstack-keypair-apb
description: Openstack KeyPair1
bindable: True
async: optional
tags:
  - Openstack
metadata:
  documentationUrl: https://openstack.org
  displayName: Openstack KeyPair (APB)
  longDescription: This deploys KeyPair secrets into the Openshift/K8
  imageUrl: https://www.openstack.org/assets/openstack-logo/2016R/OpenStack-Logo-Mark.svg
  dependencies: []
plans:
  - name: dev
    description: This development plan deploys openstack-keypair-apb
    free: True
    metadata:
      displayName: Development
    parameters:
      - name: os_client_config_clouds
        title: os_client_config
        type: string
        display_type: textarea
        default: |+
          clouds:
            otc:
              auth:
                auth_url: https://iam.eu-de.otc.t-systems.com:443/v3
                project_name: eu-de #required, since otherwise some APIs are not working
                user_domain_name: my_domain
              interface: public
              identity_api_version: 3
      - name: oc_client_config_secure
        title: os_client_config_secure
        type: string
        display_type: textarea
        display_group: secrets
        default: |+
          clouds:
            my_cloud:
              auth:
                username: MY_USER
                password: MY_SECRET
    bind_parameters:
      - name: cloud_name
        title: cloud, configured in the previously given os_client_config
        type: string
        required: true
      - name: keypair_name
        title: Name of the KeyPair
        type: string
        required: true

launch_apb_on_bind: true

What happened:
Creating further bindings sometimes fails:
Unable to update the job state 7bb91355-1079-4fd7-8440-21f30845d38f on the binding cd305ca6-852a-11e8-9736-0242ac11000b. Reason: Conflict - Operation cannot be fulfilled on bundlebindings.automationbroker.io \"cd305ca6-852a-11e8-9736-0242ac11000b\": the object has been modified; please apply your changes to the latest version and try again"

Deleting this binding fails as well, and deprovisioning is not possible either.
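
For context, the "the object has been modified; please apply your changes to the latest version and try again" text is the standard Kubernetes response when an update carries a stale resourceVersion (optimistic concurrency). Below is a minimal sketch of the usual re-read-and-retry pattern against an in-memory stand-in. `Store`, `ConflictError`, and `update_with_retry` are hypothetical names for illustration only; this is not the broker's code, and it is not a claim about how the broker should be fixed.

```python
import copy


class ConflictError(Exception):
    """Raised when an update carries a stale resource version."""


class Store:
    """Hypothetical in-memory stand-in for an API server with optimistic concurrency."""

    def __init__(self):
        self._objects = {}

    def create(self, name, data):
        self._objects[name] = {"resource_version": 1, "data": copy.deepcopy(data)}

    def get(self, name):
        # Return a copy so callers cannot mutate the stored object directly.
        return copy.deepcopy(self._objects[name])

    def update(self, name, obj):
        current = self._objects[name]
        if obj["resource_version"] != current["resource_version"]:
            raise ConflictError(f"{name}: the object has been modified")
        self._objects[name] = {
            "resource_version": current["resource_version"] + 1,
            "data": copy.deepcopy(obj["data"]),
        }


def update_with_retry(store, name, mutate, attempts=5):
    """Re-read the latest version and re-apply the change on every conflict."""
    for _ in range(attempts):
        obj = store.get(name)
        mutate(obj["data"])
        try:
            store.update(name, obj)
            return
        except ConflictError:
            continue
    raise ConflictError(f"{name}: still conflicting after {attempts} attempts")


store = Store()
store.create("binding", {"jobs": []})

# Writer B reads the object, then writer A updates it first.
stale = store.get("binding")
update_with_retry(store, "binding", lambda d: d["jobs"].append("job-a"))

# Writer B now submits its stale copy and is rejected.
stale["data"]["jobs"].append("job-b")
try:
    store.update("binding", stale)
    conflicted = False
except ConflictError:
    conflicted = True

# Re-reading before writing lets writer B succeed.
update_with_retry(store, "binding", lambda d: d["jobs"].append("job-b"))
final = store.get("binding")
```

In this sketch the second writer only succeeds once it applies its change to the latest version, which is what the error message asks for.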

With launch_apb_on_bind: false, the same APB can create multiple bindings, unbind, and deprovision (even with existing bindings); however, in that case the bind action is not invoked at all. According to the log, provision is invoked for each binding triggered from the UI. Changing this parameter has an unpredictable effect on already provisioned services.

Deprovisioning fails with the following log output (for example, when multiple bindings were created successfully):

time="2018-07-11T17:47:43Z" level=info msg="All Jobs for instance: 231eab91-8532-11e8-985d-0242ac11000b in state:  in progress - \n[]bundle.JobState{}"
time="2018-07-11T17:47:43Z" level=debug msg="Found secret with name e44f9806-8531-11e8-985d-0242ac11000b\n"
time="2018-07-11T17:47:43Z" level=debug msg="Found secret with name 231eab91-8532-11e8-985d-0242ac11000b\n"
time="2018-07-11T17:47:43Z" level=debug msg="get service instance: e44f9806-8531-11e8-985d-0242ac11000b"
time="2018-07-11T17:47:43Z" level=info msg="ASYNC unbinding in progress"
time="2018-07-11T17:47:43Z" level=debug msg="set job state for instance: e44f9806-8531-11e8-985d-0242ac11000b token: 20e7cc51-cc63-49f1-840b-f5212ba552cd"
time="2018-07-11T17:47:44Z" level=error msg="Could not find binding e44f9806-8531-11e8-985d-0242ac11000b associated with job state 0d0aa957-9b25-4ebb-847d-7e82727d7623 - bundlebindings.automationbroker.io \"e44f9806-8531-11e8-985d-0242ac11000b\" not found"
time="2018-07-11T17:47:44Z" level=error msg="Failed to start new job for async unbind\nbundlebindings.automationbroker.io \"e44f9806-8531-11e8-985d-0242ac11000b\" not found"
time="2018-07-11T17:47:44Z" level=error msg="Unknown error: bundlebindings.automationbroker.io \"e44f9806-8531-11e8-985d-0242ac11000b\" not found"
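
To make the log excerpt easier to scan, here is a small sketch that parses three of the lines above, keeps the error entries, and pulls out the UUIDs. The regular expressions and helper names are assumptions made for illustration; they are not part of the broker. It only shows that the ID reported as a missing binding also appears in the "get service instance" line; it does not establish the cause.

```python
import re

# key="value" layout as seen in the excerpt; msg may contain escaped quotes.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)
UUID_RE = re.compile(r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}')

# Three lines copied from the excerpt above.
LOG = r'''
time="2018-07-11T17:47:43Z" level=debug msg="get service instance: e44f9806-8531-11e8-985d-0242ac11000b"
time="2018-07-11T17:47:43Z" level=info msg="ASYNC unbinding in progress"
time="2018-07-11T17:47:44Z" level=error msg="Could not find binding e44f9806-8531-11e8-985d-0242ac11000b associated with job state 0d0aa957-9b25-4ebb-847d-7e82727d7623 - bundlebindings.automationbroker.io \"e44f9806-8531-11e8-985d-0242ac11000b\" not found"
'''


def parse(text):
    """Return one dict per recognised log line, with the UUIDs found in msg."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append({
                "time": match["time"],
                "level": match["level"],
                "msg": match["msg"],
                "ids": UUID_RE.findall(match["msg"]),
            })
    return entries


entries = parse(LOG)
errors = [e for e in entries if e["level"] == "error"]

# IDs the broker looked up as service instances.
instance_ids = {
    i for e in entries if e["msg"].startswith("get service instance") for i in e["ids"]
}
# IDs the broker reported as missing bindings (first UUID in the message).
missing_bindings = {
    e["ids"][0] for e in errors if e["msg"].startswith("Could not find binding")
}
overlap = instance_ids & missing_bindings
print(sorted(overlap))
```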

Complete log with provision, multiple binds, and unbind of a random binding:
openshift-automation-service-broker-2-jmmc6.log

What you expected to happen:
ASB can create as many bindings as required and can unbind and deprovision.

How to reproduce it:

  • Set up a workspace with catasb
  • Enable launch_apb_on_bind: true in OASB and redeploy
  • Push the sample APB with the content described above
  • Provision the APB
  • Create multiple bindings (sometimes even this fails with "object has been modified")
  • Try to delete the bindings (usually fails with the logs above; the credential secrets are removed)
  • Try to deprovision the ServiceInstance (since unbinding does not work, there is no way to deprovision the service)

gtema commented

The same error happened with just one binding.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/close

@jmrodri: Closing this issue.
