uber/cadence-web

Error `Failed to fetch` for cadence-web v3.29.0+ with single K8s cluster setup

pregnor opened this issue · 3 comments

Hello there,

I would like to report strange behavior in cadence-web that might be a bug, a user error, or a significant behavior change, and I would like to have it clarified.

While testing the cadence-web component update from v3.28.7 (which works fine) to v3.29.4, we noticed that with any v3.29.x version of cadence-web, starting from v3.29.0, the Cadence web UI shows no workflow list and instead displays the error message "Failed to fetch".

[Attached screenshot: Screenshot 2021-11-25 at 15 08 48]

When we examined the network communication in the browser developer tools, several errors were reported while the workflow list was being requested. I list them here as a summary and omit the stack trace beyond the first entry; the full trace can be found in the attached log.txt.

Could not find cluster "isActive:true" in crossRegion.clusterOriginList configuration.
getClusterFromClusterList @ get-cluster-from-cluster-list.js:48

http-service.js:63 GET http://localhost:8088/api/domains/pipeline net::ERR_CONNECTION_REFUSED
_callee$ @ http-service.js:63

actions.js:86 Unable to resolve domain configuration for domain = "pipeline" and origin = "http://localhost:8088".
_callee$ @ actions.js:86

http-service.js:63 GET http://localhost:8089/api/cluster net::ERR_CONNECTION_REFUSED
_callee$ @ http-service.js:63

http-service.js:63 GET http://localhost:8089/api/domains/pipeline net::ERR_CONNECTION_REFUSED
_callee$ @ http-service.js:63

http-service.js:63 GET http://localhost:8089/api/feature-flags/domainMetrics net::ERR_CONNECTION_REFUSED
_callee$ @ http-service.js:63

vue.esm.js:628 [Vue warn]: Error in mounted hook (Promise/async): "TypeError: Failed to fetch"

found in

---> <FeatureFlag> at client/components/feature-flag.vue
       <NavigationBar> at client/components/navigation-bar.vue
         <Index> at client/routes/domain/index.vue
           <Domain> at client/containers/domain/component.vue
             <ConnectDomain>
               <CrossRegion> at client/containers/cross-region/component.vue
                 <ConnectCrossRegion>
                   <App> at client/App.vue
                     <Root>
warn @ vue.esm.js:628

vue.esm.js:1897 TypeError: Failed to fetch
    at HttpService._callee$ (http-service.js:63)

http-service.js:63 GET http://localhost:8089/api/domains/pipeline/workflows/open?startTime=2021-10-25T22%3A00%3A00.000Z&endTime=2021-11-25T22%3A59%3A59.999Z net::ERR_CONNECTION_REFUSED
_callee$ @ http-service.js:63

Our environment was a single Kubernetes cluster (tested on both Kind and AWS EKS) where Cadence was installed with our Cadence Helm chart.

In Kind we port forward the localhost:8088 port to the cadence-web pod's 8088 port and check the localhost:8088 URL to see whether the web interface works as expected.
(During the Kind tests we also tried forwarding the localhost:8089 port to the cadence-web pod's 8088 port, which resolved the issue: the workflow list could be requested and was shown. However, this is only a suitable workaround for testing environments.)
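The difference between the two port-forward setups can also be checked outside the browser. The following is a minimal sketch, assuming Node.js 18+ (for the built-in `fetch`) and using the `/api/cluster` path that appears in the log above; `probeOrigins` is a hypothetical helper name and not part of cadence-web.

```javascript
// Hypothetical diagnostic helper (not part of cadence-web).
// Requests <origin>/api/cluster for each origin and records whether the
// request succeeded. fetchImpl is injectable so the helper can be exercised
// without a running server; it defaults to the global fetch (Node.js 18+).
async function probeOrigins(origins, fetchImpl = fetch) {
  const results = {};
  for (const origin of origins) {
    try {
      const response = await fetchImpl(`${origin}/api/cluster`);
      results[origin] = response.ok ? 'reachable' : `HTTP ${response.status}`;
    } catch (error) {
      // A refused connection surfaces here as a rejected promise.
      results[origin] = `unreachable (${error.message})`;
    }
  }
  return results;
}

// Example call with the two origins from the default clusterOriginList:
// probeOrigins(['http://localhost:8088', 'http://localhost:8089']).then(console.log);
```

With only localhost:8088 forwarded, the second origin would be expected to show up as unreachable, which would match the ERR_CONNECTION_REFUSED entries in the log.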

On EKS we created an ingress for the Cadence web service and opened it through the ELB host, with the same result.

After examining the v3.28.7...v3.29.0 comparison and noticing the large green label in the attached screenshot stating "local - secondary", our first guess was that cadence-web incorrectly believes it is running in a multi-cluster Cadence setup, and that the issue might stem from the server feature flags recently introduced in #411:

  {
    "key": "crossRegion",
    "value": true
  },
  {
    "key": "crossRegion.allowedCrossOrigin",
    "value": true
  },
  {
    "key": "crossRegion.clusterOriginList",
    "value": [
      {
        "clusterName": "primary",
        "origin": "http://localhost:8088"
      },
      {
        "clusterName": "secondary",
        "origin": "http://localhost:8089"
      }
    ]
  },
1. Are we right to think that this configuration change has overridden the default behavior and causes cadence-web to fail in single-cluster Cadence deployments because the secondary Cadence cluster does not exist?

2. Was this an intentional deprecation of support for using cadence-web with single-cluster Cadence deployments?

3. Could we override or configure the cadence-web container in any way other than providing it with a custom server/feature-flags.json to make it work with single-cluster Cadence deployments (as it did in the past)?

Hi there, I believe this is a mistake in the feature flags, and I will address it in the next release. Ideally, feature-flags.json would show examples as commented-out code, but that is not possible with the JSON format. We will be disabling cross-region support by default in order to avoid the above behavior.

I also like the idea of being able to feed a feature-flag config into the app, as opposed to overriding feature-flags.json, so I will look into how much work that would be.
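For anyone who needs a stopgap before that release, the override from question 3 could be generated rather than edited by hand. This is a minimal sketch, assuming the flag file is a JSON array of `{ key, value }` entries like the excerpt quoted above, and assuming that setting `crossRegion` to `false` is enough to stop the UI from contacting the secondary origin (this has not been verified against cadence-web itself). `disableCrossRegion` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of cadence-web): returns a copy of a
// feature-flag list with the top-level crossRegion flag set to false.
// Assumes entries are shaped like { key, value }, as in the quoted excerpt.
function disableCrossRegion(flags) {
  return flags.map((flag) =>
    flag.key === 'crossRegion' ? { ...flag, value: false } : flag
  );
}

// The three entries quoted in the issue; any other flags are omitted here.
const shippedDefaults = [
  { key: 'crossRegion', value: true },
  { key: 'crossRegion.allowedCrossOrigin', value: true },
  {
    key: 'crossRegion.clusterOriginList',
    value: [
      { clusterName: 'primary', origin: 'http://localhost:8088' },
      { clusterName: 'secondary', origin: 'http://localhost:8089' },
    ],
  },
];

// Print the modified list as JSON, ready to be saved as a custom flag file.
const singleClusterFlags = disableCrossRegion(shippedDefaults);
console.log(JSON.stringify(singleClusterFlags, null, 2));
```

The printed JSON could then be mounted into the container in place of server/feature-flags.json. The helper leaves every other entry untouched, so the cluster origin list stays in the file but would presumably be ignored while `crossRegion` is off.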

Great to hear both, thank you.