datastax/cql-proxy

Token ring issue

melienherrera opened this issue · 5 comments

Context: We're trying to use Astra DB as the backend for Temporal OSS so community users can adopt it with ease. Temporal offers two ways to install its server: Helm charts and Docker Compose. We were successful with Helm charts by unzipping the secure connect bundle (SCB) file and configuring it manually. However, trying cql-proxy with both Docker Compose and Helm charts ran into the following issue.

Overview: Trying to use cql-proxy to connect Temporal OSS services to Astra DB. cql-proxy itself comes up and is listening, as shown by this log message:

{"level":"info","ts":1646853171.9829385,"caller":"proxy/proxy.go:194","msg":"proxy is listening","address":"[::]:9042"}

Temporal then runs into the following error and panics.

Error message:

...
temporal                | + echo 'Waiting for Temporal server to start...'
temporal                | Waiting for Temporal server to start...
temporal                | + sleep 1
temporal                | + tctl cluster health
temporal                | + grep SERVING
temporal                | panic: token map different size to token ring: got 0 expected 1
...

We need to resolve this token ring issue so that the server continues instead of panicking.

How to reproduce issue:
Quick install and run Temporal server.

Use this config file:
docker-compose-cqlproxy.txt
(uploaded as .txt; rename it to a .yaml file before use)

Use this command:
docker-compose -f docker-compose-cqlproxy.yaml up
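For reference, the proxy service in that compose file looks roughly like the sketch below. This is an illustrative fragment, not the attached file: the image tag, token, and database ID are placeholders, and the flags shown are the same ones used on the command line later in this thread.

```
services:
  cql-proxy:
    image: datastax/cql-proxy
    # <token> and <id> are placeholders for your Astra token and database ID
    command: ["--astra-token", "<token>", "--astra-database-id", "<id>", "--bind", ":9042"]
    ports:
      - "9042:9042"
```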

Thanks for reporting this. I've run into this issue as well. It happens because Temporal configures token-aware routing, but gocql is not robust to a cluster containing only a single node, so it panics. Most other CQL drivers just print a warning and fall back to something equivalent to round-robin. There are a couple of ways this could be fixed:

1) Allow cql-proxy to bind to multiple IPs (with an equally distributed token map) on a single host and add those to the peers table.
2) Allow cql-proxy to run multiple instances and have a --peers flag that puts those entries in the peers table.
3) Add a PR to Temporal to allow disabling token-aware routing. The change would be made here.

The issue is that the remote data_center wasn't used in the system.local table. This is fixed in #88.

I think I have an idea to fix this that won't require multiple proxies.

Fixed here: #88

Tested:

I've manually cloned and built Temporal, but this should work just fine in a docker-compose or k8s setup. I'll try that later. Let me know if I can help out with that.

  • Create Astra Cluster
  • Add keyspaces temporal and temporal_visibility in the Astra UI
  • Create token and copy Astra DB ID
  • Start cql-proxy using token and ID
./cql-proxy --astra-token <token> --astra-database-id <id> --bind 127.0.0.1:9042
  • Bootstrap Temporal:
git clone https://github.com/temporalio/temporal
cd temporal
make
./temporal-cassandra-tool --keyspace temporal_visibility setup -version 1.6
./temporal-cassandra-tool --keyspace temporal setup -version 0.0
./temporal-cassandra-tool update -schema-dir schema/cassandra/temporal/versioned/
  • Run Temporal
./temporal-server start

Tested with docker compose! Looks good on my end.

Temporal UI is up and running - no panic/token ring error. Thank you!