Token ring issue
melienherrera opened this issue · 5 comments
Context: We are trying to use Astra DB as the backend for Temporal OSS so that community users can adopt it easily. Temporal offers two ways to install its server: Helm charts and Docker Compose. We succeeded with Helm charts by unzipping the secure connect bundle (SCB) and configuring the connection manually. However, when we tried cql-proxy with both Docker Compose and Helm charts, we ran into the following issue.
Overview: We are trying to use cql-proxy to connect Temporal OSS services to Astra DB. cql-proxy itself is up and listening, as shown by this log message:
{"level":"info","ts":1646853171.9829385,"caller":"proxy/proxy.go:194","msg":"proxy is listening","address":"[::]:9042"}
Temporal then runs into the following error and panics.
Error message:
...
temporal | + echo 'Waiting for Temporal server to start...'
temporal | Waiting for Temporal server to start...
temporal | + sleep 1
temporal | + tctl cluster health
temporal | + grep SERVING
temporal | panic: token map different size to token ring: got 0 expected 1
...
We need to resolve this token ring issue so that Temporal continues instead of panicking.
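As a rough illustration of "continue instead of panicking", here is a Go sketch of a token-aware host selector that checks whether its token map and token ring agree and, if they don't, logs a warning and falls back to round-robin. The `selector` type, `newSelector`, and `pick` are hypothetical names for this sketch, not gocql's API; the only detail taken from the report is the mismatch itself (0 token map entries against 1 ring entry).

```go
package main

import (
	"fmt"
	"log"
	"sort"
	"sync/atomic"
)

// selector is a hypothetical host-selection policy for illustration only;
// it is not the gocql API.
type selector struct {
	ring       []int64            // sorted tokens owned by known hosts
	replicas   map[int64][]string // token -> replica addresses
	hosts      []string           // every known host, used for round-robin
	next       uint32             // round-robin counter
	tokenAware bool               // false once the token map is found inconsistent
}

func newSelector(ring []int64, replicas map[int64][]string, hosts []string) *selector {
	s := &selector{
		ring:       append([]int64(nil), ring...),
		replicas:   replicas,
		hosts:      hosts,
		tokenAware: true,
	}
	sort.Slice(s.ring, func(i, j int) bool { return s.ring[i] < s.ring[j] })
	if len(s.replicas) != len(s.ring) {
		// The reported panic message describes this kind of mismatch; this
		// sketch warns and disables token awareness instead of panicking.
		log.Printf("warning: token map has %d entries but token ring has %d; falling back to round-robin",
			len(s.replicas), len(s.ring))
		s.tokenAware = false
	}
	return s
}

// pick returns a host for a partition token.
func (s *selector) pick(partitionToken int64) string {
	if s.tokenAware && len(s.ring) > 0 {
		// The first ring token >= the partition token owns it; wrap past the end.
		i := sort.Search(len(s.ring), func(i int) bool { return s.ring[i] >= partitionToken })
		if i == len(s.ring) {
			i = 0
		}
		if r := s.replicas[s.ring[i]]; len(r) > 0 {
			return r[0]
		}
	}
	if len(s.hosts) == 0 {
		return ""
	}
	n := atomic.AddUint32(&s.next, 1)
	return s.hosts[(n-1)%uint32(len(s.hosts))]
}

func main() {
	// Mirrors the reported state: one token in the ring, nothing in the token map.
	s := newSelector([]int64{0}, map[int64][]string{}, []string{"127.0.0.1:9042"})
	fmt.Println(s.pick(42), s.tokenAware)
}
```

This mirrors what the thread says most other CQL drivers do: keep serving requests with round-robin rather than aborting the process.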
How to reproduce the issue:
Quick-install and run the Temporal server.
Use this config file:
docker-compose-cqlproxy.txt (uploaded as .txt; rename it to docker-compose-cqlproxy.yaml)
Use this command:
docker-compose -f docker-compose-cqlproxy.yaml up
Thanks for reporting this. I've also run into this issue. It happens because Temporal configures token-aware routing, and gocql doesn't handle a cluster with only a single node robustly, so it panics. Most other CQL drivers print a warning and continue with something equivalent to round-robin. There are a few ways this can be fixed:
1) Allow cql-proxy to bind to multiple IPs (with an equally distributed token map) on a single host and add those to the peers table.
2) Allow cql-proxy to run as multiple instances, with a --peers flag that adds those entries to the peers table.
3) Open a PR against Temporal that allows token-aware routing to be disabled. The change would be made here.
The issue is that the remote data_center wasn't used in the system.local table. This is fixed in #88.
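Why would the data_center value matter? One plausible, simplified model: with NetworkTopologyStrategy, a driver computes replicas per data center named in the keyspace's replication settings, matching them against the data_center each host reports. If the proxy's system.local row reports a local default instead of the remote Astra region, no host matches, the replica map comes out empty, and its size (0) disagrees with the ring size (1), which is what the panic message says. This is a hypothetical reconstruction, not gocql's actual code; `replicaMap` and the DC names `datacenter1` and `us-east1` are placeholders.

```go
package main

import "fmt"

// host is a hypothetical view of one row from system.local / system.peers.
type host struct {
	addr  string
	dc    string // data_center column as reported to the driver
	token int64
}

// replicaMap is a simplified NetworkTopologyStrategy: for each ring token it
// walks the ring clockwise, collecting up to rf hosts from every DC named in
// the keyspace replication options. Tokens with no replicas are left out.
func replicaMap(ring []host, replication map[string]int) map[int64][]string {
	m := make(map[int64][]string)
	for i := range ring {
		var reps []string
		for dc, rf := range replication {
			count := 0
			for j := 0; j < len(ring) && count < rf; j++ {
				h := ring[(i+j)%len(ring)]
				if h.dc == dc {
					reps = append(reps, h.addr)
					count++
				}
			}
		}
		if len(reps) > 0 {
			m[ring[i].token] = reps
		}
	}
	return m
}

func main() {
	replication := map[string]int{"us-east1": 3} // placeholder Astra region name
	for _, dc := range []string{"datacenter1", "us-east1"} {
		ring := []host{{addr: "127.0.0.1:9042", dc: dc, token: 0}}
		m := replicaMap(ring, replication)
		fmt.Printf("system.local data_center=%q: token map size %d, ring size %d\n", dc, len(m), len(ring))
	}
}
```

The first iteration reports a token map of size 0 against a ring of size 1; the second, using the remote DC name, reports 1 and 1, which is consistent with the fix described in #88.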
I think I have an idea to fix this that won't require multiple proxies.
Fixed here: #88
Tested:
I've manually cloned and built Temporal, but this should work just fine in a docker-compose or k8s setup. I'll try that later. Let me know if I can help out with that.
- Create an Astra cluster
- Add keyspaces temporal and temporal_visibility in the Astra UI
- Create a token and copy the Astra DB ID
- Start cql-proxy using the token and ID
./cql-proxy --astra-token <token> --astra-database-id <id> --bind 127.0.0.1:9042
- Bootstrap Temporal:
git clone https://github.com/temporalio/temporal
cd temporal
make
./temporal-cassandra-tool --keyspace temporal_visibility setup -version 1.6
./temporal-cassandra-tool --keyspace temporal setup -version 0.0
./temporal-cassandra-tool update -schema-dir schema/cassandra/temporal/versioned/
- Run Temporal
./temporal-server start
Tested with Docker Compose! Looks good on my end.
The Temporal UI is up and running, with no panic or token ring error. Thank you!