[Bug] "Operation was canceled" error when calling start_workflow
What are you really trying to do?
- Hi team, I am having an issue when trying to call `start_workflow` and `signal_workflow`.
Describe the bug
- It happens when I call the `start_workflow` method. Maybe it can't connect to Temporal to create the workflow, and it returns `temporal_sdk_bridge.RPCError: (1, 'operation was canceled', b'')`.
- I started 10 workflows: 6 succeeded and 4 failed with the "canceled" error.
- I want to know why this happens. Is it due to the network or something else? How can I fix it? E.g., by adding a retry policy around `start_workflow`, ...
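A minimal sketch of a client-side retry with exponential backoff and jitter, in plain asyncio. `retry_async` is a hypothetical helper, not part of the Temporal SDK. With the Temporal Python SDK you would presumably pass `temporalio.service.RPCError` as `retry_on`; the bridge error `(1, 'operation was canceled', ...)` carries gRPC status code 1 (CANCELLED). Note that if a "canceled" start actually reached the server, retrying with the same workflow ID may fail with an "already started" error rather than creating a duplicate, so check that case for your SDK version.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    call: Callable[[], Awaitable[T]],
    *,
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    initial_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Await call(), retrying on the given exception types.

    call must create a fresh awaitable on every invocation (e.g. a lambda),
    because a coroutine object can only be awaited once.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # "Full jitter": sleep a random time up to the current delay,
            # so many concurrent callers do not retry in lockstep.
            await asyncio.sleep(random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")


# Assumed usage with the Temporal Python SDK (not executed here):
#   from temporalio.service import RPCError
#   handle = await retry_async(
#       lambda: temporal_client.start_workflow(
#           workflow, args=[arg], id="workflow_id", task_queue="task_queue"),
#       retry_on=(RPCError,),
#   )
```

A retry like this can hide transient failures, but it does not explain why the calls are being canceled in the first place.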
Environment/Versions
- OS and processor: Mac M2
- Temporal Version: ^1.6.0
- Are you using Docker or Kubernetes or building Temporal from source: Using Docker
Additional context
I checked my logs again; this error also happens when I send a signal to a workflow.
Can you replicate this reliably? If so, can you alter a sample to show how to replicate it? And is it against Temporal Cloud or a self-hosted server? We are releasing a fix in the next couple of days for a similar error at temporalio/sdk-core#807, but we believe that only affected 1.7.0.
Hi @cretz, thanks for your reply. I am running a self-hosted Temporal server, and I can't always replicate it; sometimes it happens and sometimes it doesn't. I investigated and suspect it is caused at the point shown in the image above. I have now added a retry around the `start_workflow` call; the error still happens, but less often than before. My code is basically like this:
- Create a client with `Client.connect`:
  `temporal_client = await Client.connect(target_host=..., namespace=...)`
- Call `start_workflow` (possibly many calls at the same time):
  `handle = await temporal_client.start_workflow(workflow, args=[arg], id="workflow_id", task_queue="task_queue")`
I am using version 1.6.0, so this may not be the same issue as temporalio/sdk-core#807.
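Many `start_workflow` calls at the same time can be turned into a small, self-contained reproduction script. The sketch below assumes plain asyncio: `start_all` and `fake_start` are hypothetical names, and `fake_start` only simulates failures. In real use, `start_one` would wrap the client call, e.g. `lambda wf_id: temporal_client.start_workflow(workflow, args=[arg], id=wf_id, task_queue="task_queue")`, giving each start its own workflow ID. An `asyncio.Semaphore` bounds how many starts are in flight, and `gather(..., return_exceptions=True)` collects every outcome so successes and failures can be counted per run.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def start_all(
    start_one: Callable[[str], Awaitable[object]],
    workflow_ids: Sequence[str],
    max_concurrency: int = 5,
) -> List[object]:
    """Call start_one for every workflow ID, with at most max_concurrency in flight.

    Returns one entry per ID, in the same order: whatever start_one returned,
    or the exception it raised.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(wf_id: str) -> object:
        async with sem:
            return await start_one(wf_id)

    # return_exceptions=True keeps one failed start from hiding the others.
    return await asyncio.gather(
        *(guarded(wf_id) for wf_id in workflow_ids), return_exceptions=True
    )


async def fake_start(wf_id: str) -> str:
    """Hypothetical stand-in for temporal_client.start_workflow(...)."""
    await asyncio.sleep(0.01)
    if wf_id.endswith(("3", "7")):
        raise RuntimeError("operation was canceled")  # simulated failure
    return f"handle-for-{wf_id}"


async def main() -> None:
    ids = [f"wf-{i}" for i in range(10)]
    results = await start_all(fake_start, ids)
    failed = [r for r in results if isinstance(r, BaseException)]
    print(f"started {len(results) - len(failed)}, failed {len(failed)}")


if __name__ == "__main__":
    asyncio.run(main())
```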
> I am using Temporal as a self-hosted server and I can't always replicate it; sometimes it happens and sometimes it doesn't.
Even if it takes a minute to replicate, any replication would help us debug.
I am afraid there's not much to go on here. We have many samples/users starting hundreds/thousands of workflows without any issues on self-hosted servers. Can you make sure you're not doing something like accidentally blocking the thread in an `async def` call, thereby causing asyncio to stop working properly?
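A small standalone script can show what blocking the thread inside an `async def` does to asyncio. While a synchronous call such as `time.sleep` runs on the event loop thread, no other coroutine on that loop makes progress; offloading the same call with `asyncio.to_thread` (Python 3.9+) keeps the loop responsive. `blocking_work`, `ticker`, and `run` are hypothetical names: `blocking_work` stands in for any synchronous call (a sync HTTP or database client, CPU-heavy code). This only illustrates the hypothesis; it does not establish that a blocked loop is what cancels the RPCs here.

```python
import asyncio
import time
from typing import List


async def ticker(counter: List[int], stop: asyncio.Event) -> None:
    """Increment a counter roughly every 10 ms, but only while the loop is free."""
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


def blocking_work() -> None:
    """Hypothetical synchronous call that holds the thread for 200 ms."""
    time.sleep(0.2)


async def run(offload: bool) -> int:
    """Return how many times the ticker ran while blocking_work was executing."""
    counter = [0]
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(counter, stop))
    await asyncio.sleep(0)  # let the ticker start
    start = counter[0]
    if offload:
        # Runs in a worker thread; the event loop keeps serving other tasks.
        await asyncio.to_thread(blocking_work)
    else:
        # Anti-pattern: blocks the event loop thread for the whole call.
        blocking_work()
    ticks = counter[0] - start
    stop.set()
    await task
    return ticks


async def main() -> None:
    print("blocking call inside async def, ticks:", await run(offload=False))
    print("asyncio.to_thread, ticks:", await run(offload=True))


if __name__ == "__main__":
    asyncio.run(main())
```

In a real application, running with `asyncio.run(main(), debug=True)` (or setting `PYTHONASYNCIODEBUG=1`) makes asyncio log callbacks that take longer than `loop.slow_callback_duration` (0.1 s by default), which is a quick way to spot this kind of blocking.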
Okay @cretz, thank you for your response. I will keep monitoring it.