SDK throws a "timeout exceeded" error but the transaction can be found on HashScan
konstantinabl opened this issue · 1 comments
Description
Currently, we experience this issue in the relay, both in CI and on mainnet. When a user sends a transaction, we call transaction.execute and receive a timeout exceeded error for seemingly "short" transactions, which shouldn't exceed the default timeout of 10 seconds. In one instance the SDK returned a timeout, but the transaction could afterwards be found on HashScan. This confuses users and results in a bad user experience.
Steps to reproduce
The issue is hard to reproduce. It shows up in CI when the acceptance tests are run, and it also appears on mainnet.
Examples of CI failures:
The API Batch 3 raw logs from my PR that tries to reproduce the issue -> https://github.com/hashgraph/hedera-json-rpc-relay/actions/runs/11521351852/job/32074803392?pr=3129 (my own fork of the SDK is used here in order to add some more logging)
On mainnet:
https://production.grafana.hedera-ops.com/goto/L4lgN0kNg?orgId=1 -> in these logs you can see the SDK throwing timeout exceeded, after which the relay throws an error saying that the transaction execution failed.
[2024-10-03 14:08:13.646 +0000] WARN (consensus-node/84 on mainnet-hashio-6cb89dc686-5wjsn): [Request ID: 64d0e264-3e84-47b9-9e1a-75991fda5422] Fail to execute EthereumTransaction transaction: transactionId=0.0.995584@1727964469.298448417, callerName=eth_sendRawTransaction, status=UNKNOWN(21)
However, if you take the transaction ID, you can find the transaction on HashScan:
https://hashscan.io/mainnet/transaction/1727964478.235255000
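To cross-check a transaction like this programmatically, the transaction ID from the log can be converted into the dash-separated form that the mirror node REST API uses in its `/api/v1/transactions/{id}` path. The following is only a minimal sketch: the helper names are hypothetical, and the base URL is a parameter the caller has to supply.

```typescript
// Hypothetical helper (not part of the SDK or the relay): converts an SDK-style
// transaction ID such as "0.0.995584@1727964469.298448417" into the
// "payer-seconds-nanos" form used in mirror node REST paths.
function toMirrorNodeTransactionId(sdkTransactionId: string): string {
  const [payer, validStart] = sdkTransactionId.split("@");
  if (!payer || !validStart) {
    throw new Error(`Unexpected transaction ID format: ${sdkTransactionId}`);
  }
  const [seconds, nanos] = validStart.split(".");
  if (!seconds || !nanos) {
    throw new Error(`Unexpected valid-start format: ${validStart}`);
  }
  return `${payer}-${seconds}-${nanos}`;
}

// Hypothetical helper: builds the lookup URL. baseUrl is an assumption supplied
// by the caller (for example, the mirror node the relay is configured with).
function mirrorNodeTransactionUrl(baseUrl: string, sdkTransactionId: string): string {
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmedBase}/api/v1/transactions/${toMirrorNodeTransactionId(sdkTransactionId)}`;
}
```

A relay could request this URL after a timeout to find out whether the transaction actually reached consensus before it reports a failure to the user.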
Additional context
No response
Hedera network
mainnet
Version
2.50.0-beta.3
Operating system
None
Hello @konstantinabl,
The team has been working to consistently reproduce this issue.
We explored several possible causes. Initially, we thought that not closing the client after each test could congest the gRPC channel or cause similar problems. Our tests close the client after every test suite, and we noticed you don't do the same. To test this, I deployed contracts to the network thousands of times without closing the client, but this approach didn’t yield any conclusive results.
Next, we tried using hedera-local-node's relay and repeatedly redeployed contracts. After that, we directly used hedera-json-rpc-relay and ran a batch of acceptance tests until the error appeared again. Without a consistent way of reproducing this issue, debugging seems very challenging.
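For reference, the "close the client after every test suite" pattern can be sketched roughly as follows. This is an illustration only: `Closable` and `withClient` are hypothetical names, and the only assumption about the client is that it has a `close()` method, as the SDK's `Client` does.

```typescript
// Minimal shape assumed for this sketch; any object with close() fits.
interface Closable {
  close(): void;
}

// Hypothetical helper: creates a client, runs a suite with it, and always
// closes the client afterwards, even if the suite throws.
async function withClient<C extends Closable, T>(
  createClient: () => C,
  runSuite: (client: C) => Promise<T>,
): Promise<T> {
  const client = createClient();
  try {
    // Await inside the try block so that finally runs only after the suite settles.
    return await runSuite(client);
  } finally {
    client.close();
  }
}
```

In a test framework the same effect is usually achieved by calling `client.close()` in a suite-level teardown hook.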
So far, I’ve encountered two different errors. Most frequently, the error suggests that the transaction isn’t frozen when it should be. Currently, we’re adjusting the relay’s resources in the Docker container (e.g., reducing RAM and CPU cores) to better simulate the CI environment. I’ll try reproducing the issue again. Last time, I reached a dead end, but I may come up with some fresh ideas this time.
I noticed there is another issue in your repository. It would help if the user could share the sendRawTransaction data.