Error when starting the second SN (SMLETHNode: Transferring 100 ETH to local account failed) - MNIST example

Question

joaquingarciaatos opened this issue a year ago · 4 comments

joaquingarciaatos commented a year ago

Issue description

issue description: error obtained when starting the second Swarm Network Node.
occurrence - consistent or rare:
error messages: SMLETHNode: Transferring 100 ETH to local account failed
commands used for starting containers: the ones provided in the MNIST example (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md)
docker logs [APLS, SPIRE, SN, SL, SWCI]:
######################################################################

HPE SWARM LEARNING SN NODE

######################################################################

© Copyright 2019-2022 Hewlett Packard Enterprise Development LP

######################################################################
2023-06-22 10:19:21,321 : swarm.blCnt : INFO : Setting up blockchain layer for the swarm node: START
2023-06-22 10:19:22,628 : swarm.blCnt : INFO : Creating Autopass License Provider
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Creating license server
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Setting license servers
2023-06-22 10:19:23,421 : swarm.blCnt : INFO : Acquiring floating license 1100000380:1
2023-06-22 10:19:24,047 : swarm.SN : INFO : Using URL : https://213.227.143.136:30304/is_up
2023-06-22 10:19:24,170 : swarm.SN : INFO : Sentinel Node is UP!
2023-06-22 10:19:43,727 : swarm.SN : INFO : SMLETHNode: Starting GETH ...
2023-06-22 10:22:16,547 : swarm.SN : ERROR : SMLETHNode: Transferring 100 ETH to local account failed
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "start_swarm_sn.py", line 196, in start_swarm_sn.main
  File "swarmfactory.py", line 615, in swarmfactory.createBCFullNodeForContainer
  File "swarmbcnode.py", line 739, in swarmbcnode.smlethnode.initialize
  File "swarmutils.py", line 678, in swarmutils.swarmlogger.emitError
RuntimeError: SMLETHNode: Transferring 100 ETH to local account failed
2023-06-22 10:22:16,556 : swarm.blCnt : WARNING : Releasing license
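When comparing SN logs from both hosts, it can help to pull out only the ERROR entries. The sketch below assumes every log entry follows the layout visible above ("<date> <time>,<ms> : <logger> : <LEVEL> : <message>"). The function names are hypothetical helpers, not part of Swarm Learning.

```python
import re
from typing import List, NamedTuple

# Assumed layout, taken from the SN log above:
# "2023-06-22 10:22:16,547 : swarm.SN : ERROR : <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) : "
    r"(?P<logger>\S+) : (?P<level>[A-Z]+) : (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    logger: str
    level: str
    msg: str


def parse_sn_log(text: str) -> List[LogEntry]:
    """Return the lines that match the log layout; banners and traceback lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m:
            entries.append(LogEntry(**m.groupdict()))
    return entries


def errors_only(entries: List[LogEntry]) -> List[LogEntry]:
    """Keep only entries logged at ERROR level."""
    return [e for e in entries if e.level == "ERROR"]
```

For example, save the output of docker logs for the SN container to a file, read it into a string, and pass it to errors_only(parse_sn_log(text)).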

Swarm Learning Version:

Find the docker tag of the Swarm images ( $ docker images | grep hub.myenterpriselicense.hpe.com/hpe_eval/swarm-learning ): Version 2.0.0
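To compare the image tags on both hosts, a minimal sketch could parse the output of docker images --format '{{.Repository}}:{{.Tag}}' (a standard Docker CLI option). The repository prefix is copied from the grep filter above; the helper names and the sample image names in the test are hypothetical.

```python
from typing import Dict, Set

# Prefix taken from the grep filter used in the issue template.
SWARM_PREFIX = "hub.myenterpriselicense.hpe.com/hpe_eval/swarm-learning"


def swarm_image_tags(listing: str, prefix: str = SWARM_PREFIX) -> Dict[str, Set[str]]:
    """Map each Swarm image repository to the set of tags present locally."""
    tags: Dict[str, Set[str]] = {}
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Split on the last ':' so a registry port inside the repository name is kept intact.
        repo, _, tag = line.rpartition(":")
        tags.setdefault(repo, set()).add(tag)
    return tags


def all_same_tag(tags: Dict[str, Set[str]]) -> bool:
    """True if at least one Swarm image was found and all of them share exactly one tag."""
    return len(set().union(*tags.values())) == 1
```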

OS and ML Platform

details of host OS: Ubuntu 20.04.6 LTS
details of ML platform used:
details of Swarm learning Cluster (Number of machines, SL nodes, SN nodes): 2 hosts, exactly as in the MNIST example

Quick Checklist: Respond [Yes/No]

APLS server web GUI shows available Licenses? Yes
If Multiple systems are used, can each system access every other system? Yes
Is Password-less SSH configuration setup for all the systems? Yes
If GPU or other protected resources are used, does the account have sufficient privileges to access and use them?
Is the user id a member of the docker group? Yes
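"Can each system access every other system?" can be checked per port rather than per host. The sketch below tests whether a TCP connection opens within a timeout. Port 30304 is the sentinel's is_up endpoint seen in the SN log above; the other ports your Swarm nodes need are not shown in this thread, so which ports to test is an assumption to confirm against your own configuration.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

For example, running can_reach("213.227.143.136", 30304) from the second host checks the sentinel endpoint used in the log.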

Additional notes

Are you running the documented example without any modification? Yes, only the IPs of host 1 and host 2 were changed.
Add any additional information about the use case, or any notes that support the issue investigation: Steps 1-9 and 11 of the README (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md) are followed correctly, but step 10 fails with the error above. I think the issue is related to Ethereum and the creation of the blockchain layer, but I have no further information about the error.

Answer 1 · 2023-06-26T06:13:29.000Z

Can you check whether the systems on which the nodes are running are time-synchronized?
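On systemd hosts such as Ubuntu 20.04, one way to check this is timedatectl show, which prints KEY=VALUE properties including NTPSynchronized. A minimal sketch, assuming timedatectl is available on every host; the helper names are hypothetical. Note that it only reports whether each host considers its own clock synchronized, not how far apart the two hosts are.

```python
import subprocess
from typing import Dict


def parse_timedatectl(output: str) -> Dict[str, str]:
    """Turn `timedatectl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_ntp_synchronized(output: str) -> bool:
    # NTPSynchronized=yes means the kernel reports the system clock as synchronized.
    return parse_timedatectl(output).get("NTPSynchronized") == "yes"


def local_ntp_status() -> bool:
    # Assumes a systemd host where `timedatectl show` exists; run this on each node.
    out = subprocess.run(
        ["timedatectl", "show"], capture_output=True, text=True, check=True
    ).stdout
    return is_ntp_synchronized(out)
```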

Answer 2 · 2023-07-13T06:15:30.000Z

@joaquingarciaatos please confirm whether the issue is resolved after synchronizing time using NTP.

Answer 3 · 2023-07-13T06:24:06.000Z

Hi! I checked whether the nodes are time-synchronized, and they are, but that did not resolve the issue. Do you have any idea what the cause might be?

Answer 4 · 2023-07-31T09:27:50.000Z

The error might be caused by unsynchronized clocks between the nodes; even a difference of a few milliseconds can trigger it. To resolve this, synchronize the nodes using NTP (Network Time Protocol), then restart the Docker service and run the example again.
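To see why a millisecond-level difference can remain measurable, it helps to recall how NTP estimates clock offset from one request/response exchange: offset = ((t1 - t0) + (t2 - t3)) / 2 and round-trip delay = (t3 - t0) - (t2 - t1). The sketch below is a plain illustration of that formula; the tolerance check is a hypothetical helper, and this thread does not state what skew Swarm Learning actually tolerates.

```python
from typing import Tuple


def ntp_offset_delay(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """
    t0: request sent (client clock)      t1: request received (server clock)
    t2: reply sent (server clock)        t3: reply received (client clock)
    Returns (offset, delay) in the same unit as the inputs.
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def within_tolerance(offset: float, tolerance: float) -> bool:
    # Hypothetical threshold check; choose `tolerance` for your own setup.
    return abs(offset) <= tolerance
```

For example, with timestamps in milliseconds (t0=100, t1=115, t2=116, t3=111), the client clock is 10 ms behind the server and the round trip takes 10 ms, so a 5 ms tolerance would flag this host.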