HewlettPackard/swarm-learning

Error when starting the second SN (SMLETHNode: Transferring 100 ETH to local account failed) - MNIST example

joaquingarciaatos opened this issue · 4 comments

Issue description

  • issue description: error obtained when starting the second Swarm Network Node.
  • occurrence - consistent or rare:
  • error messages: SMLETHNode: Transferring 100 ETH to local account failed
  • commands used for starting containers: the ones provided in the MNIST example (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md)
  • docker logs [APLS, SPIRE, SN, SL, SWCI]:
    ######################################################################

HPE SWARM LEARNING SN NODE

######################################################################

© Copyright 2019-2022 Hewlett Packard Enterprise Development LP

######################################################################
2023-06-22 10:19:21,321 : swarm.blCnt : INFO : Setting up blockchain layer for the swarm node: START
2023-06-22 10:19:22,628 : swarm.blCnt : INFO : Creating Autopass License Provider
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Creating license server
2023-06-22 10:19:23,407 : swarm.blCnt : INFO : Setting license servers
2023-06-22 10:19:23,421 : swarm.blCnt : INFO : Acquiring floating license 1100000380:1
2023-06-22 10:19:24,047 : swarm.SN : INFO : Using URL : https://213.227.143.136:30304/is_up
2023-06-22 10:19:24,170 : swarm.SN : INFO : Sentinel Node is UP!
2023-06-22 10:19:43,727 : swarm.SN : INFO : SMLETHNode: Starting GETH ...
2023-06-22 10:22:16,547 : swarm.SN : ERROR : SMLETHNode: Transferring 100 ETH to local account failed
Traceback (most recent call last):
File "", line 1, in
File "start_swarm_sn.py", line 196, in start_swarm_sn.main
File "swarmfactory.py", line 615, in swarmfactory.createBCFullNodeForContainer
File "swarmbcnode.py", line 739, in swarmbcnode.smlethnode.initialize
File "swarmutils.py", line 678, in swarmutils.swarmlogger.emitError
RuntimeError: SMLETHNode: Transferring 100 ETH to local account failed
2023-06-22 10:22:16,556 : swarm.blCnt : WARNING : Releasing license

Swarm Learning Version:

  • Find the docker tag of the Swarm images ( $ docker images | grep hub.myenterpriselicense.hpe.com/hpe_eval/swarm-learning ): Version 2.0.0

OS and ML Platform

  • details of host OS: Ubuntu 20.04.6 LTS
  • details of ML platform used:
  • details of Swarm learning Cluster (Number of machines, SL nodes, SN nodes): 2 hosts, exactly the same as MNIST example

Quick Checklist: Respond [Yes/No]

  • APLS server web GUI shows available Licenses? Yes
  • If Multiple systems are used, can each system access every other system? Yes
  • Is Password-less SSH configuration setup for all the systems? Yes
  • If GPU or other protected resources are used, does the account have sufficient privileges to access and use them?
  • Is the user id a member of the docker group? Yes

Additional notes

  • Are you running documented example without any modification? Yes, just modifying the IPs of host 1 and host 2
  • Add any additional information about use case or any notes which supports for issue investigation: Steps 1-9 and 11 of the README (https://github.com/HewlettPackard/swarm-learning/blob/master/examples/mnist/README.md) were followed correctly, but the error appears in step 10. I think the issue is related to Ethereum and the creation of the blockchain layer, but I have no further information about the error.

Can you check whether the systems on which the nodes are running are time-synchronized?

@joaquingarciaatos, please confirm whether the issue is resolved after synchronizing time using NTP.

Hi! I checked whether the nodes are time-synchronized, and they are, but that did not resolve the issue. Do you have any idea what the cause could be?

The error might be caused by unsynchronized clocks between the nodes; even a difference of a few milliseconds can trigger it. To resolve this, synchronize the nodes using NTP (Network Time Protocol), then restart the Docker service and run the example again.
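
For anyone who wants to verify the time-synchronization suggestion before restarting anything, here is a minimal Python sketch. It is not part of Swarm Learning. It assumes a systemd-based host (such as the Ubuntu 20.04 machines in this report) where `timedatectl show` is available, GNU `date` on the remote host, and the password-less SSH from the checklist. The host name `host2` is a placeholder, and the offset is only a rough estimate because it includes SSH latency, so it cannot reliably resolve differences of a few milliseconds.

```python
import subprocess
import time


def parse_timedatectl(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `timedatectl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that do not contain '='
            props[key.strip()] = value.strip()
    return props


def is_ntp_synchronized(props: dict) -> bool:
    """True if systemd reports the system clock as NTP-synchronized."""
    return props.get("NTPSynchronized") == "yes"


def estimate_offset(t0: float, remote: float, t1: float) -> float:
    """Estimate remote-minus-local clock offset in seconds.

    t0 and t1 are local timestamps taken just before and after the remote
    query; the remote reading is compared with their midpoint.
    """
    return remote - (t0 + t1) / 2.0


def local_sync_status() -> bool:
    """Run `timedatectl show` on this host and check the sync flag."""
    out = subprocess.run(
        ["timedatectl", "show"], capture_output=True, text=True, check=True
    ).stdout
    return is_ntp_synchronized(parse_timedatectl(out))


def remote_clock_offset(host: str) -> float:
    """Query a remote host's clock over SSH (assumes password-less SSH)."""
    t0 = time.time()
    out = subprocess.run(
        ["ssh", host, "date", "+%s.%N"],  # GNU date: epoch seconds.nanoseconds
        capture_output=True, text=True, check=True,
    ).stdout
    t1 = time.time()
    return estimate_offset(t0, float(out.strip()), t1)


if __name__ == "__main__":
    print("Local clock NTP-synchronized:", local_sync_status())
    # "host2" is a placeholder; replace it with the other host's IP or name.
    print("Estimated offset to host2 (s):", remote_clock_offset("host2"))
```

Run it on each host in turn. If both report NTP synchronization and the estimated offset is small, as the reporter found, clock skew is less likely to be the cause and the failure needs further investigation.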