Investigate and fix wrong S-Chain discovery status problem
sergiy-skalelabs opened this issue · 1 comments
We detected a logically inconsistent S-Chain discovery situation. First we saw log messages reporting that all 16 of 16 S-Chain nodes were successfully discovered:
2023-09-18 15:26:06.072: S-Chain network discovery: Have S-Chain description response about 16 of 16 node(s).
2023-09-18 15:26:06.072: S-Chain network discovery: This S-Chain discovery will finish with 16 of 16 node(s) discovered.
But later we saw log messages indicating that at least one S-Chain node was discovered only partially or not at all:
2023-09-19 13:11:12.116: CRITICAL ERROR: BLS 1/16 public key discovery failed for node #10, node data is: {"httpRpcPort":10131,"httpRpcPort6":0,"httpsRpcPort":10136,"httpsRpcPort6":0,"ip":"34.217.246.35","ip6":"","nodeID":35,"schainIndex":11,"wsRpcPort":10130,"wsRpcPort6":0,"wssRpcPort":10135,"wssRpcPort6":0,"pwaState":{"oracle":{"isInProgress":false,"ts":0},"m2s":{"isInProgress":false,"ts":0},"s2m":{"isInProgress":false,"ts":0},"s2s":{"mapS2S":{"0":{"isInProgress":false,"ts":0}}}}}
2023-09-19 13:11:12.116: RAW/BLS/#10: CRITICAL ERROR: BLS node #10 verify error: error description is: BLS 1/16 public key discovery failed for node #10, stack is:
Error: BLS 1/16 public key discovery failed for node #10
--> discoverPublicKeyByIndex (/ima/agent/bls.mjs:166:15)
--> Module.doVerifyReadyHash (/ima/agent/bls.mjs:2503:29)
--> Module.handleLoopStateArrived (/ima/agent/pwa.mjs:229:26)
--> ObserverServer.self.mapApiHandlers.skale_imaNotifyLoopWork (/ima/agent/loopWorker.mjs:210:21)
--> InWorkerServerPipe._onPipeMessage (/ima/npms/skale-cool-socket/socketServer.mjs:90:73)
--> InWorkerServerPipe.dispatchEvent (/ima/npms/skale-cool-socket/eventDispatcher.mjs:105:22)
--> InWorkerServerPipe.implReceive (/ima/npms/skale-cool-socket/socket.mjs:287:14)
--> InWorkerServerPipe.receive (/ima/npms/skale-cool-socket/socket.mjs:324:14)
--> InWorkerSocketServerAcceptor.receiveForClientPort (/ima/npms/skale-cool-socket/socket.mjs:581:14)
--> Object.onMessage (/ima/npms/skale-cool-socket/socket.mjs:444:29)
2023-09-19 13:11:12.116: RAW/BLS/#10: CRITICAL ERROR: BLS node #10 verify output is:
These two sets of log messages are incompatible with each other and show a situation that must never happen in practice.
So S-Chain discovery results may be saved, or treated, as successful when they are not. This means the S-Chain discovery code must validate the S-Chain node description JSONs returned by skale_imaInfo calls to skaled more strictly, and must also ensure that waiting for S-Chain discovery to complete does not finish until discovery is really done.
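
A minimal sketch of what stricter validation could look like, not the actual IMA agent code. It assumes a node counts as discovered only when its description carries an `imaInfo` object with non-empty BLS public key parts. The `imaInfo` field and the `BLSPublicKey0` to `BLSPublicKey3` key names are assumptions (the node data in the log above has no such field), and all function names are hypothetical:

```javascript
// Assumed key names inside the skale_imaInfo response; adjust to the real payload.
const REQUIRED_BLS_KEYS = [
    "BLSPublicKey0", "BLSPublicKey1", "BLSPublicKey2", "BLSPublicKey3"
];

// Hypothetical helper: true only if the node description contains a usable
// imaInfo object with every required BLS public key part present.
function isNodeFullyDiscovered( joNode ) {
    if( !joNode || typeof joNode !== "object" )
        return false;
    const joInfo = joNode.imaInfo; // assumed field name
    if( !joInfo || typeof joInfo !== "object" )
        return false;
    return REQUIRED_BLS_KEYS.every(
        ( key ) => typeof joInfo[key] === "string" && joInfo[key].length > 0 );
}

// Hypothetical helper: counts nodes that pass the strict check above,
// rather than nodes that merely returned some response.
function countDiscoveredNodes( arrNodes ) {
    return arrNodes.filter( isNodeFullyDiscovered ).length;
}

// Hypothetical helper: discovery is complete only when every node passes.
function isDiscoveryComplete( arrNodes ) {
    return arrNodes.length > 0 &&
        countDiscoveredNodes( arrNodes ) === arrNodes.length;
}
```

With a check like this, the "16 of 16" message would be logged, and any wait for discovery would finish, only when `isDiscoveryComplete` returns true, so a node description like the one for node #10 above would keep discovery in the incomplete state.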
Can't reproduce