Azure/iot-edge-v1

[V1] Gateway not starting after adding 511+ modules

AnkitPtl opened this issue · 2 comments

Environment:

  • OS: Windows 7 (64-bit) and Debian
  • SDK: 1.1.4
  • Modules: Node.js

Description:

  • In our system, we run multiple instances of the same modules. The gateway fails to start once we reach 511 modules. We also tried to reproduce the issue with the simple sensor and printer modules from the samples, and they show the same behavior. The gateway calls the destroy method of the modules and logs the errors below. Is there a limit on the number of modules that can be added to a gateway?

Logs:

Error: Time:Wed Sep 12 09:23:50 2018 File:C:\agent_work\2\s\iot-edge\core\src\broker.c Func:start_module Line:343 module receive socket create failed
Error: Time:Wed Sep 12 09:23:50 2018 File:C:\agent_work\2\s\iot-edge\core\src\broker.c Func:Broker_AddModule Line:514 start_module failed
Error: Time:Wed Sep 12 09:23:50 2018 File:C:\agent_work\2\s\iot-edge\core\src\gateway_internal.c Func:gateway_addmodule_internal Line:461 Failed to add module to the gateway's broker.
Error: Time:Wed Sep 12 09:26:00 2018 File:C:\agent_work\2\s\iot-edge\core\src\gateway_createfromjson.c Func:Gateway_CreateFromJson Line:78 Failed to create gateway using lower level library.
Error: Time:Wed Sep 12 09:26:00 2018 File:C:\agent_work\2\s\src\gw\src\main.c Func:main Line:50 An error occurred while creating the gateway.

From the logs, it seems that nanomsg (which our broker uses for communication between modules) has run out of sockets, so the code that creates a socket for the 512th module fails.

I did some digging, and it looks like nanomsg has a compile-time limit of 512 sockets. Internally, we use one socket to publish messages to all modules and one socket per module to receive messages, so 511 modules consume all 512 sockets and creating the receive socket for the 512th module fails. I haven't reproduced your scenario to confirm this theory, but it is the likely cause.
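To make the socket arithmetic concrete, here is a minimal simulation, not the broker's actual code. It assumes a fixed table of 512 slots (nanomsg's default NN_MAX_SOCKETS), one publish socket taken when the broker starts, and one receive socket per module. The names sim_socket_create and first_failing_module are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed limit: nanomsg's default NN_MAX_SOCKETS value of 512. */
#define SIM_MAX_SOCKETS 512

/* Hypothetical stand-in for nanomsg's fixed-size global socket table. */
static bool slot_in_use[SIM_MAX_SOCKETS];

/* Claims the first free slot and returns its index, or -1 when the table
   is full (the point at which a real nn_socket() call reports EMFILE). */
static int sim_socket_create(void)
{
    for (int i = 0; i < SIM_MAX_SOCKETS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Models the broker: one publish socket shared by all modules, then one
   receive socket per module. Returns the 1-based number of the first module
   whose receive socket cannot be created, or 0 if all modules fit. */
static int first_failing_module(int module_count)
{
    assert(module_count >= 0);
    memset(slot_in_use, 0, sizeof slot_in_use); /* start from an empty table */

    /* The broker's publish socket; always succeeds on an empty table. */
    (void)sim_socket_create();

    for (int m = 1; m <= module_count; m++) {
        if (sim_socket_create() < 0) {
            return m; /* corresponds to "module receive socket create failed" */
        }
    }
    return 0;
}
```

Under the 512-slot assumption, first_failing_module(511) returns 0 (every module fits) and first_failing_module(512) returns 512, which matches the theory that the 512th module is the first one the broker cannot start.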

The nanomsg socket limit is configurable at build time, so you could determine the real socket limit on each of your platforms and then build nanomsg for each platform with that platform-specific value. The setting is NN_MAX_SOCKETS; you'd set it as a CMake cache entry on the command line via "-DNN_MAX_SOCKETS=" when you build nanomsg.
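As a rough aid for choosing a value, the sketch below computes the minimum NN_MAX_SOCKETS for a given module count (one publish socket plus one per module, per the analysis above) and, on POSIX systems such as Debian, reads the per-process file descriptor limit via getrlimit(RLIMIT_NOFILE). Treating that descriptor limit as a ceiling is an assumption (that each nanomsg socket consumes at least one descriptor), not something verified here, and the helper names are hypothetical. getrlimit is not available on Windows, so this part does not apply there.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h> /* getrlimit, RLIMIT_NOFILE: POSIX only, e.g. Debian */

/* Minimum NN_MAX_SOCKETS for a given number of modules, assuming one broker
   publish socket plus one receive socket per module. */
static long required_nn_max_sockets(long module_count)
{
    assert(module_count >= 0);
    return module_count + 1;
}

/* Returns the soft per-process file descriptor limit (LONG_MAX if unlimited),
   or -1 if it cannot be read. Used only as a rough ceiling, assuming each
   nanomsg socket consumes at least one descriptor. */
static long soft_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)LONG_MAX) {
        return LONG_MAX;
    }
    return (long)rl.rlim_cur;
}
```

For example, a gateway with 1,000 module instances would need NN_MAX_SOCKETS of at least 1001 under this model; on Debian you would also want that value to stay comfortably below soft_fd_limit() (the same value reported by ulimit -n), since the process needs descriptors for other purposes too.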

No activity, closing. If you've tried to configure nanomsg to an appropriate value for your platform and you're still seeing this problem, please reopen. Thanks!