Improve error message on invalid channel config after init timeout
rbx opened this issue · 1 comment
rbx commented
@davidrohr wrote on Nov 16, 2022:
we had runs failing to start with an Error:
fairmq/Device.cxx: LOG(error) << "could not connect all channels after " << fInitializationTimeoutInS << " attempts";
which comes from FairMQ directly.
[...] could you add additional output printing which channel(s) failed to connect?
rbx commented
The debug logs already show which channels are still invalid (although this output can be quite spammy with many channels and long init times):
[19:51:18][DEBUG] Validating channel 'data[0]'... INVALID
[19:51:18][DEBUG] invalid channel address: 'unspecified'
[19:51:18][DEBUG] Validating channel 'data[0]'... INVALID
[19:51:18][DEBUG] invalid channel address: 'unspecified'
[19:51:18][ERROR] could not connect all channels after 120 attempts
But I agree that it makes sense to list the channels with the error as well. I have extended the error output to:
[19:51:18][ERROR] could not connect all channels after 120 attempts
[19:51:18][ERROR] following channels are still invalid:
[19:51:18][ERROR] channel: name: data[0], type: push, method: connect, address: unspecified
in #455.
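
For illustration, here is a minimal sketch of how such a report could be assembled. It is not the actual code from #455: `ChannelInfo` and `InitTimeoutReport` are hypothetical names, and the struct is a simplified stand-in for the real FairMQ channel class. It only assumes that each channel knows its name, type, method, address and whether it has been validated.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a channel (NOT the FairMQ class).
struct ChannelInfo {
    std::string name;     // e.g. "data[0]"
    std::string type;     // e.g. "push"
    std::string method;   // e.g. "connect"
    std::string address;  // e.g. "unspecified"
    bool valid;           // true once the channel has been validated
};

// Builds the error lines for an initialization timeout: the original
// message, a header line, and one line per channel that is still invalid.
std::vector<std::string> InitTimeoutReport(const std::vector<ChannelInfo>& channels,
                                           int attempts)
{
    std::vector<std::string> lines;

    std::ostringstream summary;
    summary << "could not connect all channels after " << attempts << " attempts";
    lines.push_back(summary.str());
    lines.push_back("following channels are still invalid:");

    for (const auto& c : channels) {
        if (c.valid) {
            continue;  // only channels that failed validation are reported
        }
        std::ostringstream os;
        os << "channel: name: " << c.name << ", type: " << c.type
           << ", method: " << c.method << ", address: " << c.address;
        lines.push_back(os.str());
    }
    return lines;
}
```

In a real device each returned line would go out at error level, so the invalid channels are visible even when debug logging is disabled.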