tango-controls/TangoTickets

Not possible to subscribe to events in server initialization phase

reszelaz opened this issue · 7 comments

Hi,

I have the following situation: a device server with two devices, each of a different class: DeviceOne and DeviceTwo. deviceone-01 of the DeviceOne class is exported first, and then devicetwo-01 of the DeviceTwo class is exported. During its initialization, devicetwo-01 tries to subscribe to some_attribute of deviceone-01. This gives me the following error:

DevFailed[
DevError[
    desc = Device dserver/run_server/test is not exported (hint: try starting the device server)
  origin = DeviceProxy::get_corba_name()
  reason = API_DeviceNotExported
severity = ERR]

DevError[
    desc = Failed to execute command_inout on device dserver/run_server/test, command ZmqEventSubscriptionChange
  origin = Connection::command_inout()
  reason = API_CommandFailed
severity = ERR]

DevError[
    desc = Device server send exception while trying to register event
  origin = EventConsumer::connect_event()
  reason = API_DSFailedRegisteringEvent
severity = ERR]
]

One can use the following project to reproduce it:

zreszela@pc255:~/workspace/lib-tango-sample-device$ /usr/lib/tango/tango_admin --add-server run_server/test DeviceOne sys/test/deviceone-01
zreszela@pc255:~/workspace/lib-tango-sample-device$ /usr/lib/tango/tango_admin --add-server run_server/test DeviceTwo sys/test/devicetwo-01
zreszela@pc255:~/workspace/lib-tango-sample-device$ python many_classes/run_server.py test

I think that something like this used to work in Tango 7.

Cheers,
Zibi

PS. @bourtemb this is the problem that I described to you during ICALEPCS in Barcelona:) Sorry I couldn't find time earlier to report it...

Hi Zibi,

Thanks for taking the time to create an issue for this.
If you call the subscribe_event() method with the stateless parameter (http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/cpp_doc/classTango_1_1DeviceProxy.html#a80c449b725a134b1e9aac6771b70ed5c) set to true, the event subscription will always succeed, even if the corresponding device server is not running. The keep-alive thread will then try every 10 seconds to subscribe to the specified event. At every subscription retry, the callback is executed with the corresponding exception.

Of course, you might miss some events with this solution, because the subscription will probably only succeed about 10 seconds after subscribe_event() was invoked.
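A minimal sketch of such a stateless subscription from Python could look as follows. It assumes PyTango's DeviceProxy.subscribe_event(attr_name, event_type, callback, stateless=True) and the err, errors and attr_value fields of the event object. The names RetryAwareCallback and subscribe_stateless are hypothetical helpers made up for this illustration.

```python
class RetryAwareCallback:
    """Event callback that keeps subscription errors apart from real events."""

    def __init__(self):
        self.errors = []  # lists of DevError reasons received while not subscribed
        self.values = []  # attribute values received once events flow

    def push_event(self, event):
        # Assumption: as in PyTango, event.err is True when the event
        # carries an error stack instead of an attribute value.
        if event.err:
            self.errors.append([e.reason for e in event.errors])
        else:
            self.values.append(event.attr_value.value)


def subscribe_stateless(proxy, attr_name, event_type, callback):
    """Subscribe without raising if the event source is not ready yet."""
    return proxy.subscribe_event(attr_name, event_type, callback, stateless=True)


# Intended usage with PyTango (not executed here):
#   import tango
#   proxy = tango.DeviceProxy("sys/test/deviceone-01")
#   event_id = subscribe_stateless(
#       proxy, "some_attribute", tango.EventType.CHANGE_EVENT, RetryAwareCallback())
```

Until the subscription really succeeds, the callback only collects error stacks, so the device should not treat the attribute as up to date during that window.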

Maybe you have the feeling this was working in Tango 7 because you were already using this stateless parameter in some other device servers?

Hoping this helps,
Reynald

I talked with Emmanuel about this use case. He said that for the moment, unless you can accept being blind to events for 10 seconds, the best approach is to create a thread in your device server which attempts a subscribe_event every x ms (x to be chosen depending on your use case), and to keep the device server in the INIT state until the subscribe call succeeds.
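The retry thread could be sketched roughly like this. It is a generic illustration, not code from Tango or Sardana: subscribe_with_retry, subscribe and on_ready are hypothetical names, and a real device would catch tango.DevFailed rather than Exception.

```python
import threading


def subscribe_with_retry(subscribe, on_ready, period_s=0.1, stop_event=None):
    """Call subscribe() in a background thread until it stops raising.

    subscribe: callable performing the (non-stateless) subscribe_event call.
    on_ready:  callable receiving the event id once the subscription succeeds,
               e.g. to switch the device from INIT to ON.
    """
    stop_event = stop_event or threading.Event()

    def worker():
        while not stop_event.is_set():
            try:
                event_id = subscribe()
            except Exception:  # assumption: tango.DevFailed in a real device
                # Not ready yet: sleep, but wake up early if asked to stop.
                stop_event.wait(period_s)
                continue
            on_ready(event_id)
            return

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, stop_event
```

In a device, init_device() would set the state to INIT and start this thread, on_ready would switch the state once the subscription is in place, and delete_device() would set the returned stop event.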

The stateless feature has been available since Tango 6.1.0.
It seems the admin device has always been the last one to be exported and the subscribe_event method always needed to send a command to the admin device (even when using the notifd). So, unless you were using the stateless parameter, it seems your example should also have failed in Tango 7.

Thanks @bourtemb for the update!

I think I know why I had the wrong impression that it used to work with Tango 7. In Tango 7 we had the following exception raised:

DevFailed[
DevError[
    desc = TRANSIENT CORBA system exception: TRANSIENT_ConnectFailed
  origin = Connection::reconnect
  reason = API_CorbaException
severity = ERR]

DevError[
    desc = Failed to connect to device dserver/sardana/zreszela
  origin = Connection::reconnect
  reason = API_CantConnectToDevice
severity = ERR]

DevError[
    desc = Failed to execute command_inout on device dserver/sardana/zreszela, command EventSubscriptionChange
  origin = Connection::command_inout()
  reason = API_CommandFailed
severity = ERR]

DevError[
    desc = Device server send exception while trying to register event
  origin = EventConsumer::subscribe_event()
  reason = API_DSFailedRegisteringEvent
severity = ERR]
]

While now we get only:

DevFailed[
DevError[
    desc = Device dserver/sardana/zreszela is not exported (hint: try starting the device server)
  origin = DeviceProxy::get_corba_name()
  reason = API_DeviceNotExported
severity = ERR]

DevError[
    desc = Failed to execute command_inout on device dserver/sardana/zreszela, command ZmqEventSubscriptionChange
  origin = Connection::command_inout()
  reason = API_CommandFailed
severity = ERR]

DevError[
    desc = Device server send exception while trying to register event
  origin = EventConsumer::connect_event()
  reason = API_DSFailedRegisteringEvent
severity = ERR]
]

I assume that the stack of errors has changed between the Tango versions, or more precisely the original (first) error. Could you please confirm that? In Sardana we compare the reasons of the errors in order to print the corresponding message to the user. We were simply not interpreting the API_CorbaException. This is just to understand it better. It won't change too much:)
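To illustrate the kind of reason matching meant here, the following sketch maps the first recognised reason in an error stack to a user message. It is not Sardana's actual code: KNOWN_REASONS, explain_failure and the message texts are made up for the example. It only assumes that each entry of the stack exposes a reason attribute, as the DevError entries of a PyTango DevFailed do.

```python
# Reasons taken from the two stacks above; the messages are illustrative only.
KNOWN_REASONS = {
    "API_DeviceNotExported": "The admin device is not exported yet.",
    "API_CorbaException": "The admin device could not be reached over CORBA.",
}


def explain_failure(dev_errors, default="Event subscription failed."):
    """Return a message for the first recognised reason in the stack."""
    for err in dev_errors:
        message = KNOWN_REASONS.get(err.reason)
        if message is not None:
            return message
    return default


# Intended usage with PyTango (not executed here):
#   try:
#       proxy.subscribe_event(...)
#   except tango.DevFailed as exc:
#       print(explain_failure(exc.args))
```

Handling both reasons covers the Tango 7 stack as well as the newer one, since they differ only in the first errors.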

I remember that, during our quick chat at ICALEPCS, we were wondering whether it is OK that in the Tango DS model it is possible to read an attribute before the complete server is exported, but not possible to subscribe to its events. Do you think it should stay like this?

Thanks for the hints about the stateless subscribe. In Sardana we are using Taurus. Taurus first tries to subscribe with stateless = False and, if that is not possible, enables client polling (nothing to do with Tango polling) and then subscribes with stateless = True. When the reconnection succeeds, the polling is disabled and Taurus relies on the events from then on.
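The fallback described above can be sketched as follows. This is not the Taurus implementation, only a simplified model of the behaviour as described: AttributeWatcher and its attributes are hypothetical, and the proxy is only assumed to offer a PyTango-like subscribe_event(attr_name, event_type, callback, stateless=...) method.

```python
class AttributeWatcher:
    """Subscribe to events, falling back to client polling until they arrive."""

    def __init__(self, proxy, attr_name, event_type, subscribe_errors=(Exception,)):
        self.proxy = proxy
        self.attr_name = attr_name
        self.event_type = event_type
        # Assumption: with PyTango this would be (tango.DevFailed,).
        self.subscribe_errors = subscribe_errors
        self.polling = False
        self.event_id = None

    def start(self):
        try:
            # First attempt: fail loudly if the event source is not ready.
            self.event_id = self.proxy.subscribe_event(
                self.attr_name, self.event_type, self.push_event, stateless=False)
        except self.subscribe_errors:
            # Fallback: poll from the client and let the library keep retrying.
            self.polling = True
            self.event_id = self.proxy.subscribe_event(
                self.attr_name, self.event_type, self.push_event, stateless=True)

    def push_event(self, event):
        # The first error-free event means the subscription is established.
        if not event.err:
            self.polling = False
```

Changes made on the server while polling is still enabled are only seen at the next poll, which is why the startup window matters for the Pool and MacroServer case.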

We will think about the most appropriate solution for Sardana. It is somewhat delicate because this attribute is the only connection point between the Pool and the MacroServer. It exposes all the elements of the Pool to the MacroServer and communicates the changes (added, removed or changed elements). We will need to carefully manage the scenarios with multiple Pools connected to the MacroServer and situations where the elements are changed in the Pool(s) right after the server startup. The integrity of this information on the MacroServer side is critical.

I remember that, during our quick chat at ICALEPCS, we were wondering whether it is OK that in the Tango DS model it is possible to read an attribute before the complete server is exported, but not possible to subscribe to its events. Do you think it should stay like this?

I think the reason why it is like that today is technical. To subscribe to events, you need to send a command to the admin device, which is the last one to get exported.
In principle, I tend to think that if we can read an attribute, we should also be able to subscribe to its events and receive events... We have to see whether this is technically feasible and how easy it would be to change it...
I think this is a topic we could bring up at the next kernel conference (next Monday) to get the opinion of different users and developers, and to agree at least on the principle before digging into the code. It would be good to get more details on the history of the project, to understand why it is currently done like that, and to find out whether there are any strong limitations preventing us from changing the current behaviour.

Great! Just let me know if you need more information about the particular use case in Sardana. Well, if @tiagocoutinho is at the meeting, he may still remember the details:)

Just a silly question: what if we reverse the order of exporting devices, and export the admin device first and then the other devices?

We would like to avoid changing the order of exporting the devices, because by design it is not so easy to do (the admin device creates and exports the other devices in its own init_device() method, and the admin device is exported after init_device() has been executed), and also to avoid any side effect on software currently relying on this behaviour (Starter, for instance).
One solution, coming from offline discussions and from the kernel videoconference meeting, could be to introduce a server_init_hook() method which would be called on each device once the admin device has been exported. In the use case described in this ticket, this could be used to do all the event subscriptions. To be clean, the device server programmer should set the state of the devices to INIT in init_device() and switch it to ON (or to another meaningful state) at the end of this server_init_hook() method.
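The proposed ordering can be modelled with a small toy simulation. This is plain Python, not Tango code: ToyAdminDevice, ToyDevice and their attributes are hypothetical, and server_init_hook() is only the name proposed above. The sketch merely shows why subscriptions placed in the hook would find the admin device already exported.

```python
class ToyAdminDevice:
    """Toy stand-in for the admin (dserver) device."""

    def __init__(self, devices):
        self.devices = devices
        self.exported = False

    def start(self):
        # Current behaviour: the other devices are initialised first...
        for dev in self.devices:
            dev.init_device(self)
        # ...and only then is the admin device itself exported.
        self.exported = True
        # Proposed addition: call a hook on every device afterwards.
        for dev in self.devices:
            dev.server_init_hook(self)


class ToyDevice:
    """Toy device that subscribes to events in the proposed hook."""

    def __init__(self):
        self.state = "UNKNOWN"
        self.subscribed = False

    def init_device(self, admin):
        # Too early to subscribe: the admin device is not exported yet.
        self.state = "INIT"

    def server_init_hook(self, admin):
        if not admin.exported:
            raise RuntimeError("admin device is not exported")
        self.subscribed = True  # stands in for the subscribe_event() calls
        self.state = "ON"
```

Clients that see a device in INIT can then tell that its event subscriptions are not yet in place.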

Sorry for the late reply and thanks for further clarifications. I have just checked the code again, and for us it would be very easy to adapt the code to this new feature. So it would be great if it appears in Tango!