tango-controls/cppTango

Exception on unsubscription when a client has two proxies for the same attribute

Closed this issue · 13 comments

The example code below creates two attribute proxies to the same attribute, subscribes both of them to the same event, then unsubscribes the first one and destroys its proxy. Deleting the first proxy apparently also unsubscribes the other callback, so a Tango exception is thrown during pr2->unsubscribe_event(ev2);. No exception is thrown when that line is moved before the line with delete pr1;.

Happens with Tango 9.2.5a.

#include <tango.h>

class MyCallBack : public Tango::CallBack
{
   protected:
   void push_event(Tango::AttrConfEventData *) final
   {
      // Empty callback, only needed to register the subscriptions.
   }
};

int main()
{
   MyCallBack clb1, clb2;
   int ev1 = 0, ev2 = 0;

   // Two independent proxies pointing to the same attribute.
   Tango::AttributeProxy *pr1 = new Tango::AttributeProxy("sys/tg_test/1/long_scalar");
   Tango::AttributeProxy *pr2 = new Tango::AttributeProxy("sys/tg_test/1/long_scalar");

   try
   {
      ev1 = pr1->subscribe_event(Tango::ATTR_CONF_EVENT, &clb1);
      ev2 = pr2->subscribe_event(Tango::ATTR_CONF_EVENT, &clb2);

      pr1->unsubscribe_event(ev1);
      delete pr1;                   // unexpectedly also removes pr2's subscription

      pr2->unsubscribe_event(ev2);  // throws Tango::DevFailed
      delete pr2;
   }
   catch(const Tango::DevFailed &ex)
   {
      Tango::Except::print_exception(ex);
   }
   return 0;
}

Thank you very much for the report.
I can reproduce the problem.
The issue occurs because the line
delete pr1;
wrongly unsubscribes the events associated with the pr2 AttributeProxy in your example.
This case is indeed not well handled by the Tango library. It could be fixed, but that will require some changes in the structure of a few classes.

It looks like commit 3628115 has been causing some problems.
@taurel, do you remember what triggered this change? I couldn't find any issue associated with this commit.

Do we really need this unsubscription in the DeviceProxy destructor? Is this to avoid some crashes if the user forgets to unsubscribe before the DeviceProxy object is deleted?

Does it actually make sense to unsubscribe there?
Couldn't it be convenient in some cases to create a DeviceProxy object, subscribe to events on some attributes, and still receive events for these attributes even after the DeviceProxy object has been deleted?

@mliszcz, since you're already working on #737, which is related to event subscriptions, could you please have a look at this issue too?
Your changes might actually have a positive impact on it...

Might it be useful to provide a way to unsubscribe from events, even with a DeviceProxy object different from the one that was used during the subscribe_event call?

Do we really need this unsubscription in the DeviceProxy destructor? Is this to avoid some crashes if the user forgets to unsubscribe before the DeviceProxy object is deleted?

In a long-running system, clients could come and go very often. If they all forget to unsubscribe, we wouldn't want that to cause the server to use up more and more resources (memory, CPU). I'm not sure how the ZMQ and event mechanisms work, so I don't know if the automatic unsubscription is required.

Does it actually make sense to unsubscribe there?
Couldn't it be convenient in some cases to create a DeviceProxy object, subscribe to events on some attributes, and still receive events for these attributes even after the DeviceProxy object has been deleted?

I'm not sure that would be useful. Doesn't the DeviceProxy object create the ZMQ client that receives the events? If so, the ZMQ client would disappear when the DeviceProxy object is deleted.

Are you thinking of a ZMQ-only client that could just listen on a particular channel and receive events that were initiated by someone else (another DeviceProxy)?

Might it be useful to provide a way to unsubscribe from events, even with a DeviceProxy object different from the one that was used during the subscribe_event call?

I'm not in favour, but maybe you can provide a good use case? Such a feature would be dangerous - a badly written client could accidentally turn off events for another client.

I've tried to do it accidentally a few times. Creating a PyTango DeviceProxy that subscribes, and storing the IDs somewhere. Then trying to use a different DeviceProxy instance to unsubscribe later. It is something you learn not to do.

Do we really need this unsubscription in the DeviceProxy destructor? Is this to avoid some crashes if the user forgets to unsubscribe before the DeviceProxy object is deleted?

In a long-running system, clients could come and go very often. If they all forget to unsubscribe, we wouldn't want that to cause the server to use up more and more resources (memory, CPU). I'm not sure how the ZMQ and event mechanisms work, so I don't know if the automatic unsubscription is required.

In the past (before 2016), we didn't have this auto-unsubscription.
I guess it was introduced because currently the only way to unsubscribe is via the DeviceProxy object which was used to subscribe.
It is like that because some maps are stored at the DeviceProxy object level, so if you delete this DeviceProxy object, you'll have no way to unsubscribe later.
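Roughly speaking, the constraint is something like the following sketch (purely illustrative and hypothetical, not the actual cppTango data structures): if the id-to-callback bookkeeping lives inside the proxy, the destructor is the last chance to clean it up.

// Hypothetical sketch only -- not the actual cppTango internals.
// It illustrates the point above: if the bookkeeping needed to undo a
// subscription lives inside the DeviceProxy, deleting that proxy removes
// the only handle you have for unsubscribing later.
#include <map>

class SketchDeviceProxy
{
public:
   int subscribe_event(/* attribute, event type, callback, ... */)
   {
      int id = next_id_++;
      subscriptions_[id] = 0;            // simplified callback bookkeeping
      return id;
   }

   void unsubscribe_event(int id) { subscriptions_.erase(id); }

   ~SketchDeviceProxy()
   {
      // Without this automatic cleanup the entries would be orphaned:
      // no other object knows about them once the proxy is gone.
      subscriptions_.clear();
   }

private:
   int next_id_ = 1;
   std::map<int, int> subscriptions_;    // event id -> (simplified) bookkeeping
};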

Does it actually make sense to unsubscribe there?
Couldn't it be convenient in some cases to create a DeviceProxy object, subscribe to events on some attributes, and still receive events for these attributes even after the DeviceProxy object has been deleted?

I'm not sure that would be useful. Doesn't the DeviceProxy object create the ZMQ client that receives the events? If so, the ZMQ client would disappear when the DeviceProxy object is deleted.

The ZMQ client part is handled by the ZmqEventConsumer object, which is a single instance in the process, so I think it does not disappear after you delete the DeviceProxy object.

Are you thinking of a ZMQ-only client that could just listen on a particular channel and receive events that were initiated by someone else (another DeviceProxy)?

I'm just thinking about a use case where you would do:

int id = 0;
{
    Tango::DeviceProxy my_proxy("sys/tg_test/1");
    id = my_proxy.subscribe_event("ampli", Tango::ATTR_CONF_EVENT, &my_callback);
}
// my_proxy is out of scope and has been deleted.
// my_callback is still executed whenever an ampli ATTR_CONF_EVENT is received,
// until unsubscribe_event is invoked with id.
// We might need a new object to manage the subscriptions/unsubscriptions,
// and it should guarantee the uniqueness of the event id.
sleep(60);
subscription_manager.unsubscribe(id);

Might it be useful to provide a way to unsubscribe from events, even with a DeviceProxy object different from the one that was used during the subscribe_event call?

I'm not in favour, but maybe you can provide a good use case? Such a feature would be dangerous - a badly written client could accidentally turn off events for another client.

I've tried to do it accidentally a few times.

That's my point. It's quite intuitive to try to do that. You're not the first one to try it, and you won't be the last (I also did it at some point, and I saw some colleagues doing it as well; it can have some very strange side effects).

Creating a PyTango DeviceProxy that subscribes, and storing the IDs somewhere. Then trying to use a different DeviceProxy instance to unsubscribe later. It is something you learn not to do.

In the current situation, yes.
I'm just throwing out ideas here for future versions of Tango. I think it would be nice to be able to do it in the future, or maybe to introduce an EventSubscriptionManager object to handle all the subscriptions and unsubscriptions?
Maybe a bad idea? I just want to trigger a discussion.
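To make the EventSubscriptionManager idea a bit more concrete, here is a purely hypothetical sketch (none of these classes or methods exist in cppTango today) of how a process-wide manager could own subscriptions independently of any DeviceProxy:

// Purely hypothetical sketch of the EventSubscriptionManager idea above.
// Nothing here exists in cppTango; it only shows how subscriptions could be
// owned by a process-wide manager instead of by individual DeviceProxy objects.
#include <functional>
#include <map>
#include <string>
#include <utility>

class EventSubscriptionManager
{
public:
   using Callback = std::function<void()>;   // simplified event callback

   // The returned id is unique for the whole process and remains valid even
   // after any DeviceProxy used to reach the device has been deleted.
   int subscribe(const std::string &attribute, Callback cb)
   {
      const int id = next_id_++;
      subscriptions_[id] = {attribute, std::move(cb)};
      return id;
   }

   // Unsubscribing only needs the id, not the original DeviceProxy.
   void unsubscribe(int id) { subscriptions_.erase(id); }

private:
   struct Entry { std::string attribute; Callback callback; };
   int next_id_ = 1;
   std::map<int, Entry> subscriptions_;
};

With something like this, the use case sketched earlier (subscribe inside a scope, unsubscribe later with just the id) would no longer depend on the lifetime of the proxy.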

The ZMQ client part is handled by the ZmqEventConsumer object, which is a single instance in the process, so I think it does not disappear after you delete the DeviceProxy object.

Then it is still in the same process context... even deleting it later will not affect other clients. Clients should do things the proper way and unsubscribe from events... but I see the reason for forcing unsubscription at DeviceProxy destruction. We need to keep this.
But I'm not in favour of allowing a client to unsubscribe events using an arbitrary ID.

Hi all,
The correction for #737 is ready but does not impact this issue, which is a separate problem.

I have not investigated this much, but most likely this bug is related to how we track device proxies during event subscription and unsubscription. I added this code to the example:

std::vector<int> pr1ids{};
Tango::ApiUtil::instance()->get_zmq_event_consumer()->get_subscribed_event_ids(pr1->get_device_proxy(), pr1ids);
std::vector<int> pr2ids{};
Tango::ApiUtil::instance()->get_zmq_event_consumer()->get_subscribed_event_ids(pr2->get_device_proxy(), pr2ids);

And surprisingly pr1ids contains {1, 2} and pr2ids is empty.

If you reverse the order of the calls, handling pr2 first, no exception is thrown (the pr2 destructor unsubscribes nothing):

pr2->unsubscribe_event(ev2);
delete pr2;
pr1->unsubscribe_event(ev1);
delete pr1;

The DeviceProxy in each AttributeProxy is a different instance (with a different address). Maybe we are assigning the second subscription to the wrong proxy because we are subscribing to the same event on the same attribute for the second time. This seems easily fixable.

Okay, I did some more checking. Event subscriptions are tracked globally (for all device proxies) in a map structure indexed by event names. In the original bug report the event name is common to both subscriptions: tango://172.17.0.4:10000/sys/tg_test/1/long_scalar.idl5_attr_conf. So this is what happens:

  • when the first proxy subscribes, we create a new entry in this global map, store the pointer to the DeviceProxy, and store the callback in a list of callbacks,
  • when the second proxy subscribes, we already have a matching entry in the subscription map, so we do not touch the DeviceProxy pointer and just add the new callback to the callback list.

Internally it looks something like this (pseudocode):

event_callback_map:
    'tango://172.17.0.4:10000/sys/tg_test/1/long_scalar.idl5_attr_conf':
        device: pointer to the first device proxy
        callback_list:
            - {id: 1, callback: first callback },
            - {id: 2, callback: second callback }

If we call unsubscribe_event at this point, it does not matter which device proxy we use (e.g. we may use the 2nd proxy to unsubscribe from the first event and the other way around).

The bug is that when we call the destructor of the second device proxy, it does nothing with the subscriptions because we stored the other device proxy in the global subscription map. But when we delete the first proxy, it unsubscribes from both events.

IMO what we should do is store a pointer to the device proxy in the callback list, and when that proxy gets deleted, automatically delete the callbacks created via this particular device proxy. But we should still be able to use unsubscribe_event explicitly to remove subscriptions created by other proxies.

event_callback_map:
    'tango://172.17.0.4:10000/sys/tg_test/1/long_scalar.idl5_attr_conf':
        callback_list:
            - {id: 1, callback: first callback, device: pointer to the first device proxy },
            - {id: 2, callback: second callback, device: pointer to the second device proxy }

One more important thing. We store the proxy in event_callback_map so that we can give it back to the client in the EventData structure passed to the event callback. With the current implementation, even if we subscribe using two different proxies, both callbacks will always receive the pointer to the first proxy. This is one more reason why we should store all the device proxies in the callback map.
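For illustration (a sketch assuming the device and attr_name fields of Tango::AttrConfEventData), a callback like the one below would, with the current implementation, print the address of the first proxy's underlying DeviceProxy from both clb1 and clb2 in the original example, even though clb2 was registered through pr2:

#include <tango.h>
#include <iostream>

// Sketch illustrating the point above. It assumes the usual AttrConfEventData
// fields (device, attr_name); with the current implementation both callbacks
// of the original example would print the same (first) DeviceProxy address.
class PrintingCallBack : public Tango::CallBack
{
public:
   void push_event(Tango::AttrConfEventData *ev) final
   {
      // ev->device is the DeviceProxy pointer stored in event_callback_map,
      // which today is always the one from the first subscription.
      std::cout << "event for " << ev->attr_name
                << " delivered with proxy @ " << ev->device << std::endl;
   }
};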

Thanks @mliszcz. I agree with the new data structure, and that each device proxy's destructor only unsubscribes from the events created by that same device proxy instance. Also good that the callback will use the pointer to the correct device proxy instance.

But we should still be able to use unsubscribe_event explicitly to remove subscriptions created by other proxies.

From previous comments, both @lorenzopivetta and I are not in favour of allowing proxy 1 to cause proxy 2's subscription to be removed (or vice versa). In your example above, the first proxy should only be allowed to unsubscribe from id 1, and the second proxy should only be allowed to unsubscribe from id 2.

But we should still be able to use unsubscribe_event explicitly to remove subscriptions created by other proxies.

From previous comments, both @lorenzopivetta and I are not in favour of allowing proxy 1 to cause proxy 2's subscription to be removed (or vice versa). In your example above, the first proxy should only be allowed to unsubscribe from id 1, and the second proxy should only be allowed to unsubscribe from id 2.

What you're proposing is a breaking change and clients may depend on that behavior. It is more a feature than a bugfix, thus I wouldn't change it in this PR (also because it is not needed to resolve the problem described in the ticket). Maybe we can have a separate issue for not allowing unsubscribing on arbitrary ids.

What you're proposing is a breaking change and clients may depend on that behavior. It is more a feature than a bugfix, thus I wouldn't change it in this PR (also because it is not needed to resolve the problem described in the ticket). Maybe we can have a separate issue for not allowing unsubscribing on arbitrary ids.

Good point. I agree we don't want breaking changes in cppTango for bug fixes.

In PyTango there is already a check for this - PyTango keeps its own map of subscriptions and proxies can only unsubscribe from their own event subscriptions.

Hi all,

The PR with the correction for the issue described in the first post is ready in #746. The attached program completes successfully and no exception is thrown.

@ajoubertza, @lorenzopivetta I opened #747 for further discussion on whether we should allow unsubscribing from arbitrary subscription ids. @bourtemb maybe you can comment as well because you suggested the opposite behavior (allowing subscriptions to outlive device proxies).

@bourtemb maybe you can comment as well because you suggested the opposite behavior (allowing subscriptions to outlive device proxies).

I overlooked the following fact when I said that:

One more important thing. We store the proxy in event_callback_map so that we can give it back to the client in the EventData structure passed to the event callback. With the current implementation, even if we subscribe using two different proxies, both callbacks will always receive the pointer to the first proxy. This is one more reason why we should store all the device proxies in the callback map.

We could get into trouble if the callback uses the (destroyed) DeviceProxy pointer provided with the EventData.
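A short, hypothetical sketch of that hazard, assuming subscriptions were allowed to outlive the DeviceProxy used to create them:

#include <tango.h>
#include <string>

// Hypothetical hazard sketch: if a subscription could outlive the DeviceProxy
// it was created with, a callback touching ev->device would dereference a
// dangling pointer once that proxy has been deleted.
class DanglingProxyCallBack : public Tango::CallBack
{
public:
   void push_event(Tango::AttrConfEventData *ev) final
   {
      // Undefined behaviour if the proxy stored in the event data has
      // already been destroyed.
      std::string name = ev->device->name();
   }
};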