tango-controls/TangoTickets

Stop receiving events in Client when subscribed to attributes from different devices

teresanunez opened this issue · 9 comments

Hello,
we have a problem with a client getting events from two attributes of two different Tango devices.
After some time the events of one of the attributes are no longer received, although they are still being sent: we can get them by subscribing to this attribute from another client. The problem does not happen if both attributes belong to the same Tango device.
Our client is written in Python. The versions we have installed are:

PyTango Version: 9.2.0-2
Tango Version: 9.2.5
zmq Version 4.2.1-4

We are running in debian9 (Debian GNU/Linux 9.5 (stretch)).

The problem could also be reproduced at Alba using:

libtango9: 9.2.5a+dfsg1-2+patch1~bpo9+0~alba+1
python-tango: 9.2.2-1~bpo9+0~alba+1
libzmq5: 4.2.1-4

Regards,
Teresa

To reproduce the problem you can use this client code:

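from __future__ import print_function  # same script runs on Python 2 and 3

import time

import PyTango

attr = PyTango.AttributeProxy(
    "tango://pc255.cells.es:10000/test/zreszela/dummyeventgenerator-01/attr")
doublescalar = PyTango.AttributeProxy(
    "tango://pc255.cells.es:10000/sys/tg_test/1/double_scalar")


class PyCallback(object):
    """Print the value of every change event received, or its errors."""

    def push_event(self, event):
        if not event.err:
            print(event.attr_name, event.attr_value.value)
        else:
            print(event.errors)


cb = PyCallback()

# The same callback object serves both subscriptions.
ev = attr.subscribe_event(PyTango.EventType.CHANGE_EVENT, cb, [])
ev2 = doublescalar.subscribe_event(PyTango.EventType.CHANGE_EVENT, cb, [])

# Keep the process alive; events are delivered from Tango's own threads.
while True:
    time.sleep(1)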

and this dummy event generator DS.

To reproduce it:

  • configure an absolute change-event threshold (abs_change) of 0.01 on the sys/tg_test/1/double_scalar attribute, and poll it with a period of 3
  • run the dummy event generator DS test with a period of 0.1 (the argin of its Start command)
  • start the above client
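The first two setup steps can also be scripted. This is a minimal sketch, assuming a PyTango `DeviceProxy`-like object is passed in; the helper names `configure_change_event` and `start_generator` are hypothetical, and the float argin of the `Start` command is an assumption about the dummy event generator DS. Tango polling periods are given in milliseconds.

```python
def configure_change_event(proxy, attr_name, abs_change, poll_period_ms):
    """Set an absolute change-event threshold and start polling an attribute.

    `proxy` is expected to behave like a PyTango DeviceProxy.
    """
    # Read the current configuration (an AttributeInfoEx in PyTango 9).
    cfg = proxy.get_attribute_config(attr_name)
    # Change-event thresholds are stored as strings in the configuration.
    cfg.events.ch_event.abs_change = str(abs_change)
    proxy.set_attribute_config(cfg)
    # Polling is what makes the device server push change events here.
    proxy.poll_attribute(attr_name, poll_period_ms)
    return cfg


def start_generator(proxy, period):
    """Start the dummy event generator (hypothetical float argin)."""
    proxy.command_inout("Start", period)
```

For example, `configure_change_event(PyTango.DeviceProxy("sys/tg_test/1"), "double_scalar", 0.01, 3)` followed by `start_generator(PyTango.DeviceProxy("test/zreszela/dummyeventgenerator-01"), 0.1)`. Passing the proxy in keeps the helpers testable without a running Tango system.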

I have reproduced it three times already, after approximately 1000, 2000 and 28000 events pushed by the dummy event generator DS.

Just to clarify the conditions: I tested using two attributes from two different Tango devices in two Tango servers (here the problem occurs) and two attributes from the same Tango device (here it does not). I have not tested two attributes from two Tango devices in the same Tango server.
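To see exactly when one of the two subscriptions goes quiet, the printing callback in the client can be replaced by one that counts events per attribute and records when each attribute last delivered a valid event. This is a hypothetical diagnostic helper (`EventMonitor` is not part of PyTango), written for Python 3; it only relies on the `attr_name` and `err` fields of the event objects PyTango passes to `push_event`.

```python
import threading
import time


class EventMonitor(object):
    """Count events per attribute and report attributes that went quiet.

    Usable as a PyTango callback: subscribe_event(..., monitor, []) makes
    Tango call monitor.push_event(event) for every event received.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()  # push_event runs in Tango threads
        self.counts = {}     # attr name -> number of valid events
        self.errors = {}     # attr name -> number of error events
        self.last_seen = {}  # attr name -> time of last valid event

    def push_event(self, event):
        now = self._clock()
        name = event.attr_name
        with self._lock:
            if event.err:
                self.errors[name] = self.errors.get(name, 0) + 1
            else:
                self.counts[name] = self.counts.get(name, 0) + 1
                self.last_seen[name] = now

    def stalled(self, timeout):
        """Names of attributes with no valid event for more than `timeout` s."""
        now = self._clock()
        with self._lock:
            return sorted(name for name, seen in self.last_seen.items()
                          if now - seen > timeout)
```

Subscribe both attributes with the same `EventMonitor` instance instead of `cb`, then call `monitor.stalled(timeout)` periodically from the main loop with a timeout well above the expected event period; `monitor.counts` shows how many events had arrived before the stall. Attributes that have never delivered a valid event do not appear in the result.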

cmft commented

The problem can also be reproduced (under the same conditions reported by @reszelaz) if the server runs Tango 8 (Tango 8.1.2 + patches and PyTango 8.1.4) and the client runs Tango 9.

We could not reproduce the problem if the server runs Tango 7 (7.2.6 + patches), or if both server and client run an older Tango (7 or 8).

Hi,

Since you're using ZMQ 4.2.1, this is probably due to cppTango#444.
We found a workaround in cppTango, but it is not available in Tango 9.2.5.
The fix is available in cppTango 9.3.2.
Here is the pull request with the workaround: cppTango#445
The bug in ZeroMQ was fixed in ZMQ 4.2.2.

Cheers,
Reynald
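The version rule in this reply can be turned into a rough check when auditing hosts. This is a sketch under stated assumptions: any libzmq older than 4.2.2 is conservatively treated as affected (the thread only confirms 4.2.1), cppTango 9.3.2 or later is taken to carry the workaround, and the helper names are hypothetical.

```python
import re

# Leading "major.minor.patch"; Debian revisions and local suffixes are ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Extract (major, minor, patch) from a package version string.

    e.g. '9.2.5a+dfsg1-2+patch1~bpo9+0~alba+1' -> (9, 2, 5)
    """
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def may_lose_events(libzmq_version, cpptango_version):
    """Rough check for the event-loss scenario discussed in this thread.

    Assumption: libzmq < 4.2.2 is treated as affected and
    cppTango >= 9.3.2 as containing the workaround.
    """
    zmq_fixed = parse_version(libzmq_version) >= (4, 2, 2)
    tango_has_workaround = parse_version(cpptango_version) >= (9, 3, 2)
    return not (zmq_fixed or tango_has_workaround)
```

Locally patched packages, such as a libtango9 9.2.5 rebuilt with PR#445, cannot be recognised from the version string alone and will still be reported as at risk.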

Thanks @bourtemb for your help! We are already running this test scenario with libzmq 4.2.5. So far, no problems. We will keep you updated.

@bourtemb, many thanks. We have not yet been able to test other versions at DESY, but we will try as soon as possible. It could solve many of the problems we have right now.

We confirm that the bug gets solved by either:

  • using libzmq 4.2.5
  • patching libtango9 9.2.5 with PR#445

However, we have discovered problems when using libzmq 4.2.5. The sardana test suite reports the following errors:

DevFailed: DevFailed[
DevError[
    desc = Failed to disconnect from event channel!
           Error while trying to unsubscribe the heartbeat ZMQ socket from the channel heartbeat publisher
           ZMQ message: No such file or directory
  origin = ZmqEventConsumer::disconnect_event_channel
  reason = API_ZmqFailed
severity = ERR]
]

The test suite stresses starting and shutting down Tango device servers, and subscribing to and unsubscribing from their attributes.

So in the end we went with patching Tango. @teresanunez, if you are interested we can share the deb package with you.
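A reduced version of the subscribe/unsubscribe stress described in this comment might look like the following sketch. Unlike the sardana test suite it does not start or stop device servers; it only repeats `subscribe_event`/`unsubscribe_event` on one `AttributeProxy` and tallies failure reasons such as `API_ZmqFailed`. The helper names are hypothetical, and reading the reason from `exc.args[0].reason` relies on PyTango's `DevFailed` carrying `DevError` objects in its `args`.

```python
def devfailed_reason(exc):
    """Best-effort extraction of the first DevError reason from an exception.

    PyTango's DevFailed carries DevError objects in exc.args; anything
    else falls back to the exception's repr.
    """
    if exc.args and hasattr(exc.args[0], "reason"):
        return exc.args[0].reason
    return repr(exc)


def stress_subscriptions(proxy, event_type, callback, cycles):
    """Subscribe to and unsubscribe from one attribute `cycles` times.

    `proxy` is expected to behave like a PyTango AttributeProxy. Returns a
    dict mapping failure reasons to how often each one occurred.
    """
    failures = {}
    for _ in range(cycles):
        try:
            event_id = proxy.subscribe_event(event_type, callback, [])
            proxy.unsubscribe_event(event_id)
        except Exception as exc:  # DevFailed in a real Tango setup
            reason = devfailed_reason(exc)
            failures[reason] = failures.get(reason, 0) + 1
    return failures
```

For example, `stress_subscriptions(doublescalar, PyTango.EventType.CHANGE_EVENT, cb, 1000)` with the proxy and callback from the client in the issue description; an empty result means no subscription cycle failed.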

I think this issue can be closed. And indeed this bug was very loudly announced at the last Tango meeting; we just forgot about it. Thanks @bourtemb!

Good!
Thanks for your feedback! I'm closing this issue.
We will have to look at the disconnect_event_channel errors you mention when using ZMQ 4.2.5.
@reszelaz, could you please create another issue for that, with the steps to reproduce the problem?

Sure! I've put it in our internal ALBA backlog and we will get back to you soon.