How to connect DeviceProxy to device launched via MultiDeviceTestContext
DrewDevereux opened this issue · 10 comments
Tests that use `MultiDeviceTestContext` will sometimes pass, and sometimes error out with a message objecting that the environment variable `TANGO_HOST` is not defined.
If you set that environment variable, `MultiDeviceTestContext` will try to connect to a database at that location. And if you stand up a database there, it will talk to that database!
I can see in the code that the Tango database is supposed to be mocked out with a text file; it seems that mocking isn't always working.
I haven't found a way to fix the problem, but I can make it happen every time by putting a `time.sleep(5)` at the top of the post-init callback. This makes me think there is some kind of race condition.
The issue occurs regardless of whether your asynchronous context uses threads or processes, but it is worse for threads.
@DrewDevereux Thanks for the bug report. I think `TANGO_HOST` should only be needed if a `DeviceProxy` (or similar) object is instantiated with a Tango name that doesn't include `#dbase=no`. If you could post some example code or a link to yours, that would help with the investigation.
See for example the four tests in https://gitlab.com/ska-telescope/lfaa-lmc-prototype/-/blob/master/tests/test_integration.py, all currently skipped. If run, they might succeed, but far more likely they will error out with something like the following. It is the unpredictability that makes me think it is a race condition -- for example I have had tests passing locally, so I push, and then they all fail in CI.
```
self = <tests.test_integration.TestMccsIntegration object at 0x7f6c054454a8>

    def test_master_enable_subarray(self):
        """
        Test that a MccsMaster device can enable an MccsSubarray device.
        """
        devices_info = [
            {
                "class": MccsMaster,
                "devices": (
                    {
                        "name": "low/elt/master",
                        "properties": {
                            "MccsSubarrays": ["low/elt/subarray_1",
                                              "low/elt/subarray_2"],
                            "MccsStations": ["low/elt/station_1"],
                            "MccsTiles": ["low/elt/tile_47"],
                        }
                    },
                )
            },
            {
                "class": MccsSubarray,
                "devices": [
                    {
                        "name": "low/elt/subarray_1",
                        "properties": {}
                    },
                    {
                        "name": "low/elt/subarray_2",
                        "properties": {}
                    },
                ]
            }
        ]
        with MultiDeviceTestContext(devices_info) as context:
            master = context.get_device("low/elt/master")
            subarray_1 = context.get_device("low/elt/subarray_1")
            subarray_2 = context.get_device("low/elt/subarray_2")

            # check both subarrays are disabled
            assert subarray_1.adminMode == AdminMode.OFFLINE
            assert subarray_1.State() == DevState.DISABLE
            assert subarray_2.adminMode == AdminMode.OFFLINE
            assert subarray_2.State() == DevState.DISABLE

            # enable subarray 1
>           master.EnableSubarray(1)

tests/test_integration.py:66:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py37/lib/python3.7/site-packages/tango/device_proxy.py:244: in f
    return dp.command_inout(name, *args, **kwds)
.tox/py37/lib/python3.7/site-packages/tango/green.py:195: in greener
    return executor.run(fn, args, kwargs, wait=wait, timeout=timeout)
.tox/py37/lib/python3.7/site-packages/tango/green.py:109: in run
    return fn(*args, **kwargs)
.tox/py37/lib/python3.7/site-packages/tango/connection.py:108: in __Connection__command_inout
    r = Connection.command_inout_raw(self, name, *args, **kwds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Device(low/elt/master), cmd_name = 'EnableSubarray', cmd_param = 1

    def __Connection__command_inout_raw(self, cmd_name, cmd_param=None):
        """
        command_inout_raw( self, cmd_name, cmd_param=None) -> DeviceData

            Execute a command on a device.

            Parameters :
                - cmd_name  : (str) Command name.
                - cmd_param : (any) It should be a value of the type expected by the
                              command or a DeviceData object with this value inserted.
                              It can be ommited if the command should not get any argument.
            Return : A DeviceData object.

            Throws : ConnectionFailed, CommunicationFailed, DeviceUnlocked, DevFailed from device
        """
        param = __get_command_inout_param(self, cmd_name, cmd_param)
>       return self.__command_inout(cmd_name, param)
E       PyTango.DevFailed: DevFailed[
E       DevError[
E            desc = TANGO_HOST env. variable not set, set it and retry (e.g. TANGO_HOST=<host>:<port>)
E          origin = Database::Database
E          reason = API_TangoHostNotSet
E        severity = ERR]
E
E       DevError[
E            desc = Failed to execute command_inout on device low/elt/master, command EnableSubarray
E          origin = Connection::command_inout()
E          reason = API_CommandFailed
E        severity = ERR]
E       ]
```
Thanks, I think I see what is happening. The code that fails is instantiating a `DeviceProxy`, but the FQDN it is using is the short form, `low/elt/subarray_1`. With the short form, the `DeviceProxy` has to connect to a Tango database to find the hostname and port connection details for the device. Similarly, an FQDN like `tango://mydatabase:10000/low/elt/subarray_1` will connect to the Tango database first.
The `DeviceProxy` objects are not aware that we are running some devices via the `MultiDeviceTestContext`. To connect without a database, we need an FQDN like `tango://172.17.0.3:48493/low/elt/subarray_1#dbase=no`. In this case we are providing the IP (or hostname) and port of the device server itself.
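The naming rules above can be sketched with a small helper (illustrative only; `needs_database` is not part of PyTango):

```python
def needs_database(device_name):
    """Return True if connecting via this Tango device name would
    require a database lookup, per the rules described above."""
    # A name ending in "#dbase=no" carries the host and port of the
    # device server itself, so no database lookup is needed.
    if device_name.endswith("#dbase=no"):
        return False
    # Every other form (short name, or tango://db-host:port/name)
    # must first query the Tango database for connection details.
    return True

# Short form: requires a database.
print(needs_database("low/elt/subarray_1"))  # True
# Fully qualified with a database host: still goes via the database.
print(needs_database("tango://mydatabase:10000/low/elt/subarray_1"))  # True
# nodb form: connects directly to the device server.
print(needs_database("tango://172.17.0.3:48493/low/elt/subarray_1#dbase=no"))  # False
```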
We need to modify the FQDNs that the device under test will use. Conveniently, your code gets them from device properties, so we can modify them there. However, we have to know what they are before starting the device, since the device only gets its properties during `init_device`.
The `MultiDeviceTestContext` has a method that builds up the nodb device names we would need:
Lines 360 to 363 in df925b5
but we have a "chicken-and-egg" problem: we need an instance of the test context to use that method, yet we need to provide the properties before creating the instance. Thus we end up re-implementing it. We also need to know the hostname/IP and port before instantiating. By default the test context figures those out itself, with Tango allocating a free port automatically, but we can provide them as kwargs instead.
It will be clearer with example code:

```python
import socket

import tango.test_context


# taken from https://github.com/tango-controls/pytango/blob/df925b55/tests/test_event.py#L30
def get_open_port():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 0))
    s.listen(1)
    port = s.getsockname()[1]
    s.close()
    return port


def test_example():
    host = tango.test_context.get_host_ip()
    port = get_open_port()

    def get_nodb_fqdn(device_name):
        form = 'tango://{0}:{1}/{2}#dbase=no'
        return form.format(host, port, device_name)

    devices_info = [
        ...
            "name": "low/elt/master",
            "properties": {
                "MccsSubarrays": [get_nodb_fqdn("low/elt/subarray_1"),
                                  get_nodb_fqdn("low/elt/subarray_2")],
        ...
    ]

    with MultiDeviceTestContext(devices_info, host=host, port=port) as context:
        master = context.get_device("low/elt/master")
        ...
```
Flakiness / race condition
As to why it appears flaky, my only guess is that in your local environment you have the `TANGO_HOST` variable defined, and sometimes you have a databaseds and a real subarray device running. The test then connects to the real device instead of the one launched by the `MultiDeviceTestContext`.
Alternative testing strategy
We could also mock out the DeviceProxy objects and not launch all the dependent devices, which is the approach used here:
https://gitlab.com/ska-telescope/tmc-prototype/-/blob/ceedf59b/tmcprototype/centralnode/test/unit/central_node_test.py
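The essence of that mocking approach is replacing the proxy with a `MagicMock`, so no dependent device (and no Tango database) has to run at all. A minimal sketch, assuming the code under test can have its proxy factory injected (the `Master` class here is a toy stand-in, not from either project):

```python
from unittest import mock


class Master:
    """Toy stand-in for a device that talks to subarrays via DeviceProxy.
    The real code would call tango.DeviceProxy(fqdn); here the proxy
    factory is injected so a test can substitute a mock."""

    def __init__(self, proxy_factory):
        self._proxy_factory = proxy_factory

    def enable_subarray(self, fqdn):
        proxy = self._proxy_factory(fqdn)
        return proxy.command_inout("On")


# In a unit test, MagicMocks replace the proxy and its factory.
proxy = mock.MagicMock()
proxy.command_inout.return_value = "ON"
factory = mock.MagicMock(return_value=proxy)

master = Master(factory)
result = master.enable_subarray("low/elt/subarray_1")

# We can now assert on how the device under test used its proxies.
factory.assert_called_once_with("low/elt/subarray_1")
proxy.command_inout.assert_called_once_with("On")
print(result)  # ON
```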
@ajoubertza Interesting!
Another alternative would be to support `#dbase=no` in the `TANGO_HOST` variable. For instance, `TANGO_HOST=172.17.0.3:48493#dbase=no` would resolve `low/elt/subarray_1` to `tango://172.17.0.3:48493/low/elt/subarray_1#dbase=no`. Then the `DeviceTestContext` could set the proper tango host through `os.environ["TANGO_HOST"]` for the duration of the test context.
Also note that pytango might "easily" support this before cpptango, using the following check in the `__DeviceProxy__init__` method:

```python
tango_host = os.environ.get("TANGO_HOST", "")
tango_host, suffix = tango_host.split("#") if "#" in tango_host else (tango_host, "")
if tango_host and suffix and not is_resolved(device_name):
    device_name = resolve(device_name, tango_host) + "#" + suffix
```
Thanks @vxgmichel - that's a cool idea! It would make testing a lot simpler.
What would the `is_resolved` and `resolve` methods do? I'm guessing `resolve` is just sanitising user input to build up the FQDN reliably. I'm not sure why we would need an `is_resolved` check.
Thanks very much @ajoubertza for this code. I confirm that I can get my tests to pass this way. I am uncertain whether this is intended as a workaround until the code is "fixed", or as instructions for how to correctly use `MultiDeviceTestContext`. If the latter, I think an update to the class autodoc is needed, because the autodoc example makes it look much more straightforward than this.
Re "flakiness", my experience was not just that the tests were sometimes passing, sometimes failing. There were also times when tests 1, 2 and 4 would pass but 3 would fail, for example. And putting a sleep into the post-init callback would cause all four to fail. So I feel there is still a mystery there... but maybe it will have to remain a mystery for now.
Thanks also for the link to a mocking example. I want to treat the example I gave as a lightweight integration test, so no mocking there. But mocking will come in very useful for unit testing.
> What would the `is_resolved` and `resolve` methods do? I'm guessing `resolve` is just sanitising user input to build up the FQDN reliably. I'm not sure why we would need an `is_resolved` check.
This is roughly what I had in mind:
```python
def is_resolved(device_name):
    return device_name.startswith("tango://")


def resolve(device_name, tango_host):
    return f"tango://{tango_host}/{device_name}"
```
Although I suspect the actual implementation is going to be a bit trickier than that :)
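Putting those helpers together with the earlier `__DeviceProxy__init__` snippet, the proposed resolution might behave like this (a sketch; `resolve_device_name` is a hypothetical wrapper, not an existing PyTango function):

```python
def is_resolved(device_name):
    # A fully qualified name already carries the scheme and host.
    return device_name.startswith("tango://")


def resolve(device_name, tango_host):
    return f"tango://{tango_host}/{device_name}"


def resolve_device_name(device_name, env_tango_host):
    """Expand a short device name using a TANGO_HOST value that may
    carry a '#dbase=no' suffix, per the proposal above."""
    tango_host, suffix = (
        env_tango_host.split("#") if "#" in env_tango_host
        else (env_tango_host, "")
    )
    if tango_host and suffix and not is_resolved(device_name):
        return resolve(device_name, tango_host) + "#" + suffix
    return device_name


print(resolve_device_name("low/elt/subarray_1", "172.17.0.3:48493#dbase=no"))
# tango://172.17.0.3:48493/low/elt/subarray_1#dbase=no

print(resolve_device_name("low/elt/subarray_1", "172.17.0.3:10000"))
# low/elt/subarray_1  (no suffix, so the name is left untouched)
```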
Things got very messy for me trying to implement this solution in a system where device A passes device B the FQDNs of the devices that device B should be talking to... until I realised that (so far) I only need the long form when creating a `DeviceProxy`.
So I created a pytest fixture in `conftest.py` that monkey-patches `tango.DeviceProxy`:
```python
import socket

import pytest
import tango
from tango.test_context import MultiDeviceTestContext, get_host_ip

# MccsMaster, MccsSubarray, MccsStation and MccsTile are imported
# from the project under test.


class MyDeviceProxy(tango.DeviceProxy):
    def _get_open_port():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", 0))
        s.listen(1)
        port = s.getsockname()[1]
        s.close()
        return port

    HOST = get_host_ip()
    PORT = _get_open_port()

    def _get_nodb_fqdn(self, device_name):
        form = 'tango://{0}:{1}/{2}#dbase=no'
        return form.format(self.HOST, self.PORT, device_name)

    def __init__(self, device_name, *args, **kwargs):
        super().__init__(self._get_nodb_fqdn(device_name), *args, **kwargs)


tango.DeviceProxy = MyDeviceProxy  # monkey-patch


@pytest.fixture(scope="class")
def tango_multidevice_context():
    """
    Creates and returns a TANGO MultiDeviceTestContext object.
    """
    devices_info = [
        {
            "class": MccsMaster,
            "devices": (
                {
                    "name": "low/elt/master",
                    "properties": {
                        "MccsSubarrays": ["low/elt/subarray_1",
                                          "low/elt/subarray_2"],
                        "MccsStations": ["low/elt/station_1",
                                         "low/elt/station_2"],
                        "MccsTiles": ["low/elt/tile_47",
                                      "low/elt/tile_129"],
                    }
                },
            )
        },
        {
            "class": MccsSubarray,
            "devices": [
                {
                    "name": "low/elt/subarray_1",
                    "properties": {}
                },
                {
                    "name": "low/elt/subarray_2",
                    "properties": {}
                },
            ]
        },
        {
            "class": MccsStation,
            "devices": [
                {
                    "name": "low/elt/station_1",
                    "properties": {}
                },
                {
                    "name": "low/elt/station_2",
                    "properties": {}
                },
            ]
        },
        {
            "class": MccsTile,
            "devices": [
                {
                    "name": "low/elt/tile_47",
                    "properties": {}
                },
                {
                    "name": "low/elt/tile_129",
                    "properties": {}
                },
            ]
        },
    ]
    with MultiDeviceTestContext(
        devices_info, host=MyDeviceProxy.HOST, port=MyDeviceProxy.PORT
    ) as context:
        yield context
```
and now my tests are lovely and clean again!
I wanted to share this with you in case it leads to another solution.
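The name-rewriting trick in that fixture can be demonstrated without Tango by substituting a dummy base class for `tango.DeviceProxy` (everything below is a stand-in for illustration; the host and port are fixed here, whereas the real fixture obtains them from `get_host_ip()` and a free socket):

```python
class FakeDeviceProxy:
    """Stand-in for tango.DeviceProxy: just records the name it was given."""

    def __init__(self, device_name):
        self.device_name = device_name


class NoDbDeviceProxy(FakeDeviceProxy):
    # Fixed values for illustration only.
    HOST = "172.17.0.3"
    PORT = 48493

    def _get_nodb_fqdn(self, device_name):
        return "tango://{0}:{1}/{2}#dbase=no".format(self.HOST, self.PORT, device_name)

    def __init__(self, device_name, *args, **kwargs):
        # Rewrite the short name into the nodb form before the base
        # class (the real proxy, in the fixture) tries to connect.
        super().__init__(self._get_nodb_fqdn(device_name), *args, **kwargs)


proxy = NoDbDeviceProxy("low/elt/subarray_1")
print(proxy.device_name)
# tango://172.17.0.3:48493/low/elt/subarray_1#dbase=no
```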
Thanks for the further updates @DrewDevereux. For now, it is a workaround, but yes, the docstring can be improved. Your mocking approach is nice. You could submit something like that as an example in this repo (see the examples folder). That would make it even clearer than we could in a docstring.
In future, I'd like it to work automatically, which is where @vxgmichel's idea comes in handy. Then you wouldn't need the mock.
Hi @ajoubertza. As suggested, I have made a pull request with a `MultiDeviceTestContext` pytest fixture and an example of its use in testing.