ChristianTremblay/BAC0

Discover and WhoIs provide inconsistent results

torablien opened this issue · 22 comments

A call to .discover() or .whois() on BAC0.lite() instances frequently leads to inconsistent results where some networks and devices are missed. This seems to be because the wait below in .whois() is not long enough.

for each in range(100):
time.sleep(1 / 1000)

Extending this timeout to 3 seconds helps, but still is sometimes insufficient depending on the network (also not sure why the for loop for that 100ms wait).

I can repro this by creating a mock BACnet IP device using BAC0.device() on another machine on the network and trying to discover it. At the default 100ms, you can see it frequently miss devices and networks by just re-running .discover() multiple times with INFO logging. A direct .read() or .write() to the device does not seem to have this issue.

i think if you BACnet network is really big and especially MSTP devices the long time.sleeps will help else networks i think can get bogged way down

I can reproduce this on my own home network (~30 connected devices) with just one mock BACnet IP device running on a Raspberry Pi.

I wonder if its a BACnet thing... at least when i was in BACnet contracting for HVAC it was like the norm not to get super consistent results in device discoveries especially when the system was being setup for the first time people had to maybe run it dozens of times to make sure all your devices are pulled into the BAS your setting up.

i play around with this sometimes:

devices = bacnet.whois(global_broadcast=True)
device_mapping = {}
addresses = []
for device in devices:
    if isinstance(device, tuple):
        device_mapping[device[1]] = device[0]
        print("Detected device %s with address %s" % (str(device[1]), str(device[0])))
print(device_mapping)
print((str(len(device_mapping)) + " devices discovered on network."))

But in the aged building i run this in it would not surprise me if i get inconsistent results every once and a while. Like its better to hard code all your known addresses Vs rely on a discover processes

I'm new to BACnet personally but I wrote a similar script and, agreed, seems pretty inconsistent. I'm trying to build something that I can connect to a network, discover all of the addresses, and automatically put them in a DB (effectively hardcoding them). Right now, it seems like I have to either run this multiple times and hope to catch everything or increase the sleep time to start getting consistent results. Any suggestions?

Not doing anything security related but I'm interested in what what you're thinking for device health? Can you share more details? I might be able to build it :)

Listen to latest error code podcast, EP 24: Securing OT Devices In The Field:
https://youtu.be/9K4WY9mF_J4?si=zqTxN1UdoI6JAOFp

This person being interviewed talks about building an application where whatever OT devices your dealing with your communicating with them in their native environment or protocol. I dont think pounding a BAS with global discoveries is the way to go, but only about 25% of it. Like it should be used on some interval, like ever hour or so where then it would be IMHO wise to build a data base of the types of "systems" that application serves.

And someone should be doing there do diligence on the building and systems in side the building... is it a hospital, is it a data center, a school, a research lab?? Is it a variable volume HVAC system or constant volume, water cooled chillers or air cooled chillers??

Where then once someone knows that they can peas meal what sensors are critical to know if the data center chiller plant is operational??

IE., run your auto global discover once and hour, find a "point" in each device which can be simple BACnet Read Request which can tell if the device is still alive,... and then format your read requests in a way that you can validate your HVAC or whatever critical system is "healthy"

IE., an outside air temp sensor is often globally shared across the BAS with supervisory level logic where often there is only 1 hardwired physical OA-T sensor which is often wired to the boiler plant directly; thats a critical sensor in the BAS. Periodically monitoring zone temperatures... as well as air handling unit discharge air sensors or fan status can validate that the equipment is running or ventilating when it should be running. or the piping temp sensors on a central plant to ensure its heating or cooling depending on the season. Its almost common sense knowledge for a big building its not much different than what would you want to monitor in your home while your away??

This looks like a performance issue. To be completely honest, I think bacpypes 3 will help with those. Maybe with the async version of BAC0 we'll see improvements.

This looks like a performance issue. To be completely honest, I think bacpypes 3 will help with those. Maybe with the async version of BAC0 we'll see improvements.

Do you have any suggestions on a workaround or mitigation until that is available?

A recent PR was put in the develop branch to reduce a sleep time in discovery. Can you try and see if this improves your use case ?

Do you mean my PR (#432) which keeps the same sleep time but makes it customizable or something else that would be better to try?

I haven't realized it was you :-)

Guess I'm still foggy from Covid....

One thing you could try with the develop branch is the async version ? You need to install bacpypes 3 and use bac0.async instead of lite.

This maybe related to the issue, my MSTP network is 76800 baud rate, and when running discover() I also get inconsistent results. When i insert in my script time.wait(30), I get better results by also running discover() 4 times. This time I get 99% consistency.
Is it possible to match the baud rate on the MSTP network by adjusting delay time of the BAC0 communication rate? Or is it not related at all?

great question... BACnet is funny like that. Its no suprise that when you do a who_is that is somewhat unpredictable. Quite often when BAS platforms are setup in the field you have to do like a Million "who is's`" where then once all your devices are learned in your often good to go where I think when the BAS platform is "built" or setup there really isnt a need to do any more "who is"... Hopefully that makes sense, I am pretty sure that a BAS when its setup doesnt do "who is" under normal day to day operational modes...

If you are using it for some interesting reasons... your results will get better and better the less often you use it especially on MSTP networks. For example using it greater than one time a minute seems like some application is using it way to much... where your better off using it like once and hour where then in between that hourly time you can just read a sensor value, perform a read request to each device directly to ensure its still communicating..., like that would be fine on sub minute intervals but I think Global Who-Is commands create lots traffic and disruptions especially if the network is already a sensitive one where device communications can be flaky.

Totally makes sense. I am writing a script which would discover all of the devices on an unknown network to build a map of devices in database, which it will use in the future to monitor devices and to issue write commands to them. So indeed, this script is supposed to be used at the beginning to build a full map of devices, I just want to make sure to build a full map first which I can be sure is a full map and not missing any devices. I thought to use discover() several times and append to discovered devices list until there are no new appends, then stop.

yeah it maybe nice to run that indefinitely but not very often... In BAS contracting typically when the platform is setup like I was saying it may take hundreds discovery processes or your lucky if its just a handful...

Totally makes sense. I am writing a script which would discover all of the devices on an unknown network to build a map of devices in database, which it will use in the future to monitor devices and to issue write commands to them. So indeed, this script is supposed to be used at the beginning to build a full map of devices, I just want to make sure to build a full map first which I can be sure is a full map and not missing any devices. I thought to use discover() several times and append to discovered devices list until there are no new appends, then stop.

This is exactly my usecase as well (though primarily for BACnet IP networks). I've done something similar that runs repeatedly until it sees no changes after N consecutive runs. Extending the sleep time using the sleep arg I added in (#432) helped significantly, even though it doesn't address the root of this issue.

I am only a beginner in python or any other programming language, excuse if it is in some way redundant or incorrect. This works for me though to populate a full device list with only two runs of this script (IP is not in it, and path for db creation is generalized):
`#!/usr/bin/env python

load libraries

import BAC0
import sqlite3
import time
from bacpypes.core import enable_sleeping

enable_sleeping()

start the bacnet comm (BAC0 and bacpypes)

bacnet = BAC0.lite(ip='xxx.xxx.xxx.xxx/22')

def find_devices():
enable_sleeping()
bacnet.discover()
time.sleep(10.0)
enable_sleeping()
bacnet.discover()
time.sleep(10.0)
enable_sleeping()
bacnet.discover()
time.sleep(10.0)
enable_sleeping()
bacnet.devices
time.sleep(10.0)
print(bacnet.devices)

writing result to sql database:

def writeSqlDB():
# connect to SQL database and setting cursor for writing:
conn = sqlite3.connect("/home/bacnet/program/program/sqliteDB.db")
cur = conn.cursor()
# creating initial table, if not already done:
cur.execute("CREATE TABLE IF NOT EXISTS bacnetNWdevices(deviceName,vendorName, nwDevice TEXT UNIQUE, deviceLong)")
for item in bacnet.devices:
cur.execute("INSERT OR IGNORE INTO bacnetNWdevices VALUES (?, ?, ?, ?)",
(
item[0],item[1],item[2],str(item[3]) #last item in bacnet.devices is tuple, converting it to string
)
)
conn.commit()

def main():
find_devices()
writeSqlDB()

if name == "main":
main()`

Don't use time.sleep

You stop the execution of code... loosing all the answers you could receive in the mean time.

Define a recurring task.

By using a Jupyter Notebook, or python REPL (ipython REPL do not work)
If you find time to test that would be nice to see if this improves discovery

Be sure to install the develop version of BAC0 and run :

import asyncio
import BAC0
bacnet = BAC0.Async(ip='yourIP/subnet') #ex. BAC0.Async(ip='192.168.211.208/24')

d = bacnet.discover()

read and write action must be awaited ex.

res = await bacnet.read("303:7 device 5007 objectList")

Thanks for following up

This issue had no activity for a long period of time. If this issue is still required, please update the status or else, it will be closed. Please note that an issue can be reopened if required.