IoTHub client stuck in endless 'Unexpected disconnect' state
MPTGits opened this issue · 4 comments
Context
- Library version used: 2.12.0
- OS and version used: Windows 10
- Python version: 3.10.9
- pip version: 22.3.1
- list of installed packages:
alabaster 0.7.12
anaconda-client 1.11.2
anaconda-navigator 2.4.0
anaconda-project 0.11.1
anyio 3.5.0
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.14.2
astropy 5.2.2
astroquery 0.4.6
asttokens 2.0.5
atomicwrites 1.4.0
attrs 22.1.0
Automat 20.2.0
autopep8 1.6.0
azure-core 1.26.4
azure-iot-device 2.12.0
azure-servicebus 7.11.4
azure-storage-blob 12.16.0
Babel 2.11.0
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.tempfile 1.0
backports.weakref 1.0.post1
bcrypt 3.2.0
beautifulsoup4 4.11.1
binaryornot 0.4.4
black 22.6.0
bleach 4.1.0
bokeh 2.4.3
boltons 23.0.0
Bottleneck 1.3.5
brotlipy 0.7.0
certifi 2022.12.7
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.0.4
cloudpickle 2.0.0
clyent 1.2.2
colorama 0.4.6
colorcet 3.0.1
comm 0.1.2
conda 23.3.1
conda-build 3.24.0
conda-content-trust 0.1.3
conda-pack 0.6.0
conda-package-handling 2.0.2
conda_package_streaming 0.7.0
conda-repo-cli 1.0.41
conda-token 0.4.0
conda-verify 3.4.2
configobj 5.0.8
constantly 15.1.0
contourpy 1.0.5
cookiecutter 1.7.3
cryptography 39.0.1
cssselect 1.1.0
cycler 0.11.0
cytoolz 0.12.0
daal4py 2023.0.2
dask 2022.7.0
datashader 0.14.4
datashape 0.5.4
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
deprecation 2.1.0
diff-match-patch 20200713
dill 0.3.6
distributed 2022.7.0
docstring-to-markdown 0.11
docutils 0.18.1
EasyProcess 1.1
entrypoint2 1.1
entrypoints 0.4
ephem 4.1.4
et-xmlfile 1.1.0
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.9.0
flake8 6.0.0
Flask 2.2.2
flit_core 3.6.0
fonttools 4.25.0
fsspec 2022.11.0
future 0.18.3
gensim 4.3.0
glob2 0.7
greenlet 2.0.1
h5py 3.7.0
HeapDict 1.0.1
holoviews 1.15.4
html5lib 1.1
huggingface-hub 0.10.1
hvplot 0.8.2
hyperlink 21.0.0
idna 3.4
imagecodecs 2021.8.26
imageio 2.26.0
imagesize 1.4.1
imbalanced-learn 0.10.1
importlib-metadata 4.11.3
incremental 21.3.0
inflection 0.5.1
iniconfig 1.1.1
intake 0.6.7
intervaltree 3.1.0
ipykernel 6.19.2
ipython 8.10.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
isodate 0.6.1
isort 5.9.3
itemadapter 0.3.0
itemloaders 1.0.4
itsdangerous 2.0.1
janus 1.0.0
jedi 0.18.1
jellyfish 0.9.0
Jinja2 3.1.2
jinja2-time 0.2.0
jmespath 0.10.0
joblib 1.1.1
jplephem 2.21
json5 0.9.6
jsonpatch 1.32
jsonpointer 2.1
jsonschema 4.17.3
jupyter 1.0.0
jupyter_client 7.3.4
jupyter-console 6.6.2
jupyter_core 5.2.0
jupyter-server 1.23.4
jupyterlab 3.5.3
jupyterlab-pygments 0.1.2
jupyterlab_server 2.19.0
jupyterlab-widgets 1.0.0
keyring 23.4.0
kiwisolver 1.4.4
lazy-object-proxy 1.6.0
libarchive-c 2.9
llvmlite 0.39.1
locket 1.0.0
lxml 4.9.1
lz4 3.1.3
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.7.0
matplotlib-inline 0.1.6
mccabe 0.7.0
menuinst 1.4.19
mistune 0.8.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
mock 4.0.3
MouseInfo 0.1.3
mpmath 1.2.1
msgpack 1.0.3
mss 9.0.1
multipledispatch 0.6.0
munkres 1.1.4
mypy-extensions 0.4.3
navigator-updater 0.3.0
nbclassic 0.5.2
nbclient 0.5.13
nbconvert 6.5.4
nbformat 5.7.0
nest-asyncio 1.5.6
networkx 2.8.4
nltk 3.7
notebook 6.5.2
notebook_shim 0.2.2
numba 0.56.4
numexpr 2.8.4
numpy 1.23.5
numpydoc 1.5.0
openpyxl 3.0.10
packaging 22.0
paho-mqtt 1.6.1
pandas 1.5.3
pandocfilters 1.5.0
panel 0.14.3
param 1.12.3
paramiko 2.8.1
parsel 1.6.0
parso 0.8.3
partd 1.2.0
pathlib 1.0.1
pathspec 0.10.3
patsy 0.5.3
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 22.3.1
pkginfo 1.9.6
platformdirs 2.5.2
plotly 5.9.0
pluggy 1.0.0
ply 3.11
pooch 1.4.0
poyo 0.5.0
prometheus-client 0.14.1
prompt-toolkit 3.0.36
Protego 0.1.16
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
PyAutoGUI 0.9.54
pycodestyle 2.10.0
pycosat 0.6.4
pycparser 2.21
pycryptodomex 3.18.0
pyct 0.5.0
pycurl 7.45.1
pydantic 1.10.7
PyDispatcher 2.0.5
pydocstyle 6.3.0
pyephem 9.99
pyerfa 2.0.0
pyflakes 3.0.1
PyGetWindow 0.0.9
pygit2 1.13.2
Pygments 2.11.2
PyHamcrest 2.0.2
PyJWT 2.4.0
pylint 2.16.2
pylint-venv 2.3.0
pyls-spyder 0.4.0
PyMsgBox 1.0.9
PyNaCl 1.5.0
pyodbc 4.0.34
pyOpenSSL 23.0.0
pyparsing 3.0.9
pyperclip 1.8.2
PyQt5 5.15.7
PyQt5-sip 12.11.0
PyQtWebEngine 5.15.4
PyRect 0.2.0
pyrsistent 0.18.0
pyscreenshot 3.1
PyScreeze 0.1.29
pyserial 3.5
pysmi 0.3.4
pysnmp 4.4.12
PySocks 1.7.1
pytest 7.1.2
python-dateutil 2.8.2
python-lsp-black 1.2.1
python-lsp-jsonrpc 1.0.0
python-lsp-server 1.7.1
python-slugify 5.0.2
python-snappy 0.6.1
pytoolconfig 1.2.5
pytweening 1.0.7
pytz 2022.7
pyviz-comms 2.0.2
pyvo 1.4.1
PyWavelets 1.4.1
pywin32 305.1
pywin32-ctypes 0.2.0
pywinpty 2.0.10
PyYAML 6.0
pyzmq 23.2.0
QDarkStyle 3.0.2
qstylizer 0.2.2
QtAwesome 1.2.2
qtconsole 5.4.0
QtPy 2.2.0
queuelib 1.5.0
regex 2022.7.9
requests 2.28.1
requests-file 1.5.1
requests-toolbelt 0.9.1
requests-unixsocket 0.3.0
rope 1.7.0
Rtree 1.0.1
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
ruamel-yaml-conda 0.17.21
scikit-image 0.19.3
scikit-learn 1.2.1
scikit-learn-intelex 20230228.214818
scipy 1.10.0
Scrapy 2.8.0
seaborn 0.12.2
Send2Trash 1.8.0
service-identity 18.1.0
setuptools 65.6.3
sgp4 2.22
sip 6.6.2
six 1.16.0
sklearn 0.0.post5
skyfield 1.48
slack-sdk 3.23.0
smart-open 5.2.1
sniffio 1.2.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soupsieve 2.3.2.post1
Sphinx 5.0.2
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
spyder 5.4.1
spyder-kernels 2.4.1
SQLAlchemy 1.4.39
stack-data 0.2.0
statsmodels 0.13.5
sympy 1.11.1
tables 3.7.0
tabulate 0.8.10
TBB 0.2
tblib 1.7.0
tenacity 8.0.1
terminado 0.17.1
text-unidecode 1.3
textdistance 4.2.1
threadpoolctl 3.2.0
three-merge 0.1.1
tifffile 2021.7.2
tinycss2 1.2.1
tldextract 3.2.0
tokenizers 0.11.4
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.1
toolz 0.12.0
torch 1.12.1
tornado 6.1
tqdm 4.64.1
traitlets 5.7.1
transformers 4.24.0
twirl 0.1.3
Twisted 22.2.0
twisted-iocpsupport 1.0.2
typing_extensions 4.4.0
ujson 5.4.0
Unidecode 1.2.0
urllib3 1.26.14
w3lib 1.21.0
watchdog 2.1.6
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
Werkzeug 2.2.2
whatthepatch 1.0.2
wheel 0.38.4
widgetsnbextension 3.5.2
win-inet-pton 1.1.0
wincertstore 0.2
wrapt 1.14.1
xarray 2022.11.0
xlwings 0.29.1
yapf 0.31.0
zict 2.1.0
zipp 3.11.0
zope.interface 5.4.0
zstandard 0.19.0
Description of the issue
When I try to open an IoTHub connection and send a message to the IoTHub device to be routed to the correct servicebus queue, I sometimes get the following error (the log that says:
This actually ends up sending hundreds of duplicates of the message to the servicebus queue. The IoTHub client is used to send messages every 1-2 minutes, and every time I send a message I follow these steps:
- I create an instance of the IoTHub client
- I send the message to the IoTHub router
- I shutdown the client
This issue seems to be happening really often now. About a week ago it was happening only from time to time, but in the last 6-7 days it has been happening very often, maybe more than a couple of times a day.
I've checked and I am sure there isn't any other device that connects with the same connection string, and I shut down each client before I create and use a new one.
I've also checked the device twin to see whether it was stuck on status 'Connected' when no IoTHub client instances existed. It was not: the status in the device twin was 'Disconnected', which is correct.
Is there any solution to this to make sure this never happens apart from migrating to using the servicebus queue directly instead of the IoTHub message routing functionality?
Code sample exhibiting the issue
from azure.iot.device import IoTHubDeviceClient, Message
import time

CONNECTION_STRING = "Your IoT Hub Device Connection String"

def run():
    for i in range(20):
        try:
            print(f"Iteration {i+1}: Creating IoTHubDeviceClient")
            client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
            client.connect()
            print("Sending message")
            message = Message(f"Test message {i+1}")
            # We use the streamType property to determine to what queue to route the message to
            message.application_properties = {
                "tornado-warning": "yes",
                "streamType": "results"
            }
            client.send_message(message)
            print(f"Message {i+1} sent")
            print("Shutting down IoTHubDeviceClient")
            client.shutdown()
        except Exception as e:
            print(f"Encountered exception: {e}")
        finally:
            time.sleep(3)

if __name__ == "__main__":
    run()
Console log of the issue
The only log indicating the issue, which starts appearing all over the application logs:
"Unexpected disconnection" occurs when the connection between the client and the IoT Hub is lost. If you're seeing an uptick in the last week, it likely has more to do with your particular network environment and IoT Hub configuration than it does with the SDK itself (2.12 was released over a year and a half ago, and has had no notable connectivity issues we know of at this time).
You may want to try upgrading to the newly released SDK 2.13 just to be safe, but the fixes we added there likely won't change what you're experiencing.
My first instinct would be to make sure that you aren't using any duplicate client IDs/connection strings, but it sounds like you've already checked and ruled that out as a possibility. Double check if you can - this is the most common reason connection failures like this happen.
Assuming it's not that, we'll need to take a look at the logs to see if there's something deeper going on - you can enable logging in your application by adding the following lines:
import logging
logging.basicConfig(level=logging.DEBUG)
That said, most likely this is an issue on the Hub/cloud side rather than the SDK, so there may not be much to see here. But if you can get us some logs, we'll take a look, and see if we can't at least get a hint of what's going on.
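As a rough sketch of the logging suggestion above, the two lines can be extended to write timestamped DEBUG records to a file, so that the records around the next disconnect are still available afterwards. The function name `enable_debug_logging` and the file name `iothub_debug.log` are illustrative assumptions, not part of the SDK; only the standard `logging` module is used.

```python
import logging


def enable_debug_logging(log_path="iothub_debug.log"):
    """Route DEBUG-level records from all loggers to a file.

    log_path is a hypothetical default; pick any writable location.
    Returns the handler so the caller can remove it again later.
    """
    handler = logging.FileHandler(log_path, encoding="utf-8")
    # Timestamps and thread names help line up reconnect attempts
    # with the moments the application sent a message.
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s [%(threadName)s] %(message)s"
        )
    )
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(handler)
    return handler
```

Calling `enable_debug_logging()` once at startup, before any client is created, should be enough. DEBUG output can be large, so it may be worth enabling only while reproducing the problem.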
Hi, we actually had to migrate to using the servicebus queue that the messages were being routed to, as a hotfix, because this was causing issues in production. Using the servicebus queue directly seems much safer for avoiding this kind of issue. Do you think there might be any cons? On the logging side, I tried to trace back through some old logs, but the only thing I found is "Cannot connect to IoT Hub". When I get this kind of bad connectivity, I retry a couple of times and then just fail the request. I suspect this might be due to a network issue.
My line of thought is that the network connectivity might go bad while we have an open IoTHub connection. If the network is still bad when we try to disconnect, we fail to disconnect properly, which leaves us with a hanging connection. On the next loop, when we open a new connection and the network is good again, these two connections start to clash. Do you think this is a possible scenario in our case?
It's hard to say without logs, but that could be the beginning of an explanation. The Hub should eventually kill the connection on its end, but there would be a delay before that happens. However, when you get a good connection and connect to the IoTHub again, that should allow the new connection to be established while booting the old connection (which was already dead), so I don't think it's a sufficient explanation (assuming you're connecting with the same client ID).
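One thing that can be ruled out on the client side: in the code sample in the issue, `client.shutdown()` sits inside the `try` block after `send_message`, so it is skipped whenever `connect()` or `send_message()` raises. The sketch below, offered as an assumption-laden illustration rather than a confirmed fix, moves the shutdown into `finally` and bounds the retries. `send_with_cleanup`, `build_iothub_message`, `attempts` and `delay_s` are hypothetical names. The client is injected through a factory so the logic can be exercised without a hub. `Message.message_id` and `Message.custom_properties` are the attributes the `azure-iot-device` `Message` class provides, to my knowledge, for an ID and for application properties.

```python
import logging
import time
import uuid

logger = logging.getLogger(__name__)


def send_with_cleanup(client_factory, message, attempts=3, delay_s=2.0):
    """Connect, send one message, and always shut the client down.

    client_factory: zero-argument callable returning an object with
        connect(), send_message(msg) and shutdown(), i.e. the
        IoTHubDeviceClient methods used in the issue's code sample.
    attempts / delay_s: hypothetical retry settings, not SDK defaults.
    Returns the 1-based number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        client = client_factory()
        try:
            client.connect()
            client.send_message(message)
            return attempt
        except Exception as exc:
            last_error = exc
            logger.warning("attempt %d failed: %s", attempt, exc)
        finally:
            # Runs on success and on failure, so a client whose
            # connect or send failed is not left running.
            try:
                client.shutdown()
            except Exception:
                logger.exception("shutdown failed on attempt %d", attempt)
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"send failed after {attempts} attempts") from last_error


def build_iothub_message(payload, stream_type):
    """Build an SDK Message with a stable ID (requires azure-iot-device)."""
    # Import deferred so the sketch loads even without the SDK installed.
    from azure.iot.device import Message

    message = Message(payload)
    # One ID per logical message: retries reuse it, so a consumer
    # can recognize repeats (an assumption about the consumer side).
    message.message_id = str(uuid.uuid4())
    message.custom_properties["streamType"] = stream_type
    return message
```

With the real SDK, the factory would be something like `lambda: IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)`. Retrying a send can itself produce a duplicate if the first attempt reached the hub before the error surfaced, which is why the message is built once and reused across attempts.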
Closed due to inactivity