[Technical Question] How to implement a manual Start/Stop mechanism on the IOT Device Client Re-Connection Sample
gagwithgaffer opened this issue · 8 comments
Hello @drwill-ms , hope all is well.
Using the latest re-connection sample for the device client, I'm struggling to figure out how to re-start the connection without it throwing exceptions. In my project, I need to be able to start and stop the connection to Azure Hub on demand, the sample is based upon a console project but my project is a web app so i provide buttons on the web page to start/stop the process.
The notes in the code sample readme say "If you want to perform more operations on the device client, you should dispose and then re-initialize the client."
I have tried this by adding another method call for closing the client within the reconnection sample class. I've then tried multiple different combinations of actions within this method, such as closing the client then disposing it. Disposing the client throws an exception, if i simply close the client connection alone, I can then start up the process again but get exceptions further down the code sample in the InitializeAndSetupClientAsync method call where it checks whether the s_deviceClient is null.
Scenario 1 - Closing the client but not disposing it.
public async Task CloseConnection() { try { await s_deviceClient.CloseAsync(CancellationToken.None); } catch (Exception ex) { var error = ex.ToString(); } }
Scenario 2 - Closing the client then disposing it at the same time.
I have less luck trying this, because I get an exception when trying to dispose the client.
I wandered whether this may be something to do with tha fact that each time I call the CloseConnection method, I'm calling this from an async Task method which is essentially running from a new thread from the task pool and is not necessarily referring to the same client instance? but the fact that the s_deviceClient is created as a static volatile in the reconnection sample class makes me think I should be referring to the same device client instance when tryng to dispose it.
Since our last teams call back in August I had noticed some significant changes to the code sample (namely being your approach on the new retry policy mechanism) which looks like you were able to remove a lot of the code used in the older sample, so was keen to adopt the newer sample but just struglling to implement a start/stop mechanism correctly, thanks
Hi @gagwithgaffer and welcome back.
With our v1 client, if you close you should dispose to avoid memory leaks. It looks like our client is throwing a Null Reference Exception, which is terrible. I'll look into that to see if I can repro.
Note: In our v2 client, still under development but getting close, one is able to close and then open the client again, reducing the need to dispose it. We think this is a terrific change for scenarios like yours. You can learn more in discussion #2874.
Considering what you are trying to do, it is going to be a little more complicated, because not only do you have the connection status callback to ensure if your connection is dropped when you don't want it to be, it reconnects your client (even if that means close/dispose/init). Independently, you want to close/dispose your client without reinitializing when the device is done doing work and should go to sleep for a while (is that right?).
I've found this code causes an NRE.
try
{
var client = DeviceClient.CreateFromConnectionString("redacted");
await client.OpenAsync().ConfigureAwait(false);
await client.CloseAsync().ConfigureAwait(false);
await client.CloseAsync().ConfigureAwait(false);
client.Dispose();
}
catch (Exception ex)
{
await Console.Out.WriteLineAsync(ex.Message);
}
at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.d__36.MoveNext()
d__0.MoveNext()
at Microsoft.Azure.Devices.Client.InternalClient.d__56.MoveNext()
at IotClientDisposeNre.Program.
@gagwithgaffer which SDK version are you using?
Hi David,
Great to hear from you!
I'm have the following pakages in my .NET Core 7 WebApp project:
NuGet Microsoft.Azure.Devices.Client 1.41.3
Microsoft.Azure.Devices.Provisioning.Client 1.19.2
Microsoft.Azure.Devices.Provisioning.Security.Tpm 1.14.2
Microsoft.Azure.Devices.Provisioning.Transport.Amqp 1.16.4
So answer to your question on the start/stop requirements, ultimately I want the user to be able to indendantly start/stop the device connection to Azure Hub when they have a web page of the device open in their local browser, but then under normal operations, the web app project simply runs the connection to azure hub as a background service with the appropriate retry mechanisms kicking in i.e. like an unattended device.
If we imagine my IOT device is more like an integrations server, installed on site behind the customer's firewall. A maintenance enginer on site may want to temporarily start/stop the devcie connection to Azure (maybe they are testing something at the time or want to stop excess messages from hitting the cloud) Then finally when they are finshed with their work and leave the site, they would re-connect the device back to Azure and simply leave it running, allowing the automated retry mechanisms to run under its own steam.
I've managed to find your code sample for the V2_005 SDK so will set to work on this now and see what I can come up with, the V2 looks better suited for my environment as it's closer to the setup you advised me back in August i.e. creating callback handlers for message receive / twin updates etc.
I've done some fiddling with the V1 sample and have dervied a way to start/stop the connection on demand without throwing errors (not fully proven under all network error scenarios, but so far seems OK)
One of the annomolies discovered is when disposing the device client instance without setting it to null before starting the reconnection sample back up again, cuases an exception in the IF Statement, found in the InitializeAndSetupClientAsync() method. Disposing the client instance doesnt automatically set it to null and therefore the IF statement always returns true:
Below are the changes I made in order to get the sample to start/stop independantly from normal operations:
Only changes in this method was more for being able to use the TPM instead of connection strings:
[Update using ver.2-005 Preview SDK with latest Reconnection Sample]
I believe I've got the start/stop mechanism working OK with ver.2 SDK using the latest reconnection code sample.
The adjustments to the sample for ver.2 required a few changes but so far I can achieve the following test routines:
- Manually Start/Stop the device client connection. (multiple times over)
- Manually Start the connection, pull the network lead, retries kick in automatically, then restore the network, messages start sending again. Then maunally Stop the connection.
- Manually Start the connection, pull the network lead, retries kick in automatically, then I chose to abort the connection whilst the network is down (aborting the connection is essentially stopping it manually, but whilst it's running the retry mechanism) Once I restore the network, I manually Start the connection again, messages start sending again.
The new ConnectionStatus / ConnectionStatusChangeReason / RecommendedAction found in the latest code sample was very useful for testing to see what flags are raised under the various different conditions. I did have to modify ConnectionStatusChangeHandlerAsync method a little bit with some trial and error before I could perform the above test routines without throwing any unexpected/unhandled exceptions. Similiar to ver.1 SDK, I do have to set the s_deviceClient to null when manually stopping the connection, otherwise it borks the next time with some exceptions.
The only observation I'm not entirely sure on yet is a short while after manually stopping the connection, console messages are produced in the retry policy when it realises the semaphore has been disposed. I'm hoping this isnt an issue since the retry policy is purposefully catching this and nothing else seems to flag as an issue:
When stopping the connection, I immediately log a custom message: Web App IOT SERVER with id [1277d19b-aea1-4746-983c-53c45c83bd89] and name [IOT Server 1] has been closed gracefully.
Then shortly afterwards the retry policy will log the following:
Web App IOT SERVER with id [1277d19b-aea1-4746-983c-53c45c83bd89] and name [IOT Server 1] retry requested 1 but failed criteria for retry [with System.ObjectDisposedException:The semaphore has been disposed.], so giving up.
This above sequence has not proven any issues for me in how I wanted the connection to operate, its either connected to the hub or its not, depending on the network or whether the user has decided it to run or not.
I've put the whole reconnection sample into a transient C# interface, then call the Start/Stop methods from the .NET controller via buttons on a web page. Within the ConnectionStatusChangeHandlerAsync method I'm then calling a SignalR Hub in order to pass the status updates back to the web page for live monitoring.
The key part missing for me currently was the discovery that TPM device authentication doesn't seem to be supported yet on the curent preview SDK. hopefully this comes soon :)
Below is the code changes I made to the sample in order to get it to work with the current behaviour:
public async Task CloseDeviceConnection()
{
try
{
// Cancelling the cancellation token stops all processes responsible for
// connection to the Azure IOT Hub. The Device Client is also closed from
// this cancellation token, see method InitializeAndSetupClientAsync()
s_appCancellation!.Cancel();
if (s_deviceClient != null)
{
// You should dispose to avoid memory leaks
await s_deviceClient.DisposeAsync();
// If we dont set the client to null, exceptions
// are thrown when re-starting the device client.
// See conditional IF Statement inside method
// InitializeAndSetupClientAsync()
s_deviceClient = null!;
}
}
catch (Exception ex)
{
LogData = _ohiAppSource + " with id " + LogEventProperties.DeviceId + " and name " + LogEventProperties.DeviceName + " failed to close the device client connection to the Azure IOT Hub. Error returned with message [" + ex.Message + "]";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Error", "IOT Servers", "Disconnect from Azure Hub", LogEventProperties.DeviceId!, LogData, ex);
throw; // throw the error back to the client browser so the user can see the requuest to close the connection failed.
}
}
public async Task DeviceConnectionStartup()
{
s_appCancellation = new CancellationTokenSource();
try
{
await InitializeAndSetupClientAsync(s_appCancellation.Token);
await SendMessagesAsync(s_appCancellation.Token);
}
catch (OperationCanceledException) { } // User canceled the operation
catch (Exception ex)
{
LogData = _ohiAppSource + " with id " + LogEventProperties.DeviceId + " and name " + LogEventProperties.DeviceName + " encountered an unrecoverable exception, user action is required, so exiting: \n{ex}";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Error", "IOT Servers", "Azure Hub Connection Error", LogEventProperties.DeviceId!, LogData, ex);
s_appCancellation.Cancel();
}
if (s_deviceClient != null)
{
await s_deviceClient!.DisposeAsync();
}
// Do not use, original code sample but throws exceptions
// when manually re-starting the device client connection.
//s_initSemaphore.Dispose();
//s_appCancellation.Dispose();
}
// It is not generally a good practice to have async void methods, however, IotHubDeviceClient.ConnectionStatusChangeHandlerAsync() event handler signature
// has a void return type. As a result, any operation within this block will be executed unmonitored on another thread.
// To prevent multi-threaded synchronization issues, the async method InitializeClientAsync being called in here first grabs a lock before attempting to
// initialize or close the device client instance; the async method GetTwinAndDetectChangesAsync is implemented similarly for the same purpose.
private async void ConnectionStatusChangeHandlerAsync(ConnectionStatusInfo connectionInfo)
{
ConnectionStatus status = connectionInfo.Status;
ConnectionStatusChangeReason reason = connectionInfo.ChangeReason;
Console.WriteLine();
Console.WriteLine($"Connection status changed: status={status}, reason={reason}, recommendation={connectionInfo.RecommendedAction}");
var signalRMessage = new IotDeviceConnectionStatus
{
Id = LogEventProperties.DeviceId,
Name = LogEventProperties.DeviceName,
ConnectionStatus = status.ToString(),
ChangeReason = reason.ToString()
};
IndexModel.AzureIotHubConnectionStatus = status.ToString();
IndexModel.AzureIotHubConnectionStatusChangeReason = reason.ToString();
// Custom code.
if (status == ConnectionStatus.Connected
& reason == ConnectionStatusChangeReason.ConnectionOk)
{
// ----- SignalR & UI Updates -----
await _iotDeviceStatusHubContext.Clients.All.SendAsync("newMessage", signalRMessage);
// ----- Logging -----
LogData = _ohiAppSource + " with id [" + LogEventProperties.DeviceId + "] and name [" + LogEventProperties.DeviceName + "] changed its connection status to [CONNECTED]; all operations will be carried out as normal.";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Information", "IOT Servers", "Connection Status Changed", "Hub Connection Connected", LogData, null!);
// -------------------
}
// In our case, we can operate with more than 1 shared access key and attempt to fall back to a secondary.
// We'll disregard the SDK's recommendation and attempt to connect with the second one.
if (status == ConnectionStatus.Disconnected
&& reason == ConnectionStatusChangeReason.BadCredential
&& _deviceConnectionStrings!.Count > 1)
{
// When getting this reason, the current connection string being used is not valid.
// If we had a backup, we can try using that.
_deviceConnectionStrings.RemoveAt(0);
Console.WriteLine($"The current connection string is invalid. Trying another.");
await InitializeAndSetupClientAsync(s_appCancellation!.Token);
return;
}
// Otherwise, we follow the SDK's recommendation.
switch (connectionInfo.RecommendedAction)
{
case RecommendedAction.OpenConnection:
// We will only attempt to reconnect if the
// cancellation token has not been cancelled.
// If the token has been called it means the user
// has purposely chosen to abort the connection process
if (!s_appCancellation!.IsCancellationRequested)
{
Console.WriteLine($"Following recommended action of reinitializing the client.");
await InitializeAndSetupClientAsync(s_appCancellation!.Token);
}
else // Token is cancelled, forcefully close the connection.
{
if (s_deviceClient != null)
{
// You should dispose to avoid memory leaks
await s_deviceClient.DisposeAsync();
// If we dont set the client to null, exceptions
// are thrown when re-starting the device client.
// See conditional IF Statement inside method
// InitializeAndSetupClientAsync()
s_deviceClient = null!;
}
// ----- SignalR & UI Updates -----
await _iotDeviceStatusHubContext.Clients.All.SendAsync("newMessage", signalRMessage);
// ----- Logging -----
LogData = _ohiAppSource + " with id [" + LogEventProperties.DeviceId + "] and name [" + LogEventProperties.DeviceName + "] has aborted retrying the connection. The connection was aborted by the user.";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Warning", "IOT Servers", "Connection Status Changed", "Hub Connection Aborted", LogData, null!);
// -------------------
}
break;
case RecommendedAction.PerformNormally:
// Call GetTwinAndDetectChangesAsync() to retrieve twin values from the server once the connection status changes into Connected.
// This can get back "lost" twin updates in a device reconnection from status like Disconnected_Retrying or Disconnected.
await GetTwinAndDetectChangesAsync();
Console.WriteLine("The client has retrieved twin values after the connection status changes into CONNECTED.");
break;
case RecommendedAction.WaitForRetryPolicy:
// ----- SignalR & UI Updates -----
await _iotDeviceStatusHubContext.Clients.All.SendAsync("newMessage", signalRMessage);
// ----- Logging -----
LogData = _ohiAppSource + " with id [" + LogEventProperties.DeviceId + "] and name [" + LogEventProperties.DeviceName + "] has lost its connection with the Azure IOT Hub and is retying.";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Warning", "IOT Servers", "Connection Status Changed", "Hub Connection Retrying", LogData, null!);
// -------------------
break;
case RecommendedAction.Quit:
if (status == ConnectionStatus.Closed
& reason == ConnectionStatusChangeReason.ClientClosed)
{
// ----- SignalR & UI Updates -----
await _iotDeviceStatusHubContext.Clients.All.SendAsync("newMessage", signalRMessage);
// ----- Logging -----
LogData = _ohiAppSource + " with id [" + LogEventProperties.DeviceId + "] and name [" + LogEventProperties.DeviceName + "] has been closed gracefully.";
Console.WriteLine();
Console.WriteLine(LogData);
await _logExtensionRepo.WriteLogEvent<DeviceReconnectionRepo>("Warning", "IOT Servers", "Connection Status Changed", "Hub Connection Closed", LogData, null!);
// -------------------
}
s_appCancellation!.Cancel();
break;
}
}
@gagwithgaffer we appreciate your passion and enthusiasm, as well as feedback about the samples and v2 interface changes.
As for this comment about TPM:
The key part missing for me currently was the discovery that TPM device authentication doesn't seem to be supported yet on the curent preview SDK. hopefully this comes soon :)
We don't foresee adding TPM authentication back. There are security concerns on some platforms given the way in which the TPM is used that make it potentially worse than SAS auth. Instead, it is recommended first to use certificates and if not then connection strings.
Oh gosh, no TPM :( that turns my world upside down. erm OK. Challenge with certs is they expire and are cumbersome for any end user to use, most customers using IOT products propably wont even understand what a certificate is, let alone how to use one.
Well I guess i have a new challegne to think about, I'll spend some time researching it all. Thanks
I'm sorry to disappoint you. We don't love losing TPM but we also want customers to be secure.
If we replaced this it would likely be with certificates stored in a TPM, so still certificates in the end.