pothosware/SoapySDRPlay3

SDRPlay 3.08 ABI breakage

gvanem opened this issue · 12 comments

The latest API from SDRPlay (released 5 days) has a change that causes a command like SoapySDRUtil.exe --find to crash
deep inside SoapySDR::Device::enumerate()'s destructor. The call-stack is huge.

These are diffs for sdrplay_api.h:

--- SDRplay-1.4.2/API/inc/sdrplay_api.h       2019-12-18 10:59:18
+++ SDRplay-1.41-beta2/API/inc/sdrplay_api.h  2021-08-21 10:51:16
@@ -21,7 +21,7 @@

 // Application code should check that it is compiled against the same API version
 // sdrplay_api_ApiVersion() returns the API version
-#define SDRPLAY_API_VERSION                   (float)(3.07)
+#define SDRPLAY_API_VERSION                   (float)(3.08)

 // API Constants
 #define SDRPLAY_MAX_DEVICES                   (16)
@@ -145,6 +145,7 @@
     unsigned char hwVer;
     sdrplay_api_TunerSelectT tuner;
     sdrplay_api_RspDuoModeT rspDuoMode;
+    unsigned char valid;
     double rspDuoSampleFreq;
     HANDLE dev;
 } sdrplay_api_DeviceT;
@@ -186,7 +187,7 @@
 typedef sdrplay_api_ErrT        (*sdrplay_api_Uninit_t)(HANDLE dev);
 typedef sdrplay_api_ErrT        (*sdrplay_api_Update_t)(HANDLE dev, sdrplay_api_TunerSelectT tuner, sdrplay_api_ReasonForUpdateT reasonForUpdate, sdrplay_api_ReasonForUpdateExtension1T reasonForUpdateExt1);
 typedef sdrplay_api_ErrT        (*sdrplay_api_SwapRspDuoActiveTuner_t)(HANDLE dev, sdrplay_api_TunerSelectT *tuner, sdrplay_api_RspDuo_AmPortSelectT tuner1AmPortSel);
-typedef sdrplay_api_ErrT        (*sdrplay_api_SwapRspDuoDualTunerModeSampleRate_t)(double *currentSampleRate);
+typedef sdrplay_api_ErrT        (*sdrplay_api_SwapRspDuoDualTunerModeSampleRate_t)(double *currentSampleRate, double newSampleRate);
 typedef sdrplay_api_ErrT        (*sdrplay_api_SwapRspDuoMode_t)(sdrplay_api_DeviceT *currDevice, sdrplay_api_DeviceParamsT **deviceParams,
                                                                 sdrplay_api_RspDuoModeT rspDuoMode, double sampleRate, sdrplay_api_TunerSelectT tuner,
                                                                 sdrplay_api_Bw_MHzT bwType, sdrplay_api_If_kHzT ifType, sdrplay_api_RspDuo_AmPortSelectT tuner1AmPortSel);
@@ -217,7 +218,7 @@
     _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_Uninit(HANDLE dev);
     _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_Update(HANDLE dev, sdrplay_api_TunerSelectT tuner, sdrplay_api_ReasonForUpdateT reasonForUpdate, sdrplay_api_ReasonForUpdateExtension1T reasonForUpdateExt1);
     _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_SwapRspDuoActiveTuner(HANDLE dev, sdrplay_api_TunerSelectT *currentTuner, sdrplay_api_RspDuo_AmPortSelectT tuner1AmPortSel);
-    _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_SwapRspDuoDualTunerModeSampleRate(HANDLE dev, double *currentSampleRate);
+    _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_SwapRspDuoDualTunerModeSampleRate(HANDLE dev, double *currentSampleRate, double newSampleRate);
     _SDRPLAY_DLL_QUALIFIER sdrplay_api_ErrT        sdrplay_api_SwapRspDuoMode(sdrplay_api_DeviceT *currDevice, sdrplay_api_DeviceParamsT **deviceParams,
                                                                               sdrplay_api_RspDuoModeT rspDuoMode, double sampleRate, sdrplay_api_TunerSelectT tuner,
                                                                               sdrplay_api_Bw_MHzT bwType, sdrplay_api_If_kHzT ifType, sdrplay_api_RspDuo_AmPortSelectT tuner1AmPortSel);

The sdrplay_api_DeviceT structure is now 1 byte longer (unsigned char valid).

Perhaps the SoapySDRPlayX code could check it's size at runtime to account for this breakage?

@gvanem - thanks for reporting the problem; since my main development platform is Linux, I haven't had the chance to test the new version of the API (I just checked the downloads section on SDRplay's web site, and for Linux they still have version 3.07).

Anyway, I think that your scenario should have at least triggered this warning (https://github.com/pothosware/SoapySDRPlay3/blob/master/sdrplay_api.cpp#L49):

SoapySDR_logf(SOAPY_SDR_WARNING, "sdrplay_api version: '%.3f' does not equal build version: '%.3f'", ver, SDRPLAY_API_VERSION);

If you don't mind please check your logs to see if that warning is there before SoapySDRUtil crashes; if it is, I think I should change it from a simple warning to a fatal error, to avoid running with an API that doesn't match the version that was used at compile time.

Unfortunately I don't think that checking for the size of that structure at runtime is possible in C/C++, since the sizeof operator is evaluated at compile time (I think), and therefore a check on the size of that structure would always return true; another problem that I can see is if the API gets changed in a way that maintains the size of the structure (for instance, imagine another version of the API where the new field valid is placed after the field rspDuoSampleFreq, instead of before); in this case it would be harder to detect and address the problem.

If you do find that warning above in your logs, please let me know, and tonight after work I'll make the change to the SoapySDRPlay3 driver to handle that as a fatal error instead.

Franco

If you don't mind please check your logs to see if that warning is there before SoapySDRUtil crashes

So where does Soapy write log-files? The only thing I have from that old build is:

ModLoad: 64a60000 64a8b000   F:\gv\dx-radio\SoapySDR\lib\SoapySDR\modules0.8\SdrPlay3Support.dll
ModLoad: 64a50000 64a5d000   f:\gv\VC_2019\bin\sdrplay_api.dll
ModLoad: 64390000 646b2000   F:\gv\dx-radio\SoapySDR\lib\SoapySDR\modules0.8\UHDSupport.dll
ModLoad: 648e0000 64a49000   F:\gv\dx-radio\SoapySDR\lib\SoapySDR\modules0.8\volk.dll
ModLoad: 648d0000 648dc000   F:\gv\dx-radio\SoapySDR\lib\SoapySDR\modules0.8\VolkConverters.dll
sdrplay_api_Opensdrplay_api: sdrplay_api_Open: ERROR: Could not open file mapping object[ERROR]
sdrplay_api_Open() Error: sdrplay_api_Fail
[ERROR] Please check the sdrplay_api service to make sure it is up. If it is up, please restart it.
(36c8.2d28): C++ EH exception - code e06d7363 (first chance)
(36c8.2d28): C++ EH exception - code e06d7363 (first chance)
[ERROR] SoapySDR::Device::enumerate(sdrplay) sdrplay_api_Open() failed
(36c8.2d28): C++ EH exception - code e06d7363 (first chance)
(36c8.2d28): C++ EH exception - code e06d7363 (first chance)
Error making device: no available RSP devices found
SoapySDR::unloadModules(): unloadModules()
ModLoad: 74980000 7498f000   C:\WINDOWS\SysWOW64\kernel.appcore.dll
(36c8.2d28): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=19930520 ebx=1d035693 ecx=00d59fa8 edx=64a833bc esi=65259510 edi=00d59f48
eip=65259575 esp=008ff948 ebp=008ff95c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
MSVCP140!`anonymous namespace'::_ExceptionPtr_normal::_Destroy+0x65:
65259575 837a0400        cmp     dword ptr [edx+4],0  ds:002b:64a833c0=????????

The above is in the destructor. I think maybe the reason for the crash was that the SDRPlay_APIservice was not started.
Hard to tell.

@gvanem - I see the message you posted shows that the error occurs in sdrplay_api_Open(), so I think it is in the constructor; as you suggested, it is probably due to the fact that the service sdrplay_api_service is not running for some reason.

The errors/warnings should be displayed on the console/terminal when you run SoapySDRUtil, but they require that the service is up first.

Franco

As I mentioned this morning, I just created a new branch called API-version-mismatch (https://github.com/pothosware/SoapySDRPlay3/tree/API-version-mismatch).
That only code change in that branch is that an API version mismatch between the API version at compile time and the actual API version at runtime is flagged as a fatal error.

Please give it a try when you have sometime and, if it solves your problem, I'll push this simple change into the master branch.

Franco

Please give it a try when you have sometime and

Tried it. I did this:

  • Copied the old sdrplay_api.dll v 3.07 to my PATH (the new sdrplay_api.dll v 3.08 not on PATH).
  • Deleted SdrPlay2Support.dll from my SOAPY_SDR_PLUGIN_PATH thinking that it's interfering.
  • Ran cdb -c g SoapySDRUtil.exe --find (MS console debugger). Still is a crash at the same place:
...
Found Rafael Micro R820T tuner
[ERROR] ApiVersion Error: sdrplay_api_ServiceNotResponding    << !! before your patch
sdrplay_api_Close(cd8.298c): C++ EH exception - code e06d7363 (first chance)
(cd8.2888): C++ EH exception - code e06d7363 (first chance)
[ERROR] SoapySDR::Device::enumerate(sdrplay) ApiVersion() failed
Found device 0
  driver = netsdr
  label = RFspace NetSDR SN 31351F39
  name = AFEDRI-SDR
  netsdr = 10.0.0.50:50000
  serial = 31351F39

Found device 1
  driver = rtlsdr
  label = Generic RTL2832U OEM :: 00000001
  manufacturer = Silver
  product = RTL2838UHIDIR
  serial = 00000001
  tuner = Rafael Micro R820T
SoapySDR::unloadModules(): unloadModules()
(cd8.2888): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=19930520 ebx=3003c06b ecx=0e68b0d8 edx=645033fc esi=64d49510 edi=0e68b078
eip=64d49575 esp=012ff310 ebp=012ff324 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
MSVCP140!`anonymous namespace'::_ExceptionPtr_normal::_Destroy+0x65:
64d49575 837a0400        cmp     dword ptr [edx+4],0  ds:002b:64503400=????????
64d49575 837a0400        cmp     dword ptr [edx+4],0  ds:002b:64503400=????????
0:000> kp
ChildEBP RetAddr
012ff324 64d4948d MSVCP140!`anonymous namespace'::_ExceptionPtr_normal::_Destroy(void)+0x65
012ff330 64d490a9 MSVCP140!std::_Ref_count_base::_Decref(void)+0x1e
(Inline) -------- MSVCP140!std::_Ptr_base<_EXCEPTION_RECORD const >::_Decref+0xf
(Inline) -------- MSVCP140!std::shared_ptr<_EXCEPTION_RECORD const >::{dtor}+0xf
012ff33c 64d4985d MSVCP140!std::shared_ptr<_EXCEPTION_RECORD const >::`scalar deleting destructor'(void)+0x12
*** WARNING: Unable to verify checksum for F:\gv\dx-radio\SoapySDR\bin\SoapySDR.dll
012ff348 64de1120 MSVCP140!__ExceptionPtrDestroy(void * _Ptr = 0x0d0e77e4)+0xd
012ff358 64ddf5f6 SoapySDR!std::exception_ptr::~exception_ptr(void)+0x10

Seems not related to the API mismatch. But this:

static SoapySDR::Registry registerSDRPlay("sdrplay", &findSDRPlay, &makeSDRPlay, SOAPY_SDR_ABI_VERSION);

And yes, the service (sdrplay_apiService.exe) is running contrary to the code sdrplay_api_ServiceNotResponding above.

I'll try to build an unoptimized _DEBUG version of everything.

PS. here is the complete edited call-stack: crash.txt
Some issue with cache; in SoapySDR::Device::enumerate()?

@gvanem - thanks for giving it a try.

This evening I booted my PC into Windows 10 (I have a dual boot PC that most of the time I use under Linux), ran all the latest Window updates, and then ran the scenario that I think you have there:

  • first I installed the old SDRplay API and HW driver (version 3.07)
  • under that version I compiled and installed SoapySDR and the module SoapySDRPlay3 (using the API-version-mismatch branch), and ran the following two commands:

Microsoft Windows [Version 10.0.19043.1165]
(c) Microsoft Corporation. All rights reserved.

C:\Users\franco>SoapySDRUtil.exe --info
######################################################

Soapy SDR -- the SDR abstraction library

######################################################

Lib Version: v0.8.1-g1cf5a539
API Version: v0.8.0
ABI Version: v0.8
Install root: C:\Program Files\SoapySDR
Search path: C:\Program Files\SoapySDR/lib/SoapySDR/modules0.8
Module found: C:\Program Files\SoapySDR/lib/SoapySDR/modules0.8/sdrPlaySupport.dll (0.3.0)
Available factories... sdrplay
Available converters...

  • CF32 -> [CF32, CS16, CS8, CU16, CU8]
  • CS16 -> [CF32, CS16, CS8, CU16, CU8]
  • CS32 -> [CS32]
  • CS8 -> [CF32, CS16, CS8, CU16, CU8]
  • CU16 -> [CF32, CS16, CS8]
  • CU8 -> [CF32, CS16, CS8]
  • F32 -> [F32, S16, S8, U16, U8]
  • S16 -> [F32, S16, S8, U16, U8]
  • S32 -> [S32]
  • S8 -> [F32, S16, S8, U16, U8]
  • U16 -> [F32, S16, S8]
  • U8 -> [F32, S16, S8]

C:\Users\franco>SoapySDRUtil.exe --probe
######################################################

Soapy SDR -- the SDR abstraction library

######################################################

Probe device
[INFO] devIdx: 0
[INFO] hwVer: 4
[INFO] rspDuoMode: 0
[INFO] tuner: 1
[INFO] rspDuoSampleFreq: 0.000000


-- Device identification

driver=SDRplay
hardware=RSPdx
sdrplay_api_api_version=3.070000
sdrplay_api_hw_version=4


-- Peripheral summary

Channels: 1 Rx, 0 Tx
Timestamps: NO
Other Settings:
* RF Gain Select - RF Gain Select
[key=rfgain_sel, default=4, type=string, options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)]
* IQ Correction - IQ Correction Control
[key=iqcorr_ctrl, default=true, type=bool]
* AGC Setpoint - AGC Setpoint (dBfs)
[key=agc_setpoint, default=-30, type=int, range=[-60, 0]]
* BiasT Enable - BiasT Control
[key=biasT_ctrl, default=true, type=bool]
* RfNotch Enable - RF Notch Filter Control
[key=rfnotch_ctrl, default=true, type=bool]
* DabNotch Enable - DAB Notch Filter Control
[key=dabnotch_ctrl, default=true, type=bool]


-- RX Channel 0

Full-duplex: NO
Supports AGC: YES
Stream formats: CS16, CF32
Native format: CS16 [full-scale=32767]
Antennas: Antenna A, Antenna B, Antenna C
Corrections: DC removal
Full gain range: [0, 66] dB
IFGR gain range: [20, 59] dB
RFGR gain range: [0, 27] dB
Full freq range: [0.001, 2000] MHz
RF freq range: [0.001, 2000] MHz
CORR freq range: MHz
Sample rates: 0.0625, 0.096, 0.125, 0.192, 0.25, ..., 6, 7, 8, 9, 10 MSps
Filter bandwidths: 0.2, 0.3, 0.6, 1.536, 5, 6, 7, 8 MHz

As you can see, no problems here.

I then downloaded the latest API for Windows from SDRplay (version 3.08), installed it on top of the previous one and ran SoapuSDRUtil.exe --probe again (without recompiling or rebuilding anything), and this time I received the error message about the mismatching API version:

C:\Users\franco>SoapySDRUtil.exe --probe
######################################################

Soapy SDR -- the SDR abstraction library

######################################################

Probe device
[ERROR] sdrplay_api version: '3.080' does not equal build version: '3.070'
[ERROR] Please rebuild the SoapySDRPlay module using SDRplay API version 3.080.
[ERROR] SoapySDR::Device::enumerate(sdrplay) API version mismatch
Error probing device: no available RSP devices found

This is exactly the behavior I would expect to see in the scenario that you have there, where the version of the SDRplay API at compile time does not match the version of the SDRplay API at run time.

Franco

This is exactly the behavior I would expect to see in the scenario that you have there

Yes, yes. I've no problem with that code. But here are my steps once more:

  • Copied the v3.07 sdrplay_api.dll to my PATH.
  • Ran cdb -c g SoapySDRUtil.exe --probe, which reports the ABI error okay:
  • [ERROR] sdrplay_api version: '3.070' does not equal build version: '3.080'
  • But then the crash at exit; inside Factory.cpp.
  • CDB's call-stack clearly mentions:
    SoapySDR!`SoapySDR::Device::enumerate::`2::`dynamic atexit destructor for 'cache''(void)

But the crash seems to occur in the static destructor of cache. I'm not so into the C++/STL code.
So the author of this ought to come forward and explain. @guruofquality Josh, are there?

I think maybe the core issue is that the sdrplay_api instance is a singleton.
And this cache and the order of destructions are undefined when a throw std::runtime_error() is done.
But again, I'm not so into the C++/STL code.

Edit: The old 3.7 sdrplay_api.dll uses MSVCR90.dll. The new 3.08 uses VCRUNTIME140.dll. Related?
And replacing those throw calls with a trace + return, I got no crashes?!

@gvanem - thanks for the detailed analysis of this problem.

Based on your conclusions, I think you should be able to observe the same behavior even when the sdrplay_api_service is not running (even in the case when the version of the SDRplay API at compile time matches the runtime version of the SDRplay API), since in that case the SoapySDRPlay module will throw a similar runtime error (see here: https://github.com/pothosware/SoapySDRPlay3/blob/master/sdrplay_api.cpp#L38).

As per using an exception to signal a fatal error in the SoapySDRPlay driver, we had a discussion about it in the past - see this comment: #16 (comment) and related comments in the same thread.

Finally I am not sure what exactly we are trying to "fix" here: since having a different version of the SDRplay API at runtime is already a fatal unrecoverable error, which requires rebuilding the SoapySDRPlay3 module, once the module throws a runtime error, and the client program that invokes it exits/aborts, there's not much else that can be done IMHO.

Franco

Finally I am not sure what exactly we are trying to "fix" here:

Fix the horrid C++ code to avoid a crash?
Here is the patch I used to fix this:

--- a/SoapySDRPlay.hpp 2021-06-30 14:13:02
+++ b/SoapySDRPlay.hpp 2021-08-28 15:31:42
@@ -356,6 +356,7 @@

     private:
         static float ver;
+        static bool valid;
         sdrplay_api();

     public:
--- a/sdrplay_api.cpp 2020-12-31 17:47:40
+++ b/sdrplay_api.cpp 2021-08-28 15:31:07
@@ -25,36 +25,43 @@
 #include "SoapySDRPlay.hpp"

 float SoapySDRPlay::sdrplay_api::ver = 0.0;
+bool  SoapySDRPlay::sdrplay_api::valid = false;

 // Singleton class for SDRplay API (only one per process)
 SoapySDRPlay::sdrplay_api::sdrplay_api()
 {
     sdrplay_api_ErrT err;
+
     // Open API
     err = sdrplay_api_Open();
     if (err != sdrplay_api_Success) {
         SoapySDR_logf(SOAPY_SDR_ERROR, "sdrplay_api_Open() Error: %s", sdrplay_api_GetErrorString(err));
         SoapySDR_logf(SOAPY_SDR_ERROR, "Please check the sdrplay_api service to make sure it is up. If it is up, please restart it.");
-        throw std::runtime_error("sdrplay_api_Open() failed");
+        return;
     }

     // Check API versions match
     err = sdrplay_api_ApiVersion(&ver);
     if (err != sdrplay_api_Success) {
-        SoapySDR_logf(SOAPY_SDR_ERROR, "ApiVersion Error: %s", sdrplay_api_GetErrorString(err));
+        SoapySDR_logf(SOAPY_SDR_ERROR, "sdrplay_api_ApiVersion() Error: %s", sdrplay_api_GetErrorString(err));
         sdrplay_api_Close();
-        throw std::runtime_error("ApiVersion() failed");
+        return;
     }
     if (ver != SDRPLAY_API_VERSION) {
-        SoapySDR_logf(SOAPY_SDR_WARNING, "sdrplay_api version: '%.3f' does not equal build version: '%.3f'", ver, SDRPLAY_API_VERSION);
+        SoapySDR_logf(SOAPY_SDR_ERROR, "sdrplay_api version: '%.3f' does not equal build version: '%.3f'", ver, SDRPLAY_API_VERSION);
+        sdrplay_api_Close();
+        return;
     }
+    valid = true;
 }

 SoapySDRPlay::sdrplay_api::~sdrplay_api()
 {
-    sdrplay_api_ErrT err;
+    if (!valid)
+       return;
+
     // Close API
-    err = sdrplay_api_Close();
+    sdrplay_api_ErrT err = sdrplay_api_Close();
     if (err != sdrplay_api_Success) {
         SoapySDR_logf(SOAPY_SDR_ERROR, "sdrplay_api_Close() failed: %s", sdrplay_api_GetErrorString(err));
     }

Works for me. Josh needs to comment on what's the correct thing to do.

@gvanem - first of all I apologize that my comment earlier about the problem to fix came out too strong.

I am afraid that your proposed changes would create more problems than they are trying to fix.

Imagine the scenario where the service sdrplay_api_service is down (no API version difference), and therefore the API call sdrplay_api_Open() fails: in that case your code for the constructor would just return without throwing any error, and therefore the rest of the SoapySDRPlay3 module would believe that the constructor succeeded, and try to call the many other SDRplay API functions with the consequences you can imagine.

For instance, take a look at this piece of code from the source file Registration.cpp (lines 45-48):

   SoapySDRPlay::sdrplay_api::get_instance();
   sdrplay_api_LockDeviceApi();
   sdrplay_api_DeviceT rspDevs[SDRPLAY_MAX_DEVICES];
   sdrplay_api_GetDevices(&rspDevs[0], &nDevs, SDRPLAY_MAX_DEVICES);

if the call to sdrplay_api_Open() were to fail for some reason, but the constructor (and therefore the get_instance() method) were to return, then the code would try to call the API functions sdrplay_api_LockDeviceApi() first and then sdrplay_api_GetDevices(), and so on, which would all fail.

Also you may want to look at this topic in the C++ FAQ about the recommended way to handle constructors that fail: https://isocpp.org/wiki/faq/exceptions#ctors-can-throw

Franco

Okay so my fix is wrong. Then some other C++ expert could propose something better.