Use of uninitialized variables after commit 099256f
johanhedin opened this issue · 8 comments
The 099256f commit cause use of uninitialized variables in some cases and introduce potential thread synchronization issues.
Explanation:
A call to airspy_close()
will call airspy_stop_rx()
. airspy_stop_rx()
calls kill_io_threads()
and the code in kill_io_threads()
will join the two internal worker threads. The problem is that the threads are only created by a call to airspy_start_rx()
. So a use case like this:
airspy_open_sn()
airspy_version_string_read()
airspy_close()
(i.e. using the lib without start streaming) will cause the use of uninitialized variables here:
airspyone_host/libairspy/src/airspy.c
Lines 532 to 533 in d309c71
On for example a Raspberry Pi 4 with latest Raspberry Pi OS (Bookworm) the calls to pthread_join()
with uninitialized thread handles does not cause anything to happen but on for example a x86 machine with Fedora 39 pthread_join()
will segfault the program.
Before 099256f device->streaming
guarded against this but the changes by the commit alters the behavior.
I also have a strong feeling (the closest to "knowing" when it comes to thread synchronization...) that the change from device->stop_requested = true
to device->streaming = false
in the worker threads can/will cause other race like problems.
The old code had a clean separation between what thread set what variable. device->stop_requested
was only set from inside the worker threads to indicate a USB problem or stop by request from callback and device->streaming
was only set from the thread calling the high level api (airspy_{open,start_rx,stop_rx...}) to intentionally start/stop streaming.
I tried to come up with a simple pull request to fix this but it felt like going down several threaded rabbit holes...
I don't know what parts of the commit that fixed the Windows libpthread issue and what parts that cleaned up the streaming logic (or if they are the same), but I would suggest going back to the old way regarding device->stop_requested/device->streaming
to fix this if possible.
Did you read the actual code and go through the "rabbit hole" or do you just "think" it should be "better" this way?
More precisely, did you read this line?
https://github.com/airspy/airspyone_host/blob/master/libairspy/src/airspy.c#L832
Did you read the actual code and go through the "rabbit hole" or do you just "think" it should be "better" this way? More precisely, did you read this line?
https://github.com/airspy/airspyone_host/blob/master/libairspy/src/airspy.c#L832
Realized the wording "uninitialized" is wrong on my side. I should have expressed it as "calling ptherad_jonin() with a thread id of zero causes segfault".
I am yet to see a segfault with that code. pthread_join returns ESRCH with thread_id 0, which is perfectly fine.
The following program segfaults on CentOS 6, CentOS 7 and Fedora 39 but works on Raspberry Pi OS (Bookworm and Buster). So not all pthread implementations on all platforms handles a thread id of 0 to pthread_join()
as ESRCH.
#include <pthread.h>
int main(int argc, char **argv) {
pthread_t thread = 0;
pthread_join(thread, NULL);
return 0;
}
Compiled with:
$ gcc -o out segfault.c -lpthread
Can't reproduce on any up to date Linux on any CPU I tested (x86, x86_64, ARMHF, AARCH64.) Maybe you just highlighted a problem in your distro? I'm sure you can find more broken Linux stuff if you bring a distro from the 90s.
Yet, the "fix" you proposed still doesn't work because of the many possible ways libusb can fail. Someone has to go through that rabbit hole and do the actual investigation work. Maybe a simple check can work for the fossilized distros. Or maybe we should exclude them to keep a sane code base.
Try it: https://onlinegdb.com/VQarzj8G-
The reference to the old CentOS distros was just to show that the behavior of pthread_join()
has been consistent for a very long time in the RHEL-line of distros. Fedora 39 is a recent distro and the issue I'm raising is not about fossilized distros.
As far as I can read, passing anything other than a joinable thread id as the first argument to pthread_join()
is undefined behavior. Distros/implementations have apparently made different choices how to handle this (Debian/Ubuntu: return ESRCH, RHEL-line: segfault) . Nothing odd about that but code using pthread should be written with this in mind. In this case not calling pthread_join()
with a invalid/NULL thread id or a thread id that was previously joined.
I'll create a PR with a check.
This should be it.
Tnx.