Segv when dtor called on too big channelImpl deferred list
Closed this issue · 1 comments
Hello,
This is one of my first public issues, so I'm sorry if it's not correctly formatted or presented.
Describe the bug
In a multi-threaded producer-consumer case (all amqp-cpp code is executed in consummer), when enqueuing lots of tasks thus lots of deferred objects are registered, if the channelImpl destructor is called before they are handled and free'd, there can be a stack overflow.
Expected behavior and actual behavior
The destruction of the deferred list should not throw a stack-overflow / segv error.
Sample code
I am using uvw (c++ wrapper around libuv) and have the following setup :
- a "module" thread that can send tasks (std::function) to the event loop (and wake it up)
- a "communicator" thread (libuv loop) that when waked up (queue or socket event) handles all enqueued tasks then the event
When the module enqueues ~1000000 publishes the communicator tries to handle it but mainly dequeues a lot of tasks before handling a socket event. Then the module does some work and sends a stop loop task to the communicator. The module waits for the communicator to dequeue, stop the loop and the thread. Once the communicator thread is stopped the module exits.
The crash occurs at this point: since the defered list is destroyed, the whole deferred list is recursively destroyed. And since the list is very very big in this particular case, a stack overflow happens.
To validate this theory I added the following code in include/amqp/Deferred.h's destructor to destroy the list iteratively by increasing the reference count of the smart pointer next to the next of the current item and disasociating the next element from the list to avoid the previous recursion issue :
virtual ~Deferred()
{
// report to the finalize callback
if (_finalizeCallback) _finalizeCallback();
# code aded begins here
while (_next) {
if (_next.use_count() == 1) {
auto surNext = _next->_next;
_next->_next = nullptr;
_next = surNext;
}
}
# code aded ends here
}
Here is my project link to the specific commit that generated the issue (for reproductability) :
https://github.com/Dominique57/amqp-libuv/tree/228fc7be4548ff38ad82a3643e3f932d7b33b9fb
Since it's a very specific use-case I don't know if it warrants the presented fix (especially using "use_count").
Also i wanted to give props to the maintainers and contributors of the library, it's a very cool project that is very fun to work with !
The objects are not thread safe. You cannot run your event loop in one thread, and at the same time call methods on a connection or channel object from a different thread. You should do everything in the same thread.