boostorg/fiber

unbuffered_channel push hangs

BenKaufmann opened this issue · 1 comments

Starting with boost-version 1.75.0 (more precisely since the merge of #261), the following simple example hangs because fiber p1 never returns from the push(12) call:

#include <boost/fiber/operations.hpp>                                           
#include <boost/fiber/condition_variable.hpp>                                   
#include <boost/fiber/unbuffered_channel.hpp>                                   
                                                                                
#include <stdio.h>                                                              
                                                                                
int main()                                                                      
{                                                                               
  boost::fibers::unbuffered_channel<int> channel;                               
  boost::fibers::condition_variable      cond;                                  
  boost::fibers::mutex                   mutex;                                 
  boost::fibers::fiber                   consumer(boost::fibers::launch::dispatch, [&]() {
    for (int v; channel.pop(v) == boost::fibers::channel_op_status::success; )  
    {                                                                           
      boost::this_fiber::yield();                                               
      cond.notify_one();                                                        
    }                                                                           
  });                                                                           
  boost::fibers::fiber                   p1(boost::fibers::launch::post, [&]() {
    printf("PF: push(12)\n");                                                   
    channel.push(12);                                                           
    printf("PF: done\n");                                                       
  });                                                                           
  printf("MF: push(22)\n");                                                     
  channel.push(22);                                                             
  printf("MF: wait\n");                                                         
  std::unique_lock lock(mutex);                                                 
  cond.wait(lock);                                                              
  printf("MF: join PF\n");                                                      
  p1.join();                                                                    
  channel.close();                                                              
  consumer.join();                                                              
}

The problem seems to be due to an outdated waker in unbuffered_channel::push():

  • Once fiber p1 enters unbuffered_channel::push(12), it immediately creates a slot and waker with epoch E1.
  • The call to try_push() then returns false because the consumer has not yet popped the 22
  • Hence, waiting_producers_.suspend_and_wait( lk, active_ctx); is called, which creates another waker with epoch E2
  • The consumer fiber now pops the first value thereby scheduling p1 from the waiting_producers_. It then boost::this_fiber::yield()s back to p1.
  • Fiber p1 now again calls try_push(), which succeeds and p1 is suspended until the slot is popped. However, the waker that is stored in the pushed slot is already outdated at this point!
  • Eventually, the consumer pops the slot that was pushed by p1 and calls s->w.wake(); on it. However, given that the waker is outdated, p1 is never scheduled and hangs forever.

As far as I can see, one could fix the issue by delaying the creation of the waker that is associated with a pushed slot.
I.e. one would create the slot without a waker and instead call s.w = active_ctx->create_waker(); only right before the call to try_push().

olk commented

ty