brunocodutra/ring-channel

Can't get consistent behavior

najamelan opened this issue · 2 comments

Hi, I'm running some integration tests with ring_channel in it, and I can't get a consistent result here unless I add a sleep:

use ring_channel::*;
use futures::{prelude::*};
use std::{num::NonZeroUsize, thread, sync::{ Arc, Mutex } };

#[async_std::main]
//
async fn main()
{
   let lock  = Arc::new(Mutex::new( () ));
   let lock2 = lock.clone();

   let (mut tx, mut rx) = ring_channel( NonZeroUsize::new(1).unwrap() );

   thread::spawn( move ||
   {
      async_std::task::block_on( async
      {
         let _guard = lock.lock();

         for i in 0..1000
         {
            tx.send( i ).unwrap();
         }
      });
   });

   // std::thread::sleep( std::time::Duration::from_millis(100) );

   let mut count = 0;

   while let Some(num) = rx.next().await
   {
      let _guard = lock2.lock();
      count += num;
   }

   assert_eq!( 999, count );
}

The count will often be over 1000 when not sleeping here. I don't understand why as for me the sequence of events is:

    • if rx.next().await runs first, should block as the channel is empty
    • if the sender runs first, it will send everything at once, protected by mutex. If in the mean time the rx has pulled the first item out and is waiting on the mutex, it should only have the first value (0), then block waiting for the sender to finish and then pull 999 out the buffer.
  1. sender gets lock first
  2. sender sends all numbers up to 999
  3. now rx get's to run and should only see 999 as the channel has bounded size of 1.

What do you make of it? Maybe it's nothing to do with ringchannel but I'm missing some subtle timing issue of the mutexes. Just trying to understand exactly how this is possible.

if the sender runs first, it will send everything at once, protected by mutex. If in the mean time the rx has pulled the first item out and is waiting on the mutex, it should only have the first value (0), then block waiting for the sender to finish and then pull 999 out the buffer.

I think what may be throwing you off are the bits I highlighted. The mutex certainly prevents that the receiving thread increments count, but it does not prevent that it pulls a value from the channel and stores it in num. The mutex thus doesn't tell us much about when the receiving thread actually pulls the first value from the channel, other than it happens somewhere between the sending thread sends 0 and 999 and that can be any value.

Let's look into the two main variants of the execution order in more detail.

  1. Sending thread reaches lock.lock() and acquires the lock before the receiving thread reaches rx.next().await;
    a. Sending thread reaches tx.send( i ).unwrap() and starts sending values between 0 and 999;
    b. Receiving thread reaches rx.next().await and pulls the current value between 0 and 999 from the channel into num;
    c. Receiving thread reaches lock2.lock() and blocks waiting for the lock to be released;
    d. Sending thread sends 999 and releases the lock;
    e. Receiving thread acquires the lock, increments count by num;
    f. Receiving thread pulls 999 from the channel, increments count by it and exits the loop;

  2. Receiving thread reaches rx.next().await and blocks (in the asynchronous sense) before the sending thread acquires the lock and reaches tx.send( i ).unwrap().
    a. Sending thread reaches lock.lock() and acquires the lock;
    b. Sending thread reaches tx.send( i ).unwrap() and starts sending values between 0 and 999;
    c. Receiving thread awakes and pulls the current value between 0 and 999 from the channel into num;
    d. Receiving thread reaches lock2.lock() and blocks waiting for the lock to be released;
    e. Sending thread sends 999 and releases the lock;
    f. Receiving thread acquires the lock, increments count by num;
    g. Receiving thread pulls 999 from the channel, increments count by it and exits the loop;

Please let me know if this helps to clarify it.

Thanks alot for clarifying. I feel so dum now... ;-)

At least it's good to know it's not ring-channel doing something inconsistent. I will have to write my tests somewhat different. Thanks again for taking the time to look into this.