disruptor-net/Disruptor-net

High CPU usage in MultiProducerSequencer.NextInternal(int)

shenzhigang opened this issue · 7 comments

In our production environment I saw 100% CPU usage. I captured dumps at four points in time and found that thread 29 was consuming an abnormal amount of CPU time, with MultiProducerSequencer.NextInternal on its stack in every dump.
In addition, I wonder why AggressiveSpinWait.SpinOnce() uses Thread.Sleep(0) instead of Thread.Sleep(1).
Thanks.

The following are the stacks of thread 29 at the four points in time.

0:000> ~29e!clrstack -a
OS Thread Id: 0x32e0 (29)
        Child SP               IP Call Site
000000cb4dd3f3d8 00007fff8c956964 [InlinedCallFrame: 000000cb4dd3f3d8] System.Threading.Thread.YieldInternal()
000000cb4dd3f3d8 00007fff7b4466c8 [InlinedCallFrame: 000000cb4dd3f3d8] System.Threading.Thread.YieldInternal()
000000cb4dd3f3b0 00007fff7b4466c8 DomainNeutralILStubClass.IL_STUB_PInvoke()

000000cb4dd3f460 00007fff24d55f6b Disruptor.AggressiveSpinWait.SpinOnce()
    PARAMETERS:
        this (<CLR reg>) = 0x000000cb4dd3f4b8

000000cb4dd3f490 00007fff22bff3af Disruptor.MultiProducerSequencer.NextInternal(Int32)
    PARAMETERS:
        this (<CLR reg>) = 0x000002cbac10fcf8
        n (<CLR reg>) = 0x0000000000000001
    LOCALS:
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
0:000> ~29e!clrstack -a
OS Thread Id: 0x32e0 (29)
        Child SP               IP Call Site
000000cb4dd3f440 00007fff233ac68b Disruptor.Util.GetMinimumSequence(Disruptor.ISequence[], Int64)
    PARAMETERS:
        sequences (<CLR reg>) = 0x000002cbac358790
        minimum (<CLR reg>) = 0x000000000518aae7
    LOCALS:
        <CLR reg> = 0x0000000000000003
        <no data>

000000cb4dd3f490 00007fff22bff3a0 Disruptor.MultiProducerSequencer.NextInternal(Int32)
    PARAMETERS:
        this (<CLR reg>) = 0x000002cbac10fcf8
        n (<CLR reg>) = 0x0000000000000001
    LOCALS:
        <no data>
        <no data>
        <no data>
        <CLR reg> = 0x000000000518aae8
        <no data>
        <no data>
0:000> ~29e!clrstack -a
OS Thread Id: 0x32e0 (29)
        Child SP               IP Call Site
000000cb4dd3f338 00007fff8c956724 [HelperMethodFrame: 000000cb4dd3f338] System.Threading.Thread.SleepInternal(Int32)
000000cb4dd3f430 00007fff7b3d6e8a System.Threading.Thread.Sleep(Int32)
    PARAMETERS:
        millisecondsTimeout = <no data>

000000cb4dd3f460 00007fff24d55f64 Disruptor.AggressiveSpinWait.SpinOnce()
    PARAMETERS:
        this (<CLR reg>) = 0x000000cb4dd3f4b8

000000cb4dd3f490 00007fff22bff3af Disruptor.MultiProducerSequencer.NextInternal(Int32)
    PARAMETERS:
        this (<CLR reg>) = 0x000002cbac10fcf8
        n (<CLR reg>) = 0x0000000000000001
    LOCALS:
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
0:000> ~29e!clrstack -a
OS Thread Id: 0x32e0 (29)
        Child SP               IP Call Site
000000cb4dd3f3d8 00007fff8c956964 [InlinedCallFrame: 000000cb4dd3f3d8] System.Threading.Thread.YieldInternal()
000000cb4dd3f3d8 00007fff7b4466c8 [InlinedCallFrame: 000000cb4dd3f3d8] System.Threading.Thread.YieldInternal()
000000cb4dd3f3b0 00007fff7b4466c8 DomainNeutralILStubClass.IL_STUB_PInvoke()

000000cb4dd3f460 00007fff24d55f6b Disruptor.AggressiveSpinWait.SpinOnce()
    PARAMETERS:
        this (<CLR reg>) = 0x000000cb4dd3f4b8

000000cb4dd3f490 00007fff22bff3af Disruptor.MultiProducerSequencer.NextInternal(Int32)
    PARAMETERS:
        this (<CLR reg>) = 0x000002cbac10fcf8
        n (<CLR reg>) = 0x0000000000000001
    LOCALS:
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>

I guess SpinOnce is called when the queue is full, and this SpinOnce call is what takes up the CPU time.

Spin-waiting when the queue is full will take CPU time and slow down the consumer thread. Could this lead to an avalanche effect?

It will slow down the producer thread, since Next() is called by the producer thread.

It seems that either a full ring buffer or too few consumer threads will cause this problem?

    internal long NextInternal(int n)
    {
        long current;
        long next;

        var spinWait = default(AggressiveSpinWait);
        do
        {
            current = _cursor.Value;
            next = current + n;

            long wrapPoint = next - _bufferSize;
            long cachedGatingSequence = _gatingSequenceCache.Value;

            if (wrapPoint > cachedGatingSequence || cachedGatingSequence > current)
            {
                long gatingSequence = Util.GetMinimumSequence(Volatile.Read(ref _gatingSequences), current);

                if (wrapPoint > gatingSequence)
                {
                    spinWait.SpinOnce();
                    continue;
                }

                _gatingSequenceCache.SetValue(gatingSequence);
            }
            else if (_cursor.CompareAndSet(current, next))
            {
                break;
            }
        } while (true);

        return next;
    }
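
To make the "ring buffer is full" condition concrete, here is a minimal sketch of the wrapPoint check from the loop above, pulled into a standalone helper. WrapPointDemo and WouldWrap are hypothetical names used only for illustration; they are not part of the library.

```csharp
using System;

public static class WrapPointDemo
{
    // Hypothetical helper mirroring the check in NextInternal.
    // Returns true when claiming n slots after `current` would overwrite a slot
    // that the slowest consumer (minGatingSequence) has not processed yet.
    public static bool WouldWrap(long current, int n, int bufferSize, long minGatingSequence)
    {
        long next = current + n;
        long wrapPoint = next - bufferSize;
        return wrapPoint > minGatingSequence;
    }
}
```

For example, with a buffer of 8 slots, a cursor at 15 and the slowest consumer at 7, next is 16 and wrapPoint is 8, which is greater than 7, so the producer has to spin. Once the consumer reaches 8, the check is false and the slot can be claimed. So both a small buffer and a slow (or missing) consumer end up in the same spinning branch.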

@RockNHawk

This is an expected behavior of the Disruptor: Next busy spins when the ring buffer is full.

Thread.Sleep(1) is not used because 15.6ms pauses or even 1ms pauses are not acceptable for many low latency use cases. Thread.Sleep(0) does not reduce CPU usage but it allows the thread to give up its time-slice, thus preventing starvation.

If this behavior is an issue in your use case, you can easily use TryNext in a loop and apply your own waiting strategy, for example by using a SpinWait that will invoke Thread.Sleep(1).
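
A minimal sketch of that idea, assuming TryNext has the shape `bool TryNext(out long sequence)` and returns false when no slot is free. BackoffClaim, TryClaim and ClaimWithBackoff are hypothetical names for illustration, not library API. The delegate keeps the sketch runnable without referencing the Disruptor package.

```csharp
using System;
using System.Threading;

public static class BackoffClaim
{
    // Hypothetical delegate matching the assumed shape of TryNext:
    // it returns false when no slot is free instead of busy-spinning.
    public delegate bool TryClaim(out long sequence);

    // Retries tryClaim until it succeeds or the token is cancelled.
    // System.Threading.SpinWait starts by spinning and, after enough iterations,
    // begins yielding and sleeping (including Thread.Sleep(1)),
    // which lowers CPU usage at the cost of extra latency.
    public static long ClaimWithBackoff(TryClaim tryClaim, CancellationToken cancellationToken = default)
    {
        var spinWait = new SpinWait();
        long sequence;
        while (!tryClaim(out sequence))
        {
            cancellationToken.ThrowIfCancellationRequested();
            spinWait.SpinOnce();
        }
        return sequence;
    }
}
```

If the assumed signature matches, the call site would look like `long sequence = BackoffClaim.ClaimWithBackoff(ringBuffer.TryNext);`, followed by filling the event and publishing the sequence as usual. The trade-off is the one described above: once the waiter starts sleeping, a producer blocked on a full buffer can take a millisecond or more to notice a freed slot.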

Please open a new issue if you have more questions on the subject.

@ocoanet
Thank you!
I will use TryNext with my own waiting strategy.

I know this is by design. Maybe Disruptor could consider adding a configurable waiting strategy for Next to make it more flexible. Anyway, Disruptor is already good enough.

@RockNHawk

I tried to improve the code comments on sequence claiming methods to make the busy-spinning more explicit: a905e92.

Feel free to open an issue if you are interested in improving the API by adding new sequence claiming methods like Next<TWaiter>().