clbq aka ConcurrentLinkedBlockingQueue
ConcurrentLinkedBlockingQueue is an experimental BlockingQueue implementation built on top of ConcurrentLinkedQueue for scenarios that require low-latency handoff, at the cost of increased overhead when consumers transition between active and blocked (waiting) states. This overhead may or may not be amortized by the lower overall latency.
I am not the original author of this code; I merely saved it from the net some time ago and merged the unbounded & bounded implementation into a single class.
USAGE
-
Call ConcurrentLinkedBlockingQueue() with or without capacity, just like LBQ. A capacity obviously implies bounded size.
-
The main benchmark/test driver is "QueueTest", which in turn will run separate drivers for different individual queue implementations. Simply run it without parameters and it tries to do some rule-of-thumb thread pool/producer/consumer auto-sizing.
-
Alternatively pass "numConsumer=x", "numProducer=y" and optionally "capacity=z" properties to see the performance tradeoffs. With more producers than (one or two) consumers the performance should always be better than LinkedBlockingQueue; this takes a turn for the worse as more consumers are added.
-
The individual drivers can also be run on their own.
IDEAS
As stated above this is an experiment in providing a lower-latency alternative to LBQ for certain scenarios. The impact of "user-level" (aka nonnative) blocking/signaling might still be reduced further, so if anybody wants to explore these ideas further, please feel free:
-
Consumer state transition (active -> potentially-blocking) currently implies a necessary second hit into the internal CLQ, which (as far as I can tell) is responsible for the performance impact with multiple consumers, probably due to excessive CAS. I'd be curious to hear about alternative approaches.
-
I'm looking for reduced allocation of ThreadMarkers. One possible way might be to use ThreadLocals and merely flip a signal, but I have not yet thought that through (I generally dislike ThreadLocals unless I can also control any interacting threads).
-
Instead of crapping on the console write to CSV so that the results can be more easily graphed/compared.
Patches & further thoughts welcome!