gerritjvv/kafka-fast

What are the advantages over native kafka classes?

Closed this issue · 7 comments

Perhaps you could expound on the motivation a bit in the README?

I'll add some of my views.
this was created before the new kafka connector came out, and was then more performant (I've not tested this against the new connector).

For clojure devs the advantage is having 100% clojure data structures.
For Java or Clojure def the most advantageous feature is better scaling on the partitions i.e the consumers of kafka-fast are not limited to a one partition one consumer restriction as with the standard consumer.

Interesting.

I can see where this could be nice on the read side. 

One of the primary problems with kafka is that the api, even the async api, is fundamentally blocking. With async producers it spins up its own thread you can’t control so it doesn’t play that nicely in a full async environment. For instance it would be nice to have akka actors send to kafka and dynamically allocate the number of threads necessary based upon load with routers. However when you have many producer actors each of them gets their own thread for the kafka producer instead of relying on the underlying concurrency of the actors. I was wondering if your implementation is able to sidestep this limitation or if it requires changes from the kafka core devs to implement correctly. I am coming at this from a Scala bias if you can’t tell already haha.

Ryan Braley  |  Founder 
http://traintracks.io/
US: +1 (206) 866 5661

CN: +86 185 1129 5661

Coding the future. Decoding the game.

On Tue, Oct 28, 2014 at 4:14 PM, Gerrit notifications@github.com wrote:

I'll add some of my views.
this was created before the new kafka connector came out, and was then more performant (I've not tested this against the new connector).
For clojure devs the advantage is having 100% clojure data structures.

For Java or Clojure def the most advantageous feature is better scaling on the partitions i.e the consumers of kafka-fast are not limited to a one partition one consumer restriction as with the standard consumer.

Reply to this email directly or view it on GitHub:
#4 (comment)

yup saw the scala.

kafka-fast uses clojure.core.async channels, and handles the threads in the background. Obviously threads are created and all IO is done in threads, but the fundamental data structure you interface with kafka-fast when sending or receiving messages is the clojure.core.async Channel.

at the moment I use the blocking >!! call, but in practice you could use the channel directly in a go block and have full async non blocking code. If your sending from Actors then you can have as many of them as you want, and share a single kafka-fast instance between them, all calls will to to a Channel, which are buffered and then sent.

to expand on this, in terms of how many threads:
One connection is opened per kafka broker, no matter how many topics or partitions you have.
Each connection has N number of threads that more or less remain constant during the application runtime.
I've done this to make the producer more efficient in memory and it works fast also. Scaling is done via buffering/batching messages.

If I am reading this correctly that sounds beautiful. Does that mean I can have each instance of the actor own their own kafka-fast producer and it will play nice? It sounds like I would need to wrap it in clojure though seeing as you are mentioning modifying it in go blocks. 

Actually reading it a second time it seems you are suggesting sharing an instance between all my actors instead of giving each actor their own. That could work, but I would need to read up on how clojure handles channels and whether it would be resilient to clustering and the like. Still seems silly that there is no direct akka api for kafka. It is written in Scala and akka actors already have mailboxes for buffering messages on their own without need for separate Channels. We could also use nice stuff like reactive-streams for back pressure. Still async channels seem an improvement over hardcoding a thread since it means we can multiplex onto it easily.

What is the back pressure story for kafka-fast?

Also did you reimplement the kafka protocol in Clojure or did you piggyback on the upstream versions?

Ryan Braley  |  Founder 
http://traintracks.io/
US: +1 (206) 866 5661

CN: +86 185 1129 5661

Coding the future. Decoding the game.

On Tue, Oct 28, 2014 at 4:56 PM, Gerrit notifications@github.com wrote:

yup saw the scala.
kafka-fast uses clojure.core.async channels, and handles the threads in the background. Obviously threads are created and all IO is done in threads, but the fundamental data structure you interface with kafka-fast when sending or receiving messages is the clojure.core.async Channel.

at the moment I use the blocking >!! call, but in practice you could use the channel directly in a go block and have full async non blocking code. If your sending from Actors then you can have as many of them as you want, and share a single kafka-fast instance between them, all calls will to to a Channel, which are buffered and then sent.

Reply to this email directly or view it on GitHub:
#4 (comment)

I think the Java api from the actors would work just fine, the channels are buffered and will only block if they are full, this is the same behaviour you get when using Akka Actors with a bounded mail box (http://stackoverflow.com/questions/16500352/how-to-use-akka-boundedmailbox-to-throttle-a-producer).

To translate the kafka-fast producer to Akka terms, you can think of the producer as an Actor, its mail box is a bounded mail box implemented by the clojure.core.async Channel. Sending messages to the mail box will block only if the full, thus applying back pressure.

I implemented the kafka protocol 100% in clojure + netty, didn't want to write yet another wrapper.

Good on ya. I may need to fold this into our project. Looking forward to your benchmark comparison. Thanks for putting this out there.

Ryan Braley  |  Founder 
http://traintracks.io/
US: +1 (206) 866 5661

CN: +86 185 1129 5661

Coding the future. Decoding the game.

On Tue, Oct 28, 2014 at 5:22 PM, Gerrit notifications@github.com wrote:

I think the Java api from the actors would work just fine, the channels are buffered and will only block if they are full, this is the same behaviour you get when using Akka Actors with a bounded mail box (http://stackoverflow.com/questions/16500352/how-to-use-akka-boundedmailbox-to-throttle-a-producer).
To translate the kafka-fast producer to Akka terms, you can think of the producer as an Actor, its mail box is a bounded mail box implemented by the clojure.core.async Channel. Sending messages to the mail box will block only if the full, thus applying back pressure.

I implemented the kafka protocol 100% in clojure + netty, didn't want to write yet another wrapper.

Reply to this email directly or view it on GitHub:
#4 (comment)