YotpoLtd/metorikku

feature - Specify Kafka Topic Partitions

Closed this issue · 4 comments

Hello,

Could you please add the ability to specify which topic partitions to consume from when the source is Kafka?
For example, for a Kafka topic with 10 partitions, I would like to consume from only some of them.

Thank you

Hello,

Why would you like to have such an ability?

Hi Irenez753,

Thank you for getting back to me.
Correct me if I'm wrong, but the main reason is to achieve parallelism when reading a Kafka topic. Having multiple clusters each read from different topic partitions would help with performance tuning.
Can parallel reading be achieved with some other configuration?

Thank you

Hi @Rap70r, consuming from the different partitions is handled by the consumer groups feature in Kafka.
Parallelism is achieved by using multiple executors: the more executors you run, the more partitions are consumed in parallel by different executors. This is capped by the number of partitions in the topic.
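
To make the cap concrete, here is a rough sketch of the arithmetic. It assumes Spark's default behaviour of one read task per Kafka partition, and that executors are sized with the standard `spark-submit` flags `--num-executors` and `--executor-cores` on YARN. The function name `effective_read_parallelism` is made up for illustration and is not part of Metorikku or Spark.

```python
def effective_read_parallelism(topic_partitions: int,
                               num_executors: int,
                               cores_per_executor: int = 1) -> int:
    """Estimate how many Kafka partitions can be read at the same time.

    Assumption: one Spark task per Kafka partition (the default when no
    extra splitting is configured), and one task per executor core.
    """
    if min(topic_partitions, num_executors, cores_per_executor) < 1:
        raise ValueError("all arguments must be positive integers")

    # Total task slots available across the cluster.
    task_slots = num_executors * cores_per_executor

    # Extra slots beyond the partition count stay idle for this read.
    return min(task_slots, topic_partitions)


# A 10-partition topic read by 4 executors with 2 cores each.
few_executors = effective_read_parallelism(10, num_executors=4, cores_per_executor=2)

# Doubling the executors does not help past the partition count.
many_executors = effective_read_parallelism(10, num_executors=8, cores_per_executor=2)
```

Under these assumptions, adding executors only helps until the number of task slots reaches the number of partitions in the topic.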

Metorikku uses Spark under the hood to do this, so nothing special needs to happen.
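
For reference, plain Spark's Kafka source does accept an `assign` option, a JSON string that maps a topic to specific partitions. It is used instead of `subscribe`. The sketch below shows how that looks in PySpark. Whether Metorikku's Kafka input passes this option through is not confirmed in this thread, so treat it as a Spark-level illustration only. The helper names, the broker address, and the topic name are hypothetical, and the Spark Kafka connector package must be on the classpath.

```python
import json


def build_assign_option(topic, partitions):
    """Build the JSON string for Spark's Kafka `assign` option.

    Hypothetical helper: for example, topic "events" with partitions
    [0, 1, 2] becomes a JSON object mapping "events" to [0, 1, 2].
    """
    return json.dumps({topic: sorted(set(partitions))})


def read_selected_partitions(spark, bootstrap_servers, topic, partitions):
    """Batch-read only the given partitions of a Kafka topic.

    `spark` is an existing SparkSession. `assign` replaces `subscribe`;
    Spark allows only one of the two on a given source.
    """
    return (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("assign", build_assign_option(topic, partitions))
        .load()
    )


# Example option value for partitions 0-2 of a hypothetical "events" topic.
assign_value = build_assign_option("events", [2, 0, 1, 1])

# Usage with a live session (not executed here):
# df = read_selected_partitions(spark, "broker:9092", "events", [0, 1, 2])
```

In most cases this is not needed for parallelism, since Spark already spreads a topic's partitions across the available executors.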

Thank you for your response. I'm new to these technologies.
I'm running on an EMR cluster. How can I run multiple executors at the same time?

Thank you