# TwitterkafkaConnect

App to connect Kafka and Twitter, written in Java.

This is a sample project that demonstrates how to connect a Java application to Kafka.

This project contains sample code to:

  1. Connect a Java application to the Twitter API to receive tweets.
  2. Push the tweets to the Kafka topic twitter_tweets (a minimal producer sketch follows this list).
  3. Consume the data from the Kafka topic.
  4. Push the data into Bonsai (hosted Elasticsearch).
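
As a rough sketch of step 2, a minimal producer using the plain Kafka clients API might look like this; the broker address, class name, and payload are placeholders, and in the real project the value comes from the Twitter client:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TweetProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // In the real project the value is the tweet JSON received from the Twitter client
        producer.send(new ProducerRecord<>("twitter_tweets", null, "{\"text\":\"hello kafka\"}"));
        producer.close(); // flushes pending records before shutting down
    }
}
```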

# Partition Count and Replication Factor

Try to plan this configuration from the beginning.

## Partition count

More partitions mean better parallelism, better throughput, and the ability to run more consumers. Rules of thumb:

  1. Small cluster (< 6 brokers): number of partitions = number of brokers x 2.
  2. Big cluster (> 12 brokers): number of partitions = number of brokers x 1.

## Replication factor

Minimum 2, usually 3, maximum 4. Never set it to 1. A topic-creation sketch applying both rules follows.
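
To make both rules concrete, here is a small sketch that creates the topic with Kafka's AdminClient; the broker count and address are assumptions for illustration:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetupSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            int brokers = 3;                  // assumed small cluster (< 6 brokers)
            int partitions = brokers * 2;     // small-cluster rule of thumb: brokers x 2
            short replicationFactor = 3;      // "usually 3", never 1
            admin.createTopics(Collections.singletonList(
                    new NewTopic("twitter_tweets", partitions, replicationFactor)))
                 .all().get();                // block until the topic is actually created
        }
    }
}
```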

# Important Details

This section covers common problems and their solutions.

### There are 4 major use cases of Kafka:

  1. Data source to Kafka -- use the Kafka Connect Source API.
  2. Kafka to Kafka -- use the Kafka Streams API.
  3. Kafka to sink -- use the Kafka Connect Sink API.
  4. Kafka to app -- use the plain Consumer API (see the consumer sketch after this list).
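
As a minimal illustration of case 4, a plain consumer reading the twitter_tweets topic might look like the sketch below; the group id and broker address are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TweetConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-consumer-group");      // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("twitter_tweets"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // In the real project this is where each tweet would be pushed to Elasticsearch
                    System.out.println(record.value());
                }
            }
        }
    }
}
```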

# Kafka Connect

The Kafka Connect architecture is shown below:

[Image: Kafka Connect architecture]

To demonstrate the Kafka Connect API, we use https://github.com/jcustenborder/kafka-connect-twitter.

Its author has already written a connector, which we use to read data from Twitter. All connectors are under the kafka-connect directory. We only need to run Connect with the right configuration and it will do the job for us :). A rough configuration sketch is shown below.
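
The property names below are assumptions based on the connector's documentation, so verify them against the repository README before use; the credentials are placeholders from the Twitter developer console:

```properties
name=TwitterSourceConnector
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
tasks.max=1
# OAuth credentials (placeholders)
twitter.oauth.consumerKey=<consumer-key>
twitter.oauth.consumerSecret=<consumer-secret>
twitter.oauth.accessToken=<access-token>
twitter.oauth.accessTokenSecret=<access-token-secret>
# Track tweets matching these keywords and write them to the topic
filter.keywords=kafka
kafka.status.topic=twitter_tweets
process.deletes=false
```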

# Kafka Streams

Kafka Streams is a Java library that provides easy data processing and transformation within Kafka.

[Image: Kafka Streams]

Check the module kafka-stream-filter-tweets for more information on Kafka Streams; a minimal filter topology sketch follows.
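
The real logic lives in that module; the sketch below only shows the general shape of such a filter topology, with the topic names and filter predicate as illustrative placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TweetFilterSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-kafka-streams"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");  // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("twitter_tweets");
        // Placeholder predicate: keep only tweets whose payload mentions "kafka"
        KStream<String, String> filtered = input.filter((key, value) -> value.contains("kafka"));
        filtered.to("important_tweets"); // write the filtered stream back to another topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```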

# Schema Registry

The Schema Registry should be a separate component from Kafka. Producers and consumers must be able to talk to it, it must be able to reject bad data, and a common data format must be agreed upon.

Check out the Confluent Schema Registry.

In this case the producer sends the Avro-encoded content to Kafka and sends the schema to the Schema Registry. The consumer retrieves the schema from the Schema Registry and uses it to deserialize the data. A minimal producer sketch follows.
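
This is a minimal sketch assuming Confluent's Avro serializer and a local Schema Registry on port 8081; the topic name and the one-field schema are purely illustrative:

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroTweetProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer registers the schema and sends Avro-encoded bytes
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://127.0.0.1:8081"); // assumed local registry

        // Illustrative one-field schema for a tweet
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Tweet\",\"fields\":"
              + "[{\"name\":\"text\",\"type\":\"string\"}]}");
        GenericRecord tweet = new GenericData.Record(schema);
        tweet.put("text", "hello from Avro");

        KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("twitter_tweets_avro", tweet)); // placeholder topic
        producer.flush();
        producer.close();
    }
}
```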