Preparing an environment every time we want to study something related to Kafka usually takes too long. To run tests in an environment that meets the minimum requirements for high availability (HA), you need at least 3 servers for ZooKeeper and 3 servers for Kafka, all to ensure HA and avoid split brain.
If you are new to Kafka, I strongly recommend going through the Kafka documentation. You can also take a look at this course, which is basic but extremely helpful.
To make it practical, I also set up both ZooKeeper and Kafka to be managed by systemd, which means you can manage the services like this:
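The minimum of 3 servers for ZooKeeper comes from majority voting: an ensemble only keeps working while more than half of its members can reach each other. Here is a minimal sketch of that arithmetic. The function names are mine, just for illustration, and it only covers the ZooKeeper side, not Kafka's own replication settings.

```shell
# Smallest number of members that forms a majority in an ensemble of $1 members.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority is still reachable.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 4 5; do
  echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 1 or 2 members no failure is tolerated, and 3 is the first size that survives the loss of one member. A 4th member adds nothing in that respect, which is why odd sizes are the usual choice.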
$ sudo systemctl start zookeeper
$ sudo systemctl stop zookeeper
$ sudo systemctl restart zookeeper
$ sudo systemctl start kafka
$ sudo systemctl stop kafka
$ sudo systemctl restart kafka
I'm assuming that you are an experienced GNU/Linux user and that you have Ansible, Vagrant and VirtualBox installed on your machine. As a host I use Fedora, but you can choose whatever GNU/Linux distro you prefer.
- Creating and booting the VirtualBox instances
- Provisioning the hosts with all the necessary basic tools
- Provisioning a cluster for ZooKeeper
- Provisioning a cluster for Kafka
- Generating SSH keys to access your hosts through vagrant commands
😱
$ tree provisioning/
provisioning/
├── kafka-playbook.yml
├── roles
│   ├── general
│   │   ├── handlers
│   │   │   ├── main.yml
│   │   │   └── restart-mdns.yml
│   │   └── tasks
│   │       ├── main.yml
│   │       ├── packages.yml
│   │       └── security.yml
│   ├── kafka
│   │   ├── files
│   │   │   └── kafka.service
│   │   ├── handlers
│   │   │   ├── main.yml
│   │   │   └── restart-kafka.yml
│   │   ├── tasks
│   │   │   ├── create-configuration.yml
│   │   │   ├── create-npa.yml
│   │   │   ├── install-kafka.yml
│   │   │   ├── main.yml
│   │   │   └── manage-service.yml
│   │   ├── templates
│   │   │   └── server.properties.j2
│   │   └── vars
│   │       └── main.yml
│   └── zookeeper
│       ├── files
│       │   └── zookeeper-3.4.10.tar.gz
│       ├── handlers
│       │   ├── main.yml
│       │   └── restart-zookeeper.yml
│       ├── tasks
│       │   ├── create-configuration.yml
│       │   ├── create-npa.yml
│       │   ├── install-zookeeper.yml
│       │   ├── main.yml
│       │   └── manage-service.yml
│       ├── templates
│       │   ├── zoo.cfg.j2
│       │   └── zookeeper.service.j2
│       └── vars
│           └── main.yml
└── zookeeper-playbook.yml
16 directories, 28 files
After cloning my repo, navigate to the project folder and update lines 3 and 4 of the Vagrantfile to match your main network (the one with internet access). Then use vagrant to provision your own environment.
$ git clone https://github.com/fabiogoma/kafka-ansible-virtualbox.git
$ cd kafka-ansible-virtualbox
$ vi Vagrantfile
...
# Replace eno1 with the appropriate name for your host's main interface
config.vm.network 'public_network', bridge: "eno1"
...
$ vagrant up
In a few minutes you should have access to a cluster containing 3 hosts for ZooKeeper and 3 hosts for Kafka. If you need more hosts, you can change the host variables (zookeeper_boxes and kafka_boxes) in the Vagrantfile.
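Back to the bridge setting in the Vagrantfile: if you are not sure which interface is your main one, the default route usually tells you. The sketch below assumes the iproute2 `ip` command is available on your host. The helper name `default_iface` is hypothetical, and the sample route line is only there for illustration.

```shell
# Print the interface name that follows "dev" in the first route line read from stdin.
default_iface() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Real usage on the host:
#   ip route show default | default_iface

# Illustration with a sample line in the usual "ip route" format.
echo "default via 192.168.1.1 dev eno1 proto dhcp metric 100" | default_iface
```

Whatever name it prints on your machine is the value to use in place of eno1. If you have more than one default route, double check that the first one is really the network with internet access.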
As soon as your topology is up and running, download the Kafka binaries to your desktop.
You need the Kafka binaries on your host to run a basic test and make sure the topology is working, so the tarball needs to be extracted before it can be used.
$ wget http://apache.mirror.triple-it.nl/kafka/0.10.2.1/kafka_2.12-0.10.2.1.tgz
$ tar -xvzf kafka_2.12-0.10.2.1.tgz
$ cd kafka_2.12-0.10.2.1/bin
From now on we're going to use 2 terminals. I recommend tilix (formerly terminix) to make it more productive.
Be careful with the sequence, because here it matters:
- Create topic (First terminal)
- Connect to the topic and wait for new messages (Second terminal)
- Send some test messages (First terminal)
./kafka-topics.sh --create \
--topic MyTopic \
--replication-factor 1 \
--partitions 1 \
--zookeeper zookeeper1.local:2181,zookeeper2.local:2181,zookeeper3.local:2181
./kafka-console-consumer.sh --bootstrap-server kafka1.local:9092,kafka2.local:9092,kafka3.local:9092 \
--topic MyTopic
./kafka-console-producer.sh --broker-list kafka1.local:9092,kafka2.local:9092,kafka3.local:9092 \
--topic MyTopic
Here, type some "Hello World" messages, hit enter and monitor the second terminal.
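If you changed the number of hosts in the Vagrantfile, the host lists in the commands above have to change too. Here is a small sketch for generating them. It assumes the hosts keep the `<role><number>.local` naming and the ports used above. The function `host_list` and the variables `ZOOKEEPERS` and `BROKERS` are names I made up for this example.

```shell
# Build "prefix1.local:port,prefix2.local:port,..." for the given number of hosts.
# Arguments: prefix, host count, port.
host_list() {
  hl_prefix=$1
  hl_count=$2
  hl_port=$3
  hl_list=""
  hl_i=1
  while [ "$hl_i" -le "$hl_count" ]; do
    # Prepend a comma only when the list already has an entry.
    hl_list="${hl_list:+$hl_list,}${hl_prefix}${hl_i}.local:${hl_port}"
    hl_i=$((hl_i + 1))
  done
  echo "$hl_list"
}

ZOOKEEPERS=$(host_list zookeeper 3 2181)
BROKERS=$(host_list kafka 3 9092)
echo "$ZOOKEEPERS"
echo "$BROKERS"
```

You can then pass "$ZOOKEEPERS" to the --zookeeper option and "$BROKERS" to the --bootstrap-server and --broker-list options instead of typing the lists by hand.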
📙 Keep in mind that this is my personal laboratory. You can prepare your production environment following these steps, but make sure you know what you're doing.