kafka-connect-adl is kafka connect sink connector designed to be used to copy data between Kafka and azure data lake.
kafka-connect-adl is used normally like other connectors.
For using it Clone this repo on your system.Ensure maven is installed on your system for building it.
Go to root directory of the project and run
mvn clean install
In the target folder you will see kafka-connect-adl-4.1.1-package directory which contains the connector jars in the way kafka maintains it.
Copy the contents of this directory to KAFKA_HOME or KAFKA_BROKER_HOME .
Note If you are using confluent copy the contents in CONFLUENT_HOME directory.
To verify you have correctly installed the connector run
curl http//:<<KAFKA_BROKER_HOST>>:8083/connector/connector-plugins
Note Replace KAFKA_BROKER_HOST with the IP_ADDRESS of the host.
You will see jk.bigdata.tech.AdlSinkConnector listed in the output.
For running the connector you need to prepare a configuration file for kafka-connect. You can look out for sample configuration files in config folder.
- adl.data.lake.name=<< AZURE DATA LAKE NAME >>.azuredatalakestore.net
- adl.client.id=<< AZURE ACTIVE DIRECTORY CLIENT ID >>
- adl.client.secret=<< AZURE ACTIVE DIRECTORY APP KEY >>
- adl.token.endpoint=<< AZURE ACTIVE DIRECTORY TOKEN ENDPOINT >>
These are configuration parameters most importantly required by kafka-connect-adl connector.
For creating Azure data lake refer here
For generating client ID and secret and getting token endpoint refer here
After filling in the values in the configuration file.Run this
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" --data @<<FILE_NAME>>.json http://<<KAFKA_BROKER_HOST>>:8083/connectors
Congratulation your connector is running.Check its status using
curl http://<<KAFKA_BROKER_HOST>>:8083/connectors/<<CONNECTOR_NAME_IN_CONFIGURATION_FILE>>/status
- Issue Tracker: https://github.com/jksinghpro/kafka-connect-adl/issues
The project is licensed under the Apache 2 license.