/kafka-connect-github-source

Get a stream of issues and pull requests for your chosen GitHub repository

Primary LanguageJavaMIT LicenseMIT

Learning

This project is a companion repository to the Apache Kafka Connect course on Udemy.

https://links.datacumulus.com/kafka-connect-coupon

Kafka Connect Source GitHub

This connector allows you to get a stream of issues and pull requests from your GitHub repository, using the GitHub Api: https://developer.github.com/v3/issues/#list-issues-for-a-repository

Issues are pulled based on updated_at field, meaning any update to an issue or pull request will appear in the stream.

The connector writes to topic that is great candidate to demonstrate log compaction. It's also a fun way to automate your GitHub workflow.

It's finally aimed to be an educative example to demonstrate how to write a Source Connector a little less trivial than the FileStreamSourceConnector provided in Kafka.

Contributing

This connector is not perfect and can be improved, please feel free to submit any PR you deem useful.

Configuration

name=GitHubSourceConnectorDemo
tasks.max=1
connector.class=com.simplesteph.kafka.GitHubSourceConnector
topic=github-issues
github.owner=kubernetes
github.repo=kubernetes
since.timestamp=2017-01-01T00:00:00Z
# I heavily recommend you set those two fields:
auth.username=your_username
auth.password=your_password

Running in development

Note: Java 8 is required for this connector. Make sure config/worker.properties is configured to wherever your kafka cluster is

./build.sh
./run.sh 

The simplest way to run run.sh is to have docker installed. It will pull a Dockerfile and run the connector in standalone mode above it.

Deploying

Note: Java 8 is required for this connector.

TODO