kafka-connect-hackernews

A Kafka connector that reads items from Hacker News and streams them into Kafka. Because why not ¯\_(ツ)_/¯

Primary language: Java. License: Do What The F*ck You Want To Public License (WTFPL).

Overview

This source connector reads items (stories, comments, jobs, Ask HNs, polls) from Hacker News via the official API (https://github.com/HackerNews/API). Items are read serially by id, starting from initial.start.item (defaults to 1). Currently, only a single connector task is supported.
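
The serial read described above can be sketched in plain Java. This is a minimal illustration, not the connector's actual code: the class and method names are hypothetical, and the endpoint layout is taken from the public Hacker News API documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; the real connector may be structured differently.
class HackerNewsItemReader {
    // Base URL of the public Hacker News API (see https://github.com/HackerNews/API).
    static final String BASE_URL = "https://hacker-news.firebaseio.com/v0";

    private final HttpClient client = HttpClient.newHttpClient();

    // Builds the endpoint URI for a single item id.
    static URI itemUri(long id) {
        return URI.create(BASE_URL + "/item/" + id + ".json");
    }

    // Fetches one item and returns the raw JSON response body.
    String fetchItem(long id) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(itemUri(id)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HackerNewsItemReader reader = new HackerNewsItemReader();
        long start = 1; // plays the role of initial.start.item
        // Read a few items one after another, in increasing id order.
        for (long id = start; id < start + 3; id++) {
            System.out.println(reader.fetchItem(id));
        }
    }
}
```

Because ids are visited in increasing order by one loop, a single task is enough to preserve ordering, which fits the single-task limitation noted above.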

Installation

Run mvn clean package from the repository root, then unzip the archive created in target/components/packages/ into any directory on your Connect worker's plugin path.

Configuration

The connector supports the following configuration properties:

| Name | Description | Type | Importance |
|------|-------------|------|------------|
| kafka.topic | Topic to write to | String | High |
| poll.interval.ms | Interval between polls, in milliseconds | Long | High |
| initial.start.item | Hacker News item id to start reading from | Long | Medium |
| max.items | Maximum number of items to read from Hacker News; a value less than 1 means unlimited | Long | Medium |
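
To make the semantics of these properties concrete, here is a small sketch of how they might be parsed from a plain string map. It is not the connector's implementation (a real connector would normally declare these through Kafka's ConfigDef). The class name is hypothetical, and two behaviours are assumptions: kafka.topic and poll.interval.ms are treated as required, and a missing max.items is treated as unlimited.

```java
import java.util.Map;

// Hypothetical config holder, for illustration only.
class HackerNewsConfigSketch {
    final String topic;
    final long pollIntervalMs;
    final long initialStartItem;
    final long maxItems;

    HackerNewsConfigSketch(Map<String, String> props) {
        // Assumption: the two high-importance properties must be present.
        topic = required(props, "kafka.topic");
        pollIntervalMs = Long.parseLong(required(props, "poll.interval.ms"));
        // The overview states that initial.start.item defaults to 1.
        initialStartItem = Long.parseLong(props.getOrDefault("initial.start.item", "1"));
        // Assumption: an absent max.items is treated as 0, i.e. unlimited.
        maxItems = Long.parseLong(props.getOrDefault("max.items", "0"));
    }

    private static String required(Map<String, String> props, String key) {
        String value = props.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required config: " + key);
        }
        return value;
    }

    // Values below 1 mean "no limit", per the max.items description.
    boolean isUnlimited() {
        return maxItems < 1;
    }
}
```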

An example config for this connector:

{
  "name": "HN",
  "connector.class": "com.github.yashmayya.kafka.connect.hackernews.HackerNewsSourceConnector",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "false",
  "kafka.topic": "hn-items",
  "poll.interval.ms": "100",
  "initial.start.item": "1"
}
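
One way to apply a flat config like the one above is the Kafka Connect REST API's PUT /connectors/{name}/config endpoint, which creates the connector or updates it if it already exists. The sketch below assumes a Connect worker listening on http://localhost:8083 and the JSON saved in a file named hn-connector.json; both are assumptions, as is the class name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for submitting a connector config to a Connect worker.
class RegisterConnector {
    // Builds a PUT request against /connectors/{name}/config with a JSON body.
    static HttpRequest buildRequest(String workerUrl, String name, String configJson) {
        return HttpRequest.newBuilder(URI.create(workerUrl + "/connectors/" + name + "/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(configJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed file name and worker address; adjust to your setup.
        String json = Files.readString(Path.of("hn-connector.json"));
        HttpRequest request = buildRequest("http://localhost:8083", "HN", json);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The connector name in the URL should match the "name" field in the JSON body.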

TODO

  • Implement offset tracking and recovery
  • Support dynamic reloading of max item id so that the connector can run forever
  • Add support for schemas