UniversalDataCrawler

The release version of DufeDataCrawler

Primary language: Java

Distributed Universal Data Crawler

Features

  • Extensible plugin architecture (plugins can be added at runtime)
  • Custom commands for plugins
  • Easy integration with Kafka and HDFS
  • Web UI panel for controlling the crawler
  • Distributed deployment across clusters
  • Built as an enhancement of WebCollector

Usage

  • Fork the repository and add all libraries to your classpath
  • Implement your parsing logic by extending the Plugin class
  • Package your plugin class as a JAR file and add it to the plugin path
  • Register your plugin in the config file
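
The second step above (extending the Plugin class) might look roughly like the sketch below. The real Plugin API is not documented here, so the `Plugin` base class in this example is a hypothetical stand-in, and the method names `name` and `parse`, the `TitlePlugin` class, and the wrapper `PluginSketch` are all assumptions made for illustration. Check the actual Plugin class in the repository for the methods you need to override.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginSketch {

    // ASSUMPTION: hypothetical stand-in for the crawler's Plugin base class.
    // The real class in the repository may declare different methods.
    public abstract static class Plugin {
        // Identifier a plugin could be registered under (hypothetical).
        public abstract String name();

        // Turns a fetched page into the extracted result (hypothetical).
        public abstract String parse(String url, String html);
    }

    // Example plugin: extracts the contents of the page's <title> element.
    public static class TitlePlugin extends Plugin {
        private static final Pattern TITLE = Pattern.compile(
                "<title[^>]*>(.*?)</title>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        @Override
        public String name() {
            return "title";
        }

        @Override
        public String parse(String url, String html) {
            Matcher m = TITLE.matcher(html);
            // Return an empty string when the page has no <title> element.
            return m.find() ? m.group(1).trim() : "";
        }
    }

    public static void main(String[] args) {
        Plugin plugin = new TitlePlugin();
        String html = "<html><head><title> Example Page </title></head><body></body></html>";
        System.out.println(plugin.name() + ": " + plugin.parse("http://example.com", html));
    }
}
```

In a real plugin you would drop the stand-in base class, extend the project's own Plugin class instead, and then package the compiled class as a JAR as described in the steps above.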

Note: this project is no longer maintained.